Geo Tests: Reading Your Results
Getting Started
Once your geo test has completed and its results have been finalized, you’ll receive a comprehensive report detailing the test’s findings.
This article covers interpreting your geo test results.
Click here to read about the concepts and terms involved with geo testing.
Click here for how-to instructions on setting up your own geo test.
Accessing Your Results
When your test’s results are ready, you can view them by going to the My Tests section at the bottom of the main Geo Tests page and clicking the Finished tab.
Here, you will see an archive of all of your completed geo tests, alongside some high-level details. Clicking either the test’s name or the Detail arrow at the right of its row will take you to a more detailed view.
Note: If your test has ended but the results are still being processed, it will still appear under the Active tab with the label Results Pending.
Test results can also be downloaded as a PDF via the link at the bottom-right of any test's results page.
Business Impact
The first section of your results is Business Impact, which details the primary metrics your test has uncovered. It is divided into three panels, along with a recommendation for how to optimize your future budgeting based on the test’s findings.
Conversions
In the first panel, you’ll see the actual conversions the test yielded versus the amount our data science team anticipated via its prediction model.
- Actual results are what we know to have happened because test conditions were implemented. These come from the transaction data provided by your integrated platforms (e.g., Shopify). It's important to note that we count all transactions from this data, not just what your platform is reporting to you.
- Predicted results are what we project would have happened without test conditions. These are based on the past two years of sales data for both the markets in your test and the markets your test was initially modeled on. This timeframe lets us account for regular seasonal patterns in your annual business (e.g., Christmas).
Click here for more on how markets are selected and modeled for your geo tests.
For a holdout test, incremental impact is proven when predicted results are higher than observed results: customers in the holdout markets did not see your media, so the purchases the media would have driven never happened.
Scale tests are read the opposite way. If observed results are higher than predicted results, that’s when we know a channel is making an impact. If predicted results are higher, the results are inconclusive.
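To make that decision rule concrete, here is a minimal sketch in Python; the function, its parameters, and the string labels are illustrative only, not part of the Measured platform:

```python
def read_geo_test(test_type: str, predicted: float, actual: float) -> str:
    """Illustrative reading of headline geo-test conversions (not platform code)."""
    if test_type == "holdout":
        # Media was withheld, so fewer conversions than predicted
        # indicates the channel drives incremental purchases.
        return "incremental impact" if predicted > actual else "inconclusive"
    if test_type == "scale":
        # Budget was increased, so more conversions than predicted
        # indicates the extra spend is making an impact.
        return "incremental impact" if actual > predicted else "inconclusive"
    raise ValueError(f"unknown test type: {test_type}")
```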
Confidence Levels
The confidence level in your test results measures how likely it is that the positive effect is attributable to your media rather than random chance. Generally, higher levels come from tests where the tactic shows a high contribution to overall business on a consistent, week-by-week basis during the test.
Measured combines multiple factors to consistently achieve high confidence scores in test results:
- Industry-leading market selection modeling
- Accurate inputs from your business's MIM (Measured Incrementality Model)
- Pre-verified test designs
Those elements allow for accurate testing in as little as 10-20% of the country, so high-confidence tests can be run without a significant disruption to your business.
Historically, the average confidence level for holdout tests with a positive effect is 87%, and for scale tests it's 80%. However, even tests with lower confidence levels or without a positive effect are just as valuable: their findings can also calibrate your media mix model in the future.
Calculating Confidence
Measured uses industry-standard methods to find your test's confidence level. For any test, two competing hypotheses are run against each other:
- A baseline hypothesis that assumes there will be no change from normal business during the test
- A positive hypothesis that assumes test conditions will cause a change in the number of transactions
The goal is to prove the baseline hypothesis wrong. For each test, our platform reports the probability that the results we found would still happen under baseline conditions. The lower that probability is, the less likely the results are due to random chance.
The confidence level you see with your results is calculated as 1 minus that probability. For instance, if we found a 3% probability that your results would still have occurred without the test, the confidence level would be 1 minus 3%, or 97%.
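In code form, using the same numbers as the example above:

```python
p_baseline = 0.03            # probability the results occur under baseline conditions
confidence = 1 - p_baseline  # 0.97, reported as a 97% confidence level
```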
Incremental ROAS and CPO
In the third panel, your test results are summarized in a key high-level metric: incremental ROAS or CPO. You can switch between the two via the toggle in the upper-right of the Business Impact section.
The main metric you see here is based on the incrementality percentage found in the test. Below that, you’ll see the test’s metric compared against the value calculated from your previous incrementality percentage.
It is important to take this updated metric within the context of your entire portfolio. Note that the Media Plan Optimizer will take your new test results into account and automatically help you create a better budget plan across your tactics for future spending.
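As a rough illustration of how this plays out (the variable names and exact formulas below are simplifying assumptions, not Measured's production calculation), incremental ROAS scales the platform-reported ROAS down by the incrementality percentage, while incremental CPO scales the reported CPO up:

```python
reported_roas = 4.0      # revenue / spend, as reported by the platform (placeholder)
reported_cpo = 25.0      # spend / orders, as reported by the platform (placeholder)
incrementality = 0.60    # incrementality percentage found by the test (placeholder)

incremental_roas = reported_roas * incrementality  # 2.4
incremental_cpo = reported_cpo / incrementality    # ~41.67
```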
Spending
The Spending section lays out a simple view of how much your test cost to run.
- Holdout tests will always have a scale cost of zero, since they are based on withholding media instead of testing an increased budget.
- Scale tests will show the amount that budgets were increased to test the effectiveness of higher spending for your tactic.
How It Works
The How It Works section gives a thorough breakdown of the key factors used to calculate your test results.
Test Adjustment Factor
The top section walks you through the math of how your test’s incrementality was determined. The resulting factor is applied to the non-incremental versions of your metrics to produce the true, accurate numbers.
There are three steps to this equation (a simplified code sketch appears below):
- First, contribution is determined as a percentage by taking the difference between the number of orders initially predicted in test markets and the actual orders the test observed (added or subtracted depending on test type), then dividing that result by the initial predicted number.
- Then, incremental conversions are found by multiplying the total actual orders across untested markets by the established contribution percentage.
- Finally, the test adjustment factor is determined by dividing the incremental conversions by the vendor orders across all markets, whether or not they were part of the test.
Since the test markets are precisely matched with similar markets where test conditions weren't implemented, we know the contribution in your results is representative of these untested markets as well.
In most cases, you'll see incrementality below 100%, meaning your integrated platform has over-reported your true number of conversions. If it is above 100%, the platform is instead under-reporting.
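Putting those three steps together, here is a simplified sketch of the calculation; the variable names are hypothetical, and the absolute difference stands in for the add-or-subtract step, which depends on the test type:

```python
def test_adjustment_factor(predicted_test: float, actual_test: float,
                           actual_untested: float, vendor_orders_all: float) -> float:
    """Simplified sketch of the three-step calculation above (hypothetical names)."""
    # Step 1: contribution as a share of the orders predicted in test markets.
    # Holdout tests subtract actuals from predictions; scale tests do the
    # reverse, so the absolute difference covers both cases here.
    contribution = abs(predicted_test - actual_test) / predicted_test
    # Step 2: project that contribution onto actual orders in untested markets.
    incremental_conversions = actual_untested * contribution
    # Step 3: divide by vendor-reported orders across all markets.
    return incremental_conversions / vendor_orders_all
```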
Actual vs Predicted Graph
In this section, you will also see a visual breakdown of how the predicted conversions we modeled lined up with your actual conversions in the year preceding the test. You can interact with the graph by hovering over a point to see the conversions from that time period.
This historical information shows the strength of our data modeling, and how we arrived at the predicted conversions that formed the basis of the test you are viewing the results for.
You may notice that the difference between the actual and predicted lines in the graph is very small. The closeness of those lines reflects the high accuracy of your model.
Calibrating Your MIM
We calibrate your Measured Incrementality Model using your test results and their confidence rating in the context of all your other data and previous tests. This is how long-term value is achieved from your tests.
Note: With the current version of MIM, holdout tests with positive effect are used for calibration.
The impact of a test on your media mix model is determined by two key factors:
- The confidence rating you receive with your results
- How recently the test was run (with newer tests having the most weight)
The higher the confidence rating and the more recent the test, the greater the impact of the results, and the closer your calibrated MIM will align with the test result. Conversely, the lower the confidence score, the less it will weigh into your media mix model.
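For intuition only, a toy weighting scheme consistent with those two factors might look like the sketch below. The exponential decay, the half-life, and the multiplication are all assumptions chosen for illustration; this is not Measured's actual calibration model.

```python
import math

def test_weight(confidence: float, weeks_since_test: int,
                half_life_weeks: int = 26) -> float:
    """Toy illustration: weight grows with confidence and decays with age."""
    recency = math.exp(-math.log(2) * weeks_since_test / half_life_weeks)
    return confidence * recency

# A recent, high-confidence test outweighs an old, lower-confidence one:
test_weight(0.97, 4)   # ~0.87
test_weight(0.80, 52)  # ~0.20
```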
If your test does not produce a positive effect, it is still valuable. Those results do not necessarily mean your media had no impact. With upcoming MIM updates, tests without positive effect will still calibrate your media mix model, and will often result in less incremental credit attached to the tested media than before.
FAQs
How do you use my sales data to find a test’s predicted results?
Our data science team uses lasso regression modeling to predict the conversions for the markets in your test. The predictions are based on the last two years of sales patterns for the initial markets your test was modeled on.
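For readers curious what that looks like mechanically, here is a generic lasso regression sketch using scikit-learn. The data, features, and parameters are placeholders, not Measured's actual model:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Placeholder data: weekly conversions for candidate control markets (features)
# and a test market (target) over a two-year lookback window.
rng = np.random.default_rng(0)
X = rng.poisson(lam=200, size=(104, 8)).astype(float)  # 104 weeks x 8 markets
y = X @ rng.uniform(0.05, 0.2, size=8) + rng.normal(0, 5, size=104)

# Lasso's L1 penalty shrinks uninformative market weights to zero,
# keeping only the control markets that best predict the test market.
model = Lasso(alpha=1.0).fit(X, y)
predicted = model.predict(X)  # the counterfactual "predicted results"
```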
How much does a tactic need to contribute to be tested?
Geo tests require a substantial impact to measure: your media will usually need to contribute 1-3% of your overall business, depending on your data. Remember that a lack of positive results is still a valuable test outcome; with upcoming MIM updates, those tests will calibrate your media mix model.