Geo Tests: Concepts and Overview

Getting Started

Geo testing is the foundation of incrementality measurement. By running real-world experiments, Measured can identify with high accuracy which tactics actually drive changes in your marketing metrics. We encourage all clients to run their own geo tests to get precise insights into the effectiveness of their media spending.

This article covers the geo testing concepts and methodology.

Click here for how-to instructions on setting up your own geo test.

Click here to learn how to interpret your geo test results.

What Is Geo Testing?

Geo testing at Measured is primarily used to help you understand how much incremental value a tactic adds to your business so that you can optimize your future spending with confidence.

A conversion is incremental when it depends on a specific prior event having happened first. For instance, suppose a customer made a purchase that would not have happened if they hadn't first seen a certain piece of media. Because media exposure caused that sale to take place, the sale is incremental.

Geo testing applies this idea across multiple markets. By applying test conditions to some markets and leaving others unaffected, we can draw conclusions about the effectiveness of media spend that would not otherwise be possible.

Click here to learn about Measured's Incrementality Model.

How Does Geo Testing Work?

Geo refers to the geographic areas (markets) being tested. A market can be a state, a designated market area (DMA), a city, or a zip code, but a single test uses only one type of market.

Tests can run for up to 8 weeks, but 4 weeks (excluding the test's start and end dates) is often enough to draw conclusions. Conversions are measured weekly. Test results describe the test as a whole, not any individual market within it.

There are two types of geo tests:

  • Holdout tests take existing campaigns and deliberately withhold media spending in certain markets. Results are compared against a realistic prediction of what would have happened without test conditions.
  • Scale tests take an existing campaign and increase its budget in certain markets to compare their performance against unaffected markets.
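To make the weekly measurement concrete, here is a minimal Python sketch of how daily conversions could be rolled up into weekly totals while leaving out a test's start and end dates. The `weekly_totals` function and its input format (a dictionary mapping dates to conversion counts) are hypothetical illustrations, not part of Measured's platform.

```python
from datetime import date, timedelta

def weekly_totals(daily, start, end):
    """Sum daily conversions into 7-day buckets, leaving out the start and end dates.

    daily: hypothetical dict mapping datetime.date -> conversion count
    """
    first = start + timedelta(days=1)  # skip the launch day
    last = end - timedelta(days=1)     # skip the final day
    totals = []
    week_start = first
    while week_start <= last:
        week_end = min(week_start + timedelta(days=6), last)  # last bucket may be partial
        days = (week_end - week_start).days + 1
        totals.append(sum(daily.get(week_start + timedelta(days=i), 0) for i in range(days)))
        week_start = week_end + timedelta(days=1)
    return totals

# Example: a test running January 1 to January 30 leaves 28 measured days, i.e. four full weeks.
start = date(2024, 1, 1)
end = date(2024, 1, 30)
daily = {start + timedelta(days=i): 10 for i in range(30)}
print(weekly_totals(daily, start, end))  # [70, 70, 70, 70]
```

A test that runs only slightly longer than four weeks would end with a shorter final bucket, which is one reason whole weeks are the natural unit for test length.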

Automatic & Manual Tests

There are two types of test execution: automatic and manual. Automatic tests carry out the necessary changes in your ad platforms for you; the specific changes depend on the type of test.

Note: We will do our best to set up your test automatically, but factors in your ad platforms beyond our control may require you to edit the campaigns in the test yourself. As we run initial tests to determine conditions, we will let you know as soon as possible which options are available.

For automatic holdout tests:

  • The markets in which media will be withheld will be automatically excluded from your existing campaigns.
  • If you create a new campaign during the test, you will be contacted to make sure that test conditions will be applied, and then the appropriate markets will again be automatically excluded.
  • When the test ends, your campaigns will revert to normal with no action needed on your end.

For automatic scale tests:

  • The necessary campaigns in the test will automatically be duplicated, and these duplicates will have a budget applied that is scaled for test purposes.
  • The test markets will automatically be excluded from the original campaigns.
  • If you create a new campaign during the test, you will be given a new scale budget. Once you approve it, a duplicate will be created and the test markets will be excluded from the original campaign.
  • When the test ends, geo exclusions will be removed and your duplicate campaigns created for test purposes will be disabled.

For manual tests, you will need to perform the actions listed above yourself. In that case, your customer success partner will assist you during the process to ensure everything is set up correctly.
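The campaign changes described above can be sketched in Python as operations on a simplified campaign record. Everything here (the `Campaign` fields, `apply_holdout`, `apply_scale`, and `end_test`) is hypothetical and only models the logic; real changes are made through each ad platform's own tools.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Campaign:
    """Hypothetical, simplified stand-in for an ad-platform campaign."""
    name: str
    daily_budget: float
    excluded_markets: frozenset = frozenset()
    target_markets: frozenset = frozenset()  # empty means "all markets not excluded"
    enabled: bool = True

def apply_holdout(campaign, test_markets):
    """Holdout: withhold media by excluding the test markets from the existing campaign."""
    return replace(campaign, excluded_markets=campaign.excluded_markets | frozenset(test_markets))

def apply_scale(campaign, test_markets, scaled_budget):
    """Scale: exclude the test markets from the original campaign and create a
    duplicate that targets only those markets with the approved scaled budget."""
    markets = frozenset(test_markets)
    original = replace(campaign, excluded_markets=campaign.excluded_markets | markets)
    duplicate = replace(campaign, name=f"{campaign.name} (scale test)",
                        daily_budget=scaled_budget, target_markets=markets)
    return original, duplicate

def end_test(original, test_markets, duplicate=None):
    """Test end: remove the test's geo exclusions and disable any duplicate campaign.
    Assumes the test markets were not already excluded before the test began."""
    restored = replace(original, excluded_markets=original.excluded_markets - frozenset(test_markets))
    disabled = replace(duplicate, enabled=False) if duplicate is not None else None
    return restored, disabled

base = Campaign("Prospecting", 100.0)
held_out = apply_holdout(base, {"Denver", "Tulsa"})
scaled_original, scaled_copy = apply_scale(base, {"Denver"}, 150.0)
print(sorted(held_out.excluded_markets))  # ['Denver', 'Tulsa']
```

Returning new records instead of editing in place keeps the pre-test campaign available, which mirrors the goal of reverting everything cleanly when the test ends.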

How Markets Are Selected


Test markets are chosen carefully. First, we identify the markets that together contribute roughly 50% of total conversions or revenue. These markets are used for modeling purposes and will not have test conditions applied.

After excluding those large markets from consideration, we evaluate hundreds of combinations of potential test markets. Each combination is then put through hundreds of test simulations to determine how accurately it can be used for testing.

For each simulation, we hold out part of the period used to train the prediction model, train on the remaining data, and then compare predicted orders with actual orders for the held-out period. This process gives us the model's accuracy.
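Here is a minimal sketch of this kind of holdout simulation. It assumes a deliberately simple stand-in model (test-market orders predicted as a fixed multiple of the modeling markets' orders) rather than the Lasso model described later, and it reports accuracy as 1 minus the mean absolute percentage error (MAPE). Both the model and the metric are assumptions; the article doesn't name the accuracy measure.

```python
def fit_ratio(model_orders, test_orders):
    """Stand-in model: test-market orders as a fixed multiple of modeling-market orders."""
    return sum(test_orders) / sum(model_orders)

def simulated_accuracy(model_orders, test_orders, window):
    """Slide a held-out window of `window` weeks across the history. For each
    position, fit on the remaining weeks, predict the held-out weeks, and record
    the mean absolute percentage error. Returns 1 - the average error."""
    errors = []
    for start in range(len(model_orders) - window + 1):
        held = range(start, start + window)
        train_m = [m for i, m in enumerate(model_orders) if i not in held]
        train_t = [t for i, t in enumerate(test_orders) if i not in held]
        ratio = fit_ratio(train_m, train_t)
        ape = [abs(ratio * model_orders[i] - test_orders[i]) / test_orders[i] for i in held]
        errors.append(sum(ape) / window)
    return 1 - sum(errors) / len(errors)

# Hypothetical weekly orders for the modeling markets and one candidate test group.
modeling = [100, 110, 120, 130, 140, 150, 160, 170]
candidate = [20, 25, 22, 30, 27, 33, 31, 36]
print(round(simulated_accuracy(modeling, candidate, window=2), 3))
```

Each window position plays the role of one simulation; a candidate group whose orders track the modeling markets closely scores near 1.0.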

The simulations also determine important test parameters:

  • The smallest contribution of your media to your overall revenue that the test can detect
  • The length of time your test will need to run
  • The portion of your total orders from each group of testable markets
  • The number of tests you can run simultaneously given the available market groups

After the simulations are complete, we rank the accuracy of all market combinations, and choose the most accurate available for testing (accounting for any other concurrent tests).
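The ranking step can be sketched as follows. `score` stands in for whatever accuracy the simulations produce (a hypothetical callable here, built from made-up per-market numbers), and `in_use` holds markets already assigned to concurrent tests.

```python
from itertools import combinations

def choose_market_group(candidates, group_size, score, in_use=frozenset()):
    """Rank every `group_size` combination of candidate markets by `score`
    (higher = more accurate) and return the best combination that shares no
    market with a concurrent test, or None if every combination conflicts."""
    ranked = sorted(combinations(sorted(candidates), group_size), key=score, reverse=True)
    for group in ranked:
        if not set(group) & set(in_use):
            return group
    return None

# Hypothetical per-market scores; a real score would come from the simulations.
accuracy = {"Boise": 9, "Omaha": 8, "Tulsa": 7, "Tucson": 6}

def group_score(group):
    return sum(accuracy[m] for m in group)

print(choose_market_group(accuracy, 2, group_score))                    # ('Boise', 'Omaha')
print(choose_market_group(accuracy, 2, group_score, in_use={"Boise"}))  # ('Omaha', 'Tulsa')
```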

Updates to the Market Selection Process

Measured’s market selection process will refresh every three months. With each update, we discard the oldest three months of data and replace it with the most recent three. The steps in the process will otherwise remain the same.
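The quarterly refresh amounts to a fixed-length rolling window. In this sketch, quarters are just labels, and the eight-quarter limit reflects the two years of data mentioned elsewhere in this article; the function name is hypothetical.

```python
from collections import deque

def refresh(quarters, newest, max_quarters=8):
    """Add the newest quarter of data; once the window holds `max_quarters`,
    the oldest quarter drops off automatically (8 quarters = two years)."""
    window = deque(quarters, maxlen=max_quarters)
    window.append(newest)
    return list(window)

history = [f"2023-Q{q}" for q in range(1, 5)] + [f"2024-Q{q}" for q in range(1, 5)]
print(refresh(history, "2025-Q1"))
# ['2023-Q2', '2023-Q3', '2023-Q4', '2024-Q1', '2024-Q2', '2024-Q3', '2024-Q4', '2025-Q1']
```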

Though this process will not affect your active tests, it may produce new parameters for your upcoming tests. For instance, the minimum length for a test you have scheduled may increase. If changes like this occur, you will be alerted to adjust and reapprove the affected test.

If you attempt to schedule a test that will launch after the next market selection update, you will also be alerted since it is likely that your test will need to be reconfigured before launch. 

During a Geo Test

While a test is running, it’s important to adhere to the conditions of the test. For instance, if someone in an area where a holdout test was running saw media that was supposed to be withheld, it would negatively impact the test results. Be careful not to make extensive changes to your ad accounts or media strategy during a test period.

We also recommend not running promotions during a test period, because they stimulate additional demand. This doesn't apply to seasonality (e.g., Christmas), since seasonal periods also bring a natural increase in competition; testing during highly seasonal periods is the only way to verify how much effect that season is truly having.

You should also refrain from reducing your normal campaign spending during a geo test. Spending in a regular campaign may increase slightly as it absorbs the budget from the markets being tested. This is expected, and it makes it more likely that the test produces a usable incrementality percentage.

The geographical area covered by a test will vary based on the test’s purpose and requirements. We will aim to test on the smallest percentage of your sales that will yield results applicable to the rest of the country.

Monitoring During a Test

While your test is underway, we will monitor key aspects of its operation whenever possible. For instance, we will make sure that campaigns are not added to or removed from test markets in a way that would affect the test's outcome. Other monitored factors differ by type of test, all with the goal of making sure your results are not invalidated.

This monitoring is possible via API connections that we can establish with the following ad platforms:

  • Google Ads
  • Facebook (Meta)
  • Pinterest
  • TikTok
  • Snapchat

For those platforms, mid-test changes will be monitored for both automatic and manual tests and your customer success partner will be alerted as necessary. The key difference is that with automatic tests, the appropriate remedy will happen without action needed from you, while manual tests will require you to take steps yourself in your ad platforms.

Please be aware that platforms beyond those listed above can be included in a geo test, but we do not have the capability to monitor them in the same fashion. Because of this, it is crucial to be aware of test conditions and keep your customer success partner informed of any changes to your campaigns that happen during the test.

After a Test Is Complete

When your test results are available, there will be two distinct categories:

  • Predicted results estimate what would have happened if test conditions had not been implemented. They are based on the past two years of your conversion data for the tested markets and the initial markets your test was modeled on. Lasso regression is used to predict conversions in your test markets from the sales patterns of those initial markets, which lets us accurately project what would have continued to happen under regular circumstances.
  • Actual results are what happened with test conditions in place. These come from the transaction data provided by your integrated platforms (e.g., Shopify). Note that we count all transactions in this data, not just the conversions your ad platforms report to you.
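To illustrate how Lasso regression can produce predicted results, here is a minimal pure-Python sketch using coordinate descent. It minimizes the standard Lasso objective (the same one scikit-learn's `Lasso` uses); Measured's actual model specification, features, and tuning are not described here, and the market data is made up.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`, returning 0.0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_lasso(X, y, alpha, n_sweeps=500):
    """Fit y ~ intercept + X @ w by coordinate descent, minimizing
    (1 / 2n) * sum((y - intercept - X @ w)^2) + alpha * sum(|w|).

    X: one row per week, one column per modeling market
    y: combined weekly conversions of the test markets
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centering handles the intercept
    residual = [v - y_mean for v in y]  # starts as centered y because w starts at zero
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0:
                continue  # a constant market carries no information
            # Correlation of column j with the residual, adding back column j's own contribution.
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            for i in range(n):
                residual[i] -= Xc[i][j] * delta  # keep residual = centered y - Xc @ w
            w[j] = new_wj
    intercept = y_mean - sum(w[j] * x_mean[j] for j in range(p))
    return w, intercept

def predict(X, w, intercept):
    """Predicted test-market conversions for each week (row) of X."""
    return [intercept + sum(wj * xj for wj, xj in zip(w, row)) for row in X]
```

In practice, `X` would hold weekly conversions for each modeling market before the test and `y` the combined conversions of the test markets. During the test, the fitted weights are applied to the modeling markets' actual conversions to produce the predicted results. The L1 penalty (`alpha`) pushes the weights of unhelpful markets to exactly zero.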

Note that your results may change up to 7 days after the test ends as we allow any lagging vendor-reported conversions that are attributed to ads from the test to come in.

For a holdout test, if predicted results are higher than actual results, the channel is having an incremental impact: customers in the holdout markets did not see your media, so some of them did not make a purchase.

Scale tests are the opposite. If actual results are higher than predicted results, the channel is making an impact. If predicted results are higher, the results are inconclusive.
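The decision rules above can be summarized in a small function. It compares point estimates only; the real analysis also applies a data threshold for confidence. The "inconclusive" label for the holdout case where actual results meet or exceed predictions is an assumption, since the article only describes the positive holdout outcome.

```python
def interpret(test_type, predicted, actual):
    """Compare predicted and actual test-market conversions (point estimates only)."""
    if test_type == "holdout":
        # Media was withheld: fewer actual conversions than predicted signals impact.
        # "inconclusive" for the other case is an assumption; the article doesn't say.
        return "incremental impact" if predicted > actual else "inconclusive"
    if test_type == "scale":
        # Budget was increased: more actual conversions than predicted signals impact.
        return "incremental impact" if actual > predicted else "inconclusive"
    raise ValueError(f"unknown test type: {test_type!r}")

print(interpret("holdout", predicted=1000, actual=900))  # incremental impact
print(interpret("scale", predicted=1100, actual=1000))   # inconclusive
```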

Your test results will provide two important metrics which are closely related but ultimately distinct:

  • Contribution is the percent of total sales or orders that your media is responsible for
  • Incrementality is the portion of vendor-reported conversions that are caused by media exposure

Since the test markets are precisely matched with similar markets where test conditions weren't implemented, we know the contribution in your results is representative of these untested markets as well.

While contribution can be determined from your basic transaction data, incrementality is more complex, since it involves separating the true results of your ad platforms from what they have already reported to you. In most cases, you'll see incrementality below 100%, meaning the ad platform has over-reported your true number of conversions. If it is above 100%, the platform is under-reporting.
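For a holdout test, the two metrics can be sketched as simple ratios. This assumes incremental conversions are the gap between predicted and actual conversions in the test markets, and that a platform-reported baseline for the same markets and period is available (the `vendor_reported` input; how that baseline is derived isn't covered in this article). The function names and numbers are illustrative.

```python
def holdout_metrics(predicted, actual, vendor_reported):
    """Contribution and incrementality for a holdout test (illustrative sketch).

    predicted: conversions expected in the test markets without the holdout
    actual: conversions that actually occurred in the test markets
    vendor_reported: conversions the ad platform reports for the same markets and
        period under normal conditions (assumed to be available)
    """
    incremental = predicted - actual                # conversions lost when media was withheld
    contribution = incremental / predicted          # share of total conversions driven by the media
    incrementality = incremental / vendor_reported  # below 1.0 means the platform over-reports
    return contribution, incrementality

def apply_coefficient(reported_conversions, incrementality):
    """Estimate incremental conversions for a similar channel from its platform-reported count."""
    return reported_conversions * incrementality

contribution, incrementality = holdout_metrics(predicted=10_000, actual=9_200, vendor_reported=1_000)
print(round(contribution, 4), round(incrementality, 4))  # 0.08 0.8
print(round(apply_coefficient(5_000, incrementality)))   # 4000
```

Here 800 conversions were lost when media was withheld, giving an 8% contribution and 80% incrementality (the platform over-reported). Applying that coefficient to 5,000 platform-reported conversions in a similar channel estimates about 4,000 incremental conversions.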

If your results show an incremental impact, you can act on this discovery by taking your new incrementality coefficient and applying it to similar channels on your Cross-Channel Dashboard. Reach out to your customer success partner if you have any questions about this process. 

Multi-Tactic Testing

What Are Multi-Tactic Tests?

In addition to testing a single tactic, multi-tactic testing is also available. The process for designing and running a multi-tactic test is the same as for any single-tactic test, which you can learn about here.

The key difference from a single-tactic test lies in the results and how they are processed. Initially, test results will show the aggregated contribution for all of the tactics involved. After that, the Measured Incrementality Model is used to uncover insights on an individual tactic level.

The algorithm we apply uses the combined knowledge of every geo test Measured has run to find the relative differences between the tactics in a multi-tactic test. Those differences are then applied to the initial test results, leading to the precise calibration for each tactic involved. Results will be shared by your customer success partner.
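As a rough illustration only, the final step can be pictured as splitting the aggregated result across tactics according to their relative differences. The Measured Incrementality Model that derives those differences is not public, so the weights below are placeholder inputs and the proportional split is an assumption, not the actual algorithm.

```python
def split_by_relative_weights(total_incremental, relative_weights):
    """Allocate a multi-tactic test's aggregate incremental conversions across tactics
    in proportion to relative weights (hypothetical inputs standing in for the
    benchmark-derived differences the Incrementality Model provides)."""
    total_weight = sum(relative_weights.values())
    if total_weight <= 0:
        raise ValueError("relative weights must sum to a positive number")
    return {tactic: total_incremental * weight / total_weight
            for tactic, weight in relative_weights.items()}

shares = split_by_relative_weights(900, {"Performance Max": 2.0, "Advantage+": 1.0})
print(shares)  # {'Performance Max': 600.0, 'Advantage+': 300.0}
```

The split always adds back up to the aggregated result, so tactic-level figures stay consistent with what the test itself measured.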

Why Use Multi-Tactic Tests?

As black-box campaign types in ad platforms become more prominent (e.g., Google Performance Max and Meta's Advantage+), testing individual campaign types (called tactics at Measured) becomes less reliable. This is because those tactics now compete with each other for the same audience and inventory.

Multi-tactic testing is a solution for this issue. By leveraging the thousands of geo tests we’ve already run on both these new tactics and other tactics, we’ve developed a methodology that allows reliable measurement of multiple tactics at the same time, leading to tactic-level insights that marketers can confidently use for decision-making.

FAQs

Why is geo testing preferred over other kinds of tests?

Geo testing uses transactional data instead of user-level tracking, so it’s not influenced by attribution or privacy settings in external ad platforms. This lets testing methods be applied uniformly across all tactics and channels.

How confident can I be in the results I get?

We aim for at least a 95% correlation between the test markets and the initial markets we use to design the specifics of your test. This gives us high confidence when we generalize the patterns in your results to the entire country.
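A correlation check of this kind can be sketched as follows, assuming "95% correlation" means a Pearson correlation coefficient of at least 0.95 between the weekly conversion series of the two groups of markets. That interpretation and the function names are assumptions.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def meets_target(test_weekly, modeling_weekly, target=0.95):
    """True if weekly conversions in the test markets track the modeling markets closely enough."""
    return pearson(test_weekly, modeling_weekly) >= target

print(meets_target([10, 12, 14, 16], [20, 24, 28, 32]))  # True
```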

What if my test results tell me that my media isn’t having an impact?

A negative result is possible, and this is why testing is necessary. This information is still actionable when planning your future media spending. Measured uses a data threshold to ensure confidence in the results, and if early results suggest a negative outcome before that threshold is met, we will recommend extending the test duration.

Why are only some of my tactics testable?

To run a holdout test, a tactic must already be having at least a minimum level of impact before testing. For scale tests, we require at least six months of consistent spend data on the tactic, and the tactic must be scalable. If those conditions are not met, the tactic is ineligible for testing.
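These eligibility rules can be written as a simple check. The `min_contribution` value is a placeholder, because the article doesn't state the actual minimum-impact threshold; the function and parameter names are hypothetical.

```python
def is_eligible(test_type, *, current_contribution=0.0, min_contribution=0.0,
                months_consistent_spend=0, scalable=False):
    """Eligibility rules from the FAQ. `min_contribution` is a placeholder:
    the actual minimum-impact threshold is not stated in this article."""
    if test_type == "holdout":
        return current_contribution >= min_contribution
    if test_type == "scale":
        return months_consistent_spend >= 6 and scalable
    raise ValueError(f"unknown test type: {test_type!r}")

print(is_eligible("scale", months_consistent_spend=4, scalable=True))  # False
```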

How much of my total sales will factor into a test?

The markets included in a test can account for 7-40% of national sales, approaching the higher end only if we need that data to draw conclusions. The percentage may vary over the test’s duration, but it doesn’t affect our confidence in applying insights on a national level.

How are factors that impact marketing performance accounted for in my tests?

Factors like creative content and fatigue, ad frequency, and audience saturation are all accounted for in the performance data used for your test designs. Because tests are based on a broad timeframe of your data, patterns unique to your business are captured, and they shape the ROAS on which your test design is based.

Why do some of my test's parameters change over time?

Every quarter, we update our prediction model with the last two years of your transactional data. Your newest quarter's worth of data is brought in while the oldest one is removed. This will not affect active tests. If there is an impact on your upcoming scheduled tests, you will be alerted to review the changes.

