Skip to main content
All CollectionsData Transformation
How to avoid double-counting
How to avoid double-counting

Learn how to use custom dimensions to avoid inflated metric values when working with multiple data sources from the same origin in Funnel.

Peter Lundberg avatar
Written by Peter Lundberg
Updated over a week ago

When we have multiple Data Sources using different breakdowns of the same underlying data, we need to find a way to ensure we don’t accidentally count metrics values multiple times.

The best way to do this is to use Funnel’s custom dimensions to replicate the behaviour of many advertising platforms. For example in the Facebook Ads interface you have to use a breakdown selector to ensure double-counting of metrics can’t happen since it only allows viewing of one breakdown at a time. We can produce a similar mechanism in Funnel and apply it to all our data sources with custom dimensions that work as filters.

A scenario that produces double-counting

Consider this example, where we have multiple Data Sources in Funnel from the same Data Origin (ad accounts, web analytics views, etc.):


Here, the Google Ads account is represented with one data source in Funnel, while the Facebook Ads account and the Google Analytics Views is represented by two Data Sources; each with different breakdowns of the data (Geo and Device respectively).

If we don’t take any measures to avoid it, it will be easy to accidentally end up with double-counting of the data in these data sources:


This is due to the same metric (Facebook Ads Spend) being summed from the same Data Origin twice. Funnel detects this and shows a warning:

How to avoid double-counting using a Dimension Filter

To avoid this we’ll create custom dimensions that will help us work with one breakdown of the data at a time. Thus we're in control of what breakdowns to use in destinations or dashboards.

Creating a Country/Geo perspective

In order to also be able to safely work with our country data, we'll create a deduplicated country perspective of the data. We'll call it "Has Geo?" to make it easy to use in Dimension Filters. We'll label with "Yes" when the term “country” is part of their Data Source definition label:

Since this is true for only one Data Source in Funnel for each Data Origin we now have a way to ensure we work with deduplicated data for our analysis. Adding the new Dimension field in the Data Explorer shows how we have tagged these rows:

Creating a Device perspective

Similar to the above, we should create a "Has Device?" custom dimension to handle looking at the data using the device breakdown. This can be repeated for the different types of analysis you need to do.

Use the Dimension Filters

Adding this as a Dimension Filter avoids the double counting and the warning is no longer shown in Data Explorer:

As long as we make sure to include the filter Dimensions in our data warehouse exports, Looker Studio share, etc, we can apply the same logic there to make sure we don't double-count our data. You can also apply a filter on the share so it applies to all usage in the destination:



Change Management

Working with Custom Dimensions for filtering allows keeping the logic separate from the usage in filters and naming conventions. You should establish a practice to test and edit existing Dimension Filters as you add more Data Sources to Funnel. You may also need to adjust the rules depending on what Data Definition Labels values the Data Sources you use have. Note that Data Definition Labels can change when a connector is updated.

Alternatives

There are multiple strategies on how to setup the filter rules even if the easiest is working with Definition Labels as described above.

Using a Data Source naming convention

For more control or complex setups one can instead use a strategy that leverages a clear naming convention of Data Sources (you can update the name in Funnel). This makes the naming of the Data Source independent of the Dimension filters usage in destinations. Sometimes it also makes sense to forgo the Dimension Filter step and instead work directly with 'Data Source name contains' type filters.

Using one dimension filter

Instead of having a Dimension per analysis ("Has Geo" and "Has Device" above) one can instead create one filter Dimension named for example "Report Type" that can via rules have the value 'Geo' or 'Device'. This collects all the logic in one rule which might be easier to manage depending on your setup.

Using Separated workspaces

One can also work with multiple workspaces and keep Data Sources for one type of analysis in each workspace. This helps if this work is very distinct, but can cause overhead when re-using other set up or wanting to quickly switch between different types of analysis.

Did this answer your question?