All Collections
Data Transformation
How to create a data model to deduplicate data and avoid double-counting
How to create a data model to deduplicate data and avoid double-counting

Learn how to use custom dimensions for working with multiple data sources in Funnel representing the same origin source.

Sven Hamberg avatar
Written by Sven Hamberg
Updated over a week ago

When we have multiple datasets that are different breakdowns of the same underlying data, we need to find a way to ensure we don’t accidentally double-count the values in the datasets. One way to do this is to use Funnel’s custom dimensions to replicate the behavior of many advertising platforms. Facebook Ads, as an example, has a breakdown selector that ensures double-counting of metrics can’t happen since it only allows viewing of one breakdown at a time:

We can produce a similar mechanism in Funnel and apply it to all our data sources with custom dimensions.

A scenario that produces double-counting

Consider this example, where we have multiple data sources in Funnel representing the same origin sources (ad accounts, web analytics views, etc):

Here, the Google Ads account is represented with one data source in Funnel, while the Facebook Ads account and the Google Analytics Views is represented by two data sources; each with different breakdowns of the data (Geo and Device, respectively.

If we don’t take any measures to avoid it, it will be easy to accidentally end up with double-counting of the data in these data sources:

Note that the Google Analytics sessions do not match up. It still double-counts, however. The reason for this is that some dimensions from Google Analytics affect the metric totals, leading to not capturing all sessions from the connected View.

Adapting our data model to avoid double-counting

To make it easy to avoid double-counting, we’ll create custom dimensions that will help us work with a deduplicated dataset. We’ll use a strategy that labels each data source with whether it can be used for a certain analysis.

Starting with a generic dimension for deduplicating data

As a first step, we'll create a custom dimension that will let us filter out duplicate rows of data. We'll call it "Is deduplicated?":

This dimension is configured to label one data source with "Yes" per origin source (ad account, web analytics view, etc) and all others with "No". By applying this when working with our data, we can always rest assured we're not double counting:

Note how each data source only shows up as one row in the table, and the filtered totals are lower than the non-duplicated totals that take all data sources into account. As long as we use this dimension and filter on "Yes", all values except the unfiltered "All" row will be deduplicated:

Creating a Country perspective

In order to also be able to safely work with our country data, we'll create a deduplicated country perspective of the data. We'll create a dimension that labels data with "Yes" when the term “country” is part of their Data Source definition label:

Since this is true for only one data source in Funnel for each origin source we now have a way to ensure we work with deduplicated data for our analysis:

And, by adding a Country breakdown to our query we can safely perform our country analysis without risk of double-counting the totals:

Creating a Device perspective

Applying the same logic to “Device” allows us to do the same for our device analysis. Do note that the number of Sessions from Google Analytics is lower now, due to the reason mentioned above:

And, by adding a Device breakdown to our query we can safely perform our device analysis without risk of double-counting the totals, just like we did for Country:

Use these deduplication dimensions in your analysis tool of choice

As long as we make sure to include these dimensions in our data warehouse exports, Data Studio views, etc, we can apply the same logic there to make sure we don't double-count our data. Just remember to add deduplication dimensions and edit existing ones as you add more data sources to Funnel.

Did this answer your question?