What is data sampling?
High dimension cardinality (the number of unique values for a dimension) or a large amount of data in a data set can make queries too complex. To handle this, Google Analytics reads as much data as it can until it hits a limit, then uses the data points it has read to estimate the rest. This happens in both the Google Analytics UI and the API. In the UI, many reports show a yellow shield that displays a warning when you hover over it; the warning specifies the percentage of data points that were read. Data can also be sampled when you query very recent or historical data. You can avoid sampling by making smaller queries that use fewer or less complex dimensions.
The amount of data in the queried date range is one reason your data can be sampled. The thresholds are listed below.
Multi-Channel Funnels reports
All other reports
Analytics Standard: 500k sessions
Analytics 360: 1M sessions
Analytics 360 using resource based quota: 100M sessions
Read more about the thresholds in this Google help article.
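As a rough illustration, the session thresholds listed above can be written as a simple lookup. This is only a sketch: the tier names and the function may_be_sampled are made up for this example, and it assumes sampling can start once the session count for the date range is strictly above the threshold.

```python
# Session thresholds for "All other reports", as listed in this article.
# The dictionary keys are hypothetical labels chosen for this example.
SAMPLING_THRESHOLDS = {
    "standard": 500_000,
    "360": 1_000_000,
    "360_resource_based_quota": 100_000_000,
}


def may_be_sampled(sessions_in_date_range: int, tier: str) -> bool:
    """Return True if the session count is above the threshold for the tier.

    Assumption: sampling can start when the count is strictly above the
    threshold. Unknown tier names raise a KeyError.
    """
    return sessions_in_date_range > SAMPLING_THRESHOLDS[tier]


if __name__ == "__main__":
    print(may_be_sampled(600_000, "standard"))
    print(may_be_sampled(600_000, "360"))
```

Sampling also depends on other factors, such as dimension cardinality and how recent the data is, so a count below the threshold does not guarantee unsampled data.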
Checking for sampling in Funnel
To check if your Google Analytics data is sampled in Funnel, go to Data Explorer in the Funnel app and select the field Samples Read Rate. A percentage of 80% means that the data is sampled and that 20% of your data is based on estimates. A percentage of 70% means that 30% of your data is based on estimates.
If the data is not sampled, you will see the value 100%.
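The arithmetic behind Samples Read Rate can be sketched as follows. The function names are made up for this example and are not part of Funnel or Google Analytics.

```python
def estimated_share(samples_read_rate: float) -> float:
    """Return the percentage of the data that is based on estimates.

    samples_read_rate is the percentage of data points that were read,
    for example 80 for 80%.
    """
    if not 0 <= samples_read_rate <= 100:
        raise ValueError("samples_read_rate must be between 0 and 100")
    return 100.0 - samples_read_rate


def is_sampled(samples_read_rate: float) -> bool:
    """Data counts as sampled whenever less than 100% was read."""
    return samples_read_rate < 100


if __name__ == "__main__":
    for rate in (80, 70, 100):
        print(rate, estimated_share(rate), is_sampled(rate))
```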
Actions we take
We always ask for the highest precision when querying data by sending the LARGE or HIGHER_PRECISION parameter value (the same setting, named differently in different APIs). This results in slightly slower but more accurate reports.
We split reports that exceed the sampling threshold into multiple smaller reports. The shortest date range currently supported by GA is one day.
We request today's data separately so that nearby dates are not sampled, because intraday queries can always be sampled.
We automatically use resource based quota for GA360 users (not available for the MCF report or for data farther back than 1 year) where we have encountered sampled data.
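To make these actions more concrete, here is a minimal sketch of how a date range could be split into smaller queries, with today's data kept in its own query and the highest precision requested. It is not Funnel's actual implementation. The function names, the view ID and the chosen metric and dimension are placeholders, and the request body follows our understanding of the report request format in the Google Analytics Reporting API v4.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

DateRange = Tuple[date, date]


def split_date_range(start: date, end: date, days_per_chunk: int) -> List[DateRange]:
    """Split [start, end] into consecutive chunks of at most days_per_chunk days."""
    if days_per_chunk < 1:
        raise ValueError("the smallest supported range is one day")
    chunks = []
    cursor = start
    while cursor <= end:
        chunk_end = min(cursor + timedelta(days=days_per_chunk - 1), end)
        chunks.append((cursor, chunk_end))
        cursor = chunk_end + timedelta(days=1)
    return chunks


def plan_queries(start: date, end: date, today: date, days_per_chunk: int) -> List[DateRange]:
    """Plan date ranges so that today's data never shares a query with other dates.

    Dates after today are dropped, since there is no data for them yet.
    """
    ranges = []
    historical_end = min(end, today - timedelta(days=1))
    if start <= historical_end:
        ranges.extend(split_date_range(start, historical_end, days_per_chunk))
    if start <= today <= end:
        ranges.append((today, today))
    return ranges


def build_report_request(view_id: str, start: date, end: date) -> Dict:
    """Build one report request that asks for the largest sample size."""
    return {
        "viewId": view_id,  # placeholder view ID
        "dateRanges": [{"startDate": start.isoformat(), "endDate": end.isoformat()}],
        "metrics": [{"expression": "ga:sessions"}],
        "dimensions": [{"name": "ga:date"}],
        "samplingLevel": "LARGE",
    }


if __name__ == "__main__":
    planned = plan_queries(date(2024, 1, 1), date(2024, 1, 10),
                           today=date(2024, 1, 10), days_per_chunk=4)
    for chunk_start, chunk_end in planned:
        print(build_report_request("123456789", chunk_start, chunk_end))
```

If a chunk still comes back sampled, the same split can be repeated with a smaller days_per_chunk, down to one day.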
Actions you can take
Use less complex reports
The number of dimensions in the data source and the complexity (cardinality) of those dimensions greatly impact the sampling level. Leaving out unnecessary dimensions therefore helps reduce sampling.
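To get a feel for why each extra dimension matters, the sketch below multiplies the number of unique values per dimension to get an upper bound on the number of distinct rows a report can contain. The dimension names and counts are made-up examples, and the exact way Google Analytics weighs cardinality when sampling is not something this calculation claims to reproduce.

```python
from math import prod
from typing import Mapping


def max_row_combinations(unique_values_per_dimension: Mapping[str, int]) -> int:
    """Upper bound on distinct rows: the product of each dimension's cardinality."""
    return prod(unique_values_per_dimension.values())


if __name__ == "__main__":
    # Hypothetical cardinalities for a 30-day report.
    with_landing_page = {
        "ga:date": 30,
        "ga:sourceMedium": 200,
        "ga:landingPagePath": 5000,
    }
    without_landing_page = {
        "ga:date": 30,
        "ga:sourceMedium": 200,
    }
    print(max_row_combinations(with_landing_page))
    print(max_row_combinations(without_landing_page))
```

In this made-up example, dropping a single high-cardinality dimension reduces the upper bound from 30,000,000 to 6,000 possible rows.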
Using custom tables
If you are a GA360 user, you can ask Google Analytics to set up a custom table that includes the dimensions and metrics (and segments) you want to have unsampled. The set of fields and segments should then match what you have selected for your data source in Funnel.
Read more about custom tables in this Google help article.