Before we go into the best practices it is important to understand the nature of the data that is collected from different marketing platforms. For example, most of the advertising platforms already have aggregated their metrics to a particular day, Google Ads gives the number of clicks on that day instead of each click so the data is not raw but aggregated per day. Since the metrics are already aggregated, the value may occasionally be updated one or more days after a click occurs. This data takes a few days to finalize due to conversions occurring days after the initial click and invalid traffic that needs to be removed.
Below are the things to keep in mind when configuring Shares
Use case: As a digital marketer, start by defining the business question which you want to answer, then share only the fields which are required. If your use case is small with 5-10 business questions then you can probably include all your fields in one Share.
But, if the scope of the business need is wide then it is better to divide these Shares into different business areas. A business intelligence analyst might split their Shares by source types, to follow their organizational data warehouse architecture.
Stakeholders: Think about which teams/processes in your organization will be dependent on the shared data. If changes performed by one team to the Share configuration can affect the other team, it might be a better idea to split them into two different Shares.
Data enrichment: If you just want to enrich the data already available in your data warehouse with marketing data then it might be best to share the data based on platform type. Harmonize some of the fields in Funnel for platforms like Facebook ads, Google Ads, Tik-Tok, Twitter, and share them to different tables. Later on, this data can be joined using Date.
For companies that want to build robust and reliable data analytics, they must take time to carefully plan on how to organize their Shares and what to include in them. For a BI team, it is critical to understand which metrics are coming from which sources for better data governance which can avoid future headaches.
When you have i) growing business needs with multiple KPIs to track, ii) data is managed by another team, and iii) reporting is centralized it’s almost always a good idea to split your Shares into platform types and share them into different tables.
Some of the advantages when this is done right:
General Sanity: Building an analytics stack is already a complex procedure, nobody wants to make it even harder by adding fields that may/may not be used, but still have to be maintained.
Dependence: If all my fields are in one table, then all the processes which are dependent on Funnel data will be affected. For example, if the Marketing team wants to change a field that is only for a specific use case, it might affect the bigger analytics team pipeline/workload (which might take hours to run)
Size and Cost: Modern data warehouses are built to handle huge amounts of data, but adding general inefficiencies at an early stage can be costly over a long period. For example, a table with 1000 columns for the same date range will take more space than 10 tables with 100 columns each for the same data range.
Reduce noise: When it comes to data, less is often more. It’s not about the quantity of data shared, it’s about sharing good quality, meaningful data that key business decisions can be based on.
Frequently asked questions regarding Funnel Shares
How-to share data that has changed since the last run (deltas)?
Funnel makes sure that whenever you look at the shared data you get the correct picture of what is happening across different sources which are connected i.e. a single source of truth. Our Shares by default will update time ranges where data has changed. A change in shared data may be triggered by the addition of (i) a new data source (where we typically collect 2 years of historical data), (ii) updated data in a data source (within the look-back window), or (iii) changes in dimension/metric rules within Funnel that affect their data.
In general, it's not possible to identify individual records that change, but only time ranges. The reason is that depending on their data source configuration, data is already aggregated from the source e.g. Facebook Ads data by date and campaign.
But, If you would like to get only the data that has changed from the last run, then you have to implement that logic in your data pipeline. This can be achieved in multiple different ways depending on your ingestion pipeline. The two most common ways to achieve this is a) Include a RunId when setting up a file Share, that creates unique files for each run. b) Think Funnel data is continuously updated, then compare the latest Share data with what you have in your data warehouse to get the difference.
Both these approaches have their limitations and require considerable engineering work, using a direct Share to your data warehouse from Funnel can help mitigate this.
Where can I get the primary keys to the data rows?
Most of the advertising platforms already have aggregated their metrics to a day, For example, Google Ads gives the number of clicks on that day instead of each click so the data is not raw but aggregated on a day from the source. There are no real "primary keys" in this data and a single logical "conversion" may move both between dates and between ads/campaigns in their reporting as it evolves until the data is settled.
If you’d like, you can combine different dimensions to create a unique key that could be possible to identify a set of observations, though make sure to carefully select all the fields, as this process is error-prone and could lead to wrong results.
My data is shared by each table per source type, how do I join them?
It is a best practice to harmonize fields like "campaign" and "impressions" across their sources in Funnel before sharing those, but if sharing a large number of fields we recommend having separate Shares with native fields for each platform. After sharing them into different tables this data can be joined using date, as that field would be common across all the Shares.
Which files/tables have changed since the last run?
If you are using Funnel’s file storage options such as Google Cloud Storage, Amazon S3, Azure Blob Storage, and SFTP. To enable better data cataloging, pipeline management, and error handling Funnel provides an option to provide a summary file with a few options of file types such as JSON, CSV, and TXT. The summary file contains the information about the files that were shared and if they were updated or not since the last run.
This information can be used to design a data pipeline, where the next step of the pipeline can be triggered based on the summary file.
If you are using a BigQuery Share you can choose to create a summary table containing the equivalent information.
What does "Only share periods with updated data" mean?
To simplify it a bit we mark each day from each source as ‘clean’ or ‘dirty’ when sharing, we then share data from the first of the ‘dirty’ days. Since we’re sharing monthly files, this, in turn, results in all months from the month with the first ‘dirty’ daily source file. We have an article describing it here.
This option is enabled by default for new Shares and means that Funnel will try to reduce the number of files that are shared. It works like this:
If account configuration is changed (e.g. custom dimensions or metrics, new data sources) all data is shared.
If the Share configuration is changed all data is shared.
Otherwise, start from the month that contains the oldest new data, so if Adwords data has changed in Jan 2017, shared data will start from 2017-01-01.
Hope this article helps you to plan and optimally configure your data Shares.