All Collections
Data Warehouse Sharing
Best practices when setting up a Data Warehouse Share in Funnel
Best practices when setting up a Data Warehouse Share in Funnel

Data warehouse best practices, data warehouse FAQs

Ishan Shekhar avatar
Written by Ishan Shekhar
Updated over a week ago

Before we go into the best practices it is important to understand the nature of the data that is collected from different marketing platforms. For example, most of the advertising platforms already have aggregated their metrics to a particular day, Google Ads gives the number of clicks on that day instead of each click so the data is not raw but aggregated per day. Since the metrics are already aggregated, the value may occasionally be updated one or more days after a click occurs. This data takes a few days to finalize due to conversions occurring days after the initial click and invalid traffic that needs to be removed.

Below are the things to keep in mind when configuring Shares

  • Use case: As a digital marketer, start by defining the business question which you want to answer, then share only the fields which are required. If your use case is small with 5-10 business questions then you can probably include all your fields in one Share.
    But, if the scope of the business need is wide then it is better to divide these Shares into different business areas. A business intelligence analyst might split their Shares by source types, to follow their organizational data warehouse architecture.

  • Stakeholders: Think about which teams/processes in your organization will be dependent on the shared data. If changes performed by one team to the Share configuration can affect the other team, it might be a better idea to split them into two different Shares.

  • Data enrichment: If you just want to enrich the data already available in your data warehouse with marketing data then it might be best to share the data based on platform type. Harmonize some of the fields in Funnel for platforms like Facebook ads, Google Ads, Tik-Tok, X Ads, and share them to different tables. Later on, this data can be joined using Date.

For companies that want to build robust and reliable data analytics, they must take time to carefully plan on how to organize their Shares and what to include in them. For a BI team, it is critical to understand which metrics are coming from which sources for better data governance which can avoid future headaches.

When you have i) growing business needs with multiple KPIs to track, ii) data is managed by another team, and iii) reporting is centralized it’s almost always a good idea to split your Data Shares into platform types and share them into different tables.

Some of the advantages when this is done right:

  • General Sanity: Building an analytics stack is already a complex procedure, nobody wants to make it even harder by adding fields that may/may not be used, but still have to be maintained.

  • Dependence: If all my fields are in one table, then all the processes which are dependent on Funnel data will be affected. For example, if the Marketing team wants to change a field that is only for a specific use case, it might affect the bigger analytics team pipeline/workload (which might take hours to run)

  • Size and Cost: Modern data warehouses are built to handle huge amounts of data, but adding general inefficiencies at an early stage can be costly over a long period. For example, a table with 1000 columns for the same date range will take more space than 10 tables with 100 columns each for the same data range.

  • Reduce noise: When it comes to data, less is often more. It’s not about the quantity of data shared, it’s about sharing good quality, meaningful data that key business decisions can be based on.

Frequently asked questions regarding Funnel Shares

When is a data part exported and when is it skipped?

Funnel makes sure that whenever you look at the shared data you get the correct picture of what is happening across different sources which are connected i.e. a single source of truth. Our Data Shares by default will update time ranges where data has changed.

Changes that may cause a part to be updated:

  1. Adding or editing a Data Source (where we typically collect 2 years of historical data)

    1. If the Data Share contains a field that could contain data from that Data Source

    2. If the Data Share is not actively excluding data from that Data Source or Data Source type with a filter

  2. A Data Source contains updated data for the interval

    1. Same conditions as #1

  3. Changes in dimension/metric rules within Funnel for a field used in the Data Share (or the field uses lookup on a Data Source)

  4. Manually updating the Data Share configuration

  5. Manually updated exchange rates in Funnel

  6. Changing workspace locale and including a non-ISO date dimension in the Data Share

  7. Manually clicking "Force run all parts" for a Data Share

When a change is detected we will process the data and compare it against another layer of cache and if it has not been changed it will not be re-uploaded (unless you are sharing the data to a new location).

All this logic means that you can trust that the files/tables that have been updated in your warehouse has updated rows. You therefore don't have to waste money and compute on processing identical files. We also write what files were updated in a Summary file or table, see the answer concerning that below.

How-to share data that has changed since the last run (deltas)?

In general, it's not possible to identify individual records that change, but only time ranges. The reason is that depending on their data source configuration, data is already aggregated from the source e.g. Facebook Ads data by date and campaign.

But, If you would like to get only the data that has changed from the last run, then you have to implement that logic in your data pipeline. This can be achieved in multiple different ways depending on your ingestion pipeline. The two most common ways to achieve this is

  1. Include a RunId when setting up a Data Share to a file storage, that creates unique files for each run. Then you can compare the snapshots against each other.

  2. Let Funnel continuously update the data, then compare the latest Data Share data with what you have in your data warehouse to get the difference.

Both these approaches have their limitations and require considerable engineering work, using a direct Data Share to your data warehouse from Funnel can help mitigate this.

Where can I get the primary keys to the data rows?

Most of the advertising platforms already have aggregated their metrics to a day, For example, Google Ads gives the number of clicks on that day instead of each click so the data is not raw but aggregated on a day from the source. There are no real "primary keys" in this data and a single logical "conversion" may move both between dates and between ads/campaigns in their reporting as it evolves until the data is settled.
If you’d like, you can combine different dimensions to create a unique key that could be possible to identify a set of observations, though make sure to carefully select all the fields, as this process is error-prone and could lead to wrong results.

My data is shared with one table per Data Source type, how do I join them?

It is a best practice to harmonize fields like "campaign" and "impressions" across their sources in Funnel before sharing those, but if sharing a large number of fields we recommend having separate Data Shares with native fields for each platform. After sharing them into different tables this data can be joined using date, as that field would be common across all the Shares.

Which files/tables have changed since the last run?

If you are using Funnel’s file storage options such as Google Cloud Storage, Amazon S3, Azure Blob Storage, and SFTP. To enable better data cataloging, pipeline management, and error handling Funnel provides an option to provide a summary file with a few options of file types such as JSON, CSV, and TXT. The summary file contains the information about the files that were shared and if they were updated or not since the last run.
This information can be used to design a data pipeline, where the next step of the pipeline can be triggered based on the summary file. The summary file will always be written last so it's possible to listen to changes to this file to start your pipe. Read more in our help article here.

If you are using a BigQuery or Snowflake Data Share you can choose to create a summary table containing the equivalent information.
Read more about the BigQuery Summary table here.

Hope this article helps you to plan and optimally configure your data Shares!

Did this answer your question?