Retrieval and storage of data from each source

Why does Funnel store the data it downloads, and why does Funnel manage the schedule of the downloads?

Written by Ilona Norman

Funnel integrates a large number of source platforms with varying characteristics and limitations. The biggest category of Funnel source platforms is online advertising data, such as Facebook Ads and Google Ads, and the following focuses on that perspective. For these sources Funnel has opted to download data on a schedule and to store the data internally before serving it to Funnel customers. As Funnel keeps expanding into other areas, different choices may be made for those integrations in the future.

Typical properties of online advertising platforms

Online advertising data has some typical properties that are important to understand, as they have significant implications for how to design a service like Funnel.

1) The data is "eventually consistent" by nature

For example, the final advertising cost for each day may not be settled until several days later, because the cost is calculated based on the number of orders, removal of fees for fraudulent clicks, volume discounts, and so on. Orders are also frequently attributed to the time of the last click, where an attribution window of up to 30 days between the click and the order is not uncommon. This means the data for "yesterday" is usually an early estimate rather than the final numbers.
One example of this is how Google describes the expected data freshness for Google Ads data.

2) APIs for online advertising platforms often have strict rate limits

Rate limits (e.g. requests per second) and quotas (e.g. requests per day) typically apply to the individual advertising account and/or the user used for connecting the advertising account. In addition, rate limits and quotas apply to the application making the requests for data, in this case Funnel. Retrieving the data for a single "report" may require a large number of API requests, as there are limits to the number of columns and rows that can be retrieved with each request.
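
As a rough illustration, one report pull can turn into many paced, paginated requests, all counting against the same quotas. The sketch below is a minimal example under assumed names: the client object and its get_report_page call are hypothetical placeholders, not any real ad platform's API.

    import time

    MAX_REQUESTS_PER_SECOND = 2      # assumed per-account rate limit
    PAGE_SIZE = 1000                 # assumed maximum rows per request

    def fetch_report_rows(client, account_id, day, fields):
        """Page through one day's report while pacing requests to respect the rate limit."""
        rows, page_token = [], None
        while True:
            response = client.get_report_page(    # hypothetical API call
                account_id=account_id,
                date=day,
                fields=fields,                     # column limits may force several separate reports
                page_size=PAGE_SIZE,
                page_token=page_token,
            )
            rows.extend(response["rows"])
            page_token = response.get("next_page_token")
            if not page_token:
                return rows
            time.sleep(1 / MAX_REQUESTS_PER_SECOND)   # spread requests out over time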

3) Online advertising APIs are often asynchronous

The initial request "orders" a report, which becomes available a few seconds later in the best case, or, more commonly, minutes later.
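
A minimal sketch of such an asynchronous flow, with hypothetical create_report, get_report_status and download_report calls standing in for whatever a given platform actually offers: order the report first, then poll until it is ready.

    import time

    def download_async_report(client, report_definition, poll_interval=15, timeout=1800):
        """Order a report, then poll until the platform has produced it."""
        job_id = client.create_report(report_definition)     # hypothetical API call
        waited = 0
        while waited < timeout:
            status = client.get_report_status(job_id)        # hypothetical API call
            if status == "DONE":
                return client.download_report(job_id)        # hypothetical API call
            if status == "FAILED":
                raise RuntimeError(f"Report job {job_id} failed")
            time.sleep(poll_interval)    # reports are typically ready within seconds to minutes
            waited += poll_interval
        raise TimeoutError(f"Report job {job_id} not ready after {timeout} seconds")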

4) There are always issues of instability

While large platforms (Google, Facebook, ...) have excellent availability, the internet is always a source of intermittent problems, and smaller platforms may have a harder time providing a very high uptime.
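
Intermittent failures like these are normally handled by retrying with a backoff, which is far easier to do inside a scheduled background download than while a user is waiting for a report. A generic sketch of the idea (not Funnel's actual implementation):

    import random
    import time

    def fetch_with_retries(fetch, max_attempts=5, base_delay=2.0):
        """Retry a flaky fetch callable with exponential backoff and jitter."""
        for attempt in range(1, max_attempts + 1):
            try:
                return fetch()
            except Exception:
                if attempt == max_attempts:
                    raise                                    # give up after the last attempt
                delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 1)
                time.sleep(delay)                            # wait longer after each failure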

5) Lack of version controlled APIs

Many smaller advertising platforms do not strictly version their APIs or notify customers and third parties like Funnel when they make breaking changes. Funnel spends considerable resources on monitoring and adjusting the connections to these types of sources.

6) Credentials are prone to breaking

Credentials for API access frequently "break". OAuth and other tokens are often invalidated after a certain time, or when the connecting user changes their password. While Funnel can notify customers of the need to reconnect, with large customers connecting hundreds or thousands of sources there is always a risk that some credentials are not current. In addition, for large advertisers the people holding the updated credentials may be spread out over several departments or even time zones.

7) Data granularity differs wildly between platforms

While some platforms offer very granular data and a large set of breakdowns (age group, country, device, ...), many smaller platforms do not, and a common strategy is to "encode" these in other fields. An example of this is to include codes in campaign names, e.g. "SE_FURNITURE_myrealcampaign" for a campaign running in Sweden and promoting furniture.
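
Decoding such a convention amounts to simple rules applied to the campaign name. The sketch below assumes a "COUNTRY_CATEGORY_name" convention like the example above; real conventions, codes and rules vary from customer to customer.

    def decode_campaign_name(name):
        """Split a "COUNTRY_CATEGORY_rest" campaign name into breakdown fields."""
        country_codes = {"SE": "Sweden", "NO": "Norway", "DK": "Denmark"}   # assumed mapping
        parts = name.split("_", 2)
        if len(parts) == 3 and parts[0] in country_codes:
            return {"country": country_codes[parts[0]],
                    "category": parts[1].capitalize(),
                    "campaign": parts[2]}
        return {"country": None, "category": None, "campaign": name}

    decode_campaign_name("SE_FURNITURE_myrealcampaign")
    # -> {"country": "Sweden", "category": "Furniture", "campaign": "myrealcampaign"}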

Design implications of these properties

With the above characteristics in mind, Funnel has opted to download data from the source platforms on a schedule that is adapted to each platform, and to store the data in the Funnel AWS environment before serving or exporting it to the final destinations.

Benefits of storing data

  • Managing quotas, rate limits, and retries of temporary and API-version errors is easier with a scheduled download. If Funnel only fetched data "on demand", the risk of not being able to retrieve the data from the source at all when it is needed would be much higher.

  • Managing eventual consistency by retrieving data in sliding time windows of N days of updates, where N is based on the characteristics of each source platform (see the sketch after this list).

  • Managing availability of data when credentials are not up to date. Without a scheduled download, all credentials (often hundreds per Funnel customer) must be valid every time data is requested. For example, customers trying to view their data may have to call colleagues and ask them to re-authenticate before they can run their report.

  • Allowing the "business users" in each customer organization to manage the rules for decoding their campaign naming strategy (e.g. "begins with SE" means Sweden), and performing augmentation and, depending on the destination, grouping/aggregation based on those rules. This is easier and faster to do on stored data, as rule changes can be applied without triggering new downloads. Rule changes are frequent, especially while experimenting, doing the initial setup, or adapting to changes in the naming and tagging strategy in the source data.

  • Serving the same data to multiple destinations does not consume more source platform quota. For example, a Funnel customer using the Funnel Connector for Google Sheets in addition to having a feed to their Data Warehouse will not risk starving the Data Warehouse.

  • Serving the same data to many end users of on-demand reports (like Google Data Studio or Funnel Dashboards) does not consume more source platform quota. That is, there is a lower risk of Google Data Studio not being able to get data at the end of the working day because colleagues have already run too many reports.

  • Increased availability of the whole dataset at the technical level: Funnel's single 99.95% availability, versus roughly (99.XY%)^N when all of the N source platforms must be available at the moment a report is requested, where N is often greater than 10.

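The sliding-window retrieval mentioned above can be pictured roughly as follows. The lookback of N days is tuned per source platform to match how long its numbers keep changing; the 30-day value and the download_day function here are illustrative assumptions, not Funnel's actual implementation.

    from datetime import date, timedelta

    LOOKBACK_DAYS = 30   # assumed; in practice tuned per source platform

    def dates_to_refresh(today=None, lookback_days=LOOKBACK_DAYS):
        """Each scheduled run re-downloads the last N days, since those may still change."""
        today = today or date.today()
        return [today - timedelta(days=offset) for offset in range(1, lookback_days + 1)]

    def scheduled_download(download_day):
        # download_day(day) is a placeholder for fetching and storing one day of data
        for day in dates_to_refresh():
            download_day(day)   # overwrite previously stored values for that day
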
Downsides with storing data

  • An added delay of reporting data of roughly 1-4 hours, depending on the source platform. Note again that the most recent data is often a low-quality approximation even at the source, and many platforms recommend not retrieving data for "yesterday" until several hours into the day.

  • Funnel is potentially doing unnecessary up-front work by collecting data that no one will ask for or look at.

  • Up-front decisions and configuration of which breakdowns and metrics are needed may be necessary at the time the data source is connected to Funnel, instead of on demand when a report is requested.

For the reasons above, we believe the foundation of Funnel will continue to be scheduled retrieval and storage of data, while future evolutions may also include more flexible options and hybrid solutions.
