Exporting to Databricks
Written by Niclas Bångman

Overview

Databricks runs in your Azure, Google Cloud or AWS environment. We support all three and can work with data in their respective cloud storage. The examples below are for a setup in AWS, with the files stored in S3.

Please refer to the Databricks documentation for up-to-date instructions on how to set this up on their side.

Setting up the S3 data share

Make sure to select CSV as the field format and start the File name template with data/ so that all the CSV data files end up in their own folder. This example also assumes that you have not set Header format to none in Edit field format.
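
As an illustration, the export might then produce objects like the ones below. The bucket and path match the examples later in this article, but the file names are hypothetical and depend on your File name template:

s3://funnel-s3-export-bucket/path/data/funnel_export_2024-01.csv
s3://funnel-s3-export-bucket/path/data/funnel_export_2024-02.csv

Keeping the CSV files under a common data/ prefix is what lets the COPY INTO and Auto Loader examples below point at a single folder.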

Importing data from S3 into Databricks

You can set up a default Delta table in Databricks like this:

CREATE TABLE IF NOT EXISTS example_table;
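
This creates an empty table and lets the COPY INTO statement below add the columns found in the CSV files. If you already know which fields your data share contains, you can instead create the table with an explicit schema; the column names and types below are only placeholders for whatever fields you export:

CREATE TABLE IF NOT EXISTS example_table (
  report_date DATE,
  campaign STRING,
  cost DOUBLE
);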


You can then do a one-time import of the files like this:

COPY INTO example_table
FROM 's3://funnel-s3-export-bucket/path/data'
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true', 'escape' = '"')
COPY_OPTIONS ('mergeSchema' = 'true')

And preview it like this:

SELECT * FROM example_table LIMIT 20;
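
If you want to check which columns were created from the CSV headers, you can also inspect the table schema (the columns you see will depend on the fields in your Funnel export):

DESCRIBE TABLE example_table;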

The data can now be found in the Databricks Data Explorer, but note that this was only a one-time import.

Setting up a pipeline from S3 into Databricks

Please refer to the Databricks documentation for setting this up, but if you are using Auto Loader for S3 in SQL, it will look something like this:

CREATE STREAMING LIVE TABLE streaming_table
AS
SELECT
  *
FROM
  cloud_files(
    's3://funnel-s3-export-bucket/path/data',
    'csv',
    map(
      'header', 'true',
      'escape', '"',
      'cloudFiles.inferColumnTypes', 'true'
    )
  )

You can then use that notebook as the source code when creating the continuous pipeline.
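
Once the pipeline has run and the table has been published to the target schema configured on the pipeline, you can preview it in the same way as the batch table above (streaming_table is just the name used in this example):

SELECT * FROM streaming_table LIMIT 20;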
