AWS Lambda Function for Redshift import

Run the import of Funnel data from S3 to Redshift in your own VPC

Written by Ilona Norman

Overview

If you would like to access your Funnel data in Amazon Redshift, you can do so by setting up an S3 export and importing the exported files into Redshift. Funnel's Data Warehouse connector can export all your data to S3, so once you have a file in your bucket, all you need to do is configure a Lambda function to periodically import the data into Redshift.

Set up Redshift

Once your export has run, you'll find a number of files in your S3 bucket, with names such as funnel_2018_01.tsv and so on, one file per month. Along with these you'll also find a file called schema.funnel_data_{startYYYY}_{startMM}.sql, which contains the CREATE TABLE statement for a table that can hold your exported data. Run that statement manually in Redshift and then continue to set up your Lambda import function.
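
The exact columns depend on the fields included in your export, but the generated file will contain something along these lines (the column names and types below are purely illustrative):

create table funnel_data (
    "Date" date,
    "Campaign" varchar(1024),
    "Traffic source" varchar(1024),
    "Cost" double precision,
    "Impressions" bigint,
    "Clicks" bigint
);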

Set up Lambda

We have created an example Lambda module that should handle this import for you; all you need to do is set up a Lambda function in AWS. The file redshift-import.zip is ready to upload to AWS Lambda.

The steps needed in Lambda are:

  1. Create a new Lambda function (Blank function)

  2. Configure Lambda:
    Runtime: Node.js 10.x
    Code entry: Upload (redshift-import.zip)
    Environment variables: USER, PASSWORD, HOST, PORT, DATABASE and SQL (see below)
    Handler: redshift-import.handler
    Advanced: timeout >= 1 minute (see below)
    VPC: according to your setup to access Redshift

  3. Configure triggers (See below)

Environment variables

USER, PASSWORD, HOST, PORT and DATABASE should hold the connection details for your Redshift cluster, as five separate environment variables.

SQL should contain the full SQL script you would like to run, all on one line. For example:

set search_path to [SCHEMA]; begin; delete from funnel_data; copy funnel_data from 's3://[BUCKET]/funnel_data_' gzip compupdate off delimiter as '\t' credentials '[CREDENTIALS]'; commit;

Notes 

  1. In this example we have used the TSV format for the S3 export in Funnel, but you can just as well use CSV as long as you adjust the delimiter in the above SQL accordingly. You might also need to adjust the Redshift table name and the copy file pattern according to how you set up your export (funnel_data and funnel_data_ respectively, in the above example).

  2. If your bucket resides in another region than your Redshift cluster, you will have to specify the region in the COPY query (e.g. region 'us-west-2').

  3. The default option for Funnel exports is gzip-compressed files. The gzip flag must be removed from the COPY command if the files are exported without compression.

  4. The COPY command needs an authorisation parameter in place of [CREDENTIALS] to be able to access the data in S3. You can read more about the different authorisation options here. A combined example is sketched after these notes.
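
As an illustration only, a COPY command adjusted for a CSV export, a bucket in another region and IAM role authorisation could look like the line below. The region and role ARN are placeholders; substitute your own values:

set search_path to [SCHEMA]; begin; delete from funnel_data; copy funnel_data from 's3://[BUCKET]/funnel_data_' gzip compupdate off delimiter as ',' region 'us-west-2' iam_role 'arn:aws:iam::[ACCOUNT_ID]:role/[REDSHIFT_ROLE]'; commit;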

Configure Triggers

You may use any of the Lambda trigger events available in AWS. The most common setups are a CloudWatch Events rule with a schedule such as rate(1 day), or an S3 trigger that runs the Lambda function when an export completes.

Each time an export completes, a summary file is created in the same location as the data files. The format of this file can be configured; you can choose between JSON, CSV, or plain text. The file will be called summary.funnel_data_{startYYYY}_{startMM}.json, or something similar depending on the format you chose. You can set up an S3 trigger on this file.

Timeout settings

The longest runtime you can configure in Lambda is 5 minutes. If you have a lot of data, this may not be long enough to complete the COPY operation in Redshift, in which case you will see a timeout message from the Lambda. However, our tests indicate that (with Node.js in Lambda) the query will still run to completion within Redshift, as Lambda/Node will not signal any abort to the database. If you don't care about the timeout message, setting the timeout to 1 minute should be more than enough to start the COPY operation within Redshift.

Multiple exports in same database

To handle multiple exports from Funnel and import them into the same Redshift database, you either need to work with different schemas (e.g. set search_path to [SCHEMA] before running our provided SQL) or adjust the SQL used to create the tables and copy the data so that each export uses unique table names. A sketch of the schema approach follows below.
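
As a rough sketch of the schema approach, two exports could each get their own schema. The schema and bucket names below are only placeholders, and the generated CREATE TABLE statement has to be run once in each schema:

create schema if not exists funnel_export_a;
create schema if not exists funnel_export_b;

-- SQL environment variable for the first export's Lambda function
set search_path to funnel_export_a; begin; delete from funnel_data; copy funnel_data from 's3://[BUCKET_A]/funnel_data_' gzip compupdate off delimiter as '\t' credentials '[CREDENTIALS]'; commit;

-- SQL environment variable for the second export's Lambda function
set search_path to funnel_export_b; begin; delete from funnel_data; copy funnel_data from 's3://[BUCKET_B]/funnel_data_' gzip compupdate off delimiter as '\t' credentials '[CREDENTIALS]'; commit;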
