All Collections
Data Warehouse Sharing
Amazon S3 Connector
Query S3 data using Athena with automatic schema update (Glue)
Query S3 data using Athena with automatic schema update (Glue)

Learn how to retrieve schema information automatically using aws glue and query data from s3 using Athena

Ishan Shekhar avatar
Written by Ishan Shekhar
Updated over a week ago

  • The Athena Query instance and the AWS S3 bucket should be in the same region

  • The default option for the funnel data export is compressed, using a compressed export format will help to keep the cost low when querying a large amount of data.

  • We recommend using TSV as an output file format, which works better with the default delimited format which Athena uses.
    โ€‹
    Read more about How to set up an export to AWS S3.

Get started with setting up a crawler

  1. Connect a new data source by pressing the Connect data source button, then you should be able to see an option where you can choose the location of data and how to read its metadata catalog. Select the Query data in Amazon S3 for the location, and AWS glue data catalog for metadata.

  2. AWS glue data catalog would require a crawler that will retrieve schema information automatically, select that option and press Connect to AWS Glue which will open a new window.

  3. Setup a new crawler for your data
    - Choose an appropriate name for your crawler like 'funnel-data-s3-crawler'.
    - Select Data stores as the source type
    - Choose S3 as a data store, and leave the Connection blank
    - Select Specified path in my account, and paste the path where is data is located in the include path
    - Select no, if you don't have any other data store
    - Create a new IAM role with an appropriate name for the crawler to access the data files
    - Choose a daily frequency for the crawler to run
    - Add a database where the output data table will be created, like 'funnel-data'
    - Review and submit, now select the crawler you just created and press Run Crawler to execute it for the first time (this might take a few minutes)

Start querying your data

If you go to Athena you should be able to see the database called 'funnel-data' under the Data source: AwsDataCatalog. Start by selecting your database and modifying your query

Did this answer your question?