Federated query is a new Amazon Athena feature that enables data analysts, engineers, and data scientists to execute SQL queries across data stored in relational, non-relational, object, and custom data sources. With Athena federated query, customers can submit a single SQL query and analyze data from multiple sources running on-premises or hosted on the cloud. Athena executes federated queries using Data Source Connectors that run on AWS Lambda.
As part of this lab, we will explore how to query different data sources from Athena. We will be using existing RDS database which was created as part of the initial setup as datasource-1 and data lake data (stored in S3) as datasource-2.
Create an S3 endpoint to allow S3 access to Athena connecter Lambda, follow steps mentioned here to create S3 endpoint. Make sure to use the same subnet which will be used for Lambda function in next step. This will put a route entry in the subnet routing table to reach to S3 via AWS backbone.
Choose “Connect data source” on the Query Editor
Select the data source as ‘PostgreSQL’ to which you want to connect,
as shown in the following screenshot. Click ‘Next’
Click on ‘Configure new function’ and give a catalog name. And select ‘Configure new lambda function’
Changed string below:
Copy below query and paste in the query editor
SELECT loc.city, count(distinct evt.id) AS events FROM "Postgres_DB"."dms_sample"."sport_location" AS loc JOIN "AwsDataCatalog"."ticketdata"."parquet_sporting_event" AS evt ON loc.id = evt.location_id GROUP BY loc.city ORDER BY loc.city ASC;