An Amazon RDS for PostgreSQL database serves as the source for a ticket sales system for sporting events. It stores transaction records for ticket sales to selected people (including the sale price) and for transfers of ticket ownership, with additional tables for event details. AWS Database Migration Service (DMS) performs a full data load from the Amazon RDS source to an Amazon S3 bucket.
Before the Glue lab starts, you can choose to skip the DMS data migration and instead copy the source data directly to your S3 bucket.
In today’s lab, you will copy the data from a centralized S3 bucket to your AWS account, crawl the dataset with an AWS Glue crawler to create metadata, transform the data with AWS Glue, query the data and create a view with Amazon Athena, and build a dashboard with Amazon QuickSight.
Open AWS CloudShell in the us-east-1 (N. Virginia) region. A terminal window opens in the browser. (If a pop-up appears, close it.)
We launch CloudShell in the us-east-1 (N. Virginia) region regardless of the region in which you are running the rest of the workshop. The following command copies the data to your S3 bucket in whatever region that bucket belongs to (the copy can also be cross-region).
Issue the following command in the terminal, replacing <YourBucketName> with the name of your own bucket.
aws s3 cp --recursive --copy-props none s3://aws-dataengineering-day.workshop.aws/data/ s3://<YourBucketName>/tickets/
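If you would like to preview the copy before running it, the sketch below wraps the same command in two helper functions. The function names `tickets_uri` and `copy_workshop_data` and the example bucket name are hypothetical and not part of the lab; the sketch assumes AWS CLI v2, as provided in CloudShell. It rejects an unreplaced placeholder and uses the `--dryrun` option of `aws s3 cp` to print the planned operations first.

```shell
# Hypothetical helper: build the destination URI used by the lab's copy command.
tickets_uri() {
  local bucket="$1"
  # Refuse an empty name or the unreplaced <YourBucketName> placeholder.
  if [ -z "$bucket" ] || [ "$bucket" = "<YourBucketName>" ]; then
    echo "error: pass the name of your own bucket" >&2
    return 1
  fi
  printf 's3://%s/tickets/\n' "$bucket"
}

# Hypothetical wrapper: preview the copy with --dryrun, then run it for real.
copy_workshop_data() {
  local src="s3://aws-dataengineering-day.workshop.aws/data/"
  local dest
  dest="$(tickets_uri "$1")" || return 1
  # --dryrun lists what would be copied without transferring anything.
  aws s3 cp --recursive --copy-props none --dryrun "$src" "$dest" || return 1
  aws s3 cp --recursive --copy-props none "$src" "$dest"
}

# Example call (not run automatically; the bucket name is made up):
#   copy_workshop_data my-lab-bucket
```

Nothing runs until you call `copy_workshop_data` yourself, so the block is safe to paste into the terminal first and invoke afterwards.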
Open the S3 console and view the data that was copied through the CloudShell terminal.
The object path in your S3 bucket follows this pattern: BucketName/bucket_folder_name/schema_name/table_name/objects/
In our lab example, this becomes “/<BucketName>/tickets/dms_sample”, with a separate path for each table_name.
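To make the path layout concrete, here is a minimal sketch that splits an object key into the four parts named in the pattern above. The function name `describe_key`, the table name `example_table`, and the object name `example_object.csv` are placeholders chosen for illustration; they are not actual names from the dataset.

```shell
# Split an object key of the form folder/schema/table/object into its parts.
describe_key() {
  local key="$1"
  local folder schema table object rest
  folder="${key%%/*}"; rest="${key#*/}"     # first path segment, then the remainder
  schema="${rest%%/*}"; rest="${rest#*/}"   # second segment
  table="${rest%%/*}"                       # third segment
  object="${rest#*/}"                       # everything after the table prefix
  printf 'folder=%s schema=%s table=%s object=%s\n' \
    "$folder" "$schema" "$table" "$object"
}

# Placeholder key following the lab's layout (names are illustrative only).
describe_key "tickets/dms_sample/example_table/example_object.csv"
# prints: folder=tickets schema=dms_sample table=example_table object=example_object.csv

# To see the real per-table prefixes from CloudShell, you could run:
#   aws s3 ls s3://<YourBucketName>/tickets/dms_sample/
```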
Download one of the files:
Select the check box next to the object name and click Download in the pop-up window.
Click Save File.
Open the file.
Note that column names are included in the file in the first row.
Explore the objects in the S3 directory further.
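The header check above can also be done from a terminal once a file is on disk. The sketch below is one way to do it, assuming a plain comma-separated file with no quoted commas in the header; the function name `show_header` and the demo data are made up for illustration and do not reflect the dataset's actual columns.

```shell
# Print the header row of a local CSV file and count its columns.
show_header() {
  local file="$1"
  local header
  header="$(head -n 1 "$file")"
  printf 'header: %s\n' "$header"
  # Count comma-separated fields (assumes no quoted commas in the header).
  printf 'columns: %s\n' "$(printf '%s\n' "$header" | awk -F',' '{print NF}')"
}

# Self-contained demo with made-up data.
demo="$(mktemp)"
printf 'id,name,price\n1,example,10.00\n' > "$demo"
show_header "$demo"
# prints:
#   header: id,name,price
#   columns: 3
rm -f "$demo"

# To peek at an object without saving it, `aws s3 cp <s3-uri> -` streams it
# to standard output, so it can be piped into `head -n 1`.
```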
In the next part of this lab, we will complete the following tasks:
If you want to re-run the lab by yourself, please follow the lab instructions published on GitHub: