An Amazon RDS for PostgreSQL database serves as the source for a ticket sales system for sporting events. It stores transaction information about ticket sales to individual purchasers and ticket ownership transfers, with additional tables for event details. AWS Database Migration Service (AWS DMS) is used to perform a full data load from the Amazon RDS source to an Amazon S3 bucket.
Before the Glue lab starts, you may skip the DMS data migration and instead copy the source data to your S3 bucket directly.
In today’s lab, you will copy the data from a centralized S3 bucket to your AWS account, crawl the dataset with an AWS Glue crawler to create metadata, transform the data with AWS Glue, query the data and create a view with Amazon Athena, and build a dashboard with Amazon QuickSight.
Make sure you are in the us-east-1 (N. Virginia) region.
In AWS Cloud9, a development environment (or just environment) is a place where you store your development project’s files and where you run the tools to develop your applications. In this tutorial, you create a special kind of environment called an EC2 environment and then work with the files and tools in that environment.
Sign in to the AWS Cloud9 console as follows:
After you sign in to the AWS Cloud9 console, in the top navigation bar, choose US East (N. Virginia) AWS Region to create the environment in. For a list of available AWS Regions, see AWS Cloud9 in the AWS General Reference.
Or:
On the Name environment page, for Name, type a name for your environment. For this tutorial, use my-demo-environment.
For Description, type something about your environment. For this tutorial, use This environment is for the AWS Cloud9 tutorial.
Choose Next step.
On the Configure settings page, for Environment type, choose Create a new instance for environment (EC2).
Choosing Create a new instance for environment (EC2) might result in possible charges to your AWS account for Amazon EC2.
Choosing instance types with more RAM and vCPUs might result in additional charges to your AWS account for Amazon EC2.
For Platform, choose the operating system for the Amazon EC2 instance that AWS Cloud9 will create and then connect to this environment: Amazon Linux or Ubuntu.
For Cost-saving setting, choose the amount of time until AWS Cloud9 shuts down the Amazon EC2 instance for the environment after all web browser instances that are connected to the IDE for the environment have been closed. Or leave the default choice.
Choosing a longer time period might result in more charges to your AWS account.
Leave the default settings for Network settings (advanced).
Choose Next step.
On the Review page, choose Create environment. Wait while AWS Cloud9 creates your environment. This can take several minutes.
After AWS Cloud9 creates your environment, it displays the AWS Cloud9 IDE for the environment.
If AWS Cloud9 doesn’t display the IDE after at least five minutes, there might be a problem with your web browser, your AWS access permissions, the instance, or the associated virtual private cloud (VPC). For possible fixes, see Cannot Open an Environment in Troubleshooting.
Open the AWS Cloud9 IDE from the console; you will see a terminal screen at the bottom:
Generate a key pair by issuing the command ssh-keygen.
Press Enter three times to accept the default choices.
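The two steps above can also be run as a single non-interactive command. This is a sketch that assumes the default key path (~/.ssh/id_rsa) and an empty passphrase; the -m PEM flag is an addition that keeps the private key in a format older tools such as openssl can read:

```shell
# Non-interactive equivalent of running ssh-keygen and pressing Enter
# three times: an RSA key at the default path with no passphrase.
mkdir -p "$HOME/.ssh"
# Skip generation if a key already exists (ssh-keygen would otherwise
# prompt before overwriting it).
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N "" -m PEM -f "$HOME/.ssh/id_rsa"
```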
Upload the public key to your EC2 region:
aws ec2 import-key-pair --key-name "lfworkshop" --public-key-material file://~/.ssh/id_rsa.pub
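If you want to confirm that the imported key matches your local one, you can reproduce the fingerprint EC2 displays: for an imported key pair, EC2 shows the MD5 digest of the DER-encoded public key. This sketch assumes a PEM-format private key at the default path and generates one if missing so the snippet is self-contained; note that openssl cannot read keys saved in the newer OpenSSH private-key format:

```shell
mkdir -p "$HOME/.ssh"
KEY="$HOME/.ssh/id_rsa"
# Create a PEM-format key if none exists yet, so the snippet runs alone.
[ -f "$KEY" ] || ssh-keygen -t rsa -N "" -m PEM -f "$KEY"
# MD5 of the DER-encoded public key: this is the fingerprint EC2 shows
# for imported key pairs such as "lfworkshop".
openssl rsa -in "$KEY" -pubout -outform DER | openssl md5 -c
```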
Issue the following command in the terminal, replacing <YourBucketName> with your own bucket name:
aws s3 cp --recursive s3://aws-dataengineering-day.workshop.aws/data/ s3://<YourBucketName>/tickets/
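A parameterized form of the same copy can help you double-check the destination before running it. BUCKET below is a placeholder you must set to your own bucket name; the sketch only prints the command:

```shell
# Placeholder: substitute the S3 bucket you created for this lab.
BUCKET="<YourBucketName>"
SRC="s3://aws-dataengineering-day.workshop.aws/data/"
DST="s3://${BUCKET}/tickets/"
# Print the command for review; drop the echo to actually run the copy.
echo aws s3 cp --recursive "$SRC" "$DST"
```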
Open the S3 console and view the data that was copied via the Cloud9 terminal.
Objects in your S3 bucket follow this path pattern: BucketName/bucket_folder_name/schema_name/table_name/objects/
In our lab example this becomes /<BucketName>/tickets/dms_sample/, with a separate path for each table_name.
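To make the prefix layout concrete, this sketch mirrors the shape locally using a few hypothetical table names; list your own bucket to see the actual set of tables:

```shell
# Local mirror of the S3 prefix shape:
#   <BucketName>/tickets/dms_sample/<table_name>/...
BASE=/tmp/tickets-layout
for table in person sporting_event ticket_purchase_hist; do
  mkdir -p "${BASE}/tickets/dms_sample/${table}"
done
find "$BASE" -type d | sort
```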
Download one of the files:
Select the check box next to the object name and click Download in the pop-up window.
Click Save File.
Open the file.
Explore the objects in the S3 directory further.
In the next part of this lab, we will complete the following tasks:
If you want to re-run the lab by yourself, please follow the lab instructions published on GitHub:
https://github.com/aws-samples/data-engineering-for-aws-immersion-day