This immersion day required each student to have their own account. If you are sharing single account with multiple students by creating a multiple IAM users, Account can hit following default service limit:
The Database Migration Services (DMS) hands-on lab provide a scenario, where participant learns to hydrate Amazon S3 data lake with a relation database. To achieve that, participants need a source endpoint and this guide helps instructors set up a PostgreSQL database with public endpoint as the source database.
In this lab, you will complete the following tasks using AWS CloudFormation template:
Relevant information about this lab:
Instructor will provide source database details to participants during main lab to configure source endpoint.
In this section, you are going to create a PostgreSQL RDS instance as data source for AWS Data Migration Service to consume by lab attendees for data migration to Amazon S3 data lake.
Make sure you select the us-east-1 (N. Virginia) region
Sign in to the Console where you will host the source database environment.
Navigate to the AWS CloudFormation page.
Click the Deploy to AWS icons below to stand up the DMS pre-lab 1 workshop infrastructure.
or perform step below 3 manually
Launch a new stack with the AWS CloudFormation template DMSLab_instructor_CFN.json provided with your lab package. Make sure to select us-east-1 (Virginia) region.
e. In Specify stack details, provide a name for Stack Name as “dmslab-instructor”.
f. Click on Next. g. In review page, review all the details, scroll down and check the box to acknowledge the policy and then click on Create Stack.
h. Launch the stack. It may take 15 minutes for the stack to launch. This stack creates a new VPC, Subnets, Security groups, EC2 instance, Route table, Routes, and an RDS Postgres instance.
Please make sure the Postgres database is fully populated before proceed with the DMS lab. It takes 15 to 20 minutes to finish, after the stack is launched.
You can see all resources listed below:
i. Go to the Outputs tabs of AWS CloudFormation stack and note down the instance Endpoint information for your RDS endpoint, which will be similar to information shown in below screenshot
Currently your RDS source end point is not open to connect to outside world for security reason. You need to open RDS security group to accept traffic from intended range of IP address. As it is difficult to determine range of IP address of workshop environment, so to have smooth experience of running lab you can temporally allow inbound traffic from all IP address (0.0.0.0/0 CIDR range).
It is not best practice to allow ALL CIDR range in your database security group. You should never apply open to all IP CIDR range while working on actual workload. If you are in a
self-paced workshop, the better secure way is to whitelist an IP address from DMS lab, ie. add an Elastic IP address of a NAT Gateway to the RDS security group.
Follow below steps to open security group for students to connect with source RDS database for DMS full data and CDC data dump:
Click VPC security groups under Connectivity & security tab as shown below:
In Security group screen, Go to Inbound tab and click on Edit as shown below
Update Inbound rule to “Anywhere” from hard coded value “18.104.22.168/32” , as shown in below screen. Make sure to remove the “Anywhere” inbound rule from security group, as soon as you are done with DMS lab.
If you are running both instructor and student labs in a single AWS account, for example in a self-paced environment, replace the “Anywhere” by an IP address from student lab instead. In this example screenshot, we will allow AutoComplete DMS lab to access the RDS.
Go to VPC NAT gateways Console, and look for the IP address you need to add to the RDS security group.
Click on Save. Now everyone will be able to connect to source RDS instance for lab purpose to ingest data using DMS endpoint.
Make sure to remove “Anywhere” inbound rule from security group as soon as you are done with DMS main lab. Optionally, You can read through the documentation to better understand the source database environment. The GitHub repository for aws-database-migration-samples is located here: https://github.com/aws-samples/aws-database-migration-samples/tree/master/PostgreSQL/sampledb/v1
You can follow below instruction to setup SQL Workbench to access your Postgres Database from SQL client: https://aws.amazon.com/getting-started/tutorials/create-connect-postgresql-db/
In SQL Workbench:
Run following query to find out all Schema and table created.
SELECT * FROM pg_catalog.pg_tables;
Ensure the following 2 functions exists. If anything is missing, check the solution at Troubleshoot section.
SELECT * FROM pg_stat_user_functions WHERE funcname in ('generateticketactivity','generatetransferactivity')
Use following query to analyze a table
select * from schemaname.tablename;
select * from dms_sample.player;
select * from dms_sample.player;
Following sections are optional, you only need to execute if you want to show change data capture replication with DMS.
This step is not required at your initial lab environment setup. Once the full table load of DMS lab is completed, you can start to generate extra transactions in source database to demonstrate DMS CDC (Change Data Capture) functionality.
Navigate to Lambda console and you will see a pre-built Lambda function named “GenerateCDCData”.
Click on the function and scroll down. You will see the code for this function. Copy the below query and paste it in the placeholder (value) of this code line:
“ var query_cmd= ”
Run this query first: select dms_sample.generateticketactivity(10);
This query will generate 10 ticket sales in batches of 1-6 tickets to randomly selected people for a random price (within a range.) A record of each transaction is recorded in the ticket_purchase_hist table.
You will see no error in lambda log
Run this query next in the lambda function: select dms_sample.generatetransferactivity(10);
Click on Save and then click on Test to run the function.
When enabling CDC functionality in DMS, only one DMS instance/task should activate “Ongoing replication” to avoid conflicts. When replicating to multiple targets, the processing to fan out the updates should begin with the Amazon S3 bucket, that is the target of the DMS task responsible for Ongoing replication. The process should not begin with the source database, as only one CDC process should be tracking and setting the last committed transaction that was replicated.
The source DB sportstickets setup is interrupted. Some database objects, such as the function generateticketactivity() will be missing.
Go to EC2 console, reboot the instance
DMSLabEC2. It will reload the DB and create any objects that were missing. Due to the re-run issue, the table sporting_event_ticket will be doubled in size at each reboot. You can manually drop the table via the following query, before the reboot. Then wait for 20 minutes before checking the missing DB object again.
DROP TABLE dms_sample.sporting_event_ticket CASCADE
Or you may see ‘No Space left on device’ error from DMSLabEC2 system log
Cause Check the knowledge center here
Resolution Increase the RDS instance disk size, as a quick fix.