The PreLab is only required if you are running the DMS lab outside of an AWS event engine (EE), or in your own AWS Account.
This immersion day requires each student to have their own account. Sharing an account with multiple students by creating IAM users may result in hitting some of the dafult account limits listed below:
The Database Migration Services (DMS) hands-on lab provide a scenario, where participant learns to hydrate Amazon S3 data lake with a relation database. To achieve that, participants need a source endpoint and this guide helps instructors set up a PostgreSQL database with public endpoint as the source database.
In this lab, you will complete the following tasks using AWS CloudFormation template:
Relevant information about this lab:
Instructor will provide source database details to participants during main lab to configure source endpoint.
In this section, you are going to create a PostgreSQL RDS instance as data source for AWS Data Migration Service to consume by lab attendees for data migration to Amazon S3 data lake.
Make sure you select the appropriate region and use the same region throughtout the workshop
Sign in to the Console where you will host the source database environment.
Click the Deploy to AWS icons below to stand up the DMS pre-lab 1 workshop infrastructure.
Please make sure the Postgres database is fully populated before proceed with the DMS lab. It takes 15 to 20 minutes to finish, after the stack is launched.
You can see all resources listed below:
i. Go to the Outputs tabs of AWS CloudFormation stack and note down the instance Endpoint information for your RDS endpoint, which will be similar to information shown in below screenshot
Currently your RDS source end point is not open to connect to outside world for security reason. You need to open RDS security group to accept traffic from intended range of IP address. As it is difficult to determine range of IP address of workshop environment, so to have smooth experience of running lab you can temporally allow inbound traffic from all IP address (0.0.0.0/0 CIDR range).
It is not best practice to allow ALL CIDR range in your database security group. You should never apply open to all IP CIDR range while working on actual workload. If you are in a self-paced workshop, the better secure way is to whitelist an IP address from DMS lab, ie. add an Elastic IP address of a NAT Gateway to the RDS security group.
Follow below steps to open security group for students to connect with source RDS database for DMS full data and CDC data dump:
Click VPC security groups under Connectivity & security tab as shown below:
In Security group screen, Go to Inbound tab and click on Edit as shown below
Update Inbound rule to “Anywhere” from hard coded value “22.214.171.124/32” , as shown in below screen. Make sure to remove the “Anywhere” inbound rule from security group, as soon as you are done with DMS lab.
If you are running both instructor and student labs in a single AWS account, for example in a self-paced environment, replace the “Anywhere” by an IP address from student lab instead. In this example screenshot, we will allow AutoComplete DMS lab to access the RDS.
Go to VPC NAT gateways Console, and look for the IP address you need to add to the RDS security group.
Click on Save. Now everyone will be able to connect to source RDS instance for lab purpose to ingest data using DMS endpoint.
Make sure to remove “Anywhere” inbound rule from security group as soon as you are done with DMS main lab. Optionally, You can read through the documentation to better understand the source database environment. The GitHub repository for aws-database-migration-samples is located here: https://github.com/aws-samples/aws-database-migration-samples/tree/master/PostgreSQL/sampledb/v1
You can follow below instruction to setup SQL Workbench to access your Postgres Database from SQL client: https://aws.amazon.com/getting-started/tutorials/create-connect-postgresql-db/
Database credentials: master/master123
In SQL Workbench: Run following query to find out all Schema and table created.
SELECT * FROM pg_catalog.pg_tables;
Ensure the following 2 functions exists. If anything is missing, check the solution at Troubleshoot section.
SELECT * FROM pg_stat_user_functions WHERE funcname in ('generateticketactivity','generatetransferactivity')
Use following query to analyze a table
select * from schemaname.tablename;
select * from pg_catalog.pg_tables;
select * from dms_sample.player;
Following sections are optional, you only need to execute if you want to show change data capture replication with DMS.
This step is not required at your initial lab environment setup. Once the full table load of DMS lab is completed, you can start to generate extra transactions in source database to demonstrate DMS CDC (Change Data Capture) functionality.
If you are an instructor at AWS Event expand the following section to find the instructions
GenerateCDCData lambda function is deployed in event central account and not team account. Hence you need to federate it to the event central account by navigating Actions → Federated Login as shown below.
Navigate to Lambda console and you will see a pre-built Lambda function named GenerateCDCData.
Click on the function and scroll down. You will see the code for this function. Copy the below query and paste it in the placeholder (value) of this code line:
“ var query_cmd= ”
Run this query first:
This query will generate 10 ticket sales in batches of 1-6 tickets to randomly selected people for a random price (within a range.) A record of each transaction is recorded in the ticket_purchase_hist table.
Click on Save and then click on Test to run the function. You can create an empty event as shown here: You will see no error in lambda log
Once you’ve sold some tickets you can run the generateTransferActivity procedure. The following will transfer tickets from the owner to another person. The whole “batch” of tickets purchased is transferred 80% of the time and 20% of the time an individual ticket is transferred.
Run this query next in the lambda function:
Click on Save and then click on Test to run the function.
When enabling CDC functionality in DMS, only one DMS instance/task should activate “Ongoing replication” to avoid conflicts. When replicating to multiple targets, the processing to fan out the updates should begin with the Amazon S3 bucket, that is the target of the DMS task responsible for Ongoing replication. The process should not begin with the source database, as only one CDC process should be tracking and setting the last committed transaction that was replicated.
The source DB sportstickets setup is interrupted. Some database objects, such as the function generateticketactivity() will be missing.
Go to EC2 console, reboot the instance
DMSLabEC2. It will reload the DB and create any objects that were missing. Due to the re-run issue, the table sporting_event_ticket will be doubled in size at each reboot. You can manually drop the table via the following query, before the reboot. Then wait for 20 minutes before checking the missing DB object again.
DROP TABLE dms_sample.sporting_event_ticket CASCADE
Or you may see ‘No Space left on device’ error from DMSLabEC2 system log
Cause Check the knowledge center here
Resolution Increase the RDS instance disk size, as a quick fix.