DMS PreLab Setup 1

Steps

  1. Create the Instructor Environment, including a RDS Postgres database as the data source.
  2. Changing RDS Security Group
  3. Access Database from SQL Client (Optional)
  4. Generate and Replicate the CDC Data (Optional)

Limit Instruction:

This immersion day required each student to have their own account. If you are sharing single account with multiple students by creating a multiple IAM users, Account can hit following default service limit:

  • VPC – VPCs per Region 5
  • Glue - Number of crawlers per account 50
  • Glue - Number of concurrent jobs runs per account 50
  • Glue - Maximum DPUs used by a role at one time 300
  • S3 – Number of buckets per account 100
  • Athena - Number of DDL queries you can submit at the same time 20
  • Athena - Number of DML queries you can submit at the same time 20
  • RDS – Make sure you have enough disk space available in your RDS instance, if want to run DMS Change Data Capture (CDC) as generating large amount of data can exhaust RDS disk space.
  • DMS - Make sure you have enough disk space available in your DMS replication instance, if want to run DMS Change Data Capture (CDC) as transferring large amount of CDC data can exhaust disk space.

Introduction

The Database Migration Services (DMS) hands-on lab provide a scenario, where participant learns to hydrate Amazon S3 data lake with a relation database. To achieve that, participants need a source endpoint and this guide helps instructors set up a PostgreSQL database with public endpoint as the source database.

In this lab, you will complete the following tasks using AWS CloudFormation template:

  1. Create the source database environment.
  2. Hydrate the source database environment.
  3. Update the source database environment to demonstrate CDC (Change Data Capture) replication within DMS.
  4. Create Lambda function to trigger CDC data which will be replicated to Amazon S3 by DMS CDC endpoint.

Relevant information about this lab:

  • Expected setup time | 15 minutes
  • Source database name | sportstickets
  • Source schema name | dms_sample

Instructor will provide source database details to participants during main lab to configure source endpoint.

Create the Instructor Environment

In this section, you are going to create a PostgreSQL RDS instance as data source for AWS Data Migration Service to consume by lab attendees for data migration to Amazon S3 data lake.

Make sure you select the us-east-1 (N. Virginia) region

  1. Sign in to the Console where you will host the source database environment.

  2. Navigate to the AWS CloudFormation page.

  3. Click the Deploy to AWS icons below to stand up the DMS pre-lab 1 workshop infrastructure.

Launch Template Region
N.Virginia (us-east-1)

or perform step below 3 manually

  1. Launch a new stack with the AWS CloudFormation template DMSLab_instructor_CFN.json provided with your lab package. Make sure to select us-east-1 (Virginia) region.

    • a. On top right corner, Click on “Create Stack” and select “With new resources”.

    • b. In “Create Stack” Page, select “Template is ready” and for template source, select “Upload a template file”.
    • c. Locate the DMSLab_Instructor_CFN.json template from your local machine.
    • d. Click Next.

e. In Specify stack details, provide a name for Stack Name as “dmslab-instructor”.

f. Click on Next. g. In review page, review all the details, scroll down and check the box to acknowledge the policy and then click on Create Stack.

h. Launch the stack. It may take 15 minutes for the stack to launch. This stack creates a new VPC, Subnets, Security groups, EC2 instance, Route table, Routes, and an RDS Postgres instance.

Please make sure the Postgres database is fully populated before proceed with the DMS lab. It takes 15 to 20 minutes to finish, after the stack is launched.

You can see all resources listed below:

i. Go to the Outputs tabs of AWS CloudFormation stack and note down the instance Endpoint information for your RDS endpoint, which will be similar to information shown in below screenshot

Changing RDS Security Group

Currently your RDS source end point is not open to connect to outside world for security reason. You need to open RDS security group to accept traffic from intended range of IP address. As it is difficult to determine range of IP address of workshop environment, so to have smooth experience of running lab you can temporally allow inbound traffic from all IP address (0.0.0.0/0 CIDR range).

It is not best practice to allow ALL CIDR range in your database security group. You should never apply open to all IP CIDR range while working on actual workload. If you are in a self-paced workshop, the better secure way is to whitelist an IP address from DMS lab, ie. add an Elastic IP address of a NAT Gateway to the RDS security group.

Follow below steps to open security group for students to connect with source RDS database for DMS full data and CDC data dump:

  1. Go to the RDS and double click on “dmslabinstance” DB identifier as shown below:

  2. Click VPC security groups under Connectivity & security tab as shown below:

  3. In Security group screen, Go to Inbound tab and click on Edit as shown below

  4. Update Inbound rule to “Anywhere” from hard coded value “72.21.196.67/32” , as shown in below screen. Make sure to remove the “Anywhere” inbound rule from security group, as soon as you are done with DMS lab.

  5. If you are running both instructor and student labs in a single AWS account, for example in a self-paced environment, replace the “Anywhere” by an IP address from student lab instead. In this example screenshot, we will allow AutoComplete DMS lab to access the RDS.

  6. Go to VPC NAT gateways Console, and look for the IP address you need to add to the RDS security group.

    • If you are running the PreLab2, note down the IP address tagged with “dmslab-student”.

    • Or if you are running the AutoComplete DMS Lab, copy the IP address tagged with “auto-dmslab”.

  7. Click on Save. Now everyone will be able to connect to source RDS instance for lab purpose to ingest data using DMS endpoint.

Make sure to remove “Anywhere” inbound rule from security group as soon as you are done with DMS main lab. Optionally, You can read through the documentation to better understand the source database environment. The GitHub repository for aws-database-migration-samples is located here: https://github.com/aws-samples/aws-database-migration-samples/tree/master/PostgreSQL/sampledb/v1

Access Database from SQL Client (Optional)

You can follow below instruction to setup SQL Workbench to access your Postgres Database from SQL client: https://aws.amazon.com/getting-started/tutorials/create-connect-postgresql-db/

In SQL Workbench: Run following query to find out all Schema and table created. SELECT * FROM pg_catalog.pg_tables;

Ensure the following 2 functions exists. If anything is missing, check the solution at Troubleshoot section.

SELECT * FROM pg_stat_user_functions WHERE funcname in ('generateticketactivity','generatetransferactivity')

Use following query to analyze a table select * from schemaname.tablename;

For example: select * from dms_sample.player;

For example: select * from dms_sample.player;

Following sections are optional, you only need to execute if you want to show change data capture replication with DMS.

Generate and Replicate the CDC Data (Optional)

This step is not required at your initial lab environment setup. Once the full table load of DMS lab is completed, you can start to generate extra transactions in source database to demonstrate DMS CDC (Change Data Capture) functionality.

Navigate to Lambda console and you will see a pre-built Lambda function named “GenerateCDCData”.

  1. Click on the function and scroll down. You will see the code for this function. Copy the below query and paste it in the placeholder (value) of this code line: “ var query_cmd= ”” ”

  2. Run this query first: select dms_sample.generateticketactivity(10);

This query will generate 10 ticket sales in batches of 1-6 tickets to randomly selected people for a random price (within a range.) A record of each transaction is recorded in the ticket_purchase_hist table.

  1. Click on Save and then click on Test to run the function. You can create an empty event as shown here:

You will see no error in lambda log

  1. Once you’ve sold some tickets you can run the generateTransferActivity procedure. The following will transfer tickets from the owner to another person. The whole “batch” of tickets purchased is transferred 80% of the time and 20% of the time an individual ticket is transferred.

Run this query next in the lambda function: select dms_sample.generatetransferactivity(10);

Click on Save and then click on Test to run the function.

When enabling CDC functionality in DMS, only one DMS instance/task should activate “Ongoing replication” to avoid conflicts. When replicating to multiple targets, the processing to fan out the updates should begin with the Amazon S3 bucket, that is the target of the DMS task responsible for Ongoing replication. The process should not begin with the source database, as only one CDC process should be tracking and setting the last committed transaction that was replicated.

Troubleshooting

1. Failed to run Lambda function ‘GenerateCDCData’.

Cause

The source DB sportstickets setup is interrupted. Some database objects, such as the function generateticketactivity() will be missing.

Resolution

Go to EC2 console, reboot the instance DMSLabEC2. It will reload the DB and create any objects that were missing. Due to the re-run issue, the table sporting_event_ticket will be doubled in size at each reboot. You can manually drop the table via the following query, before the reboot. Then wait for 20 minutes before checking the missing DB object again.

DROP TABLE dms_sample.sporting_event_ticket CASCADE

2. RDS source database is out of storage space.

Or you may see ‘No Space left on device’ error from DMSLabEC2 system log

Cause Check the knowledge center here

Resolution Increase the RDS instance disk size, as a quick fix.