DMS task failing on Oracle ongoing replication (full load works fine) - database-migration

We're using AWS DMS to migrate Oracle databases into S3 buckets. After successfully running the full load against Oracle Database 19c Standard Edition 2 hosted in RDS, the ongoing replication fails with the error:
Failed to add the REDO sequence xxxx; to LogMiner in thread 1;. Replication task could not find the required REDO log on the source database to read changes from. Please check redo log retention settings and retry
I already checked that archivelog retention hours is set to 24.
Has anyone come across the same issue? Any help would be much appreciated.

We managed to fix the issue by rerunning the grants script as documented for AWS DMS. We could not find the root cause, but some privilege had not been assigned initially, which affected access to the redo logs: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.Oracle.html#CHAP_Source.Oracle.Amazon-Managed
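For reference, a minimal sketch of what rerunning that script can look like against an RDS-for-Oracle instance, using the python-oracledb driver. The DMS_USER name, the DSN, and the object list are placeholders I've assumed; the authoritative list of grants is on the linked documentation page.

```python
# Sketch of (re)applying the RDS-for-Oracle grants described in the linked DMS docs.
# Assumes the python-oracledb driver, an admin login, and a DMS endpoint user
# called DMS_USER -- adjust names, DSN and the object list to match the docs.
import oracledb

conn = oracledb.connect(
    user="admin",
    password="***",
    dsn="mydb.xxxxxxxx.us-east-1.rds.amazonaws.com:1521/ORCL",
)
cur = conn.cursor()

# Keep archived redo logs long enough for DMS to read them (the setting checked above).
cur.execute("""
    begin
        rdsadmin.rdsadmin_util.set_configuration(
            name  => 'archivelog retention hours',
            value => '24');
    end;""")

# Representative subset of the plain grants; the full list in the docs is longer.
for stmt in [
    "GRANT CREATE SESSION TO DMS_USER",
    "GRANT SELECT ANY TRANSACTION TO DMS_USER",
]:
    cur.execute(stmt)

# SYS-owned objects are exposed through rdsadmin on RDS rather than granted directly.
for obj, priv in [
    ("V_$LOG", "SELECT"),
    ("V_$LOGFILE", "SELECT"),
    ("V_$ARCHIVED_LOG", "SELECT"),
    ("V_$LOGMNR_LOGS", "SELECT"),
    ("V_$LOGMNR_CONTENTS", "SELECT"),
    ("DBMS_LOGMNR", "EXECUTE"),
]:
    cur.execute(
        "begin rdsadmin.rdsadmin_util.grant_sys_object(:o, :g, :p); end;",
        o=obj, g="DMS_USER", p=priv)
```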

Related

AWS DMS task error for Aurora PostgreSQL migration

I am trying to migrate all the data present in my old RDS Aurora PostgreSQL cluster to a new RDS Aurora PostgreSQL cluster using AWS DMS. I have created the source and target endpoints and tested the connections successfully. However, when I try to create a migration task in DMS, it continuously fails with the error:
Last Error: ODBC general error. Error executing command; Stream component failed at subtask 0, component st_0_PWDKKAMFPUY2RHV; Stream component 'st_0_PWDKKAMFPUY2RHV' terminated [reptask/replicationtask.c:3171] [1022502]
Stop Reason: RECOVERABLE_ERROR  Error Level: RECOVERABLE
Even after enabling CloudWatch logs, I am not able to figure out what's missing. What does the error signify, and what am I doing wrong?
I had faced the same error, and the issue seems to be related to the database user's rights for
Replication Client and Replication Slave.
I fixed it by granting the replication rights with the SQL statements below:
GRANT REPLICATION CLIENT ON *.* TO '{dbusername}'@'%';
GRANT REPLICATION SLAVE ON *.* TO '{dbusername}'@'%';
Note: replace {dbusername} with the actual database user name that is used in the DMS endpoint.

Google Cloud SQL Read Replica Failing to replicate

I have created a new read replica from the GCP Cloud SQL console, using the create read replica option.
I am getting the following error after creating the replica: the replica instance is created successfully, but replication does not start as expected.
Here is the error message I am getting in the error log:
"2020-05-05T05:11:30.747872Z 4 [ERROR] Slave I/O for channel '': error
connecting to master 'cloudsqlreplica#172.17.112.4:3306' - retry-time:
60 retries: 1, Error_code: 2003"
binlog is already enabled on the master.
Database version is MySQL 5.7
Auto storage increase is enabled
Automated backups are enabled
Point-in-time recovery is enabled
Please let me know if anyone has come across this issue and knows how to solve this problem.

Starting AWS DMS Replication Task in Terraform

Is there any way to start an AWS Database Migration Service full-load-and-cdc replication task through Terraform? Preferably, this would start automatically upon creation of the task.
The AWS DMS console provides an option to "Start task on create", and the AWS CLI provides a start-replication-task command, but I'm not seeing similar options in the Terraform resources. The aws_dms_replication_task provides the cdc_start_time argument, but I think this may apply only to cdc tasks. I've tried setting this argument to a number of past/current/future timestamps with my full-load-and-cdc replication task, but the task never started (it was merely created and entered the ready state).
I'd be glad to log a feature request to Terraform if this feature is not supported, but wanted to check with the community first to see if I was overlooking an existing way to do this today.
(Note: This question has also been logged to the Terraform Google group.)
I've logged an issue for this feature request:
Terraform AWS Provider #2083: Support Starting AWS Database Migration Service Replication Task
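Until the provider grows such an option, one workaround (not Terraform-native, just a sketch) is to call the same StartReplicationTask API the CLI exposes after `terraform apply`, for example with boto3. The task ARN and region below are placeholders.

```python
# Minimal sketch: start a full-load-and-cdc task after `terraform apply`,
# since the aws_dms_replication_task resource only creates the task.
# The ARN is a placeholder; in practice you might read it from a Terraform output.
import boto3

dms = boto3.client("dms", region_name="us-east-1")

dms.start_replication_task(
    ReplicationTaskArn="arn:aws:dms:us-east-1:123456789012:task:EXAMPLETASKARN",
    # 'start-replication' is for the first run; 'resume-processing' or
    # 'reload-target' apply to tasks that have already run before.
    StartReplicationTaskType="start-replication",
)
```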

AWS Glue ETL job from AWS Redshift to S3 fails

I am trying out the AWS Glue service to ETL some data from Redshift to S3. The crawler runs successfully and creates the metadata table in the Data Catalog; however, when I run the ETL job (generated by AWS), it fails after around 20 minutes saying "Resource unavailable".
I cannot see any AWS Glue logs or error logs created in CloudWatch. When I try to view them, it says "Log stream not found. The log stream jr_xxxxxxxxxx could not be found. Check if it was correctly created and retry."
I would appreciate it if you could provide any guidance to resolve this issue.
So basically, the job you add to Glue will only run if there is not too much traffic in the region your Glue job is in. If there are no resources available, you need to either manually re-add the job, or you can subscribe to events from CloudWatch via SNS.
Also, there are parameters you can pass to the job, such as the maximum number of retries and the timeout.
If you get a "Resource unavailable", it won't trigger a retry, because the job did not fail; it never even started. But if you set the timeout to, let's say, 60 minutes, it will trigger an error after that time, decrement your retry pool, and relaunch the job.
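For what it's worth, those knobs correspond to the MaxRetries and Timeout fields of the job definition when you create it through the API. A small boto3 sketch, where the job name, role, and script location are placeholders:

```python
# Sketch of setting the retry/timeout parameters mentioned above when the
# job is defined through the API. Name, role and script path are placeholders.
import boto3

glue = boto3.client("glue", region_name="us-east-1")

glue.create_job(
    Name="redshift-to-s3-etl",
    Role="MyGlueServiceRole",
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-glue-scripts/redshift_to_s3.py",
    },
    MaxRetries=1,   # re-run once if the job itself fails
    Timeout=60,     # minutes; forces an error (and a retry) instead of hanging
)
```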
The closest thing I see to Glue documentation on this is here:
If you encounter errors in AWS Glue, use the following solutions to help you find the source of the problems and fix them.
Note: The AWS Glue GitHub repository contains additional troubleshooting guidance in AWS Glue Frequently Asked Questions.
Error: Resource Unavailable
If AWS Glue returns a resource unavailable message, you can view error messages or logs to help you learn more about the issue. The following tasks describe general methods for troubleshooting.
• A custom DNS configuration without reverse lookup can cause AWS Glue to fail. Check your DNS configuration. If you are using Amazon Route 53 or Microsoft Active Directory, make sure that there are forward and reverse lookups. For more information, see Setting Up DNS in Your VPC (p. 23).
• For any connections and development endpoints that you use, check that your cluster has not run out of elastic network interfaces.
I have recently struggled with "Resource Unavailable" thrown by a Glue job.
I was also not able to make a direct connection in Glue to RDS; it said "no suitable security group found".
I faced this issue while trying to connect to AWS RDS and Redshift.
The problem was with the security group that Redshift was using: you need to add a self-referencing inbound rule to that security group.
For those who don't know what a self-referencing inbound rule is, follow these steps:
1) Go to the security group you are using (VPC -> Security Groups).
2) Under Inbound Rules, select Edit Inbound Rules.
3) Add a rule:
   a) Type: All Traffic
   b) Protocol: All
   c) Port Range: All
   d) Source: Custom, and in the field provided start typing the name of your security group and select it.
   e) Save it.
That's it!
If this rule was missing from your security group's inbound rules, try creating the connection again; it should now succeed, and the job should work this time as well.
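If you prefer to script it, the same self-referencing rule can be added with boto3. A minimal sketch, where sg-0123456789abcdef0 stands in for the security group used by Redshift and the Glue connection:

```python
# Sketch of adding the self-referencing inbound rule described above.
# The group ID is a placeholder for the security group used by Redshift
# and the Glue connection.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
sg_id = "sg-0123456789abcdef0"

ec2.authorize_security_group_ingress(
    GroupId=sg_id,
    IpPermissions=[{
        "IpProtocol": "-1",  # all traffic, all ports
        "UserIdGroupPairs": [{
            "GroupId": sg_id,  # source is the group itself (self-reference)
            "Description": "Self-reference so Glue ENIs in this group can reach the data stores",
        }],
    }],
)
```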

Not able to spin up Amazon EMR cluster with custom hive-site

I am following the instructions in this tutorial to spin up a cluster.
http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-dev-create-metastore-outside.html
It is not working and fails with a "Bootstrap failure" error, but I don't see any errors in any of the logs either, which makes it extremely difficult to debug.
I have my MySQL database for the metastore in RDS, and it is active as well. The log folders do not contain any errors. Do you know if this tutorial is complete?
It worked after giving the RDS security group access to the EMR security group and vice versa.
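For context, the hive-site part of that tutorial amounts to passing a hive-site configuration classification when the cluster is launched. Here is a rough boto3 sketch of that; the release label, instance sizes, and the RDS endpoint and credentials are placeholders, and the cluster still needs the mutual security group access described above.

```python
# Sketch of launching an EMR cluster whose hive-site points at an external
# MySQL metastore in RDS, roughly what the linked tutorial configures.
# Release label, instance sizes and the RDS endpoint/credentials are placeholders.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

emr.run_job_flow(
    Name="hive-external-metastore",
    ReleaseLabel="emr-5.36.0",
    Applications=[{"Name": "Hive"}],
    Configurations=[{
        "Classification": "hive-site",
        "Properties": {
            "javax.jdo.option.ConnectionURL":
                "jdbc:mysql://mymetastore.xxxxxxxx.us-east-1.rds.amazonaws.com:3306/hive?createDatabaseIfNotExist=true",
            "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
            "javax.jdo.option.ConnectionUserName": "hiveuser",
            "javax.jdo.option.ConnectionPassword": "hivepassword",
        },
    }],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
```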