Issue creating AWS DMS task from boto3 script

I am using the Python boto3 script below to create an AWS DMS replication task and then start it, but the start call fails with the following error:
Error:
botocore.errorfactory.InvalidResourceStateFault: An error occurred (InvalidResourceStateFault) when calling the StartReplicationTask operation: Replication Task cannot be started, invalid state
Python script:
#!/usr/bin/python
import boto3
client_dms = boto3.client('dms')
# Create a replication DMS task
response = client_dms.create_replication_task(
    ReplicationTaskIdentifier='test-new1',
    ResourceIdentifier='test-new',
    ReplicationInstanceArn='arn:aws:dms:us-east-1:xxxxxxxxxx:rep:test1',
    SourceEndpointArn='arn:aws:dms:us-east-1:xxxxxxxxxx:endpoint:source',
    TargetEndpointArn='arn:aws:dms:us-east-1:xxxxxxxxxx:endpoint:target',
    MigrationType='full-load',
    TableMappings='{\n \"TableMappings\": [\n {\n \"Type\": \"Include\",\n \"SourceSchema\": \"test\",\n \"SourceTable\": \"table_name\"\n}\n ]\n}\n\n'
)
# Start the task from DMS
response = client_dms.start_replication_task(
    ReplicationTaskArn='arn:aws:dms:us-east-1:xxxxxxxxxx:task:test-new',
    StartReplicationTaskType='start-replication'
)

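The task is most likely still being created when start_replication_task is called. You probably have to use the ReplicationTaskReady waiter (replication_task_ready in boto3) and wait for the task to be ready before you can perform other actions on it.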
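A minimal sketch of the waiter approach, assuming the task ARN is taken from the create response rather than hard-coded. The helper name create_and_start_task and the delay and max_attempts values are my own choices for illustration, not anything from the question. The client is passed in so the function can be tried with boto3.client('dms') or with a stub.

```python
def create_and_start_task(client, create_kwargs, delay=15, max_attempts=40):
    """Create a DMS task, wait until it is ready, then start it.

    `client` is expected to behave like boto3.client('dms').
    `create_kwargs` are the keyword arguments for create_replication_task.
    """
    created = client.create_replication_task(**create_kwargs)
    # Use the ARN returned by the API instead of building it by hand.
    task_arn = created["ReplicationTask"]["ReplicationTaskArn"]

    # Block until the task has left the "creating" state.
    waiter = client.get_waiter("replication_task_ready")
    waiter.wait(
        Filters=[{"Name": "replication-task-arn", "Values": [task_arn]}],
        WaiterConfig={"Delay": delay, "MaxAttempts": max_attempts},
    )

    return client.start_replication_task(
        ReplicationTaskArn=task_arn,
        StartReplicationTaskType="start-replication",
    )

# Usage (requires boto3 and AWS credentials):
#   import boto3
#   create_and_start_task(boto3.client("dms"), {"ReplicationTaskIdentifier": "test-new1", ...})
```

If the waiter times out, boto3 raises a WaiterError, so the start call is never reached with a task that is not ready.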

Related

Getting error while testing AWS Lambda function: "Invalid database identifier"

Hi, I'm getting the following error while testing a Lambda function:
{
"errorMessage": "An error occurred (InvalidParameterValue) when calling the DescribeDBInstances operation: Invalid database identifier: <RDS instance id>",
"errorType": "ClientError",
"stackTrace": [
" File \"/var/task/lambda_function.py\", line 25, in lambda_handler\n db_instances = rdsClient.describe_db_instances(DBInstanceIdentifier=rdsInstanceId)['DBInstances']\n",
" File \"/var/runtime/botocore/client.py\", line 391, in _api_call\n return self._make_api_call(operation_name, kwargs)\n",
" File \"/var/runtime/botocore/client.py\", line 719, in _make_api_call\n raise error_class(parsed_response, operation_name)\n"
]
}
And here is my Lambda code:
import json
import boto3
import logging
import os
#Logging
LOGGER = logging.getLogger()
LOGGER.setLevel(logging.INFO)
#Initialise Boto3 for RDS
rdsClient = boto3.client('rds')
def lambda_handler(event, context):
    #log input event (the %s placeholder is needed so that logging can format the event)
    LOGGER.info("RdsAutoRestart Event Received, now checking if event is eligible. Event Details ==> %s", event)
    #Input event from the SNS topic originated from RDS event notifications
    snsMessage = json.loads(event['Records'][0]['Sns']['Message'])
    rdsInstanceId = snsMessage['Source ID']
    stepFunctionInput = {"rdsInstanceId": rdsInstanceId}
    rdsEventId = snsMessage['Event ID']
    #Retrieve RDS instance ARN
    db_instances = rdsClient.describe_db_instances(DBInstanceIdentifier=rdsInstanceId)['DBInstances']
    db_instance = db_instances[0]
    rdsInstanceArn = db_instance['DBInstanceArn']
    # Filter on the Auto Restart RDS Event. Event code: RDS-EVENT-0154.
    if 'RDS-EVENT-0154' in rdsEventId:
        #log input event
        LOGGER.info("RdsAutoRestart Event detected, now verifying that instance was tagged with auto-restart-protection == yes")
        #Verify that instance is tagged with auto-restart-protection tag. The tag is used to classify instances that are required to be terminated once started.
        tagCheckPass = 'false'
        rdsInstanceTags = rdsClient.list_tags_for_resource(ResourceName=rdsInstanceArn)
        for rdsInstanceTag in rdsInstanceTags["TagList"]:
            if 'auto-restart-protection' in rdsInstanceTag["Key"]:
                if 'yes' in rdsInstanceTag["Value"]:
                    tagCheckPass = 'true'
                    #log instance tags
                    LOGGER.info("RdsAutoRestart verified that the instance is tagged auto-restart-protection = yes, now starting the Step Functions Flow")
                else:
                    tagCheckPass = 'false'
        #log instance tags
        LOGGER.info("RdsAutoRestart Event detected, now verifying that instance was tagged with auto-restart-protection == yes")
        if 'true' in tagCheckPass:
            #Initialise StepFunctions Client
            stepFunctionsClient = boto3.client('stepfunctions')
            # Start StepFunctions WorkFlow
            # StepFunctionsArn is stored in an environment variable
            stepFunctionsArn = os.environ['STEPFUNCTION_ARN']
            stepFunctionsResponse = stepFunctionsClient.start_execution(
                stateMachineArn=stepFunctionsArn,
                name=event['Records'][0]['Sns']['MessageId'],
                input=json.dumps(stepFunctionInput)
            )
    else:
        LOGGER.info("RdsAutoRestart Event detected, and event is not eligible")
    return {
        'statusCode': 200
    }
I'm trying to stop an Amazon RDS database that is started automatically after 7 days. I'm following this AWS article: Field Notes: Stopping an Automatically Started Database Instance with Amazon RDS | AWS Architecture Blog
Can anyone help me?

The error message says: Invalid database identifier: <RDS instance id>
It seems to be coming from this line:
db_instances = rdsClient.describe_db_instances(DBInstanceIdentifier=rdsInstanceId)['DBInstances']
This means that the rdsInstanceId variable contains the literal text <RDS instance id>, which is a placeholder from an example rather than a real instance identifier.
Looking at the code in Field Notes: Stopping an Automatically Started Database Instance with Amazon RDS | AWS Architecture Blog, it asks you to create a test event that includes this message:
"Message": "{\"Event Source\":\"db-instance\",\"Event Time\":\"2020-07-09 15:15:03.031\",\"Identifier Link\":\"https://console.aws.amazon.com/rds/home?region=<region>#dbinstance:id=<RDS instance id>\",\"Source ID\":\"<RDS instance id>\",\"Event ID\":\"http://docs.amazonwebservices.com/AmazonRDS/latest/UserGuide/USER_Events.html#RDS-EVENT-0154\",\"Event Message\":\"DB instance started\"}",
If you look closely at that line, it includes this part to identify the Amazon RDS instance:
dbinstance:id=<RDS instance id>
I think that you are expected to modify the provided test event and fill in your own values for anything in <angle brackets> (such as the instance ID of your Amazon RDS instance).
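As an illustration, here is a small sketch that builds the test event with the placeholders filled in. The function name build_test_event, the MessageId value, and the sample instance ID and region are made up for this example; only the message fields come from the test event quoted above, and the Records/Sns keys are the ones the handler reads.

```python
import json

def build_test_event(instance_id, region):
    """Build an SNS-style test event with real values instead of <placeholders>."""
    message = {
        "Event Source": "db-instance",
        "Event Time": "2020-07-09 15:15:03.031",
        "Identifier Link": (
            "https://console.aws.amazon.com/rds/home"
            f"?region={region}#dbinstance:id={instance_id}"
        ),
        "Source ID": instance_id,
        "Event ID": (
            "http://docs.amazonwebservices.com/AmazonRDS/latest/UserGuide/"
            "USER_Events.html#RDS-EVENT-0154"
        ),
        "Event Message": "DB instance started",
    }
    # The Lambda handler expects the message as a JSON string inside the SNS record.
    return {
        "Records": [
            {"Sns": {"MessageId": "test-message-id", "Message": json.dumps(message)}}
        ]
    }

# Hypothetical values: replace with your own instance ID and region.
event = build_test_event("my-database-1", "us-east-1")
print(json.dumps(event, indent=2))
```

The printed JSON can be pasted into the Lambda console as the test event.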

Read/write to AWS S3 from Apache Spark Kubernetes container via vpc endpoint giving 400 Bad Request

I am trying to read and write data to AWS S3 from an Apache Spark Kubernetes container via a VPC endpoint.
The Kubernetes container is on premises (data center) in the US region. The following is the PySpark code used to connect to S3:
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession
conf = (
    SparkConf()
    .setAppName("PySpark S3 Example")
    .set("spark.hadoop.fs.s3a.endpoint.region", "us-east-1")
    .set("spark.hadoop.fs.s3a.endpoint","<vpc-endpoint>")
    .set("spark.hadoop.fs.s3a.access.key", "<access_key>")
    .set("spark.hadoop.fs.s3a.secret.key", "<secret_key>")
    .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .set("spark.driver.extraJavaOptions", "-Dcom.amazonaws.services.s3.enforceV4=true")
    .set("spark.executor.extraJavaOptions","-Dcom.amazonaws.services.s3.enableV4=true")
    .set("spark.executor.extraJavaOptions", "-Dcom.amazonaws.services.s3.enforceV4=true")
    .set("spark.fs.s3a.path.style.access", "true")
    .set("spark.hadoop.fs.s3a.server-side-encryption-algorithm","SSE-KMS")
    .set("spark.hadoop.fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider")
)
spark = SparkSession.builder.config(conf=conf).getOrCreate()
data = [{"key1": "value1", "key2": "value2"}, {"key1":"val1","key2":"val2"}]
df = spark.createDataFrame(data)
df.write.format("json").mode("append").save("s3a://<bucket-name>/test/")
Exception Raised:
py4j.protocol.Py4JJavaError: An error occurred while calling o91.save.
: org.apache.hadoop.fs.s3a.AWSBadRequestException: doesBucketExist on <bucket-name>
: com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: <requestID>;
Any help would be appreciated

Unless your Hadoop S3A client is region-aware (3.3.1+), setting that region option won't work. There is an AWS SDK option, "aws.region", which you can set as a Java system property instead.
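One way this could look, as a rough sketch: the helper below (with_aws_region is a name I made up) takes Spark options as a plain dict and appends -Daws.region to the driver and executor JVM options without dropping flags that are already there. It does not import pyspark, so it only prepares the settings.

```python
def with_aws_region(spark_options, region):
    """Return a copy of spark_options with -Daws.region added to the JVM options."""
    flag = f"-Daws.region={region}"
    updated = dict(spark_options)
    for key in ("spark.driver.extraJavaOptions", "spark.executor.extraJavaOptions"):
        existing = updated.get(key, "").strip()
        # Append to existing JVM flags instead of overwriting them.
        updated[key] = f"{existing} {flag}".strip()
    return updated

options = with_aws_region(
    {"spark.executor.extraJavaOptions": "-Dcom.amazonaws.services.s3.enableV4=true"},
    "us-east-1",
)
for key, value in sorted(options.items()):
    print(key, "=", value)
```

The result can then be applied with SparkConf().setAll(options.items()). Note that calling .set() twice on the same key, as the question does for spark.executor.extraJavaOptions, keeps only the last value, which is why the sketch appends. In client mode the driver JVM may already be running by the time the conf is read, so the driver flag may need to go on the spark-submit command line instead.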

Why my 'AWS Lambda Invoke Function' task in Azure DevOps Build Pipeline doesn't fail if the Lambda returns 400?

I have this python code inside Lambda:
#This script will run as a Lambda function on AWS.
import time, json
cmdStatus = "Failed"
message = ""
statusCode = 200
def lambda_handler(event, context):
    time.sleep(2)
    if(cmdStatus=="Failed"):
        message = "Command execution failed"
        statusCode = 400
    elif(cmdStatus=="Success"):
        message = "The script execution is successful"
        statusCode = 200
    else:
        message = "The cmd status is: " + cmdStatus
        statusCode = 500
    return {
        'statusCode': statusCode,
        'body': json.dumps(message)
    }
I am invoking this Lambda from an Azure DevOps Build Pipeline using the AWS Lambda Invoke Function task.
As you can see in the code above, I have intentionally set cmdStatus to Failed to make the Lambda fail, but when it is executed from the Azure DevOps Build Pipeline, the task succeeds. Strange.
How can I make the pipeline fail in this case? Please help.
Thanks

I have been working on a similar issue myself and it looks like a bug in the task itself. It was reported in 2019 and nothing has happened since, so I wouldn't hold out much hope.
https://github.com/aws/aws-toolkit-azure-devops/issues/175
My workaround for this issue was to use the AWS CLI task instead, with
Command: lambda
Subcommand: invoke
Options and Parameters: --function-name {nameOfYourFunction} response.json
followed immediately by a Bash task with an inline script:
cat response.json
if grep -q "errorType" "response.json"; then
    echo "An error was found"
    exit 1
fi
echo "No error was found"
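One caveat, as far as I can tell: "errorType" only shows up in the payload when the function raises an exception. A handler that returns {'statusCode': 400, ...} normally, like the one in the question, would pass the grep check. Below is a rough sketch of a check that covers both cases. The function name lambda_response_failed is made up, and treating any statusCode of 400 or above as a failure is my assumption about what the pipeline should do.

```python
import json

def lambda_response_failed(payload_text):
    """Return True if a Lambda invoke payload looks like a failure."""
    payload = json.loads(payload_text)
    if not isinstance(payload, dict):
        return False
    # Unhandled exceptions are reported with an errorType field.
    if "errorType" in payload:
        return True
    # Handlers that return normally may still signal failure via statusCode.
    status = payload.get("statusCode")
    return isinstance(status, int) and status >= 400

# Sample payload shaped like the return value of the handler in the question.
sample = json.dumps({"statusCode": 400, "body": json.dumps("Command execution failed")})
print("failed:", lambda_response_failed(sample))
```

In the pipeline, a script step could read response.json, pass its contents to this function, and call sys.exit(1) when it returns True.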

AWS Batch - Access denied 403

I am using AWS Batch with ECS to run a job that needs to send a query to Athena. I use Python boto3 to send the query and then get the query status:
start_query_execution: works fine
get_query_execution: returns an error!
When I try to get the query execution, I get the following error:
{'QueryExecution': {'QueryExecutionId': 'XXXX', 'Query': "SELECT * FROM my_table LIMIT 10 ", 'StatementType': 'DML', 'ResultConfiguration': {'OutputLocation': 's3://my_bucket_name/athena-results/query_id.csv'}, 'QueryExecutionContext': {'Database': 'my_database'}, 'Status': {'State': 'FAILED', 'StateChangeReason': '**Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 4.**. ; S3 Extended Request ID: ....=)'
I have given the container role all permissions (only for testing):
s3:*
athena:*
glue:*
I face this problem only in the container in AWS Batch: with the same policy and code in a Lambda, it works!
Any help will be appreciated.

For the Athena output location, I use a bucket path (a folder prefix), not a file name,
because Athena generates the result set itself and names it after the query execution ID:
'ResultConfiguration': {'OutputLocation': 's3://my_bucket_name/athena-results/'}
If you are not sure which bucket to use for query results, you can check it in the Athena query console under Settings.
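A small sketch of what that could look like with boto3, assuming the output location is a prefix ending in "/". The helper name run_athena_query and the polling interval are my own choices. The client is passed in, so it can be boto3.client('athena') or a stub.

```python
import time

def run_athena_query(client, query, database, output_prefix, poll_seconds=1.0):
    """Start an Athena query and poll until it reaches a final state.

    `client` is expected to behave like boto3.client('athena').
    `output_prefix` is a folder-style S3 prefix such as
    's3://my_bucket_name/athena-results/', not a file name.
    """
    if not output_prefix.endswith("/"):
        raise ValueError("output_prefix should be a prefix ending in '/', not a file")

    started = client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_prefix},
    )
    query_id = started["QueryExecutionId"]

    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]
        if execution["Status"]["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return execution
        time.sleep(poll_seconds)

# Usage (requires boto3 and AWS credentials):
#   import boto3
#   result = run_athena_query(boto3.client("athena"), "SELECT * FROM my_table LIMIT 10",
#                             "my_database", "s3://my_bucket_name/athena-results/")
```

If the state comes back as FAILED, execution['Status']['StateChangeReason'] holds the reason, as in the error shown in the question.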

S3A hadoop aws jar always return AccessDeniedException

Could anyone please help me figure out why I get the exception below? All I'm trying to do is read some data from a local file in my Spark program and write it to S3. Do you think it's related to a version mismatch of some library?
I have the correct secret key and access key specified like this -
SparkConf conf = new SparkConf();
// add more spark related properties
AWSCredentials credentials = DefaultAWSCredentialsProviderChain.getInstance().getCredentials();
conf.set("spark.hadoop.fs.s3a.access.key", credentials.getAWSAccessKeyId());
conf.set("spark.hadoop.fs.s3a.secret.key", credentials.getAWSSecretKey());
The Java code is plain vanilla -
protected void process() throws JobException {
    JavaRDD<String> linesRDD = _sparkContext.textFile(_jArgs.getFileLocation());
    linesRDD.saveAsTextFile("s3a://my.bucket/" + Math.random() + "final.txt");
This is my code and Gradle configuration.
Gradle
ext.libs = [
    aws: [
        lambda: 'com.amazonaws:aws-lambda-java-core:1.2.0',
        // The AWS SDK will dynamically import the X-Ray SDK to emit subsegments for downstream calls made by your
        // function
        //recorderCore: 'com.amazonaws:aws-xray-recorder-sdk-core:1.1.2',
        //recorderCoreAwsSdk: 'com.amazonaws:aws-xray-recorder-sdk-aws-sdk:1.1.2',
        //recorderCoreAwsSdkInstrumentor: 'com.amazonaws:aws-xray-recorder-sdk-aws-sdk-instrumentor:1.1.2',
        // https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk
        javaSDK: 'com.amazonaws:aws-java-sdk:1.11.311',
        recorderSDK: 'com.amazonaws:aws-java-sdk-dynamodb:1.11.311',
        // https://mvnrepository.com/artifact/com.amazonaws/aws-lambda-java-events
        lambdaEvents: 'com.amazonaws:aws-lambda-java-events:2.0.2',
        snsSDK: 'com.amazonaws:aws-java-sdk-sns:1.11.311',
        // https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-emr
        emr :'com.amazonaws:aws-java-sdk-emr:1.11.311'
    ],
    //jodaTime: 'joda-time:joda-time:2.7',
    //guava : 'com.google.guava:guava:18.0',
    jCommander : 'com.beust:jcommander:1.71',
    //jackson: 'com.fasterxml.jackson.module:jackson-module-scala_2.11:2.8.8',
    jackson: 'com.fasterxml.jackson.core:jackson-databind:2.8.0',
    apacheCommons: [
        lang3: "org.apache.commons:commons-lang3:3.3.2",
    ],
    spark: [
        core: 'org.apache.spark:spark-core_2.11:2.3.0',
        hadoopAws: 'org.apache.hadoop:hadoop-aws:2.8.1',
        //hadoopClient:'org.apache.hadoop:hadoop-client:2.8.1',
        //hadoopCommon:'org.apache.hadoop:hadoop-common:2.8.1',
        jackson: 'com.fasterxml.jackson.module:jackson-module-scala_2.11:2.8.8'
    ],
Exception
2018-04-10 22:14:22.270 | ERROR | | | |c.f.d.p.s.SparkJobEntry-46
Exception found in job for file type : EMAIL
java.nio.file.AccessDeniedException: s3a://my.bucket/0.253592564392344final.txt: getFileStatus on
s3a://my.bucket/0.253592564392344final.txt:
com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service:
Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID:
62622F7F27793DBA; S3 Extended Request ID: BHCZT6BSUP39CdFOLz0uxkJGPH1tPsChYl40a32bYglLImC6PQo+LFtBClnWLWbtArV/z1SOt68=), S3 Extended Request ID: BHCZT6BSUP39CdFOLz0uxkJGPH1tPsChYl40a32bYglLImC6PQo+LFtBClnWLWbtArV/z1SOt68=
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:158) ~[hadoop-aws-2.8.1.jar:na]
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:101) ~[hadoop-aws-2.8.1.jar:na]
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1568) ~[hadoop-aws-2.8.1.jar:na]
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:117) ~[hadoop-aws-2.8.1.jar:na]
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1436) ~[hadoop-common-2.8.1.jar:na]
at org.apache.hadoop.fs.s3a.S3AFileSystem.exists(S3AFileSystem.java:2040) ~[hadoop-aws-2.8.1.jar:na]
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131) ~[hadoop-mapreduce-client-core-2.6.5.jar:na]
at org.apache.spark.internal.io.HadoopMapRedWriteConfigUtil.assertConf(SparkHadoopWriter.scala:283) ~[spark-core_2.11-2.3.0.jar:2.3.0]
at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:71) ~[spark-core_2.11-2.3.0.jar:2.3.0]
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1096) ~[spark-core_2.11-2.3.0.jar:2.3.0]
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1094) ~[spark-core_2.11-2.3.0.jar:2.3.0]
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1094) ~[spark-core_2.11-2.3.0.jar:2.3.0]
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) ~[spark-core_2.11-2.3.0.jar:2.3.0]
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) ~[spark-core_2.11-2.3.0.jar:2.3.0]
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363) ~[spark-core_2.11-2.3.0.jar:2.3.0]

When you work with the Hadoop Configuration classes directly, you need to strip the spark.hadoop prefix, so just use fs.s3a.access.key, etc.
All the options are defined in the class org.apache.hadoop.fs.s3a.Constants; if you reference them, you'll avoid typos too.
One thing to consider is that all the source for Spark and Hadoop is public: there's nothing to stop you taking that stack trace, setting some breakpoints, and running this in your IDE. It's what we normally do ourselves when things get bad.
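To illustrate the prefix rule, here is a tiny sketch in Python (rather than the question's Java) of how spark.hadoop.* keys map to the names the Hadoop Configuration uses. The helper name hadoop_options and the sample values are invented for the example.

```python
SPARK_HADOOP_PREFIX = "spark.hadoop."

def hadoop_options(spark_options):
    """Map spark.hadoop.* keys to the plain names used by Hadoop Configuration."""
    return {
        key[len(SPARK_HADOOP_PREFIX):]: value
        for key, value in spark_options.items()
        if key.startswith(SPARK_HADOOP_PREFIX)
    }

# Placeholder values only; never hard-code real credentials.
spark_options = {
    "spark.app.name": "example",
    "spark.hadoop.fs.s3a.access.key": "<access_key>",
    "spark.hadoop.fs.s3a.secret.key": "<secret_key>",
}
for key, value in sorted(hadoop_options(spark_options).items()):
    print(key, "=", value)
```

So a key set as spark.hadoop.fs.s3a.access.key on the SparkConf is the same option as fs.s3a.access.key when set on a Hadoop Configuration object.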
