I am using AWS Batch with ECS to run a job that needs to send a query to Athena. I use Python boto3 to send the query and then get the query status:
start_query_execution: works fine
get_query_execution: returns an error!
When I try to get the query execution, I get the following error:
{'QueryExecution': {'QueryExecutionId': 'XXXX', 'Query': "SELECT * FROM my_table LIMIT 10 ", 'StatementType': 'DML', 'ResultConfiguration': {'OutputLocation': 's3://my_bucket_name/athena-results/query_id.csv'}, 'QueryExecutionContext': {'Database': 'my_database'}, 'Status': {'State': 'FAILED', 'StateChangeReason': 'Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 4.. ; S3 Extended Request ID: ....=)'
I have given the container role all permissions (only for testing):
s3:*
athena:*
glue:*
I face this problem only in the container on AWS Batch: with the same policy and code, it works in a Lambda!
Any help will be appreciated.
In the Athena output location, what I have been using is the Athena bucket name (and prefix), not a file name.
A result set will be generated which has its own ID:
'ResultConfiguration': {'OutputLocation': 's3://my_bucket_name/athena-results/'}
If you are not sure of the query result bucket, you can check it in the query console --> Settings.
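A minimal sketch of how that looks with boto3, reusing the names from the question (the output location points at a bucket/prefix, and Athena names the result object itself):
import time
import boto3

athena = boto3.client('athena')

# OutputLocation is a prefix; Athena writes <QueryExecutionId>.csv under it.
start = athena.start_query_execution(
    QueryString='SELECT * FROM my_table LIMIT 10',
    QueryExecutionContext={'Database': 'my_database'},
    ResultConfiguration={'OutputLocation': 's3://my_bucket_name/athena-results/'},
)
query_id = start['QueryExecutionId']

# Poll until the query reaches a terminal state.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)['QueryExecution']['Status']
    if status['State'] in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
        break
    time.sleep(1)
print(status['State'], status.get('StateChangeReason', ''))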
Related
I am trying to read and write data to AWS S3 from an Apache Spark Kubernetes container via a VPC endpoint.
The Kubernetes container is on premises (data center) in the US region. The following is the PySpark code to connect to S3:
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession

conf = (
    SparkConf()
    .setAppName("PySpark S3 Example")
    .set("spark.hadoop.fs.s3a.endpoint.region", "us-east-1")
    .set("spark.hadoop.fs.s3a.endpoint", "<vpc-endpoint>")
    .set("spark.hadoop.fs.s3a.access.key", "<access_key>")
    .set("spark.hadoop.fs.s3a.secret.key", "<secret_key>")
    .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .set("spark.driver.extraJavaOptions", "-Dcom.amazonaws.services.s3.enforceV4=true")
    .set("spark.executor.extraJavaOptions", "-Dcom.amazonaws.services.s3.enableV4=true")
    .set("spark.executor.extraJavaOptions", "-Dcom.amazonaws.services.s3.enforceV4=true")
    .set("spark.fs.s3a.path.style.access", "true")
    .set("spark.hadoop.fs.s3a.server-side-encryption-algorithm", "SSE-KMS")
    .set("spark.hadoop.fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider")
)
spark = SparkSession.builder.config(conf=conf).getOrCreate()
data = [{"key1": "value1", "key2": "value2"}, {"key1":"val1","key2":"val2"}]
df = spark.createDataFrame(data)
df.write.format("json").mode("append").save("s3a://<bucket-name>/test/")
Exception Raised:
py4j.protocol.Py4JJavaError: An error occurred while calling o91.save.
: org.apache.hadoop.fs.s3a.AWSBadRequestException: doesBucketExist on <bucket-name>
: com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: <requestID>;
Any help would be appreciated.
Unless your Hadoop S3A client is region aware (3.3.1+), setting that region option won't work. There's an AWS SDK option, "aws.region", which you can set as a system property instead.
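A minimal sketch of that workaround applied to the configuration above, assuming the bucket is in us-east-1 (note that each extraJavaOptions key should be set only once, with all JVM flags in a single string):
from pyspark.conf import SparkConf

conf = (
    SparkConf()
    .setAppName("PySpark S3 Example")
    .set("spark.hadoop.fs.s3a.endpoint", "<vpc-endpoint>")
    # Pass the AWS SDK system property so requests are signed for the right region.
    .set("spark.driver.extraJavaOptions",
         "-Dcom.amazonaws.services.s3.enforceV4=true -Daws.region=us-east-1")
    .set("spark.executor.extraJavaOptions",
         "-Dcom.amazonaws.services.s3.enforceV4=true -Daws.region=us-east-1")
)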
I am using the Python boto3 script below to create an AWS DMS task and start the replication task, but I am getting the following error:
Error:
botocore.errorfactory.InvalidResourceStateFault: An error occurred (InvalidResourceStateFault) when calling the StartReplicationTask operation: Replication Task cannot be started, invalid state
Python script:
#!/usr/bin/python
import boto3

client_dms = boto3.client('dms')

# Create a DMS replication task
response = client_dms.create_replication_task(
    ReplicationTaskIdentifier='test-new1',
    ResourceIdentifier='test-new',
    ReplicationInstanceArn='arn:aws:dms:us-east-1:xxxxxxxxxx:rep:test1',
    SourceEndpointArn='arn:aws:dms:us-east-1:xxxxxxxxxx:endpoint:source',
    TargetEndpointArn='arn:aws:dms:us-east-1:xxxxxxxxxx:endpoint:target',
    MigrationType='full-load',
    TableMappings='{\n \"TableMappings\": [\n {\n \"Type\": \"Include\",\n \"SourceSchema\": \"test\",\n \"SourceTable\": \"table_name\"\n}\n ]\n}\n\n'
)

# Start the task from DMS
response = client_dms.start_replication_task(
    ReplicationTaskArn='arn:aws:dms:us-east-1:xxxxxxxxxx:task:test-new',
    StartReplicationTaskType='start-replication'
)
You probably have to use a waiter for the task to be ready:
ReplicationTaskReady
before you can perform other actions on it.
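A minimal sketch of that with boto3, reusing the task ARN from the question (the WaiterConfig values are only illustrative):
import boto3

client_dms = boto3.client('dms')
task_arn = 'arn:aws:dms:us-east-1:xxxxxxxxxx:task:test-new'

# Wait until the newly created task reaches the "ready" state...
waiter = client_dms.get_waiter('replication_task_ready')
waiter.wait(
    Filters=[{'Name': 'replication-task-arn', 'Values': [task_arn]}],
    WaiterConfig={'Delay': 15, 'MaxAttempts': 60},
)

# ...then start it.
client_dms.start_replication_task(
    ReplicationTaskArn=task_arn,
    StartReplicationTaskType='start-replication',
)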
I'm trying to run a CloudWatch Logs Insights query with boto3, but I'm getting ResourceNotFoundException.
import boto3

if __name__ == "__main__":
    client = boto3.client('logs')
    response = client.start_query(
        logGroupName='/aws/lambda/My-Stack-Name-SE349DJ',
        startTime=123,
        endTime=123,
        queryString="fields #message",
        limit=1
    )
I ran the code above, and the error message is as follows.
botocore.errorfactory.ResourceNotFoundException: An error occurred (ResourceNotFoundException) when calling the StartQuery operation: Log group '/aws/lambda/My-Stack-Name-SE349DJ' does not exist for account ID '11111111' (Service: AWSLogs; Status Code: 400; Error Code: ResourceNotFoundException; Request ID: xxxxx-xxxx-xxx; Proxy: null)
Here is what I tested:
The log group exists. I tested it with Logs Insights in the AWS console, and I also tested after pasting the log group name exactly as it is.
I added backslashes to test whether '/' was the problem (e.g. '/aws/lambda/My-Stack-Name-SE349DJ'), and an InvalidParameterException appeared.
The AWS account has administrator access privileges on the log group.
I got the same error message when I tested with the AWS CLI.
An error occurred (ResourceNotFoundException) when calling the StartQuery operation: Log group 'XXXXXXXXXXXXXX' does not exist for account ID '11111111' (Service: AWSLogs; Status Code: 400; Error Code: ResourceNotFoundException; Request ID: xxxxx-xxxx-xxx; Proxy: null)
How can I solve this problem?
Actually, the reason I'm trying this is that I need to get more than 500,000 records from the filtered log group, but 10,000 is the maximum per query. I think it's better to pull them out by varying the start time and end time.
There is a high possibility that there is too much data in certain time ranges, so I think it would be better to run it with boto3 rather than directly in the console. Is there an easy way to extract more than 500,000 records from the console or by other methods?
As @Marcin commented, it was because of the region configuration.
I added these lines before creating the AWS client.
from botocore.config import Config
...
my_config = Config(
region_name = 'us-east-2',
)
...
client = boto3.client('logs', config=my_config)
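For completeness, a minimal sketch of the fixed call with the query results polled afterwards (assuming the log group lives in us-east-2, and using real epoch-second timestamps instead of the placeholder 123 values):
import time
import boto3
from botocore.config import Config

my_config = Config(region_name='us-east-2')
client = boto3.client('logs', config=my_config)

end = int(time.time())
start = end - 3600  # query the last hour

query = client.start_query(
    logGroupName='/aws/lambda/My-Stack-Name-SE349DJ',
    startTime=start,
    endTime=end,
    queryString='fields @message',
    limit=100,
)

# start_query is asynchronous; poll until the query finishes.
while True:
    results = client.get_query_results(queryId=query['queryId'])
    if results['status'] in ('Complete', 'Failed', 'Cancelled'):
        break
    time.sleep(1)
print(len(results['results']))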
As per the doc, I am trying to create an S3 Batch Operations job from Java code.
I am able to create a job from the console with the same role and Lambda ARN, but from code I am getting a 400 Bad Request. Also, I don't see any error message, as per this doc.
Here is my code snippet:
try {
    JobOperation jobOperation = new JobOperation().withLambdaInvoke(new LambdaInvokeOperation()
            .withFunctionArn("arn:aws:lambda:eu-west-1:<account_id>:function:s3BatchOperarationsPOCLambda"));

    JobManifest manifest = new JobManifest()
            .withSpec(new JobManifestSpec().withFormat(JobManifestFormat.S3InventoryReport_CSV_20161130)
                    .withFields(new String[] { "Bucket", "Key" }))
            .withLocation(new JobManifestLocation().withObjectArn("arn:aws:s3:::<bucket_name>/manifest.csv")
                    .withETag("e55392fa1ad40a08e40b13b3c000a0aa"));

    JobReport jobReport = new JobReport().withBucket(reportBucketName).withPrefix("testreport")
            .withFormat(JobReportFormat.Report_CSV_20180820).withEnabled(true).withReportScope("AllTasks");

    AWSS3Control s3ControlClient = AWSS3ControlClientBuilder.standard().withRegion(Regions.US_WEST_1).build();
    String roleArn = "arn:aws:iam::<account_id>:role/S3-Batch-Role";
    String accountId = "<account_id>";

    s3ControlClient.createJob(new CreateJobRequest().withAccountId(accountId).withOperation(jobOperation)
            .withManifest(manifest).withPriority(12).withRoleArn(roleArn).withReport(jobReport)
            .withClientRequestToken(uuid).withDescription("S3 job").withConfirmationRequired(false));
} catch (AmazonServiceException e) {
    // The call was transmitted successfully, but Amazon S3 couldn't process
    // it and returned an error response.
    e.printStackTrace();
} catch (SdkClientException e) {
    System.out.println("test2" + e.getMessage());
    // Amazon S3 couldn't be contacted for a response, or the client
    // couldn't parse the response from Amazon S3.
    e.printStackTrace();
}
The role has full IAM and S3 Batch Operations permissions, and the Lambda also has access permissions for S3.
The trust policy is also defined for batch operations.
Here is my error log:
(Service: AWSS3Control; Status Code: 400; Error Code: 400 Bad Request; Request ID: null; Proxy: null)
com.amazonaws.services.s3control.model.AWSS3ControlException: null (Service: AWSS3Control; Status Code: 400; Error Code: 400 Bad Request; Request ID: null; Proxy: null)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1811)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1395)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1371)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
at com.amazonaws.services.s3control.AWSS3ControlClient.doInvoke(AWSS3ControlClient.java:1532)
at com.amazonaws.services.s3control.AWSS3ControlClient.invoke(AWSS3ControlClient.java:1499)
at com.amazonaws.services.s3control.AWSS3ControlClient.invoke(AWSS3ControlClient.java:1488)
at com.amazonaws.services.s3control.AWSS3ControlClient.executeCreateJob(AWSS3ControlClient.java:265)
at com.amazonaws.services.s3control.AWSS3ControlClient.createJob(AWSS3ControlClient.java:236)
at com.code.platformintegrationsscheduler.handlers.test.createS3Job(test.java:68)
at com.code.platformintegrationsscheduler.handlers.test.main(test.java:27)
I was stuck with the same issue today, and after some debugging and trying out the same operation on the CLI, I found that
new JobReport().withBucket(reportBucketName)
takes a bucket ARN instead of a bucket name.
The actual issue might be different in your case. I suggest you serialize your request from code, try out the same operation in the CLI, and compare both requests.
AWS error messages are often not very helpful when we actually need them.
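For reference, a minimal sketch of that report configuration expressed with boto3 (the question uses the Java SDK, but the field is the same: the report 'Bucket' takes the bucket ARN, not the bare name; the bucket name here is hypothetical):
import boto3

s3control = boto3.client('s3control')

report = {
    'Bucket': 'arn:aws:s3:::my-report-bucket',  # bucket ARN, not just 'my-report-bucket'
    'Prefix': 'testreport',
    'Format': 'Report_CSV_20180820',
    'Enabled': True,
    'ReportScope': 'AllTasks',
}
# This dict is what would be passed as Report=report to s3control.create_job(...),
# alongside AccountId, Operation, Manifest, Priority and RoleArn.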
I found the issue; it was related to the Gradle dependency versions. We need to make sure all the AWS SDK Gradle dependency versions are the same.
In my case:
compile group: 'com.amazonaws', name: 'aws-java-sdk-dynamodb', version: '1.11.844'
compile group: 'com.amazonaws', name: 'aws-java-sdk-iam', version: '1.11.844'
compile group: 'com.amazonaws', name: 'aws-java-sdk-events', version: '1.11.844'
compile group: 'com.amazonaws', name: 'aws-java-sdk-s3', version: '1.11.844'
compile group: 'com.amazonaws', name: 'aws-java-sdk-batch', version: '1.11.844'
compile group: 'com.amazonaws', name: 'aws-java-sdk-s3control', version:'1.11.844'
I am shifting my self-hosted Parse push to Amazon SNS and have followed all the instructions in this link properly:
https://aws.amazon.com/blogs/mobile/migrating-from-parse-push-to-amazon-sns/
Now on step 4 (Import Parse Push data), when I run the Java code
java -jar SNSImportTool.jar -s -f <PATH_TO_EXPORTED_PARSE_INSTALLATION_.JSON_FILE>
--apnsName <APNS_PLATFORM_APP_NAME_STEP_3.2.9>
--gcmName <GCM_PLATFORM_APP_NAME_STEP_3.2.9>
--topicName <SNS_TOPIC_NAME_STEP_3.2.10>
--awsaccess <AWS_KEY_ID_STEP_3.3>
--awssecret <SECRET_ACCESS_KEY_STEP_3.3>
I get the following output:
===========================================
Import from Parse to Amazon SNS
===========================================
Verify platform application: app/GCM/appname_MOBILEHUB_111111
Verify platform application: app/APNS/appname_MOBILEHUB_111111
Created 1 channel topics:SNS
Processing token APA91bEApdQ5PFqX77X53LZZevzw9ghwxuhvBwPu5E4Z1og1TTYTLFHv3oF5-AMszqZDfSBWCH4dn_zYK4Emx21j-30k6u0SGQ4KHbAWuuYbPCzSRm-9hvU
Could not create endpoint: Invalid parameter: PlatformApplicationArn Reason: Wrong number of slashes in relative portion of the ARN. (Service: AmazonSNS; Status Code: 400; Error Code: InvalidParameter; Request ID: 12b597de-1a46-5168-b1af-c661430db91f)
Processing token d806a7e6701df9250d8162e47b15ff78b857b285de38113d1b7f504ea92d46b5
Could not create endpoint: PlatformApplication does not exist (Service: AmazonSNS; Status Code: 404; Error Code: NotFound; Request ID: 419d4827-9238-5d46-a584-1998cb74e16e)
To clarify, for items in the JSON that have deviceType set to android, I get the following error:
Invalid parameter: PlatformApplicationArn Reason: Wrong number of slashes in relative portion of the ARN
and for deviceType set to ios, I get the following error:
Could not create endpoint: PlatformApplication does not exist
My JSON file looks like this:
{"results":[{ "channels" : ["SNS"], "_id" : "000f0KsszOH1", "_updated_at" : "2017-06-11T00:03:10.712+0000" , "UniqueId" : "79bb118ec2b71c3f", "appIdentifier" : "com.leApp", "parseVersion" : "1.8.0", "timeZone" : "Asia/Riyadh", "installationId" : "f05f3d8a-f045-35fe-a268-855eda3b2c78", "appVersion" : "2.2.8", "deviceType" : "ios", "_created_at" : "2015-12-15T12:49:23.954+0000", "pushType" : "gcm","deviceToken" : "APA99bEApdQ5PFqX77X53LYYevzw4ghwxuhvBwPu4C3Z1og1TTYTLFHv3oF5-AMszqZDfSBWCH4dn_zYK4Emx21j-30k6u0SGQ4KHbAWuuYbPCzSRm-9hvU", "localeIdentifier" : "en-GB" }]}
Please advise how to fix these errors.