Could anyone please help me figure out why I get the exception below? All I'm trying to do is read some data from a local file in my Spark program and write it to S3. Do you think it's related to a version mismatch of some library?
I have the correct secret key and access key specified like this -
SparkConf conf = new SparkConf();
// add more spark related properties
AWSCredentials credentials = DefaultAWSCredentialsProviderChain.getInstance().getCredentials();
conf.set("spark.hadoop.fs.s3a.access.key", credentials.getAWSAccessKeyId());
conf.set("spark.hadoop.fs.s3a.secret.key", credentials.getAWSSecretKey());
The java code is plain vanilla -
protected void process() throws JobException {
JavaRDD<String> linesRDD = _sparkContext.textFile(_jArgs.getFileLocation());
linesRDD.saveAsTextFile("s3a://my.bucket/" + Math.random() + "final.txt");
}
This is my code and gradle.
Gradle
ext.libs = [
aws: [
lambda: 'com.amazonaws:aws-lambda-java-core:1.2.0',
// The AWS SDK will dynamically import the X-Ray SDK to emit subsegments for downstream calls made by your
// function
//recorderCore: 'com.amazonaws:aws-xray-recorder-sdk-core:1.1.2',
//recorderCoreAwsSdk: 'com.amazonaws:aws-xray-recorder-sdk-aws-sdk:1.1.2',
//recorderCoreAwsSdkInstrumentor: 'com.amazonaws:aws-xray-recorder-sdk-aws-sdk-instrumentor:1.1.2',
// https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk
javaSDK: 'com.amazonaws:aws-java-sdk:1.11.311',
recorderSDK: 'com.amazonaws:aws-java-sdk-dynamodb:1.11.311',
// https://mvnrepository.com/artifact/com.amazonaws/aws-lambda-java-events
lambdaEvents: 'com.amazonaws:aws-lambda-java-events:2.0.2',
snsSDK: 'com.amazonaws:aws-java-sdk-sns:1.11.311',
// https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-emr
emr :'com.amazonaws:aws-java-sdk-emr:1.11.311'
],
//jodaTime: 'joda-time:joda-time:2.7',
//guava : 'com.google.guava:guava:18.0',
jCommander : 'com.beust:jcommander:1.71',
//jackson: 'com.fasterxml.jackson.module:jackson-module-scala_2.11:2.8.8',
jackson: 'com.fasterxml.jackson.core:jackson-databind:2.8.0',
apacheCommons: [
lang3: "org.apache.commons:commons-lang3:3.3.2",
],
spark: [
core: 'org.apache.spark:spark-core_2.11:2.3.0',
hadoopAws: 'org.apache.hadoop:hadoop-aws:2.8.1',
//hadoopClient:'org.apache.hadoop:hadoop-client:2.8.1',
//hadoopCommon:'org.apache.hadoop:hadoop-common:2.8.1',
jackson: 'com.fasterxml.jackson.module:jackson-module-scala_2.11:2.8.8'
],
Exception
2018-04-10 22:14:22.270 | ERROR | | | |c.f.d.p.s.SparkJobEntry-46
Exception found in job for file type : EMAIL
java.nio.file.AccessDeniedException: s3a://my.bucket/0.253592564392344final.txt: getFileStatus on
s3a://my.bucket/0.253592564392344final.txt:
com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service:
Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID:
62622F7F27793DBA; S3 Extended Request ID: BHCZT6BSUP39CdFOLz0uxkJGPH1tPsChYl40a32bYglLImC6PQo+LFtBClnWLWbtArV/z1SOt68=), S3 Extended Request ID: BHCZT6BSUP39CdFOLz0uxkJGPH1tPsChYl40a32bYglLImC6PQo+LFtBClnWLWbtArV/z1SOt68=
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:158) ~[hadoop-aws-2.8.1.jar:na]
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:101) ~[hadoop-aws-2.8.1.jar:na]
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1568) ~[hadoop-aws-2.8.1.jar:na]
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:117) ~[hadoop-aws-2.8.1.jar:na]
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1436) ~[hadoop-common-2.8.1.jar:na]
at org.apache.hadoop.fs.s3a.S3AFileSystem.exists(S3AFileSystem.java:2040) ~[hadoop-aws-2.8.1.jar:na]
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131) ~[hadoop-mapreduce-client-core-2.6.5.jar:na]
at org.apache.spark.internal.io.HadoopMapRedWriteConfigUtil.assertConf(SparkHadoopWriter.scala:283) ~[spark-core_2.11-2.3.0.jar:2.3.0]
at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:71) ~[spark-core_2.11-2.3.0.jar:2.3.0]
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1096) ~[spark-core_2.11-2.3.0.jar:2.3.0]
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1094) ~[spark-core_2.11-2.3.0.jar:2.3.0]
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1094) ~[spark-core_2.11-2.3.0.jar:2.3.0]
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) ~[spark-core_2.11-2.3.0.jar:2.3.0]
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) ~[spark-core_2.11-2.3.0.jar:2.3.0]
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363) ~[spark-core_2.11-2.3.0.jar:2.3.0]
Once you are playing with Hadoop Configuration classes, you need to strip out the spark.hadoop prefix, so just use fs.s3a.access.key, etc.
All the options are defined in the class org.apache.hadoop.fs.s3a.Constants: if you reference them you'll avoid typos too.
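To illustrate the prefix rule, here is a minimal sketch in plain Python (no Spark needed). The function name hadoop_options and the sample values are made up for the example; it only shows which key names end up on the Hadoop side once the spark.hadoop. prefix is removed.

```python
SPARK_HADOOP_PREFIX = "spark.hadoop."


def hadoop_options(spark_options):
    """Return the Hadoop-side names for a dict of Spark options.

    Keys that start with "spark.hadoop." are kept with the prefix removed.
    All other keys are ignored in this sketch.
    """
    return {
        key[len(SPARK_HADOOP_PREFIX):]: value
        for key, value in spark_options.items()
        if key.startswith(SPARK_HADOOP_PREFIX)
    }


# Placeholder values, as they would be passed to SparkConf.set(...)
spark_options = {
    "spark.app.name": "s3a-demo",
    "spark.hadoop.fs.s3a.access.key": "<access_key>",
    "spark.hadoop.fs.s3a.secret.key": "<secret_key>",
}

# These are the names to use when setting values on a Hadoop Configuration directly.
print(hadoop_options(spark_options))
```

So the same credential is spark.hadoop.fs.s3a.access.key on a SparkConf and fs.s3a.access.key on a Hadoop Configuration.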
One thing to consider is all the source for spark and hadoop is public: there's nothing to stop you taking that stack trace, setting some breakpoints and trying to run this in your IDE. It's what we normally do ourselves when things get bad.
I am trying to read and write data to AWS S3 from an Apache Spark Kubernetes container via a VPC endpoint.
The Kubernetes container is on premises (data center) in the US region. Following is the PySpark code to connect to S3:
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession
conf = (
SparkConf()
.setAppName("PySpark S3 Example")
.set("spark.hadoop.fs.s3a.endpoint.region", "us-east-1")
.set("spark.hadoop.fs.s3a.endpoint","<vpc-endpoint>")
.set("spark.hadoop.fs.s3a.access.key", "<access_key>")
.set("spark.hadoop.fs.s3a.secret.key", "<secret_key>")
.set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
.set("spark.driver.extraJavaOptions", "-Dcom.amazonaws.services.s3.enforceV4=true")
.set("spark.executor.extraJavaOptions","-Dcom.amazonaws.services.s3.enableV4=true")
.set("spark.executor.extraJavaOptions", "-Dcom.amazonaws.services.s3.enforceV4=true")
.set("spark.fs.s3a.path.style.access", "true")
.set("spark.hadoop.fs.s3a.server-side-encryption-algorithm","SSE-KMS")
.set("spark.hadoop.fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider")
)
spark = SparkSession.builder.config(conf=conf).getOrCreate()
data = [{"key1": "value1", "key2": "value2"}, {"key1":"val1","key2":"val2"}]
df = spark.createDataFrame(data)
df.write.format("json").mode("append").save("s3a://<bucket-name>/test/")
Exception Raised:
py4j.protocol.Py4JJavaError: An error occurred while calling o91.save.
: org.apache.hadoop.fs.s3a.AWSBadRequestException: doesBucketExist on <bucket-name>
: com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: <requestID>;
Any help would be appreciated
Unless your Hadoop S3A client is region aware (3.3.1+), setting that region option won't work. There's an AWS SDK option, "aws.region", which you can set as a system property instead.
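One way to pass that system property is through the JVM options of the driver and executors. The sketch below is plain Python and only builds the option strings; the helper name with_aws_region is made up for the example, and whether the driver picks up the option depends on how the job is launched, so treat it as a starting point rather than a verified fix.

```python
def with_aws_region(options, region):
    """Return a copy of `options` with -Daws.region appended to the
    driver and executor JVM options, keeping any existing flags."""
    flag = "-Daws.region=" + region
    updated = dict(options)
    for key in ("spark.driver.extraJavaOptions", "spark.executor.extraJavaOptions"):
        existing = updated.get(key, "")
        updated[key] = (existing + " " + flag).strip()
    return updated


options = with_aws_region(
    {"spark.driver.extraJavaOptions": "-Dcom.amazonaws.services.s3.enforceV4=true"},
    "us-east-1",
)

# Each pair can then be passed to SparkConf().set(key, value).
for key, value in sorted(options.items()):
    print(key, "=", value)
```

Appending to a single string per key matters because calling .set() twice with the same key keeps only the last value.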
I am using the Python boto3 script below to create an AWS DMS task and start the replication task, but I get the following error:
Error:
botocore.errorfactory.InvalidResourceStateFault: An error occurred (InvalidResourceStateFault) when calling the StartReplicationTask operation: Replication Task cannot be started, invalid state
Python script:
#!/usr/bin/python
import boto3
client_dms = boto3.client('dms')
#Create a replication DMS task
response = client_dms.create_replication_task(
ReplicationTaskIdentifier='test-new1',
ResourceIdentifier='test-new',
ReplicationInstanceArn='arn:aws:dms:us-east-1:xxxxxxxxxx:rep:test1',
SourceEndpointArn='arn:aws:dms:us-east-1:xxxxxxxxxx:endpoint:source',
TargetEndpointArn='arn:aws:dms:us-east-1:xxxxxxxxxx:endpoint:target',
MigrationType='full-load',
TableMappings='{\n \"TableMappings\": [\n {\n \"Type\": \"Include\",\n \"SourceSchema\": \"test\",\n \"SourceTable\": \"table_name\"\n}\n ]\n}\n\n'
)
#Start the task from DMS
response = client_dms.start_replication_task(
ReplicationTaskArn='arn:aws:dms:us-east-1:xxxxxxxxxx:task:test-new',
StartReplicationTaskType='start-replication'
)
You probably have to use a waiter to wait for the task to be ready:
ReplicationTaskReady (in boto3: client_dms.get_waiter('replication_task_ready'))
before you can perform other actions on it.
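A rough sketch of that sequence follows. The function name create_and_start_task is made up for the example, and the client is passed in so the function can be exercised without AWS access; in real use it would be boto3.client('dms'). The waiter name and the replication-task-arn filter are the boto3 DMS ones as far as I know, so check them against the boto3 version you have installed.

```python
def create_and_start_task(client, **task_settings):
    """Create a DMS replication task, wait until it is ready, then start it.

    `client` is expected to behave like boto3.client("dms").
    `task_settings` are forwarded unchanged to create_replication_task.
    """
    created = client.create_replication_task(**task_settings)
    # Read the ARN from the response instead of hardcoding it.
    task_arn = created["ReplicationTask"]["ReplicationTaskArn"]

    # Block until DMS reports the task as ready.
    waiter = client.get_waiter("replication_task_ready")
    waiter.wait(Filters=[{"Name": "replication-task-arn", "Values": [task_arn]}])

    return client.start_replication_task(
        ReplicationTaskArn=task_arn,
        StartReplicationTaskType="start-replication",
    )


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   create_and_start_task(boto3.client("dms"), ReplicationTaskIdentifier="test-new1", ...)
```

Starting the task only after the waiter returns avoids calling StartReplicationTask while the task is still being created.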
Following the documentation, I am trying to create an S3 Batch Operations job from Java code.
I can create a job from the console with the same role and Lambda ARN, but from code I get a 400 Bad Request. I also don't see any of the error messages described in the documentation.
Here is my code snippet -
JobOperation jobOperation = new JobOperation().withLambdaInvoke(new LambdaInvokeOperation()
.withFunctionArn("arn:aws:lambda:eu-west-1:<account_id>:function:s3BatchOperarationsPOCLambda"));
JobManifest manifest = new JobManifest()
.withSpec(new JobManifestSpec().withFormat(JobManifestFormat.S3InventoryReport_CSV_20161130)
.withFields(new String[] { "Bucket", "Key" }))
.withLocation(
new JobManifestLocation().withObjectArn("arn:aws:s3:::<bucket_name>/manifest.csv")
.withETag("e55392fa1ad40a08e40b13b3c000a0aa"));
JobReport jobReport = new JobReport().withBucket(reportBucketName).withPrefix("testreport")
.withFormat(JobReportFormat.Report_CSV_20180820).withEnabled(true).withReportScope("AllTasks");
AWSS3Control s3ControlClient = AWSS3ControlClientBuilder.standard().withRegion(Regions.US_WEST_1).build();
String roleArn = "arn:aws:iam::<account_id>:role/S3-Batch-Role";
String accountId = <account_id>;
try {
s3ControlClient.createJob(new CreateJobRequest().withAccountId(accountId).withOperation(jobOperation)
.withManifest(manifest).withPriority(12).withRoleArn(roleArn).withReport(jobReport)
.withClientRequestToken(uuid).withDescription("S3 job").withConfirmationRequired(false));
} catch (AmazonServiceException e) {
// The call was transmitted successfully, but Amazon S3 couldn't process
// it and returned an error response.
e.printStackTrace();
} catch (SdkClientException e) {
System.out.println("test2" + e.getMessage());
// Amazon S3 couldn't be contacted for a response, or the client
// couldn't parse the response from Amazon S3.
e.printStackTrace();
}
The role has full IAM and S3 Batch Operations permissions, and the Lambda has permission to access S3.
A trust policy is also defined for Batch Operations.
Here is my error log -
(Service: AWSS3Control; Status Code: 400; Error Code: 400 Bad Request; Request ID: null; Proxy: null)
com.amazonaws.services.s3control.model.AWSS3ControlException: null (Service: AWSS3Control; Status Code: 400; Error Code: 400 Bad Request; Request ID: null; Proxy: null)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1811)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1395)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1371)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
at com.amazonaws.services.s3control.AWSS3ControlClient.doInvoke(AWSS3ControlClient.java:1532)
at com.amazonaws.services.s3control.AWSS3ControlClient.invoke(AWSS3ControlClient.java:1499)
at com.amazonaws.services.s3control.AWSS3ControlClient.invoke(AWSS3ControlClient.java:1488)
at com.amazonaws.services.s3control.AWSS3ControlClient.executeCreateJob(AWSS3ControlClient.java:265)
at com.amazonaws.services.s3control.AWSS3ControlClient.createJob(AWSS3ControlClient.java:236)
at com.code.platformintegrationsscheduler.handlers.test.createS3Job(test.java:68)
at com.code.platformintegrationsscheduler.handlers.test.main(test.java:27)
I was stuck with the same issue today and after some debugging and trying out the same operation on CLI, I found that
new JobReport().withBucket(reportBucketName)
takes a bucketArn instead of a bucket name.
The actual issue might be different in your case. I suggest you serialize the request your code builds, try the same operation in the CLI, and compare the two requests.
AWS Error messages are often not very helpful when we actually need them.
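For the bucket name versus ARN point, a small helper can make the conversion explicit. This is a sketch in Python; the function name bucket_arn and the bucket name my-report-bucket are made up, the standard aws partition is assumed, and the dictionary mirrors the report settings from the question rather than a tested request.

```python
def bucket_arn(bucket_name):
    """Return the ARN form of an S3 bucket name.

    Values that already look like an ARN are returned unchanged.
    Assumes the standard "aws" partition.
    """
    if bucket_name.startswith("arn:"):
        return bucket_name
    return "arn:aws:s3:::" + bucket_name


# Report settings with the bucket given as an ARN, not a plain name.
report = {
    "Bucket": bucket_arn("my-report-bucket"),  # hypothetical bucket name
    "Prefix": "testreport",
    "Format": "Report_CSV_20180820",
    "Enabled": True,
    "ReportScope": "AllTasks",
}
print(report["Bucket"])
```

In the Java snippet from the question, the equivalent change would be passing the ARN string to withBucket(...).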
I found the issue: it was related to the Gradle dependency versions. We need to make sure all the AWS SDK artifacts in Gradle are on the same version.
In my case -
compile group: 'com.amazonaws', name: 'aws-java-sdk-dynamodb', version: '1.11.844'
compile group: 'com.amazonaws', name: 'aws-java-sdk-iam', version: '1.11.844'
compile group: 'com.amazonaws', name: 'aws-java-sdk-events', version: '1.11.844'
compile group: 'com.amazonaws', name: 'aws-java-sdk-s3', version: '1.11.844'
compile group: 'com.amazonaws', name: 'aws-java-sdk-batch', version: '1.11.844'
compile group: 'com.amazonaws', name: 'aws-java-sdk-s3control', version:'1.11.844'
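A quick way to spot a mismatch is to scan the dependency lines and group the AWS SDK artifacts by version. The sketch below is plain Python and assumes the group/name/version notation shown above; the function name sdk_versions and the mismatched version 1.11.700 in the sample are made up for illustration.

```python
import re
from collections import defaultdict

# Matches dependency lines written as: group: '...', name: '...', version: '...'
DEPENDENCY = re.compile(
    r"group:\s*'(?P<group>[^']+)',\s*"
    r"name:\s*'(?P<name>[^']+)',\s*"
    r"version:\s*'(?P<version>[^']+)'"
)


def sdk_versions(gradle_text):
    """Map each version to the aws-java-sdk artifacts declared with it."""
    versions = defaultdict(list)
    for match in DEPENDENCY.finditer(gradle_text):
        if match["group"] == "com.amazonaws" and match["name"].startswith("aws-java-sdk"):
            versions[match["version"]].append(match["name"])
    return dict(versions)


sample = """
compile group: 'com.amazonaws', name: 'aws-java-sdk-s3', version: '1.11.844'
compile group: 'com.amazonaws', name: 'aws-java-sdk-s3control', version:'1.11.700'
"""

found = sdk_versions(sample)
if len(found) > 1:
    print("Mismatched AWS SDK versions:", found)
```

More than one key in the result means at least two SDK modules are on different versions.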
I want to transfer data from GCS to BigQuery using Embulk and Digdag,
but an error occurs.
com.google.api.client.googleapis.json.GoogleJsonResponseException: 401 Unauthorized
.......
Error: org.embulk.config.ConfigException: com.google.cloud.storage.StorageException: Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket.
↓ Details
command :
embulk run XXXX.yaml
XXXX.yaml :
in:
type: gcs
bucket: <bucket name>
path_prefix: <file path>
auth_method: compute_engine
parser:
type: poi_excel
sheets: <sheet name>
skip_header_lines: 4
columns:
- {name: 'name', type: string}
.
.
.
out:
type: bigquery
mode: replace
project: <project name>
dataset: <dataset name>
table: <table name>
auth_method: compute_engine
schema_file: <file name of json type>
gcs_bucket: <gcs tmp bucket name>
output :
$ embulk run target_item_bottoms_config.yaml
2020-07-22 14:27:36.559 +0900: Embulk v0.9.23
2020-07-22 14:27:37.609 +0900 [WARN] (main): DEPRECATION: JRuby org.jruby.embed.ScriptingContainer is directly injected.
2020-07-22 14:27:40.577 +0900 [INFO] (main): Gem's home and path are set by default: "/Users/oniki/.embulk/lib/gems"
2020-07-22 14:27:41.662 +0900 [INFO] (main): Started Embulk v0.9.23
2020-07-22 14:27:41.853 +0900 [INFO] (0001:transaction): Loaded plugin embulk-input-gcs (0.3.2)
2020-07-22 14:27:46.263 +0900 [INFO] (0001:transaction): Loaded plugin embulk-output-bigquery (0.6.4)
2020-07-22 14:27:46.369 +0900 [INFO] (0001:transaction): Loaded plugin embulk-parser-poi_excel (0.1.7)
org.embulk.exec.PartialExecutionException: org.embulk.config.ConfigException: com.google.cloud.storage.StorageException: Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket.
at org.embulk.exec.BulkLoader$LoaderState.buildPartialExecuteException(BulkLoader.java:340)
at org.embulk.exec.BulkLoader.doRun(BulkLoader.java:566)
at org.embulk.exec.BulkLoader.access$000(BulkLoader.java:35)
at org.embulk.exec.BulkLoader$1.run(BulkLoader.java:353)
at org.embulk.exec.BulkLoader$1.run(BulkLoader.java:350)
at org.embulk.spi.Exec.doWith(Exec.java:22)
at org.embulk.exec.BulkLoader.run(BulkLoader.java:350)
at org.embulk.EmbulkEmbed.run(EmbulkEmbed.java:242)
at org.embulk.EmbulkRunner.runInternal(EmbulkRunner.java:291)
at org.embulk.EmbulkRunner.run(EmbulkRunner.java:155)
at org.embulk.cli.EmbulkRun.runSubcommand(EmbulkRun.java:431)
at org.embulk.cli.EmbulkRun.run(EmbulkRun.java:90)
at org.embulk.cli.Main.main(Main.java:64)
Suppressed: java.lang.NullPointerException
at org.embulk.exec.BulkLoader.doCleanup(BulkLoader.java:463)
at org.embulk.exec.BulkLoader$3.run(BulkLoader.java:397)
at org.embulk.exec.BulkLoader$3.run(BulkLoader.java:394)
at org.embulk.spi.Exec.doWith(Exec.java:22)
at org.embulk.exec.BulkLoader.cleanup(BulkLoader.java:394)
at org.embulk.EmbulkEmbed.run(EmbulkEmbed.java:245)
... 5 more
Caused by: org.embulk.config.ConfigException: com.google.cloud.storage.StorageException: Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket.
at org.embulk.input.gcs.AuthUtils.newClient(AuthUtils.java:81)
at org.embulk.input.gcs.GcsFileInput.listFiles(GcsFileInput.java:49)
at org.embulk.input.gcs.GcsFileInputPlugin.transaction(GcsFileInputPlugin.java:59)
at org.embulk.spi.FileInputRunner.transaction(FileInputRunner.java:62)
at org.embulk.exec.BulkLoader.doRun(BulkLoader.java:507)
... 11 more
Caused by: com.google.cloud.storage.StorageException: Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket.
at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:226)
at com.google.cloud.storage.spi.v1.HttpStorageRpc.list(HttpStorageRpc.java:366)
at com.google.cloud.storage.StorageImpl$8.call(StorageImpl.java:338)
at com.google.cloud.storage.StorageImpl$8.call(StorageImpl.java:335)
at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
at com.google.cloud.storage.StorageImpl.listBlobs(StorageImpl.java:334)
at com.google.cloud.storage.StorageImpl.list(StorageImpl.java:290)
at org.embulk.input.gcs.AuthUtils.newClient(AuthUtils.java:77)
... 15 more
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 401 Unauthorized
{
"code" : 401,
"errors" : [ {
"domain" : "global",
"location" : "Authorization",
"locationType" : "header",
"message" : "Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket.",
"reason" : "required"
} ],
"message" : "Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket."
}
at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:150)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:401)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1097)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:499)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
at com.google.cloud.storage.spi.v1.HttpStorageRpc.list(HttpStorageRpc.java:356)
... 23 more
Error: org.embulk.config.ConfigException: com.google.cloud.storage.StorageException: Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket.
my environment :
$ gcloud config list
[compute]
region = us-east1
zone = us-east1-c
[core]
account = myname#xxx.com
disable_usage_reporting = False
project = <project ID>
Your active configuration is: [default]
$ gcloud auth list
Credentialed Accounts
ACTIVE ACCOUNT
* myname#xxxx.com
To set the active account, run:
$ gcloud config set account `ACCOUNT`
$ gsutil ls
gs://<bucket name>
my gcp IAM role :
owner
I understand that this error is an authorization problem,
but my settings seem to be fine.
What's wrong?
According to the documentation [1], a 401 Unauthorized error can have several causes. The reasons listed there are reproduced below and may help with troubleshooting:
Reason: AuthenticationRequiredRequesterPays
Access to a Requester Pays bucket requires authentication.
Reason: authError
This error indicates a problem with the authorization provided in the request to Cloud Storage. The following are some situations where that will occur:
The OAuth access token has expired and needs to be refreshed. This can be avoided by refreshing the access token early, but code can also catch this error, refresh the token and retry automatically.
Multiple non-matching authorizations were provided; choose one mode only.
The OAuth access token's bound project does not match the project associated with the provided developer key.
The Authorization header was of an unrecognized format or uses an unsupported credential type.
Reason: lockedDomainExpired
When downloading content from a cookie-authenticated site, e.g., using the Storage Browser, the response will redirect to a temporary domain. This error will occur if access to said domain occurs after the domain expires. Issue the original request again, and receive a new redirect.
Reason: push.webhookUrlUnauthorized
Requests to storage.objects.watchAll will fail unless you verify you own the domain.
Reason: required
Access to a non-public method that requires authorization was made, but none was provided in the Authorization header or through other means.
[1] https://cloud.google.com/storage/docs/json_api/v1/status-codes#401_Unauthorized
I was running locally, so I created a service account key, saved it on my machine, and changed the auth method.
◾️XXXX.yaml
before
auth_method: compute_engine
after
auth_method: json_key
json_keyfile: /path/to/json_keyfile.json
I am using AWS Batch with ECS to run a job that needs to send a request to Athena. I use Python boto3 to send the query and then get the query status:
start_query_execution: works fine
get_query_execution: returns an error!
When I try to get the query execution, I get the following error:
{'QueryExecution': {'QueryExecutionId': 'XXXX', 'Query': "SELECT * FROM my_table LIMIT 10 ", 'StatementType': 'DML', 'ResultConfiguration': {'OutputLocation': 's3://my_bucket_name/athena-results/query_id.csv'}, 'QueryExecutionContext': {'Database': 'my_database'}, 'Status': {'State': 'FAILED', 'StateChangeReason': '**Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 4.**. ; S3 Extended Request ID: ....=)'
I have given the container role all permissions (only for testing):
s3:*
athena : *
glue : *
I face this problem only in the container in AWS Batch: with the same policy and code in a Lambda, it works!
Any help will be appreciated.
For the Athena output location, I use the Athena results bucket (with a prefix), not a file name,
because Athena generates the result file itself and names it after the query's own ID.
'ResultConfiguration': {'OutputLocation': 's3://my_bucket_name/athena-results/'}
If you are not sure which bucket is used for query results, you can check in the query console --> Settings.
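Putting that together, a minimal sketch of the start-and-poll flow could look like this. The function name run_query is made up, and the client is passed in so the flow can be tried with a stub; in real use it would be boto3.client('athena'). The output location is normalized to end with a slash so that it is treated as a prefix rather than a file.

```python
import time


def run_query(client, sql, database, output_prefix, poll_seconds=1.0):
    """Start an Athena query and poll until it leaves QUEUED/RUNNING.

    `client` is expected to behave like boto3.client("athena").
    `output_prefix` should be a bucket or prefix such as
    "s3://my_bucket_name/athena-results/", not a file name.
    """
    if not output_prefix.endswith("/"):
        output_prefix += "/"

    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_prefix},
    )
    query_id = started["QueryExecutionId"]

    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]
        if execution["Status"]["State"] not in ("QUEUED", "RUNNING"):
            return execution  # SUCCEEDED, FAILED or CANCELLED
        time.sleep(poll_seconds)


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   result = run_query(boto3.client("athena"), "SELECT 1", "my_database",
#                      "s3://my_bucket_name/athena-results/")
```

If the state comes back as FAILED, the StateChangeReason field in the returned Status holds the message, as in the error shown in the question.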