I have a Glue ETL job, created by CloudFormation. This job extracts data from RDS Aurora and writes to S3.
When I run this job, I get the error below.
The job has an IAM service role.
This service role
allows the Glue and RDS services to assume it,
has arn:aws:iam::aws:policy/AmazonS3FullAccess and arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole attached, and
allows the full range of rds:*, kms:*, and s3:* actions on the corresponding RDS, KMS, and S3 resources.
I get the same error whether the S3 bucket is encrypted with AES256 or aws:kms.
I get the same error whether the job has a Security Configuration or not.
I have a job that does exactly the same thing, which I created manually, and it runs successfully without a Security Configuration.
What am I missing? Here's the full error log
"/mnt/yarn/usercache/root/appcache/application_1...5_0002/container_15...45_0002_01_000001/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o145.pyWriteDynamicFrame.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 2.0 failed 4 times, most recent failure: Lost task 3.3 in stage 2.0 (TID 30, ip-10-....us-west-2.compute.internal, executor 1): com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: F...49), S3 Extended Request ID: eo...wXZw=
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1588
Unfortunately the error doesn't tell us much except that it's failing during the write of your DynamicFrame.
There are only a handful of possible reasons for the 403; you can check whether you have covered them all:
Bucket Policy rules on the destination bucket.
The IAM Role needs permissions (although you mention having s3:*).
If this is cross-account, then there is more to check, such as the allow policies on the bucket and the user (in general, a trust for the canonical account ID is simplest).
I don't know how complicated your policy documents might be for the Role and Bucket, but remember that an explicit Deny statement takes precedence over an Allow (see the example after this list).
If the issue is KMS related, I would check that the subnet you select for the Glue Connection has a route to the KMS endpoints (you can add a VPC endpoint for KMS).
Make sure the issue is not with the Temporary Directory that is also configured for your job, or with intermediate write operations that are not your final one.
Check that your account is the "object owner" of the location you are writing to (normally an issue when reading/writing data between accounts).
If none of the above works, please share some more details about your setup, perhaps the code for the write operation.
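As for the explicit-Deny example mentioned above: a destination bucket policy statement like the following (hypothetical, not taken from your setup) would produce exactly this kind of 403 on write whenever the request does not include the matching encryption header, no matter how broad the role's s3:* allow is:
{
  "Sid": "DenyUnencryptedUploads",
  "Effect": "Deny",
  "Principal": "*",
  "Action": "s3:PutObject",
  "Resource": "arn:aws:s3:::my-output-bucket/*",
  "Condition": {
    "StringNotEquals": {
      "s3:x-amz-server-side-encryption": "aws:kms"
    }
  }
}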
In addition to Lydon's answer, error 403 is also returned if your Data Source location is the same as the Data Target (both defined when creating a job in Glue). If they are identical, change one of them and the issue will be resolved.
You should add a Security Configuration (found under the Security tab on the Glue console), providing an S3 encryption mode of either SSE-KMS or SSE-S3.
Then select that security configuration while creating your job under Advanced Properties.
Also verify your IAM role and S3 bucket policy.
It should work.
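If you prefer scripting this over clicking through the console, something along these lines should create such a security configuration (a sketch; the name is a placeholder, and for SSE-KMS you would also supply a KmsKeyArn):
# create a Glue security configuration that encrypts S3 output with SSE-S3
aws glue create-security-configuration \
  --name my-glue-security-config \
  --encryption-configuration '{"S3Encryption":[{"S3EncryptionMode":"SSE-S3"}]}'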
How are you granting PassRole permission to the Glue role?
{
"Sid": "AllowAccessToRoleOnly",
"Effect": "Allow",
"Action": [
"iam:PassRole",
"iam:GetRole",
"iam:GetRolePolicy",
"iam:ListRolePolicies",
"iam:ListAttachedRolePolicies"
],
"Resource": "arn:aws:iam::*:role/<role>"
}
Usually we create roles using the pattern <project>-<role>-<env>, e.g. xyz-glue-dev, where the project name is xyz and the env is dev. In that case we use "Resource": "arn:aws:iam::*:role/xyz-*-dev".
For me it was two things.
The access policy for the bucket should be given correctly as bucket/* (here I was missing the * part).
An endpoint must be created in the VPC for Glue to access S3 (see the sketch below): https://docs.aws.amazon.com/glue/latest/dg/vpc-endpoints-s3.html
After these two settings, my Glue job ran successfully. Hope this helps.
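For the second point, the gateway endpoint can also be created from the CLI; a minimal sketch, with the VPC ID, route table ID, and region as placeholders:
# create an S3 gateway endpoint so Glue workers in private subnets can reach S3
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --service-name com.amazonaws.us-west-2.s3 \
  --route-table-ids rtb-0123456789abcdef0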
Make sure you have given the right policies.
I was facing the same issue; I thought I had the role configured well.
But after I erased the role and followed this step, it worked ;]
I am trying to create resources using Terraform in a new GCP project. As part of that, I want to grant roles/storage.legacyBucketWriter to the Google-managed service account which runs Storage Transfer Service jobs (the pattern is project-[project-number]@storage-transfer-service.iam.gserviceaccount.com) for a specific bucket. I am using the following config:
resource "google_storage_bucket_iam_binding" "publisher_bucket_binding" {
bucket = "${google_storage_bucket.bucket.name}"
members = ["serviceAccount:project-${var.project_number}#storage-transfer-service.iam.gserviceaccount.com"]
role = "roles/storage.legacyBucketWriter"
}
To clarify: I want to do this so that when I create one-off transfer jobs using the JSON API, they don't fail prerequisite checks.
When I run Terraform apply, I get the following:
Error applying IAM policy for Storage Bucket "bucket":
Error setting IAM policy for Storage Bucket "bucket": googleapi:
Error 400: Invalid argument, invalid
I think this is because the service account in question does not exist yet, as I cannot do this via the console either.
Is there any other service that I need to enable for the service account to be created?
It seems I am able to create/find the service account once I call this:
https://cloud.google.com/storage/transfer/reference/rest/v1/googleServiceAccounts/get
for my project to get the email address.
Not sure if this is the best way, but it works.
Soroosh's reply is accurate: querying the API as per this doc, https://cloud.google.com/storage-transfer/docs/reference/rest/v1/googleServiceAccounts/, will enable the service account and Terraform will run. But then you have to make that API call from Terraform for it to work, and ain't nobody got time for that.
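For what it's worth, the Google provider also exposes that call as a data source, so this can stay in plain Terraform. A minimal sketch, assuming your provider version includes google_storage_transfer_project_service_account:
# Reading this data source triggers the googleServiceAccounts.get call,
# which makes Google provision the transfer service account if needed.
data "google_storage_transfer_project_service_account" "default" {}

resource "google_storage_bucket_iam_binding" "publisher_bucket_binding" {
  bucket  = "${google_storage_bucket.bucket.name}"
  role    = "roles/storage.legacyBucketWriter"
  members = ["serviceAccount:${data.google_storage_transfer_project_service_account.default.email}"]
}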
The goal
I want to programmatically add an item to a table in my DynamoDB from my Elastic Beanstalk application, using code similar to:
Item item = new Item()
.withPrimaryKey(UserIdAttributeName, userId)
.withString(UserNameAttributeName, userName);
table.putItem(item);
The unexpected result
Logs show the following error message, with the [bold parts] being my edits:
User: arn:aws:sts::[iam id?]:assumed-role/aws-elasticbeanstalk-ec2-role/i-[some number] is not authorized to perform: dynamodb:PutItem on resource: arn:aws:dynamodb:us-west-2:[iam id?]:table/PiggyBanks (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: AccessDeniedException; Request ID: [the request id])
I am able to get the table just fine, but things go awry when PutItem is called.
The configuration
I created a new Elastic Beanstalk application. According to the documentation, this automatically assigns the application a new role, called:
aws-elasticbeanstalk-service-role
That same documentation indicates that I can add access to my database as follows:
Add permissions for additional services to the default service role in the IAM console.
So, I found the aws-elasticbeanstalk-service-role role and added to it the managed policy, AmazonDynamoDBFullAccess. This policy looks like the following, with additional actions removed for brevity:
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"dynamodb:*",
[removed for brevity]
"lambda:DeleteFunction"
],
"Effect": "Allow",
"Resource": "*"
}
]
}
This certainly looks like it should grant the access I need. And, indeed, the policy simulator verifies this. With the following parameters, the action is allowed:
Role: aws-elasticbeanstalk-service-role
Service: DynamoDB
Action: PutItem
Simulation Resource: [Pulled from the above log] arn:aws:dynamodb:us-west-2:[iam id?]:table/PiggyBanks
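(For the record, the same check can be run from the CLI; a sketch with the account ID as a placeholder:)
# simulate the same PutItem call against the role's attached policies
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::123456789012:role/aws-elasticbeanstalk-service-role \
  --action-names dynamodb:PutItem \
  --resource-arns arn:aws:dynamodb:us-west-2:123456789012:table/PiggyBanks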
Update
In answer to the good question by filipebarretto, I instantiate the DynamoDB object as follows:
private static DynamoDB createDynamoDB() {
AmazonDynamoDBClient client = new AmazonDynamoDBClient();
client.setRegion(Region.getRegion(Regions.US_WEST_2));
DynamoDB result = new DynamoDB(client);
return result;
}
According to this documentation, this should be the way to go about it, because it is using the default credentials provider chain and, in turn, the instance profile credentials,
which exist within the instance metadata associated with the IAM role for the EC2 instance.
[This option] in the default provider chain is available only when running your application on an EC2 instance, but provides the greatest ease of use and best security when working with EC2 instances.
Other things I tried
This related Stack Overflow question had an answer that indicated region might be the issue. I've tried tweaking the region with no additional success.
I have tried forcing the usage of the correct credentials using the following:
AmazonDynamoDBClient client = new AmazonDynamoDBClient(new InstanceProfileCredentialsProvider());
I have also tried creating an entirely new environment from within Elastic Beanstalk.
In conclusion
By the error in the log, it certainly looks like my Elastic Beanstalk application is assuming the correct role.
And, by the results of the policy simulator, it looks like the role should have permission to do exactly what I want to do.
So...please help!
Thank you!
Update the aws-elasticbeanstalk-ec2-role role, instead of the aws-elasticbeanstalk-service-role.
This salient documentation contains the key:
When you create an environment, AWS Elastic Beanstalk prompts you to provide two AWS Identity and Access Management (IAM) roles, a service role and an instance profile. The service role is assumed by Elastic Beanstalk to use other AWS services on your behalf. The instance profile is applied to the instances in your environment and allows them to upload logs to Amazon S3 and perform other tasks that vary depending on the environment type and platform.
In other words, one of these roles (-service-role) is used by the Beanstalk service itself, while the other (-ec2-role) is applied to the actual instance.
It's the latter that pertains to any permissions you need from within your application code.
To load your credentials, try:
InstanceProfileCredentialsProvider mInstanceProfileCredentialsProvider = new InstanceProfileCredentialsProvider();
AWSCredentials credentials = mInstanceProfileCredentialsProvider.getCredentials();
AmazonDynamoDBClient client = new AmazonDynamoDBClient(credentials);
or
AmazonDynamoDBClient client = new AmazonDynamoDBClient(new DefaultAWSCredentialsProviderChain());
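If you would rather not attach the broad AmazonDynamoDBFullAccess managed policy, an inline policy on the -ec2-role scoped to the single table could look something like this (a sketch; the account ID is a placeholder and the action list covers only the snippet above):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:DescribeTable",
        "dynamodb:GetItem",
        "dynamodb:PutItem"
      ],
      "Resource": "arn:aws:dynamodb:us-west-2:123456789012:table/PiggyBanks"
    }
  ]
}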
I've created a worker environment for my eb application in order to take advantage of its "periodic tasks" capabilities using cron.yaml (located in the root of my application). It's a simple Sinatra app (for now) that I would like to use to issue requests to my corresponding web server environment.
However, I'm having trouble deploying via the eb cli. Below is what happens when I run eb deploy.
╰─➤ eb deploy
Creating application version archive "4882".
Uploading myapp/4882.zip to S3. This may take a while.
Upload Complete.
INFO: Environment update is starting.
ERROR: Service:AmazonCloudFormation, Message:Stack named 'awseb-e-1a2b3c4d5e-stack'
aborted operation. Current state: 'UPDATE_ROLLBACK_IN_PROGRESS'
Reason: The following resource(s) failed to create: [AWSEBWorkerCronLeaderRegistry].
I've looked around the CloudFormation dashboard to check for possible errors. After reading a bit about what I could find regarding AWSEBWorkerCronLeaderRegistry, I found that it's most likely a DynamoDB table that gets created/updated. However, when I look in the DynamoDB dashboard, there are no tables listed.
As always, any help, feedback, or guidance is appreciated.
If you are reluctant to add full DynamoDB access (like I was), Beanstalk now provides a managed policy for worker environment permissions (AWSElasticBeanstalkWorkerTier). You can try adding it to your instance profile role instead.
See http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/iam-instanceprofile.html
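Attaching it from the CLI would look roughly like this (a sketch; adjust the role name if your instance profile role is named differently):
# attach the worker-tier managed policy to the instance profile role
aws iam attach-role-policy \
  --role-name aws-elasticbeanstalk-ec2-role \
  --policy-arn arn:aws:iam::aws:policy/AWSElasticBeanstalkWorkerTier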
We had the same issue and fixed it by attaching AmazonDynamoDBFullAccess to Elastic Beanstalk role (which was named aws-elasticbeanstalk-ec2-role in our case).
I was using Codepipeline to deploy my worker and was getting the same error. Eventually I tried giving AWS-CodePipeline-Service the AmazonDynamoDBFullAccess policy and that seemed to resolve the issue.
As Anthony suggested, when triggering the deploy from other services such as CodePipeline, its service role needs the dynamodb:CreateTable permission to create the Leader Registry table (more info below) in DynamoDB.
Adding full-access permissions is bad practice and should be avoided. Also, the managed policy AWSElasticBeanstalkWorkerTier does not have the appropriate permission, since it is meant for the worker instances to access DynamoDB and check whether they are the current leader.
1. Find the Role that is trying to create the table:
Go to CloudTrail > Event History
Filter Event Name: CreateTable
Make sure the error code is AccessDenied
Locate the role name (e.g. AWSCodePipelineServiceRole-us-east-1-dev).
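The same lookup can also be done from the CLI, if you prefer (a sketch; the region is an assumption):
# list recent CreateTable events so you can inspect the calling role and error code
aws cloudtrail lookup-events \
  --region us-east-1 \
  --lookup-attributes AttributeKey=EventName,AttributeValue=CreateTable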
2. Add the permissions:
Go to IAM > Roles
Find the role in the list
Attach a policy with:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "CreateCronLeaderTable",
"Effect": "Allow",
"Action": "dynamodb:CreateTable",
"Resource": "arn:aws:dynamodb:*:*:table/*-stack-AWSEBWorkerCronLeaderRegistry*"
}
]
}
3. Check results:
Redeploy by triggering the pipeline
Check Elasticbeanstalk for errors
Optionally, go to CloudTrail and make sure the request succeeded this time.
You may use this technique any time you are not sure which permission should be attached where.
About the Cron Leader Table
From the Periodic Tasks Documentation:
Elastic Beanstalk uses leader election to determine which instance in your worker environment queues the periodic task. Each instance attempts to become leader by writing to an Amazon DynamoDB table. The first instance that succeeds is the leader, and must continue to write to the table to maintain leader status. If the leader goes out of service, another instance quickly takes its place.
For those wondering, this DynamoDB table uses 10 RCU and 5 WCU, which is covered by the always-free tier.
I'm trying to constrain the images which a specific IAM group can describe. If I have the following policy for my group, users in the group can describe any EC2 image:
{
"Effect": "Allow",
"Action": ["ec2:DescribeImages"],
"Resource": ["*"]
}
I'd like to only allow the group to describe a single image, but when I try setting "Resource": ["arn:aws:ec2:eu-west-1::image/ami-c37474b7"], I get exceptions when trying to describe the image as a member of the group:
AmazonServiceException Status Code: 403,
AWS Service: AmazonEC2,
AWS Request ID: 911a5ed9-37d1-4324-8493-84fba97bf9b6,
AWS Error Code: UnauthorizedOperation,
AWS Error Message: You are not authorized to perform this operation.
I got the ARN format for EC2 images from IAM Policies for EC2, but perhaps something is wrong with my ARN? I have verified that the describe image request works just fine when my resource value is "*".
Unfortunately the error message is misleading; the problem is that resource-level permissions for EC2 and RDS resources aren't yet available for all API actions. See this note from Amazon Resource Names for Amazon EC2:
Important
Currently, not all API actions support individual ARNs; we'll add support for additional API actions and ARNs for additional Amazon EC2 resources later. For information about which ARNs you can use with which Amazon EC2 API actions, as well as supported condition keys for each ARN, see Supported Resources and Conditions for Amazon EC2 API Actions.
In particular, all ec2:Describe* actions are still absent from Supported Resources and Conditions for Amazon EC2 API Actions at the time of this writing, which implies that you cannot use anything but "Resource": ["*"] for ec2:DescribeImages.
The referenced page on Granting IAM Users Required Permissions for Amazon EC2 Resources also mentions that AWS will add support for additional actions, ARNs, and condition keys in 2014. They have indeed been regularly expanding resource-level permission coverage over the last year or so, but so far only for actions that create or modify resources, not for any that require read access only, something many users (myself included) desire and expect for obvious reasons.
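To make the distinction concrete, here is a sketch of the usual pattern: keep the Describe* actions on "*" and reserve the image ARN for an action that does support resource-level permissions, such as ec2:RunInstances (the action here is just an illustration; a real RunInstances statement would also need to cover the instance, volume, network interface, security group, and subnet resources):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ec2:DescribeImages"],
      "Resource": ["*"]
    },
    {
      "Effect": "Allow",
      "Action": ["ec2:RunInstances"],
      "Resource": ["arn:aws:ec2:eu-west-1::image/ami-c37474b7"]
    }
  ]
}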
Reading through the many resources on how to utilize temporary AWS credentials in a launched EC2 instance, I can't seem to get an extremely simple POC running.
Desired:
Launch an EC2 instance
SSH in
Pull a piece of static content from a private S3 bucket
Steps:
Create an IAM role
Spin up a new EC2 instance with the above IAM role specified; SSH in
Set the credentials using aws configure and the details that (successfully) populated in http://169.254.169.254/latest/meta-data/iam/security-credentials/iam-role-name
Attempt to use the AWS CLI directly to access the file
IAM role:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject"
],
"Resource": "arn:aws:s3:::bucket-name/file.png"
}
]
}
When I use the AWS CLI to access the file, this error is thrown:
A client error (Forbidden) occurred when calling the HeadObject operation: Forbidden
Completed 1 part(s) with ... file(s) remaining
Which step did I miss?
For future reference, the issue was in how I was calling the AWS CLI; previously I was running:
aws configure
...and supplying the details found in the auto-generated role profile.
Once I simply allowed it to find its own temporary credentials and just specified the only other required parameter manually (region):
aws s3 cp s3://bucket-name/file.png file.png --region us-east-1
...the file pulled fine. Hopefully this'll help out someone in the future!
Hope this might help some other Googler that lands here.
The
A client error (403) occurred when calling the HeadObject operation: Forbidden
error can also be caused if your system clock is too far off. I was 12 hours in the past and got this error. Set the clock to within a minute of the true time, and the error went away.
According to Granting Access to a Single S3 Bucket Using Amazon IAM, the IAM policy may need to be applied to two resources:
The bucket proper (e.g. "arn:aws:s3:::4ormat-knowledge-base")
All the objects inside the bucket (e.g. "arn:aws:s3:::4ormat-knowledge-base/*")
Yet another tripwire. Damn!
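In terms of the policy from the question above, that would mean something like the following (a sketch; s3:ListBucket on the bucket itself also makes the CLI's HeadObject behaviour friendlier, since without it a missing key comes back as 403 rather than 404):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::bucket-name"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::bucket-name/*"
    }
  ]
}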
I just got this error because I had an old version of awscli:
Broken:
$ aws --version
aws-cli/1.2.9 Python/3.4.0 Linux/3.13.0-36-generic
Works:
$ aws --version
aws-cli/1.5.4 Python/3.4.0 Linux/3.13.0-36-generic
You also get this error if the key doesn't exist in the bucket.
Double-check the key -- I had a script that was adding an extra slash at the beginning of the key when it POSTed items into the bucket. So this:
aws s3 cp --region us-east-1 s3://bucketname/path/to/file /tmp/filename
failed with "A client error (Forbidden) occurred when calling the HeadObject operation: Forbidden."
But this:
aws s3 cp --region us-east-1 s3://bucketname//path/to/file /tmp/filename
worked just fine. Not a permissions issue at all, just boneheaded key creation.
I had this error because I didn't attach a policy to my IAM user.
tl;dr: wildcard file globbing worked better in s3cmd for me.
As cool as aws-cli is, for my one-time S3 file manipulation issue that didn't immediately work as I hoped it might, I ended up installing and using s3cmd.
Whatever syntax and behind-the-scenes work I conceptually imagined, s3cmd was more intuitive and accommodating to my baked-in preconceptions.
Maybe it isn't the answer you came here for, but it worked for me.