Spark S3 access without configuring keys and with only IAM role

I have an HDP cluster on AWS and I also have an S3 bucket (in another account); my Hadoop version is Hadoop 3.1.1.3.0.1.0-187.
Now I want to read from that S3 bucket (which is in the different account), process the data, then write the result to my own S3 bucket (same account as the cluster).
But as the HDP guide here says, I can configure keys for only one account, either mine or the other one.
In my case, though, I want to configure keys for two accounts, so how do I do that?
For security reasons the other account cannot change its bucket policy to add an IAM role created in my account, hence I tried to access it like below:
Configured the keys of the other account
Added the IAM role of my account (which has an access policy for my bucket)
But I still got the below error when I tried to write to my account's S3 from Spark:
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3

What you need is to use the EC2 instance profile role. It is an IAM role that is attached to your instance: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2_instance-profiles.html
You first create a role with permissions that allow S3 access. Then you attach that role to your HDP cluster (an EC2 Auto Scaling group and EMR can both achieve that). No IAM access key configuration is needed on your side, although AWS still does that for you in the background. This is the S3 "outbound" access part.
The second step is to set up the bucket policy to allow cross-account access: https://docs.aws.amazon.com/AmazonS3/latest/dev/example-walkthroughs-managing-access-example2.html
You will need to do this for each bucket in your different accounts. This is basically the "inbound" S3 access permission part.
You will encounter a 400 if any part of your access (i.e., your instance profile role's permissions, the S3 bucket ACL, the bucket policy, the public access block settings, etc.) is denied anywhere in the permission chain. There are many more layers on the "inbound" side, so to get things working, if you are not an IAM expert, start with a very open policy (use the '*' wildcard) and then narrow things down.
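For example, a minimal sketch of such a cross-account bucket policy on the external bucket (the account ID, role name, and bucket name here are placeholders):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCrossAccountRole",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:role/my-hdp-instance-role"
      },
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::external-bucket",
        "arn:aws:s3:::external-bucket/*"
      ]
    }
  ]
}

Remember that the instance profile role in your own account also needs an IAM policy allowing the same actions; the bucket policy alone only covers the "inbound" half.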

If I've understood right
you want your EC2 VMs to access an S3 bucket to which the IAM role doesn't have access
you have a set of AWS credentials for the external S3 bucket (access key and secret key)
HDP3 has a default auth chain of, in order:
per-bucket secrets: fs.s3a.bucket.NAME.access.key, fs.s3a.bucket.NAME.secret.key
config-wide secrets: fs.s3a.access.key, fs.s3a.secret.key
env vars: AWS_ACCESS_KEY and AWS_SECRET_KEY
the IAM Role (it does an HTTP GET to the 169.something server which serves up a new set of IAM role credentials at least once an hour)
What you need to try here is to set up per-bucket secrets for only the external source (either in a JCEKS file on all nodes, referenced from core-site.xml, or in the Spark defaults). For example, if the external bucket was s3a://external, you'd have:
spark.hadoop.fs.s3a.bucket.external.access.key AKAISOMETHING
spark.hadoop.fs.s3a.bucket.external.secret.key SECRETSOMETHING
HDP3/Hadoop 3 can handle more than one secret in the same JCEKS file without problems (HADOOP-14507; my code). Older versions let you put username:secret in the URI, but that's such a security troublespot (everything logs those URIs as they aren't viewed as sensitive) that the feature has been cut from Hadoop now. Stick to the JCEKS file with a per-bucket secret, falling back to the IAM role for your own data.
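As a sketch, assuming a hypothetical JCEKS path of jceks://hdfs@nn/user/secrets/aws.jceks, the per-bucket secrets could be created with the hadoop credential tool:

hadoop credential create fs.s3a.bucket.external.access.key \
  -value AKAISOMETHING \
  -provider jceks://hdfs@nn/user/secrets/aws.jceks
hadoop credential create fs.s3a.bucket.external.secret.key \
  -value SECRETSOMETHING \
  -provider jceks://hdfs@nn/user/secrets/aws.jceks

Then point hadoop.security.credential.provider.path in core-site.xml at that JCEKS file so the S3A connector picks the secrets up.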
Note you can fiddle with the authentication list for ordering and behaviour: if you add the TemporaryAWSCredentialsProvider then it'll support session keys as well, which is often handy.
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>
    org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider,
    org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider,
    com.amazonaws.auth.EnvironmentVariableCredentialsProvider,
    org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider
  </value>
</property>

Related

Accessing S3 bucket data from EC2 instance through IAM

So I have created an IAM user and added a permission to access S3, then I have created an EC2 instance and SSH'ed into it.
After running the "aws s3 ls" command, the reply was:
"Unable to locate credentials. You can configure credentials by running "aws configure"."
So what's the difference between providing IAM credentials (key and key ID) via "aws configure" and editing the bucket policy to allow S3 access from my instance's public IP?
Even after editing the bucket policy (JSON) to allow S3 access from my instance's public IP, why am I not able to access the S3 bucket unless I use "aws configure" (key and key ID)?
Please help! Thanks.
Since you are using EC2 you should really use EC2 Instance Profiles instead of running aws configure and hard-coding credentials in the file system.
As for your question of S3 bucket policies versus IAM roles, here is the official documentation on that. They are two separate tools you would use in securing your AWS account.
As for your specific command that failed, note that the AWS CLI tool will always try to look for credentials by default. If you want it to skip looking for credentials you can pass the --no-sign-request argument.
However, if you were just running aws s3 ls then that was trying to list all the buckets in your account, which you would have to have IAM credentials for. Individual bucket policies would not be taken into account in that scenario.
If you were running aws s3 ls s3://bucketname then that may have worked as aws s3 ls s3://bucketname --no-sign-request.
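For reference, an IP-based bucket policy like the one you describe is a sketch along these lines (the bucket name and CIDR are placeholders); since it grants anonymous access, you'd still pass --no-sign-request to make unsigned requests:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowFromInstancePublicIp",
      "Effect": "Allow",
      "Principal": "*",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::bucketname",
        "arn:aws:s3:::bucketname/*"
      ],
      "Condition": {
        "IpAddress": {"aws:SourceIp": "203.0.113.25/32"}
      }
    }
  ]
}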
When you work with IAM there are two parts:
policies
roles
Policies are attached to a user and define which services the user can or can't access.
Roles are attached to an application (or resource) and define what access that application has.
So you have to permit EC2 to access S3.
There are two ways to do that:
aws configure
attach a role to the EC2 instance
While 1 is tricky and lengthy, 2 is easy.
Go to the EC2 instance -> Actions -> Security -> Modify IAM role -> then select the role (an EC2+S3 access role).
That's it; you can simply run aws s3 ls from the EC2 instance.
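If you prefer the CLI over the console, the same attachment is roughly the following (the instance ID and profile name are placeholders, and the instance profile must already exist):

aws ec2 associate-iam-instance-profile \
  --instance-id i-0123456789abcdef0 \
  --iam-instance-profile Name=ec2-s3-access-role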

Get access from account1's EC2 to account2's S3 object using the AWS SDK

I have an app running in an auto-scaled EC2 environment in account1, created via AWS CDK (it should also support running in multiple regions). During execution the app needs to get an object from account2's S3.
One way to get the S3 data is to use temporary credentials (via STS assume role):
on the account1 side, create a policy for the EC2 instance role allowing it to assume the role that grants access to the S3 object
on the account2 side, create a policy granting GetObject access to the S3 object
on the account2 side, create a role, attach the policy from point 2 to it, and add a trust relationship to account1's EC2 role
Pros: no user credentials are required to get access to the data
Cons: after each environment update it requires manual permission configuration
Another way is to create a user in account2 with permission to get the S3 object and put the credentials on the account1 side.
Pros: after each environment update it doesn't require manual permission configuration
Cons: exposes the IAM user's credentials
Is there a better option that eliminates manual permission config and explicit IAM user credential sharing?
You can add a Bucket Policy on the Amazon S3 bucket in Account 2 that permits access by the IAM Role used by the Amazon EC2 instance in Account 1.
That way, the EC2 instance(s) can access the bucket just as if it were in the same account, without having to assume any roles or users.
Simply set the Principal to be the ARN of the IAM Role used by the EC2 instances, as in the sketch below.
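A minimal sketch of such a bucket policy on the account2 bucket (the account ID, role name, and bucket name are placeholders):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111111111111:role/account1-ec2-instance-role"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::account2-bucket/*"
    }
  ]
}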

Put Object to S3 Bucket of another account

We are able to put objects into our S3 Bucket.
But now we have a requirement to put these objects directly into an S3 bucket that belongs to a different account and a different region.
Here we have a few questions:
Is this possible?
If possible what changes we need to do for this?
They have provided us Access Key, Secret Key, Region, and Bucket details.
Any comments and suggestions will be appreciated.
IAM credentials are associated with a single AWS Account.
When you launch your own Amazon EC2 instance with an assigned IAM Role, it will receive access credentials that are associated with your account.
To write to another account's Amazon S3 bucket, you have two options:
Option 1: Your credentials + Bucket Policy
The owner of the destination Amazon S3 bucket can add a Bucket Policy on the bucket that permits access by your IAM Role. This way, you can just use the normal credentials available on the EC2 instance.
Option 2: Their credentials
It appears that you have been given access credentials for their account. You can use these credentials to access their Amazon S3 bucket.
As detailed on Working with AWS Credentials - AWS SDK for Java, you can provide these credentials in several ways. However, if you are using BOTH the credentials provided by the IAM Role AND the credentials that have been given to you, it can be difficult to 'switch between' them. (I'm not sure if there is a way to tell the Credentials Provider to switch between a profile stored in the ~/.aws/credentials file and those provided via instance metadata.)
Thus, the easiest way is to specify the Access Key and Secret Key when creating the S3 client:
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

BasicAWSCredentials awsCreds = new BasicAWSCredentials("access_key_id", "secret_key_id"); // their account's keys
AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
    .withCredentials(new AWSStaticCredentialsProvider(awsCreds))
    .build(); // consider also .withRegion(...) set to the destination bucket's region
It is generally not a good idea to put credentials in your code. You should load them from a configuration file.
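For example, a minimal sketch assuming their keys are stored under a hypothetical profile named cross-account in ~/.aws/credentials:

import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

// Loads the access/secret key from the [cross-account] profile rather than from source code
AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
    .withCredentials(new ProfileCredentialsProvider("cross-account"))
    .build();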
Yes, it's possible. You need to allow the cross-account S3 put operation in the bucket's policy.
Here is a blog post by AWS. It should help you in setting up the cross-account put action.

Error 403 while creating an EMR cluster using my reducer and mapper?

I am trying to use my bucket to provide the arguments for EMR to create a cluster, but it is giving me "All access to this object has been disabled (Service: Amazon S3; Status Code: 403; Error Code: AllAccessDisabled;".
I have used my reducer and mapper Python files, and my bucket's permissions are public too.
Is there something wrong with my mapper and reducer files, or am I missing a trick here?
Make sure you've assigned your EMR cluster an IAM role that has adequate S3 access permissions. IAM enables you to grant permissions to users, groups, or resources (like your EMR cluster, in this instance) to be able to access other services or resources in AWS (like S3, which is currently giving you an access denied error).
To do this through EMRFS:
Navigate to the EMR console
click Security configurations (on left menu)
Scroll down to IAM roles for EMRFS
Enable Use IAM roles for EMRFS requests to Amazon S3
Add role mapping
Select desired IAM role (Admin)
Select whatever basis for access you prefer (User, group, or S3 bucket name prefix)
More on this is available in the docs here: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-iam-roles.html
https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-emrfs-iam-roles.html
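For reference, a role mapping in an EMR security configuration is JSON along these lines (the role ARN and bucket prefix are placeholders):

{
  "AuthorizationConfiguration": {
    "EmrFsConfiguration": {
      "RoleMappings": [
        {
          "Role": "arn:aws:iam::123456789012:role/emrfs-s3-access",
          "IdentifierType": "Prefix",
          "Identifiers": ["s3://my-bucket/"]
        }
      ]
    }
  }
}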

Using IAM roles transitively

I have a question on using IAM roles with EC2 and EMR. Here's my current setup:
I have an EC2 machine launched with a particular IAM role (let's call this role 'admin'). My workflow is to upload a file to S3 from this machine and then create an EMR cluster with a particular IAM role (a 'runner' role). The EMR cluster works on the file uploaded to S3 from the admin machine.
Admin is a role with privileges to all APIs in all AWS services. Runner has access to all APIs in EMR, EC2 and S3.
For some reason, the EMR cluster is unable to access the input file loaded into S3. It keeps getting an 'access denied' exception from S3.
I guess writing to S3 from one IAM role and reading it from a different IAM role is what is causing the issue.
Any ideas on what is going wrong here, or whether this is even a supported use case, are appreciated.
Thanks!
http://blogs.aws.amazon.com/security/post/TxPOJBY6FE360K/IAM-policies-and-Bucket-Policies-and-ACLs-Oh-My-Controlling-Access-to-S3-Resourc
S3 objects are protected in three ways as seen in the post I linked to.
Your IAM role will need the permission to read S3 objects.
The S3 bucket policy must allow your IAM role access to the object.
The S3 ACL for the specific object must also allow your IAM role access to the object.
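As a starting point for the first layer, a minimal IAM policy for the 'runner' role would look something like this sketch (the bucket name is a placeholder):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-input-bucket",
        "arn:aws:s3:::my-input-bucket/*"
      ]
    }
  ]
}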