Copy data from DynamoDB into Redshift across two different AWS accounts?

For reasons beyond my control, I have the following:
A table CustomerPhoneNumber in DynamoDB under one AWS account.
A Redshift cluster under a different AWS account (same geographic region: EU).
Is there any way to run the COPY command to move data from Dynamo into Redshift across accounts?
Typically if they were under the same account, it would be done via IAM role pretty easily:
copy public.my_table (col1, col2, col3) from 'dynamodb://CustomerPhoneNumber' iam_role 'arn:aws:iam::XXXXXXXXXXX:role/RandomRoleName' readratio 40;
But obviously this doesn't work in my case.
Any ideas?

The answer above by John is no longer applicable; this is how you can do it:
The AWS account that owns the required resource (DynamoDB in this case: the trustING account) needs to add the account requiring access (the trusTED AWS account) as a trusted entity in its DynamoDB read-only role: arn:aws:iam:::role/
Create a policy that allows sts:AssumeRole (with the trustING account's role ARN as the resource), and attach that policy to redshift_access_role (which has all the privileges required to run the COPY command).
Run the command as:
iam_role 'arn:aws:iam::<trusTEDawsAccountId>:role/redshift_access_role,arn:aws:iam::<trusTINGawsAccountId>:role/<dynamodbreadrole>'
readratio 50
Details in:
https://docs.aws.amazon.com/redshift/latest/mgmt/authorizing-redshift-service.html
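For reference, a rough boto3 sketch of the trust setup described above. The profile names, account IDs and the role names dynamodb_read_role / redshift_access_role are placeholders; each call must be run with credentials for the account that owns the respective role.

import json
import boto3

# Credentials for each account, via hypothetical named profiles.
trusting_iam = boto3.Session(profile_name="trusting-account").client("iam")
trusted_iam = boto3.Session(profile_name="trusted-account").client("iam")

# In the trustING account (owns DynamoDB): let the trusTED account's
# redshift_access_role assume the DynamoDB read-only role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::<trusTEDawsAccountId>:role/redshift_access_role"},
        "Action": "sts:AssumeRole",
    }],
}
trusting_iam.update_assume_role_policy(
    RoleName="dynamodb_read_role", PolicyDocument=json.dumps(trust_policy))

# In the trusTED account (owns Redshift): allow redshift_access_role to
# call sts:AssumeRole on the trustING account's role.
assume_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "sts:AssumeRole",
        "Resource": "arn:aws:iam::<trusTINGawsAccountId>:role/dynamodb_read_role",
    }],
}
trusted_iam.put_role_policy(
    RoleName="redshift_access_role", PolicyName="assume-dynamodb-read-role",
    PolicyDocument=json.dumps(assume_policy))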

You can use CREDENTIALS and specify the access key and secret key for the other account. Add the following to your COPY statement:
credentials 'aws_access_key_id=AKIAXXXXX;aws_secret_access_key=yyyyyy'
You cannot use cross-account roles with Redshift. To quote the Amazon documentation:
An IAM role can be associated with an Amazon Redshift cluster only if both the IAM role and the cluster are owned by the same AWS account.
Authorizing COPY and UNLOAD Operations Using IAM Roles
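Put together, the CREDENTIALS approach looks something like this minimal sketch; the cluster endpoint, database login and AWS keys are placeholders, and the psycopg2 driver is an assumption.

import psycopg2

copy_sql = """
    copy public.my_table (col1, col2, col3)
    from 'dynamodb://CustomerPhoneNumber'
    credentials 'aws_access_key_id=AKIAXXXXX;aws_secret_access_key=yyyyyy'
    readratio 40;
"""

conn = psycopg2.connect(host="my-cluster.xxxx.eu-west-1.redshift.amazonaws.com",
                        port=5439, dbname="mydb", user="admin", password="...")
with conn, conn.cursor() as cur:
    cur.execute(copy_sql)  # Redshift reads DynamoDB with the other account's keys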

Apparently Stack Overflow needs formatting; the code is:
copy redshift_tbl from 'dynamodb://dynamotbl'
iam_role 'arn:aws:iam::<TRUSTEDacAWSid>:role/redshift_access_role,arn:aws:iam::<trusTINGacAWSid>:role/<dynamodb-role-in-trustingac>'
readratio 50
Note: there is no space after the comma between the two role ARNs.

Related

AWS IAM roles Trust Relationships

I want to fully automate creating new roles in AWS and connecting them with Snowflake. To connect Snowflake with AWS, we must edit the role's trust relationship and paste in Snowflake's STORAGE_AWS_EXTERNAL_ID.
Is there any way to do this fully automatically?
How about creating a batch script using the AWS CLI and SnowSQL, following the steps provided in the Snowflake user guide:
Create your AWS IAM policy and get the policy's ARN.
Create an AWS IAM role linked to this policy and get the role's ARN.
Create the Snowflake storage integration linked to this role and get STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID from the DESC INTEGRATION command.
Update the AWS IAM role's trust policy with the previous values (i.e. Snowflake's IAM user ARN and the external ID).
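As a rough illustration of automating those steps end to end, here is a hedged Python sketch using boto3 and the Snowflake Python connector instead of a batch script. Every name (policy, role, bucket, integration) and all credentials are placeholders, and the policy documents are trimmed to the essentials.

import json
import boto3
import snowflake.connector

iam = boto3.client("iam")
bucket_arn = "arn:aws:s3:::my-snowflake-stage-bucket"

# 1. IAM policy that grants Snowflake access to the stage bucket.
policy_arn = iam.create_policy(
    PolicyName="snowflake-stage-access",
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject"],
             "Resource": bucket_arn + "/*"},
            {"Effect": "Allow", "Action": "s3:ListBucket", "Resource": bucket_arn},
        ],
    }))["Policy"]["Arn"]

# 2. IAM role with a temporary trust policy (replaced again in step 4).
role_arn = iam.create_role(
    RoleName="snowflake-stage-role",
    AssumeRolePolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow",
                       "Principal": {"AWS": "arn:aws:iam::<myAccountId>:root"},
                       "Action": "sts:AssumeRole"}],
    }))["Role"]["Arn"]
iam.attach_role_policy(RoleName="snowflake-stage-role", PolicyArn=policy_arn)

# 3. Storage integration in Snowflake, then read back the generated values.
sf = snowflake.connector.connect(account="my_account", user="me", password="...")
cur = sf.cursor()
cur.execute(f"""
    CREATE STORAGE INTEGRATION s3_int
      TYPE = EXTERNAL_STAGE
      STORAGE_PROVIDER = 'S3'
      ENABLED = TRUE
      STORAGE_AWS_ROLE_ARN = '{role_arn}'
      STORAGE_ALLOWED_LOCATIONS = ('s3://my-snowflake-stage-bucket/')
""")
cur.execute("DESC INTEGRATION s3_int")
props = {row[0]: row[2] for row in cur.fetchall()}  # property -> property_value

# 4. Update the role's trust relationship with Snowflake's IAM user ARN and external ID.
iam.update_assume_role_policy(
    RoleName="snowflake-stage-role",
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": props["STORAGE_AWS_IAM_USER_ARN"]},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {
                "sts:ExternalId": props["STORAGE_AWS_EXTERNAL_ID"]}},
        }],
    }))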

Amazon Athena Cross Account Access

Can I create a database and table in Athena service within my account to access S3 data in another account?
I went over the link below, and as per this documentation I assume that both Amazon Athena and the S3 bucket have to be in the same account, with access provided to a user in another account.
https://console.aws.amazon.com/athena/home?force&region=us-east-1#query
From Access Control Policies - Amazon Athena:
To run queries in Athena, you must have the appropriate permissions for:
The Athena actions.
The Amazon S3 locations where the underlying data is stored that you are going to query in Athena.
...
So, it seems that the IAM User who is executing the Athena query requires access to the Amazon S3 location.
This could be done by adding a Bucket Policy to the S3 bucket in the other account that permits the IAM User access to the bucket.
To explain better:
Account-A with IAM-User-A and AWS Athena
Account-B with Bucket-B that has a Bucket Policy granting access to IAM-User-A
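A hedged sketch of that bucket policy, applied from Account-B with boto3; the account ID, user name and bucket name are placeholders.

import json
import boto3

# Run with credentials for Account-B, which owns the bucket.
bucket = "bucket-b"
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowAthenaUserFromAccountA",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::<account-a-id>:user/IAM-User-A"},
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
    }],
}
boto3.client("s3").put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))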
This answer deals with the additional information that:
A Lambda function in Account-A must be able to create a table in Amazon Athena in Account-B
I haven't tested it, but I think you will require:
Role-A in Account-A for the Lambda function that:
Permits AssumeRole on Role-B
Role-B in Account-B that:
Permits access to Amazon Athena and the source bucket in Amazon S3
Trusts Role-A
The Lambda function will run with Role-A. It will then use credentials from Role-A to call AssumeRole on Role-B. This will return a new set of credentials that can be used to call Amazon Athena in Account-B.
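A sketch of that flow from the Lambda function's side, assuming boto3; the role ARN, results bucket and query text are placeholders.

import boto3

def lambda_handler(event, context):
    # The function runs as Role-A; assume Role-B in Account-B.
    sts = boto3.client("sts")
    creds = sts.assume_role(
        RoleArn="arn:aws:iam::<account-b-id>:role/Role-B",
        RoleSessionName="athena-cross-account",
    )["Credentials"]

    # Athena client built from Role-B's temporary credentials.
    athena = boto3.client(
        "athena",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    athena.start_query_execution(
        QueryString="CREATE DATABASE IF NOT EXISTS crossdb",
        ResultConfiguration={"OutputLocation": "s3://query-results-bucket/"},
    )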

spark s3 access without configuring keys and with only IAM role

I have an HDP cluster on AWS, and I also have an S3 bucket in another account; my Hadoop version is Hadoop 3.1.1.3.0.1.0-187.
Now I want to read from that S3 bucket (which is in the different account), process the data, then write the result to my own S3 bucket (same account as the cluster).
But as per the HDP guide here, I can configure keys for only one account: either my account or the other account.
In my case I want to configure keys for two accounts, so how do I do that?
For security reasons, the other account cannot change its bucket policy to add an IAM role created in my account, hence I tried to access it as below:
Configured the keys of the other account
Added the IAM role of my account (which has an access policy for my bucket)
But I still got the error below when I tried to write to my account's S3 from Spark:
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3
What you need is to use the EC2 instance profile role. It is an IAM role that is attached to your instance: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2_instance-profiles.html
You first create a role with permissions that allow S3 access. Then you attach that role to your HDP cluster (an EC2 Auto Scaling group and EMR can both achieve that). No IAM access key configuration is needed on your side, although AWS still does that for you in the background. This is the S3 "outbound" access part.
The 2nd step is to set up the bucket policy to allow cross-account access: https://docs.aws.amazon.com/AmazonS3/latest/dev/example-walkthroughs-managing-access-example2.html
You will need to do this for each bucket in your different accounts. This is basically the "inbound" s3 access permission part.
You will encounter a 400 if any part of your access (i.e., your instance profile role's permissions, the S3 bucket ACL, the bucket policy, the public access block setting, etc.) is denied in the permission chain. There are many more layers on the "inbound" side. So to get things working, if you are not an IAM expert, start with a very open policy (use the '*' wildcard) and then narrow things down.
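For the first ("outbound") part, a rough boto3 sketch of creating such a role and instance profile; the role and profile names are made up, and the wide-open S3 policy is only the starting point to narrow down later.

import json
import boto3

iam = boto3.client("iam")

# Role that the HDP/EC2 nodes will assume via the instance profile.
assume_ec2 = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow",
                   "Principal": {"Service": "ec2.amazonaws.com"},
                   "Action": "sts:AssumeRole"}],
}
iam.create_role(RoleName="hdp-s3-access-role",
                AssumeRolePolicyDocument=json.dumps(assume_ec2))

# Broad S3 permissions to start with; narrow down once things work.
s3_access = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}],
}
iam.put_role_policy(RoleName="hdp-s3-access-role",
                    PolicyName="s3-access", PolicyDocument=json.dumps(s3_access))

# Instance profile that gets attached to the cluster's EC2 instances.
iam.create_instance_profile(InstanceProfileName="hdp-s3-access-profile")
iam.add_role_to_instance_profile(InstanceProfileName="hdp-s3-access-profile",
                                 RoleName="hdp-s3-access-role")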
If I've understood right:
you want your EC2 VMs to access an S3 bucket to which the IAM role doesn't have access
you have a set of AWS credentials for the external S3 bucket (access key and secret key)
HDP3 has a default auth chain of, in order:
per-bucket secrets. fs.s3a.bucket.NAME.access.key, fs.s3a.bucket.NAME.secret.key
config-wide secrets fs.s3a.access.key, fs.s3a.secret.key
env vars AWS_ACCESS_KEY and AWS_SECRET_KEY
the IAM Role (it does an HTTP GET to the 169.something server which serves up a new set of IAM role credentials at least once an hour)
What you need to try here is to set up per-bucket secrets for only the external source (either in a JCEKS file on all nodes, in core-site.xml, or in the Spark defaults). For example, if the external bucket was s3a://external, you'd have:
spark.hadoop.fs.s3a.bucket.external.access.key AKAISOMETHING
spark.hadoop.fs.s3a.bucket.external.secret.key SECRETSOMETHING
HDP3/Hadoop 3 can handle more than one secret in the same JCEKS file without problems (HADOOP-14507; my code). Older versions let you put username:secret in the URI, but that's such a security trouble spot (everything logs those URIs as they aren't viewed as sensitive) that the feature has been cut from Hadoop now. Stick to the JCEKS file with a per-bucket secret, falling back to the IAM role for your own data.
Note you can fiddle with the authentication list for ordering and behaviour: if you use the TemporaryAWSCredentialsProvider then it'll support session keys as well, which is often handy.
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>
    org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider,
    org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider,
    com.amazonaws.auth.EnvironmentVariableCredentialsProvider,
    org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider
  </value>
</property>
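The same per-bucket settings can also be passed straight from a PySpark job. A minimal sketch, with the bucket names and key values as placeholders:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("cross-account-s3")
    # Keys used only for s3a://external/... ; everything else falls back to
    # the instance's IAM role via the credential provider chain.
    .config("spark.hadoop.fs.s3a.bucket.external.access.key", "AKAISOMETHING")
    .config("spark.hadoop.fs.s3a.bucket.external.secret.key", "SECRETSOMETHING")
    .getOrCreate()
)

df = spark.read.parquet("s3a://external/input/")   # other account, via the keys
df.write.parquet("s3a://my-own-bucket/output/")    # own account, via the IAM role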

How to grant bucket-owner-full-control to a file unloaded from redshift in one account to an s3 bucket in another account?

I have a Redshift cluster in AWS account "A" and an S3 bucket in account "B". I need to unload data from the Redshift cluster in account A to the S3 bucket in B.
I've already provided the necessary bucket policy and role policy to unload the data, and the data is unloaded successfully. The problem now is that the owner of the file created by this unload is account A, while the file needs to be used by a user in account B. On trying to access that object, I get access denied. How do I solve this?
PS: ListBucket and GetObject permissions have been granted by the Redshift IAM policy.
This is what worked for me - Chaining IAM roles.
For example, suppose Company A wants to access data in an Amazon S3 bucket that belongs to Company B. Company A creates an AWS service role for Amazon Redshift named RoleA and attaches it to their cluster. Company B creates a role named RoleB that's authorized to access the data in the Company B bucket. To access the data in the Company B bucket, Company A runs a COPY command using an iam_role parameter that chains RoleA and RoleB. For the duration of the UNLOAD operation, RoleA temporarily assumes RoleB to access the Amazon S3 bucket.
More details here: https://docs.aws.amazon.com/redshift/latest/mgmt/authorizing-redshift-service.html#authorizing-redshift-service-chaining-roles
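Applied to the UNLOAD in this question, the chained-role call looks roughly like the sketch below; the account IDs, role names, bucket path and connection details are placeholders, and the psycopg2 driver is an assumption. Note again that there is no space after the comma between the two role ARNs.

import psycopg2

unload_sql = """
    unload ('select * from my_schema.my_table')
    to 's3://company-b-bucket/exports/my_table_'
    iam_role 'arn:aws:iam::<accountA>:role/RoleA,arn:aws:iam::<accountB>:role/RoleB'
    allowoverwrite;
"""

conn = psycopg2.connect(host="my-cluster.xxxx.eu-west-1.redshift.amazonaws.com",
                        port=5439, dbname="mydb", user="admin", password="...")
with conn, conn.cursor() as cur:
    cur.execute(unload_sql)  # RoleA temporarily assumes RoleB to write to the bucket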

Is there a way to give someone AWS EMR/Ec2 machine access without any download rights?

We have not given anyone any download rights through S3, but it is still possible to download data from an EMR cluster using scp.
Is it possible to give someone the cluster DNS but make sure they can use the data on the cluster but not download it?
EMR nodes by default assume the EC2 instance profile (the EMR_EC2_DefaultRole IAM role) to access resources in your account, including S3. The policies defined in this role decide what EMR has access to.
If that role allows s3:* or s3:Get* etc. on all resources (buckets and objects), then all nodes on the EMR cluster can download objects from all buckets in your account (assuming you do not have any restrictive bucket policies).
http://docs.aws.amazon.com/AmazonS3/latest/dev/using-with-s3-actions.html
http://docs.aws.amazon.com/AmazonS3/latest/dev/example-bucket-policies.html
https://aws.amazon.com/blogs/security/iam-policies-and-bucket-policies-and-acls-oh-my-controlling-access-to-s3-resources/
Yes: given that EMR has access to S3, if you share the private SSH key (.pem) file of an EMR/EC2 node with a user, they can use scp to copy data from the EMR cluster to their machine.
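If you want to check what the nodes can actually reach, here is a small boto3 sketch that lists the policies behind the instance profile role (assuming the default name EMR_EC2_DefaultRole):

import boto3

iam = boto3.client("iam")
role = "EMR_EC2_DefaultRole"

# Managed policies attached to the role.
for p in iam.list_attached_role_policies(RoleName=role)["AttachedPolicies"]:
    print("managed policy:", p["PolicyArn"])

# Inline policies defined directly on the role.
for name in iam.list_role_policies(RoleName=role)["PolicyNames"]:
    doc = iam.get_role_policy(RoleName=role, PolicyName=name)["PolicyDocument"]
    print("inline policy:", name, doc)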