How to get AWS Glue crawler to assume a role in another AWS account to get data from that account's S3 bucket? - amazon-web-services

There's some CSV data files I need to get in S3 buckets belonging to a series of AWS accounts belonging to a third-party; the owner of the other accounts has created a role in each of the accounts which grants me access to those files; I can use the AWS web console (logged in to my own account) to switch to each role and get the files. One at a time, I switch to the role for each of the accounts and then get the files for that account, then move on to the next account and get those files, and so on.
I'd like to automate this process.
It looks like AWS Glue can do this, but I'm having trouble with the permissions.
What I need it to do is create permissions so that an AWS Glue crawler can switch to the right role (belonging to each of the other AWS accounts) and get the data files from the S3 bucket of those accounts.
Is this possible and if so how can I set it up? (e.g. what IAM roles/permissions are needed?) I'd prefer to limit changes to my own account if possible rather than having to ask the other account owner to make changes on their side.
If it's not possible with Glue, is there some other easy way to do it with a different AWS service?
Thanks!
(I've had a series of tries but I keep getting it wrong - my attempts are so far from being right that there's no point in me posting the details here).

Yes, you can automate your scenario with Glue by following these steps:
Create an IAM role in your AWS account. This role's name must start with AWSGlueServiceRole but you can append whatever you want. Add a trust relationship for Glue, such as:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "glue.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
Attach two IAM policies to your IAM role. The AWS managed policy named AWSGlueServiceRole and a custom policy that provides the access needed to all the target cross account S3 buckets, such as:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "BucketAccess",
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:GetBucketLocation"
],
"Resource": [
"arn:aws:s3:::examplebucket1",
"arn:aws:s3:::examplebucket2",
"arn:aws:s3:::examplebucket3"
]
},
{
"Sid": "ObjectAccess",
"Effect": "Allow",
"Action": "s3:GetObject",
"Resource": [
"arn:aws:s3:::examplebucket1/*",
"arn:aws:s3:::examplebucket2/*",
"arn:aws:s3:::examplebucket3/*"
]
}
]
}
Add S3 bucket policies to each target bucket that allows your IAM role the same S3 access that you granted it in your account, such as:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "BucketAccess",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::your_account_number:role/AWSGlueServiceRoleDefault"
},
"Action": [
"s3:ListBucket",
"s3:GetBucketLocation"
],
"Resource": "arn:aws:s3:::examplebucket1"
},
{
"Sid": "ObjectAccess",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::your_account_number:role/AWSGlueServiceRoleDefault"
},
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::examplebucket1/*"
}
]
}
Finally, create Glue crawlers and jobs in your account (in the same regions as the target cross account S3 buckets) that will ETL the data from the cross account S3 buckets to your account.

Using the AWS CLI, you can create named profiles for each of the roles you want to switch to, then refer to them from the CLI. You can then chain these calls, referencing the named profile for each role, and include them in a script to automate the process.
From Switching to an IAM Role (AWS Command Line Interface)
A role specifies a set of permissions that you can use to access AWS
resources that you need. In that sense, it is similar to a user in AWS
Identity and Access Management (IAM). When you sign in as a user, you
get a specific set of permissions. However, you don't sign in to a
role, but once signed in as a user you can switch to a role. This
temporarily sets aside your original user permissions and instead
gives you the permissions assigned to the role. The role can be in
your own account or any other AWS account. For more information about
roles, their benefits, and how to create and configure them, see IAM
Roles, and Creating IAM Roles.

You can achieve this with AWS lambda and Cloudwatch Rules.
You can create a lambda function that has a role attached to it, lets call this role - Role A, depending on the number of accounts you can either create 1 function per account and create one rule in cloudwatch to trigger all functions or you can create 1 function for all the accounts (be cautious to the limitations of AWS Lambda).
Creating Role A
Create an IAM Role (Role A) with the following policy allowing it to assume the role given to you by the other accounts containing the data.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Stmt1509358389000",
"Effect": "Allow",
"Action": [
"sts:AssumeRole"
],
"Resource": [
"",
"",
....
"
]// all the IAM Role ARN's from the accounts containing the data or if you have 1 function for each account you can opt to have separate roles
}
]
}
Also you will need to make sure that a trust relationship with all the accounts are present in Role A's Trust Relationship policy document.
Attach Role A to the lambda functions you will be running. you can use serverless for development.
Now your lambda function has Role A attached to it and Role A has sts:AssumeRole permissions over the role's created in the other accounts.
Assuming that you have created 1 function for 1 account in you lambda's code you will have to first use STS to switch to the role of the other account and obtain temporary credentials and pass these to S3 options before fetching the required data.
if you have created 1 function for all the accounts you can have the role ARN's in an array and iterate over it, again when doing this be aware of the limits of AWS lambda.

Related

How to give access to one specific bucket?

I have an Amazon AWS account and I'm using Amazon S3.
I'd like to give access to specific people to a Amazon S3 bucket.
Here's what I'd like to do :
Amazon AWS: Access limited to my account
Amazon S3: Access limited to my account
Bucket "website-photos": Access authorized to 3 people that will be able to read and write in the bucket through AWS management console.
Files in the bucket "website-photos": Public can read them.
How can I setup this config?
Just create an IAM policy and attach to the users you want to give access:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "ListObjectsInBucket",
"Effect": "Allow",
"Action": ["s3:ListBucket"],
"Resource": ["arn:aws:s3:::bucket-name"]
},
{
"Sid": "AllObjectActions",
"Effect": "Allow",
"Action": "s3:*Object",
"Resource": ["arn:aws:s3:::bucket-name/*"]
}
]
}
See: Amazon S3: Allows Read and Write Access to Objects in an S3 Bucket - AWS Identity and Access Management
The general approach is:
If you want something to be "public" (accessible by anyone), then use a Bucket Policy
If you want to only assign permissions to a specific IAM User, then attach a policy to the IAM User
If you want to only assign permissions to a group of IAM Users, then create an IAM Group, attach a policy and assign the group to the desired IAM Users

AWS S3 data lake cross account usage

we have the following scenario:
AWS Account A (application) writes data from an application to an S3 bucket owned by account B (data lake). The analysts in account C (reporting) want to proccess the data and build reports and dashboards on top of it.
Account A can write data to the data lake with --acl bucket-owner-full-control to allow Account B the access. But Account C still cannot see and process the data.
One (in our eyes bad) solution is to copy the data to the same location (overwrite) as account B, effectively taking ownership for the data in the process and eliminating the issue. We don't want it, because ... ugly
We tried assuming roles in the different accounts, but it does not work for all our infrastructure. E.g. S3 access via CLI or console is OK, but using it from EMR in account C does not. Also we have on-premise infrastructure (local taskrunners), where this mechanism is not an option.
Maintaining IAM roles for all accounts and users is too much effort. We aim for an automatic solution, not one that we have to take action every time a new user or account is added.
Do you have any suggestions?
One nice and clean way is to use a bucket policy granting read access to the external account (account C) by supplying the account ARN as the principal.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Grant read access to reporting account",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::insertReportingAccountIdHere:root"
},
"Action": [
"s3:GetBucketLocation",
"s3:ListBucket",
"s3:GetObject",
"s3:GetObjectAcl"
],
"Resource": [
"arn:aws:s3:::yourdatalakebucket",
"arn:aws:s3:::yourdatalakebucket/*"
]
}
]
}
This lets the reporting account manage the (ListBucket, gGtObject) permissions on the bucket for its own users, meaning you can now create an IAM policy on Account C with the permission to fetch data from the specified data lake bucket:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Allow reading files from the data lake",
"Effect": "Allow",
"Action": [
"s3:GetBucketLocation",
"s3:ListBucket",
"s3:GetObject",
"s3:GetObjectAcl"
],
"Resource": [
"arn:aws:s3:::yourdatalakebucket",
"arn:aws:s3:::yourdatalakebucket/*"
]
}
]
}
This policy can then be attached to any Account C IAM role or user group you want. For example, you could attach it to your standard Developer or Analyst roles to give access to large groups of users, or you could attach it to a service role to give a particular service access to the bucket.
There is a guide on the Amazon S3 documentation site on how to do this.
You can do via the following documentation,
https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_enable-console-saml.html
Steps:
Create SAML provider
Create Role for the SAML provider, example below
Assign the users role based on saml conditions
E.g., You can create S3 Readers, S3 Writers and assign permissions based on that.
Example Assume Role with SAML:
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": {"Federated": "arn:aws:iam::ACCOUNT-ID-WITHOUT-HYPHENS:saml-provider/ExampleOrgSSOProvider"},
"Action": "sts:AssumeRoleWithSAML",
"Condition": {"StringEquals": {
"saml:edupersonorgdn": "ExampleOrg",
"saml:aud": "https://signin.aws.amazon.com/saml"
}}
}]
}
Hope it helps.
In our case, we solved it using roles in the DataLake account (B), both for write (WriterRole) and read (ReaderRole) access. When writing to the DataLake from Account A, your writer assumes the "WriterRole" in Account B, that has the required permission. When reading from Account C, you assume the "ReaderRole".
The issues with EMR reading, we solved with EMRFS using IAM roles for reading (https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-emrfs-iam-roles.html)

AWS S3 bucket organization access policy

My company has an organization set up in AWS (CompanyA as example). Each team has an account joined to this organization (HR, ProductA, ProductB, ect..). We in ProductA are attempting to grant read-only access to an S3 bucket which serves as a yum repository we own to anyone under this organization from their ec2 instance without auth (yum commands works out of box)
Some items we evaluated:
https://github.com/rmela/yum-s3-plugin -> This would go along with user principal access, users would need to add their keys to pull from the repo
http://parthicloud.com/how-to-access-s3-bucket-from-application-on-amazon-ec2-without-access-credentials/ -> Great tutorial for inside your own account, ec2 instances need to be brought up with a IAM policy to allow access to bucket.
Add a condition to the bucket policy listing your AWS Organization, and allow all principals access. See AWS Global Condition Context Keys, search for aws:PrincipalOrgID. "When you add and remove accounts, policies that include aws:PrincipalOrgID automatically include the correct accounts and don't require manual updating."
The Action and Resource sections in the example below should be the same as for your current policy that lists all the AWS accounts in your organization.
{
"Version": "2012-10-17",
"Statement": {
"Sid": "AllowOrganizationToReadYumBucket",
"Effect": "Allow",
"Principal": "*",
"Action": [
"s3:GetObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::your-yum-bucket",
"arn:aws:s3:::your-yum-bucket/*"
],
"Condition": {
"StringEquals": {"aws:PrincipalOrgID":["o-xxxxxxxxxxx"]}
}
}
}
It is not possible for an Amazon S3 bucket policy to refer to a group of accounts in an AWS Organization.
Your bucket policy would need to list permissions for each account individually. For example:
"Principal":{"AWS":"arn:aws:iam::ACCOUNT-NUMBER:user/*"}

How to lockdown S3 bucket to specific users and IAM role(s)

In our environment, all IAM user accounts are assigned a customer-managed policy that grants read-only access to a lot of AWS services. Here's what I want to do:
Migrate a sql server 2012 express database from on-prem to a RDS instance
Limit access to the S3 bucket containing the database files
Here's the requirements according to AWS:
A S3 bucket to store the .bak database file
A role with access to the bucket
SQLSERVER_BACKUP_RESTORE option attached to RDS instance
So far, I've done the following:
Created a bucket under the name "test-bucket" (and uploaded the .bak file here)
Created a role under the name "rds-s3-role"
Created a policy under the name "rds-s3-policy" with these settings:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:GetBucketLocation"
],
"Resource": "arn:aws:s3:::test-bucket/"
},
{
"Effect": "Allow",
"Action": [
"s3:GetObjectMetaData",
"s3:GetObject",
"s3:PutObject",
"s3:ListMultipartUploadParts",
"s3:AbortMultipartUpload"
],
"Resource": "arn:aws:s3:::test-bucket/*"
}
]
}
Assigned the policy to the role
Gave the AssumeRole permissions to the RDS service to assume the role created above
Created a new option group in RDS with the SQLSERVER_BACKUP_RESTORE option and linked it to my RDS instance
With no restrictions on my S3 bucket, I can perform the restore just fine; however, I can't find a solid way of restricting access to the bucket without hindering the RDS service from doing the restore.
In terms of my attempts to restrict access to the S3 bucket, I found a few posts online recommending using an explicit Deny statement to deny access to all types of principals and grant access based on some conditional statements.
Here's the contents of my bucket policy:
{
"Version": "2012-10-17",
"Id": "Policy1486769843194",
"Statement": [
{
"Sid": "Stmt1486769841856",
"Effect": "Deny",
"Principal": "*",
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::test-bucket",
"arn:aws:s3:::test-bucket/*"
],
"Condition": {
"StringNotLike": {
"aws:userid": [
"<root_id>",
"<user1_userid>",
"<user2_userid>",
"<user3_userid>",
"<role_roleid>:*"
]
}
}
}
]
}
I can confirm the bucket policy does restrict access to only the IAM users that I specified, but I am not sure how it treats IAM roles. I used the :* syntax above per a document I found on the aws forums where the author stated the ":*" is a catch-all for every principal that assumes the specified role.
The only thing I'm having a problem with is, with this bucket policy in place, when I attempt to do the database restore, I get an access denied error. Has anyone ever done something like this? I've been going at it all day and haven't been able to find a working solution.
The following, admittedly, is guesswork... but reading between the lines of the somewhat difficult to navigate IAM documentation and elsewhere, and taking into account the way I originally interpreted it (incorrectly), I suspect that you are using the role's name rather than the role's ID in the policy.
Role IDs look similar to AWSAccessKeyIds except that they begin with AROA....
For the given role, find RoleId in the output from this:
$ aws iam get-role --role-name ROLE-NAME
https://aws.amazon.com/blogs/security/how-to-restrict-amazon-s3-bucket-access-to-a-specific-iam-role/
Use caution when creating a broad Deny policy. You can end up denying s3:PutBucketPolicy to yourself, which leaves you in a situation where your policy prevents you from changing the policy... in which case, your only recourse is presumably to persuade AWS support to remove the bucket policy. A safer configuration would be to use this to deny only the object-level permissions.

Why does this S3 policy not allow me to download files?

This is the policy I have:
{
"Version": "2012-10-17",
"Id": "Policy1477084949492",
"Statement": [
{
"Sid": "Stmt1477084932198",
"Effect": "Allow",
"Principal": "*",
"Action": "s3:*",
"Resource": "arn:aws:s3:::__redacted__"
},
{
"Sid": "Stmt1477084947291",
"Effect": "Allow",
"Principal": "*",
"Action": "s3:*",
"Resource": "arn:aws:s3:::__redacted__/*"
}
]
}
I am able to view the files in the bucket via aws s3 ls but am not able to download.
My understanding is that these permissions should give full access to any AWS identity.
Question- Is there some reason that is not the case here?
Your policy works for me when I test it in my account.
In IAM, a deny overwrites an allow, and I suspect that you have a conflicting policy somewhere. Check all user policies, and groups that the user is a member of for conflicting policies.
You don't explicitly say you are doing this, but just to cover all bases. If you are running the s3 get on an instance with an IAM Role associated with it, check to make sure that the IAM Roles permissions are appropriate.
Depending on what you are actually doing this could explain your situation. If you are using an EC2 instance with an IAM Role, it will be using that IAM Role for permissions by default not your IAM User permissions. If you run aws configure and explicitly configure it with IAM User issued key and secret then it will use the IAM User policies.
Best practices say that if you are performing work on an EC2 instance, where possible and where your use case allows for it; you should not be using keys and secrets on the host but using an EC2 IAM Role.
Additional Reading:
IAM Policy Evaluation Logic
http://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_evaluation-logic.html