AWS S3 data lake cross account usage - amazon-web-services

we have the following scenario:
AWS Account A (application) writes data from an application to an S3 bucket owned by account B (data lake). The analysts in account C (reporting) want to proccess the data and build reports and dashboards on top of it.
Account A can write data to the data lake with --acl bucket-owner-full-control to allow Account B the access. But Account C still cannot see and process the data.
One (in our eyes bad) solution is to copy the data to the same location (overwrite) as account B, effectively taking ownership for the data in the process and eliminating the issue. We don't want it, because ... ugly
We tried assuming roles in the different accounts, but it does not work for all our infrastructure. E.g. S3 access via CLI or console is OK, but using it from EMR in account C does not. Also we have on-premise infrastructure (local taskrunners), where this mechanism is not an option.
Maintaining IAM roles for all accounts and users is too much effort. We aim for an automatic solution, not one that we have to take action every time a new user or account is added.
Do you have any suggestions?

One nice and clean way is to use a bucket policy granting read access to the external account (account C) by supplying the account ARN as the principal.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Grant read access to reporting account",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::insertReportingAccountIdHere:root"
},
"Action": [
"s3:GetBucketLocation",
"s3:ListBucket",
"s3:GetObject",
"s3:GetObjectAcl"
],
"Resource": [
"arn:aws:s3:::yourdatalakebucket",
"arn:aws:s3:::yourdatalakebucket/*"
]
}
]
}
This lets the reporting account manage the (ListBucket, gGtObject) permissions on the bucket for its own users, meaning you can now create an IAM policy on Account C with the permission to fetch data from the specified data lake bucket:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Allow reading files from the data lake",
"Effect": "Allow",
"Action": [
"s3:GetBucketLocation",
"s3:ListBucket",
"s3:GetObject",
"s3:GetObjectAcl"
],
"Resource": [
"arn:aws:s3:::yourdatalakebucket",
"arn:aws:s3:::yourdatalakebucket/*"
]
}
]
}
This policy can then be attached to any Account C IAM role or user group you want. For example, you could attach it to your standard Developer or Analyst roles to give access to large groups of users, or you could attach it to a service role to give a particular service access to the bucket.
There is a guide on the Amazon S3 documentation site on how to do this.

You can do via the following documentation,
https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_enable-console-saml.html
Steps:
Create SAML provider
Create Role for the SAML provider, example below
Assign the users role based on saml conditions
E.g., You can create S3 Readers, S3 Writers and assign permissions based on that.
Example Assume Role with SAML:
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": {"Federated": "arn:aws:iam::ACCOUNT-ID-WITHOUT-HYPHENS:saml-provider/ExampleOrgSSOProvider"},
"Action": "sts:AssumeRoleWithSAML",
"Condition": {"StringEquals": {
"saml:edupersonorgdn": "ExampleOrg",
"saml:aud": "https://signin.aws.amazon.com/saml"
}}
}]
}
Hope it helps.

In our case, we solved it using roles in the DataLake account (B), both for write (WriterRole) and read (ReaderRole) access. When writing to the DataLake from Account A, your writer assumes the "WriterRole" in Account B, that has the required permission. When reading from Account C, you assume the "ReaderRole".
The issues with EMR reading, we solved with EMRFS using IAM roles for reading (https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-emrfs-iam-roles.html)

Related

Allow federated users to access AWS S3 bucket

I have an app in play store that needs to download extra resources after installation. I thought that using Amazon s3 is a good idea to achieve it.(tell me if it is a bad idea)
Now, I would like to allow any logged in user through OpenID to access resources in the bucket.
Problem is;
I can't allow federated users to access items in bucket. What ever i have tried so far is the following access policy;
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "ReadObjects",
"Effect": "Allow",
"Principal": {"Federated": "accounts.google.com"},
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::mysuperbbucket/*"
}
]
}
but somehow AWS bucket policy editor raises an exception on it and saying that "Federated Principal Not Supported: The policy type does not support a federated provider in the principal element. Use a supported principal."
Alright, what to do now?

Grant access to Amazon S3 bucket only to one IAM User

I wish to have a bucket that only one IAM user could access using the AWS Console, list its content and access object files inside it.
So, I have created the IAM user, the bucket itself, and later:
bucket policy as follow:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "statement1",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::0000000:user/dave"
},
"Action": [
"s3:GetBucketLocation",
"s3:ListBucket"
],
"Resource": "arn:aws:s3:::testbucket1234"
},
{
"Sid": "statement2",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::0000000:user/dave"
},
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::testbucket1234/*"
}
]
}
And also a inline policy attached to my user's group, as follow:
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": [
"s3:*Object",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::testbucket1234/*"
},
{
"Sid": "VisualEditor1",
"Effect": "Allow",
"Action": "s3:ListAllMyBuckets",
"Resource": "*"
}
]
}
Now: I can list my buckets, access the desired bucket, list its content (so far so good). The problem is when I try to open one file object inside the bucket and I get "access denied" error. If I turn the object public, I can access it, but I can also access it using other IAM accounts, and that is not the intention. I want to access the bucket, list its contents and access objects only by usage of this specific IAM account. What am I doing wrong? How can I reach this goal? Thanks in advance.
By default, no IAM User can access any bucket. It is only by granting permissions to users that they can access resources.
However, many people tend to grant Amazon S3 permissions for all buckets, at least for Administrators. This then makes it difficult to remove permissions so that a bucket can only be accessed by one user. While it can be done with Deny policies, such policies are difficult to craft correctly.
For situations where specific data should only be accessed by one user, or a specific group of users (eg HR staff), I would recommend that you create a separate AWS Account and only grant permission to specific IAM Users or IAM Groups via a Bucket Policy (which works fine cross-account). This way, any generic policies that grant access to "all buckets" will not apply to buckets in this separate account.
Update: Accessing private objects
Expanding on what is mentioned in the comments below, a private object in Amazon S3 can be accessed by an authorized user. However, when accessing the object, it is necessary to identify who is accessing the object and their identity must be proved. This can be done in one of several ways:
In the Amazon S3 management console, use the Open command (in the Actions menu). This will open the object using a pre-signed URL that authorizes the access based upon the user who logged into the console. The same method is used for the Download option.
Using the AWS Command-Line Interface (CLI), you can download objects. The AWS CLI needs to be pre-configured with your IAM security credentials to prove your identity.
Programs using an AWS SDK can access S3 objects using their IAM security credentials. In fact, the AWS CLI is simply a Python program that uses the AWS SDK.
If you want to access the object via a URL, an application can generate an Amazon S3 pre-signed URLs. This URL includes the user's identity and a security signature that grants access to a private object for a limited period (eg 5 minutes). This method is commonly used when web applications want to grant access to a private object, such as a document or photo. The S3 management console actually uses this method when a user selects Actions/Open, so that the user can view a private object in their browser.

How to get AWS Glue crawler to assume a role in another AWS account to get data from that account's S3 bucket?

There's some CSV data files I need to get in S3 buckets belonging to a series of AWS accounts belonging to a third-party; the owner of the other accounts has created a role in each of the accounts which grants me access to those files; I can use the AWS web console (logged in to my own account) to switch to each role and get the files. One at a time, I switch to the role for each of the accounts and then get the files for that account, then move on to the next account and get those files, and so on.
I'd like to automate this process.
It looks like AWS Glue can do this, but I'm having trouble with the permissions.
What I need it to do is create permissions so that an AWS Glue crawler can switch to the right role (belonging to each of the other AWS accounts) and get the data files from the S3 bucket of those accounts.
Is this possible and if so how can I set it up? (e.g. what IAM roles/permissions are needed?) I'd prefer to limit changes to my own account if possible rather than having to ask the other account owner to make changes on their side.
If it's not possible with Glue, is there some other easy way to do it with a different AWS service?
Thanks!
(I've had a series of tries but I keep getting it wrong - my attempts are so far from being right that there's no point in me posting the details here).
Yes, you can automate your scenario with Glue by following these steps:
Create an IAM role in your AWS account. This role's name must start with AWSGlueServiceRole but you can append whatever you want. Add a trust relationship for Glue, such as:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "glue.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
Attach two IAM policies to your IAM role. The AWS managed policy named AWSGlueServiceRole and a custom policy that provides the access needed to all the target cross account S3 buckets, such as:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "BucketAccess",
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:GetBucketLocation"
],
"Resource": [
"arn:aws:s3:::examplebucket1",
"arn:aws:s3:::examplebucket2",
"arn:aws:s3:::examplebucket3"
]
},
{
"Sid": "ObjectAccess",
"Effect": "Allow",
"Action": "s3:GetObject",
"Resource": [
"arn:aws:s3:::examplebucket1/*",
"arn:aws:s3:::examplebucket2/*",
"arn:aws:s3:::examplebucket3/*"
]
}
]
}
Add S3 bucket policies to each target bucket that allows your IAM role the same S3 access that you granted it in your account, such as:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "BucketAccess",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::your_account_number:role/AWSGlueServiceRoleDefault"
},
"Action": [
"s3:ListBucket",
"s3:GetBucketLocation"
],
"Resource": "arn:aws:s3:::examplebucket1"
},
{
"Sid": "ObjectAccess",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::your_account_number:role/AWSGlueServiceRoleDefault"
},
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::examplebucket1/*"
}
]
}
Finally, create Glue crawlers and jobs in your account (in the same regions as the target cross account S3 buckets) that will ETL the data from the cross account S3 buckets to your account.
Using the AWS CLI, you can create named profiles for each of the roles you want to switch to, then refer to them from the CLI. You can then chain these calls, referencing the named profile for each role, and include them in a script to automate the process.
From Switching to an IAM Role (AWS Command Line Interface)
A role specifies a set of permissions that you can use to access AWS
resources that you need. In that sense, it is similar to a user in AWS
Identity and Access Management (IAM). When you sign in as a user, you
get a specific set of permissions. However, you don't sign in to a
role, but once signed in as a user you can switch to a role. This
temporarily sets aside your original user permissions and instead
gives you the permissions assigned to the role. The role can be in
your own account or any other AWS account. For more information about
roles, their benefits, and how to create and configure them, see IAM
Roles, and Creating IAM Roles.
You can achieve this with AWS lambda and Cloudwatch Rules.
You can create a lambda function that has a role attached to it, lets call this role - Role A, depending on the number of accounts you can either create 1 function per account and create one rule in cloudwatch to trigger all functions or you can create 1 function for all the accounts (be cautious to the limitations of AWS Lambda).
Creating Role A
Create an IAM Role (Role A) with the following policy allowing it to assume the role given to you by the other accounts containing the data.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Stmt1509358389000",
"Effect": "Allow",
"Action": [
"sts:AssumeRole"
],
"Resource": [
"",
"",
....
"
]// all the IAM Role ARN's from the accounts containing the data or if you have 1 function for each account you can opt to have separate roles
}
]
}
Also you will need to make sure that a trust relationship with all the accounts are present in Role A's Trust Relationship policy document.
Attach Role A to the lambda functions you will be running. you can use serverless for development.
Now your lambda function has Role A attached to it and Role A has sts:AssumeRole permissions over the role's created in the other accounts.
Assuming that you have created 1 function for 1 account in you lambda's code you will have to first use STS to switch to the role of the other account and obtain temporary credentials and pass these to S3 options before fetching the required data.
if you have created 1 function for all the accounts you can have the role ARN's in an array and iterate over it, again when doing this be aware of the limits of AWS lambda.

How to lockdown S3 bucket to specific users and IAM role(s)

In our environment, all IAM user accounts are assigned a customer-managed policy that grants read-only access to a lot of AWS services. Here's what I want to do:
Migrate a sql server 2012 express database from on-prem to a RDS instance
Limit access to the S3 bucket containing the database files
Here's the requirements according to AWS:
A S3 bucket to store the .bak database file
A role with access to the bucket
SQLSERVER_BACKUP_RESTORE option attached to RDS instance
So far, I've done the following:
Created a bucket under the name "test-bucket" (and uploaded the .bak file here)
Created a role under the name "rds-s3-role"
Created a policy under the name "rds-s3-policy" with these settings:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:GetBucketLocation"
],
"Resource": "arn:aws:s3:::test-bucket/"
},
{
"Effect": "Allow",
"Action": [
"s3:GetObjectMetaData",
"s3:GetObject",
"s3:PutObject",
"s3:ListMultipartUploadParts",
"s3:AbortMultipartUpload"
],
"Resource": "arn:aws:s3:::test-bucket/*"
}
]
}
Assigned the policy to the role
Gave the AssumeRole permissions to the RDS service to assume the role created above
Created a new option group in RDS with the SQLSERVER_BACKUP_RESTORE option and linked it to my RDS instance
With no restrictions on my S3 bucket, I can perform the restore just fine; however, I can't find a solid way of restricting access to the bucket without hindering the RDS service from doing the restore.
In terms of my attempts to restrict access to the S3 bucket, I found a few posts online recommending using an explicit Deny statement to deny access to all types of principals and grant access based on some conditional statements.
Here's the contents of my bucket policy:
{
"Version": "2012-10-17",
"Id": "Policy1486769843194",
"Statement": [
{
"Sid": "Stmt1486769841856",
"Effect": "Deny",
"Principal": "*",
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::test-bucket",
"arn:aws:s3:::test-bucket/*"
],
"Condition": {
"StringNotLike": {
"aws:userid": [
"<root_id>",
"<user1_userid>",
"<user2_userid>",
"<user3_userid>",
"<role_roleid>:*"
]
}
}
}
]
}
I can confirm the bucket policy does restrict access to only the IAM users that I specified, but I am not sure how it treats IAM roles. I used the :* syntax above per a document I found on the aws forums where the author stated the ":*" is a catch-all for every principal that assumes the specified role.
The only thing I'm having a problem with is, with this bucket policy in place, when I attempt to do the database restore, I get an access denied error. Has anyone ever done something like this? I've been going at it all day and haven't been able to find a working solution.
The following, admittedly, is guesswork... but reading between the lines of the somewhat difficult to navigate IAM documentation and elsewhere, and taking into account the way I originally interpreted it (incorrectly), I suspect that you are using the role's name rather than the role's ID in the policy.
Role IDs look similar to AWSAccessKeyIds except that they begin with AROA....
For the given role, find RoleId in the output from this:
$ aws iam get-role --role-name ROLE-NAME
https://aws.amazon.com/blogs/security/how-to-restrict-amazon-s3-bucket-access-to-a-specific-iam-role/
Use caution when creating a broad Deny policy. You can end up denying s3:PutBucketPolicy to yourself, which leaves you in a situation where your policy prevents you from changing the policy... in which case, your only recourse is presumably to persuade AWS support to remove the bucket policy. A safer configuration would be to use this to deny only the object-level permissions.

S3 IAM Policy to access other account

We need to create an IAM user that is allowed to access buckets in our client's S3 accounts (provided that they have allowed us access to those buckets as well).
We have created an IAM user in our account with the following inline policy:
{
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:AbortMultipartUpload",
"s3:PutObjectAcl",
"s3:ListMultipartUploadParts",
"s3:PutObject",
"s3:ListBucketMultipartUploads",
"s3:GetBucketLocation"
],
"Resource": "arn:aws:s3:::*"
}
]
}
In addition to this, we will request that our clients use the following policy and apply it to their relevant bucket:
{
"Version": "2008-10-17",
"Id": "Policy1416999097026",
"Statement": [
{
"Sid": "Stmt1416998971331",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::229569340673:user/our-iam-user"
},
"Action": [
"s3:AbortMultipartUpload",
"s3:PutObjectAcl",
"s3:ListMultipartUploadParts",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::client-bucket-name/*"
},
{
"Sid": "Stmt1416999025675",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::229569340673:user/our-iam-user"
},
"Action": [
"s3:ListBucketMultipartUploads",
"s3:GetBucketLocation"
],
"Resource": "arn:aws:s3:::client-bucket-name"
}
]
}
Whilst this all seems to work fine, the one major issue that we have discovered is our own internal inline policy seems to give full access to our-iam-user to all of our own internal buckets.
Have we mis-configured something, or are we missing something else obvious here?
According to AWS support, this is not the right way to approach the problem:
https://forums.aws.amazon.com/message.jspa?messageID=618606
I am copying the answer from them here.
AWS:
The policy you're using with your IAM user grants access to any Amazon S3 bucket. In this case this will include any S3 bucket in your account and any bucket in any other account, where the account owner has granted your user access. You'll want to be more specific with the policy of your IAM user. For example, the following policy will limit your IAM user access to a single bucket.
You can also grant access to an array of buckets, if the user requires access to more than one.
Me
Unfortunately, we don't know beforehand all of our client's bucket names when we create the inline policy. As we get more and more clients to our service, it would be impractical to keep adding new client bucket names to the inline policy.
I guess another option is to create a new AWS account used solely for the above purpose - i.e. this account will not itself own anything, and will only ever be used for uploading to client buckets.
Is this acceptable, or are there any other alternatives options open to us?
AWS
Having a separate AWS account would provide clear security boundaries. Keep in mind that if you ever create a bucket in that other account, the user would inherit access to any bucket if you grant access to "arn:aws:s3:::*".
Another approach would be to use blacklisting (note whitelisting as suggested above is a better practice).
As you can see, the 2nd statement explicitly denies access to an array of buckets. This will override the allow in the first statment. The disadvantage here is that by default the user will inherit access to any new bucket. Therefore, you'd need to be diligent about adding new buckets to the blacklist. Either approach will require you to maintain changes to the policy. Therefore, I recommend my previous policy (aka whitelisting) where you only grant access to the S3 buckets that the user requires.
Conclusion
For our purposes, the white listing/blacklisting approach is not acceptable because we don't know before all the buckets that will be supplied by our clients. In the end, we went the route of creating a new AWS account with a single user, and that user does not have of its own s3 buckets
The policy you grant to your internal user gives this user access to all S3 bucket for the API listed (the first policy in your question). This is unnecessary as your client's bucket policies will grant your user required privileges to access to client's bucket.
To solve your problem, remove the user policy - or - explicitly your client's bucket in the list of allowed [Resources] instead of using "*"