Granting tens of thousands of AWS accounts access to a bucket? - amazon-web-services

We are a humble startup that mines data from the entire Internet and put them in an Amazon S3 bucket to share with the world. For now we have 2TB of data and soon we may reach the 20TB mark.
Our subscribers will be able to download all the data from the Amazon S3 bucket we have. We have to opt for requester pays for the bandwidth apparently unless we want to end up with some heart breaking S3 bills.
Pre-signed URL is not an option because it doesn't seem to audit bandwidth usage in real time, thus is vulnerable to download abuses.
After some research this seems to be the way to grant different AWS accounts the needed permissions to access our bucket:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Permissions to foreign account 1",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::ForeignAccount-ID-1:root"
},
"Action": [
"s3:GetBucketLocation",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::ourbucket"
]
},
{
"Sid": "Permissions to foreign account 2",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::ForeignAccount-ID-2:root"
},
"Action": [
"s3:GetBucketLocation",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::ourbucket"
]
},
{
"Sid": "Permissions to foreign account 3",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::ForeignAccount-ID-3:root"
},
"Action": [
"s3:GetBucketLocation",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::ourbucket"
]
},
......
]
}
Wherein ForeignAccount-ID-x is the account ID e.g. 2222-2222-2222.
However the issue is, we may potentially have tens of thousands or even more subscribers to this bucket.
Is this the right and efficient way to add permissions for them to access this bucket?
Would it pose any performance difficulties to this bucket considering each request would go through this mountainous bucket policy?
Any better solutions for this problem?

Your requirement for Amazon S3 Requester Pays Buckets is understandable, but leads to other limitations.
User will need their own AWS account to authenticate — it will not work with federated logins such as AWS Cognito. Also, pre-signed URLs aren't of benefit because they are generated from an AWS account too.
Bucket policies are limited to 20KB and ACLs are limited to 100 grants.
So, this approach seems unlikely to work.
Another option would be to create a mechanism where your system can push content to another user's AWS account. They would need to provide a destination bucket and some form of access (eg an IAM Role that can be assumed) and your application could copy files to their bucket. However, this could be difficult for regularly-published data.
Another option would be to allow access to the content only from within the same AWS Region. Thus, users would be able to read and process the data in AWS using services such as Amazon EMR. They could write applications on EC2 that access the data in Amazon S3. They would be able to copy the data to their own buckets. The only thing they cannot do is access the data from outside AWS. This would eliminate Data Transfer costs. The data could even be provided in multiple regions to serve worldwide users.
A final option would be to propose your dataset to the AWS Public Dataset Program, which will cover the cost of storage and data transfer for "publicly available high-value cloud-optimized datasets".

Related

How can I find what external S3 buckets (AWS-owned) are being accessed?

I'm using WorkSpaces Web (not WorkSpaces!) with an S3 VPC endpoint. I would like to be able to restrict S3 access via the S3 endpoint policy to only the buckets required by WorkSpaces Web. I cannot find any documentation with the answers, and AWS support does not seem to know what these buckets are. How can I find out what buckets the service is talking to? I see the requests in VPC flow logs, but that obviously doesn't show what URL or bucket it is trying to talk to. I have tried the same policy used for WorkSpaces (below), but it was not correct (or possibly not enough). I have confirmed that s3:GetObject is the only action needed.
{
"Version": "2008-10-17",
"Statement": [
{
"Sid": "Access-to-specific-bucket-only",
"Effect": "Allow",
"Principal": "*",
"Action": "s3:GetObject",
"Resource": [
"arn:aws:s3:::aws-windows-downloads-us-east-1/*",
"arn:aws:s3:::amazon-ssm-us-east-1/*",
"arn:aws:s3:::amazon-ssm-packages-us-east-1/*",
"arn:aws:s3:::us-east-1-birdwatcher-prod/*",
"arn:aws:s3:::aws-ssm-distributor-file-us-east-1/*",
"arn:aws:s3:::aws-ssm-document-attachments-us-east-1/*",
"arn:aws:s3:::patch-baseline-snapshot-us-east-1/*",
"arn:aws:s3:::amazonlinux.*.amazonaws.com/*",
"arn:aws:s3:::repo.*.amazonaws.com/*",
"arn:aws:s3:::packages.*.amazonaws.com/*"
]
}
]
}

Monitor Number of S3 Buckets in account

There is a limit of 100 buckets per AWS account. My application is creating buckets when certain conditions are met. Is there a mechanism to monitor the number of buckets created in my account? I would like to alarm/get notified before I reach the 100 bucket limit.
Edit: The plan is to create prefix per customer and grant access to the prefix using Resource Policy. The customers would be uploading objects to only the prefix they have access to. We would update resource policy every time we create a new prefix. Sample policy as shown below. Once we hit limit on Resource Policy size for bucket, we would then need to create new bucket.
"Statement": [
{
"Sid": "AllowGetObject",
"Effect": "Allow",
"Principal": {
"AWS":"123456789012"
},
"Action": "s3:PutObject",
"Resource": [
"arn:aws:s3:::TestBucketName/123456789012/*",
"arn:aws:s3:::TestBucketName/123456789012"
]
}
]
Unfortunately for S3 there is no AWS backed solution that performs all of the actions for monitoring S3.
To do this you would need to create your own solution, the below is a suggestion for covering this problem:
Use a Lambda function to call the list-buckets function, counting the total number of buckets in your account. Push the value to CloudWatch as a custom metric.
Create a CloudWatch alarm for this metric based on a specific threshold.
Create a Lambda function and use the list-service-quotas function to get your service quotas for S3 buckets. Use this to update the alarm thresholds.
Set both of these Lambda functions on a scheduled CloudWatch event.
For other services quotas you might be able to take advantage of the Trusted Advisor API if you are using Business or Enterprise support plan, however this only covers specific quotas for services.
If your application is running on node.js, you can get the number of buckets using the following code:
const s3 = new AWS.S3();
s3.listBuckets({}, (err, data) => {
if (err) console.log(err);
else console.log(data.Buckets.length);
}
It appears that:
You are providing customers with credentials associated with an IAM User (not a good practice because generally IAM User credentials are for your internal staff, not external entities)
You want to allow customers to upload data to Amazon S3
I would recommend:
Use one Amazon S3 bucket
Allow customers to access their own folder (Prefix) within the bucket
This can be done by creating a bucket policy that uses IAM Policy Variables, which can automatically insert the username into the policy. This allows one policy to apply differently for every user.
Here is an example from IAM policy elements: Variables and tags - AWS Identity and Access Management:
{
"Version": "2012-10-17",
"Statement": [
{
"Action": ["s3:ListBucket"],
"Effect": "Allow",
"Resource": ["arn:aws:s3:::mybucket"],
"Condition": {"StringLike": {"s3:prefix": ["${aws:username}/*"]}}
},
{
"Action": [
"s3:GetObject",
"s3:PutObject"
],
"Effect": "Allow",
"Resource": ["arn:aws:s3:::mybucket/${aws:username}/*"]
}
]
}
This way, users can access their own folder, but cannot access other users' folders.

My AS3 Bucket Policy only applies to some Objects

I'm having a really hard time setting up my bucket policy, it looks like my bucket policy only applies to some objects in my bucket.
What I want is pretty simple: I store video files in the bucket and I want them to be exclusively downloadable from my webiste.
My approach is to block everything by default, and then add allow rules:
Give full rights to root and Alice user.
Give public access to files in my bucket from only specific referers (my websites).
Note:
I manually made all the objects 'public' and my settings for Block Public Access are all set to Off.
Can anyone see any obvious errors in my bucket policy?
I don't understand why my policy seems to only work for some files.
Thank you so much
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Deny",
"Principal": "*",
"Action": "s3:*",
"Resource": "arn:aws:s3:::MY_BUCKET/*",
"Condition": {
"StringNotLike": {
"aws:Referer": [
"https://mywebsite1.com/*",
"https://mywebsite2.com/*"
]
}
}
},
{
"Effect": "Allow",
"Principal": "*",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::MY_BUCKET/*",
"Condition": {
"StringLike": {
"aws:Referer": [
"https://mywebsite1.com/*",
"https://mywebsite2.com/*"
]
}
}
},
{
"Effect": "Allow",
"Principal": {
"AWS": [
"arn:aws:iam::426873019732:root",
"arn:aws:iam::426873019732:user/alice"
]
},
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::MY_BUCKET",
"arn:aws:s3:::MY_BUCKET/*"
]
}
]
}
Controlling access via aws:Referer is not secure. It can be overcome quite easily. A simple web search will provide many tools that can accomplish this.
The more secure method would be:
Keep all objects in your Amazon S3 bucket private (do not "Make Public")
Do not use a Bucket Policy
Users should authenticate to your application
When a user wishes to access one of the videos, or when your application creates an HTML page that refers/embeds a video, the application should determine whether the user is entitled to access the object.
If the user is entitled to access the object, the application creates an Amazon S3 pre-signed URL, which provides time-limited access to a private object.
When the user's browser requests to retrieve the object via the pre-signed URL, Amazon S3 will verify the contents of the URL. If the URL is valid and the time limit has not expired, Amazon S3 will return the object (eg the video). If the time has expired, the contents will not be provided.
The pre-signed URL can be created in a couple of lines of code and does not require and API call back to Amazon S3.
The benefit of using pre-signed URLs is that your application determines who is entitled to view objects. For example, a user could choose to share a video with another user. Your application would permit the other user to view this shared video. It would not require any changes to IAM or bucket policies.
See: Amazon S3 pre-signed URLs
Also, if you wish to grant access to an Amazon S3 bucket to specific IAM Users (that is, users within your organization, rather than application users), it is better to grant access on the IAM User rather than via an Amazon S3 bucket. If there are many users, you can create an IAM Group that contains multiple IAM Users, and then put the policy on the IAM Group. Bucket Policies should generally be used for granting access to "everyone" rather than specific IAM Users.
In general, it is advisable to avoid using Deny policies since they can be difficult to write correctly and might inadvertently deny access to your Admin staff. It is better to limit what is being Allowed, rather than having to combine Allow and Deny.

AWS S3 data lake cross account usage

we have the following scenario:
AWS Account A (application) writes data from an application to an S3 bucket owned by account B (data lake). The analysts in account C (reporting) want to proccess the data and build reports and dashboards on top of it.
Account A can write data to the data lake with --acl bucket-owner-full-control to allow Account B the access. But Account C still cannot see and process the data.
One (in our eyes bad) solution is to copy the data to the same location (overwrite) as account B, effectively taking ownership for the data in the process and eliminating the issue. We don't want it, because ... ugly
We tried assuming roles in the different accounts, but it does not work for all our infrastructure. E.g. S3 access via CLI or console is OK, but using it from EMR in account C does not. Also we have on-premise infrastructure (local taskrunners), where this mechanism is not an option.
Maintaining IAM roles for all accounts and users is too much effort. We aim for an automatic solution, not one that we have to take action every time a new user or account is added.
Do you have any suggestions?
One nice and clean way is to use a bucket policy granting read access to the external account (account C) by supplying the account ARN as the principal.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Grant read access to reporting account",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::insertReportingAccountIdHere:root"
},
"Action": [
"s3:GetBucketLocation",
"s3:ListBucket",
"s3:GetObject",
"s3:GetObjectAcl"
],
"Resource": [
"arn:aws:s3:::yourdatalakebucket",
"arn:aws:s3:::yourdatalakebucket/*"
]
}
]
}
This lets the reporting account manage the (ListBucket, gGtObject) permissions on the bucket for its own users, meaning you can now create an IAM policy on Account C with the permission to fetch data from the specified data lake bucket:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Allow reading files from the data lake",
"Effect": "Allow",
"Action": [
"s3:GetBucketLocation",
"s3:ListBucket",
"s3:GetObject",
"s3:GetObjectAcl"
],
"Resource": [
"arn:aws:s3:::yourdatalakebucket",
"arn:aws:s3:::yourdatalakebucket/*"
]
}
]
}
This policy can then be attached to any Account C IAM role or user group you want. For example, you could attach it to your standard Developer or Analyst roles to give access to large groups of users, or you could attach it to a service role to give a particular service access to the bucket.
There is a guide on the Amazon S3 documentation site on how to do this.
You can do via the following documentation,
https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_enable-console-saml.html
Steps:
Create SAML provider
Create Role for the SAML provider, example below
Assign the users role based on saml conditions
E.g., You can create S3 Readers, S3 Writers and assign permissions based on that.
Example Assume Role with SAML:
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": {"Federated": "arn:aws:iam::ACCOUNT-ID-WITHOUT-HYPHENS:saml-provider/ExampleOrgSSOProvider"},
"Action": "sts:AssumeRoleWithSAML",
"Condition": {"StringEquals": {
"saml:edupersonorgdn": "ExampleOrg",
"saml:aud": "https://signin.aws.amazon.com/saml"
}}
}]
}
Hope it helps.
In our case, we solved it using roles in the DataLake account (B), both for write (WriterRole) and read (ReaderRole) access. When writing to the DataLake from Account A, your writer assumes the "WriterRole" in Account B, that has the required permission. When reading from Account C, you assume the "ReaderRole".
The issues with EMR reading, we solved with EMRFS using IAM roles for reading (https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-emrfs-iam-roles.html)

S3 IAM Policy to access other account

We need to create an IAM user that is allowed to access buckets in our client's S3 accounts (provided that they have allowed us access to those buckets as well).
We have created an IAM user in our account with the following inline policy:
{
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:AbortMultipartUpload",
"s3:PutObjectAcl",
"s3:ListMultipartUploadParts",
"s3:PutObject",
"s3:ListBucketMultipartUploads",
"s3:GetBucketLocation"
],
"Resource": "arn:aws:s3:::*"
}
]
}
In addition to this, we will request that our clients use the following policy and apply it to their relevant bucket:
{
"Version": "2008-10-17",
"Id": "Policy1416999097026",
"Statement": [
{
"Sid": "Stmt1416998971331",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::229569340673:user/our-iam-user"
},
"Action": [
"s3:AbortMultipartUpload",
"s3:PutObjectAcl",
"s3:ListMultipartUploadParts",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::client-bucket-name/*"
},
{
"Sid": "Stmt1416999025675",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::229569340673:user/our-iam-user"
},
"Action": [
"s3:ListBucketMultipartUploads",
"s3:GetBucketLocation"
],
"Resource": "arn:aws:s3:::client-bucket-name"
}
]
}
Whilst this all seems to work fine, the one major issue that we have discovered is our own internal inline policy seems to give full access to our-iam-user to all of our own internal buckets.
Have we mis-configured something, or are we missing something else obvious here?
According to AWS support, this is not the right way to approach the problem:
https://forums.aws.amazon.com/message.jspa?messageID=618606
I am copying the answer from them here.
AWS:
The policy you're using with your IAM user grants access to any Amazon S3 bucket. In this case this will include any S3 bucket in your account and any bucket in any other account, where the account owner has granted your user access. You'll want to be more specific with the policy of your IAM user. For example, the following policy will limit your IAM user access to a single bucket.
You can also grant access to an array of buckets, if the user requires access to more than one.
Me
Unfortunately, we don't know beforehand all of our client's bucket names when we create the inline policy. As we get more and more clients to our service, it would be impractical to keep adding new client bucket names to the inline policy.
I guess another option is to create a new AWS account used solely for the above purpose - i.e. this account will not itself own anything, and will only ever be used for uploading to client buckets.
Is this acceptable, or are there any other alternatives options open to us?
AWS
Having a separate AWS account would provide clear security boundaries. Keep in mind that if you ever create a bucket in that other account, the user would inherit access to any bucket if you grant access to "arn:aws:s3:::*".
Another approach would be to use blacklisting (note whitelisting as suggested above is a better practice).
As you can see, the 2nd statement explicitly denies access to an array of buckets. This will override the allow in the first statment. The disadvantage here is that by default the user will inherit access to any new bucket. Therefore, you'd need to be diligent about adding new buckets to the blacklist. Either approach will require you to maintain changes to the policy. Therefore, I recommend my previous policy (aka whitelisting) where you only grant access to the S3 buckets that the user requires.
Conclusion
For our purposes, the white listing/blacklisting approach is not acceptable because we don't know before all the buckets that will be supplied by our clients. In the end, we went the route of creating a new AWS account with a single user, and that user does not have of its own s3 buckets
The policy you grant to your internal user gives this user access to all S3 bucket for the API listed (the first policy in your question). This is unnecessary as your client's bucket policies will grant your user required privileges to access to client's bucket.
To solve your problem, remove the user policy - or - explicitly your client's bucket in the list of allowed [Resources] instead of using "*"