I have an AWS S3 bucket with different datasets in it. I need to share the data securely with both public users and internal users, with different access levels. For example, one consumer needs to use the data in Tableau. The data also needs to be shared with public users who do not have AWS accounts, as a bidirectional workflow (they need to read and write). I know that with Amazon S3 Access Points we can give different users different access levels, but I'm assuming that Access Points are only usable within the AWS account and for internal purposes. How can we expose data securely for both internal and external users? Does Amazon Cognito work for this scenario?
Currently, we use permanent AWS IAM user credentials to transfer customers' data from our company's internal Amazon S3 buckets to customers' Google BigQuery tables, following the BigQuery Data Transfer Service documentation.
Using permanent credentials poses security risks to the data stored in Amazon S3.
We would like to use temporary AWS IAM role credentials instead, which requires the BigQuery side to support a session token in order to be authorized on the AWS side.
Is there a way that the BigQuery Data Transfer Service can use AWS IAM roles or temporary credentials to authorise against AWS and transfer data?
We considered the Omni framework (https://cloud.google.com/bigquery/docs/omni-aws-cross-cloud-transfer) to transfer data from S3 to BQ; however, we ran into several concerns/limitations:
The Omni framework targets data analysis use cases rather than data transfer from external services. This makes us concerned that the Omni design may have drawbacks for data transfer at high scale.
The Omni framework currently supports only the AWS-US-EAST-1 region (we require support for at least AWS-US-WEST-2 and AWS-EU-CENTRAL-1, plus the corresponding Google regions). This is not backward compatible with our current customers' setup for transferring data from internal S3 to customers' BQ.
Our current customers would need to sign up for the Omni service to migrate properly from the transfer solution we currently use.
We also considered a workaround that stages the data in GCS (i.e. S3 -> GCS -> BQ), but this would also require a lot of migration effort on both the customers' side and our company's side.
Is there a way that the BigQuery Data Transfer Service can use AWS IAM roles or temporary credentials to authorise against AWS and transfer data?
No, unfortunately.
The official BigQuery Data Transfer Service documentation only mentions AWS access keys throughout:
The access key ID and secret access key are used to access the Amazon S3 data on your behalf. As a best practice, create a unique access key ID and secret access key specifically for Amazon S3 transfers to give minimal access to the BigQuery Data Transfer Service. For information on managing your access keys, see the AWS general reference documentation.
The irony of the Google documentation is that while it refers to best practices and links to the official AWS docs, it doesn't actually follow those best practices and ignores what AWS says:
We recommend that you use temporary access keys over long term access keys, as mentioned in the previous section.
Important
Unless there is no other option, we strongly recommend that you don't create long-term access keys for your (root) user. If a malicious user gains access to your (root) user access keys, they can completely take over your account.
You have a few options:
hook into both sides manually (i.e. link up various SDKs and/or APIs)
find an alternative BigQuery-compatible service that does support temporary credentials
accept the risk of long-term access keys.
In conclusion, Google is at fault here for not following security best practices, and you, as the consumer, have to bear the risk.
Suppose you have to share data stored in AWS with a third party over the internet. What would be the most secure and easiest way to do this?
Since sending the data by email is not very secure, I thought of creating an S3 bucket and running an SFTP server on it (with AWS Transfer Family). Is there a better solution in AWS to achieve this?
This depends on how you want to "share data" and where that data resides.
Let's say you have an object in Amazon S3 that you would like to make available. There are several options for sharing access:
You could create an Amazon S3 pre-signed URL, which provides time-limited access to a private object. This is similar to storing something in Dropbox and using the "Get Link" command to obtain a special URL that provides access to the object.
If the other people have their own AWS Account, you could share a specific bucket or an object with them. This has the benefit that you could put objects in a bucket and they can retrieve any of them whenever they wish.
You could write a web application that requires users to authenticate and then gives them the ability to access objects in Amazon S3. This would be similar to a photo-sharing website, where people login and can access/share photos. You would be responsible for writing this application and managing the authentication.
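For the first option, a pre-signed URL is an ordinary S3 GET request whose Signature Version 4 signature is carried in the query string. In practice you would let an SDK produce it (for example boto3's generate_presigned_url); the sketch below spells out the signing steps with only the Python standard library, to show what the URL actually contains. It assumes a virtual-hosted-style endpoint (bucket.s3.region.amazonaws.com), a GET request with only the host header signed, and placeholder credentials; the function name presign_s3_get is hypothetical.

```python
import datetime
import hashlib
import hmac
from typing import Optional
from urllib.parse import quote


def _hmac(key: bytes, msg: str) -> bytes:
    """HMAC-SHA256 helper used to derive the SigV4 signing key."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_s3_get(
    bucket: str,
    key: str,
    region: str,
    access_key_id: str,
    secret_access_key: str,
    expires_in: int = 3600,
    session_token: Optional[str] = None,
    now: Optional[datetime.datetime] = None,
) -> str:
    """Build a SigV4 query-string-signed GET URL for a private S3 object (sketch)."""
    if not 1 <= expires_in <= 604800:  # S3 caps pre-signed URLs at 7 days
        raise ValueError("expires_in must be between 1 and 604800 seconds")
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    host = f"{bucket}.s3.{region}.amazonaws.com"  # assumed virtual-hosted-style endpoint
    scope = f"{date_stamp}/{region}/s3/aws4_request"

    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key_id}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires_in),
        "X-Amz-SignedHeaders": "host",
    }
    if session_token:  # required when signing with temporary STS credentials
        params["X-Amz-Security-Token"] = session_token

    # Canonical query string: percent-encode names and values, sort by name.
    canonical_query = "&".join(
        f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in sorted(params.items())
    )
    canonical_uri = quote("/" + key, safe="/")  # encode the key but keep "/"
    canonical_request = "\n".join([
        "GET",
        canonical_uri,
        canonical_query,
        f"host:{host}\n",  # the canonical headers block ends with a newline
        "host",
        "UNSIGNED-PAYLOAD",  # pre-signed URLs do not sign the body
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # Derive the signing key: secret -> date -> region -> service -> "aws4_request".
    signing_key = _hmac(("AWS4" + secret_access_key).encode("utf-8"), date_stamp)
    for part in (region, "s3", "aws4_request"):
        signing_key = _hmac(signing_key, part)
    signature = hmac.new(
        signing_key, string_to_sign.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return f"https://{host}{canonical_uri}?{canonical_query}&X-Amz-Signature={signature}"
```

Anyone holding the URL can download the object until it expires, so treat the URL itself as a secret. If you sign with temporary STS credentials, the URL also stops working once those credentials expire.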
Update
Based on the information you provided (S3, few users, automated), the easiest method would probably be to have the other users sign up for AWS, or to provide them with IAM access credentials from your own AWS account (not recommended if you have large numbers of such users).
You can grant permission for them to access your data and they could use the AWS Command-Line Interface (CLI) to access/download the data. This can be automated with the aws s3 cp and aws s3 sync commands.
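If the third party does have its own AWS account, a bucket policy along these lines is one way to grant that account read access, so that it can run aws s3 cp or aws s3 sync against your bucket. This is a minimal sketch: the bucket name and the 12-digit account ID are placeholders, and the helper function is hypothetical.

```python
import json


def cross_account_read_policy(bucket: str, other_account_id: str) -> dict:
    """Bucket policy letting another AWS account list and download objects (sketch)."""
    if not (len(other_account_id) == 12 and other_account_id.isdigit()):
        raise ValueError("AWS account IDs are 12 digits")
    principal = {"AWS": f"arn:aws:iam::{other_account_id}:root"}
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "AllowListBucket",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,
            },
            {
                # Downloading applies to the objects inside the bucket.
                "Sid": "AllowGetObject",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetObject",
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


print(json.dumps(cross_account_read_policy("shared-data-bucket", "111122223333"), indent=2))
```

You would apply the resulting JSON with aws s3api put-bucket-policy --bucket BUCKET --policy file://policy.json. The other account's administrator must still grant their own IAM users s3:ListBucket and s3:GetObject on your bucket before those users can run the sync.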
I am developing an application with Django REST Framework, and one of the features lets users store ID cards and driver's licenses. I am thinking of using Amazon S3 to store the files.
Is that secure enough for that functionality? What is usually used for that type of file?
Amazon Simple Storage Service (S3)
It allows you to store a virtually unlimited amount of data that can be accessed programmatically via different methods like the REST API, SOAP, the web interface, and more. It is an ideal storage option for videos, images, and application data.
Features:
Fully managed
Store in buckets
Versioning
Access control lists and bucket policies
AES-256 encryption at rest
Private by default
Best used for:
Hosting entire static websites
Static web content and media
Store data for computation and large-scale analytics, like analyzing financial transactions, clickstream analytics, and media transcoding
Disaster recovery solutions for business continuity
Secure solution for backup & archival of sensitive data
Use encryption to protect your data:
If your use case requires encryption during transmission, Amazon S3 supports the HTTPS protocol, which encrypts data in transit to and from Amazon S3. All AWS SDKs and AWS tools use HTTPS by default.
Restrict access to your S3 resources:
By default, all S3 buckets are private and can be accessed only by users who are explicitly granted access. When using AWS, it's a best practice to restrict access to your resources to the people who absolutely need it, as described in the AWS documentation.
I would go with Amazon S3 for storing this kind of information.
Set the default server-side encryption behavior for the bucket (see "Setting default server-side encryption behavior for Amazon S3 buckets" in the AWS documentation). Depending on the type of setup and the amount of money I am willing to spend, I would choose a customer managed key in AWS KMS for encrypting the bucket.
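For the customer managed key option, default encryption is a per-bucket configuration. The sketch below builds the document that aws s3api put-bucket-encryption (or boto3's put_bucket_encryption) accepts; the key ARN is a placeholder, and enabling the S3 Bucket Key is an optional choice made here to reduce the number of KMS requests.

```python
import json


def default_kms_encryption(kms_key_arn: str) -> dict:
    """Default-encryption configuration using a customer managed KMS key (sketch)."""
    if not kms_key_arn.startswith("arn:aws:kms:"):
        raise ValueError("expected a KMS key ARN")
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",  # SSE-KMS instead of S3-managed keys
                    "KMSMasterKeyID": kms_key_arn,
                },
                # Optional: S3 Bucket Keys cut down the number of calls made to KMS.
                "BucketKeyEnabled": True,
            }
        ]
    }


# Placeholder ARN; substitute the ARN of your own customer managed key.
config = default_kms_encryption("arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID")
print(json.dumps(config, indent=2))
```

Saved as sse.json, it would be applied with aws s3api put-bucket-encryption --bucket BUCKET --server-side-encryption-configuration file://sse.json.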
I would also work through all the security checks AWS recommends in the article "How can I secure the files in my Amazon S3 bucket?".
Enable replication, versioning, logging, and maybe IP-based access restrictions for good measure.
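For IP-based access, one common pattern is a bucket policy with explicit Deny statements: one rejects requests from outside an allowed CIDR range (aws:SourceIp), and one rejects plain-HTTP requests (aws:SecureTransport). This is a sketch with a placeholder bucket name and CIDR range. Apply it carefully: a Deny on s3:* also applies to your own administrators, and requests arriving through a VPC endpoint do not carry aws:SourceIp, so check how this interacts with your network setup first.

```python
import ipaddress
import json
from typing import List


def restricted_bucket_policy(bucket: str, allowed_cidrs: List[str]) -> dict:
    """Deny non-TLS requests and requests from outside the given CIDRs (sketch)."""
    for cidr in allowed_cidrs:
        ipaddress.ip_network(cidr)  # raises ValueError on a malformed range
    resources = [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyRequestsOutsideAllowedIps",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
                "Condition": {"NotIpAddress": {"aws:SourceIp": allowed_cidrs}},
            },
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
        ],
    }


# Placeholder values: 203.0.113.0/24 is a documentation-only address range.
print(json.dumps(restricted_bucket_policy("id-documents-bucket", ["203.0.113.0/24"]), indent=2))
```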
S3 provides all kinds of bells and whistles for security in that case.
I have one S3 bucket per customer. Customers are external entities, and they don't share data with anyone else. I write to S3 and the customer reads from S3. With this architecture, I can only scale to 1000 buckets, as there is a limit on S3 buckets per account. I was hoping to use access points (APs): create one AP per customer and put all the data in one bucket. Each customer could then read their files from the bucket through their AP.
Bucket000001/prefix01 -> customeraccount1
Bucket000001/prefix02 -> customeraccount2
...
S3 access points require you to set a policy for an IAM user both on the access point and at the bucket level. If I have thousands of IAM users, do I need to set a policy for each of them in the bucket? That would result in one giant policy, and since there is a maximum bucket policy size, I may not be able to do that.
Is this the right use case where access points can help?
The recommended approach would be:
Do NOT assign IAM Users to your customers. These types of AWS credentials should only be used by your internal staff and your own applications.
You should provide a web application (or an API) where customers can authenticate against your own user database (or you could use Amazon Cognito to manage authentication).
Once authenticated, the application should grant access either to a web interface to access Amazon S3, or the application should provide temporary credentials for accessing Amazon S3 (more details below).
Do not use one bucket per customer. This is not scalable. Instead, store all customer data in ONE bucket, with each user having their own folder. There is no limit on the amount of data you can store in Amazon S3. This also makes it easier for you to manage and maintain, since it is easier to perform functions across all content rather than having to go into separate buckets. (An exception might be if you wish to segment buckets by customer location (region) or customer type. But do not use one bucket per customer. There is no reason to do this.)
When granting access to Amazon S3, assign permissions at the folder-level to ensure customers only see their own data.
Option 1: Access via Web Application
If your customers access Amazon S3 via a web application, then you can code that application to enforce security at the folder level. For example, when they request a list of files, only display files within their folder.
This security can be managed totally within your own code.
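A minimal sketch of enforcing the folder boundary in the application layer, assuming each customer's data lives under a CUSTOMER_ID/ prefix in one shared bucket. The function names are hypothetical; in a real app you would pass the resolved key (or the prefix, as Prefix= to list_objects_v2) to your S3 client, and take customer_id from the authenticated session rather than from the request.

```python
import posixpath
from typing import Iterable, List


def customer_prefix(customer_id: str) -> str:
    """Return the folder (key prefix) that belongs to one customer."""
    if not customer_id or "/" in customer_id or customer_id in (".", ".."):
        raise ValueError("invalid customer id")
    return customer_id + "/"  # trailing slash so "customer1" never matches "customer10"


def resolve_key(customer_id: str, requested_path: str) -> str:
    """Map a path the customer asked for to a key inside their own folder."""
    # Normalizing against a fake root collapses "..", "." and duplicate slashes,
    # so a request such as "../customer2/x" cannot climb out of the folder.
    cleaned = posixpath.normpath("/" + requested_path).lstrip("/")
    if not cleaned:
        raise ValueError("empty path")
    return customer_prefix(customer_id) + cleaned


def visible_keys(customer_id: str, all_keys: Iterable[str]) -> List[str]:
    """Filter a bucket listing down to the customer's own objects."""
    prefix = customer_prefix(customer_id)
    return [key for key in all_keys if key.startswith(prefix)]
```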
Option 2: Access via Temporary Credentials
If your customers use programmatic access (eg using the AWS CLI or a custom app running on their systems), then:
The customer should authenticate to your application (how this is done will vary depending upon how you are authenticating users)
Once authenticated, the application should generate temporary credentials using the AWS Security Token Service (STS). While generating the credentials, grant access to Amazon S3 but specify the customer's folder in the ARN (eg arn:aws:s3:::storage-bucket/customer1/*) so that they can only access content within their folder.
Return these temporary credentials to the customer. They can then use these credentials to make API calls directly to Amazon S3 (eg from the AWS Command-Line Interface (CLI) or a custom app). They will be limited to their own folder.
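The folder restriction from the second step above can be expressed as an inline session policy passed to STS. This sketch assumes a pre-existing IAM role (the ARN in the comment is a placeholder) whose own permissions already cover the bucket; the credentials STS returns get only the intersection of the role's permissions and this session policy. The bucket and customer names follow the example ARN above, and build_session_policy is a hypothetical helper.

```python
import json


def build_session_policy(bucket: str, customer_id: str) -> dict:
    """Session policy confining temporary credentials to one customer folder (sketch)."""
    prefix = customer_id + "/"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, so it is narrowed via s3:prefix.
                "Sid": "ListOwnFolder",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [prefix + "*"]}},
            },
            {
                # Object-level actions are narrowed by the resource ARN itself.
                "Sid": "ReadWriteOwnFolder",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


# Compact JSON keeps the policy well under STS's 2,048-character session policy limit.
policy_json = json.dumps(
    build_session_policy("storage-bucket", "customer1"), separators=(",", ":")
)

# With boto3 installed, the backend would pass it like this (placeholder role ARN):
# creds = boto3.client("sts").assume_role(
#     RoleArn="arn:aws:iam::111122223333:role/CustomerS3Access",
#     RoleSessionName="customer1",
#     Policy=policy_json,
#     DurationSeconds=3600,
# )["Credentials"]
```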
This approach is commonly done with mobile applications. The mobile app authenticates against the backend, receives temporary credentials, then uses those credentials to interact directly against S3. Thus, the back-end app is only used for authentication.
Examples on YouTube:
5 Minutes to Amazon Cognito: Federated Identity and Mobile App Demo
Overview Security Token Service STS
AWS: Use the Session Token Service to Securely Upload Files to S3
There are a couple of ways to achieve your goal.
Use an IAM group to grant access to a folder: create a group, add the user to the group, and attach a policy to the group that grants access to the folder.
Another way is to use a policy with the ${aws:username} policy variable (in the Resource and Condition elements) to grant access to user-specific folders. Refer to this link: https://aws.amazon.com/blogs/security/writing-iam-policies-grant-access-to-user-specific-folders-in-an-amazon-s3-bucket/
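The policy-variable approach means a single policy attached to a group can serve every user: IAM substitutes ${aws:username} with the calling user's name at evaluation time, so there is no per-user statement. The sketch below assumes a home/USERNAME/ folder layout and a placeholder bucket name, and grants a narrower action list than s3:*; the helper function is hypothetical.

```python
import json

USERNAME_VAR = "${aws:username}"  # replaced by IAM with the caller's user name


def user_folder_policy(bucket: str, root: str = "home/") -> dict:
    """One group policy that confines every IAM user to their own folder (sketch)."""
    bucket_arn = "arn:aws:s3:::" + bucket
    user_prefix = root + USERNAME_VAR + "/"
    return {
        "Version": "2012-10-17",  # policy variables require this policy version
        "Statement": [
            {
                # Users may only list keys under their own folder.
                "Sid": "ListOwnFolderOnly",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,
                "Condition": {"StringLike": {"s3:prefix": [user_prefix + "*"]}},
            },
            {
                # Object access is limited to keys inside the user's folder.
                "Sid": "ReadWriteOwnFolder",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": bucket_arn + "/" + user_prefix + "*",
            },
        ],
    }


print(json.dumps(user_folder_policy("my-company"), indent=2))
```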
Is it possible to give different access to different buckets in S3? In detail, I have 10 different buckets in S3, and each of those buckets belongs to a different person. So I want to give each person access only to their particular bucket (by sharing a URL or something like that).
Is this possible?
The normal way to assign access is:
Permanent credentials (eg associate with an IAM User) are only provided to internal IT staff who are managing or using the AWS services.
End users of a web application should be authenticated by the application (eg using Amazon Cognito, LDAP, AD, Google). The application will then be responsible for generating Pre-Signed URLs for uploading and downloading files.
For mobile applications, it is quite common to create temporary credentials using the Security Token Service, which allows the mobile app to directly make AWS API calls. The credentials can be given limited permissions, such as only being able to access one S3 bucket.
So, it really comes down to 'how' the users will be accessing the bucket. If they are doing it directly, then provide temporary credentials via STS. If they are doing it via an application, then the application will be responsible for providing individual access to upload/download.
By the way, it's not necessarily a good idea to give a different bucket to every user, because there is a limit on the number of buckets you can create. Instead, you could give access to separate paths within the same bucket. Proper use of permissions will ensure they cannot see/impact other users' data.
For how this works with IAM Users, see: Variables in AWS Access Control Policies | AWS News Blog