How to keep track of data uploaded by user with AWS? - amazon-web-services

I'm building a platform where users upload data to us. Each upload is approximately 750 MB per transaction.
What is the standard way to keep track of the data uploaded by user? How do you organize the data by user on S3 buckets?
A simple scheme could be to tag/prefix each item uploaded to the S3 buckets with the username, and use this logic in our application to allow users to work with these files. We can then keep track of username + data uploaded within a database (like Amazon DynamoDB).
Things get a bit complicated when I start thinking about features allowing groups to access these files of course....
Is there a better approach for this task on AWS? It feels like a standard problem.

AWS does not have built-in tools for keeping track of uploads per "user", nor any per-user upload limits. This is something that you, as a developer on AWS, need to design and implement. DynamoDB is a popular choice for keeping track of S3 uploads and limits per user in your application.
Regarding organization: it depends. If your users log in to your application through Cognito, each user will have a federated IAM identity associated with them. Thus, you can organize the bucket and control user access using this feature, as shown for instance in the following link:
Amazon S3: Allows Amazon Cognito users to access objects in their bucket
User groups could also be managed through Cognito.
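As a rough sketch of the bookkeeping described above: the class below keeps every object of a user under one key prefix and tracks bytes uploaded against a per-user limit. The names `UploadLedger`, `objectKey`, the `users/` prefix and the quota are all made up for illustration, and the in-memory map only stands in for a real table such as DynamoDB.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for a per-user upload table (e.g. a DynamoDB table). */
class UploadLedger {
    private final long quotaBytes;   // hypothetical per-user limit
    private final Map<String, Long> bytesByUser = new HashMap<>();

    UploadLedger(long quotaBytes) {
        this.quotaBytes = quotaBytes;
    }

    /** Builds the S3 object key: all objects of a user live under users/<userId>/. */
    static String objectKey(String userId, String fileName) {
        if (!userId.matches("[A-Za-z0-9_-]+") || fileName.isEmpty() || fileName.contains("/")) {
            throw new IllegalArgumentException("unsafe user id or file name");
        }
        return "users/" + userId + "/" + fileName;
    }

    /** Records an upload and returns its key; throws if the quota would be exceeded. */
    String recordUpload(String userId, String fileName, long sizeBytes) {
        if (sizeBytes < 0) {
            throw new IllegalArgumentException("negative size");
        }
        String key = objectKey(userId, fileName);
        long used = usedBytes(userId);
        if (used + sizeBytes > quotaBytes) {
            throw new IllegalStateException("quota exceeded for " + userId);
        }
        bytesByUser.put(userId, used + sizeBytes);
        return key;
    }

    /** Bytes recorded so far for the user (0 if the user has never uploaded). */
    long usedBytes(String userId) {
        return bytesByUser.getOrDefault(userId, 0L);
    }
}
```

In a real application the check and the update would have to happen atomically in the database, and the key returned here is what you would hand to whatever issues the upload permission.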

Related

How to organize the data stored in S3 buckets by users on AWS?

I've allocated AWS buckets for my own projects and my own use, but I've never set them up for other users to use via a website. This will be an abstract question, as I don't know how to move forward:
I am building a platform whereby registered users at one point can upload data. The amount of data is approximately 100 MB per transaction, which isn't small.
How do I organize the data stored by user, by bucket? I suspect I can keep track of all transactions by tagging username, or tagging by other metadata. Is there a more scalable approach? This feels like it would take a great deal of manual work.
You could use metadata or tags on your bucket objects. For instance you could tag with a username.
To limit load on your system, you could put an API Gateway in front of your S3 bucket and use API Gateway's throttling functionality to cap the request rate. Check out this page.

Limited access to AWS S3 bucket

I am trying to understand access security as it relates to Amazon S3. I want to host some files in an S3 bucket, using CloudFront to access it via my domain. I need to limit access to certain companies/individuals. In addition I need to manage that access individually.
A second access model is project based, where I need to make a library of files available to a particular project team, and I need to be able to add and remove team members in an ad hoc manner, and then close access for the whole project at some point. The bucket in question might be the same for both scenarios.
I assume something like this is possible in AWS, but all I can find (and understand) on the AWS site involves using IAM to control access via the AWS console. I don't see any indication that I could create an IAM user, add them to an IAM group, give the group read only access to the bucket and then provide the name and password via System.Net.WebClient in PowerShell to actually download the available file. Am I missing something, and this IS possible? Or am I not correct in my assumption that this can be done with AWS?
I did find Amazon CloudFront vs. S3 --> restrict access by domain? - Stack Overflow that talks about using CloudFront to limit access by Domain, but that won't work in a WfH scenario, as those home machines won't be on the corporate domain, but the corporate BIM Manager needs to manage access to content libraries for the WfH staff. I REALLY hope I am not running into an example of AWS just not being ready for the current reality.
Content stored in Amazon S3 is private by default. There are several ways that access can be granted:
Use a bucket policy to make the entire bucket (or a directory within it) publicly accessible to everyone. This is good for websites where anyone can read the content.
Assign permissions to IAM Users to grant access only to users or applications that need access to the bucket. This is typically used within your organization. Never create an IAM User for somebody outside your organization.
Create presigned URLs to grant temporary access to private objects. This is typically used by applications to grant web-based access to content stored in Amazon S3.
To provide an example for pre-signed URLs, imagine that you have a photo-sharing website. Photos provided by users are private. The flow would be:
A user logs in. The application confirms their identity against a database or an authentication service (eg Login with Google).
When the user wants to view a photo, the application first checks whether they are entitled to view the photo (eg it is their photo). If they are entitled to view the photo, the application generates a pre-signed URL and returns it as a link, or embeds the link in an HTML page (eg in a <img> tag).
When the user accesses the link, the browser sends the URL request to Amazon S3, which verifies the signature in the signed URL. If it is correct and the link has not yet expired, the photo is returned and is displayed in the web browser.
Users can also share photos with other users. When another user accesses a photo, the application checks the database to confirm that it was shared with the user. If so, it provides a pre-signed URL to access the photo.
This architecture has the application perform all of the logic around Access Permissions. It is very flexible since you can write whatever rules you want, and then the user is sent to Amazon S3 to obtain the file. Think of it like buying theater tickets online -- you just show the ticket at the door and you are allowed to sit in the seat. That's what Amazon S3 is doing -- it is checking the ticket (signed URL) and then giving you access to the file.
See: Amazon S3 pre-signed URLs
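The application-side part of the photo-sharing flow above could be sketched roughly like this. `UrlSigner` is a hypothetical hook, not an AWS API; in a real application its implementation would call the AWS SDK's presigning support. `PhotoLinks`, the in-memory maps and the 300-second lifetime are likewise just assumptions for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical hook; a real implementation would ask the AWS SDK for a pre-signed URL. */
interface UrlSigner {
    String presign(String objectKey, int expiresInSeconds);
}

/** Application-side permission check that runs before any URL is issued. */
class PhotoLinks {
    private final Map<String, String> ownerByKey = new HashMap<>();
    private final Map<String, Set<String>> sharedWithByKey = new HashMap<>();
    private final UrlSigner signer;

    PhotoLinks(UrlSigner signer) {
        this.signer = signer;
    }

    void addPhoto(String objectKey, String owner) {
        ownerByKey.put(objectKey, owner);
    }

    void share(String objectKey, String otherUser) {
        sharedWithByKey.computeIfAbsent(objectKey, k -> new HashSet<>()).add(otherUser);
    }

    /** A user may view a photo if they own it or it was shared with them. */
    boolean canView(String user, String objectKey) {
        if (user.equals(ownerByKey.get(objectKey))) {
            return true;
        }
        return sharedWithByKey
                .getOrDefault(objectKey, Collections.<String>emptySet())
                .contains(user);
    }

    /** Returns a short-lived link only if the user is entitled to the photo. */
    Optional<String> linkFor(String user, String objectKey) {
        if (!canView(user, objectKey)) {
            return Optional.empty();
        }
        return Optional.of(signer.presign(objectKey, 300));
    }
}
```

The point of the split is that S3 never learns about owners or shares; it only validates whatever link the application decided to hand out.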
Mobile apps
Another common architecture is to generate temporary credentials using the AWS Security Token Service (STS). This is typically done with mobile apps. The flow is:
A user logs into a mobile app. The app sends the login details to a back-end application, which verifies the user's identity.
The back-end app then uses AWS STS to generate temporary credentials and assigns permissions to the credentials, such as being permitted to access a certain directory within an Amazon S3 bucket. (The permissions can actually be for anything in AWS, such as launching computers or creating databases.)
The back-end app sends these temporary credentials back to the mobile app.
The mobile app then uses those credentials to make calls directly to Amazon S3 to access files.
Amazon S3 checks the credentials being used and, if they have permission for the files being requested, grants access. This can be done for uploads, downloads, listing files, etc.
This architecture takes advantage of the fact that mobile apps are quite powerful and they can communicate directly with AWS services such as Amazon S3. The permissions granted are based upon the user who logs in. These permissions are determined by the back-end application, which you would code. Think of it like a temporary employee who has been granted a building access pass for the day, but they can only access certain areas.
See: IAM Role Archives - Jayendra's Blog
The above architectures are building blocks for how you wish to develop your applications. Every application is different, just like the two use-cases in your question. You can securely incorporate Amazon S3 in your applications while maintaining full control of how access is granted. Your applications can then concentrate on the business logic of controlling access, without having to actually serve the content (which is left up to Amazon S3). It's like selling the tickets without having to run the theater.
You ask whether Amazon S3 is "ready for the current reality". Many of the popular web sites you use every day run on AWS, and you probably never realize it.
If you are willing to issue IAM User credentials (max 5000 per account), the steps would be:
Create an IAM User for each user and select Programmatic access
This will provide an Access Key and Secret Key that you can provide to each user
Attach permissions to each IAM User, or put the users in an IAM Group and attach permissions to the IAM Group
Each user can run aws configure on their computer (using the AWS Command-Line Interface (CLI)) to store their Access Key and Secret Key
They can then use the AWS CLI to upload/download files
If you want the users to be able to access via the Amazon S3 management console, you will need to provide some additional permissions: Grant a User Amazon S3 Console Access to Only a Certain Bucket
Alternatively, users could use a program like Cyberduck for an easy Drag & Drop interface to Amazon S3. Cyberduck will also ask for the Access Key and Secret Key.

AWS architecture to handle rights management for file access in AWS S3

In a nutshell: what is the best way to grant and control end-user access to files stored in an S3 bucket, when the access rules for each file depend on which “group” the end user belongs to and on their role in that “group”, there are a lot of dynamically defined “groups” (more than 100,000), and each user can be part of several “groups” (more than 1,000)?
I am in a team where we are developing a product based on AWS lambda, accessible with a web app. The product is developed using micro service architecture.
To explain our use case, let's imagine we have 3 micro services:
User service, which is in fact AWS Cognito (handles users and authorization across the whole platform)
Company service, developed by us, based on AWS Lambda and DynamoDB. It manages company information (name, people, and other metadata that I will not explain here)
Document service. This service, which we need to develop, needs to handle documents that belong to a company.
In terms of architecture, we are having difficulty handling the following use case:
We would like people who belong to one or more companies to have access to those documents (files). These people may have a role inside the company (Executive, HR, Sales). Depending on these roles, people may have access to only a subset of the company's documents.
Of course, people who do not belong to a company will not have access to that company's documents.
To handle such use cases, we would like to use AWS S3 and, if possible, avoid developing our own microservice to proxy AWS S3.
The problem is: how can we manage rights with AWS S3 for our use case?
We have investigated multiple solutions.
Using IAM policies that restrict S3 file access (the web app accesses S3 directly, no proxy).
If our S3 bucket is organized by company name/UUID (folders at the root of the bucket), we could create an IAM policy every time we create a company and configure it so that every user in a company has access to the company folder, and only that folder.
Creating a bucket for each company is not possible because AWS limits the number of S3 buckets to 100 (or 1000) per AWS account, and our product may have more than 1000 companies.
Putting users in groups (group == 1 company) is not possible because the number of groups per user pool is limited to 500.
Using Lambda@Edge to proxy AWS S3 calls and verify that the requesting user is authorized for the file URI in S3 (the user belongs to the company and has the right roles to read its documents). This Lambda@Edge function would call an internal service to find out whether the user is authorized to get files from this company (based on the called URL).
Using AWS S3 presigned URLs. We can create our own document-service that exposes CREATE, GET, and DELETE APIs. After an authorization check (the user belongs to the company), it contacts the AWS S3 service and generates a presigned URL to upload or get a file. The user (web app) then calls S3 directly.
To summarize our problem: we are having difficulty handling a mix of RBAC and authorization control inside a product built with AWS Lambda while exposing AWS S3 to end users.
If you have experience with or recommendations for this kind of use case, your advice will be very welcome.
I am answering my own question to share our final decision.
We have chosen the solution based on presigned URLs, which lets us:
stay independent of AWS S3 (it is possible to change from S3 to another file storage service without too much cost)
avoid exposing the S3 API to our client (web application), exposing only URLs where the web app can natively upload or download files
do rights management inside the service itself (doc-service), which generates a presigned URL after the authorization is done
get the information for rights management from Cognito (authentication) and the company service (authorization)
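For what it's worth, the authorization step that would sit inside such a doc-service, before any presigned URL is generated, might look something like the sketch below. `DocumentAuthorizer`, `grant` and `mayRead` are invented names, and the in-memory map is only a placeholder for data that would really come from the company service.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Role-based check: company membership plus at least one role the document allows. */
class DocumentAuthorizer {
    // userId -> companyId -> roles held in that company (placeholder for the company service)
    private final Map<String, Map<String, Set<String>>> rolesByUser = new HashMap<>();

    void grant(String userId, String companyId, String... roles) {
        rolesByUser.computeIfAbsent(userId, u -> new HashMap<>())
                   .computeIfAbsent(companyId, c -> new HashSet<>())
                   .addAll(Arrays.asList(roles));
    }

    /** True if the user holds, in that company, at least one of the allowed roles. */
    boolean mayRead(String userId, String companyId, Set<String> allowedRoles) {
        Set<String> held = rolesByUser
                .getOrDefault(userId, Collections.<String, Set<String>>emptyMap())
                .getOrDefault(companyId, Collections.<String>emptySet());
        return !Collections.disjoint(held, allowedRoles);
    }
}
```

A user with no membership in the company ends up with an empty role set, so the same check covers both "not in the company" and "wrong role".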
Below is an architecture diagram that shows this, based on AWS Lambda:
I'd consider using STS to generate temporary credentials for a certain role and policy (which can be defined dynamically). So basically it's more or less your number 1 except that you don't have to pre-create all these policies, you can construct them dynamically.
Something along the lines:
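import com.amazonaws.services.securitytoken.AWSSecurityTokenService;
import com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClientBuilder;
import com.amazonaws.services.securitytoken.model.AssumeRoleRequest;
import com.amazonaws.services.securitytoken.model.AssumeRoleResult;

// company and department are Strings supplied by your own authorization logic.
AWSSecurityTokenService client = AWSSecurityTokenServiceClientBuilder.standard().build();
AssumeRoleRequest request = new AssumeRoleRequest()
    .withRoleArn("arn:aws:iam::123456789012:role/sales")
    // Session names may only contain letters, digits, underscores and +=,.@- (no spaces).
    .withRoleSessionName("scott.tiger")
    // Session policy: narrows the role down to one company/department prefix.
    .withPolicy("{\"Version\":\"2012-10-17\"," +
        "\"Statement\":[{\"Sid\":\"Stmt1\",\"Effect\":\"Allow\",\"Action\":\"s3:GetObject\"," +
        "\"Resource\":\"arn:aws:s3:::document_storage_bucket/" + company + "/" + department + "/*\"}]}");
AssumeRoleResult response = client.assumeRole(request);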
(Sorry for the line breaks.)
This will give you credentials with permissions that are the intersection of the role's identity-based policy and the session policies.
You can then pass these credentials to the user, generate presigned URLs, whatever you need.
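One caveat when building the session policy by string concatenation: if the company or department value can be influenced by a user, a value containing a quote, a slash or a wildcard could widen the policy. A minimal sketch of a stricter builder is below. `SessionPolicies` and `readOnlyFor` are made-up names, and the allowed character set is just a conservative assumption, not an AWS rule.

```java
import java.util.regex.Pattern;

/** Builds a read-only session policy for one company/department prefix. */
class SessionPolicies {
    // Conservative allow-list: rejects quotes, slashes, wildcards and "${...}" policy variables.
    private static final Pattern SAFE_SEGMENT = Pattern.compile("[A-Za-z0-9_-]{1,64}");

    static String readOnlyFor(String bucket, String company, String department) {
        requireSafe(company);
        requireSafe(department);
        String resource = "arn:aws:s3:::" + bucket + "/" + company + "/" + department + "/*";
        return "{\"Version\":\"2012-10-17\",\"Statement\":[{"
             + "\"Effect\":\"Allow\",\"Action\":\"s3:GetObject\","
             + "\"Resource\":\"" + resource + "\"}]}";
    }

    private static void requireSafe(String segment) {
        if (segment == null || !SAFE_SEGMENT.matcher(segment).matches()) {
            throw new IllegalArgumentException("unsafe path segment: " + segment);
        }
    }
}
```

The returned string could then be passed to withPolicy(...) in the snippet above in place of the inline concatenation.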
As for me, I would go with the 5th solution:
1 - This will allow you to manage your rights exactly the way you designed them, without too many constraints. You will also easily absorb any change to your authorization rules.
2 - The document download feature is thus not completely coupled with S3. If you later want to go for another implementation (EDM, dynamic generation, ...), you can manage that from your gateway, and even use several systems at the same time.

AWS S3 bucket restrict access

I'd like to set up a website on an S3 bucket. The website is for our team admin to submit a list of student names so that those names can be stored in the database.
Now, if I'd like all team members to be able to view the website, but only one person (the team admin) to actually submit the names, what should I do? I think this is an access permission issue, but I'm not quite clear on how AWS deals with this. I guess it's related to IAM users/roles? But what exactly should I do?
Many thanks
================
Forgot to mention, my design involves the whole chain: S3 static website, JavaScript, Lambda function, API Gateway, DynamoDB. I'm wondering at which step, and how, I should control the access.
Another thought: should I create an account for the team admin, so that only he can log in and submit? Maybe that's not necessary?
S3 websites are static. This means that you cannot execute server-side code on them to do anything, such as query a database.
To implement your objective, you will need to combine several services.
S3 Websites: Your S3 bucket will store all of the files such as CSS, JavaScript, HTML, Images, ...
JavaScript: When the client accesses your website, JavaScript functions will be loaded with your HTML to provide client based processing.
Amazon Cognito: Cognito will manage authentication. Using STS your clients will receive temporary access keys to access AWS resources.
DynamoDB: This will be your database. Using the access keys from Cognito / STS, users will access the database. The level of access is controlled by your AWS IAM Policies that you created for each user or group of users.
There are lots of examples of this design on the Internet and several "serverless" books have been written with entire designs mapped out.
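To make the "level of access is controlled by IAM policies" part a bit more concrete, here is a small sketch that builds two policy documents for one table: a read-only one for team members and one that also allows writes for the admin. The class name, the choice of actions and the table ARN you pass in are assumptions for illustration; you would attach the resulting documents to the IAM roles that Cognito hands out.

```java
import java.util.ArrayList;
import java.util.List;

/** Builds an identity policy for one DynamoDB table; only admins get write access. */
class StudentTablePolicies {
    static String forRole(String tableArn, boolean isAdmin) {
        List<String> actions = new ArrayList<>();
        actions.add("dynamodb:GetItem");
        actions.add("dynamodb:Query");
        actions.add("dynamodb:Scan");
        if (isAdmin) {
            actions.add("dynamodb:PutItem");   // submitting names is admin-only
        }
        // Render the action list as a JSON array of quoted strings.
        String actionJson = "\"" + String.join("\",\"", actions) + "\"";
        return "{\"Version\":\"2012-10-17\",\"Statement\":[{"
             + "\"Effect\":\"Allow\",\"Action\":[" + actionJson + "],"
             + "\"Resource\":\"" + tableArn + "\"}]}";
    }
}
```

With something like this, every team member can load the page and read the list, but a write attempted with a non-admin's temporary credentials is rejected by DynamoDB itself rather than by the JavaScript.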
Yes, you can use IAM roles to provide read/write access to the DB. (short answer)
S3 is only good for hosting your static website. If you wish to restrict read and write access, I would suggest you switch to either an Amazon RDS instance or Amazon Aurora.
With RDS, you can have a read replica, which gives viewing users read-only access, so that only you as the admin can insert into or update the tables.
This solution would also improve your DB's response time, since reads and writes would be handled by different instances.
Hope this helps.

aws s3 iam user vs signed urls

I need to allow my users to access and upload content to AWS S3.
Now I have two decisions to make:
Create a separate bucket for each user or store data for each user in different directories in a single bucket.
Allow them access via signed url or create a separate IAM user for everyone?
The process needs to be fully automated and scalable (i.e. many users sign up or stop using the services every day).
Each user uploads files every few seconds. I therefore thought that a separate IAM user would save me the roundtrip to get a signed URL, but I am not sure if it is practical to have potentially thousands of IAM users.
You can only have 100 buckets per AWS account by default (in total, not per region), so you shouldn't create a bucket per user.
Use keys (folders) to organise the data. Provide access via signed cookies (or URLs, but cookies are better for per-user access) and do the authentication in your application (or use AWS Cognito). IAM isn't really designed for your application's end users but for users of AWS services.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/private-content-choosing-signed-urls-cookies.html