I want to build an application using Amazon Web Services (AWS).
The way the application should work is this;
I make a program that lets the user import a large file in an external format and send it to AWS (S3?) in my own format.
Next many users can access the data from web and desktop applications.
I want to charge per user accessing the data.
The problem is that the data on AWS must be in an unintelligible format or the users may copy the data over to another AWS account where I can not charge them. In other words the user need to do some "decrypting" of the data before they can be used. On the web this must be done in JavaScript which is plaintext and would allow the users to figure out my unintelligible format.
How can I fix this problem?
Is there for instance a built in encryption/decryption mechanism?
Alternatively is there some easy way in AWS to make a server that decrypts the data using precompiled code that I upload to AWS?
In general when you don't want your users to access your application's raw data you just don't make that data public. You should build some sort of server-side process that reads the raw data and serves up what the user is requesting. You can store the data in a database or in files on S3 or wherever you want, just don't make it publicly accessible. Then you can require a user to login to your application in order to access the data.
You could host such a service on AWS using EC2 or Elastic Beanstalk or possibly Lambda. You could also possibly use API Gateway to manage access to the services you build.
Regarding your specific question about a service on AWS that will encrypt your public data and then decrypt it on the fly, there isn't anything that does that out of the box. You would have to build such a service and host it on Amazon, but I don't think that is the right way to go about this at all. Just don't make your data publicly accessible in the first place, and make all requests for data go through some service to verify that the user should be able to access the data. In your case that would mean verifying that the user has paid to access the data they are requesting.
Related
I have a scenario where I want to use different storage services of AWS like S3 and DynamoDB. One user can have information attached to him and saved in different services. Now I want to implement a RESTful API that allows the user to retrieve all of his data.
How can I make sure that I find all data. Is there any architecture or specific service that allows me to find all data attached to a user (user is identified by his ID)?
We're making some machines in which there's a part which uploads the images captured by the camera to Google Cloud Storage. For this purpose what I've done is
Create a service account for each machine.
Create a custom role with
permissions:
storage.objects.create
storage.buckets.get
storage.objects.get
Apply this role to that service account.
Download the JSON credentials key file and use this file with python script (in which I specify bucket name) to upload image to GCP Storage.
Is this way of doing things efficient and secure given that we only ship 2-3 machines each month?
Also I will have to ship JSON file with each machine, if the above method is valid, is this fine or there's any method to hide this key file?
Your case isn't so simple!
Firstly, if you want to put a service account in each machine, you will be limited a day (you are limited to 100 service accounts per project). And using the same service account, or the same key is too dangerous
Secondly, your use case sounds like IoT use case where you have lot of devices on edge to communicate with the cloud. But PubSub messages are limited to 10Mb max and IoT Core solution doesn't fit with your case.
The 2 latest solutions are based on the same principle:
Make an endpoint public (Cloud Run, Cloud Functions, App Engine or whatever you want)
Call this endpoint with your machine, and their own token (i.e. a string, encrypted or not)
Check the token, if OK you can (here the 2 alternatives)
Create an access token (short lived token) on a service account with the minimal permission for the machine usage, and send it back to the machine. The machine will use it to call the Google Cloud API, such as Cloud Storage API. The advantage of this solution is that you will be able to use the access token to reach other GCP APIs in the future if your use case, and your machine update require them.
Create a signedUrl and send it back to the machine. Then the machine has to upload file to this URL. The advantage is the strict limitation to Cloud Storage, no other GCP service.
The main issue with the 2 latest solution is that required public endpoint and you are exposed to attacks on it. You can protect it behind a load balancer and mitigate the attacks with Cloud Armor. Think also to limit the scalability of your public endpoint, to prevent any useless expenses in case of attacks.
I am trying to understand access security as it relates to Amazon S3. I want to host some files in an S3 bucket, using CloudFront to access it via my domain. I need to limit access to certain companies/individuals. In addition I need to manage that access individually.
A second access model is project based, where I need to make a library of files available to a particular project team, and I need to be able to add and remove team members in an ad hoc manner, and then close access for the whole project at some point. The bucket in question might be the same for both scenarios.
I assume something like this is possible in AWS, but all I can find (and understand) on the AWS site involves using IAM to control access via the AWS console. I don't see any indication that I could create an IAM user, add them to an IAM group, give the group read only access to the bucket and then provide the name and password via System.Net.WebClient in PowerShell to actually download the available file. Am I missing something, and this IS possible? Or am I not correct in my assumption that this can be done with AWS?
I did find Amazon CloudFront vs. S3 --> restrict access by domain? - Stack Overflow that talks about using CloudFront to limit access by Domain, but that won't work in a WfH scenario, as those home machines won't be on the corporate domain, but the corporate BIM Manager needs to manage access to content libraries for the WfH staff. I REALLY hope I am not running into an example of AWS just not being ready for the current reality.
Content stored in Amazon S3 is private by default. There are several ways that access can be granted:
Use a bucket policy to make the entire bucket (or a directory within it) publicly accessible to everyone. This is good for websites where anyone can read the content.
Assign permission to IAM Users to grant access only to users or applications that need to access to the bucket. This is typically used within your organization. Never create an IAM User for somebody outside your organization.
Create presigned URLs to grant temporary access to private objects. This is typically used by applications to grant web-based access to content stored in Amazon S3.
To provide an example for pre-signed URLs, imagine that you have a photo-sharing website. Photos provided by users are private. The flow would be:
A user logs in. The application confirms their identity against a database or an authentication service (eg Login with Google).
When the user wants to view a photo, the application first checks whether they are entitled to view the photo (eg it is their photo). If they are entitled to view the photo, the application generates a pre-signed URL and returns it as a link, or embeds the link in an HTML page (eg in a <img> tag).
When the user accesses the link, the browser sends the URL request to Amazon S3, which verifies the encrypted signature in the signed URL. If if it is correct and the link has not yet expired, the photo is returned and is displayed in the web browser.
Users can also share photos with other users. When another user accesses a photo, the application checks the database to confirm that it was shared with the user. If so, it provides a pre-signed URL to access the photo.
This architecture has the application perform all of the logic around Access Permissions. It is very flexible since you can write whatever rules you want, and then the user is sent to Amazon S3 to obtain the file. Think of it like buying theater tickets online -- you just show the ticket and the door and you are allowed to sit in the seat. That's what Amazon S3 is doing -- it is checking the ticket (signed URL) and then giving you access to the file.
See: Amazon S3 pre-signed URLs
Mobile apps
Another common architecture is to generate temporary credentials using the AWS Security Token Service (STS). This is typically done with mobile apps. The flow is:
A user logs into a mobile app. The app sends the login details to a back-end application, which verifies the user's identity.
The back-end app then uses AWS STS to generate temporary credentials and assigns permissions to the credentials, such as being permitted to access a certain directory within an Amazon S3 bucket. (The permissions can actually be for anything in AWS, such as launching computers or creating databases.)
The back-end app sends these temporary credentials back to the mobile app.
The mobile app then uses those credentials to make calls directly to Amazon S3 to access files.
Amazon S3 checks the credentials being used and, if they have permission for the files being requests, grants access. This can be done for uploads, downloads, listing files, etc.
This architecture takes advantage of the fact that mobile apps are quite powerful and they can communicate directly with AWS services such as Amazon S3. The permissions granted are based upon the user who logs in. These permissions are determined by the back-end application, which you would code. Think of it like a temporary employee who has been granted a building access pass for the day, but they can only access certain areas.
See: IAM Role Archives - Jayendra's Blog
The above architectures are building blocks for how you wish to develop your applications. Every application is different, just like the two use-cases in your question. You can securely incorporate Amazon S3 in your applications while maintaining full control of how access is granted. Your applications can then concentrate on the business logic of controlling access, without having to actually serve the content (which is left up to Amazon S3). It's like selling the tickets without having to run the theater.
You ask whether Amazon S3 is "ready for the current reality". Many of the popular web sites you use every day run on AWS, and you probably never realize it.
If you are willing to issue IAM User credentials (max 5000 per account), the steps would be:
Create an IAM User for each user and select Programmatic access
This will provide an Access Key and Secret Key that you can provide to each user
Attach permissions to each IAM User, or put the users in an IAM Group and attach permissions to the IAM Group
Each user can run aws configure on their computer (using the AWS Command-Line Interface (CLI) to store their Access Key and Secret Key
They can then use the AWS CLI to upload/download files
If you want the users to be able to access via the Amazon S3 management console, you will need to provide some additional permissions: Grant a User Amazon S3 Console Access to Only a Certain Bucket
Alternatively, users could use a program like CyberDuck for an easy Drag & Drop interface to Amazon S3. Cyberduck will also ask for the Access Key and Secret Key.
I was trying to understand the Google Cloud Platform storage but couldn't really comprehend the language used in the documentation. I wanted to ask if you could use the storage and the APIs to store photos users take within your application and also get the images back if provided with a URL? and even if you can, would it be a safe and reasonable method to do so?
Yes you can pretty much use a storage bucket to store any kind of data.
In terms of transferring images from an application to storage buckets, the application must be authorised to write to the bucket.
One option is to use a service account key within the application. A service account is a special account that can be used by an application to authorise to various Google APIs, including the storage API.
There is some more information about service accounts here and information here about using service account keys. These keys can be used within your application, and allow the application to inherit the permission/scopes assigned to that service account.
In terms of retrieving images using a URL, one possible option would be to use signed URLs which would allow you to give users read or write access to an object (in your case images) in a bucket for a given amount of time.
Access to bucket objects can also be controlled with ACL (Access Control Lists). If you're happy for you images to be available publicly (i.e. accessible to everybody), it's possible to set an ACL with 'Reader' access for AllUsers.
More information on this can be found here.
Should you decide to make the images available publically, the URL format to retrive the object/image from the bucket would be:
https://storage.googleapis.com/[BUCKET_NAME]/[OBJECT_NAME]
EDIT:
In relation to using an interface to upload the files before the files land in the bucket, one option would be to have a instance with an external IP address (or multiple instances behind a Load Balancer) where the images are initially uploaded. You could mount Cloud Storage to this instance using FUSE, so that uploaded files are easily transferred to the bucket. In terms of databases you have the option of manually installing your database on a Compute Engine instance, or using a fully managed database service such as Cloud SQL.
I have an application where users are part of a 'group' of users. Each group can 'upload' documents to the application. Behind the scenes I am using S3 to store these documents.
I've spent a ton of time reading the AWS documentation but still don't understand the simplest/correct way to do the following:
User 1 in group A can upload documents to application
User 2 in group A can see and access all group A documents in application
User 3 in group B can upload documents to application
User 3 in group B cannot see any documents that belong to group A (and vice-versa)
Should I be using the API to create a new bucket for each 'group'?
Or can all of this be done in a single bucket with subdirectories for each group & then set access limitations?
Should I be setting up an IAM group policy and applying it to each web app user?
I'm not sure of the best architecture for this scenario so would really appreciate a point in the right direction.
AWS credentials should be assigned to your application and to your IT staff who need to maintain the application.
Users of your application should not be given AWS credentials.
Users should interact directly with your application and your application will make calls to the AWS API from the back-end. This way, your application has full control of what data they can see and what operations they can perform.
Think of it like a database -- you never want to give users direct access to a database. Instead, they should always interact via an application, which will store and update information in a database.
There are some common exceptions to the above:
If you want users to access/download a file stored in S3, your application can generate a pre-signed URL, which is a time-limited URL that permits access to an Amazon S3 object. Your application is responsible for generating the URL when it wants to grant access and the URl can be included in an HTML page (eg show a private picture on a web page).
If you want to allow users to upload files directly to S3, you could again use a pre-signed URL or you could grant public Write access to an Amazon S3 bucket. Think of it like a modern FTP server.
Bottom line: Your application is in charge! Also, consider using pre-signed URLs to provide direct access to objects when the application permits it.