AWS S3 objects with sensitive data in object names - amazon-web-services

We name our S3 objects with the birthdays of our employees, which is obviously a bad practice. We want to avoid putting sensitive data in object names. Is it safer to store the sensitive data in S3 user-defined metadata, or to add an S3 bucket policy that denies the s3:GetObject action? Which will work?

As you mentioned, it's not a good idea to put sensitive data in object names, but it's not catastrophic either. I would suggest removing the list permission (s3:ListBucket) from the S3 policy. The policy should only allow s3:GetObject, which means a caller can only get an object when they already know the object name, i.e. when the caller already knows the user's DOB.
With list permission, a caller can enumerate every object in the bucket and harvest the DOB of every user.
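As a rough illustration, here is what that kind of policy might look like when applied with boto3; the bucket name, account ID, and role ARN are placeholders, not values from the question:

```python
import json
import boto3

s3 = boto3.client("s3")
bucket = "example-employee-docs"  # placeholder bucket name

# Allow object reads only; s3:ListBucket is deliberately omitted, so callers
# cannot enumerate keys (and therefore cannot harvest the DOBs embedded in them).
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowGetObjectOnly",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:role/app-role"},  # placeholder
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }
    ],
}

s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```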

Object keys and user metadata should not be used for sensitive data. The reasoning for object keys is readily apparent, but metadata may be less obvious:
metadata is returned in the HTTP response headers every time an object is fetched. This can't be disabled, though it can be worked around with CloudFront and Lambda@Edge response triggers, which can redact the metadata when the object is downloaded through CloudFront; however,
metadata is not stored encrypted in S3, even if the object itself is encrypted.
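To make the header behaviour concrete, here is a small boto3 sketch (bucket and key names are made up): any principal that can GET or HEAD the object also receives its user-defined metadata.

```python
import boto3

s3 = boto3.client("s3")

# Upload an object with user-defined metadata (sent as x-amz-meta-* headers).
s3.put_object(
    Bucket="example-bucket",           # placeholder bucket
    Key="employees/report.txt",
    Body=b"hello",
    Metadata={"dob": "1990-01-01"},    # sensitive value, for illustration only
)

# Anyone who can read the object gets the metadata back as response headers;
# S3 itself offers no way to suppress this.
resp = s3.head_object(Bucket="example-bucket", Key="employees/report.txt")
print(resp["Metadata"])                # {'dob': '1990-01-01'}
```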
Object tags are also not appropriate for sensitive data, because they are also not stored encrypted. Object tags are useful for flagging objects that contain sensitive data, because tags can be used in policies to control access permissions on the object, but this is only relevant when the object itself contains the sensitive data.
In the case where "sensitive" means "proprietary" rather than "personal," tags can be an acceptable place for data... this might be data that is considered sensitive from a business perspective but that does not need to be stored encrypted, such as the identification of a specific software version that created the object. (I use this strategy so that if a version of code is determined later to have a bug, I can identify which objects might have been impacted because they were generated by that version). You might want to keep this information proprietary but it would not be "sensitive" in this context.
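For what it's worth, the version-flagging strategy above is just a single tagging call; the bucket, key, and tag values here are made up:

```python
import boto3

s3 = boto3.client("s3")

# Record the (non-sensitive) version of the code that generated the object,
# so affected objects can be found later if that version turns out to be buggy.
s3.put_object_tagging(
    Bucket="example-bucket",                  # placeholder names
    Key="reports/2024-01-report.csv",
    Tagging={"TagSet": [{"Key": "generator-version", "Value": "v2.3.1"}]},
)
```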

If your S3 bucket is used to store private data and you're allowing public access to the bucket, that is always a bad idea - it's basically security by obscurity.
Instead of changing your existing S3 structure, you could lock the bucket down to just your app and then serve the data via CloudFront signed URLs.
Basically, in your code, where you currently inject the S3 URL, you instead call the AWS API to create a signed URL from the S3 URL and a policy, and send this new URL to the end user. This masks the S3 URL, and you can enforce other restrictions such as how long the link is valid, require a specific header, or limit access to a specific IP. You also get CDN edge caching and reduced costs as side benefits.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/PrivateContent.html
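As a rough sketch of the signed-URL step in Python (the key pair ID, CloudFront domain, object path, and key file are placeholders, and the cryptography package is assumed to be installed):

```python
from datetime import datetime, timedelta, timezone

from botocore.signers import CloudFrontSigner
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding


def rsa_signer(message: bytes) -> bytes:
    # CloudFront signed URLs are signed with the private half of a CloudFront
    # key pair (or trusted key group key) using SHA-1 + PKCS#1 v1.5.
    with open("cloudfront_private_key.pem", "rb") as f:  # placeholder path
        key = serialization.load_pem_private_key(f.read(), password=None)
    return key.sign(message, padding.PKCS1v15(), hashes.SHA1())


signer = CloudFrontSigner("K2ABCDEXAMPLE", rsa_signer)  # placeholder public key ID

# Hand out the signed CloudFront URL instead of the S3 URL; this one
# expires 15 minutes after it is generated.
url = signer.generate_presigned_url(
    "https://d111111abcdef8.cloudfront.net/docs/report.pdf",
    date_less_than=datetime.now(timezone.utc) + timedelta(minutes=15),
)
print(url)
```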

Related

How to mass / bulk PutObjectAcl in Amazon S3?

I am looking for a way to update several objects' ACLs in one (or a few) requests to the AWS API.
My web application contains several sensitive objects stored in AWS S3. These objects have a default ACL of "private". I sometimes need to update several objects' ACLs to "public-read" for some time (a couple of minutes) before setting them back to "private".
For a couple of objects, one request per object to PutObjectAcl is fine. But when dealing with several hundred objects, the operation takes too much time.
My question is: how can I "mass put object ACL" or "bulk put object ACL"? The AWS API doesn't seem to offer anything specific, unlike DeleteObjects (which allows deleting several objects at once). But maybe I didn't look in the right place?
Any tricks or workarounds would be of great value!
Mixing private and public objects inside a bucket is usually a bad idea. If you only need those objects to be public for a couple of minutes, you can create a pre-signed GET URL and set a desired expiration time.
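A minimal boto3 sketch of that approach, with placeholder bucket and key names: instead of flipping ACLs, generate a time-limited URL for each private object.

```python
import boto3

s3 = boto3.client("s3")

# The object stays private; the URL below grants read access for 5 minutes.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "example-bucket", "Key": "private/report.pdf"},  # placeholders
    ExpiresIn=300,
)
print(url)
```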

Storing S3 URLs vs calling listObjects

I have an app that has an attachments feature for users. They can upload documents to S3 and then revisit to preview and/or download said attachments.
I was planning on storing the S3 URLs in the DB and then pre-signing them when the user needs them. A caveat I'm finding is that this can lead to edge cases between S3 and the DB,
i.e. if a file gets removed from S3 but its URL does not get removed from the DB (or vice versa). This can lead to data inconsistency and may mislead users.
I was thinking of just getting the URLs over the network by using listObjects in the S3 client SDK. I don't really need to store the URLs, and this guarantees the user gets what's actually in S3.
The only con here is that it makes an API request (as opposed to a DB hit).
Any insights?
Thanks!
Using a database to store an index of files is a good idea, especially once the volume of objects increases. The ListObjects() API only returns 1000 objects per call. This might be okay if every user has their own path (so you can call ListObjects(Prefix='user1/')), but it's not ideal if you want to allow document sharing between users.
Using a database will definitely be faster for obtaining a listing, and it has the advantage that you can filter on attributes and metadata.
The two systems will only get "out of sync" if objects are created/deleted outside of your app, or if there is an error in the app. If this concerns you, use Amazon S3 Inventory to produce a regular listing of the objects in the bucket, and write some code to compare it against the database entries. This will highlight if anything is going wrong.
While Amazon S3 is an excellent NoSQL database (Key = filename, Value = contents), it isn't good for searching/listing a large quantity of objects.
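To illustrate the 1000-key limit mentioned above, here is a boto3 sketch (bucket and prefix are placeholders) that walks the multiple pages a large listing requires:

```python
import boto3

s3 = boto3.client("s3")

# ListObjectsV2 returns at most 1000 keys per call, so listing a large bucket
# means following continuation tokens; a paginator hides that loop.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="example-bucket", Prefix="user1/"):  # placeholders
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"])
```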

Storing S3 Keys vs URLs

I have some functionality that uploads Documents to an S3 Bucket.
The key names are programmatically generated via some proprietary logic for the layout/naming convention needed.
The result of my S3 upload command is the actual URL itself. So, it's in the format of
REGION/BUCKET/KEY
I was planning on storing that full URL in my DB so that users can access their uploads.
Given that REGION and BUCKET probably wouldn't change, does it make sense to just store the KEY - and then dynamically generate the full url when the client needs it?
Just want to know what the desired pattern here is and what others do. Thanks!
Storing the full URL is a bad idea. As you said in the question, the region and bucket are already known, so storing the full URL is a waste of disk space. Also, if in the future you want to migrate your assets to a different bucket, possibly in a different region, having full URLs stored in the DB just makes things harder.
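A small sketch of the key-only approach, with made-up names: the DB row holds nothing but the key, and a URL is rebuilt on demand.

```python
import boto3

s3 = boto3.client("s3")

BUCKET = "example-uploads"                  # known to the app, so not stored per row
stored_key = "user-123/2024/contract.pdf"   # the only value kept in the DB

# Rebuild an access URL on demand from the stored key; here a pre-signed URL,
# but a plain https URL could be assembled the same way for public objects.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": BUCKET, "Key": stored_key},
    ExpiresIn=900,  # valid for 15 minutes
)
print(url)
```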

AWS S3 Bucket policy to prevent Object updates

I have set of objects in an S3 Bucket, all with a common prefix. I want to prevent updating of the currently existing objects, however allow users to add new objects in the same prefix.
As I understand it, the s3:PutObject action is used both to update existing objects and to create new ones.
Is there a bucket policy that can limit updating, while allowing creating?
ex: forbid modifying the already existing s3://bucket/Input/obj1, but allow creating s3://bucket/Input/obj2
edit, context: We're using S3 as a store for regression test data, used to test our transformations. As we're continuously adding new test data, we want to ensure that the already ingested input data doesn't change. This would resolve one of the current causes of failed tests. All our input data is stored with the same prefix, and likewise for the expected data.
No, this is not possible.
The same API call, and the same permissions, are used to upload an object regardless of whether an object already exists with the same name.
You could use Amazon S3 Versioning to retain both the old object and the new object, but that depends on how you will be using the objects.
It is not possible in the way you describe, but there is a mechanism of sorts called S3 Object Lock, which allows you to lock a specific version of a file. It will not prevent the creation of new versions of the file, but the version you lock is immutable.
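A sketch of what the Object Lock route might look like with boto3, assuming the bucket was created with Object Lock (and therefore versioning) enabled; the bucket, key, and version ID below are placeholders:

```python
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")

# Freeze one specific version of the ingested input data. New versions can
# still be written under the same key, but this version cannot be overwritten
# or deleted until the retain-until date (GOVERNANCE mode can only be bypassed
# with the s3:BypassGovernanceRetention permission).
s3.put_object_retention(
    Bucket="example-regression-data",   # placeholder
    Key="Input/obj1",
    VersionId="EXAMPLEVERSIONID",       # placeholder version to lock
    Retention={
        "Mode": "GOVERNANCE",
        "RetainUntilDate": datetime.now(timezone.utc) + timedelta(days=365),
    },
)
```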

Get User-Agent and IP Address of object anonymously uploaded to S3?

Is there a way to retrieve the User-Agent and/or IP Address of the client that uploaded an object anonymously to an S3 bucket?
I'm using an anonymous upload method similar to this: https://gist.github.com/jareware/d7a817a08e9eae51a7ea
which shows a method of testing against the client's User-Agent (via aws:UserAgent in a StringNotEquals block in the Condition block), and there is documentation showing that aws:SourceIp exists as well, but I see no way of retrieving either after the file has been uploaded.
Am I missing something?
This is possible if you enabled logging on the bucket...
http://docs.aws.amazon.com/AmazonS3/latest/dev/ServerLogs.html
...or captured a bucket notification event about the upload...
http://docs.aws.amazon.com/AmazonS3/latest/dev/notification-content-structure.html
Otherwise, no.
Anonymous uploads are just a really, really bad idea. You can end up with objects where the only available action to you is deleting them. Bucket ownership != object ownership. Authenticating requests is simply not that difficult, so I would encourage you to back away slowly from anonymous S3 writes.
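If you do keep the bucket as-is, enabling server access logging is a one-time call; note it only captures requests made after it is turned on, and the target bucket (a placeholder name here) must already grant the S3 logging service permission to write to it.

```python
import boto3

s3 = boto3.client("s3")

# Server access logs record, among other fields, the caller's remote IP and
# User-Agent for each request, including anonymous PUTs.
s3.put_bucket_logging(
    Bucket="example-upload-bucket",                # bucket receiving the uploads (placeholder)
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "example-log-bucket",  # placeholder
            "TargetPrefix": "upload-logs/",
        }
    },
)
```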