What's the risk in using project-id in GCS bucket names? - google-cloud-platform

I've been using the project ID as a prefix in my GCS bucket names to easily get unique names.
When I read the GCS best practices, it clearly says not to use project names or project numbers (but says nothing about project IDs).
But on the other hand, when I spin up GAE, two buckets containing the project ID are automatically created.
Is Google not following its own best practices, or did I miss something?
Is the greatest risk of having the project ID in a bucket name that it gives a potential attacker clues about the project, since bucket names are publicly visible?

It does appear, to some degree, that Google might not be following its best practices (as listed on that page, assuming that project names and numbers mean GCP project names and project numbers). The default bucket for Firebase projects layered on top of GCP does the same.
The documentation you linked states the reason to avoid using project names:
... because anyone can probe for the existence of a bucket ...
The idea is that if someone knows the name of your project, they could use that to build the full name of the bucket, and use that knowledge in an attack in order to gain access to its contents. However, if your security configuration is exactly what it should be, then knowing the name of the bucket won't be a problem. This is particularly true for Firebase projects, which use security rules to determine who should be able to access what objects.
I'd take the advice in the documentation as a measure of security through obscurity in order to prevent attackers from guessing the names of your buckets and any of its contents. But if that's not your concern, then ignore it.
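For illustration, probing can be as simple as an anonymous request against a guessed name. A minimal sketch, assuming Python with the requests library and the public GCS JSON API endpoint:

```python
import requests

def bucket_exists(name: str) -> bool:
    # Anonymous metadata lookup against the public GCS JSON API.
    # A 404 means no such bucket; 401/403 means the bucket exists
    # but you aren't allowed to read it.
    resp = requests.get(f"https://storage.googleapis.com/storage/v1/b/{name}")
    return resp.status_code != 404
```

This only confirms that a bucket name is taken; with correct IAM and ACLs, that knowledge alone doesn't expose any objects.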

It looks like they're just worried about leaking PII. I'm not sure why they mentioned project names, unless it's because someone might include PII in their project name.
Don't use user IDs, email addresses, project names, project numbers, or any personally identifiable information (PII) in bucket names because anyone can probe for the existence of a bucket. Similarly, be very careful with putting PII in your object names, because object names appear in URLs for the object.
The two buckets I see created in my account have an appspot.com suffix. You cannot create arbitrary appspot.com buckets because they have a . in the name and thus are subject to verification:
Bucket names must contain only lowercase letters, numbers, dashes (-), underscores (_), and dots (.). Spaces are not allowed. Names containing dots require verification.
You are right though that the automatic bucket creation is inconsistent with their best practice guidelines.
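For what it's worth, a quick sketch of checking a candidate name against just the character rules quoted above (length and other restrictions are not covered here):

```python
import re

# Lowercase letters, numbers, dashes, underscores, and dots; no spaces.
BUCKET_NAME_RE = re.compile(r"^[a-z0-9._-]+$")

def needs_verification(name: str) -> bool:
    if not BUCKET_NAME_RE.fullmatch(name):
        raise ValueError("invalid bucket name")
    # Names containing dots require domain verification.
    return "." in name
```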

Related

Organizing files in S3

I have a social media web application. Users upload pictures such as profile pictures, project pictures, etc. What's the best way to organize these files in an S3 bucket?
I thought of creating a folder named after the user ID inside the bucket, and inside that, multiple other folders, i.e. profile, projects, etc.
Not sure if that's the best approach to follow!
The names (Keys) you assign to an object in Amazon S3 are frankly irrelevant.
What matters is that you have a database that tracks the objects, their ownership and their purpose.
You should not use the filename (Key) of an Amazon S3 object as a way of storing information about the object, because your application might have millions of objects in S3 and it is too slow to scan the list of objects to see which ones exist. Instead, consult a database to find them.
To answer your question: Yes, create a prefix by user if you wish, but then just give each object a unique name (e.g. a universally unique identifier, UUID) that avoids name clashes.
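A minimal sketch of that approach, assuming boto3, a placeholder bucket name, and a hypothetical save_file_record() database helper:

```python
import uuid
import boto3

s3 = boto3.client("s3")

def store_upload(user_id: str, category: str, data: bytes) -> str:
    """Store an uploaded picture under a unique key and return that key."""
    # The UUID guarantees uniqueness; the prefix is only for human convenience.
    key = f"{user_id}/{category}/{uuid.uuid4()}.jpg"
    s3.put_object(Bucket="my-uploads-bucket", Key=key, Body=data)  # placeholder bucket
    # Track ownership and purpose in your database, not in the key itself.
    save_file_record(user_id=user_id, category=category, s3_key=key)  # hypothetical helper
    return key
```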
Earlier, there used to be a need to add random prefixes for better performance. More details here and here.
The following is an extract from one of those pages:
Pay attention to your naming scheme and distribute your key names: don't start your object key names with a date or another fixed, standard prefix. Doing so concentrates work in S3's index and will reduce performance, because objects whose keys share a prefix are stored in the same index partition.
Amazon S3 maintains keys lexicographically in its internal indices.
However, as of the 17 Jul 2018 announcement, adding a random prefix to S3 keys is no longer required to improve performance.

Is there anything to be gained by using 'folders' in an s3 bucket?

I am moving a largish number of jpgs (several hundred thousand) from a static filesystem to amazon s3.
On the old filesystem, I grouped files into subfolders to keep the total number of files per folder manageable.
For example, a file
4aca29c7c0a76c1cbaad40b2693e6bef.jpg
would be saved to:
/4a/ca/29/4aca29c7c0a76c1cbaad40b2693e6bef.jpg
From what I understand, S3 doesn't respect hierarchical namespaces. So if I were to use 'folders' on S3, the object name, including the /'s, would really just live in a flat namespace.
Still, according to the docs, Amazon recommends mimicking a structured filesystem when working with S3.
So I am wondering: is there anything to be gained by using the above folder structure to organize files on S3? Or in this case am I better off just adding the files to S3 without any kind of 'folder' structure?
Performance is not impacted by the use (or non-use) of folders.
Some systems can use folders for easier navigation of the files. For example, Amazon Athena can scan specific sub-directories when querying data rather than having to read every file.
If your bucket is being used for one specific purpose, there is no reason to use folders. However, if it contains different types of data, then you might consider at least a top-level set of folders to keep data separated.
Another potential reason for using folders is for security. A bucket policy can grant access to buckets based upon a prefix (which is a folder name). However, this is likely not relevant for your use-case.
Using "folders" has no performance impact on S3, either way. It doesn't make it faster, and it doesn't make it slower.
The value of delimiting your object keys with / is in organization, both machine-friendly and human-friendly.
If you're trolling through a bucket in the console, troubleshooting, those meaningless noise-filled keys are a hassle to paginate through, only a few dozen at a time.
The console automatically groups objects into imaginary folders based on the / delimiters, so finding your object to inspect it (check headers, metadata, etc.) is much easier if you can just click on 4a, then ca, then 29.
The S3 ListObjects APIs support requesting all the objects with a certain key prefix, but they also support finding all the common prefixes before the next delimiter, so you can send API requests to list prefix 4a/ca/ with delimiter / and it will only return the "folders" one level deep, which it refers to as "common prefixes."
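A minimal sketch of that call with boto3 (the bucket name is just a placeholder):

```python
import boto3

s3 = boto3.client("s3")

# List only the "folders" one level below 4a/ca/ instead of every object.
resp = s3.list_objects_v2(
    Bucket="my-image-bucket",  # placeholder bucket name
    Prefix="4a/ca/",
    Delimiter="/",
)

for cp in resp.get("CommonPrefixes", []):
    print(cp["Prefix"])  # e.g. 4a/ca/29/
```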
This is less meaningful if your object keys are fully opaque and convey nothing more about the objects, as opposed to using key prefixes like images/ and thumbnails/ and videos/.
Having been an admin and working with S3 for a number of years, and having worked with buckets with key naming schemes designed by different teams, I would definitely recommend using some / delimiters for organization purposes. The buckets without them become more of a hassle to navigate over time.
Note that the console does allow you to "create folders," but this is more of an illusion -- there is no need to actually do this, unless you're loading a bucket manually. When you create a folder in the console, it just creates an empty object with a / at the end.

Easy way to create dated subdirectories on AWS S3

I'm trying to create a web service that is able to store user-uploaded files in S3. The problem is that we want the files stored in "dated directories".
For example, if a user uploads a.txt on 12/1/2017 at 9:15am, the file should look like this in S3:
https://s3-eu-west-1.amazonaws.com/test-bucket/uploaded/2017/12/1/9/a.txt
Does S3 have any API to help us achieve this, or do we need to hand-craft this solution?
There is no such API in S3. Think of Amazon S3 as a storage service, not an application or database.
It is the responsibility of your application to store the data in the desired naming format -- just like storing data on a disk.
By the way, your naming format could do with some improvement:
Always expand fields to the correct number of digits (use 01 for January rather than 1) so that they sort correctly.
Think about your use-case -- if you will be scanning documents by year, then the /2017/12/01/09/a.txt naming format makes sense since you can look in the 2017 directory (not that directories really exist in S3). If not, then simply store it as /2017-12-01-09-a.txt.
Make it very clear which one is month vs day -- the USA is the only country in the world that treats "12/1/2017" as December 1st. The rest of the world reads it as "12 January". Using the format of 2017-12-01 makes it clear that it is 1-December-2017.
What about naming conflicts? Can only one person upload a file with a given name on a given day? How are you going to differentiate between different users uploading a file with the same name?
The reality is, the filename is totally irrelevant -- your application should use a database to keep track of objects that users upload and assign each of them a unique name. When a file is later requested, look up the filename in the database and then provide that file. Do not use S3 filenames as a pseudo-database where the name conveys particular meaning, otherwise you'll often have to rename files to add more meaning!
Directories don't actually exist in S3 -- they are just part of the filename. So, you can create a file in a given directory just by storing it -- there is no need to pre-create directories.
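A minimal sketch of building such a zero-padded, dated key and uploading it, assuming boto3 and the test-bucket name from the question:

```python
from datetime import datetime, timezone
import boto3

s3 = boto3.client("s3")

def upload_dated(filename: str, data: bytes) -> str:
    now = datetime.now(timezone.utc)
    # Zero-padded fields (uploaded/2017/12/01/09/...) so keys sort chronologically.
    key = now.strftime("uploaded/%Y/%m/%d/%H/") + filename
    # No need to pre-create "directories"; the key is the whole path.
    s3.put_object(Bucket="test-bucket", Key=key, Body=data)
    return key
```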
AWS S3 does not provide you with such logic. But it should be fairly easy to use your application's time information to create such an S3 object key ("path").
Good luck!

AWS S3 - Privacy error when accessing file from link

I am working with a team that is using S3 to host content and they moved from a single bucket for all brands to one bucket for each brand and now we are having trouble when linking to the content from within salesforce site.com page. When I copy the link from S3 as HTTPS, I get a "Your connection is not private. Attackers might be trying to steal your information from spiritxpress.s3.varsity.s3.amazonaws.com (for example, passwords, messages, or credit cards)."
I have asked them to compare the settings from the one that is working, and I don't have access to dig into it myself, and we are pretty new to this as well so thought I would see if there were any known paths to walk down. The ID and Key have not changed and I can access the content via CyberDuck, it just is not loading when reached via a link.
Let me know if additional information is needed and I will provide as quickly as I can.
[EDIT] the bucket naming convention they are using is all lowercase and meets convention guidelines as well, but it seems strange to me the way it is structured, as they have named the bucket "brandname.s3.companyname", and when copying the link it comes across as "https://brandname.s3.company.s3.amazonaws.com/directory/filename", where the other bucket was being rendered as "https://s3.amazonaws.com/bucketname/......
Whoever made this change has failed to account for the way wildcard certificates work in HTTPS.
Requests to S3 using HTTPS are greeted with a certificate identifying itself as "*.s3[-region].amazonaws.com" and in order for the browser to consider this to be valid when compared to the link you're hitting, there cannot be any dots in the part of the hostname that matches the * offered by the cert. Bucket names with dots are valid, but they cannot be used on the left side of "s3[-region].amazonaws.com" in the hostname unless you are willing and able to accept a certificate that is deemed invalid... they can only be used as the first element of the path.
The only way to make dotted bucket names and S3 native wildcard SSL to work together is the other format: https://s3[-region].amazonaws.com/example.dotted.bucket.name/....
If your bucket isn't in us-standard, you likely need to use the region in the hostname, so that the request goes to the correct endpoint, e.g. https://s3-us-west-2.amazonaws.com/example.dotted.bucket.name/path... for a bucket in us-west-2 (Oregon). Otherwise S3 may return an error telling you that you need to use a different endpoint (and the endpoint they provide in the error message will be valid, but probably not the one you're wanting for SSL).
This is a limitation on how SSL certificates work, not a limitation in S3.
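A minimal sketch of building that path-style form (the region handling here is just an assumption following the s3[-region] pattern described above):

```python
from typing import Optional
from urllib.parse import quote

def path_style_url(bucket: str, key: str, region: Optional[str] = None) -> str:
    # A dotted bucket name has to go in the path, not the hostname, so that
    # the wildcard certificate (*.s3[-region].amazonaws.com) still matches.
    host = f"s3-{region}.amazonaws.com" if region else "s3.amazonaws.com"
    return f"https://{host}/{bucket}/{quote(key)}"

# path_style_url("example.dotted.bucket.name", "path/to/file", "us-west-2")
# -> "https://s3-us-west-2.amazonaws.com/example.dotted.bucket.name/path/to/file"
```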
Okay, it appears it did boil down to some permissions that were missed and we were able to get the file to display as expected. Other issues are present, but the present one is resolved so marking as answered.

Is there a way to build AWS S3 bucket endpoints automatically, regardless of region?

So I have an app in Node that accesses stuff in buckets. I want it to be able to use buckets in any region, transparently. Unfortunately, the way of building the URL for the endpoint differs based on what region you're in.
If it's in US-Standard, I can say http://s3.amazonaws.com/BUCKETNAME/path/to/file. If it's anywhere else, that doesn't work (non-coincidentally, you're limited to domain-allowed characters (lowercase and numbers only) for bucket names in non-US Standard) and you use http://BUCKETNAME.s3.amazonaws.com/path/to/file.
(Note you can get more complicated and use region-specific endpoints such as http://s3-eu-west-1.amazonaws.com/BUCKETNAME/path/to/file, but then you need to know the bucket's region up front.)
I'm thinking this is not a unique problem, so want to put it out there.
http://bucketname.s3.amazonaws.com/path/to/file works in US-Standard also, so you should be able to use this single construct on any bucket anywhere (unless I'm missing something in your question).
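A minimal sketch of that single construct as plain string building (no SDK assumed):

```python
def bucket_url(bucket: str, key: str) -> str:
    # Virtual-hosted-style URL; per the answer above it works for US-Standard
    # and other regions alike, as long as the bucket name is DNS-compatible
    # (lowercase letters, numbers, dashes).
    return f"http://{bucket}.s3.amazonaws.com/{key}"

# bucket_url("bucketname", "path/to/file")
# -> "http://bucketname.s3.amazonaws.com/path/to/file"
```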