Amazon S3 - Can prefixes include the start of the file name?

I have several types of files within an Amazon S3 bucket, all of which are in the same folder. There are three "types" of files that I want to apply different transition/delete days to, and all of their filenames start the same way. I am wondering if prefixes for files need to just address folders, or if they can include the start of the filename as well. For example, the files start with data_file_*, log_file_*, and error_file_*. If they are all in a folder files/, can I set a rule with the prefix files/error_file_? If so, is that syntax correct?
Note that changing the directory structure is not an option for me, and I can't find any examples like this, or any related discussion, in the AWS documentation.

The use case you describe is in fact the only way lifecycle rules work: a prefix is simply a string matched against the start of the object key. S3 has no concept of "folders" (even though the AWS console makes it look that way); it only understands object keys that happen to contain slashes. This is typical of object-based storage (S3), in contrast to file storage (your laptop).
So when creating lifecycle rules, include the full key prefix of the objects (files/error_file_). The rule will then apply to every object whose key starts with that prefix.
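As an illustrative sketch, rules like these set through boto3 might look as follows (the bucket name and day counts are placeholders, not values from the question):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-error-files",
                # The prefix includes the start of the filename, not just the folder.
                "Filter": {"Prefix": "files/error_file_"},
                "Status": "Enabled",
                "Expiration": {"Days": 30},  # illustrative value
            },
            {
                "ID": "archive-log-files",
                "Filter": {"Prefix": "files/log_file_"},
                "Status": "Enabled",
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],  # illustrative value
            },
        ]
    },
)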

Related

Is there a way to copy all objects inside an S3 bucket to Redshift using a wildcard?

I have an S3 bucket called Facebook.
The structure is like this:
Facebook/AUS/transformedfiles/YYYYMMDDHH/payments.csv
Facebook/IND/transformedfiles/YYYYMMDDHH/payments.csv
Facebook/SEA/transformedfiles/YYYYMMDDHH/payments.csv
Is there a way to copy all the payments.csv files to AWS Redshift?
Something like:
copy payments Facebook/*/transformedfiles/YYYYMMDDHH/payments.csv
No, because the FROM clause accepts an object prefix and implies a trailing wildcard; wildcards elsewhere in the path aren't supported.
If you want to load specific files, you'll need to use a manifest file. You would build this manifest by calling ListObjects and programmatically selecting the files you want.
A manifest file is also necessary if you're creating the files and immediately uploading them, because S3 is eventually consistent -- if you rely on prefix matching to select the files, it might miss some.
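A minimal sketch of building such a manifest with boto3 (the bucket name, prefix, and manifest key are placeholders):

import json
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"  # placeholder

entries = []
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix="Facebook/"):
    for obj in page.get("Contents", []):
        # Programmatically select only the files we want to load.
        if obj["Key"].endswith("/payments.csv"):
            entries.append({"url": f"s3://{bucket}/{obj['Key']}", "mandatory": True})

# Write the manifest back to S3 so Redshift can read it.
s3.put_object(
    Bucket=bucket,
    Key="manifests/payments.manifest",
    Body=json.dumps({"entries": entries}).encode("utf-8"),
)

The COPY statement then names the manifest and includes the MANIFEST option, e.g. copy payments from 's3://my-bucket/manifests/payments.manifest' iam_role '...' manifest; (credentials elided).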

Replace content in all files inside s3 bucket

I have an S3 bucket which is mapped to a domain, say xyz.com. Whenever a user registers on xyz.com, a file is created and stored in the S3 bucket. I now have thousands of files in S3 and I want to replace some text in those files. All the filenames start with a common prefix, e.g. abc-{rand}.txt.
The safest way of doing this would be to regenerate them again through the same process you originally used.
Personally I would try to avoid find and replace as it could lead to modifying parts that you did not intend.
Run multiple generations in parallel and overwrite the existing files. This will ensure the files you generate match your expectations and will not need to be modified again.
As a suggestion, enable versioning before any of these changes if you want the ability to roll back quickly in a scenario where something needs to be reverted.
Sadly, you can't do this in place in S3. You have to download the objects, change their content, and re-upload them.
This is because S3 is an object storage system, not a regular file system.
To simplify working with S3 files, you can use the third-party tool s3fs-fuse. The tool makes S3 appear like a filesystem on your OS.
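A minimal sketch of that download/modify/re-upload cycle in boto3 (the bucket name, prefix, and replacement strings are placeholders):

import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"  # placeholder

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix="abc-"):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        # Download the object's content.
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        updated = body.replace("old-text", "new-text")  # placeholder strings
        if updated != body:
            # Re-uploading under the same key overwrites the object.
            s3.put_object(Bucket=bucket, Key=key, Body=updated.encode("utf-8"))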

Is there anything to be gained by using 'folders' in an s3 bucket?

I am moving a largish number of jpgs (several hundred thousand) from a static filesystem to Amazon S3.
On the old filesystem, I grouped files into subfolders to keep the total number of files per folder manageable.
For example, a file
4aca29c7c0a76c1cbaad40b2693e6bef.jpg
would be saved to:
/4a/ca/29/4aca29c7c0a76c1cbaad40b2693e6bef.jpg
From what I understand, S3 doesn't respect hierarchical namespaces. So if I were to use 'folders' on S3, the object key, including the /'s, would really just live in a flat namespace.
Still, according to the docs, Amazon recommends mimicking a structured filesystem when working with S3.
So I am wondering: is there anything to be gained by using the above folder structure to organize files on S3? Or, in this case, am I better off just adding the files to S3 without any kind of 'folder' structure?
Performance is not impacted by the use (or non-use) of folders.
Some systems can use folders for easier navigation of the files. For example, Amazon Athena can scan specific sub-directories when querying data rather than having to read every file.
If your bucket is being used for one specific purpose, there is no reason to use folders. However, if it contains different types of data, then you might consider at least a top-level set of folders to keep data separated.
Another potential reason for using folders is security. A bucket policy can grant access based upon a key prefix (which can look like a folder name). However, this is likely not relevant for your use-case.
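As a sketch of that prefix-based idea, a hypothetical policy allowing public reads only under an images/ prefix might be applied like this (bucket name and prefix are placeholders):

import json
import boto3

s3 = boto3.client("s3")

# Hypothetical policy: allow public GetObject only under the images/ prefix.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-bucket/images/*",
        }
    ],
}

s3.put_bucket_policy(Bucket="my-bucket", Policy=json.dumps(policy))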
Using "folders" has no performance impact on S3, either way. It doesn't make it faster, and it doesn't make it slower.
The value of delimiting your object keys with / is in organization, both machine-friendly and human-friendly.
If you're trawling through a bucket in the console while troubleshooting, those meaningless, noise-filled keys are a hassle to paginate through, only a few dozen at a time.
The console automatically groups objects into imaginary folders based on the / delimiters, so finding an object to inspect (check headers, metadata, etc.) is much easier if you can just click on 4a, then ca, then 29.
The S3 ListObjects APIs support requesting all the objects with a certain key prefix, but they also support finding all the common prefixes before the next delimiter, so you can send API requests to list prefix 4a/ca/ with delimiter / and it will only return the "folders" one level deep, which it refers to as "common prefixes."
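A short sketch of that prefix/delimiter listing in boto3 (the bucket name is a placeholder):

import boto3

s3 = boto3.client("s3")

resp = s3.list_objects_v2(
    Bucket="my-bucket",  # placeholder
    Prefix="4a/ca/",
    Delimiter="/",
)

# Only the "folders" one level deep come back, as CommonPrefixes,
# e.g. [{'Prefix': '4a/ca/29/'}, ...]. Objects directly under the
# prefix (if any) would appear in resp.get("Contents", []).
for cp in resp.get("CommonPrefixes", []):
    print(cp["Prefix"])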
This is less meaningful if your object keys are fully opaque and convey nothing more about the objects, as opposed to using key prefixes like images/ and thumbnails/ and videos/.
Having been an admin and working with S3 for a number of years, and having worked with buckets with key naming schemes designed by different teams, I would definitely recommend using some / delimiters for organization purposes. The buckets without them become more of a hassle to navigate over time.
Note that the console does allow you to "create folders," but this is more of an illusion -- there is no need to actually do it unless you're loading a bucket manually. When you create a folder in the console, it just creates an empty object with a / at the end.
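For the curious, the console's "create folder" action is equivalent to something like this (bucket and key are placeholders):

import boto3

s3 = boto3.client("s3")

# Creates a zero-byte object whose key ends in "/"; the console
# then renders it as an empty folder.
s3.put_object(Bucket="my-bucket", Key="images/", Body=b"")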

AWS CLI - S3 how to replace a folder atomically?

So, let's say I have a folder called /example in S3. This folder contains a file called a.txt.
Using the AWS CLI, how do I upload a local folder, also called example, and replace the current S3 /example atomically? The local folder contains a file called b.txt.
So, I want the behaviour to be that the new S3 /example folder only contains b.txt.
Basically, is there a way to atomically replace an entire folder in S3 with a new one via the AWS CLI?
Thank you!
No, you can't do that.
For starters, S3 is an eventually consistent platform. That means that right after you do a write, you can still get old data back from S3. Practically, this converges quickly (within seconds), but there is no upper bound. (It does provide consistency guarantees for some sequences of operations, but generally speaking, it's not strongly consistent.)
Secondly, S3 does not have a concept of a "folder" or "directory". The S3 namespace is flat. The only thing that objects /example/a.txt and /example/b.txt have in common is that their keys start with the same string, just like /foobar.txt and /foobaz.txt begin with the same string. (The user interface does cheat a bit by treating the / character specially, giving the illusion of directories.)
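The closest approximation is a non-atomic sync (the CLI's aws s3 sync with the --delete flag does essentially this). A minimal sketch in boto3, with placeholder names; note that anyone listing the prefix during the swap can see old and new objects side by side:

import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"  # placeholder

# 1. Upload the new file(s) first.
s3.upload_file("example/b.txt", bucket, "example/b.txt")

# 2. Then delete everything under the prefix that isn't in the new set.
keep = {"example/b.txt"}
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix="example/"):
    stale = [{"Key": o["Key"]} for o in page.get("Contents", []) if o["Key"] not in keep]
    if stale:
        s3.delete_objects(Bucket=bucket, Delete={"Objects": stale})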

AWS S3 : Do Lifecycle rules accept regex?

I have an S3 bucket with "folders" folder1, folder2, folder3, folder4. In folder2 and folder3 there is a "new" folder. I need to delete everything in "new" that is older than 1 day. Can I do that with a rule like /*/new/ ? Some people say they have seen such rules work in the past, but that particular definition does nothing.
(In the real bucket there are folder1, folder2 ... folder3001 so I can't make rules for every folder, so please don't suggest that. The above example is for simplicity only.)
The PUT lifecycle API takes a "Prefix", which, as the name says, is a prefix, not a regex.
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketPUTlifecycle.html
There is also a limit of 1000 rules per bucket.
http://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html
You could change your folder structure so that keys look like "new/folderN".