Currently, my S3 bucket contains files. I want to create a folder for each file on S3.
Current -> s3://<bucket>/test.txt
Expectation -> s3://<bucket>/test/test.txt
How can I achieve this from an EC2 instance?
S3 doesn't really have "folders"; object keys may contain / characters, which in effect emulates folders. Simply name your objects test/<filename> to achieve that. See the S3 docs for more.
As for doing it from EC2, it is no different from doing it from anywhere else (except that on EC2 you may be able to rely on an IAM instance profile instead of ad-hoc credentials). If you've tried it and failed, post a new question with more details.
If you have Linux you can try something like:
aws s3 ls s3://bucket/ | while read -r date time size name; do aws s3 mv "s3://bucket/${name}" "s3://bucket/${name%.*}/${name}"; done
It does not depend on an EC2 instance. You can use the AWS CLI from an EC2 instance or from anywhere else, supplying the desired destination path; in your case s3://<bucket>/test/test.txt. You can even change the name of the file you are copying into the S3 bucket, including its extension, if you want.
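For example, to move the single file from the question into a matching prefix (the bucket name is a placeholder):
aws s3 mv s3://<bucket>/test.txt s3://<bucket>/test/test.txt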
I use the Illumina BaseSpace service for high-throughput sequencing secondary analyses. This service runs on AWS servers, so all files are stored on S3.
I would like to transfer the files (results of the analyses) from BaseSpace to my own AWS S3 account. I would like to know the best strategy to make things go quickly, knowing that in the end it comes down to copying files from an S3 bucket belonging to Illumina to an S3 bucket belonging to me.
The solutions I'm thinking of:
use the BaseSpace CLI tool to copy the files to our on-premise servers, then transfer them back to AWS
use this tool from an EC2 instance.
use the Illumina API to get a pre-signed download URL (but then how can I use this URL to download the file directly into my S3 bucket?).
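For the pre-signed-URL option, one possibility is to stream the download straight into S3 without storing it locally. A minimal sketch, assuming the URL is in $PRESIGNED_URL and with a placeholder destination bucket and key:
curl -s "$PRESIGNED_URL" | aws s3 cp - s3://my-own-bucket/results/analysis-output.vcf.gz   # nothing is written to local disk; for very large streams you may need --expected-size so the multipart upload is sized correctly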
If I use an EC2 instance, what kind of instance would you recommend so it has enough resources without being oversized (and therefore spending money for nothing)?
Thanks in advance,
Quentin
I am attempting to use the following command in a bash script to update the x-amz-meta-uid and x-amz-meta-gid of files and folders recursively.
aws s3 cp s3://$SOURCE_CLOUD_BUCKET_PATH s3://$DESTINATION_CLOUD_BUCKET_PATH --recursive --quiet --metadata-directive "REPLACE" --metadata "uid=$USER_UID,gid=$USER_GID"
However, it only seems to be updating the metadata on files. How can I get it to update the metadata on the directories/folders as well?
aws --version
aws-cli/2.0.43 Python/3.7.3 Linux/5.4.0-1029-aws exe/x86_64.ubuntu.18
As the Amazon S3 documentation states (https://docs.aws.amazon.com/AmazonS3/latest/user-guide/using-folders.html):
In Amazon S3, buckets and objects are the primary resources, and objects are stored in buckets. Amazon S3 has a flat structure instead of a hierarchy like you would see in a file system. However, for the sake of organizational simplicity, the Amazon S3 console supports the folder concept as a means of grouping objects. Amazon S3 does this by using a shared name prefix for objects (that is, objects have names that begin with a common string). Object names are also referred to as key names.
AWS S3 is an object storage service and is not POSIX compliant, which means there is no disk-level folder structure.
The folders you are seeing are logical: an object named hello/world.txt is displayed with hello as the parent folder name and world.txt as the file name, but the actual key stored is hello/world.txt.
So metadata is also managed at the object level, not at the folder level, since there are no physical folders.
The CLI behaviour is correct, and you need to modify the metadata of the objects themselves. You can, however, modify the metadata of all (or multiple) objects in one go.
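If the bucket also contains explicit zero-byte "folder" marker objects (keys ending in /), a hedged option is to rewrite such a marker in place via the s3api; the bucket, key, and uid/gid values below are placeholders:
aws s3api copy-object --bucket my-bucket --key somepath/somefolder/ --copy-source my-bucket/somepath/somefolder/ --metadata uid=1000,gid=1000 --metadata-directive REPLACE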
No need to change the metadata of the files recursively; the metadata of the whole folder can be changed. Follow this
I have a bucket with a folder s3://mybucket/abc/thisFolder, which contains thousands of files.
I can use aws s3 rm s3://mybucket/abc/thisFolder --recursive to delete it and all files inside, and it does it fine one by one.
However, there's also a delete-folder command, but to me the official doc is not very clear. Its example says aws workdocs delete-folder --folder-id 26fa8aa4ba2071447c194f7b150b07149dbdb9e1c8a301872dcd93a4735ce65d
I would like to know what workdocs is in the example above, and how I would obtain the long --folder-id string for my folder s3://mybucket/abc/thisFolder?
Thank you.
Amazon WorkDocs is a Dropbox-like service.
If you wish to delete objects in Amazon S3, then you should only use AWS CLI commands that start with aws s3 or aws s3api.
Another way to delete folders in Amazon S3 is to configure S3 Object Lifecycle Management with a rule that deletes objects with a given prefix. The objects might take a while to delete (up to ~24 hours), but it will happen automatically rather than one-by-one.
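A rough sketch of such a lifecycle rule via the CLI, assuming the bucket and prefix from the question (note it will also expire anything added under that prefix later, until the rule is removed):
aws s3api put-bucket-lifecycle-configuration --bucket mybucket --lifecycle-configuration '{"Rules":[{"ID":"expire-thisFolder","Filter":{"Prefix":"abc/thisFolder/"},"Status":"Enabled","Expiration":{"Days":1}}]}'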
I am practicing AWS commands. My client has given me an AWS IAM access key and secret, but not an account I can use to log in to the console. Those keys are the ones used by the project itself. What I am trying to do is list all the files recursively within an S3 bucket.
This is what I have done so far.
I have configured the AWS profile for CLI using the following command
aws configure
Then I could list all the available buckets by running the following command
aws s3 ls
Then I am trying to list all the files within a bucket. I tried running the following command.
aws s3 ls s3://my-bucket-name
But it does not seem to give me all the content. Also, I need a way to navigate around the bucket. How can I do that?
You want to list all of the objects recursively but aren't using the --recursive flag. Without it, the command only shows prefixes and the objects at the root level.
Relevant docs https://docs.aws.amazon.com/cli/latest/reference/s3/ls.html
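For example, with the bucket name from the question:
aws s3 ls s3://my-bucket-name --recursive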
A few options.
Roll your own
If you run an aws s3 ls and a line item shows the word "PRE" instead of a modification date and size, that means it's a "directory" you can traverse. You can write a quick bash script to run aws s3 ls recursively on everything that returns "PRE", indicating it's hiding more files.
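A rough sketch of such a script, assuming a bucket named my-bucket-name and object keys without embedded spaces:
list_all() {
  local prefix="$1"
  aws s3 ls "s3://my-bucket-name/${prefix}" | while read -r f1 f2 f3 f4; do
    if [ "$f1" = "PRE" ]; then
      list_all "${prefix}${f2}"   # f2 holds the sub-prefix on "PRE" lines; recurse into it
    else
      echo "${prefix}${f4}"       # f4 holds the object name on regular lines
    fi
  done
}
list_all ""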
s3fs
Using the s3fs-fuse project on GitHub, you can mount an S3 bucket on your file system and explore it that way. I haven't tested this and thus can't personally recommend it, but it seems viable, and it might be a simple way to use tools you already have and understand (like tree).
One concern I have: when I've used similar software, it made a lot of API calls, and if left mounted long-term it can run up costs purely through the number of API calls.
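A minimal, untested sketch of mounting a bucket this way, assuming s3fs-fuse is installed and credentials are stored in ~/.passwd-s3fs (bucket name and mount point are placeholders):
mkdir -p /mnt/my-bucket-name
s3fs my-bucket-name /mnt/my-bucket-name -o passwd_file=${HOME}/.passwd-s3fs
tree /mnt/my-bucket-name    # browse the mounted bucket like a normal directory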
Sync everything to localhost (not recommended)
Adding this for completeness, but you can run
aws s3 sync s3://mybucket/ ./
This will try to copy everything to your computer so you can browse it with your own filesystem tools. However, S3 buckets can hold petabytes of data, so you may not be able to sync it all to your system. Also, S3 provides strong security controls to protect the data, which your personal computer probably doesn't.
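If you do go this route, you can at least limit what gets copied; for example (the prefix and filter pattern are placeholders):
aws s3 sync s3://mybucket/some/prefix/ ./local-copy/ --exclude "*" --include "*.csv"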
We have options to:
1. Copy file/object to another S3 location or local path (cp)
2. List S3 objects (ls)
3. Create bucket (mb) and move objects to bucket (mv)
4. Remove a bucket (rb) and remove an object (rm)
5. Sync objects and S3 prefixes
and many more.
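For reference, a quick sketch of a few of these in sequence (bucket and file names are placeholders):
aws s3 mb s3://my-new-bucket                         # create a bucket
aws s3 cp ./local.txt s3://my-new-bucket/local.txt   # copy a local file into it
aws s3 ls s3://my-new-bucket                         # list the objects
aws s3 rm s3://my-new-bucket/local.txt               # remove the object
aws s3 rb s3://my-new-bucket                         # remove the (now empty) bucket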
But before using the commands, we need to check whether the S3 service is available in the first place. How do we do that?
Is there a command like:
aws S3 -isavailable
and we get a response like
0 - S3 is available, I can go ahead and upload objects/create buckets etc.
1 - S3 is not available, you can't upload objects etc.?
You should assume that Amazon S3 is available. If there is a problem with S3, you will receive an error when making a call with the AWS CLI.
If you are particularly concerned, then run a simple CLI command first, eg aws s3 ls, and throw away the results. But that's really the same concept. Or you could use the --dryrun option available on many aws s3 commands, which shows the operations that would be performed without actually running them.
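A minimal sketch of that kind of pre-flight check in a shell script (the messages are only illustrative):
if aws s3 ls >/dev/null 2>&1; then
  echo "S3 call succeeded; credentials and connectivity look fine"
else
  echo "S3 call failed; check credentials, region, or the AWS status page" >&2
  exit 1
fi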
It is more likely that you will have an error in your configuration (eg wrong region, credentials not valid) than S3 being down.