I have a 10 GB .tar file on S3, and I want to decompress it and keep the extracted files on S3.
Is there a simple command I can run against S3?
Or do I have to extract the file locally and upload the individual files back to S3 myself?
Thanks
You can do this from the AWS CLI, or the new AWS CloudShell, with a command like:
aws s3 cp s3://bucket/data.tar.gz - | tar -xz --to-command='aws s3 cp - s3://bucket/$TAR_REALNAME'
Note that all those dangling '-' characters are important: they tell aws s3 cp and tar to write to stdout and read from stdin.
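An annotated version of the same pipeline, as a sketch (it assumes GNU tar, which provides --to-command and the TAR_REALNAME variable, and a gzip-compressed archive at s3://bucket/data.tar.gz):
# "aws s3 cp ... -" streams the S3 object to stdout instead of a local file,
# and "tar -xz" reads that stream from stdin via the pipe.
# GNU tar runs the --to-command once per archive member, piping the member's
# contents to that command's stdin and exporting TAR_REALNAME with its name;
# the single quotes stop the outer shell from expanding $TAR_REALNAME itself.
aws s3 cp s3://bucket/data.tar.gz - | tar -xz --to-command='aws s3 cp - s3://bucket/$TAR_REALNAME'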
There is no command to manipulate file contents on Amazon S3.
You will need to download the file, untar/unzip it, then upload the content to S3.
This will be fastest from an Amazon EC2 instance in the same region as the bucket. You could potentially write an AWS Lambda function to do this too, but beware of the 512 MB default /tmp disk space limit.
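A minimal sketch of that download/extract/upload approach on an EC2 instance, assuming an archive at s3://my-bucket/data.tar.gz and enough local disk space (bucket and path names are placeholders):
# Download the archive, extract it locally, then upload the contents back.
aws s3 cp s3://my-bucket/data.tar.gz /tmp/data.tar.gz
mkdir -p /tmp/extracted
tar -xzf /tmp/data.tar.gz -C /tmp/extracted
aws s3 cp /tmp/extracted s3://my-bucket/extracted/ --recursive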
You can, however, mount an S3 bucket on EC2 using s3fs.
Here is a link with more detail on how to mount it: https://cloudkul.com/blog/mounting-s3-bucket-linux-ec2-instance/
Once mounted, you can read and write files on S3 just like you do on a local disk.
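For example, a basic s3fs mount might look like this (a sketch; it assumes the s3fs-fuse package is installed and credentials are available in ~/.passwd-s3fs or via an instance role, and my-bucket is a placeholder):
# Create a mount point and mount the bucket with s3fs.
sudo mkdir -p /mnt/s3bucket
s3fs my-bucket /mnt/s3bucket -o passwd_file=${HOME}/.passwd-s3fs
# Files written under /mnt/s3bucket become objects in my-bucket.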
AWS CLI to download file with its entire folder structure from S3 to local and/or one S3 to another S3
I am looking to download a file from an S3 bucket to local storage with its entire folder structure. For example:
s3://test-s3-dev/apps/test-prd/test/data/sets/frs/bblr/type/level=low/type=data/bd=2022-08-25/region=a/entity=c/ss=tt/dev=mtp/datasetV=1/File123.txt
Above is the S3 path that I need to download locally with its entire folder structure from S3.
However, cp --recursive and sync both only download File123.txt into the current local folder; they do not download File123.txt with its entire folder structure.
**Please advise how to download the file from S3 with its entire folder structure preserved, either:
to the local system, and/or
when copying from one S3 connection to another S3 connection.**
aws --endpoint-url http://abc.xyz.pqr:9020 s3 cp --recursive s3://test-s3-dev/apps/test-prd/test/data/sets/frs/bblr/type/level=low/type=data/bd=2022-08-25/region=a/entity=c/ss=tt/dev=mtp/datasetV=1/File123.txt ./
OR
aws --endpoint-url http://abc.xyz.pqr:9020 s3 cp --recursive s3://test-s3-dev/apps/test-prd/test/data/sets/frs/bblr/type/level=low/type=data/bd=2022-08-25/region=a/entity=c/ss=tt/dev=mtp/datasetV=1/ ./
OR
aws --endpoint-url http://abc.xyz.pqr:9020 s3 sync s3://test-s3-dev/apps/test-prd/test/data/sets/frs/bblr/type/level=low/type=data/bd=2022-08-25/region=a/entity=c/ss=tt/dev=mtp/datasetV=1/ ./
All three aws commands above download the file directly into the current local folder, without recreating its entire directory structure from S3.
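For reference, aws s3 cp --recursive and aws s3 sync recreate key paths relative to the source prefix, so copying from the bucket root (or the highest prefix you want preserved) keeps the folder structure. This is a sketch under that assumption, using the same endpoint and bucket as above; other-bucket and the local target directory are placeholders:
# Download locally, preserving the full key path under ./test-s3-dev/:
aws --endpoint-url http://abc.xyz.pqr:9020 s3 sync s3://test-s3-dev/ ./test-s3-dev/ --exclude "*" --include "*File123.txt"
# Copy S3-to-S3, keeping the same structure in the destination bucket:
aws --endpoint-url http://abc.xyz.pqr:9020 s3 sync s3://test-s3-dev/ s3://other-bucket/ --exclude "*" --include "*File123.txt"
# The --exclude "*" --include "*File123.txt" pair limits the sync to keys
# ending in File123.txt.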
I would like to create some dummy files in an S3 bucket for testing purposes. Since these are dummy files, it seems like overkill to create them locally and upload them to S3 (a few GB of data). I created the files with the truncate command in Linux. Is it possible to create such files directly in S3, or do I need to upload them?
You need to upload them. Since you created the files using a terminal, you can install the AWS CLI and then use the aws s3 cp command to upload them to S3. If you have created many files or have a deep folder structure, you can use the --recursive flag to upload all files from myDir to myBucket recursively:
aws s3 cp myDir s3://mybucket/ --recursive
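If the goal is just bulk test data, one low-effort approach (a sketch; file and bucket names are placeholders) is to create a sparse file with truncate and upload it. Note that the upload still transfers the full logical size over the network:
# Create a 2 GB sparse file (takes almost no local disk space), then upload it.
truncate -s 2G dummy-2g.bin
aws s3 cp dummy-2g.bin s3://mybucket/dummy/dummy-2g.bin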
I have some huge files in bucket1, and I need to copy some of them to bucket2. I know I could download the files from bucket1 to a local machine and upload them to bucket2.
Can I skip this download-and-upload step and ask Amazon to copy the files without downloading? Is this even possible?
Amazon S3 has API calls that can copy objects between buckets (even between regions) without any downloading.
The easiest method is to use the AWS Command-Line Interface (CLI), which has some useful commands:
aws s3 sync s3://bucket1/ s3://bucket2/
will synchronize the files between the buckets, so they have the same content.
aws s3 cp --recursive s3://bucket1/ s3://bucket2/
will do something similar, but lets you be more selective about which objects are copied.
See: Using High-Level s3 Commands with the AWS Command Line Interface
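For a single object, the same server-side copy can also be issued with the lower-level s3api command (bucket and key names here are placeholders):
# Copies bucket1/path/to/object.dat into bucket2 without downloading it.
aws s3api copy-object --copy-source bucket1/path/to/object.dat --bucket bucket2 --key path/to/object.dat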
I have set a number of files to be restored from Glacier to S3 and I want to download them all – the whole bucket.
When I browse the S3 bucket from the web console, I don't see the Glacier-restored items (unless I show the versions).
Is there a way to download all the bucket files to the local drive, including glacier restored ones?
Edit:
I ran
s3cmd sync s3://bucketname .
and got only the non-Glacier-restored ones.
You can use the command below to sync the S3 bucket, including the restored files:
aws s3 sync s3://bucketname <local directory or .> --force-glacier-transfer
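If some objects still appear to be missing, you can check an individual object's restore status first (the key name here is a placeholder); a completed restore shows a Restore field with ongoing-request="false":
aws s3api head-object --bucket bucketname --key path/to/file.dat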
I think you can use our free tool to browse your S3/Glacier buckets and recover data:
https://www.cloudberrylab.com/explorer/amazon-s3.aspx
I'm looking through the documentation of the AWS CLI and I cannot find a way to copy only the files in some directory structure to another bucket with a "flattened" structure (I want one directory with all the files inside it).
For example:
/a/b/c/1.jpg
/a/2.jpg
/a/b/3.jpg
I would want to have in a different bucket:
/x/1.jpg
/x/2.jpg
/x/3.jpg
Am I missing something or is it impossible?
Do you have an idea how I could do that?
Assuming that you have the AWS CLI configured on the system and that both buckets are in the same region.
What you can do is first download the S3 bucket to your local machine using:
aws s3 sync s3://originbucket /localdir/
After this, use a find command to move all the files into one directory:
find /localdir/ -type f -exec mv {} /anotherlocaldir/ \;
Finally, you can upload the files to S3 again!
aws s3 sync /anotherlocaldir/ s3://destinationbucket
You don't need to download the files locally, as suggested in another answer. Instead, you could write a shell script that does the following (see the sketch below):
Run aws s3 ls --recursive on s3://bucket1 to get the fully-qualified names of all files in it.
For each file, run aws s3 cp to copy it from its current location to s3://bucket2/x/
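A minimal sketch of that loop, assuming there are no spaces in the object keys (keys with spaces would need more careful parsing of the ls output):
# List every key in bucket1, then copy each one into bucket2/x/ using only
# its base name, which flattens the directory structure server-side.
aws s3 ls s3://bucket1 --recursive | awk '{print $4}' | while read -r key; do
  aws s3 cp "s3://bucket1/${key}" "s3://bucket2/x/$(basename "$key")"
done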
Here are some examples for your reference (for single files, use cp rather than sync):
aws s3 cp /a/b/c/1.jpg s3://bucketname/
aws s3 cp /a/2.jpg s3://bucketname/
aws s3 cp /a/b/3.jpg s3://bucketname/
To sync all the contents of a directory to an S3 bucket:
aws s3 sync /directoryPath/ s3://bucketname/
AWS reference url: http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html