Download bulk objects from an Amazon S3 bucket

I have a large bucket folder with over 30 million objects (images). Now, I need to download only 700,000 objects (images) from that large folder.
I have the names of the objects (images) I need to download in a .txt file.
I can use the AWS CLI, but I'm not sure whether it supports downloading many objects in one command.
Is there a straightforward solution for this that you would have in mind?
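
One way this could be scripted with the AWS CLI (a sketch; the bucket name my-bucket, the list file keys.txt with one key per line, the downloads/ destination, and the parallelism level are all assumptions) is to feed the list into aws s3 cp and let xargs run several copies in parallel:

# keys.txt contains one object key per line
mkdir -p downloads
# -P 16 runs up to 16 copies at a time; tune to your bandwidth and request limits
xargs -P 16 -I {} aws s3 cp "s3://my-bucket/{}" "downloads/{}" < keys.txt

Each key still results in a separate GetObject call, so 700,000 downloads will take a while; raising or lowering -P is the main tuning knob.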

Related

Is there a way to request restore or change the storage class for every file in an S3 bucket?

I have an S3 bucket which used to be Standard tier storage. Then I created a policy to change all files that are more than 30 days old to the Deep Archive (Glacier) storage class. After 2 months, every single file in the bucket became Deep Archive. I'm looking for a fast way to revert this so I can download all my files across multiple child folders and subfolders. Thanks.
So far the only way I've found is going manually through the AWS portal, clicking just on files (unselecting folders), then choosing the restore option. I have thousands of files in subfolders and child folders within them.
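
For what it's worth, here is a sketch of how a bulk restore could be scripted with the AWS CLI (the bucket name and the 7-day/Bulk parameters are just examples); Deep Archive bulk retrievals can take up to about 48 hours before the files become downloadable:

# dump every key in the bucket to a file
aws s3api list-objects-v2 --bucket my-bucket \
  --query 'Contents[].Key' --output text | tr '\t' '\n' > keys.txt

# ask S3 to restore each object; Days controls how long the restored copy stays available
while read -r key; do
  aws s3api restore-object --bucket my-bucket --key "$key" \
    --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Bulk"}}'
done < keys.txt

For very large buckets, S3 Batch Operations also has a built-in Restore operation driven by an object manifest, which avoids scripting the calls yourself.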

Is it feasible to maintain directory structure when backing up to AWS S3 Glacier classes?

I am trying to back up 2 TB from a Windows Server shared drive to S3 Glacier.
There are maybe 100 folders (some may be nested) and perhaps 5,000 files (some small, like spreadsheets and photos; others larger, like server images). My first question is: what counts as an object here?
Let’s say I have Folder 1 which has 10 folders inside it. Each of 10 folders have 100 files.
Would number of objects be 1 folder + (10 folders * 100 files) = 1001 objects?
I am trying to understand how folder nesting is treated in S3. Do I have to manually create each folder as a prefix and then upload each file inside that using AWS CLI? I am trying to recreate the shared drive experience on the cloud where I can browse the folders and download the files I need.
Amazon S3 does not actually support folders. It might look like it does, but it actually doesn't.
For example, you could upload an object to invoices/january.txt and the invoices directory will just magically 'appear'. Then, if you deleted that object, the invoices folder would magically 'disappear' (because it never actually existed).
So, feel free to upload objects to any location without creating the directories first.
However, if you click the Create folder button in the Amazon S3 management console, it will create a zero-length object with the name of the directory. This will make the directory 'appear' and it would be counted as an object.
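You can see this behaviour from the AWS CLI (bucket and key names are just examples):

# upload an object whose key contains a 'folder'; nothing has to be created first
aws s3 cp january.txt s3://bucket-name/invoices/january.txt

# the invoices/ prefix now 'appears' in listings
aws s3 ls s3://bucket-name/

# delete the object and the 'folder' disappears again, because it never really existed
aws s3 rm s3://bucket-name/invoices/january.txt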
The easiest way to copy the files from your Windows computer to an Amazon S3 bucket would be:
aws s3 sync directoryname s3://bucket-name/ --storage-class DEEP_ARCHIVE
It will upload all files, including files in subdirectories. It will not create folder objects, since they aren't necessary; however, the folders will still 'appear' in S3.

dynamically create / append to zip from multiple instances

I have a situation where thousands of files are created for a user by multiple backend instances and then uploaded to AWS S3 / Azure Storage. After all the files are created, the user wants to download them as a zip. I can create the zip and then get a pre-signed URL, but I have tried a few archiving solutions and all of them simply take too much time (hours).
Is there any way of creating the zip dynamically from the multiple backend instances? I want to append to the zip after each file is created, from any backend instance.
Zip itself supports the use case you want. For example, the zip command on Linux:
When given the name of an existing zip archive, zip will replace identically named entries in the zip archive (matching the relative names as stored in the archive) or add entries for new names.
You need to persist the working zip file somewhere in a file system though. The most obvious choice I can think of is EFS, so that multiple instances can mount the file system and access the zip file.
If you don't want to modify the existing instances/workloads, you can even mount EFS on Lambda, then set an S3 trigger for the Lambda to update the zip file every time a new file is uploaded.
I don't think you can use only S3 for this, because you cannot update S3 objects in place. You would then need to download and re-upload the zip for every new file, which is really not ideal.
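
As a rough sketch of the EFS approach (the EFS mount point, bucket, and file paths below are all hypothetical), each instance appends its freshly created file to a shared zip, with flock serializing concurrent appends:

# EFS is assumed to be mounted at /mnt/efs on every instance (and on the Lambda, if used)
(
  flock -x 200
  # add (or replace) one file in the user's shared archive
  zip -j /mnt/efs/user-123/archive.zip /tmp/report-42.pdf
) 200>/mnt/efs/user-123/archive.lock

# once the last file is in, publish the archive and hand out a pre-signed URL
aws s3 cp /mnt/efs/user-123/archive.zip s3://my-bucket/archives/user-123.zip
aws s3 presign s3://my-bucket/archives/user-123.zip --expires-in 86400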

How to migrate data from s3 bucket to glacier?

I have a TB-sized S3 bucket with PDF files. I need to migrate the old files to Glacier. I know that I can create a lifecycle rule to migrate files which are older than a certain number of days. But in my case the bucket currently consists of both old and new PDF files, and they were added at the same time, so they may have the same upload date. In this case a lifecycle rule won't be useful.
In the PDF files there is a field called capture_date, so I need to migrate the files based on the capture_date (i.e., migrate all PDF files whose capture_date is earlier than 2015-05-21, and so on).
Would a Fargate job be useful here? If so, please give a brief idea.
Please suggest your ideas. Thanks in advance.
S3 by itself will not read your PDF files. Thus you have to read them yourself, extract the data that determines which ones are old and which are new, and use the AWS SDK (or CLI) to move them to Glacier.
Since the files are not too big, you could use S3 Batch Operations along with a Lambda function to change their storage class to Glacier.
Alternatively, you could do this on an EC2 instance, using S3 Inventory's CSV list of your objects (assuming there are a large number of them).
And the most traditional way is to just list your bucket and iterate over each object.
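
A rough sketch of that last, "list and iterate" approach with the AWS CLI (the bucket, prefix, cut-off date, and especially the extract_capture_date helper are placeholders; how you read capture_date depends on how it is stored in your PDFs):

# iterate over every PDF in the bucket
aws s3api list-objects-v2 --bucket my-bucket --prefix pdfs/ \
  --query 'Contents[].Key' --output text | tr '\t' '\n' |
while read -r key; do
  aws s3 cp "s3://my-bucket/$key" /tmp/current.pdf --quiet

  # extract_capture_date is a placeholder for however you read the field
  # (PDF metadata, embedded text, a sidecar index, ...)
  capture_date=$(extract_capture_date /tmp/current.pdf)

  if [[ "$capture_date" < "2015-05-21" ]]; then
    # copy the object onto itself with a new storage class; this is how the
    # CLI changes an object's storage class in place
    aws s3 cp "s3://my-bucket/$key" "s3://my-bucket/$key" --storage-class GLACIER
  fi
done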

How to create one glacier archive for many s3 objects?

One of our clients has asked to get all the videos that they uploaded to the system. The files are stored in S3. The client expects to get one link that will download an archive with all the videos.
Is there a way to create such an archive without downloading the files, archiving them, and uploading the result back to AWS?
So far I haven't found a solution.
Is it possible to do this with Glacier, or to move the files to a folder and expose it?
Unfortunately, you can't create zip-like archives from existing objects directly on S3. Similarly, you can't transfer them to Glacier to do this; Glacier is not going to produce a single zip or rar (or any type of) archive from multiple S3 objects for you.
Instead, you have to download them first, zip or rar them (or use whichever archiving format you prefer), and then re-upload the archive to S3. Then you can share the zip/rar with your customers.
There is also the possibility of using the multipart upload API to merge S3 objects without downloading them, but this requires programming a custom solution, and it merely concatenates objects rather than creating a zip/rar-type archive.
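
For reference, a minimal sketch of that multipart route with the AWS CLI (bucket and key names are examples; every part except the last must be at least 5 MB, and the result is a plain concatenation of the source objects, not a zip):

# start a multipart upload for the merged object
upload_id=$(aws s3api create-multipart-upload --bucket my-bucket \
  --key merged/videos.bin --query UploadId --output text)

# copy existing objects in as parts, entirely server-side (no download);
# each response contains a CopyPartResult.ETag that is needed below
aws s3api upload-part-copy --bucket my-bucket --key merged/videos.bin \
  --upload-id "$upload_id" --part-number 1 --copy-source my-bucket/videos/a.mp4
aws s3api upload-part-copy --bucket my-bucket --key merged/videos.bin \
  --upload-id "$upload_id" --part-number 2 --copy-source my-bucket/videos/b.mp4

# finish the upload; parts.json lists each PartNumber with the ETag returned above, e.g.
# {"Parts":[{"PartNumber":1,"ETag":"<etag-from-part-1>"},{"PartNumber":2,"ETag":"<etag-from-part-2>"}]}
aws s3api complete-multipart-upload --bucket my-bucket --key merged/videos.bin \
  --upload-id "$upload_id" --multipart-upload file://parts.json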
You can transition all objects under a specific prefix (what you see as a subfolder) to a Glacier storage class by using S3 lifecycle rules to perform this action.
More information is available in the S3 lifecycle documentation.
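
A sketch of such a rule applied through the CLI (bucket name, prefix, and storage class are examples):

cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "archive-client-videos",
      "Filter": { "Prefix": "videos/client-a/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 0, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
EOF

# apply the rule to the bucket; objects under the prefix transition to Glacier
aws s3api put-bucket-lifecycle-configuration --bucket my-bucket \
  --lifecycle-configuration file://lifecycle.json

Keep in mind this only changes the storage class of each object under the prefix; it does not combine them into a single downloadable archive.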
Original answer
There is no native way to retrieve all objects as a single archive via S3.
S3 simply exposes objects as they were uploaded; unfortunately, you will need to perform the archiving as a separate process afterwards.