Using Boto3 how to download list of files from AWS s3 as a zip file maintaining the folder structure? - amazon-web-services

I am trying to download a list of files within a parent folder while maintaining the sub-folder structure.
For example:
Folder structure in AWS s3 https://testbucket.s3.amazonaws.com/folder1/folder2/folder3
Subfolders and files within 'folder3':
Subfolder    Files
3.1          3.1.1.jpg, 3.1.2.jpg
3.2          3.2.1.jpg, 3.2.2.jpg
3.3          3.3.1.jpg, 3.3.2.jpg
List of files to download: [/folder3/3.1/3.1.1.jpg, /folder3/3.2/3.2.1.jpg, /folder3/3.2/3.2.2.jpg]
Is there an inbuilt function in boto3 to download the mentioned files as a zip file while maintaining the folder structure?
Note: I tried the Python package 'Aws-S3-Manager', but I was not able to maintain the folder structure with it.

No. Amazon S3 does not have a Zip capability.
You would need to download each object individually, but you can do it in parallel to reduce transfer times.
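That said, you can build the zip on the client side after downloading. Below is a minimal sketch using boto3 and the standard zipfile module, assuming the bucket name 'testbucket' and the key prefixes from the question's URL; the object keys are reused as archive names so the folder structure is preserved, and the downloads run in a thread pool as suggested above.

import zipfile
from concurrent.futures import ThreadPoolExecutor

import boto3

s3 = boto3.client("s3")
bucket = "testbucket"  # assumption: bucket name taken from the URL in the question
keys = [
    "folder1/folder2/folder3/3.1/3.1.1.jpg",
    "folder1/folder2/folder3/3.2/3.2.1.jpg",
    "folder1/folder2/folder3/3.2/3.2.2.jpg",
]

def fetch(key):
    # Download a single object and return its key together with the raw bytes
    return key, s3.get_object(Bucket=bucket, Key=key)["Body"].read()

# Download in parallel (boto3 clients are thread-safe), then write the zip sequentially
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fetch, keys))

with zipfile.ZipFile("folder3.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for key, data in results:
        # Reusing the object key as the archive name keeps the folder structure intact
        zf.writestr(key, data)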

Related

Coldfusion Update 3 - cf_script mapping

The documentation at https://helpx.adobe.com/ee/coldfusion/kb/coldfusion-2021-update-3.html
says:
If you've created a mapping of the cf_scripts folder, you must copy the contents of the downloaded zip into CF_SCRIPTS/scrips/ajax folder to download the ajax package.
The link on the page is to just the jar file, so I assume they are talking about the zip file you can download from the Update 2 page: https://helpx.adobe.com/coldfusion/kb/coldfusion-2021-update-2.html#:~:text=ColdFusion%20(2021%20release)%20Update%202%20(release%20date%2C%2014,cfsetup%20updates.
It seems strange to paste the contents of the zip file, which is the "bundles" folder, into CF_SCRIPTS/scrips/ajax... am I missing something?

CodePipeline not saving all files in source artifacts

I've set up a new pipeline in AWS CodePipeline, and connected it to my GitHub account. I'm getting build errors in CodeBuild because a folder that is in my GitHub repository, static/css/, is missing (I'm using CodeBuild to do a gatsby build).
This is not a folder generated in the build process - this folder and its files exist in a clean repo. I've also checked that the branch is correct (master).
When I inspect the zip file in the SourceArtifacts folder in my S3 bucket, this folder is not there.
Any ideas why CodePipeline is not retrieving, or at least keeping, this subfolder and its contents?
Go to your GitHub repo and select the green "Clone or Download" button, then download the zip file. This is essentially what CodePipeline does to get your GitHub source. Now inspect the files in the zip file and confirm whether the 'static' directory is there. If it is not, you need to fix that and get the files into GitHub.
It turned out that the missing folder was listed with an export-ignore attribute in the .gitattributes file. The static/css folder got zipped up with everything else after removing this attribute.
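For reference, a single export-ignore line in .gitattributes is enough to drop a path from the archive GitHub serves (the path shown mirrors the one from the question):

# .gitattributes
static/css export-ignore

Removing (or commenting out) that line puts the folder back into the downloaded zip and therefore into the CodePipeline source artifact.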

AWS S3 Bucket upload all only zip files

I'm trying to upload all the zip files in a folder to my S3 bucket using this command:
aws s3 cp <local-folder> s3://<bucket-name>/%date:~4,2%-%date:~7,2%-%date:~10,4% --recursive --include="*.zip" --exclude="*" --exclude="*/*/*"
The exclude only works on files but not on directories, so all my directories with zip files inside are still uploading. Is there a way to upload only the zip files and exclude all other files and directories without specifying the directory/file names?
https://docs.aws.amazon.com/cli/latest/reference/s3/index.html#use-of-exclude-and-include-filters
When there are multiple filters, the rule is the filters that appear later in the command take precedence over filters that appear earlier in the command.
I had a similar issue; it turns out you need to put --exclude="*" first.
aws s3 cp <local-folder> s3://<bucket-name>/%date:~4,2%-%date:~7,2%-%date:~10,4% --recursive --exclude="*" --exclude="*/*/*" --include="*.zip"
Should work
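Before running the real upload, you can sanity-check which files the filters actually match with the CLI's --dryrun flag, which only prints the operations it would perform (same placeholders as above):

aws s3 cp <local-folder> s3://<bucket-name>/%date:~4,2%-%date:~7,2%-%date:~10,4% --recursive --exclude="*" --exclude="*/*/*" --include="*.zip" --dryrun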

copy sub directories as well using Jenkins S3 plugin

I am using the S3 plugin in Jenkins to copy my project from Git to S3.
It's working fine, except that it copies only the top-level files. It doesn't copy the subdirectories or the files within the subdirectories.
How can I achieve a full copy?
It depends on the OS where the Jenkins job is executed: JENKINS issue 27576 seems to indicate it was an issue, but PR 55 also shows the right syntax to use for a recursive upload:
We had the S3 plugin configured with the source parameter as trunk/build/resources/**/* on Windows builders.
So in your case, make sure your path to upload finishes with /**/* in order to consider all files.
Ant -- copying files and subdirectories from only one subdirectory on a tree
This helped me a lot
If you only want to upload the whole folder to S3, use: foldername/**/
I used this to host a Nuxt project in S3 with the generated dist folder.
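For a concrete illustration of that pattern (the folder name here is just an example, matching the Nuxt case above): setting the plugin's Source field to dist/**/* uploads every file in every sub-directory of dist, since ** walks the directory tree and the trailing * matches the files inside.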

Copying/using Python files from S3 to Amazon Elastic MapReduce at bootstrap time

I've figured out how to install python packages (numpy and such) at the bootstrapping step using boto, as well as copying files from S3 to my EC2 instances, still with boto.
What I haven't figured out is how to distribute python scripts (or any file) from S3 buckets to each EMR instance using boto. Any pointers?
If you are using boto, I recommend packaging all your Python files in an archive (.tar.gz format) and then using the cacheArchive directive in Hadoop/EMR to access it.
This is what I do:
1. Put all necessary Python files in a sub-directory, say, "required/", and test it locally.
2. Create an archive of this: cd required && tar czvf required.tgz *
3. Upload this archive to S3: s3cmd put required.tgz s3://yourBucket/required.tgz
4. Add this command-line option to your steps: -cacheArchive s3://yourBucket/required.tgz#required
The last step will ensure that your archive file containing Python code will be in the same directory format as in your local dev machine.
To actually do step #4 in boto, here is the code:
from boto.emr.step import StreamingStep

# 'conn' below is an existing boto EMR connection (e.g. from boto.emr.connect_to_region)
step = StreamingStep(name=jobName,
                     mapper='...',
                     reducer='...',
                     ...
                     cache_archives=["s3://yourBucket/required.tgz#required"],
                     )
conn.add_jobflow_steps(jobID, [step])
And to allow the imported Python code to work properly in your mapper, make sure to reference it as you would a sub-directory:
import sys

sys.path.append('./required')
import myCustomPythonClass
# Mapper: do something!