I have been trying to upload a static website to s3 with the following cli command:
aws s3 sync . s3://my-website-bucket --acl public-read
It successfully uploads every file in the root directory but fails on the nested directories with the following:
An error occurred (InvalidRequest) when calling the ListObjects operation: Missing required header for this request: x-amz-content-sha256
I have found references to this issue on GitHub, but no clear instructions on how to solve it.
The s3 sync command recursively copies local folders to folder-like S3 objects.
Even though S3 doesn't really support folders, the sync command creates S3 objects whose keys contain the folder names.
As reported in the following Amazon support thread "forums.aws.amazon.com/thread.jspa?threadID=235135", the issue should be solved by setting the region correctly.
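For example (a sketch only; us-west-2 is an assumed region, so substitute the region your bucket was actually created in), you can either set a default region or pass it explicitly on the command:
# set a default region for your CLI profile (us-west-2 is just an example)
aws configure set region us-west-2
# or pass the region explicitly to the sync command
aws s3 sync . s3://my-website-bucket --acl public-read --region us-west-2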
S3 has no concept of directories.
S3 is an object store where each object is identified by a key.
The key might be a string like "dir1/dir2/dir3/test.txt"
AWS graphical user interfaces on top of S3 interpret the "/" characters as a directory separator and present the file list as if it were in a directory structure.
However, internally, there is no concept of directory, S3 has a flat namespace.
See http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html for more details.
This is the reason directories are not synced: there are no directories on S3.
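As an illustration (my-bucket is a hypothetical bucket name), listing a bucket shows the full keys, and a "directory" is nothing more than a key prefix:
# list every object key in the bucket; the "directories" are only key prefixes
aws s3 ls s3://my-bucket --recursive
# listing under a prefix looks like a directory listing, but no directory object exists
aws s3 ls s3://my-bucket/dir1/dir2/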
Also, a feature request is open at https://github.com/aws/aws-cli/issues/912, but it has not been implemented yet.
Related
I'm having some trouble with CDK Pipelines / CodePipeline in AWS. When I run the pipeline (on git commit), the Assets section always runs even if I haven't changed the files it is building, and every pipeline execution creates an S3 bucket with pipeline assets, so we have loads of S3 buckets. This behaviour, while odd, does seem to work, but it takes a long time to run and doesn't seem right. Is this to be expected, and if not, what may be the issue?
Update
We sometimes see the error message below in the build logs, which may be related, but it doesn't cause a failure:
Failed to store notices in the cache: Error: ENOENT: no such file or directory, open '/root/.cdk/cache/notices.json'
If you create an S3 bucket and then reference that bucket in your CodePipeline, the output will always go to that S3 bucket, and the artifacts will be subdirectories of that specific bucket. That way you will still get new build assets, but they will be placed inside the same bucket, and you will only have one S3 bucket.
In Linux, nested folders can be created with the command below, regardless of whether the intermediate folders exist or not.
mkdir -p /home/user/some_non_existing_folder1/some_non_existing_folder2/somefolder
Similarly, I want to create a nested folder structure in S3 and place my files there later.
How can I do this using the AWS CLI?
Folders do not actually exist in Amazon S3.
For example, if you have an empty bucket you could upload a file to invoices/january.txt and the invoices folder will magically 'appear' without needing to be specifically created.
Then, if you were to delete the invoices/january.txt object, then the invoices folder will magically 'disappear' (because it never actually existed).
This works because the Key (filename) of an Amazon S3 object contains the full path of the object. The above object is not called january.txt -- rather, it is called invoices/january.txt. The Amazon S3 console will make it appear as if the folder exists, but it doesn't.
If you click the Create folder button in the S3 management console, then a zero-length object is created with the name of the folder. This causes the folder to 'appear' because it contains a file, but the file will not appear (well, it does appear, but humans see it as a folder). If this zero-length object is deleted, the folder will 'disappear' (because it never actually existed).
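For example (my-bucket is a hypothetical bucket name), this is roughly what the Create folder button does behind the scenes:
# create a zero-length object whose key ends in "/" so the console displays an empty "folder"
aws s3api put-object --bucket my-bucket --key invoices/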
Therefore, if you wish to create a directory hierarchy before uploading files, you could upload zero-length objects with the same names as the folders you want to create. You can use the aws s3 cp command to upload such a file.
Or, just upload the files to where you want them to appear, and the folders will magically appear automatically.
# create the bucket (note: bucket names cannot contain underscores, so use a hyphen)
aws s3 mb s3://main-folder
# create the nested folder keys
aws s3api put-object --bucket main-folder --key nested1/nested2/nested3/somefoldertosync
# sync my local folder to s3
aws s3 sync /home/ubuntu/somefoldertosync s3://main-folder/nested1/nested2/nested3/somefoldertosync
Currently I am using the above approach to carry on with my work.
I'm trying to get a file from an S3 bucket with Golang. What's special in my request is that I need to get a file from the root of S3. I.e., in my situation I have a buckets folder which is the root for S3, and inside that I have folders and files. I need to get the files from that buckets folder, which means I don't have a bucket folder because I access only the root.
The code I'm trying is:
numBytes, err := downloader.Download(file, &s3.GetObjectInput{
    Bucket: aws.String("/"),
    Key:    aws.String("some_image.jpeg"),
})
The problem is that I get an error saying the object does not exist.
Is it possible to read files from the root of S3? What do I need to pass as the bucket? Is the key written correctly?
Many thanks for helping!
All files in S3 are stored inside buckets. You're not able to store a file in the root of S3.
Each bucket is its own distinct namespace. You can have multiple buckets in your Amazon account, and each file must belong to one of those buckets.
You can create a bucket using the AWS web interface, the command line tools, or the API (or third-party software like Cyberduck).
You can read more about buckets in S3 here: https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingBucket.html
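As a rough CLI illustration (my-bucket is a hypothetical bucket name), every object path starts with a bucket, and the same applies to the Bucket field in the Go snippet above:
# list the buckets in the account
aws s3 ls
# download an object from a specific bucket; there is no "root" outside of buckets
aws s3 cp s3://my-bucket/some_image.jpeg ./some_image.jpeg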
I have been provided with the access and secret key for an Amazon S3 container. No more details were provided other than to drop some files into some specific folder.
I downloaded the Amazon CLI and also the Amazon SDK. So far, there seems to be no way for me to check the bucket name or list the folders where I'm supposed to drop my files. Every single command seems to require knowledge of a bucket name.
Trying to list with aws s3 ls gives me the error:
An error occurred (AccessDenied) when calling the ListBuckets operation: Access Denied
Is there a way to list the content of my current location (I'm guessing the credentials I was given are linked directly to a bucket?). I'd like to see at least the folders where I'm supposed to drop my files, but the SDK client for the console app I'm building seems to always require a bucket name.
Was I provided incomplete info or limited rights?
Do you know the bucket name or not? If you don't, and you don't have permission for ListAllMyBuckets and GetBucketLocation on * and ListBucket on the bucket in question, then you can't get the bucket name. That's how it is supposed to work. If you know the bucket, then you can run aws s3 ls s3://bucket-name/ to list the objects in the bucket.
Note that S3 buckets don't have the concept of a "folder". It's user-interface "sugar" to make it look like folders and files. Internally, it's just the key and the object.
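For example (bucket-name and incoming/ are placeholders; this assumes the credentials you were given grant at least ListBucket and PutObject on that bucket):
# list the "folder" (prefix) you were told to use
aws s3 ls s3://bucket-name/incoming/
# drop a file into it
aws s3 cp ./report.csv s3://bucket-name/incoming/report.csv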
Looks like it was just not possible without enhanced rights or the actual bucket name. I was able to procure both later on from the client and complete the task. Thanks for the comments.
I have two S3 buckets -
production
staging
I want to periodically refresh the staging bucket so it has all the latest production objects for testing, so I used the aws-cli as follows -
aws s3 sync s3://production s3://staging
So now both buckets have the exact same files.
However, for any given file/object, the production link works and the staging one doesn't.
e.g.
This works: https://s3-us-west-1.amazonaws.com/production/users/photos/000/001/001/medium/my_file.jpg
This doesn't: https://s3-us-west-1.amazonaws.com/staging/users/photos/000/001/001/medium/my_file.jpg
The staging bucket's objects are not publicly accessible; they are private by default.
Is there a way to correct this or avoid this with the aws-cli? I know I can change the bucket policy itself, but it was previously working with all the files that were there. So I'm wondering what it is about copying files over that changed their visibility.
Thanks!
You should be able to add the --acl flag:
aws s3 sync s3://production s3://staging --acl public-read
As mentioned in the documentation, the private ACL is the default.
Just did some more research.
Frédéric's answer is correct, but just wanted to expand on that a bit more.
aws s3 sync isn't really a true "sync" by default. It just goes through each file in the source bucket and copies it into the target bucket, skipping a file if a target object with the same name already exists and appears unchanged (same size and timestamp). I looked for a --force flag to force the overwrite, but apparently none exists.
It won't delete "extra" files in the target directory by default (i.e. a file that does not exist in the source directory). The --delete flag will allow you to do that
It does not copy over permissions by default. It's true that --acl public-read will set the target permissions to publicly readable, but that has 2 problems - (1) it just blindly sets that for all files, which you may not want, and (2) it doesn't work when you have several files of varying permissions.
There's an issue about it here, and a PR that's open but still un-merged as of today.
So if you're trying to do a full blind refresh like me for testing purposes, the best option is to
Completely empty the target staging bucket by selecting it in the console and clicking Empty
Run the sync and blindly set everything as public-read (other visibility options are available, see documentation here):
aws s3 sync s3://production s3://staging --acl public-read
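A minimal sketch of both steps from the CLI, assuming the staging bucket really can be wiped and repopulated (these commands are destructive):
# empty the staging bucket (the CLI equivalent of the console's Empty action)
aws s3 rm s3://staging --recursive
# copy everything from production, mark it public-read, and remove any extra staging objects
aws s3 sync s3://production s3://staging --acl public-read --delete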