AWS S3 cp creates undefined files - amazon-web-services

While using the AWS CLI cp command to copy files recursively, there seems to be a bug that creates some files named "undefined".
aws s3 cp --recursive $HOME/$MYHOST-$MYTIMESTAMP/$MYHOST-$MYTIMESTAMP-*.xml s3://mybucket/$MYHOST-$MYTIMESTAMP/
The command works fine and uploads to the specified bucket. But it also creates some "undefined" files outside the target folder, in the root of the bucket. This happens every time, and I have to rm (delete) those annoying undefined files.
I presumed it to be a bug and tried uploading the files individually rather than using wildcards, with the same result as the recursive copy: it still creates additional undefined files in the root of the bucket. This only happens when I run a batch of these cp commands from a bash script, and even then the problem shows up intermittently.
aws s3 cp $HOME/$MYHOST-$MYTIMESTAMP/$MYHOST-$MYTIMESTAMP-hello.xml s3://mybucket/$MYHOST-$MYTIMESTAMP/
However, when copying only a single file, the problem does not show up.
My CLI version:
aws-cli/1.14.34 Python/2.7.14+ Linux/4.4.104-39-default botocore/1.8.38
Any help would be highly appreciated on this.

You have configured S3 access logging to write logs into this bucket. Presumably, these are the log files for this bucket.
Why the filenames begin with "undefined" is not clear -- something may have gone wrong when you set up logging for the bucket so that the log file prefix did not get saved -- but the filenames look like the names of the log files that S3 creates.
https://docs.aws.amazon.com/AmazonS3/latest/dev/ServerLogs.html
Best practice is to set up a separate bucket for collecting S3 access logs in each region.
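If you want to confirm where the files are coming from, you can inspect and adjust the bucket's logging configuration from the CLI. A sketch, assuming the bucket names below are placeholders and that "my-log-bucket" is a dedicated logging bucket you have already created and granted the S3 log delivery permission to write into:
# Show the current access-logging target bucket and prefix
aws s3api get-bucket-logging --bucket mybucket
# Redirect the access logs to a dedicated logging bucket with an explicit prefix
aws s3api put-bucket-logging --bucket mybucket --bucket-logging-status '{"LoggingEnabled": {"TargetBucket": "my-log-bucket", "TargetPrefix": "logs/mybucket/"}}'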

Related

Downloading s3 bucket to local directory but files not copying?

There are many, many examples of how to download a directory of files from an s3 bucket to a local directory.
aws s3 cp s3://<bucket>/<directory> /<path>/<to>/<local>/ --recursive
However, when I run this command from the AWS CLI instance I'm connected to, I see confirmations in the terminal like:
download: s3://mybucket/myfolder/data1.json to /my/local/dir/data1.json
download: s3://mybucket/myfolder/data2.json to /my/local/dir/data2.json
download: s3://mybucket/myfolder/data3.json to /my/local/dir/data3.json
...
But then I check /my/local/dir for the files, and my directory is empty. I've tried using the sync command instead, and I've tried copying just a single file; nothing seems to work right now. In the past I successfully ran this command and the files downloaded as expected.
Why are my files not being copied now, despite seeing no errors?
For testing, you can go to your /my/local/dir folder and execute the following command:
aws s3 sync s3://mybucket/myfolder .
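If the files still do not appear, it is worth double-checking which directory you are actually in and previewing what the CLI would transfer. A sketch using the paths from the question; --dryrun only prints what would be transferred without writing anything:
pwd                                             # confirm you really are in /my/local/dir
aws s3 sync s3://mybucket/myfolder . --dryrun   # preview the transfers
ls -la                                          # list the directory contents, including hidden files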

Delete files older than 30 days under S3 bucket recursively without deleting folders using PowerShell

I can delete files and exclude folders with the following script:
aws s3 rm s3://my-bucket/ --recursive --exclude="*" --include="*/*.*"
When I tried to add a pipe to delete only the older files, I was unable to. Please help with the script.
aws s3 rm s3://my-bucket/ --recursive --exclude="*" --include="*/*.*" | Where-Object {($_.LastModified -lt (Get-Date).AddDays(-31))}
The approach should be to list the files you need and then pipe the results to a delete call (the reverse of what you have). This might be better managed by a full-blown script rather than a one-line shell command. There's an article on this and some examples here.
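A rough sketch of that list-then-delete approach, written with bash and the AWS CLI rather than PowerShell; it assumes GNU date, takes the bucket name from the question, and relies on the common JMESPath pattern of comparing LastModified against an ISO-formatted cutoff string. You may want to add your own key filtering before deleting anything:
# Cutoff timestamp 30 days ago (GNU date syntax)
CUTOFF=$(date -d "-30 days" +%Y-%m-%dT%H:%M:%S)
# List keys older than the cutoff, then delete them one by one
aws s3api list-objects-v2 --bucket my-bucket \
    --query "Contents[?LastModified<='${CUTOFF}'].Key" --output text \
  | tr '\t' '\n' \
  | while read -r key; do
      [ -n "$key" ] && [ "$key" != "None" ] && aws s3 rm "s3://my-bucket/${key}"
    done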
Going forward, you could let an S3 lifecycle rule take care of this by expiring objects older than 30 days automatically; then you don't have to manage a script or remember to run it. Note: an expiration rule is evaluated against each object's age, so it also applies to objects that already exist when the rule is added.

How to create an empty folder in Google Storage (bucket) using the gsutil command?

How can we create a folder using a gsutil command? I am using the BashOperator in Airflow, where I need to run a gsutil bash command. The bucket is already created; I want to create a folder inside the bucket.
I already tried the command below, but it's not working for me.
$ gsutil cp <new_folder> gs://<bucketname>/
I am getting the error: CommandException: No URLs matched: new_folder
Google Cloud Storage does not work like a regular file system as in Windows/Linux. It appears to have folders, but behind the scenes it does not; it only lets us create "folders" so we can organize objects and navigate more comfortably.
If you want to store data under a specific folder with gsutil, try this:
gsutil cp [filetocopy] gs://your-bucket/folderyouwant/your-file
It will store the item in a "folder".
Check this link for more gsutil cp information.
This is the logic behind Google Cloud Storage "Folders".
gsutil will make a bucket listing request for the named bucket, using delimiter="/" and prefix="abc". It will then examine the bucket listing results and determine whether there are objects in the bucket whose path starts with gs://your-bucket/abc/, to determine whether to treat the target as an object name or a directory name. In turn this impacts the name of the object you create: if the above check indicates there is an "abc" directory you will end up with the object gs://your-bucket/abc/your-file; otherwise you will end up with the object gs://your-bucket/abc.
There is more information about this behavior here, if you are interested.
Apparently the ability to create an empty folder using gsutil is a request that has come up a few times but has not yet been satisfied. There appear to be some workarounds using the API that can then be scripted. The GitHub issue for the ability to create empty folders through scripting can be found here:
https://github.com/GoogleCloudPlatform/gsutil/issues/388
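One workaround commonly used in practice (a sketch, not something proposed in the linked issue) is to upload a zero-byte placeholder object under the desired prefix, which makes the "folder" appear in the console and in listings:
# Create an empty local file and copy it under the desired prefix;
# the name ".keep" is just a placeholder convention, any object name works
touch .keep
gsutil cp .keep gs://<bucketname>/new_folder/.keep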
You cannot create or copy an empty folder to GCS with gsutil, as far as I have researched and tried. Yes, it is somewhat inconvenient.
A folder must be non-empty to be copied to GCS, and don't forget the -r flag when copying a folder, as shown below; otherwise you will get an error if the folder is empty or if you forget -r:
gsutil cp -r <non-empty-folder> gs://your-bucket
// "-r" is needed for folder

move files within source folder to destination folder without deleting source folder

I am trying to move sub folders from one directory of an S3 bucket to another directory in the same bucket. After moving the files within a sub folder, the main directory gets deleted, which must not happen in my case.
aws s3 mv s3://Bucket-Name/Input-List/$i/ s3://Bucket-Name/Input-List-Archive/$i/ --recursive
COLLECTION_LIST=(A B C D E F)
for i in "${COLLECTION_LIST[@]}"
do
    if [ "$i" == "A" -o "$i" == "B" ]
    then
        aws s3 mv s3://Bucket-Name/Input-List/$i/ s3://Bucket-Name/Input-List-Archive/$i/ --recursive
    else
        aws s3 mv s3://Bucket-Name/Input-List/Others/$i/ s3://Bucket-Name/Input-List-Archive/Others/$i/ --recursive
    fi
done
Here, all files within Input-List must be moved to Input-List-Archive without the Input-List directory itself being deleted.
How about writing a script that copies the files recursively from the sub folders and then deletes those files from the sub folders, instead of using the mv command?
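A minimal sketch of that copy-then-delete idea for a single sub folder, using the bucket and prefix names from the question (note that, as the next answer explains, this has exactly the same effect on the "folders" themselves as mv does):
# Copy everything under the sub folder to the archive prefix, then delete the originals
aws s3 cp s3://Bucket-Name/Input-List/A/ s3://Bucket-Name/Input-List-Archive/A/ --recursive
aws s3 rm s3://Bucket-Name/Input-List/A/ --recursive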
Firstly, please note that directories/folders do not actually exist in Amazon S3.
For example, I could run this command:
aws s3 cp foo.txt s3://my-bucket/folder1/folder2/foo.txt
This will work successfully, even if folder1 and folder2 do not exist.
The Amazon S3 management console will make those folders 'appear', but they do not actually exist.
If I then ran:
aws s3 rm s3://my-bucket/folder1/folder2/foo.txt
then the object would be deleted and the folders would 'disappear' (because they never actually existed).
Sometimes, however, people want a folder to appear. When a folder is created in the management console, a zero-length object is created with the Key (filename) set to the name of the folder. This will force an empty 'folder' to appear, but it is not actually a folder.
When listing objects in S3, API calls can return a common prefix which is similar in concept to a folder, but it is really just the "path portion" of a filename.
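For example, listing with a delimiter makes S3 return those common prefixes alongside the objects; a sketch using the same my-bucket names as above:
# "folder1/" is reported under CommonPrefixes, even though no folder object exists
aws s3api list-objects-v2 --bucket my-bucket --delimiter "/" --query "CommonPrefixes[].Prefix"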
It is also worth mentioning that there is no "move" command in Amazon S3. Instead, when using the aws s3 mv command, the AWS CLI copies the object to a new object and then deletes the original object. This makes the object look like it was moved, but it actually was copied and deleted.
So, your options are:
Don't worry about folders. Just pretend they exist. They do not serve any purpose. OR
Create a new folder after the move. OR
Write your own program to Copy & Delete the objects without deleting the folder.
In fact, it is quite possible that the folder never existed in the first place (that is, there was no zero-length file with a Key matching the name of the folder), so it was never actually deleted. It's just that there was nothing to cause S3 to make the folder 'appear' to be there.
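If you do want the folder to 'appear' again after the move (the second option above), creating the zero-length object described earlier is enough; a sketch with the bucket and prefix names from the question:
# Create a zero-length "folder marker" object so the console shows Input-List/ again
aws s3api put-object --bucket Bucket-Name --key "Input-List/"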

AWS S3 Bucket upload all only zip files

I'm trying to upload all the zip files in a folder to my S3 bucket using this command:
aws s3 cp recursive s3://<bucket-name>/%date:~4,2%-%date:~7,2%-%date:~10,4% --recursive --include="*.zip" --exclude="*" --exclude="*/*/*"
The exclude only works on files but not on directories, so all my directories with zip files inside are still being uploaded. Is there a way to upload only the zip files and exclude every other kind of file and directory, without specifying the directory/file names?
https://docs.aws.amazon.com/cli/latest/reference/s3/index.html#use-of-exclude-and-include-filters
When there are multiple filters, the rule is the filters that appear later in the command take precedence over filters that appear earlier in the command.
I had a similar issue; it turns out you need to put --exclude="*" first.
aws s3 cp recursive s3://<bucket-name>/%date:~4,2%-%date:~7,2%-%date:~10,4% --recursive --exclude="*" --exclude="*/*/*" --include="*.zip"
Should work
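If the result is still not what you expect, it can help to preview the match with --dryrun before the real upload; a sketch that mirrors the command above (nothing is transferred, the CLI only prints what it would copy):
aws s3 cp recursive s3://<bucket-name>/%date:~4,2%-%date:~7,2%-%date:~10,4% --recursive --exclude="*" --exclude="*/*/*" --include="*.zip" --dryrun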