How to upload a file to S3 without a name - amazon-web-services

I am reading data from an S3 bucket using Athena, and the data from the following file is correct.
# aws s3 ls --human s3://some_bucket/email_backup/email1/
2020-08-17 07:00:12 0 Bytes
2020-08-17 07:01:29 5.0 GiB email_logs_old1.csv.gz
When I change the path to _updated as shown below, I get an error.
# aws s3 ls --human s3://some_bucket/email_backup_updated/email1/
2020-08-22 12:01:36 5.0 GiB email_logs_old1.csv.gz
2020-08-22 11:41:18 5.0 GiB  
This is because of the extra file without a name in the same location. I have no idea how I managed to upload a file without a name. I would like to know how to reproduce it (so that I can avoid it).

All S3 files have a name (in fact, the full path is the object key, which serves as the object's name).
If you see a blank-named file in the path s3://some_bucket/email_backup_updated/email1/, you have likely created a file whose key is s3://some_bucket/email_backup_updated/email1/.
As mentioned above, S3 objects are identified by keys, so a file hierarchy does not really exist; you are simply filtering by prefix.
You should be able to validate this by running the listing without the trailing slash: aws s3 ls --human s3://some_bucket/email_backup_updated/email1
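If it's easier to check from code, here is a minimal boto3 sketch (bucket and prefix names are taken from the question) that lists every key under the prefix; printing with repr() makes a blank-looking or slash-terminated key obvious:
import boto3

s3 = boto3.client('s3')
response = s3.list_objects_v2(
    Bucket='some_bucket',
    Prefix='email_backup_updated/email1'
)
for obj in response.get('Contents', []):
    # repr() exposes trailing slashes or invisible characters in the key
    print(repr(obj['Key']), obj['Size'])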

If you add an extra non-breaking space at the end of the destination path, the file will be copied to S3 but with a blank name. For example:
aws s3 cp t.txt s3://some_bucket_123/email_backup_updated/email1/ 
(Note the non-breaking space after email1/ )
\xa0 is the non-breaking space in Latin-1, i.e. chr(160). The non-breaking space itself becomes the name of the file!
Using the same logic, I can remove the "space" file by adding the non-breaking space at the end.
aws s3 rm s3://some_bucket_123/email_backup_updated/email1/ 
I can also log in to the console and remove it from the user interface.
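For reference, a minimal boto3 equivalent (bucket name taken from the example above) that deletes the blank-looking object by spelling out the non-breaking space in its key:
import boto3

s3 = boto3.client('s3')
# The "nameless" object's key is the prefix plus a trailing non-breaking space (U+00A0)
s3.delete_object(
    Bucket='some_bucket_123',
    Key='email_backup_updated/email1/\xa0'
)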

Related

AWS S3 file with the same name does not get overwritten but gets characters added at the end of the filename

Below is an example of my scenario:
I have a Django API which allows users to upload images to a certain directory; the images are stored in an S3 bucket. Let's say the file name is 'example.jpeg'.
The user then uploads an image with the same name 'example.jpeg' to the same directory.
Both of them correctly show up in the same directory, but the second one gets additional characters at the end of the filename, like this: 'example_785PmrM.jpeg'. I suspect the additional characters are added by S3, but my research says S3 will overwrite the file with the same name.
How can I enable the overwrite feature? I haven't seen any option for this.
Thanks
S3 itself does not change a key on its own. The only thing I can see impacting this is Django's storage backend for S3:
AWS_S3_FILE_OVERWRITE (optional: default is True)
By default files with the same name will overwrite each other. Set this to False to have extra characters appended.
So you should set AWS_S3_FILE_OVERWRITE to True to prevent this behavior.
Depending on your exact needs, consider enabling S3 versioning so you can access previous versions of objects as they're overwritten in S3 in the future.
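As a minimal sketch, assuming the django-storages S3Boto3 backend is the storage backend in use, the relevant settings.py lines might look like this (the bucket name is a placeholder):
# settings.py -- assuming the django-storages S3Boto3 backend
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'
AWS_STORAGE_BUCKET_NAME = 'my-bucket'  # placeholder bucket name
AWS_S3_FILE_OVERWRITE = True  # overwrite objects that share a key instead of appending characters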

Is there a way to copy Google Cloud Storage object from SDK Shell to network drive like Box?

Is there a way to copy a GCS object via SDK Shell to a network drive like Box?
What I've tried is below. Thanks.
gsutil cp gs://your-bucket/some_file.tif C:/Users/Box/01. name/folder
CommandException: Destination URL must name a directory, bucket, or bucket
subdirectory for the multiple source form of the cp command.
There appears to be a typo in your destination:
C:/Users/Box/01. name/folder
There is a space after the period and before 'name' - you'll need to either wrap it in quotes or escape that space. Looks like you're on Windows; here's a doc on how to escape spaces in file paths.
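For example, wrapping the destination in quotes should work (the path is taken from the question):
gsutil cp gs://your-bucket/some_file.tif "C:/Users/Box/01. name/folder"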

Copy Folder Structure from one bucket to another without copying actual data

We have a requirement to create a bucket with the same directory structure as another S3 bucket. Since AWS S3 is object-based storage, I could not find any directory listing or copying command.
Any solution or workaround would be very helpful.
The most straightforward way to do this is to use an S3 replication rule on the bucket to copy the source to a destination bucket.
As you know, S3 does not have "folders" in the same way that a filesystem does. However, the AWS console simulates them with zero-length objects (doc):
The Amazon S3 console implements folder object creation by creating a zero-byte object with the folder prefix and delimiter value as the key
So if you want to replicate the "folder" structure, you will need to do the following:
Iterate all objects in the bucket.
For each object, strip off everything after the last slash (/). For example, if you have the object /foo/bar/baz, you want to end up with the key /foo/bar/ (the trailing slash is important).
Call PutObject in your language of choice, using that key and a zero-length body.
Repeat steps 2 and 3, removing subsequent trailing components (so /foo/bar/baz becomes /foo/bar/, and that becomes /foo/).
You should use whatever "set" implementation your language of choice supports, to avoid attempting to create the same "folder" multiple times.
The piece of code below (Python) will create an empty folder structure that matches the other S3 bucket's folder structure.
import boto3

src_bucketName = 'source_bucket'
dest_bucketName = 'dest_bucket'
uniqPath = []

s3 = boto3.resource('s3')
s3c = boto3.client('s3')
bucket = s3.Bucket(src_bucketName)

# Collect the prefix ("directory") portion of every object key in the source bucket
for my_bucket_object in bucket.objects.all():
    if '/' not in my_bucket_object.key:
        continue  # skip objects at the bucket root, which have no folder prefix
    path = my_bucket_object.key.rsplit('/', 1)
    uniqPath.append(path[0])

# De-duplicate the prefixes and create a zero-byte "folder" object for each
udir = list(set(uniqPath))
for i in udir:
    s3c.put_object(
        Bucket=dest_bucketName,
        Key=(i + '/')
    )

uploading file to specific folder in S3 bucket using boto3

My code is working. The only issue I'm facing is that I cannot specify the folder within the S3 bucket that I would like to place my file in. Here is what I have:
with open("/hadoop/prodtest/tips/ut/s3/test_audit_log.txt", "rb") as f:
    s3.upload_fileobj(f, "us-east-1-tip-s3-stage", "BlueConnect/test_audit_log.txt")
The explanation from @danimal captures pretty much everything. If you just want to create a folder-like object in S3, you can simply specify that folder name and end it with "/", so that when you look at it from the console, it will look like a folder.
It's rather useless: an empty object without a body (consider it a key with a null value), just for eye candy, but if you really want to do it, you can.
1) You can create it on the console interactively, as it gives you that option.
2) You can use the AWS SDK. boto3 has a put_object method on the S3 client, where you specify the key as "your_folder_name/"; see the example below:
import boto3

session = boto3.Session()  # I assume you know how to provide credentials etc.
s3 = session.client('s3', region_name='us-east-1')
bucket = s3.create_bucket(Bucket='my-test-bucket')
response = s3.put_object(Bucket='my-test-bucket', Key='my_pretty_folder/')  # note the ending "/"
And there you have your bucket.
Again, when you upload a file you specify the bucket and a key such as "my_pretty_folder/my_file"; what you did there is create a "key" with that name and put the content of your file as its "value".
In this case you have 2 objects in the bucket. The first object has a null body and looks like a folder, while the second one looks like it is inside it, but as @danimal pointed out, in reality you created 2 keys in the same flat hierarchy; it just "looks like" what we are used to seeing in a file system.
If you delete the file, you still have the other object, so on the AWS console it looks like the folder is still there but with no files inside.
If you skipped creating the folder and simply uploaded the file like you did, you would still see the folder structure in the AWS Console, but you would have a single object at that point.
When you list the objects from the command line, however, you would see a single object, and if you delete it on the console it looks like the folder is gone too.
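To see the flat key structure directly, here is a minimal boto3 sketch (using the bucket name assumed above):
import boto3

s3 = boto3.client('s3')
# Lists every key in the bucket; "folders" show up simply as keys ending in "/"
response = s3.list_objects_v2(Bucket='my-test-bucket')
for obj in response.get('Contents', []):
    print(obj['Key'])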
Files ('objects') in S3 are actually stored by their 'Key' (~folders + filename) in a flat structure in a bucket. If you place slashes (/) in your key, S3 presents this to the user as though it were a folder structure, but those folders don't actually exist in S3; they are just a convenience for the user and allow the usual folder navigation familiar from most file systems.
So, as your code stands, although it appears you are putting a file called test_audit_log.txt in a folder called BlueConnect, you are actually just placing an object representing your file in the us-east-1-tip-s3-stage bucket with a key of BlueConnect/test_audit_log.txt. In order then to (seem to) put it in a new folder, simply make the key whatever the full path to the file should be, for example:
# upload_fileobj(file, bucket, key)
s3.upload_fileobj(f, "us-east-1-tip-s3-stage", "folder1/folder2/test_audit_log.txt")
In this example, the 'key' of the object is folder1/folder2/test_audit_log.txt, which you can think of as the file test_audit_log.txt inside the folder folder2, which is inside the folder folder1 - this is how it will appear on S3, in a folder structure, which will generally be different and separate from your local machine's folder structure.

aws s3 replace file atomically

Environment
I copied a file, ./barname.bin, to S3 using the command aws s3 cp ./barname.bin s3://fooname/barname.bin
I have a different file, ./barname.1.bin, that I want to upload in place of that file.
How can I upload and replace (overwrite) the file at s3://fooname/barname.bin with ./barname.1.bin?
Goals:
Don't change the S3 URL used to access the file (the new file should also be available at s3://fooname/barname.bin).
zero/minimum 'downtime'/unavailability of the s3 link.
As I understand it, you've got an existing file located at s3://fooname/barname.bin and you want to replace it with a new file. To replace that, you should just upload a new one on top of the old one:
aws s3 cp ./barname.1.bin s3://fooname/barname.bin.
The old file will be replaced. According to the S3 docs this is atomic, though due to S3's replication, requests for the key may still return the old file for some time.
Note (thanks @Chris Kuehl): though the replacement is technically atomic, it's possible for multipart downloads to end up with chunks from different versions of the file. 😬
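For completeness, a minimal boto3 equivalent of the overwrite (file and bucket names are taken from the question):
import boto3

s3 = boto3.client('s3')
# Uploading to the existing key overwrites the object; the s3:// URL stays the same.
s3.upload_file('./barname.1.bin', 'fooname', 'barname.bin')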