How to get an existing S3 bucket (subdirectory)? - amazon-web-services

For a CDK app I'm working on, I'm trying to get an existing S3 bucket path and then copy a few local files to that bucket. However, when using the code below, I get this error when the code tries to look up the bucket:
Failed to create resource. Command '['python3', '/var/task/aws', 's3', 'sync', '--delete', '/tmp/tmpew0gwu1w/contents', 's3://dir/subdir/']' returned non-zero exit status 1.
If anyone could help me with this, that'd be great; I'm not sure whether the 'bucket-name' parameter can take a bucket path or not.
The line of code is as follows:
IBucket byName = Bucket.fromBucketName(this, "bucket-name", "dir/subdir");
Note: If I try to copy the files to the main directory (dir in this case), it works fine.

The “path” is not part of the bucket name, it’s part of the object’s key (filename). Amazon S3 is an object store and doesn’t have directories like file systems do.
Technically, every object in a bucket is on the top level with the “path” being prefixed to its name.
So if you have something like s3://bucket/path/sub-path/file.txt, the bucket name is bucket and the object key (similar to a filename) is path/sub-path/file.txt with path/sub-path/ being the prefix.
When using the aws s3 sync CLI command, the prefix gets converted into a directory structure on the local drive, and vice versa.
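If the copy is being done with the CDK's s3_deployment module (which is what produces that aws s3 sync error), here is a minimal sketch of the idea in Python CDK (the equivalent properties exist in the other CDK languages), assuming the real bucket is named dir, subdir/ is only a prefix, and the local files live in ./local-files:
from aws_cdk import aws_s3 as s3
from aws_cdk import aws_s3_deployment as s3_deployment

# Inside your Stack's __init__:
# Reference the existing bucket by its bucket name only -- no "path".
bucket = s3.Bucket.from_bucket_name(self, "existing-bucket", "dir")

# Copy the local files and put the "path" into the destination key prefix instead.
s3_deployment.BucketDeployment(
    self, "copy-files",
    sources=[s3_deployment.Source.asset("./local-files")],
    destination_bucket=bucket,
    destination_key_prefix="subdir/",
)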
For more details, please refer to How do I use folders in an S3 bucket?

Related

Deleting a file that doesn't exist on S3 deletes the folder if that is the last file. How do we prevent this?

So, I have the following structure on S3:
mainbucket
    DataFeeds/
    Statement/
We had incidents where the DataFeeds/ folder was being deleted! So, I tested with the following:
aws s3api put-object --bucket mainbucket --key DataFeeds/.donotdelete
But, if I execute this (deleting blah.txt even if it does not exist), the DataFeeds/ folder gets deleted too:
aws s3 rm s3://mainbucket/DataFeeds/blah.txt
So, how do we prevent a folder from being deleted on S3?
Versions used:
aws-cli/2.2.46 Python/3.9.7 Darwin/20.6.0 source/x86_64 prompt/off
Folders do not exist in Amazon S3. It is a 'flat' storage service where the Key (filename) of an object includes the full path, including directories.
Amazon S3 will 'simulate' folders for you. For example, if you upload a file to invoices/january.txt, then the invoices directory will 'magically' appear. If you then delete that object, the directory will then 'disappear'. This is because it never actually existed.
If you use the Create folder button in the S3 management console, it will create a zero-length object with the same name as the directory. This will 'force' the directory to appear in the bucket listing. Deleting the zero-length object will cause the directory to disappear if there are no objects with that same Prefix (path).
The best advice for using S3 is to pretend that folders exist. You can place an object in any path and the (pretend) directories will magically appear. Do not worry about directories disappearing, since they never actually existed!
If you really need empty directories to stay around, use that Create folder button to create the zero-length object. It will stay around until you delete the zero-length object.
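If you would rather create that placeholder programmatically than through the console, here is a small boto3 sketch (bucket and folder names taken from the question above):
import boto3

s3_client = boto3.client('s3')

# Create a zero-length object whose Key ends with "/".
# This is the same marker the console's "Create folder" button creates,
# so the DataFeeds/ "folder" keeps appearing even when it holds no files.
s3_client.put_object(Bucket='mainbucket', Key='DataFeeds/')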

API call to get the list of files from s3 path in aws using boto3 python library

I am new to AWS. I'm looking for Python boto3 library API calls for the scenarios below.
API call to get the list of files using s3 path
API call to remove all the files under s3 path
API call to check if the given s3 path exists or not
I'd appreciate it if anyone can help me with this.
"Paths" (directories, folders) do not actually exist in Amazon S3. It uses a flat (non-hierarchical) storage model where the filename (Key) of each object contains the full path of the object.
However, much of the functionality of paths is still provided by referencing a Prefix, which refers to the first part of a Key.
For example, let's say there is an object with a Key of: invoices/january/invoice.txt
It has a Prefix of invoices/ and also has a prefix of invoices/january/. A Prefix simply checks "Does the Key start with this string?"
Therefore, you can get the list of files using s3 path with:
import boto3
s3_resource = boto3.resource('s3')
for object in s3_resource.Bucket('my-bucket').objects.filter(Prefix='invoices/'):
    print(object.key)
Or, using the client method:
import boto3
s3_client = boto3.client('s3')
response = s3_client.list_objects_v2(Bucket='my-bucket', Prefix='invoices/')
for object in response['Contents']:
    print(object['Key'])
To remove all the files under s3 path you would need to use the above code to iterate through each object, and then call delete_object(). Alternatively, you could build a list of Keys to delete and then call delete_objects().
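For example, a short sketch that removes everything under the same prefix as above (be careful: this really does delete the objects):
import boto3

s3_resource = boto3.resource('s3')

# Delete every object whose Key starts with 'invoices/'.
# The collection's delete() batches the Keys into delete_objects() calls for you.
s3_resource.Bucket('my-bucket').objects.filter(Prefix='invoices/').delete()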
To check if the given s3 path exists or not you can call head_object(). Please note that this will work on an object, but will not work on a "path", because directories do not actually exist.
However, if you create a Folder in the Amazon S3 management console, a zero-length object is created with the name of the directory. This will make it "appear" that there is a directory, but it is not required. You can create an object in any path without actually creating the directories. They will simply "appear". Then, when all objects in that directory are deleted, the directory will no longer be displayed. It's magic!
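As a rough sketch of both checks, using the same hypothetical bucket as above:
import boto3
from botocore.exceptions import ClientError

s3_client = boto3.client('s3')

def object_exists(bucket, key):
    # head_object succeeds only for an actual object with exactly this Key.
    try:
        s3_client.head_object(Bucket=bucket, Key=key)
        return True
    except ClientError as error:
        if error.response['Error']['Code'] == '404':
            return False
        raise

def prefix_exists(bucket, prefix):
    # A "path" exists only in the sense that at least one Key starts with it.
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return response['KeyCount'] > 0

print(object_exists('my-bucket', 'invoices/january/invoice.txt'))
print(prefix_exists('my-bucket', 'invoices/'))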
See also: Amazon S3 examples — Boto3 documentation

uploading file to specific folder in S3 bucket using boto3

My code is working. The only issue I'm facing is that I cannot specify the folder within the S3 bucket that I would like to place my file in. Here is what I have:
with open("/hadoop/prodtest/tips/ut/s3/test_audit_log.txt", "rb") as f:
    s3.upload_fileobj(f, "us-east-1-tip-s3-stage", "BlueConnect/test_audit_log.txt")
Explanation from #danimal captures pretty much everything. If you just want to create a folder-like object in S3, you can simply specify a folder name ending with "/", so that when you look at it from the console it will look like a folder.
It's rather useless: an empty object without a body (think of it as a key with a null value), purely for eye candy, but if you really want to do it, you can.
1) You can create it in the console interactively, as it gives you that option
2) You can use the AWS SDK. boto3 has a put_object method on the S3 client, where you specify the key as "your_folder_name/"; see the example below:
import boto3
session = boto3.Session()  # I assume you know how to provide credentials etc.
s3 = session.client('s3', region_name='us-east-1')
bucket = s3.create_bucket(Bucket='my-test-bucket')
response = s3.put_object(Bucket='my-test-bucket', Key='my_pretty_folder/')  # note the trailing "/"
And there you have your bucket.
Again, when you upload a file to that bucket you specify a key such as "my_pretty_folder/my_file", and what you are doing is creating a "key" with that name and putting the content of your file as its "value".
In this case you have 2 objects in the bucket. The first object has a null body and looks like a folder, while the second one looks like it is inside that folder, but as #danimal pointed out, in reality you created 2 keys in the same flat hierarchy; it just "looks like" what we are used to seeing in a file system.
If you delete the file, you still have the other object, so in the AWS console it looks like the folder is still there but with no files inside.
If you skipped creating the folder and simply uploaded the file like you did, you would still see the folder structure in the AWS console, but you would have a single object at that point.
However, when you then list the objects from the command line, you would see a single object, and if you delete it in the console, it looks like the folder is gone too.
Files ('objects') in S3 are actually stored by their 'Key' (~folders+filename) in a flat structure in a bucket. If you place slashes (/) in your key then S3 represents this to the user as though it is a marker for a folder structure, but those folders don't actually exist in S3, they are just a convenience for the user and allow for the usual folder navigation familiar from most file systems.
So, as your code stands, although it appears you are putting a file called test_audit_log.txt in a folder called BlueConnect, you are actually just placing an object, representing your file, in the us-east-1-tip-s3-stage bucket with a key of BlueConnect/test_audit_log.txt. In order then to (seem to) put it in a new folder, simply make the key whatever the full path to the file should be, for example:
# upload_fileobj(file, bucket, key)
s3.upload_fileobj(f, "us-east-1-tip-s3-stage", "folder1/folder2/test_audit_log.txt")
In this example, the 'key' of the object is folder1/folder2/test_audit_log.txt, which you can think of as the file test_audit_log.txt inside the folder folder2, which is inside the folder folder1 - this is how it will appear on S3, in a folder structure, which will generally be different and separate from your local machine's folder structure.

AWS S3 Listing API - How to list everything inside S3 Bucket with specific prefix

I am trying to list all items with a specific prefix in an S3 bucket. Here is the directory structure that I have:
Item1/
    Item2/
        Item3/
            Item4/
                image_1.jpg
            Item5/
                image_1.jpg
                image_2.jpg
When I set the prefix to Item1/Item2, I get the following keys as a result:
Item1/Item2/
Item1/Item2/Item3/Item4/image_1.jpg
Item1/Item2/Item3/Item5/image_1.jpg
Item1/Item2/Item3/Item5/image_2.jpg
What I would like to get is:
Item1/Item2/
Item1/Item2/Item3
Item1/Item2/Item3/Item4
Item1/Item2/Item3/Item5
Item1/Item2/Item3/Item4/image_1.jpg
Item1/Item2/Item3/Item5/image_1.jpg
Item1/Item2/Item3/Item5/image_2.jpg
Is there any way to achieve this in golang?
Folders do not actually exist in Amazon S3. It is a flat object storage system.
For example, using the AWS Command-Line Interface (CLI) I could copy a file to an Amazon S3 bucket:
aws s3 cp foo.txt s3://my-bucket/folder1/folder2/foo.txt
This works just fine, even though folder1 and folder2 do not exist. This is because objects are stored with a Key (filename) that includes the full path of the object. So, the above object actually has a Key (filename) of:
folder1/folder2/foo.txt
However, to make things easier for humans, the Amazon S3 management console makes it appear as though there are folders. In S3, these are called Common Prefixes rather than folders.
So, when you make an API call to list the contents of the bucket while specifying a Prefix, it simply says "List all objects whose Key starts with this string".
Your listing doesn't show any folders because they don't actually exist.
Now, just to contradict myself, it actually is possible to create a folder (eg by clicking Create folder in the management console). This actually creates a zero-length object with the same name as the folder. The folder will then appear in listings because it is actually listing the zero-length object rather than the folder.
This is probably why Item1/Item2/ appears in your listing, but Item1/Item2/Item3 does not. Somebody, at some stage, must have "created a folder" called Item1/Item2/, which actually created a zero-length object with that Key.
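The question asked about golang, but since the rest of this page uses boto3, here is a rough Python sketch of the idea (the Go SDK's ListObjectsV2 takes the same Bucket/Prefix parameters), assuming a bucket named my-bucket: list the objects under the prefix, then derive the intermediate "folder" names from the Keys yourself, because S3 never stores them.
import boto3

s3_client = boto3.client('s3')

prefix = 'Item1/Item2/'
folders = set()
files = []

# List every object whose Key starts with the prefix, then split the Keys
# to reconstruct the "folder" names that S3 itself does not store.
paginator = s3_client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='my-bucket', Prefix=prefix):
    for obj in page.get('Contents', []):
        key = obj['Key']
        files.append(key)
        parts = key.split('/')[:-1]            # drop the filename part
        for depth in range(1, len(parts) + 1):
            folders.add('/'.join(parts[:depth]) + '/')

for name in sorted(folders) + sorted(files):
    print(name)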

Sync command for OpenStack Object Storage (like S3 Sync)?

Using the S3 CLI, I can sync a local directory with an S3 bucket using the following command:
aws s3 sync s3://mybucket/ ./local_dir/
This command is a complete sync. It uploads new files, updates changed files, and deletes removed files. I am trying to figure out how to do something equivalent using the OpenStack Object Storage CLI:
http://docs.openstack.org/cli-reference/content/swiftclient_commands.html
The upload command has a --changed option. But I need a complete sync that is also capable of deleting local files that were removed.
Does anyone know if I can do something equivalent to s3 sync?
The link you mentioned has this:
"objects – A list of file/directory names (strings) or SwiftUploadObject instances containing a source for the created object, an object name, and an options dict (can be None) to override the options for that individual upload operation"
I'm thinking, if you pass the directory and the --changed option it should work.
I don't have a Swift environment to test with. Can you give it a try?
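For what it is worth, here is an untested python-swiftclient sketch along those lines, assuming credentials come from the usual OS_* environment variables and a container named my-container. Note that the 'changed' option only skips unchanged uploads; it does not delete remote objects that were removed locally, so it is not a full equivalent of aws s3 sync --delete.
from swiftclient.service import SwiftService

# Upload the contents of ./local_dir, skipping files that have not changed.
with SwiftService() as swift:
    for result in swift.upload(container='my-container',
                               objects=['./local_dir'],
                               options={'changed': True}):
        if not result['success']:
            print('Failed:', result.get('object') or result.get('path'))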