How do I get the S3 key's created date with boto? - amazon-web-services

Boto's S3 Key object contains a last_modified date (available via parse_ts), but the base_field "date" (i.e., ctime) doesn't seem to be accessible, even though it's listed in key.base_fields.
Based on the table at http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html, it does seem that it is always automatically created (and I can't imagine a reason why it wouldn't be). It's probably just a simple matter of finding it somewhere in the object attributes, but I haven't been able to find it so far, although I did find the base_fields attribute, which contains 'date'. (It's just a set, doesn't seem to have any useful methods, and I haven't been able to find documentation on ways to inspect it.)
For example, Amazon S3 maintains object creation date and size metadata and uses this information as part of object management.
Interestingly, create_time (the system metadata field "Date" in the link above) does not show up in the AWS S3 console either, although last_modified is visible.
TL;DR: Because overwriting an S3 object is essentially creating a new one, the "last modified" and "creation" timestamps will always be the same.

Answering the old question, just in case others run into the same issue.
Amazon S3 maintains only the last modified date for each object.
For example, the Amazon S3 console shows the Last Modified date in the object Properties pane. When you initially create a new object, this date reflects the date the object is created. If you replace the object, the date changes accordingly. So when we use the term creation date, it is synonymous with the term last modified date.
Reference: https://docs.aws.amazon.com/AmazonS3/latest/dev/intro-lifecycle-rules.html
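For anyone using boto3 today, a minimal sketch of reading that single timestamp (the bucket and key names below are placeholders):
import boto3

s3 = boto3.client('s3')

# HeadObject returns the object's system metadata, including LastModified.
# Since S3 keeps no separate creation date, this is the closest thing to one.
response = s3.head_object(Bucket='example-bucket', Key='example-key')
print(response['LastModified'])  # a timezone-aware datetime in UTC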

I suggest using key.last_modified, since key.date seems to return the last time you viewed the file.
So, something like this:
key = bucket.get_key(key.name)
print(key.last_modified)

After additional research, it appears that S3 key objects returned from a list() may not include this metadata field!
The Key objects returned by the iterator are obtained by parsing the results of a GET on the bucket, also known as the List Objects request. The XML returned by this request contains only a subset of the information about each key. Certain metadata fields such as Content-Type and user metadata are not available in the XML. Therefore, if you want these additional metadata fields you will have to do a HEAD request on the Key in the bucket. (docs)
In other words, looping through keys:
for key in conn.get_bucket(bucket_name).list():
    print(key.date)
... does not return the complete key with creation date and some other system metadata. (For example, it's also missing ACL data).
Instead, to retrieve the complete key metadata, use this method:
key = bucket.get_key(key.name)
print(key.date)
This necessitates an additional HTTP request as the docs clearly state above. (See also my original issue report.)
Additional code details:
import boto
# get connection
conn = boto.connect_s3()
# get first bucket
bucket = conn.get_all_buckets()[0]
# get first key in first bucket
key = list(bucket.list())[0]
# get create date if available
print(getattr(key, "date", False))
# False
# access the key via bucket.get_key instead:
k = bucket.get_key(key.name)
# check again for a create date
print(getattr(k, "date", False))
# 'Sat, 03 Jan 2015 22:08:13 GMT'
# Wait, that's the current UTC time..?
# Also print last_modified...
print(k.last_modified)
# 'Fri, 26 Apr 2013 02:41:30 GMT'

If you have versioning enabled for your S3 bucket, you can use list_object_versions and find the smallest date for the object you're looking for, which should be the date it was created.
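A rough boto3 sketch of that approach (bucket and key names are placeholders, and it assumes versioning was already enabled when the object was first written):
import boto3

s3 = boto3.client('s3')

# List every version of one key and take the earliest LastModified,
# which is effectively the original creation date.
response = s3.list_object_versions(Bucket='example-bucket', Prefix='example-key')
versions = [v for v in response.get('Versions', []) if v['Key'] == 'example-key']
if versions:
    created = min(v['LastModified'] for v in versions)
    print(created)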

Related

Is Amazon S3's ListObjectsV2 self-consistent over multiple pages?

ListObjectsV2 can only return 1000 results, at which point you have to go back for another page.
Since Amazon S3 is now strongly consistent, and other updates can be happening to the bucket while I am listing its contents, is the second page going to be more results from the same point in time as the first page? Or is it going to reflect the state of the bucket at the point in time when the second page was requested?
For example, if I list a bucket, get the first page, delete a key which would have appeared on the second page, and then get the second page, will I still see the key that is now deleted?
Indeed, Amazon S3 is now strongly consistent. This means that once you upload an object, everyone who reads that object is guaranteed to get the updated version of the object. This does not mean that two different API calls are guaranteed to be in the same "state". Notably, for downloads, there is a situation where one download can get parts of two versions of the object if it's updated while being downloaded. More details are available in this answer.
As for your question, the same basic rules apply: S3 is strongly consistent from one call to the next; once you make a change to the bucket or objects, any call after that update is guaranteed to get the updated data. This means that as you page through the list of objects, you will see the changes, since each API call gets the latest state:
import boto3
BUCKET='example-bucket'
PREFIX='so_question'
s3 = boto3.client('s3')
# Create a bunch of items
for i in range(3000):
    s3.put_object(Bucket=BUCKET, Key=f"{PREFIX}/obj_{i:04d}", Body=b'')
args = {'Bucket': BUCKET, 'Prefix': PREFIX + "/",}
result = s3.list_objects_v2(**args)
# This shows objects 0 to 999
print([x['Key'] for x in result['Contents']])
# Delete an object
s3.delete_object(Bucket=BUCKET, Key=f"{PREFIX}/obj_{1100:04d}")
# Request the next "page" of items
args['ContinuationToken'] = result['NextContinuationToken']
result = s3.list_objects_v2(**args)
# This will not show object 1100, showing objects 1000 to 2000
print([x['Key'] for x in result['Contents']])
The upshot of this, together with the fact that there's no way to get a list of all objects in a bucket (assuming it has more than 1000 items) in one API call, is that there's no way I'm aware of to get a complete "snapshot" of the bucket at any point, unless you can ensure the bucket doesn't change while you're listing its objects, of course.
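For what it's worth, boto3's built-in paginator handles the continuation token for you; it makes one ListObjectsV2 call per page, so the same point-in-time caveat applies (bucket and prefix below are placeholders):
import boto3

s3 = boto3.client('s3')

# Each underlying ListObjectsV2 call reflects the bucket's state at the moment
# that particular page is fetched, not a single snapshot of the whole listing.
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='example-bucket', Prefix='so_question/'):
    for obj in page.get('Contents', []):
        print(obj['Key'])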

amazon S3 upload object LastModified date keep changed?

We know that if we download a large file on Linux or Mac, the file's last modified time keeps changing. Is it the same in S3? Does the object's last modified time keep changing during the upload, or is it just a simple timestamp recording the start of the upload operation?
The docs say: "After you upload the object, you cannot modify object metadata. The only way to modify object metadata is to make a copy of the object and set the metadata." I believe that, in order to maintain atomicity, the time is only updated if the put operation is successful.
Last-Modified comes under the category of system-defined metadata.
Its description is "Object creation date or the last modified date, whichever is the latest." In other words, only a successful put operation will update the last modified time.
The modified date/time is updated by the S3 system itself and reflects the time when the file finished uploading fully to S3 (S3 will not show incomplete transfers).
The last modified date of an object is a direct reflection of when the object was last put into S3.
A similar answer says the same: https://stackoverflow.com/a/40699793/13126651 - "The Last-Modified timestamp should match the Date value returned in the response headers from the successful PUT request."
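As a quick sanity check (a sketch of my own, with placeholder bucket and key names), you can compare the Date header returned by a successful PUT with the LastModified reported afterwards:
import boto3

s3 = boto3.client('s3')

# The 'date' response header records when the PUT completed successfully.
put_response = s3.put_object(Bucket='example-bucket', Key='example-key', Body=b'data')
print(put_response['ResponseMetadata']['HTTPHeaders']['date'])

# LastModified from a subsequent HEAD should match that completion time.
head_response = s3.head_object(Bucket='example-bucket', Key='example-key')
print(head_response['LastModified'])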

Optimize photo storage nomenclature on Amazon S3

I have to store lots of photos (1,000,000+, each max 5MB), and I have a database where every record has 5 photos. So what is the best solution:
Create directory for each record's slug/id, and upload photos inside it
Put all photos into one directory, and in name contain id or slug of record
Put all photos into one directory, and in database to each record add field with names of photos.
I use Amazon S3 server.
I would suggest naming your photos like this while uploading in batch:
user1/image1.jpeg
user2/image2.jpeg
Though these names won't affect the way objects are stored on S3 (the names are simply the 'keys' of 'objects', as there is no folder-like hierarchical structure in S3), naming them this way will make the objects appear in folders, which will help you segregate images easily if you want to do so later.
For example, let us suppose you stored all images with unique names and you are using a unique UUID to map records in the database to images in your bucket.
But later on, suppose you want all 5 photos of a particular user. Then what you will have to do is:
scan the database for the particular username
retrieve the UUIDs for that user's images
and then use those UUIDs to fetch the images from S3
But if you name images by prefixing the username, you can fetch images directly from S3 without making any reference to your database.
For example, to list all photos of user1, you can use this small code snippet in Python:
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket_name')
for obj in bucket.objects.filter(Prefix='user1/'):
    print(obj.key)
Whereas if you don't use any user ID in the object key, you have to refer to the database to map photos to records even just to get a list of a particular user's images.
A lot of this depends on your use-case, such as how the database and the photos will be used. There is not enough information here to give a definitive answer.
However, some recommendations for the storage side...
The easiest option is just to use a UUID for each photo. This is effectively a random name that has no meaning. Store that name in your database and your system will know which image relates to which record. There is no need to ever rename the images because the names are just Unique IDs and convey no further information.
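A minimal sketch of that idea with boto3 (bucket name and file path are placeholders; the database part is up to you):
import uuid
import boto3

s3 = boto3.client('s3')

# Generate a random, meaningless name and remember it alongside the record.
photo_key = str(uuid.uuid4()) + '.jpeg'
with open('photo.jpeg', 'rb') as f:
    s3.put_object(Bucket='example-bucket', Key=photo_key, Body=f)
# ...then store photo_key in the database row so the image can be found again.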
When you want to provide access to a particular image, your application can generate an Amazon S3 pre-signed URL that grants time-limited access to an object. After the expiry time, the URL does not work so the object remains private. Granting access in this manner means that there is no need to group images into directories by "owner", since access is granted per-object rather than per-owner.
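Generating such a URL with boto3 looks roughly like this (bucket, key, and expiry are placeholders):
import boto3

s3 = boto3.client('s3')

# The URL grants read access to one object for one hour, then stops working.
url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'example-bucket', 'Key': 'some-photo-uuid.jpeg'},
    ExpiresIn=3600)
print(url)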
Also, please note that Amazon S3 doesn't actually support folders. Rather, the Key ("filename") of the object is the entire path (eg user-2/foo.jpg). This makes it more human-readable (because the objects 'appear' to be in folders), but doesn't actually impact the way data is stored behind-the-scenes.
Bottom line: It doesn't really matter how you store the images. What matters is that you store the image name in your database so you know which image matches which record. Avoid situations where you need to rename images - just give them a name and keep it.

Boto3 S3 update metadata of existing object

This is a strange thing that I can't wrap my head around just yet. Why is it that when I use Boto3 to put an "expires" datetime on an object that gets put to AWS S3 by put_object, it gets stored and shows in the AWS console as "metadata," but when I retrieve the object, my "expires" datetime shows up as a datetime element of the object rather than a datetime element in the Metadata dictionary?
This question puzzled me but I worked around it without understanding it. Now it comes to me that using this method: How to update metadata of an existing object in AWS S3 using python boto3? which is copied below for ease of reading:
import boto3
s3 = boto3.resource('s3')
s3_object = s3.Object('bucket-name', 'key')
s3_object.metadata.update({'id': 'value'})
s3_object.copy_from(
    CopySource={'Bucket': 'bucket-name', 'Key': 'key'},
    Metadata=s3_object.metadata,
    MetadataDirective='REPLACE')
Causes my "expires" metadata to be destroyed. Of course I tried this:
metakeys.metadata.update({'x-amz-meta-hell':'yes', 'expires': metakeys.expires})
But that throws: AttributeError: 'datetime.datetime' object has no attribute 'encode'
It is true that you can update the metadata effectively without destroying the "expires" element through the console. So to some extent I am suggesting that the method above is either A: Not viable or not correct, B: Broken, or C: both broken and not correct
The question is - what is the correct way to update metadata of an object without destroying this or future odd behaviors of AWS S3 objects?
If you do a put_object() with the "Expires" parameter, you should get something like this:
{
    'Expiration': 'string',
    'ETag': 'string',
    ..........
}
However, Expiration is an attribute; it is NOT your custom user metadata. All custom user metadata can only be STRINGs, and all of it will carry the prefix x-amz-meta- when you check the metadata.
metakeys.metadata.update({'x-amz-meta-hell':'yes', 'expires': metakeys.expires})
The update above will fail if the given metakeys.expires is not a string. The fix can be as simple as using isoformat() to convert it to a string.
Although copy_object() allows you to specify an explicit expiration datetime, the API documentation doesn't explicitly mention that the original object's expiration datetime will be copied over to the target object.
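Putting those two points together, one hedged workaround (a sketch, not a documented guarantee) is to pass Expires explicitly on the copy and keep only strings in the user metadata:
import boto3

s3 = boto3.resource('s3')
s3_object = s3.Object('bucket-name', 'key')

# User metadata values must be plain strings; Expires is re-applied explicitly
# as its own parameter so the system attribute is set again on the copy.
new_metadata = dict(s3_object.metadata)
new_metadata['id'] = 'value'
s3_object.copy_from(
    CopySource={'Bucket': 'bucket-name', 'Key': 'key'},
    Metadata=new_metadata,
    MetadataDirective='REPLACE',
    Expires=s3_object.expires)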

boto - What exactly is a key?

As the title says, what is a key in boto?
What does it encapsulate (fields, data structures, methods etc.)?
How does one access the file contents for files in an AWS bucket using a key/boto?
I was not able to find this information on their official documentation or on any other third party website. Could anybody provide this info?
Here are some examples of the usage of the key object:
def download_file(key_name, storage):
    key = bucket.get_key(key_name)
    try:
        storage.append(key.get_contents_as_string())
    except:
        print "Some error message."
and:
for key in keys_to_process:
    pool.spawn_n(download_file, key.key, file_contents)
pool.waitall()
In your code example - key is the object reference to the unique identifier within a bucket.
Think of buckets as a table in a database.
Think of keys as the rows in the table.
You reference the key (better known as an object) in the bucket.
In boto (not boto3), it often works like this:
from boto.s3.connection import S3Connection
connection = S3Connection() # assumes you have a .boto or boto.cfg setup
bucket = connection.get_bucket('my_bucket_name_here')  # this is like the table name in SQL: SELECT object FROM tablename
key = bucket.get_key('my_key_name_here')  # this is the OBJECT in the SQL example above
Key names are just strings, and there is a convention that says if you put a '/' in the name, a viewer/tool should treat it like a path/folder for the user. For example, my/object_name/is_this is really just a key inside the bucket, but most viewers will show a my folder and an object_name folder, and then what looks like a file called is_this, simply by UI convention.
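To answer the "file contents" part of the question: with a boto (not boto3) Key in hand, you can read the object body directly (a small sketch using placeholder names):
from boto.s3.connection import S3Connection

connection = S3Connection()
bucket = connection.get_bucket('my_bucket_name_here')
key = bucket.get_key('my_key_name_here')

# Read the whole object into memory as a string...
contents = key.get_contents_as_string()
# ...or stream it straight to a local file instead.
key.get_contents_to_filename('/tmp/local_copy')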
Since you appear to be talking about Simple Storage Service (S3), you'll find that information on Page 1 of the S3 documentation.
Each object is stored and retrieved using a unique developer-assigned key.
A key is the unique identifier for an object within a bucket. Every object in a bucket has exactly one key. Because the combination of a bucket, key, and version ID uniquely identify each object, Amazon S3 can be thought of as a basic data map between "bucket + key + version" and the object itself. Every object in Amazon S3 can be uniquely addressed through the combination of the web service endpoint, bucket name, key, and optionally, a version. For example, in the URL http://doc.s3.amazonaws.com/2006-03-01/AmazonS3.wsdl, "doc" is the name of the bucket and "2006-03-01/AmazonS3.wsdl" is the key.
http://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html
The key is just a string -- the "path and filename" of the object in the bucket, without a leading /.