This is a strange thing that I can't wrap my head around just yet. Why is it that when I use Boto3 to put an "expires" datetime on an object uploaded to AWS S3 with put_object, it gets stored and shown in the AWS console as "Metadata", yet when I retrieve the object, my "expires" datetime shows up as a datetime attribute of the object rather than as an entry in the Metadata dictionary?
This puzzled me, but I worked around it without understanding it. Now I've realized that using the method from "How to update metadata of an existing object in AWS S3 using python boto3?", copied below for ease of reading:
import boto3

s3 = boto3.resource('s3')
s3_object = s3.Object('bucket-name', 'key')
s3_object.metadata.update({'id': 'value'})

# Copy the object onto itself with the replaced metadata
s3_object.copy_from(
    CopySource={'Bucket': 'bucket-name', 'Key': 'key'},
    Metadata=s3_object.metadata,
    MetadataDirective='REPLACE'
)
causes my "expires" metadata to be destroyed. Of course I tried this:
metakeys.metadata.update({'x-amz-meta-hell':'yes', 'expires': metakeys.expires})
But that throws: AttributeError: 'datetime.datetime' object has no attribute 'encode'
It is true that you can update the metadata through the console without destroying the "expires" element. So to some extent I am suggesting that the method above is either A: not viable or not correct, B: broken, or C: both broken and not correct.
The question is: what is the correct way to update the metadata of an object without destroying this or other oddly behaved attributes of AWS S3 objects?
If you do a put_object() with the "Expires" parameter, you should get a response something like this:
{
    'Expiration': 'string',
    'ETag': 'string',
    ..........
}
However, Expiration is an attribute; it is NOT your user custom metadata. All user custom metadata values can only be STRINGS, and all of them will carry the prefix x-amz-meta- when you check the metadata.
metakeys.metadata.update({'x-amz-meta-hell':'yes', 'expires': metakeys.expires})
The update above will fail if the given metakeys.expires is not a string. It can be as simple as using isoformat() to convert it to a string.
Although copy_object() allows you to specify an explicit expiration datetime, the API documentation doesn't explicitly mention that the original object's expiration datetime will be copied over to the target object.
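For example, a minimal sketch (untested, reusing the bucket and key names from the question) that re-applies the original Expires header explicitly during the copy, and optionally keeps a string copy in user metadata via isoformat(); the 'expires-at' metadata key is just an example name:

import boto3

s3 = boto3.resource('s3')
s3_object = s3.Object('bucket-name', 'key')

# Keep a string copy in user metadata (user metadata values must be strings) ...
s3_object.metadata.update({'id': 'value',
                           'expires-at': s3_object.expires.isoformat()})

# ... and re-apply the original Expires header explicitly so the copy keeps it
s3_object.copy_from(
    CopySource={'Bucket': 'bucket-name', 'Key': 'key'},
    Metadata=s3_object.metadata,
    MetadataDirective='REPLACE',
    Expires=s3_object.expires
)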
I have a lot of LogicPro files (.logicx) stored in an S3 bucket, and I want to extract the creation date from all of these files. This should not be the creation date of the object on S3, but the date when the file was created on my MacBook (the "Created" attribute in Finder).
I've tried to retrieve the metadata from the object using the HEAD action:
aws s3api head-object --bucket <my-bucket> --key <my-object-key>
The output did not contain any information about the creation date of the actual file.
{
    "AcceptRanges": "bytes",
    "LastModified": "2021-10-28T13:22:33+00:00",
    "ContentLength": 713509,
    "ETag": "\"078c18ff0ab5322ada843a18bdd3914e\"",
    "VersionId": "9tseZuMRenKol1afntNM8mkRbeXo9n2W",
    "ContentType": "image/jpeg",
    "ServerSideEncryption": "AES256",
    "Metadata": {},
    "StorageClass": "STANDARD_IA"
}
Is it possible to extract the file creation metadata attribute from an S3 object, without having to download the whole object?
No, unfortunately.
To obtain metadata stored within the object itself, like the date-created attribute, you will need to download the entire file first. Amazon S3 does not store this information, as it is more of an object store than a file storage service. By default, it only sets and stores system-defined object metadata determined by S3.
You could try extracting it before uploading and setting it as user-defined object metadata, which would then be returned in the Metadata field above, or see if you can obtain what you need from a pre-defined byte range via byte-range fetches (essentially an HTTP range request).
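For the first approach, a minimal sketch (the local path, bucket, and key are placeholders of my own choosing; on macOS, st_birthtime is the "Created" timestamp Finder shows):

import os
from datetime import datetime, timezone
import boto3

s3 = boto3.client('s3')
path = '/path/to/project.logicx'  # hypothetical local file

# Read the file's creation time from the local filesystem
created = datetime.fromtimestamp(os.stat(path).st_birthtime, tz=timezone.utc)

with open(path, 'rb') as f:
    s3.put_object(
        Bucket='my-bucket',
        Key='project.logicx',
        Body=f,
        Metadata={'created': created.isoformat()}  # user metadata must be strings
    )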
I have a Lambda that runs when files are uploaded to the S3-A bucket and moves those files to another bucket, S3-B. The challenge is that I need to create a folder inside the S3-B bucket corresponding to the date the files were uploaded and move the files into that folder. Any help or ideas are greatly appreciated. It might sound confusing, so feel free to ask questions. Thank you!
Here's a Lambda function that can be triggered by an Amazon S3 Event and copies the object to another bucket under a date-based prefix:
import json
import urllib.parse
from datetime import date

import boto3

DEST_BUCKET = 'bucket-b'

def lambda_handler(event, context):
    s3_client = boto3.client('s3')

    # Extract the source bucket and object key from the S3 event record
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])

    # Prefix the destination key with today's date (UTC), e.g. '2021-10-28/<key>'
    dest_key = str(date.today()) + '/' + key

    s3_client.copy_object(
        Bucket=DEST_BUCKET,
        Key=dest_key,
        CopySource=f'{bucket}/{key}'
    )
The only thing to consider is timezones. The Lambda function runs in UTC and you might be expecting a slightly different date in your timezone, so you might need to adjust the time accordingly.
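If you do need a local date rather than UTC, one option (a sketch; the timezone name and key are just examples) is to build the prefix from an explicit timezone:

from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

key = 'uploads/example.jpg'  # placeholder for the key taken from the event

# Lambda runs in UTC; convert to your own timezone before building the prefix
local_date = datetime.now(ZoneInfo('Europe/London')).date()
dest_key = f'{local_date}/{key}'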
Just to clear up some confusion: in S3 there is no such thing as a folder. What you see in the interface is actually the result of running ListObjects with a prefix. The prefix is what you are seeing as the folder hierarchy.
To help illustrate this, an object might have a key (which is a piece of metadata that defines its name) of folder/subfolder/file.txt; in the console you're actually using a prefix of folder/subfolder/*. This makes sense if you think of S3 more like a key-value store, where the value is the object itself.
For this reason you can create a key under a prefix that has never existed before, without creating any other hierarchical structure first.
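As a quick illustration of the prefix idea (a sketch; bucket and prefix names are placeholders), listing everything "inside a folder" is just a prefix filter:

import boto3

s3 = boto3.client('s3')

# "Folders" in the console are just a prefix filter on a list call
resp = s3.list_objects_v2(Bucket='my-bucket', Prefix='folder/subfolder/')
for obj in resp.get('Contents', []):
    print(obj['Key'])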
In your Lambda function, you will need to copy the files to their new object key (remembering to delete the old object if you want a true move). Some SDKs have a function that will perform these steps for you (such as Boto3 with the copy function).
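For a true move, a minimal sketch (bucket and key names are placeholders) is a server-side copy followed by a delete:

import boto3

s3 = boto3.client('s3')

SOURCE_BUCKET = 'bucket-a'
DEST_BUCKET = 'bucket-b'
key = 'some/file.txt'

# S3 has no native "move": copy server-side, then delete the original
s3.copy_object(
    Bucket=DEST_BUCKET,
    Key=key,
    CopySource={'Bucket': SOURCE_BUCKET, 'Key': key}
)
s3.delete_object(Bucket=SOURCE_BUCKET, Key=key)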
I'm trying to use put_object_lock_configuration() API call to disable object locking on an Amazon S3 bucket using python boto3.
This is how I use it:
response = s3.put_object_lock_configuration(
    Bucket=bucket_name,
    ObjectLockConfiguration={
        'ObjectLockEnabled': 'Disabled'
    }
)
I always get an exception with the following error:
botocore.exceptions.ClientError: An error occurred (MalformedXML) when calling the PutObjectLockConfiguration operation: The XML you provided was not well-formed or did not validate against our published schema
I suspect I am missing the two parameters 'Token' and 'ContentMD5'. Does anyone know how to get these values?
The only allowed value for 'ObjectLockEnabled' is 'Enabled'. My intention was to disable object lock, but this is not possible, because object lock is defined at bucket creation time and can't be changed afterwards. However, I can provide an empty retention rule, and the retention mode then becomes 'None', which is essentially no object lock.
Here is the boto3 code for a blank retention rule; the precondition is that the object was locked with mode=GOVERNANCE in the first place.
client.put_object_retention(
    Bucket=bucket_name,
    Key=object_key,
    Retention={},
    BypassGovernanceRetention=True
)
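For reference, a sketch of the only shape put_object_lock_configuration accepts, with 'Enabled' (the retention mode and days here are illustrative, not from the question):

import boto3

client = boto3.client('s3')
bucket_name = 'my-bucket'  # placeholder

# Object Lock can only be enabled (on a bucket created with it); this call
# sets a default retention rule rather than disabling the lock
client.put_object_lock_configuration(
    Bucket=bucket_name,
    ObjectLockConfiguration={
        'ObjectLockEnabled': 'Enabled',
        'Rule': {
            'DefaultRetention': {'Mode': 'GOVERNANCE', 'Days': 1}
        }
    }
)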
I'm using DynamoDB to build a room-booking website on Django. Every time I refresh the page, the console throws a ResourceNotFoundException - Requested resource not found, which seems to happen when I use table.scan(). On certain pages the table still loads, but on others I'm shown a debug error.
Here's a part of my code:
dynamodb = boto3.resource(
'dynamodb',
aws_access_key_id="XXXXXX",
aws_secret_access_key="XXXXXX",
region_name="eu-west-2"
)
table = dynamodb.Table(table_name)
response = table.scan(TableName=table_name)
I'm entirely sure that the table_name value contains the correct string.
What could be the problem?
You are confusing the Client-level scan method with the Resource-level scan method. The former requires you to provide a TableName parameter, while the latter does not (because it's a method on an existing Table object, so the table name is implicitly known).
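For example, a minimal sketch of the Resource-level call (table name is a placeholder; region and credentials as in the question):

import boto3

dynamodb = boto3.resource('dynamodb', region_name='eu-west-2')
table_name = 'Rooms'  # placeholder
table = dynamodb.Table(table_name)

# No TableName here; the Table object already knows which table it targets
response = table.scan()
items = response['Items']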
Also, see Difference in boto3 between resource, client, and session?
Boto's S3 Key object contains last_modified date (available via parse_ts) but the base_field "date" (i.e., ctime) doesn't seem to be accessible, even though it's listed in key.base_fields.
Based on the table at http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html, it does seem that it is always automatically created (and I can't imagine a reason why it wouldn't be). It's probably just a simple matter of finding it somewhere in the object attributes, but I haven't been able to find it so far, although I did find the base_fields attribute, which contains 'date'. (They're just a set and don't seem to have any available methods, and I haven't been able to find documentation on ways to inspect them.)
For example, Amazon S3 maintains object creation date and size metadata and uses this information as part of object management.
Interestingly, create_time (system metadata field "Date" in link above) does not show up in the AWS S3 console, either, although last_modified is visible.
TL;DR: Because overwriting an S3 object is essentially creating a new one, the "last modified" and "creation" timestamp will always be the same.
Answering the old question, just in case others run into the same issue.
Amazon S3 maintains only the last modified date for each object.
For example, the Amazon S3 console shows the Last Modified date in the object Properties pane. When you initially create a new object, this date reflects the date the object is created. If you replace the object, the date changes accordingly. So when we use the term creation date, it is synonymous with the term last modified date.
Reference: https://docs.aws.amazon.com/AmazonS3/latest/dev/intro-lifecycle-rules.html
I suggest using key.last_modified, since key.date seems to return the last time you viewed the file.
So something like this:
key = bucket.get_key(key.name)
print(key.last_modified)
After additional research, it appears that S3 key objects returned from a list() may not include this metadata field!
The Key objects returned by the iterator are obtained by parsing the results of a GET on the bucket, also known as the List Objects request. The XML returned by this request contains only a subset of the information about each key. Certain metadata fields such as Content-Type and user metadata are not available in the XML. Therefore, if you want these additional metadata fields you will have to do a HEAD request on the Key in the bucket. (docs)
In other words, looping through keys:
for key in conn.get_bucket(bucket_name).list():
    print(key.date)
... does not return the complete key with creation date and some other system metadata. (For example, it's also missing ACL data).
Instead, to retrieve the complete key metadata, use this method:
key = bucket.get_key(key.name)
print (key.date)
This necessitates an additional HTTP request as the docs clearly state above. (See also my original issue report.)
Additional code details:
import boto
# get connection
conn = boto.connect_s3()
# get first bucket
bucket = conn.get_all_buckets()[0]
# get first key in first bucket
key = list(bucket.list())[0]
# get create date if available
print (getattr(key, "date", False))
# (False)
# access key via bucket.get_key instead:
k = bucket.get_key(key.name)
# check again for create_date
getattr(k, "date", False)
# 'Sat, 03 Jan 2015 22:08:13 GMT'
# Wait, that's the current UTC time..?
# Also print last_modified...
print (k.last_modified)
# 'Fri, 26 Apr 2013 02:41:30 GMT'
If you have versioning enabled for your S3 bucket, you can use list_object_versions and find the smallest date for the object you're looking for, which should be the date it was created.
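A minimal boto3 sketch of that idea (bucket and key are placeholders; it requires versioning to have been enabled before the object was written):

import boto3

s3 = boto3.client('s3')

bucket = 'my-bucket'
key = 'my-object-key'

# The earliest version's LastModified approximates the creation date
versions = s3.list_object_versions(Bucket=bucket, Prefix=key)
matching = [v for v in versions.get('Versions', []) if v['Key'] == key]
created = min(v['LastModified'] for v in matching)
print(created)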