amazon S3 upload object LastModified date keep changed? - amazon-web-services

We know that when we download a large file on Linux or macOS, the file's last-modified time keeps changing. Is the same true in S3? Does an object's last-modified time keep changing during the upload, or is it just a single timestamp that records the start of the upload operation?

The docs say: "After you upload the object, you cannot modify object metadata. The only way to modify object metadata is to make a copy of the object and set the metadata." I believe that, in order to maintain atomicity, the time is updated only if the PUT operation succeeds.
Last-Modified falls under the category of system-defined metadata.
Its description is: "Object creation date or the last modified date, whichever is the latest." In other words, only a successful PUT operation updates the last-modified time.
The modified date/time is updated by the S3 system itself and reflects the time when the file finished uploading to S3 (S3 will not show incomplete transfers).
The last modified date of an object is a direct reflection of when the object was last put into S3.
A similar answer says the same: https://stackoverflow.com/a/40699793/13126651 - "The Last-Modified timestamp should match the Date value returned in the response headers from the successful PUT request."
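The quoted claim can be checked empirically. The following is a minimal sketch, assuming `s3_client` is a boto3 S3 client (or anything with the same `put_object` and `head_object` methods); the function name `put_and_compare` is hypothetical. It reads the `Date` header of the PUT response and compares it with the `LastModified` value returned afterwards.

```python
from email.utils import parsedate_to_datetime


def put_and_compare(s3_client, bucket, key, body):
    """Upload an object and compare the PUT response Date with LastModified.

    Returns (response_date, last_modified, difference_in_seconds).
    """
    put_response = s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    # boto3 exposes the raw HTTP response headers with lowercased names.
    date_header = put_response["ResponseMetadata"]["HTTPHeaders"]["date"]
    response_date = parsedate_to_datetime(date_header)

    # LastModified comes back as a timezone-aware datetime.
    last_modified = s3_client.head_object(Bucket=bucket, Key=key)["LastModified"]

    difference = (last_modified - response_date).total_seconds()
    return response_date, last_modified, difference
```

Both values have one-second resolution, so a difference of zero or a few seconds is hard to interpret for small objects; a large upload that takes minutes makes the result much clearer.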

Related

Amazon S3 LastModified time vs Upload complete time

I am a bit confused about the meaning of the LastModified time in S3.
Suppose I start uploading a large file at 10:00 AM and the upload takes 4 minutes. I am seeing that instead of showing the LastModified time as 10:04 AM, it's showing 10:00 AM, i.e. the time when I initiated the upload.
In Azure Blob Storage, however, the lastModified time seems to be the time when the upload completed.
Am I interpreting this incorrectly for S3? I mean, how can the lastModified time be the time when the upload starts, given that technically the object is not created until all bytes are uploaded?
Looking at answers like "amazon S3 upload object LastModified date keep changed?" is confusing, as they seem to say that LastModified is the time when the upload finished.
Can anyone please confirm?
Last-Modified is defined to be:
Object creation date or the last modified date, whichever is the latest.
Last modified is more like creation date, as mentioned in the docs:
Amazon S3 maintains only the last modified date for each object. For example, the Amazon S3 console shows the Last Modified date in the object Properties pane. When you initially create a new object, this date reflects the date the object is created. If you replace the object, the date changes accordingly. So when we use the term creation date, it is synonymous with the term last modified date.
It seems that it uses the value of the Date header as demonstrated in the PutObject example here - this will be, as you've seen, when the upload request was started and not when it finished.
Why S3 uses the Date header and not the timestamp of when the file has finished uploading is something internal to AWS AFAIK.
I have not seen the answer to the question, "why?" in the docs.
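One way to see which behaviour applies is to time an upload locally and check which end of the interval `LastModified` falls closer to. This is a sketch under stated assumptions: `s3_client` is a boto3 S3 client (or a compatible object), the helper name `classify_last_modified` is hypothetical, and the local clock is reasonably in sync with AWS.

```python
from datetime import datetime, timezone


def classify_last_modified(s3_client, bucket, key, body):
    """Time a PUT locally and report whether LastModified is nearer its start or finish."""
    started = datetime.now(timezone.utc)
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    finished = datetime.now(timezone.utc)

    last_modified = s3_client.head_object(Bucket=bucket, Key=key)["LastModified"]

    # Compare the distance to each end of the locally measured upload interval.
    to_start = abs(last_modified - started)
    to_finish = abs(finished - last_modified)
    return {
        "started": started,
        "finished": finished,
        "last_modified": last_modified,
        "closer_to": "start" if to_start <= to_finish else "finish",
    }
```

Because `LastModified` is reported in whole seconds, the result only means something when the upload takes noticeably longer than a second, as in the 4-minute example from the question.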

How should I append data in single object of s3 using multipart upload?

I have a task in which I want to continuously upload data into a single S3 object. I have created a Lambda function that initiates a multipart upload, calls upload part (twice), and then calls completeMultipartUpload. When I test this Lambda I get the expected data in the destination object, i.e. the data from all source files without anything being overwritten. But when I test the Lambda a second time, the data gets overwritten, whereas I want the new data to be appended at the end of the destination object. How could I do this? Any ideas?
You can't really append data to an S3 object; a write is all or nothing. To simulate an append, you can read the object back, add the new data to it, and then save the result to S3 again, but that gets slow if you often add data to large objects.
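The read, extend, and re-save approach described above can be sketched as follows. This assumes `s3_client` is a boto3 S3 client; `append_to_object` is a hypothetical helper name, not part of boto3. It is not atomic, so two concurrent callers can overwrite each other's data.

```python
def append_to_object(s3_client, bucket, key, new_data):
    """Simulate an append: download the object, add bytes, upload the result.

    Returns the size in bytes of the object that was written.
    """
    try:
        existing = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    except s3_client.exceptions.NoSuchKey:
        # First write: there is nothing to append to yet.
        existing = b""

    combined = existing + new_data
    # This PUT replaces the whole object (and resets its LastModified).
    s3_client.put_object(Bucket=bucket, Key=key, Body=combined)
    return len(combined)
```

The whole object is held in memory and transferred twice per call, which is why this approach gets slow for large or frequently updated objects.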

How to Upload/download to S3 without changing Last Modified date?

I want to upload and download files to S3 using boto3 without changing their "LastModified" date so I can keep tabs on the age of the contents. Whenever I upload or download a file it takes on the date of this operation and I lose the date that the contents were modified.
I'm looking at the timestamp of the files using
fileObj.get('LastModified')
where the fileObj is taken from a paginator result. I'm using the following command to upload
s3Client.upload_fileobj(data, bucket_name, destpath)
and the following to download the files:
s3Client.download_file(bucket_name, key, localPath)
How can I stop the last modified date changing?
This is not possible.
The Last Modified Date is generated by Amazon S3 and cannot be overridden.
If you wish to maintain your own timestamps, you could add some user-defined metadata and set the value yourself.
If you replicate the content from an existing bucket to another using the AWS replication tool, the last modified date is replicated as well. It is a cloning action, not a copying action.
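The user-defined metadata suggestion could look roughly like this with the `upload_fileobj` and `download_file` calls from the question. It is a sketch, not a definitive implementation: the metadata key `source-mtime` and both function names are assumptions made for illustration, and `s3_client` is assumed to be a boto3 S3 client.

```python
import os
from datetime import datetime, timezone

# Hypothetical metadata key; S3 stores it as the header x-amz-meta-source-mtime.
MTIME_KEY = "source-mtime"


def upload_with_mtime(s3_client, local_path, bucket, key):
    """Upload a file and record its local modification time as user metadata."""
    mtime = os.path.getmtime(local_path)
    stamp = datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat()
    with open(local_path, "rb") as data:
        s3_client.upload_fileobj(
            data, bucket, key, ExtraArgs={"Metadata": {MTIME_KEY: stamp}}
        )
    return stamp


def download_with_mtime(s3_client, bucket, key, local_path):
    """Download a file and restore the recorded modification time, if present."""
    s3_client.download_file(bucket, key, local_path)
    metadata = s3_client.head_object(Bucket=bucket, Key=key).get("Metadata", {})
    stamp = metadata.get(MTIME_KEY)
    if stamp is not None:
        ts = datetime.fromisoformat(stamp).timestamp()
        os.utime(local_path, (ts, ts))  # set both access and modification time
    return stamp
```

The object's own `LastModified` still changes on every upload; the age of the contents is read from the metadata instead.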

How to change file upload date in Amazon S3 using AWS CLI

I need to move some files (thousands) to Amazon S3 bucket, from where they will be displayed to the end-user by another application (instead of the current one).
The problem is that these files currently have creation/upload dates (the dates vary between 2012 and 2017, when they were uploaded to the current application), and when I move them they all end up with the same date. That is a problem because when you look at the files in the new application, you can't see their chronological order, which is sometimes very important.
Is there any way I can modify upload date of a file(s) in S3?
The Last Modification Date is generated by Amazon S3 and cannot be set via the API.
If dates and other information (eg user) are important to your application, you can store it as metadata on the object. Then, retrieve the metadata when displaying dates, user, etc.
What I did was rename the file to something else and then rename it back to its original name.
As you cannot rename directly, you have to copy the file to a new name and then copy it back to its original name (and delete the auxiliary file, of course).
It is not optimal, but that's the solution when using the AWS client. I hope one day AWS will have all the functions FTP used to have.
You can just copy over the same object and the timestamp will update.
This technique is also used to postpone the expiration of an object in a bucket with a lifecycle rule.
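Copying an object over itself might be sketched as below, assuming `s3_client` is a boto3 S3 client; `touch_object` is a hypothetical name. S3 rejects a copy onto the same key unless something such as the metadata changes, so the sketch uses `MetadataDirective="REPLACE"` and passes back the existing user metadata and content type.

```python
def touch_object(s3_client, bucket, key):
    """Copy an object onto itself so that S3 assigns a fresh LastModified."""
    head = s3_client.head_object(Bucket=bucket, Key=key)
    s3_client.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key},
        # REPLACE is what makes a same-key copy acceptable to S3.
        MetadataDirective="REPLACE",
        Metadata=head.get("Metadata", {}),
        ContentType=head.get("ContentType", "binary/octet-stream"),
    )
    return s3_client.head_object(Bucket=bucket, Key=key)["LastModified"]
```

Only the user metadata and content type are carried over here; other headers such as Cache-Control would need to be passed explicitly. A single `copy_object` call is also limited to objects of up to 5 GB.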

S3 last-modified timestamp for eventually-consistent overwrite PUTs

The AWS S3 docs state that:
Amazon S3 offers eventual consistency for overwrite PUTS and DELETES in all regions.
http://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel
The timespan until full consistency is reached can vary. During this period GET requests may return the previous object or the updated object.
My question is:
When is the last-modified timestamp updated? Is it updated immediately after the overwrite PUT succeeds but before full consistency is reached, or is it only updated after full consistency is achieved?
I suspect the former but I can't find any documentation which clearly states this.
The Last-Modified timestamp should match the Date value returned in the response headers from the successful PUT request.
To my knowledge, this is not explicitly documented, but it can be derived from what is documented.
When you overwrite an object, it's not the overwriting itself that may be delayed by the eventual consistency model -- it's the availability of the overwritten content at a given S3 node (S3 is replicated to multiple nodes within the S3 region).
The Last-Modified timestamp, like the rest of the metadata, is established at the time of object creation and is immutable thereafter.
It is, in fact, not the "modification" time of the object at all, it is the creation time of the object. The explanation may sound pedantic, but it is accurate in the strictest sense: S3 objects and their metadata cannot in fact be modified at all, they can only be overwritten. When you "overwrite" an object in S3, what you are actually doing is creating a new object, reusing the old object's key (path+file name). The availability of this new object at a given S3 node (replication) is what may be delayed by the eventual consistency model... not the actual creation of the new object that overwrites the old one... hence there would be no reason for Last-Modified to be impacted by the replication delay (assuming there is a replication delay -- eventual consistency can at times be indistinguishable from immediate consistency).
This is something S3 does that is absolutely terrible.
Basically in Linux you have the mtime which is the time the file was last modified on the filesystem. Any S3 client could gather the mtime and set the Last-Modified time on S3 so that it would maintain when things were actually last modified.
Instead, Amazon just does this based on the object creation and this is effectively a massive problem if you ever just want to use the data as data outside of the original application that put it there.
So if you download a file from S3, your client would likely set the modified time, and if the file was uploaded to S3 immediately after it was created, you would at least have a nearly correct timestamp. But the reality is that you might take a picture and it might not get from your phone, through the app, through the stack, and to S3 for days!
This is not even considering re-uploading the file to S3, which would compound the problem, as you might re-upload it years later. S3 will then report a Last-Modified that is years later, even though the file was not actually modified.
They really need to allow you to set it, but they remain ambiguous and over-documented in other areas to make this hard to figure out.
https://github.com/s3tools/s3cmd/issues/524