How to upload/download to S3 without changing the Last Modified date?

I want to upload and download files to S3 using boto3 without changing their "LastModified" date so I can keep tabs on the age of the contents. Whenever I upload or download a file it takes on the date of this operation and I lose the date that the contents were modified.
I'm looking at the timestamp of the files using
fileObj.get('LastModified')
where the fileObj is taken from a paginator result. I'm using the following command to upload
s3Client.upload_fileobj(data, bucket_name, destpath)
and the following to download the files:
s3Client.download_file(bucket_name, key, localPath)
How can I stop the last modified date changing?

This is not possible.
The Last Modified Date is generated by Amazon S3 and cannot be overridden.
If you wish to maintain your own timestamps, you could add some user-defined metadata and set the value yourself.
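As a rough sketch of the metadata approach, assuming the uploads come from local files: the helpers below store the file's modification time under a user-defined metadata key and put it back on the local copy after a download. The key name `source-mtime` and the function names are made up for illustration; `s3Client` is expected to be a client created with `boto3.client("s3")`.

```python
import os
from datetime import datetime, timezone

# Hypothetical metadata key; S3 stores it as the header x-amz-meta-source-mtime.
MTIME_KEY = "source-mtime"


def mtime_metadata(local_path):
    """Return a metadata dict holding the file's mtime as an ISO 8601 UTC string."""
    mtime = os.path.getmtime(local_path)
    stamp = datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat()
    return {MTIME_KEY: stamp}


def upload_keeping_mtime(s3Client, local_path, bucket_name, destpath):
    """Upload the file and record its original mtime as user-defined metadata."""
    extra_args = {"Metadata": mtime_metadata(local_path)}
    with open(local_path, "rb") as data:
        s3Client.upload_fileobj(data, bucket_name, destpath, ExtraArgs=extra_args)


def download_restoring_mtime(s3Client, bucket_name, key, localPath):
    """Download the object, then set the local mtime from the stored metadata."""
    s3Client.download_file(bucket_name, key, localPath)
    metadata = s3Client.head_object(Bucket=bucket_name, Key=key).get("Metadata", {})
    stamp = metadata.get(MTIME_KEY)
    if stamp is not None:
        ts = datetime.fromisoformat(stamp).timestamp()
        os.utime(localPath, (ts, ts))  # (access time, modification time)
```

S3's own LastModified still changes on every upload; the age of the contents is read from the metadata value instead. The paginator results do not include user-defined metadata, so checking it means one `head_object` call per key.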

If you replicate the content from an existing bucket to another using the AWS replication tool, the last modified date is replicated as well. It is a cloning action, not a copying action.

Related

Amazon S3 LastModified time vs Upload complete time

I am a bit confused about the meaning of the LastModified time in S3.
Suppose I start the upload of a large file at 10:00 AM and the upload takes 4 minutes. I am seeing that instead of showing the LastModified time as 10:04 AM, it shows 10:00 AM, i.e. when I initiated the upload.
In Azure Blob Storage, however, the lastModified time seems to be the time when the upload completed.
Am I interpreting this incorrectly for S3? How can the lastModified time be the time when the upload starts, given that technically the object is not created until all bytes are uploaded?
Answers like "amazon S3 upload object LastModified date keep changed?" are confusing, as they seem to say that LastModified is the time when the upload finished.
Can anyone please confirm?
Last-Modified is defined to be:
Object creation date or the last modified date, whichever is the latest.
Last modified is more like creation date, as mentioned in the docs:
Amazon S3 maintains only the last modified date for each object. For example, the Amazon S3 console shows the Last Modified date in the object Properties pane. When you initially create a new object, this date reflects the date the object is created. If you replace the object, the date changes accordingly. So when we use the term creation date, it is synonymous with the term last modified date.
It seems that it uses the value of the Date header as demonstrated in the PutObject example here - this will be, as you've seen, when the upload request was started and not when it finished.
Why S3 uses the Date header and not the timestamp of when the file has finished uploading is something internal to AWS AFAIK.
I have not seen the answer to the question, "why?" in the docs.
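If you want to check this behaviour against your own bucket, a small measurement along these lines might help. It is only a sketch: `compare_upload_times` is a name I made up, `s3Client` is assumed to be a `boto3.client("s3")`, and the result depends on your local clock being reasonably in sync. LastModified has one-second resolution, so it only tells you something for uploads that take several seconds.

```python
from datetime import datetime, timezone


def compare_upload_times(s3Client, bucket_name, key, body):
    """Upload `body` and report where LastModified falls relative to the upload."""
    started = datetime.now(timezone.utc)
    s3Client.put_object(Bucket=bucket_name, Key=key, Body=body)
    finished = datetime.now(timezone.utc)

    # boto3 returns LastModified as a timezone-aware datetime.
    last_modified = s3Client.head_object(Bucket=bucket_name, Key=key)["LastModified"]

    return {
        "upload_seconds": (finished - started).total_seconds(),
        "last_modified_after_start_seconds": (last_modified - started).total_seconds(),
        "last_modified_before_finish_seconds": (finished - last_modified).total_seconds(),
    }
```

A value of `last_modified_after_start_seconds` close to zero on a slow upload would match what the question describes; a value close to `upload_seconds` would point to the completion time instead.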

amazon S3 upload object LastModified date keep changed?

We know that when a large file is downloaded on Linux or macOS, the file's last modified time keeps changing. Is it the same in S3? Does the object's last modified time keep changing during the upload, or is it just a single timestamp that records the start of the upload operation?
The docs say: "After you upload the object, you cannot modify object metadata. The only way to modify object metadata is to make a copy of the object and set the metadata." I believe that, in order to maintain atomicity, the time is updated only if the PUT operation is successful.
Last-Modified comes under the category of system-defined metadata.
Its description is "Object creation date or the last modified date, whichever is the latest." In other words, only a successful PUT operation updates the last modified time.
The modified date/time is updated by the S3 system itself and reflects the time when the file finished uploading to S3 (S3 will not show incomplete transfers).
The last modified date of an object is a direct reflection of when the object was last put into S3.
A similar answer says the same (https://stackoverflow.com/a/40699793/13126651): "The Last-Modified timestamp should match the Date value returned in the response headers from the successful PUT request."

Is there any way to set up an S3 bucket to append to the existing object on each run?

We have a requirement to append to the existing S3 object, when we run the spark application every hour. I have tried this code:
df.coalesce(1).write.partitionBy("name").mode("append").option("compression", "gzip").parquet("s3n://path")
This application is creating new parquet files for every run. Hence, I am looking for a workaround to achieve this requirement.
Question is:
How can we configure the S3 bucket to append to the existing object?
It is not possible to append to objects in Amazon S3. They can be overwritten, but not appended.
There is apparently a sneaky method where an object can be rebuilt with a multipart upload: the existing object is used as the copy 'source' for the first part, and the additional data is uploaded as a further part. However, that cannot be accomplished with the method you show.
If you wish to add additional data to an External Table (eg used by EMR or Athena), then simply add an additional file in the correct folder for the desired partition.
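For reference, the multipart copy trick mentioned above could look roughly like this in boto3. Treat it as a sketch under assumptions: `append_via_multipart_copy` is an illustrative name, `s3Client` is a `boto3.client("s3")`, and the existing object must be at least 5 MiB (every part except the last has to be that large) and at most 5 GiB (the limit for copying a source in one part).

```python
MIN_PART_SIZE = 5 * 1024 * 1024          # minimum size of every part except the last
MAX_COPY_PART_SIZE = 5 * 1024 ** 3       # largest source that fits in one copied part


def append_via_multipart_copy(s3Client, bucket_name, key, extra_bytes):
    """Rebuild `key` as (existing object + extra_bytes) using a multipart upload."""
    size = s3Client.head_object(Bucket=bucket_name, Key=key)["ContentLength"]
    if not MIN_PART_SIZE <= size <= MAX_COPY_PART_SIZE:
        raise ValueError("existing object must be between 5 MiB and 5 GiB")

    upload_id = s3Client.create_multipart_upload(Bucket=bucket_name, Key=key)["UploadId"]
    try:
        # Part 1: server-side copy of the current object contents.
        copied = s3Client.upload_part_copy(
            Bucket=bucket_name, Key=key, UploadId=upload_id, PartNumber=1,
            CopySource={"Bucket": bucket_name, "Key": key},
        )
        # Part 2: the new data to "append".
        uploaded = s3Client.upload_part(
            Bucket=bucket_name, Key=key, UploadId=upload_id, PartNumber=2,
            Body=extra_bytes,
        )
        parts = [
            {"PartNumber": 1, "ETag": copied["CopyPartResult"]["ETag"]},
            {"PartNumber": 2, "ETag": uploaded["ETag"]},
        ]
        s3Client.complete_multipart_upload(
            Bucket=bucket_name, Key=key, UploadId=upload_id,
            MultipartUpload={"Parts": parts},
        )
    except Exception:
        # Do not leave a dangling multipart upload behind.
        s3Client.abort_multipart_upload(Bucket=bucket_name, Key=key, UploadId=upload_id)
        raise
```

This still replaces the object with a new one; it only avoids downloading the old bytes. It is also plain byte concatenation, so it suits formats such as line-based logs. A Parquet file ends with a footer, so concatenating bytes onto it does not produce a valid Parquet file, which is why adding new files per partition is the practical route for the Spark job in the question.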

How to change file upload date in Amazon S3 using AWS CLI

I need to move some files (thousands) to Amazon S3 bucket, from where they will be displayed to the end-user by another application (instead of the current one).
The problem is that these files currently have creation/upload dates (they vary between 2012 and 2017, when the files were uploaded to the current application), and when I move them they all get the same date. That is a problem because when you look at the files in the new application, you cannot see their chronological order, which is sometimes very important.
Is there any way I can modify upload date of a file(s) in S3?
The Last Modification Date is generated by Amazon S3 and cannot be set via the API.
If dates and other information (eg user) are important to your application, you can store it as metadata on the object. Then, retrieve the metadata when displaying dates, user, etc.
What I did was rename the file to something else and then rename it back to its original name.
As you cannot rename directly, you have to copy the file to a new name and then copy it back to its original name (and delete the auxiliary file, of course).
It is not optimal, but that's the solution when using the AWS client. I hope one day AWS will have all the functions FTP used to have.
You can just copy over the same object and the timestamp will update.
This technique is also used to postpone the expiration of an object in a bucket with a lifecycle rule.
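A sketch of the copy-over-itself approach with boto3, assuming `s3Client` is a `boto3.client("s3")` and the object is no larger than 5 GB (the limit for a single copy request). The function name `touch_object` is mine. S3 rejects a copy of an object onto itself unless something such as the metadata changes, so the sketch uses `MetadataDirective="REPLACE"` and passes the existing metadata back in.

```python
def touch_object(s3Client, bucket_name, key):
    """Copy an object onto itself so that S3 assigns a fresh LastModified."""
    head = s3Client.head_object(Bucket=bucket_name, Key=key)
    s3Client.copy_object(
        Bucket=bucket_name,
        Key=key,
        CopySource={"Bucket": bucket_name, "Key": key},
        # REPLACE discards the old metadata, so carry the important values over.
        Metadata=head.get("Metadata", {}),
        ContentType=head.get("ContentType", "binary/octet-stream"),
        MetadataDirective="REPLACE",
    )
```

Note that this only moves the timestamp to the current time. It cannot set it to an earlier date, so it does not restore the original 2012 to 2017 dates; other system metadata not listed here (for example cache or encoding headers) would need to be passed along as well.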

S3 bucket script to add timestamp in filename on upload

I'm looking for a way to add a timestamp to the name of every file that is uploaded to an S3 bucket, Amazon-side. There is, of course, the option to do this client-side before the upload, but I don't think this is as nice and clean as having some script run in the bucket itself every time a new file is uploaded. I didn't find anything in the docs, though.
There is no capability within Amazon S3 to change the Key (filename) of a file based upon upload time.
Given that your desire is to avoid name conflicts, some choices are:
Use a unique GUID or a timestamp to name the file when uploading. This will avoid naming conflicts.
Upload the file to Bucket A, then use a Lambda function triggered on ObjectCreated to copy the object to Bucket B with a unique name based on the timestamp.
You can try a Lambda function that handles the ObjectCreated event. See this tutorial.
Not sure that works though.
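For what it's worth, a handler for the Bucket A to Bucket B idea could be sketched like this. The destination bucket name, the timestamp format and the helper names are assumptions for illustration; the event layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`, with the key URL-encoded) is the standard S3 notification format.

```python
from datetime import datetime, timezone
from urllib.parse import unquote_plus

DEST_BUCKET = "bucket-b"  # hypothetical destination bucket name


def timestamped_key(key, when):
    """Prefix the file name part of `key` with a UTC timestamp."""
    stamp = when.strftime("%Y%m%dT%H%M%SZ")
    folder, _, name = key.rpartition("/")
    new_name = f"{stamp}_{name}"
    return f"{folder}/{new_name}" if folder else new_name


def copy_with_timestamp(event, s3_client, dest_bucket, now=None):
    """Copy every object named in the S3 event to dest_bucket under a new key."""
    now = now or datetime.now(timezone.utc)
    new_keys = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        new_key = timestamped_key(key, now)
        s3_client.copy_object(
            Bucket=dest_bucket, Key=new_key,
            CopySource={"Bucket": bucket, "Key": key},
        )
        new_keys.append(new_key)
    return new_keys


def lambda_handler(event, context):
    import boto3  # provided by the AWS Lambda Python runtime
    return copy_with_timestamp(event, boto3.client("s3"), DEST_BUCKET)
```

Copying into a second bucket matters here: if the function wrote the renamed object back into the bucket that triggers it, each copy would fire the event again.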