Are writes to Amazon S3 atomic (all-or-nothing)? - amazon-web-services

I have a large number of files that I am reading and writing to S3.
I am just wondering whether I need to code for the case where a file is "half written", e.g. the S3 PUT/write only "half" worked.
Or are writes to S3 all-or-nothing?
I know there is a read-write eventual consistency issue which (I think) is largely a separate issue.

See S3 PUT documentation:
Amazon S3 never adds partial objects; if you receive a success response, Amazon S3 added the entire object to the bucket.

For all regions except US Standard (us-east-1) you get read-after-write consistency. This means that if you get an HTTP 200 OK for your PUT, you can read the object right away.
If your request is dropped in the middle, you would not get an HTTP 200 and your object would not be written at all.
UPDATE: All regions now support read-after-write consistency (thanks #jeff-loughridge):
https://aws.amazon.com/about-aws/whats-new/2015/08/amazon-s3-introduces-new-usability-enhancements/
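A minimal boto3 sketch of what that means in practice: either the PUT call returns success and the whole object is in the bucket, or it raises and nothing was written, so there is no "half written" state to handle. The bucket and key names here are placeholders.

```python
import boto3
from botocore.exceptions import ClientError, EndpointConnectionError

s3 = boto3.client("s3")

def upload(bucket, key, data: bytes) -> bool:
    """Return True only if S3 confirmed the whole object was stored."""
    try:
        resp = s3.put_object(Bucket=bucket, Key=key, Body=data)
    except (ClientError, EndpointConnectionError):
        # The request failed or was dropped mid-flight: S3 stored nothing,
        # so there is no partial object to clean up.
        return False
    # A 200 means the entire object was added to the bucket.
    return resp["ResponseMetadata"]["HTTPStatusCode"] == 200

# hypothetical names, for illustration only
print(upload("my-bucket", "reports/output.csv", b"col1,col2\n1,2\n"))
```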

From the docs:
Updates to a single key are atomic. For example, if you PUT to an existing key from one thread and perform a GET on the same key from a second thread concurrently, you will get either the old data or the new data, but never partial or corrupt data.
This answer is somewhat similar to the existing ones, but it stresses that not only is there no risk of leaving a partially written object behind, but also that a reader is never at risk of seeing (reading) a partially written object.
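A small sketch of that guarantee, assuming a bucket you can write to (the bucket and key names are made up): one thread overwrites the key while several threads read it, and every reader gets exactly the old body or exactly the new body.

```python
import threading
import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "my-bucket", "config.json"   # hypothetical names

OLD = b'{"version": 1}'
NEW = b'{"version": 2}'

def writer():
    s3.put_object(Bucket=BUCKET, Key=KEY, Body=NEW)

def reader():
    body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
    # Per the atomicity guarantee, the body is always exactly OLD or exactly NEW,
    # never a mix of the two and never truncated.
    assert body in (OLD, NEW)

s3.put_object(Bucket=BUCKET, Key=KEY, Body=OLD)          # seed the old data
threads = [threading.Thread(target=writer)]
threads += [threading.Thread(target=reader) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```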

Related

S3 consistency for successful read after write

Could not find a definitive answer here.
According to Amazon S3 docs, the caveat for read after write is if I got 404 for GET, then PUT a new object, then GET.
My question is: after a successful GET (read), will subsequent reads be successful too?
Example:
GET key 404
PUT key 200
GET key 404 # because caveat
GET key 200
From now on, is any subsequent GET of the key guaranteed to be successful?
The caveat AWS describes in the S3 documentation suggests that they use a caching layer on top of the database they use to store details of S3 objects, such as their keys and metadata.
If you do a PUT for an object as the first operation and a GET afterwards, there will be a cache miss for the GET, so the caching layer will fetch the information about this object from the database.
If you do a GET before the PUT, the caching layer will query the database, receive the information that this object doesn't exist, and cache that information, even though the PUT creates the object shortly afterwards. So the GET after the PUT will be served, from the cache, the information that the object doesn't exist.
That's probably why this caveat exists. Unfortunately that doesn't answer your question, because we don't know how that caching layer works. If this layer uses shared state, then once you have received one 200 response you should receive a 200 for all subsequent requests. My guess is that they don't use shared state for the caching layer, as that's easier to scale. Without shared state, whether you receive a 200 or a 404 for requests even after the first successful 200 depends on your luck, the time-to-live for items in the cache, and whether they employ some kind of cache invalidation for updated objects.
Because the details of the inner workings of S3 are unknown I wouldn't rely on subsequent calls succeeding, but my guess is that the probability of receiving a 404 after a successful 200 is rather low. In the end you have to decide, based on your use case, if and how it makes sense to account for this situation.
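If you do decide to account for it (this only mattered under the old eventual-consistency model), a bounded retry around the GET is one way to do it. A sketch, with made-up bucket and key names:

```python
import time
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def get_with_retry(bucket, key, attempts=5, delay=0.5):
    """Retry a GET a few times, because under the old eventual-consistency
    model a key that was already readable once could, in theory, still
    produce a stale 404 from another cache node."""
    for i in range(attempts):
        try:
            return s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        except ClientError as e:
            if e.response["Error"]["Code"] != "NoSuchKey":
                raise                      # some other problem, don't mask it
            time.sleep(delay * (2 ** i))   # simple exponential backoff
    raise TimeoutError(f"{key} still not visible after {attempts} attempts")
```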
Updated answer
Snippets from the official AWS blog
S3 is Now Strongly Consistent
After that overly-long introduction, I am ready to share some good news!
Effective immediately, all S3 GET, PUT, and LIST operations, as well as operations that change object tags, ACLs, or metadata, are now strongly consistent. What you write is what you will read, and the results of a LIST will be an accurate reflection of what’s in the bucket. This applies to all existing and new S3 objects, works in all regions, and is available to you at no extra charge! There’s no impact on performance, you can update an object hundreds of times per second if you’d like, and there are no global dependencies.
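Under this strong-consistency model the straightforward thing now just works: write, then read back immediately. A minimal sketch (bucket and key names are placeholders):

```python
import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "my-bucket", "state/flag.txt"   # hypothetical names

s3.put_object(Bucket=BUCKET, Key=KEY, Body=b"v2")

# With strong consistency, the very next read of the same key is guaranteed
# to return what was just written -- no sleep, no retry loop needed.
assert s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read() == b"v2"
```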

caveat of the read-after-write consistency for PUTS of new objects in an S3 bucket

From https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html :
Amazon S3 provides read-after-write consistency for PUTS of new objects in your S3 bucket in all regions with one caveat. The caveat is that if you make a HEAD or GET request to the key name (to find if the object exists) before creating the object, Amazon S3 provides eventual consistency for read-after-write.
I'm not sure if I understand the caveat correctly. Before creating the object: OK, I haven't yet created an object with the key K, therefore no object with the key K exists; I make a GET request to K... what does my request result in, according to the explanation above?
I'm confused because the explanation tells about the eventual consistency for read-after-write. But there is no write so far.
Update 2020-12-02 This whole discussion is now outdated. Amazon S3 provides strong read-after-write consistency for PUTs and DELETEs of objects in your Amazon S3 bucket in all AWS Regions.
Update I rewrote the answer after reading a comment in this blog post.
I believe this caveat is talking about this scenario
client 1: GET key_a --> this could return the object even though this request was sent earlier.
client 2: PUT key_a
This could happen if the request from client 1 reaches a node later than the PUT request does.
This situation happens when you have a file to upload, but that file might already exist. So rather than overwrite the existing file, you do the following:
1. Try to GET the file. It doesn't exist, so you get a 404 with "No such key".
2. PUT the file.
3. Try to GET the file immediately afterward (for whatever reason).
In this sequence, step #3 may or may not return the file. Eventually you can retrieve the file, but how long that takes from the time of upload depends on the internals of S3 (I could speculate on why that happens, but it would only be speculation).
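A sketch of that exact sequence with boto3, to make the caveat concrete (bucket and key names are made up); under the old model, step 3 was the read that could still 404:

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
BUCKET, KEY = "my-bucket", "incoming/photo.jpg"   # hypothetical names

def exists(bucket, key):
    try:
        s3.head_object(Bucket=bucket, Key=key)
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "404":
            return False
        raise

# Step 1: probe first -- under the old model this "poisoned" read-after-write
if not exists(BUCKET, KEY):
    # Step 2: upload the file
    s3.put_object(Bucket=BUCKET, Key=KEY, Body=b"...")
    # Step 3: this read could still 404 for a while, because the earlier
    # negative lookup made the new key only eventually consistent.
    s3.get_object(Bucket=BUCKET, Key=KEY)
```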

If an updated object in S3 serves as a lambda trigger, is there an inherent race condition?

If I update an object in an S3 Bucket, and trigger on that S3 PUT event as my Lambda trigger, is there a chance that the Lambda could operate on the older version of that object given S3’s eventual consistency model?
I’m having a devil of a time parsing out an authoritative answer either way...
Yes, there is a possibility that a blind GET of an object could fetch a former version.
There are at least two solutions that come to mind.
Weak: the notification event data contains the etag of the newly-uploaded object. If the object you fetch doesn't have this same etag in its response headers, then you know it isn't the intended object.
Strong: enable versioning on the bucket. The event data then contains the object versionId. When you download the object from S3, specify this exact version in the request. The consistency model is not as well documented when you overwrite an object and then download it with a specific version-id, so it is possible that this might result in an occasional 404 -- in which case, you almost certainly just spared yourself from fetching the old object -- but you can at least be confident that S3 will never give you a version other than the one explicitly specified.
If you weren't already using versioning on the bucket, you'll want to consider whether to keep old versions around, or whether to create a lifecycle policy to purge them... but one brilliantly-engineered feature about versioning is that the parts of your code that were written without awareness of versioning should still function correctly with versioning enabled -- if you send non-versioning-aware requests to S3, it still does exactly the right thing... for example, if you delete an object without specifying a version-id and later try to GET the object without specifying a version-id, S3 will correctly respond with a 404, even though the "deleted" version is actually still in the bucket.
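A sketch of a Lambda handler combining both ideas, assuming a standard S3 PUT notification event (the eTag and versionId fields come from the event record; versionId is only present when bucket versioning is enabled):

```python
import urllib.parse
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    rec = event["Records"][0]["s3"]
    bucket = rec["bucket"]["name"]
    key = urllib.parse.unquote_plus(rec["object"]["key"])

    # Strong option: versioning is enabled, so the event carries the exact versionId.
    version_id = rec["object"].get("versionId")
    if version_id:
        obj = s3.get_object(Bucket=bucket, Key=key, VersionId=version_id)
    else:
        # Weak option: fetch blindly, then compare ETags to detect a stale copy.
        obj = s3.get_object(Bucket=bucket, Key=key)
        if obj["ETag"].strip('"') != rec["object"]["eTag"]:
            raise RuntimeError("Fetched object is not the version that fired the event")

    return obj["Body"].read()
```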
How does the file get there in the first place? I'm asking, because if you could reverse the order, it'd solve your issue as you put your file in s3 via a lambda that before overwriting the file, can first get the existing version from the bucket and do whatever you need.

Is AWS S3 read guaranteed to return a newly created object?

I've been reading the docs regarding read-after-write consistency with AWS S3 but I'm still unsure about this.
If I write an object to S3 and after getting a successful response from my write operation, I immediately attempt to read it, is the read operation guaranteed to return the object?
In other words, is it possible that the read operation will fail because it can't find the object? Because the read happened too soon after the write?
I'm only talking about new PUTs here, not updates to existing objects.
Yes, it is guaranteed to return the object (for new objects only), with one caveat:
As per AWS documentation:
Amazon S3 provides read-after-write consistency for PUTS of new objects in your S3 bucket in all regions with one caveat. The caveat is that if you make a HEAD or GET request to the key name (to find if the object exists) before creating the object, Amazon S3 provides eventual consistency for read-after-write.
Amazon S3 offers eventual consistency for overwrite PUTS and DELETES in all regions.
EDIT: credits to #Michael - sqlbot; more on the HEAD (or GET) caveat:
If you send a GET or HEAD before the object exists, such as to check whether there's an object there before you upload, then the upload is not immediately consistent for read requests even after the upload is complete, because S3 has already made the only immediately consistent internal query it's going to make for that object, discovering, authoritatively, that there's no such key. The object creation becomes eventually consistent, since the creation has to "overwrite" the previous lookup that found nothing.
Based on the table provided in the link, "consistent reads" will never be stale.
The link above also has a nice example of how "read-after-write consistency" and "eventual consistency" work.
I would like to add this caution note to this answer to make things more clear:
Amazon S3 achieves high availability by replicating data across multiple servers within Amazon's data centers. If a PUT request is successful, your data is safely stored. However, information about the changes must replicate across Amazon S3, which can take some time, and so you might observe the following behaviors:
A process writes a new object to Amazon S3 and immediately lists keys within its bucket. Until the change is fully propagated, the object might not appear in the list.
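To make that last behavior concrete: under the old model a caller who needed a new key to show up in a listing had to poll. A sketch with made-up names (under today's strong consistency the first LIST already includes it):

```python
import time
import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "my-bucket", "exports/run-42.json"   # hypothetical names

s3.put_object(Bucket=BUCKET, Key=KEY, Body=b"{}")

def key_listed(bucket, key):
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=key)
    return any(obj["Key"] == key for obj in resp.get("Contents", []))

# Under the old model a fresh key could be missing from LIST for a while,
# so callers had to poll; with strong consistency the first call sees it.
while not key_listed(BUCKET, KEY):
    time.sleep(0.5)
```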

S3 last-modified timestamp for eventually-consistent overwrite PUTs

The AWS S3 docs state that:
Amazon S3 offers eventual consistency for overwrite PUTS and DELETES in all regions.
http://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel
The timespan until full consistency is reached can vary. During this period GET requests may return the previous object or the updated object.
My question is:
When is the last-modified timestamp updated? Is it updated immediately after the overwrite PUT succeeds but before full consistency is reached, or is it only updated after full consistency is achieved?
I suspect the former but I can't find any documentation which clearly states this.
The Last-Modified timestamp should match the Date value returned in the response headers from the successful PUT request.
To my knowledge, this is not explicitly documented, but it can be derived from what is documented.
When you overwrite an object, it's not the overwriting itself that may be delayed by the eventual consistency model -- it's the availability of the overwritten content at a given S3 node (S3 is replicated to multiple nodes within the S3 region).
The Last-Modified timestamp, like the rest of the metadata, is established at the time of object creation and immutable, thereafter.
It is, in fact, not the "modification" time of the object at all, it is the creation time of the object. The explanation may sound pedantic, but it is accurate in the strictest sense: S3 objects and their metadata cannot in fact be modified at all, they can only be overwritten. When you "overwrite" an object in S3, what you are actually doing is creating a new object, reusing the old object's key (path+file name). The availability of this new object at a given S3 node (replication) is what may be delayed by the eventual consistency model... not the actual creation of the new object that overwrites the old one... hence there would be no reason for Last-Modified to be impacted by the replication delay (assuming there is a replication delay -- eventual consistency can at times be indistinguishable from immediate consistency).
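A quick way to observe this yourself (bucket and key names are placeholders): compare the Date header from the PUT response with the Last-Modified returned by a subsequent HEAD.

```python
import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "my-bucket", "data/blob.bin"   # hypothetical names

put_resp = s3.put_object(Bucket=BUCKET, Key=KEY, Body=b"payload")
put_date = put_resp["ResponseMetadata"]["HTTPHeaders"]["date"]

head_resp = s3.head_object(Bucket=BUCKET, Key=KEY)

# Last-Modified is fixed at creation time of this (new) object, so it should
# line up with the Date header from the successful PUT, regardless of any
# replication delay on the read side.
print("PUT Date header :", put_date)
print("Last-Modified   :", head_resp["LastModified"])
```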
This is something S3 does that is absolutely terrible.
Basically in Linux you have the mtime which is the time the file was last modified on the filesystem. Any S3 client could gather the mtime and set the Last-Modified time on S3 so that it would maintain when things were actually last modified.
Instead, Amazon just does this based on the object creation and this is effectively a massive problem if you ever just want to use the data as data outside of the original application that put it there.
So if you download a file from S3, your client would likely set the modified time and if it was uploaded to s3 immediately as it was created then you would at least have a near correct timestamp. But the reality is that you might take a picture and it might not get from your phone through the app, through the stack and to S3 for days!
This is not even considering re-uploading the file to s3. Which would compound the problem, as you might re-upload it years later. S3 will just act like Last-Modified is years later when the file was not actually modified.
They really need to allow you to set it, but instead they remain ambiguous here while over-documenting other areas, which makes this hard to figure out.
https://github.com/s3tools/s3cmd/issues/524
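A common workaround is to carry the filesystem mtime yourself as user-defined object metadata and reapply it on download (s3cmd stores similar attributes under its own metadata key). A sketch, where the "mtime" metadata key is just an illustrative choice:

```python
import os
import boto3

s3 = boto3.client("s3")
BUCKET = "my-bucket"   # hypothetical name

def upload_preserving_mtime(path, key):
    # Record the real filesystem mtime as user metadata, since S3's own
    # Last-Modified cannot be set by the client.
    mtime = str(os.path.getmtime(path))
    with open(path, "rb") as f:
        s3.put_object(Bucket=BUCKET, Key=key, Body=f, Metadata={"mtime": mtime})

def download_restoring_mtime(key, path):
    obj = s3.get_object(Bucket=BUCKET, Key=key)
    with open(path, "wb") as f:
        f.write(obj["Body"].read())
    mtime = obj.get("Metadata", {}).get("mtime")
    if mtime:
        # Restore both atime and mtime to the recorded value.
        os.utime(path, (float(mtime), float(mtime)))
```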