The docs for the Java SDK for SimpleDB say the following regarding the getAttributes(GetAttributesRequest request) operation:
If the item does not exist on the replica that was accessed for this operation, an empty set is returned. The system does not return an error as it cannot guarantee the item does not exist on other replicas.
I understand that I can use GetAttributesRequest#setConsistentRead(true), but the docs for the operation don't mention it, and the comment above worries me. It seems to suggest that an item request will succeed only if you're lucky!
Is this just an omission from the docs? Am I guaranteed to get back an item using a getAttributes request if I explicitly ask for a consistent read? (provided, of course, that a previous operation successfully put/updated the item).
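For clarity, here is what I mean by explicitly asking for a consistent read (a minimal sketch using the AWS SDK for Java v1, with placeholder domain and item names):

```java
import com.amazonaws.services.simpledb.AmazonSimpleDB;
import com.amazonaws.services.simpledb.AmazonSimpleDBClientBuilder;
import com.amazonaws.services.simpledb.model.GetAttributesRequest;
import com.amazonaws.services.simpledb.model.GetAttributesResult;

public class ConsistentReadExample {
    public static void main(String[] args) {
        AmazonSimpleDB sdb = AmazonSimpleDBClientBuilder.defaultClient();

        // "my-domain" and "item-123" are placeholders for this sketch.
        GetAttributesRequest request = new GetAttributesRequest("my-domain", "item-123")
                .withConsistentRead(true); // ask for a read that reflects all prior writes

        GetAttributesResult result = sdb.getAttributes(request);
        result.getAttributes().forEach(a ->
                System.out.println(a.getName() + " = " + a.getValue()));
    }
}
```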
Can we use DynamoDB optimistic locking with a batchWriteItem request? The AWS docs on optimistic locking mention that a ConditionalCheckFailedException is thrown when the version value differs during an update. In the case of a batchWriteItem request, will the whole batch fail, or only the record with a different version value? Will the record that failed due to a version mismatch be returned as an unprocessed record?
You cannot. You can confirm this by looking at the low-level syntax and noticing that there is no way to specify a condition expression.
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchWriteItem.html
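If you need per-item version checks, one workaround (a common pattern, not something from the linked docs) is to fall back to individual conditional PutItem calls, which do support condition expressions. A minimal sketch with the AWS SDK for Java v1, assuming a hypothetical customers table with a numeric version attribute:

```java
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.ConditionalCheckFailedException;
import com.amazonaws.services.dynamodbv2.model.PutItemRequest;

import java.util.Map;

public class ConditionalPutExample {
    public static void main(String[] args) {
        AmazonDynamoDB ddb = AmazonDynamoDBClientBuilder.defaultClient();

        // Hypothetical table "customers" with a numeric "version" attribute.
        PutItemRequest put = new PutItemRequest()
                .withTableName("customers")
                .addItemEntry("customer_id", new AttributeValue("2"))
                .addItemEntry("customer_email", new AttributeValue("a@example.com"))
                .addItemEntry("version", new AttributeValue().withN("2"))
                // Only write if the stored version matches what we read earlier.
                .withConditionExpression("version = :expected")
                .withExpressionAttributeValues(
                        Map.of(":expected", new AttributeValue().withN("1")));

        try {
            ddb.putItem(put);
        } catch (ConditionalCheckFailedException e) {
            // Only this item fails; other writes issued the same way are unaffected.
            System.err.println("Version mismatch, item not written: " + e.getMessage());
        }
    }
}
```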
Could not find a definitive answer here.
According to the Amazon S3 docs, the caveat for read-after-write consistency is: if I get a 404 for a GET, then PUT a new object, then GET it again.
My question is: after I do a successful GET read, will subsequent reads be successful too?
Example:
GET key 404
PUT key 200
GET key 404 # because caveat
GET key 200
From now on, is any subsequent GET of the key guaranteed to be successful?
The caveat AWS describes in the S3 documentation suggests that they use a caching layer on top of the database they use to store details of S3 objects, such as the key and metadata.
If you do a PUT for an object as the first operation and a GET afterwards, there will be a cache miss for the GET operation, so the caching layer will fetch information about the object from the database.
If you do a GET before the PUT, the caching layer queries the database, learns that the object doesn't exist, and caches that information, even though the PUT creates the object shortly after. So a GET after the PUT will be told by the cache that the object doesn't exist.
That's probably why this caveat exists. Unfortunately that doesn't answer your question, because we don't know how that caching layer works. If this layer uses shared state, then you should receive a 200 response for all requests once you have received one 200 response. My guess is that they don't use shared state for the caching layer, as that's easier to scale. Without shared state, whether you receive a 200 or a 404 even after the first successful 200 depends on your luck, the time-to-live of items in the cache, and whether they employ some kind of cache invalidation for updated objects.
Because the details of the inner workings of S3 are unknown, I wouldn't rely on subsequent calls succeeding, but my guess is that the probability of receiving a 404 after a successful 200 is rather low. In the end you have to decide, based on your use case, if and how it makes sense to account for this situation.
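If your use case cannot tolerate a stale 404, one way to account for it is a bounded retry with backoff. This is just an illustrative sketch, assuming the AWS SDK for Java v1 and placeholder bucket/key names:

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.AmazonS3Exception;
import com.amazonaws.services.s3.model.S3Object;

public class GetWithRetry {
    // Retry the GET a few times, treating a 404 as "possibly not propagated yet".
    static S3Object getWithRetry(AmazonS3 s3, String bucket, String key)
            throws InterruptedException {
        long backoffMillis = 100;
        for (int attempt = 0; attempt < 5; attempt++) {
            try {
                return s3.getObject(bucket, key);
            } catch (AmazonS3Exception e) {
                if (e.getStatusCode() != 404) {
                    throw e; // a different failure; don't mask it
                }
                Thread.sleep(backoffMillis);
                backoffMillis *= 2; // exponential backoff before the next attempt
            }
        }
        return null; // still not visible after all attempts
    }

    public static void main(String[] args) throws InterruptedException {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        S3Object result = getWithRetry(s3, "my-bucket", "maybe-stale-key"); // placeholders
        System.out.println(result != null ? "Found it" : "Gave up");
    }
}
```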
Updated answer
Snippets from the official AWS blog
S3 is Now Strongly Consistent
After that overly-long introduction, I am ready to share some good news!
Effective immediately, all S3 GET, PUT, and LIST operations, as well as operations that change object tags, ACLs, or metadata, are now strongly consistent. What you write is what you will read, and the results of a LIST will be an accurate reflection of what's in the bucket. This applies to all existing and new S3 objects, works in all regions, and is available to you at no extra charge! There's no impact on performance, you can update an object hundreds of times per second if you'd like, and there are no global dependencies.
We have a service which inserts certain values into DynamoDB. For the sake of this question, let's say it's a key:value pair, i.e., customer_id:customer_email. The inserts don't happen that frequently, and once the inserts are done, that specific key doesn't get updated.
What we have done is create a client library which, provided with a customer_id, will fetch the customer_email from DynamoDB.
Given that the customer_id data is static, we were thinking of adding a cache in front of the table, but we are not sure what will happen in the following use case:
1. client_1 uses our library to fetch customer_email for customer_id = 2.
2. The customer doesn't exist, so API Gateway returns Not Found.
3. API Gateway caches this response.
4. For any subsequent calls, this cached response will be sent.
Now another system inserts customer_id = 2 with its email id. This system doesn't know whether this response has been cached previously or not; it doesn't even know that any other system has fetched this specific data. How can we invalidate the cache for this specific customer_id when it gets inserted into DynamoDB?
You can send a request to the API endpoint with a Cache-Control: max-age=0 header, which will cause it to refresh the cached entry.
This could open your application up to attack, as a bad actor could simply flood an expensive endpoint with traffic and buckle your servers/database. To safeguard against that, it's best to use a signed request.
In case it's useful to people, here's .NET code to create the signed request:
https://gist.github.com/secretorange/905b4811300d7c96c71fa9c6d115ee24
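For comparison, here is a minimal sketch of sending the cache-busting header with Java's built-in HTTP client. The endpoint URL is a placeholder, and the SigV4 signing that the gist above handles is omitted here:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CacheInvalidation {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Placeholder endpoint; the header tells API Gateway to bypass
        // its cache and refresh the entry for this request.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://example.execute-api.us-east-1.amazonaws.com/prod/customers/2"))
                .header("Cache-Control", "max-age=0")
                .GET()
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
    }
}
```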
We've built a Lambda which takes care of re-filling the cache with updated results. It's quite a manual process, with very little reusable code, but it works.
The Lambda is triggered by the application itself according to the application's needs. For example, in CRUD operations the Lambda is triggered upon successful execution of POST, PATCH and DELETE on a specific resource, in order to clear the cached general GET request (i.e. clear GET /books whenever POST /book succeeds).
Unfortunately, if you have a view with a server-side paginated table, you are going to face all sorts of issues, because invalidating /books is not enough: you may actually have /books?page=2, /books?page=3 and so on... a nightmare!
I believe API Gateway should allow for more granular control of cache entries, otherwise many use cases aren't covered. It would be enough if it allowed choosing a root cache group for each request, so that we could manage cache entries by group rather than by single request (which, imho, is also less common).
Did you look at this? https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-caching.html
There is a way to invalidate the entire cache or a particular cache entry.
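For example, flushing the entire stage cache can also be done programmatically. A sketch using the AWS SDK for Java v1 API Gateway client, with placeholder API and stage identifiers:

```java
import com.amazonaws.services.apigateway.AmazonApiGateway;
import com.amazonaws.services.apigateway.AmazonApiGatewayClientBuilder;
import com.amazonaws.services.apigateway.model.FlushStageCacheRequest;

public class FlushCache {
    public static void main(String[] args) {
        AmazonApiGateway apiGateway = AmazonApiGatewayClientBuilder.defaultClient();

        // "abc123" and "prod" are placeholders for your REST API id and stage name.
        apiGateway.flushStageCache(new FlushStageCacheRequest()
                .withRestApiId("abc123")
                .withStageName("prod"));
    }
}
```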
I've been reading the docs regarding read-after-write consistency with AWS S3 but I'm still unsure about this.
If I write an object to S3 and after getting a successful response from my write operation, I immediately attempt to read it, is the read operation guaranteed to return the object?
In other words, is it possible that the read operation will fail because it can't find the object? Because the read happened too soon after the write?
I'm only talking about new PUTs here, not updates to existing objects.
Yes, it is guaranteed to return the object (only for new objects), with one caveat:
As per AWS documentation:
Amazon S3 provides read-after-write consistency for PUTS of new objects in your S3 bucket in all regions with one caveat. The caveat is that if you make a HEAD or GET request to the key name (to find if the object exists) before creating the object, Amazon S3 provides eventual consistency for read-after-write.
Amazon S3 offers eventual consistency for overwrite PUTS and DELETES in all regions.
EDIT: credit to @Michael - sqlbot; more on the HEAD (or GET) caveat:
If you send a GET or HEAD before the object exists, such as to check whether there's an object there before you upload, then the upload is not immediately consistent for read requests even after the upload is complete, because S3 has already made the only immediately consistent internal query it's going to make for that object, discovering, authoritatively, that there's no such key. The object creation becomes eventually consistent, since the creation has to "overwrite" the previous lookup that found nothing.
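To make the scenario concrete, here is a minimal sketch of the write-then-read sequence in question, using the AWS SDK for Java v1 with placeholder bucket and key names. Under the quoted guarantee (and today's strong consistency), the GET should return the object, provided the key is brand new and was never probed beforehand:

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3Object;

public class PutThenGet {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        // Placeholder bucket and key; the key must be brand new, with no
        // prior GET/HEAD against it, for the original guarantee to apply.
        s3.putObject("my-bucket", "new-key", "hello");

        S3Object object = s3.getObject("my-bucket", "new-key");
        System.out.println("Read back: " + object.getKey());
    }
}
```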
Based on the table provided in the link, "consistent reads" will never be stale.
The link provided above has a nice example of how "read-after-write consistency" and "eventual consistency" work.
I would like to add this caution note to this answer to make things clearer:
Amazon S3 achieves high availability by replicating data across multiple servers within Amazon's data centers. If a PUT request is successful, your data is safely stored. However, information about the changes must replicate across Amazon S3, which can take some time, and so you might observe the following behaviors:
A process writes a new object to Amazon S3 and immediately lists keys within its bucket. Until the change is fully propagated, the object might not appear in the list.
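As an illustration of that listing behavior, here is a minimal sketch (AWS SDK for Java v1, placeholder names) that writes a key and immediately checks whether it shows up in a listing. Under the quoted pre-2020 model, the key could be missing at first:

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3ObjectSummary;

public class ListAfterWrite {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        s3.putObject("my-bucket", "brand-new-key", "payload"); // placeholders

        // Under the quoted (pre-2020) model, the new key might not yet
        // appear here, even though the PUT already succeeded.
        boolean visible = s3.listObjectsV2("my-bucket").getObjectSummaries().stream()
                .map(S3ObjectSummary::getKey)
                .anyMatch("brand-new-key"::equals);
        System.out.println("Key visible in listing: " + visible);
    }
}
```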
I have a large number of files that I am reading and writing to S3.
I am just wondering if I need to code for the case where a file is "half written", e.g. the S3 PUT/write only "half" worked.
Or are writes to S3 all-or-nothing?
I know there is a read-after-write eventual consistency issue, which (I think) is largely a separate issue.
See the S3 PUT documentation:
Amazon S3 never adds partial objects; if you receive a success response, Amazon S3 added the entire object to the bucket.
For all regions except US Standard (us-east-1), you get read-after-write consistency. This means that if you get an HTTP 200 OK for your PUT, you can read the object right away.
If your request is dropped in the middle, you would not get an HTTP 200 and your object would not be written at all.
UPDATE: All regions now support read-after-write consistency (thanks @jeff-loughridge):
https://aws.amazon.com/about-aws/whats-new/2015/08/amazon-s3-introduces-new-usability-enhancements/
From the docs:
Updates to a single key are atomic. For example, if you PUT to an existing key from one thread and perform a GET on the same key from a second thread concurrently, you will get either the old data or the new data, but never partial or corrupt data.
This answer is somewhat similar to the existing ones, but it stresses the fact that not only is there no risk of leaving a partially-written object behind, but also that a reader is never at risk of seeing (reading) a partially-written object.
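To make that concrete, here is a minimal sketch (AWS SDK for Java v1, placeholder names) that overwrites a key from one thread while reading it from another. Per the quoted documentation, the read observes either the old or the new body in full, never a mix:

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.util.IOUtils;

public class AtomicOverwrite {
    public static void main(String[] args) throws Exception {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        String bucket = "my-bucket", key = "shared-key"; // placeholders

        s3.putObject(bucket, key, "old-value");

        // Overwrite the key from a second thread while we read it below.
        Thread writer = new Thread(() -> s3.putObject(bucket, key, "new-value"));
        writer.start();

        // Concurrent read: per the quoted docs this returns "old-value" or
        // "new-value" in its entirety, never a truncated or corrupt body.
        String body = IOUtils.toString(s3.getObject(bucket, key).getObjectContent());
        System.out.println(body);

        writer.join();
    }
}
```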