Amazon S3: how parallel PUTs to the same key are resolved in versioned buckets - amazon-web-services

The Amazon S3 data consistency model contains the following note:
Amazon S3 does not currently support object locking for concurrent updates. If two PUT requests are simultaneously made to the same key, the request with the latest timestamp wins. If this is an issue, you will need to build an object-locking mechanism into your application.
For my application I am considering resolving conflicts caused by concurrent writes by analyzing all object versions on the client side in order to assemble a single composite object, hopefully reconciling the two changes.
However, I am not able to find a definitive answer to how the "latest timestamp wins" behaviour plays out in versioned buckets.
Specifically, after reading how Object Versioning works, it appears that S3 will create a new object version with a unique version ID on each PUT right away, after which I assume that data replication will kick in and S3 will have to determine which of the two concurrent writes to retain.
So the questions that I have on this part are:
Will S3 keep track of a separate version for each of the two concurrent writes for the object?
Will I see both versions when querying the list of versions for the object through the API, with one of them being arbitrarily marked as current?

I found this question while debugging our own issue caused by concurrent PUTs to the same key. Returning to fulfill my obligations per [1].
Rapid concurrent PUTs to the same S3 key do sometimes collide, even with versioning enabled. In such cases S3 will return a 503 for one of the requests (the one with the oldest timestamp, per the doc snippet you pasted above).
Here's a note from S3's engineering team, passed on to us by our business support contact:
While S3 supports request rates of up to 3500 REST.PUT.OBJECT requests per second to a single partition, in some rare scenarios, rapid concurrent REST.PUT.OBJECT requests to the same key may result in a 503 response. In such cases, further partitioning also does not help because the requests for the same key will land on the same partition. In your case, we looked at the reason your request received a 503 response and determined that it was because there was a concurrent REST.PUT.OBJECT request for the same key that our system was processing. In such cases, retrying the failed request will most likely result in success.
Your definition of rare may vary. We were using 4 threads and seeing a 503 for 1.5 out of every 10 requests.
The snippet you quoted is the only reference to this that I can find in the docs, and it doesn't explicitly mention 503s (which usually indicate rate limiting due to exceeding 3,500 requests per second).
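For completeness, here is a minimal retry sketch for that 503 case, assuming boto3 (bucket, key, and backoff parameters are placeholders; boto3's built-in retry logic may already cover this for you):

    import time

    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")

    def put_with_retry(bucket, key, body, max_attempts=5):
        # PUT an object, retrying on 503 responses with simple exponential backoff.
        for attempt in range(max_attempts):
            try:
                return s3.put_object(Bucket=bucket, Key=key, Body=body)
            except ClientError as e:
                status = e.response["ResponseMetadata"]["HTTPStatusCode"]
                if status == 503 and attempt < max_attempts - 1:
                    time.sleep(0.1 * (2 ** attempt))
                    continue
                raise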
When the requests don't collide (and so don't return a 503), it works how you would expect: a new version per request, with the most recent request timestamp being the current version.
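To the original question of what the API shows, here is a minimal sketch listing all versions of a key with boto3 (bucket and key names are placeholders):

    import boto3

    s3 = boto3.client("s3")

    resp = s3.list_object_versions(Bucket="my-bucket", Prefix="my-key")
    for version in resp.get("Versions", []):
        # Each successful PUT gets its own VersionId; IsLatest marks the one
        # S3 treats as current (the latest-timestamp winner).
        print(version["VersionId"], version["LastModified"], version["IsLatest"])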
Hopefully this post will help someone with the same issue in the future.
[1] https://xkcd.com/979/

Related

How do you ensure duplicate messages are ignored with Google Cloud Pub/Sub?

I am currently working on a distributed crawling service, and while building it I have run into a few issues that need to be addressed.
First, let's explain how the crawler works and the problems that need to be solved.
The crawler needs to save all posts on each and every bulletin board on a particular site.
To do this, it automatically discovers crawling targets and publishes several messages to Pub/Sub. A message looks like:
{
  "boardName": "test",
  "targetDate": "2020-01-05"
}
When the corresponding message is published, a Cloud Run function is triggered, and the data described by the given JSON is crawled.
However, if the same message is published twice, duplicate data is produced because the same data is crawled again. How can I ignore a message when an identical one has already been processed?
Also, are there pub/sub or other good features I can refer to for a stable implementation of a distributed crawler?
Because Pub/Sub is, by default, designed to deliver messages at least once, it's better to have idempotent processing. (Exactly-once delivery is coming.)
Anyway, your issue is very similar: the same message delivered twice, or two different messages with the same content, will cause the same problem. There is no magic feature in Pub/Sub for that. You need an external tool, like a database, to store what has already been received.
Firestore/Datastore is a good, serverless place for that. If you need low latency, Memorystore with its in-memory database is the fastest.
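For example, here is a minimal sketch of that deduplication pattern with Firestore, assuming the google-cloud-firestore client; the "processed_messages" collection name and the crawl() call are illustrative placeholders, not part of the original setup:

    import hashlib
    import json

    from google.api_core.exceptions import AlreadyExists
    from google.cloud import firestore

    db = firestore.Client()

    def handle_message(message: dict) -> None:
        # Derive a deterministic key so duplicate payloads map to the same document.
        key = hashlib.sha256(
            json.dumps(message, sort_keys=True).encode("utf-8")
        ).hexdigest()

        doc_ref = db.collection("processed_messages").document(key)
        try:
            # create() fails if the document already exists, giving an atomic
            # "first writer wins" check across concurrent consumers.
            doc_ref.create({"message": message, "status": "processing"})
        except AlreadyExists:
            return  # duplicate payload: skip the crawl

        crawl(message["boardName"], message["targetDate"])  # placeholder crawler call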

Querying AWS SQS queue URL using Java SDK

Most operations of the AWS Java SDK for SQS require a queue URL.
Given a queue name, the queue URL can be queried using the GetQueueUrl operation.
Does the AmazonSQS client automatically cache the result of this operation, or is it up to the application to cache the queue URL to avoid repeated queries?
If we look at the AWS Java SDK code on GitHub, we see that getQueueUrl() triggers the usual client preparation hooks (which don't appear to include caching) and then immediately jumps to executeGetQueueUrl(), which makes the request, also without caching. Interestingly, there does appear to be a URI cachedEndpoint = null; field that doesn't seem to be used anywhere (maybe I'm missing something?).
Taking a step back, this makes sense. Auto-caching the response in the SDK could be dangerous for applications using it, so the decision to cache or not is left to the application logic, where it belongs. So, if you need to cache the responses, it's up to you to decide how long to cache them and where/how to store them.
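Although the question concerns the Java SDK, the application-level caching pattern is the same in any SDK; here is a minimal sketch in Python with boto3 for illustration (the in-memory dict is an arbitrary choice of cache):

    import boto3

    sqs = boto3.client("sqs")
    _queue_url_cache = {}

    def get_queue_url(queue_name: str) -> str:
        # Query SQS only on the first lookup for a given queue name.
        if queue_name not in _queue_url_cache:
            resp = sqs.get_queue_url(QueueName=queue_name)
            _queue_url_cache[queue_name] = resp["QueueUrl"]
        return _queue_url_cache[queue_name]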

How to explain variation between CloudFront update times?

I know CloudFront updates its servers every ~24 hours [source].
My question is: why does it sometimes take less than 24 hours? Sometimes I update S3 and bam, the new content is available from XXdomain.com immediately. Other times it seems to take the full 24 hours.
How can anyone explain the variation? Why does it seem like a non-standard amount of time to update?
It depends upon whether the request is cached or not. If a request hits a POP (Point of Presence) that doesn't have the object, CloudFront needs to fetch it from the origin. If it is the first request, it will contact the origin and serve whatever is there.
In other cases, if the object is already cached, CloudFront will serve whatever is cached, and that cached content is not refreshed for a longer time. If you purge (invalidate), it usually takes longer, up to around 24 hours, varying with the POP location, network availability, cacheable size for the domain, etc.
You can use cache headers or set your cache configuration to your desired time.
Hope it helps.
I know CloudFront updates its servers every ~24 hours
That isn't really an accurate description of what happens.
More correctly, we can say that by default, an object cached in a CloudFront edge will be evicted after 24 hours.
There is no active update process. The CloudFront cache is a passive, pull-through cache. When a request arrives, it is served from cache if a cached copy is available and not stale, otherwise a request is sent to the origin server, the object is stored in the cache, and returned to the viewer.
If the cache does not contain a fresh copy of the object, it fetches it immediately from the origin upon request. Thus, the timing of requests made by you and others will determine how frequently it appears that CloudFront is "updating," even though "updating" isn't really an accurate term for what is occurring.
The CloudFront cache is also not monolithic. If you are in the Eastern U.S. and your user base is in Western Europe, you would potentially see the update sooner, because the edge handling your request carries less traffic and is thus less likely to have handled a recent request and to have a cached copy available.
After updating your content in S3, create a CloudFront invalidation request for /*. This marks everything previously cached as expired, so all subsequent requests will be sent to the origin server and all viewers will see fresh content. Each AWS account gets 1,000 invalidation paths per month (across all distributions combined) at no cost.
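For reference, here is a minimal sketch of creating that /* invalidation with boto3 (the distribution ID below is a placeholder):

    import time

    import boto3

    cloudfront = boto3.client("cloudfront")

    response = cloudfront.create_invalidation(
        DistributionId="E1234EXAMPLE",  # placeholder distribution ID
        InvalidationBatch={
            "Paths": {"Quantity": 1, "Items": ["/*"]},
            # CallerReference must be unique per invalidation request.
            "CallerReference": str(time.time()),
        },
    )
    print(response["Invalidation"]["Id"])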

Does Terraform offer strong consistency with S3 and DynamoDB?

Terraform offers a few different backend types for saving its state. AWS S3 is probably the most popular one, but it only offers eventual read-after-write consistency for overwriting objects. This means that when two people apply a Terraform change at approximately the same time, they might create a resource twice or get errors because a resource was deleted in the meantime.
Does Terraform solve that using DynamoDB? WRITES in DynamoDB are strongly consistent. READS, by default, are only eventually consistent, though.
So the question is whether there is strong consistency when working with S3 as a backend for Terraform.
tl;dr: Using DynamoDB to lock the state gives you strongly consistent reads, or at least an error if the read is not consistent. Without state locking you have a chance of eventual consistency biting you, but it's unlikely.
Terraform doesn't currently offer DynamoDB as an option for remote state backends.
When using the S3 backend, it does allow for using DynamoDB to lock the state so that multiple apply operations cannot happen concurrently. Because the lock is taken with a conditional put (the put succeeds only if the lock doesn't already exist), this gives you the strongly consistent action you need to make sure the state won't be written twice, while also avoiding the race condition of a read of the table followed by a write.
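A minimal sketch of that conditional-put pattern with boto3 (the item attributes here are illustrative rather than Terraform's exact lock schema, although the real lock table does use a LockID string partition key):

    import boto3
    from botocore.exceptions import ClientError

    dynamodb = boto3.client("dynamodb")

    def acquire_lock(table: str, lock_id: str, owner: str) -> bool:
        # Returns True if we won the lock, False if someone else already holds it.
        try:
            dynamodb.put_item(
                TableName=table,
                Item={"LockID": {"S": lock_id}, "Owner": {"S": owner}},
                # The put succeeds only if no item with this LockID exists,
                # so exactly one concurrent caller wins.
                ConditionExpression="attribute_not_exists(LockID)",
            )
            return True
        except ClientError as e:
            if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
                return False
            raise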
Because you can't run a plan/apply while a lock is in place, this allows the first apply in a chain to complete before the second one is allowed to read the state. The lock table also holds an MD5 digest of the state file, so if at plan time the state in S3 hasn't been updated yet, it won't match the digest and Terraform will fail hard with the following error:
Error refreshing state: state data in S3 does not have the expected content.
This may be caused by unusually long delays in S3 processing a previous state
update. Please wait for a minute or two and try again. If this problem
persists, and neither S3 nor DynamoDB are experiencing an outage, you may need
to manually verify the remote state and update the Digest value stored in the
DynamoDB table to the following value: 9081e134e40219d67f4c63f4fef9c875
If, for some reason, you aren't using state locking, then Terraform does read the state back from S3 to check that it is what it expects (currently retrying every 2 seconds for 10 seconds until they match, or failing if that timeout is exceeded), but I think it is still technically possible in an eventually consistent system for one read to show the update and a second read to miss it when it hits another node. In my experience this certainly happens with IAM, which is a global service with eventual consistency, leading to much longer eventual-consistency delays.
All that said, I have never seen any issues caused by eventual consistency on the S3 buckets, and I would have expected to see lots of orphaned resources if it were a real problem, particularly at a previous job where we were executing huge numbers of Terraform jobs concurrently and on a tight schedule.
If you wanted to be more certain of this, you could probably test it by having Terraform create an object whose key is a UUID/timestamp that Terraform generates, so that every apply deletes the old object and creates a new one, then run that in a tight loop, checking the number of objects in the bucket and exiting if you ever see 2 objects in the bucket.

Log delay in Amazon S3

I have recently started hosting in Amazon S3, and I need the log files to calculate statistics for the "get", "put", and "list" operations on the objects.
I've observed that the log files are organized oddly: I don't know when a log file will appear (not immediately, at least 20 minutes after the operation) or how many lines of logs one log file will contain.
After that, I need to download these log files and analyse them, but I can't figure out how often I should do this.
Can somebody help? Thanks.
What you describe (log files being made available with delays and in unpredictable order) is exactly what AWS declares as the behaviour to expect. It is in the nature of the distributed system AWS uses to provide the S3 service: the same request may be served each time by a different server - I have seen 5 different IP addresses serving the same content.
So the only solution is: accept the delay, measure the delay you actually experience, add some extra time, and learn to live with that total delay (I would expect something like 30 to 60 minutes, but statistics could tell you more).
If you need the log records ordered, you either have to sort them yourself or look for a log-processing solution - I have seen applications offered exactly for this purpose.
If you really need to get your logs with a very short delay, you have to produce the logs yourself. This means writing and running a frontend that gives access to your files on S3 and at the same time logs requests as needed.
I run such a solution: users get a user name, a password, and the URL of my frontend. When they send a request, I check whether they provide proper credentials and are allowed to see the given resource, and if so, I create a temporary URL for that resource that is valid for a few minutes and redirect the request to it.
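The temporary-URL step in that kind of frontend can be done with S3 presigned URLs; here is a minimal sketch with boto3 (bucket, key, and expiry are placeholders):

    import boto3

    s3 = boto3.client("s3")

    # The URL is valid for 5 minutes; after that, requests using it are rejected.
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "my-bucket", "Key": "path/to/resource"},
        ExpiresIn=300,
    )
    print(url)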
But such a frontend costs money (you have to run it somewhere) and is less robust than accessing AWS S3 directly.
Good luck, Lulu.
A lot has changed since the question was originally posted. The delay is still there, but one of the OP's concerns was when to download the logs to analyze them.
One option right now would be to leverage Event Notifications: https://docs.aws.amazon.com/AmazonS3/latest/user-guide/setup-event-notification-destination.html
This way, whenever an object is created in the access-logs bucket, you can trigger a notification to SNS, SQS, or Lambda, and based on that download and analyze the log files.
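For illustration, here is a minimal sketch of a Lambda handler triggered by such an event notification on the access-logs bucket; analyze_log() is a hypothetical placeholder for your own analysis code:

    import boto3

    s3 = boto3.client("s3")

    def handler(event, context):
        # Each record describes one object created in the logs bucket.
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            analyze_log(body.decode("utf-8"))  # hypothetical analysis function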