Does Terraform offer strong consistency with S3 and DynamoDB?

Terraform offers a few different backend types for saving its state. AWS S3 is probably the most popular one, but it only offers eventual read-after-write consistency when overwriting objects. This means that when two people apply a Terraform change at approximately the same time, they might create a resource twice or get errors because a resource was deleted in the meantime.
Does Terraform solve that using DynamoDB? WRITES in DynamoDB are strongly consistent. READS, by default, are only eventually consistent, though.
So the question is whether there is strong consistency when working with S3 as a backend for Terraform.

tl;dr: Using DynamoDB to lock the state gives you a guarantee of strongly consistent reads, or at least an error if the read is not consistent. Without state locking you have a chance of eventual consistency biting you, but it's unlikely.
Terraform doesn't currently offer DynamoDB as an option for remote state backends.
When using the S3 backend it does allow for using DynamoDB to lock the state so that multiple apply operations cannot happen concurrently. Because the lock is naively attempted as a put with a condition that the lock doesn't already exist, this gives you the strongly consistent action you need to make sure that it won't write twice (while also avoiding the race condition you would get from a read of the table followed by a write).
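A minimal sketch of that conditional-put pattern with boto3 (Terraform's real lock table does use a primary key named LockID, but the other attribute names and the helper below are illustrative, not Terraform's actual implementation):

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")

def acquire_lock(table, lock_id, owner):
    """Try to take the lock in a single conditional write."""
    try:
        dynamodb.put_item(
            TableName=table,
            Item={"LockID": {"S": lock_id}, "Owner": {"S": owner}},
            # DynamoDB rejects the put if an item with this LockID already
            # exists, so there is no read-then-write race to worry about.
            ConditionExpression="attribute_not_exists(LockID)",
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # someone else holds the lock
        raise
```

Because the existence check and the write are a single atomic operation on DynamoDB's side, two concurrent callers can never both succeed.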
Because you can't run a plan/apply while a lock is in place, this allows the first apply in a chain to complete before the second one is allowed to read the state. The lock table also holds an MD5 digest of the state file, so if the state read at plan time is stale it won't match the digest and Terraform will fail hard with the following error:
Error refreshing state: state data in S3 does not have the expected content.
This may be caused by unusually long delays in S3 processing a previous state
update. Please wait for a minute or two and try again. If this problem
persists, and neither S3 nor DynamoDB are experiencing an outage, you may need
to manually verify the remote state and update the Digest value stored in the
DynamoDB table to the following value: 9081e134e40219d67f4c63f4fef9c875
If, for some reason, you aren't using state locking then Terraform does read the state back from S3 to check that it matches what it just wrote (currently retrying every 2 seconds for up to 10 seconds, and failing if that timeout is exceeded), but I think it is still technically possible in an eventually consistent system for one read to show the update and then for a second read to miss it because it hits another node. In my experience this certainly happens with IAM, which is a global service with eventual consistency and correspondingly longer propagation delays.
All that said, I have never seen any issues caused by eventual consistency on the S3 buckets, and I would have expected to see lots of orphaned resources if it were a real problem, particularly in a previous job where we were executing huge numbers of Terraform jobs concurrently and on a tight schedule.
If you wanted to be more certain of this, you could probably test it by having Terraform create an object whose key is a UUID/timestamp that Terraform generates, so that every apply deletes the old object and creates a new one. Then run that in a tight loop, checking the number of objects in the bucket and exiting if you ever have 2 objects in the bucket.
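A rough sketch of that test harness (the bucket name is a placeholder, and it assumes a Terraform config that writes a single object with a generated key into that bucket):

```python
import subprocess
import boto3

BUCKET = "tf-consistency-test-bucket"  # placeholder test bucket
s3 = boto3.client("s3")

while True:
    # Each apply should replace the previous UUID-keyed object with a new one.
    subprocess.run(["terraform", "apply", "-auto-approve"], check=True)

    # If eventual consistency ever leaves both the old and new objects
    # visible after an apply, stop and report it.
    count = s3.list_objects_v2(Bucket=BUCKET).get("KeyCount", 0)
    if count > 1:
        print(f"Saw {count} objects in the bucket; consistency lag observed")
        break
```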

Related

AWS Lambda - Store state of a queue

I'm currently tasked with building a serverless architecture for communication between government agencies and citizens, and a main component is some form of queue that contains some form of object/pointer to each citizen's request, sorted by priority. The government workers can then process an element when available. As Lambda is stateless, I need to save the queue outside it in some manner.
For saving state I've gathered that you can use DynamoDB or S3 Buckets and use event triggers to invoke related Lambda methods. Some also suggest using Parameter Store to save some state variables. Storing things globally has also come up, though as you can't guarantee that the Lambda doesn't terminate, it doesn't seem like a good idea.
Finally, I've also read a bit about SQS, though I have no idea if it is at all applicable to this case.
What is the best-practice / suggested approach when working with Lambda in this way? I'm leaning towards S3 Buckets, due to event triggering, and not using DynamoDB as our DB.
Storing things globally has also come up, though as you can't guarantee that the Lambda doesn't terminate, it doesn't seem like a good idea.
Correct -- this is not viable at all. Note that what you are actually referring to when you say "the Lambda" is the process inside the container... and any time your Lambda function is handling more than one invocation concurrently, you are guaranteed that they will not be running in the same container -- so "global" variables are only useful for optimization, not state. Any two concurrent invocations of the same function have two entirely different global environments.
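For example, a module-level client is a legitimate use of that global scope, but purely as a warm-start optimization; this is a sketch with a hypothetical table name, not a way to store state:

```python
import boto3

# Created once per container and reused across warm invocations of that
# same container only. Never rely on it surviving between invocations,
# and never use module-level variables to hold application state.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("citizen_requests")  # hypothetical table name

def handler(event, context):
    # Anything that must persist belongs in the database, not in globals.
    response = table.get_item(Key={"request_id": event["request_id"]})
    return response.get("Item")
```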
Forgetting all about Lambda for a moment -- I am not saying don't use Lambda; I'm saying that whether or not you use Lambda isn't relevant to the rest of what is written, below -- I would suggest that parallel/concurrent actions in general are perhaps one of the most important factors that many developers tend to overlook when trying to design something like you are describing.
How you will assign work from this work "queue" is extremely important to consider. You can't just "find the next item" and display it to a worker.
You must have a way to do all of these things:
find the next item that appears to be available
verify that it is indeed available
assign it to a specific worker
mark it as unavailable for assignment
Not only that, but you have to be able to do all of these things atomically -- as a single logical action -- and without collisions.
A naïve implementation runs the risk of assigning the same work item to two or more people, with the first assignment being blindly and silently overwritten by subsequent assignments that happen at almost the same time.
DynamoDB allows conditional updates -- update a record if and only if a certain condition is true. This is a critical piece of functionality that your solution needs to accommodate -- for example, assign work item x to user y if and only if item x is currently unassigned. A conditional update will fail, and changes nothing, if the condition is not true at the instant the update happens and therein lies the power of the feature.
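A sketch of that conditional assignment with boto3 (the table, key, and attribute names are hypothetical):

```python
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("work_items")  # hypothetical table

def assign_item(item_id, worker_id):
    """Assign item_id to worker_id only if it is currently unassigned."""
    try:
        table.update_item(
            Key={"item_id": item_id},
            UpdateExpression="SET assigned_to = :w",
            # The update is rejected atomically if the item already has an
            # assignee, so two workers can never both win the same item.
            ConditionExpression="attribute_not_exists(assigned_to)",
            ExpressionAttributeValues={":w": worker_id},
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # someone else got it first; pick another item
        raise
```

If the conditional check fails, the caller simply moves on to the next candidate item rather than silently overwriting the existing assignment.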
S3 does not support conditional updates, because unlike DynamoDB, S3 operates only on an eventual-consistency model in most cases. After an object in S3 is updated or deleted, there is no guarantee that the next request to S3 will return the most recent version or that S3 will not return an item that has recently been deleted. This is not a defect in S3 -- it's an optimization -- but it makes S3 unsuited to the "work queue" aspect.
Skip this consideration and you will have a system that appears to work, and works correctly much of the time... but at other times, it "mysteriously" behaves wrongly.
Of course, if your work items have accompanying documents (scanned images, PDF, etc.), it's quite correct to store them in S3... but S3 is the wrong tool for storing "state." SSM Parameter Store is the wrong tool, for the same reason -- there is no way for two actions to work cooperatively when they both need to modify the "state" at the same time.
"Event triggers" are useful, of course, but from your description, the most notable "event" is not from the data, or the creation of the work item, but rather it is when the worker says "I'm ready for my next work item." It is at that point -- triggered by the web site/application code -- when the steps above are executed to select an item and assign it to a worker. (In practice, this could be browser → API Gateway → Lambda). From your description, there may be no need for the creation of a new work item to trigger an "event," or if there is, it is not the most significant among the events.
You will need a proper database for this. DynamoDB is a candidate, as is RDS.
The queues provided by SQS are designed to decouple two parts of your application -- when two processes run at different speeds, SQS is used as a buffer, allowing X to safely store the work needing to be done and then continue with something else until Y is able to do the work. SQS queues are opaque -- you can't introspect what's in the queue, you just take the next message and are responsible for handling it. On its face, that seems to partially describe what you need, but it is not a clean match for this use case. Queues are limited in how long messages can be retained, and once a message is successfully processed, it is completely gone.
Note also that SQS is only a match to your use case with the FIFO queue feature enabled, which guarantees perfect in-order delivery and exactly-once delivery -- standard SQS queues, for performance optimization reasons, do not guarantee perfect in-order delivery and may under certain conditions deliver the same message more than once, to the same consumer or a different consumer. But the SQS FIFO queue feature does not coexist with event triggers, which require standard queues.
So SQS may have a role, but you need an authoritative database to store the work and the results of the business process.
If you need to store the message, then SQS is not the best tool here, because your Lambda function would then need to process the message and finally store it somewhere, making SQS nothing but a broker.
The S3 approach gives you what you need out of the box: you can store the files (messages) in an S3 bucket and have a Lambda function consume its events. Your Lambda would then process the event, and the file would remain safe and sound in S3.
If you eventually need multiple consumers for these messages, you can send the S3 event to SNS instead and subscribe N Lambda functions to the SNS topic.
You appear to be worrying too much about the infrastructure at this stage and not enough on the application design. The fact that it will be serverless does not change the basic functionality of the application — it will still present a UI to users, they will still choose options that must trigger some business logic and information will still be stored in a database.
The queue you describe is merely a datastore of messages that are in a particular state. The application will have some form of business logic for determining the next message to handle, which could be based on creation timestamp, priority, location, category, user (eg VIP users who get faster response), specialization of the staff member asking for the next message, etc. This is not a "queue" but rather a calculation to be performed against all 'unresolved' messages to determine the next message to assign.
If you wish to go serverless, then the back-end will certainly be using Lambda and a database (eg DynamoDB or Amazon RDS). The application should store everything in the database so that data is available for the application's business logic. There is no need to use SQS since there really isn't a "queue", and Parameter Store is merely a way of sharing parameters amongst application components — it is not meant for core data storage.
Determine the application functionality first, then determine the appropriate architecture to make it happen.

How to get CloudFormation to respect Kinesis simultaneous stream creation limits

I have a CloudFormation stack that contains multiple Kinesis streams. If the stream count is less than 5 during creation or update, there are no problems. If I have more than 5, an error occurs and the whole stack is rolled back.
The issue is compounded by streams in the template being added dynamically from config files, so order is not deterministic.
Is there a way to use wait conditions to say only do 5 of these at a time? Even this I think will be an issue because I won't know of streams that are being deleted.
OR is there some way to have CloudFormation back off a creation attempt, wait and try again without ROLLBACK on the whole stack?
WaitConditions aren't really designed for this. They are more for setting up servers that can signal back when they're done.
There is no creation strategy for streams at this time.
According to the AWS response in this thread the only way is to build up a DependsOn chain. They suggest batching, but I had to do a linked list since I wouldn't know what other stacks are up to. Still not foolproof, but it won't have more than 5 stacks building at once.
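Since the streams come from config files anyway, the chain can be generated when the template is built; here is a sketch in Python (the stream names and template shape are illustrative):

```python
import json

stream_names = ["orders", "clicks", "audit"]  # loaded from config in practice

resources = {}
previous = None
for name in stream_names:
    logical_id = f"{name.capitalize()}Stream"
    resource = {
        "Type": "AWS::Kinesis::Stream",
        "Properties": {"ShardCount": 1},
    }
    if previous is not None:
        # Each stream depends on the previous one, so CloudFormation only
        # ever creates (or deletes) one stream from this stack at a time.
        resource["DependsOn"] = previous
    resources[logical_id] = resource
    previous = logical_id

template = {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}
print(json.dumps(template, indent=2))
```

Batching into groups of up to 5 would be faster, but chaining one at a time leaves headroom for streams being created by other stacks.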

AWS services appropriate for concurrent access to a resource

I'm designing a system where a cluster of EC2 instances do some computing and then update a large file continually. What would be ideal is if I could have the file in S3, and have all the instances take turns writing to it one at a time, performing calculations while they wait.
As it stands, if 2 instances PUT to S3 at the same time, one will simply overwrite the other.
How can I solve this concurrency issue?
AWS has a preview service called EFS (http://aws.amazon.com/documentation/efs/), an NFSv4 file system that can be shared among EC2 instances. But that service alone does not solve your problem, as you may still have concurrency issues. Consider something more sophisticated, such as an "embarrassingly parallel" approach in which N processes create N file chunks and a single step joins all the pieces together when everything is done.
As it is, Amazon states that if you receive a success code then your S3 object is committed. Amazon also adds that there won't be any dirty writes or overlapping inconsistency: a read returns one fully committed write or another, never a mix of the two.
If you need more control you might be able to handle it at the application level, for example by implementing a critical section.
It certainly makes sense to enable versioning on the bucket so that every write is retained and you can later choose which version to treat as the latest.
You can also leverage lifecycle rules to keep deleting all but the last n versions to save cost.
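For example, with boto3 you might enable versioning and expire noncurrent versions; the bucket name and the 30-day retention below are placeholders (this variant expires by age rather than keeping an exact count of versions):

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-shared-output-bucket"  # placeholder

# Keep every write as a distinct version...
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# ...but expire noncurrent versions after 30 days to control storage cost.
s3.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-old-versions",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
            }
        ]
    },
)
```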

How long should I wait after applying an AWS IAM policy before it is valid?

I'm adding and removing AWS IAM user policies programmatically, and I'm getting inconsistent results from the application of those policies.
For example, this may or may not succeed (I'm using the Java 1.6.6 SDK):
1. Start with a user that can read from a particular bucket
2. Clear user policies (list policies then call "deleteUserPolicy" for each one)
3. Wait until the user has no user policies (call "listUserPolicies" until it returns an empty set)
4. Attempt to read from the bucket (this should fail)
If I put in a breakpoint between #3 and #4 and wait a few seconds, the user cannot read from the bucket, which is what I expect. If I remove breakpoints, the user can read from the bucket, which is wrong.
(This is also inconsistent when I add a policy then access a resource)
I'd like to know when a policy change has had an effect on the component (S3, SQS, etc), not just on the IAM system. Is there any way to get a receipt or acknowledgement from this? Or maybe there is a certain amount of time to wait?
Is there any documentation on the internals of policy application?
(FYI I've copied my question from https://forums.aws.amazon.com/thread.jspa?threadID=140383&tstart=0)
The phrase "almost immediately" is used 5 times in the IAM FAQ, and is, of course, somewhat subjective.
Since AWS is a globally-distributed system, your changes have to propagate, and the system as a whole seems to be designed to favor availability and partition tolerance as opposed to immediate consistency.
I don't know whether you've considered it, but it's entirely within the bounds of possibility that you might actually, at step 4 in your flow, see a sequence of pass, fail, pass, pass, fail, fail, fail, fail... because neither a bucket nor an object in a bucket is actually a single thing in a single place, as evidenced by the mixed consistency model of different actions in S3, where new objects are immediately consistent while overwrites and deletes are eventually consistent... so the concept of a policy having "had an effect" or not on the bucket or an object isn't an entirely meaningful concept, since the application of the policy is, itself, almost certainly, a distributed event.
To confirm such an application of policies would require AWS to expose the capability of (at least indirectly) interrogating every entity that has a replicated copy of that policy to see whether it had the current version or not... which would be potentially impractical or unwieldy to say the least in a system as massive as S3, which has grown beyond a staggering 2 trillion objects, and serves peak loads in excess of 1.1 million requests per second.
Official AWS answers to this forum post provide more information:
While changes you make to IAM entities are reflected in the IAM APIs immediately, it can take noticeable time for the information to be reflected globally. In most cases, changes you make are reflected in less than a minute. Network conditions may sometimes increase the delay, and some services may cache certain non-credential information which takes time to expire and be replaced.
The accompanying answer to what to do in the mean time was "try again."
We recommend a retry loop after a slight initial delay, since in most circumstances you'll see your changes reflected quite quickly. If you sleep, your code will be waiting far too long in most cases, and possibly not long enough for the rare exceptions.
We actively monitor the performance of the replication system. But like S3, we guarantee only eventual consistency, not any particular upper bound.
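That retry approach might look something like the following sketch, which polls until a removed permission stops working; the bucket, key, and timing values are placeholders, and the client is assumed to be built with the affected user's credentials:

```python
import time
import boto3
from botocore.exceptions import ClientError

# Assumed to be constructed with the credentials of the user whose
# policies were just removed.
s3 = boto3.client("s3")

def wait_for_policy_removal(bucket, key, timeout=120):
    """Poll until the revoked policy is actually enforced (read fails)."""
    deadline = time.time() + timeout
    delay = 2  # slight initial delay, then back off between attempts
    while time.time() < deadline:
        time.sleep(delay)
        try:
            s3.get_object(Bucket=bucket, Key=key)
        except ClientError as e:
            if e.response["Error"]["Code"] == "AccessDenied":
                return True  # the change has propagated to S3
            raise
        delay = min(delay + 2, 15)
    return False  # still not visible; investigate before proceeding
```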
I have a far less scientific answer here... but I think it will help some other people feel less insane :). I kept thinking things were not working while they were just taking more time than I expected.
Last night I was adding an inline policy to allow a host to get parameters from the system manager. I thought it wasn't working because many minutes after the change (maybe 5 or so), my CLI commands were still failing. Then, they started working. So, that was a fairly large delay.
Just now, I removed that policy and it took 2-3 minutes (enough to google this and read a couple other pages) before my host lost access.
Generally things are quite snappy for me as well, but if you're pretty sure something should work and it's not, just do yourself a favor and wait 10 minutes. Unfortunately, this makes automation after IAM changes sound harder than I thought!

DynamoDB's PutItem is multiple zones safe?

According to the link [1]
Amazon DynamoDB has built-in fault tolerance, automatically and synchronously
replicating your data across three Availability Zones in a Region for high
availability and to help protect your data against individual machine, or even
facility failures.
So can I assume that, at the time I get result for a success write, it is already replicated into three Availability zones?
[1] http://aws.amazon.com/dynamodb/
I think it depends on how you do the read:
from http://aws.amazon.com/dynamodb/faqs/
Q: What is the consistency model of Amazon DynamoDB?
When reading data from Amazon DynamoDB, users can specify whether they want the read to be eventually consistent or strongly consistent:
Eventually Consistent Reads (Default) – the eventual consistency option maximizes your read throughput. However, an eventually consistent read might not reflect the results of a recently completed write. Consistency across all copies of data is usually reached within a second. Repeating a read after a short time should return the updated data.
Strongly Consistent Reads — in addition to eventual consistency, Amazon DynamoDB also gives you the flexibility and control to request a strongly consistent read if your application, or an element of your application, requires it. A strongly consistent read returns a result that reflects all writes that received a successful response prior to the read.
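So if read-after-write behavior matters, request it explicitly; for example with boto3 (the table, key, and attribute names are placeholders):

```python
import boto3

table = boto3.resource("dynamodb").Table("my-table")  # placeholder name

# A successful return means the item is durably stored.
table.put_item(Item={"pk": "user#123", "status": "active"})

# An eventually consistent read (the default) might not see that write yet;
# ConsistentRead=True returns a result reflecting all prior successful writes.
item = table.get_item(
    Key={"pk": "user#123"},
    ConsistentRead=True,
).get("Item")
print(item)
```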
Yes, you can rely on the data being there if PutItem succeeded.
automatically and synchronously replicating your data across three Availability Zones
The keyword is synchronously, meaning at the same time. At the same time it accepts your data, it's writing to all three Availability Zones. If PutItem returned before completing those writes, DynamoDB wouldn't have the consistency and durability guarantees advertised.