DynamoDB Eventually consistent reads vs Strongly consistent reads - amazon-web-services

I recently learned that DynamoDB has two read modes, but I am not clear on when to choose which. Can anyone explain the trade-offs?

Basically, if you NEED the latest values, use a strongly consistent read. You'll get the guaranteed current value.
If your app is okay with potentially outdated information (mere seconds or less out of date), then use eventually consistent reads.
Examples where strongly consistent reads fit:
Bank balance (Want to know the latest amount)
Location of a locomotive on a train network (Need absolute certainty to guarantee safety)
Stock trading (Need to know the latest price)
Use-cases for eventually consistent reads:
Number of Facebook friends (Does it matter if another was added in the last few seconds?)
Number of commuters who used a particular turnstile in the past 5 minutes (Not important if it is out by a few people)
Stock research (Doesn't matter if it's out by a few seconds)
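If it helps to see how the mode is chosen, here is a minimal boto3 sketch (table and key names are made up); the read mode is selected per request:

```python
import boto3

# Hypothetical table and key names, purely for illustration.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Accounts")

# Eventually consistent read (the default): cheaper, may lag recent writes by a moment.
maybe_stale = table.get_item(Key={"account_id": "42"})

# Strongly consistent read: reflects every write acknowledged before the read,
# at roughly twice the read-capacity cost (and not supported on global secondary indexes).
latest = table.get_item(Key={"account_id": "42"}, ConsistentRead=True)
```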

Apart from the other answers, in short, the reason for these read modes is:
Let's say you have a User table in the eu-west-1 region. Behind the scenes, AWS spreads it across multiple Availability Zones and replicates your data in case of failure, so there are several copies of your table, and every write has to be applied to multiple resources.
When you read, there is a chance you hit a copy that has not been updated yet, without being aware of it. It usually takes under a second for DynamoDB to bring all copies up to date. This is why it's called eventually consistent: it will eventually be consistent, within a short amount of time :)
Knowing this reasoning helps me understand the trade-off and design my use cases when making the decision.

Related

Are materialized views in redshift worth their costs

I'm working on a project in AWS Redshift with a few billion rows, where the main queries are rollups on time units. The current implementation has MVs (materialized views) for all these rollups. It seems to me that if Redshift is all it's cracked up to be, and the dist and sort keys are defined correctly, the MVs should not be necessary, and we could avoid their costs in extra storage and maintenance (refresh). I'm wondering if anyone has analyzed this in a similar application.
You're thinking along the right path but the real world doesn't always allow for 'just do it better'.
You are correct that sometimes MVs are just used to forgo the effort of optimizing a complex query, but sometimes not. The selection of keys, especially the distribution key, is a compromise between optimizing different workloads. Distribute one way and query A gets faster but query B gets slower. But if the results of query B don't need to be completely up to date, you can make an MV out of B and only pay the price on refresh.
Sometimes queries are very complex and time consuming (and not because they aren't optimized). If the results of such a query don't need to include the latest info to be valid, an MV can make the cost of running it infrequent. [In reality, MVs often represent complex subqueries that are referenced by a number of other queries, which accentuates the frequent vs. infrequent value of the MV.]
Sometimes query types don't match well to Redshift's distributed, columnar nature and just don't perform well. Again, current-ness of data can be played off against cluster workload and these queries can be run at low usage times.
With all that said, I think you are on the right path, as I've also been trying to get people to see that many, many queries are just poorly written. Too often in the data world, functionally correct equals done, when in reality that is only half done. I've rewritten queries that were taking 90 minutes to execute (browning out the cluster when they ran) and got them down to 17 seconds. So keep up the good fight, but use MVs as a last resort, when compromise is the only solution.
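For reference, the "only pay the price on refresh" pattern mentioned above looks roughly like this. This is just a sketch with made-up connection details, table and view names, using psycopg2 the way you would against any Postgres-compatible endpoint:

```python
import psycopg2

# Hypothetical Redshift endpoint and credentials; substitute your own.
conn = psycopg2.connect(
    host="my-cluster.abc123.eu-west-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="etl_user", password="...",
)
conn.autocommit = True

with conn.cursor() as cur:
    # Define the expensive rollup once; readers query mv_daily_events instead of raw_events.
    cur.execute("""
        CREATE MATERIALIZED VIEW mv_daily_events AS
        SELECT event_date, site_id, COUNT(*) AS events
        FROM raw_events
        GROUP BY event_date, site_id;
    """)
    # Pay the aggregation cost here, on your own schedule (e.g. nightly), not on every query.
    cur.execute("REFRESH MATERIALIZED VIEW mv_daily_events;")
```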

Basics to how a distributed, consistent key-value storage system return the latest key when dealing with concurrent requests?

I am getting up to speed on distributed systems (studying for an upcoming interview), and specifically on the basics for how a distributed system works for a distributed, consistent key-value storage system managed in memory.
My specific questions I am stuck on that I would love just a high level answer on if it's no trouble:
#1
Let's say we have 5 servers that are responsible for acting as readers, and I have one writer. When I write the value 'foo' to the key 'k1', I understand it has to propagate to all of those servers so they all store the value 'foo' for the key 'k1'. Is this correct, or does the writer only need to write to a majority (quorum) for this to work?
#2
After #1 above takes place, let's say that concurrently a read comes in for k1 and a write comes in to replace 'foo' with 'bar', but not all of the servers have been updated with 'bar' yet. This means some hold 'foo' and some hold 'bar'. If I had lots of concurrent reads, it's conceivable some would return 'foo' and some 'bar', since the update hasn't reached every server yet.
When we're talking about eventual consistency this is expected, but if we're talking about strong consistency, how do you avoid #2 above? I keep seeing content about quorums and timestamps, but at a high level, is there some sort of intermediary that sorts out what the correct value is? Just wanted to get a basic idea before I dive in more.
Thank you so much for any help!
In doing more research, I found that consensus algorithms such as Paxos or Raft are the standard solution here. The idea is that your nodes need to arrive at a consensus on what the value is. If you read up on Paxos or Raft you'll learn everything you need; it's too involved to explain here, but there are videos/resources out there that cover it well.
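To make the quorum-and-timestamps idea a bit more concrete, here is a rough Python sketch; it is purely illustrative (made-up node layout and versioning), not a real consensus implementation:

```python
from dataclasses import dataclass

@dataclass
class Versioned:
    value: str
    version: int  # e.g. a write timestamp or a monotonically increasing counter

# Five hypothetical replicas. With N = 5, choosing W = 3 and R = 3 gives R + W > N,
# so any read quorum overlaps any write quorum in at least one node.
replicas = [dict() for _ in range(5)]
W, R = 3, 3

def write(key, value, version, targets):
    # A real system sends the write to all N nodes and waits for W acknowledgements;
    # here we just pick which W replicas "got" the write to simulate partial propagation.
    for i in targets:
        replicas[i][key] = Versioned(value, version)

def read(key):
    # Ask R replicas and keep the copy with the highest version: because R + W > N,
    # the read quorum always overlaps the most recent write quorum in at least one node.
    answers = [node[key] for node in replicas[:R] if key in node]
    return max(answers, key=lambda v: v.version).value

write("k1", "foo", version=1, targets=[0, 1, 2])  # older write landed on nodes 0-2
write("k1", "bar", version=2, targets=[2, 3, 4])  # newer write landed on nodes 2-4
print(read("k1"))  # "bar": nodes 0 and 1 still hold "foo", but node 2 has the newer version
```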
Another thing I found helpful was learning more about Dynamo and DynamoDB. They deal with the same problem, although by default they are eventually consistent rather than strongly consistent.
Hope this helps someone, and message me if you'd like more details!
Reading about the CAP theorem will help you reason about this. You are asking for consistency in the presence of network partitions, so you have to sacrifice availability: the system needs to block and wait until all nodes finish writing. In other words, the change cannot be read before all nodes have applied it.
In theoretical computer science, the CAP theorem, also named Brewer's theorem after computer scientist Eric Brewer, states that any distributed data store can only provide two of the following three guarantees:
Consistency: every read receives the most recent write or an error.
Availability: every request receives a (non-error) response, without the guarantee that it contains the most recent write.
Partition tolerance: the system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.

How to deal with big aggregate roots with lots of children?

I'm quite new to DDD and in our recent project we faced a problem I didn't expect.
In our domain we model the budget of a company.
Keeping things simple, the budget is a table with a bunch of rows and columns. Every department in a company has a different budget for a given year. The table could have quite a few rows and a fixed number of columns (basically, a name and values for every month).
The business rules our business people want to enforce apply at the whole-table level, like you can't have two rows with the same name, or you can't edit a locked table, etc.
So, after some elaboration, we decided to make a single table an aggregate in this bounded context.
Then things started to be interesting.
Basically, we have two problems now:
One table can be edited by many users at the same time, and even if one person is editing a single cell and another person edits a completely different cell, from the aggregate's point of view those are parallel edits to a single aggregate, and each of them requires us to load the whole aggregate, apply a change, check the business rules and save it back to the database.
Since we are using event sourcing, the load operation becomes slower with every event we commit to the database, so we decided to use a snapshot approach. We take a snapshot every X events so the load operation won't take too long. But at some point we realised that after a week one of our tables had thousands of edits and the snapshot event was a giant JSON string around 1,000,000 characters long. Even transferring it from the database is quite slow.
At this point I started to think that making the entire table an aggregate was a mistake and we could take a more granular approach, but I don't know of any rule in DDD I could refer to, and I don't quite understand how I could enforce business rules on the entire table if I split the aggregate into rows or something like that.
Could any of you please tell me where I was wrong and what I could do to improve the model, with references to sources I could reason about with the team?
Aggregates are consistency boundaries, which generally means that you want to keep them small for the reasons you've encountered.
It probably makes sense to have the table be an aggregate so that table-level constraints and transitions (e.g. locked/unlocked state, for sure) can be enforced/handled. But I'm not sure the table needs to contain all the contents of the rows: the content of each row can be modeled as its own aggregate, with the table aggregate tracking the sequence of rows (obviously, accessing the rows would be through the aggregate root).
In this approach, the table aggregate can enforce, at write time, invariants that only touch the table, and the row aggregate can enforce, at write time, invariants that only touch a single row. Enforcement of invariants crossing multiple rows will have to be done via at least one projection, which means that the system cannot guarantee to reject all writes which violate the invariant.
This implies that there will almost surely eventually be a case where the desired invariant is violated (in which case, calling it an invariant is a little bit of an abuse of terminology, but bear with me...), and the system (including users, operators, etc.) will need to take a compensating action to restore the invariant. The nature of the compensating action is a business concern: sometimes it can be automated, sometimes it's a matter of alerting for manual action.
If that's actually unacceptable, and the business demands are such that the table needs to be able to enforce invariants covering the content of multiple rows, then you are pretty well stuck with the giant table aggregate (note that you'd hit this problem even if you weren't doing DDD).
Depending on which language you're implementing in, you may find that taking advantage of the actor model, with each table being the internal state of a single actor, pays off in performance terms: because the actor processes only one message at a time, it effectively serves as an in-memory, always-current snapshot of the aggregate (the events since the latest persisted snapshot only need to be replayed once, to rehydrate the actor). If the table state is a million bytes of serialized JSON, data size does not seem to force you into the clustering features of some actor frameworks, so any actor model implementation should work. This talk explores aspects of this approach.
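To make the table-aggregate/row-aggregate split above a little more concrete, here is a minimal sketch; the class and field names are invented, and a real implementation would add repositories, domain events, and so on:

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4

@dataclass
class BudgetRow:
    """Row aggregate: owns its cell values and enforces row-level rules only."""
    row_id: UUID
    name: str
    monthly_values: list = field(default_factory=lambda: [0.0] * 12)

    def set_month(self, month, value):
        if value < 0:
            raise ValueError("budget values must be non-negative")  # row-level rule
        self.monthly_values[month] = value

@dataclass
class BudgetTable:
    """Table aggregate: tracks row identities and table-level rules, but no cell data."""
    table_id: UUID
    locked: bool = False
    row_names: dict = field(default_factory=dict)  # row_id -> name

    def add_row(self, name):
        if self.locked:
            raise RuntimeError("cannot edit a locked table")  # table-level rule
        if name in self.row_names.values():
            raise RuntimeError("row names must be unique")    # table-level rule
        row_id = uuid4()
        self.row_names[row_id] = name
        return row_id

# Editing a single cell now loads and saves one small BudgetRow aggregate instead of
# rehydrating the whole table; rules that span multiple rows move to projections.
```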
If you have defined the budgeting domain around tables and rows, then you may need to reconsider your design. I understand that many budgets are probably defined in spreadsheets, which is where the thinking comes from, but the domain terminology (ubiquitous language) would probably not be built around a table/sheet and rows.
Chances are that the domain expert refers to budgeting items and probably states that an item is defined on a row and that column A is the item name and B is the January budget, etc.
The domain should be defined along the lines of the actual business language, which would probably make a lot more sense and keep aggregates substantially smaller and more manageable. I've worked on some pretty horrendous budgeting spreadsheets in my career, so I can imagine that trying to store the history as a series of events would explode the storage quite quickly (even with snapshots). Not even a real spreadsheet stores the data as a series of events: there is an undo history in the current session, but that is about it.
I'm no financial guru but you probably have a budget item of sorts and then for each year you'll have a budget with the values for each month. One budget item is quite a bit simpler than everything in one large aggregate. The budget items can be grouped and categorised as a separate exercise.
It isn't inconceivable that I've misinterpreted your scenario but that would be my take on this.

When a ConcurrencyException occurs, what gets written?

We use RavenDB in our production environment. It stores millions of documents, and gets updated pretty much constantly during the day.
We have two boxes load-balanced using a round-robin strategy which replicate to one another.
Every week or so, we get a ConcurrencyException from Raven. I understand that this basically means that one of the servers was told to insert or update the same document twice within a short timeframe - it's kind of like a conflict exception, except it occurs on a single server rather than between two replicating servers.
What happens when this error occurs? Can I assume that at least one of the writes succeeded? Can I predict which one? Is there anything I can do to make these exceptions less likely?
ConcurrencyException means that on a single server, you have two writes to the same document at the same instant.
That leads to:
One write is accepted.
One write is rejected (with concurrency exception).
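If you want to reduce the impact rather than the frequency, one common mitigation is to treat the exception as a signal to reload and retry. A rough sketch follows; load_document, store_document and ConcurrencyError are hypothetical stand-ins for whatever your Raven client session exposes, not real API names:

```python
import time

def update_with_retry(doc_id, apply_change, max_attempts=3):
    """Optimistic-concurrency retry loop: reload the document, reapply the change,
    and try again if someone else saved a newer version first."""
    for attempt in range(max_attempts):
        doc = load_document(doc_id)   # hypothetical: fresh copy with its current etag/version
        apply_change(doc)             # re-apply the business change to the fresh copy
        try:
            store_document(doc)       # hypothetical: fails if the etag no longer matches
            return doc
        except ConcurrencyError:      # hypothetical stand-in for ConcurrencyException
            if attempt == max_attempts - 1:
                raise
            time.sleep(0.05 * (attempt + 1))  # small backoff before retrying
```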

Should I be concerned with bit flips on Amazon S3?

I've got some data that I want to save on Amazon S3. Some of this data is encrypted and some is compressed. Should I be worried about single bit flips? I know of the MD5 hash header that can be added. This (from my experience) will prevent flips in the most unreliable part of the deal (network communication), but I'm still wondering whether I need to guard against flips on disk.
I'm almost certain the answer is "no", but if you want to be extra paranoid you can precalculate the MD5 hash before uploading, compare that to the MD5 hash you get after upload, then when downloading calculate the MD5 hash of the downloaded data and compare it to your stored hash.
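A sketch of that end-to-end check with boto3 might look like the following; bucket and key names are made up, and note that the ETag-equals-MD5 shortcut only holds for simple (non-multipart, non-KMS-encrypted) PUTs:

```python
import base64
import hashlib
import boto3

s3 = boto3.client("s3")
bucket, key = "my-bucket", "backups/data.bin"   # hypothetical names

with open("data.bin", "rb") as f:
    payload = f.read()
digest = hashlib.md5(payload).digest()
local_md5 = digest.hex()

# 1. Upload with Content-MD5: S3 rejects the PUT if what it received doesn't match.
resp = s3.put_object(
    Bucket=bucket, Key=key, Body=payload,
    ContentMD5=base64.b64encode(digest).decode(),
)

# 2. For simple PUTs the returned ETag is the hex MD5 of the stored object.
assert resp["ETag"].strip('"') == local_md5

# 3. On download, recompute the hash and compare it to the one you stored locally.
body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
assert hashlib.md5(body).hexdigest() == local_md5
```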
I'm not sure exactly what risk you're concerned about. At some point you have to defer the risk to somebody else. Does "corrupted data" fall under Amazon's Service Level Agreement? Presumably they know what the file hash is supposed to be, and if the hash of the data they're giving you doesn't match, then it's clearly their problem.
I suppose there are other approaches too:
Store your data with an FEC so that you can detect and correct N bit errors up to your choice of N.
Store your data more than once in Amazon S3, perhaps across their US and European data centers (I think there's a new one in Singapore coming online soon too), with RAID-like redundancy so you can recover your data if some number of sources disappear or become corrupted.
It really depends on just how valuable the data you're storing is to you, and how much risk you're willing to accept.
I see your question from two points of view, a theoretical and practical.
From a theoretical point of view, yes, you should be concerned - and not only about bit flipping, but about several other possible problems. In particular section 11.5 of the customer agreements says that Amazon
MAKE NO REPRESENTATIONS OR WARRANTIES OF ANY KIND, WHETHER EXPRESS, IMPLIED, STATUTORY OR OTHERWISE WITH RESPECT TO THE SERVICE OFFERINGS. [...] WE AND OUR LICENSORS DO NOT WARRANT THAT THE SERVICE OFFERINGS WILL FUNCTION AS DESCRIBED, WILL BE UNINTERRUPTED OR ERROR FREE, OR FREE OF HARMFUL COMPONENTS, OR THAT THE DATA YOU STORE WITHIN THE SERVICE OFFERINGS WILL BE SECURE OR NOT OTHERWISE LOST OR DAMAGED.
Now, in practice, I'd not be concerned. If your data were lost, you'd blog about it, and (although they might not face any legal action) their business would be pretty much over.
On the other hand, it depends on how vital your data is. Suppose you were rolling your own stuff in your own data center(s). How would you plan for disaster recovery there? If you say "I'd just keep two copies in two different racks", then use the same technique with Amazon, maybe keeping two copies in two different datacenters (since you wrote that you are not interested in how to protect against bit flips, I'm providing only a trivial example here).
Probably not: Amazon uses checksums to protect against bit flips, regularly combing through data at rest to ensure that no bit flips have occurred. So, unless the data is corrupted in every copy within the interval between integrity-check passes, you should be fine.
Internally, S3 uses MD5 checksums throughout the system to detect/protect against bitflips. When you PUT an object into S3, we compute the MD5 and store that value. When you GET an object we recompute the MD5 as we stream it back. If our stored MD5 doesn't match the value we compute as we're streaming the object back we'll return an error for the GET request. You can then retry the request.
We also continually loop through all data at rest, recomputing checksums and validating them against the MD5 we saved when we originally stored the object. This allows us to detect and repair bit flips that occur in data at rest. When we find a bit flip in data at rest, we repair it using the redundant data we store for each object.
You can also protect yourself against bit flips during transmission to and from S3 by providing an MD5 checksum when you PUT the object (we'll return an error if the data we received doesn't match the checksum) and by validating the MD5 when you GET an object.
Source:
https://forums.aws.amazon.com/thread.jspa?threadID=38587
There are two ways of reading your question:
"Is Amazon S3 perfect?"
"How do I handle the case where Amazon S3 is not perfect?"
The answer to (1) is almost certainly "no". They might have lots of protection to get close, but there is still the possibility of failure.
That leaves (2). The fact is that devices fail, sometimes in obvious ways and other times in ways that appear to work but give an incorrect answer. To deal with this, many databases use a per-page CRC to ensure that a page read from disk is the same as the one that was written. This approach is also used in modern filesystems (for example ZFS, which can write multiple copies of a page, each with a CRC, to handle RAID controller failures; I have seen ZFS correct single-bit errors from a disk by reading a second copy, because disks are not perfect).
In general you should have a check to verify that your system is operating as you expect. Using a hash function is a good approach. What you do when you detect a failure depends on your requirements. Storing multiple copies is probably the best approach (and certainly the easiest), because you get protection from site failures, connectivity failures and even vendor failures (by choosing a second vendor), rather than just redundancy within the data itself, as with FEC.