Suppose we have master-slave replication for a database system. The master node goes down while holding some writes that have not yet been replicated to any slave. The system then promotes one of the followers to leader, which means that when the old leader comes back, its unreplicated writes are discarded.
Suppose the system uses an auto-increment strategy for key generation. As a result, the new leader may generate keys that the old leader had already handed out while it was up and running.
This can cause a data consistency issue, particularly if the data is consumed by other systems, such as a cache. The cache may still hold a key written by the old leader, while under the new leader the same key now points to different data.
How is such a problem handled?
I have an Amazon DynamoDB table which is used for both read and write operations. Write operations are performed only when a batch job runs at certain intervals, whereas read operations happen continuously throughout the day.
I am facing a problem of increased read latency when a significant amount of write activity is happening due to the batch jobs. I explored having a separate read replica for DynamoDB, but did not find anything of much use. Global tables are not an option because that's not what they are for.
Any ideas how to solve this?
Going by the Dynamo paper, the concept of a read replica for a record or a table does not exist in Dynamo. Within the same region you will have multiple copies of a record, where N is the replication factor and the read and write quorums are chosen so that R + W > N. However, when the client reads, one of those copies is returned depending on the cluster health.
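As a rough illustration of that quorum condition (the values below are illustrative, not anything the Dynamo/DynamoDB API exposes):

```python
# Minimal sketch of the Dynamo-style quorum condition R + W > N.
# N = replication factor, R = read quorum size, W = write quorum size.
def quorum_overlaps(n: int, r: int, w: int) -> bool:
    """True when every read quorum must intersect every write quorum."""
    return r + w > n

print(quorum_overlaps(3, 2, 2))  # True  -> a read always sees the latest acknowledged write
print(quorum_overlaps(3, 1, 1))  # False -> a read may land on a stale replica
```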
Depending on how the coordinator node is chosen, either at the client library or at the cluster, the client can only ask for a record (get) or send a record (put) to either the cluster coordinator (one extra hop) or to the node assigned to the record (a single hop to the record). There is just no way for the client to say 'give me a read replica from another node'. The replicas are there for fault tolerance; if one of the nodes containing the master copy of the record dies, the replicas will be used.
I am researching the same problem in the context of hot keys. Every record gets assigned to a node in Dynamo, so a million reads on the same record will lead to hot keys, loss of reads/writes, etc. How do you deal with this? A read replica would work great, because I could then manage the hot keys at the application level and move all the extra reads to the read replica(s). This is again fraught with issues.
I am doing a POC around creating a cluster from a snapshot, but I am uncertain about the time it takes to restore from an existing snapshot. Sometimes it takes around 10 minutes, but sometimes it takes as long as 30 minutes.
Is there any breakdown of snapshot size vs. restore time available?
What operations does Redshift perform in the background during the restore process?
A Redshift restore from snapshot does not require a full repopulation of data before the cluster is available. Cluster availability is based on having the hardware, OS, and application up, along with populating the leader node (the blocklist, mostly). Once these are in place the cluster can take queries; if the table data has not yet been loaded into the cluster from the snapshot, the restore of the data blocks that are needed is prioritized, and the query runs slowly until those blocks are populated. Since most queries touch a minority of "hot" blocks, query speed for most of them returns to normal fairly quickly.
I know this just complicates the analysis you are performing, but this is how restore works. I expect you are seeing variability based on many factors, and a small one of these is the size of the blocklist table on the leader node. How does the time for creating an empty cluster compare? How variable is that?
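If it helps the POC, here is a rough sketch (the cluster and snapshot identifiers are placeholders) that starts a restore with boto3 and polls RestoreStatus. Note that the cluster usually reports ClusterStatus "available" well before RestoreStatus reaches "completed", which matches the behaviour described above:

```python
# Rough timing sketch for a restore POC: start the restore, then poll
# RestoreStatus until Redshift reports it completed. Identifiers are placeholders.
import time
import boto3

redshift = boto3.client("redshift")

redshift.restore_from_cluster_snapshot(
    ClusterIdentifier="poc-restore-cluster",    # hypothetical cluster name
    SnapshotIdentifier="my-existing-snapshot",  # hypothetical snapshot name
)

start = time.time()
while True:
    cluster = redshift.describe_clusters(
        ClusterIdentifier="poc-restore-cluster"
    )["Clusters"][0]
    restore = cluster.get("RestoreStatus", {})
    print(
        cluster["ClusterStatus"],
        restore.get("Status"),
        restore.get("ProgressInMegaBytes"),
        "/",
        restore.get("SnapshotSizeInMegaBytes"),
    )
    if restore.get("Status") == "completed":
        break
    time.sleep(60)

print(f"Blocks fully restored after {time.time() - start:.0f} seconds")
```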
Let's say that I have several AWS Lambda functions that make up my API. One of the functions reads a specific value from a specific key on a single Redis node. The business logic goes as follows:
if the key exists:
    serve the value of that key to the client
if the key does not exist:
    get the most recent item from DynamoDB
    insert that item as the value for that key, and set an expiration time
    delete that item from DynamoDB, so that it only gets read into memory once
    serve the value of that key to the client
The idea is that every time a client makes a request, they get the value they need. If the key has expired, then the Lambda needs to first get the item from the database and put it back into Redis.
But what happens if two clients make an API call simultaneously? Will both Lambda invocations see that there is no key, and will both take an item from the database?
My goal is to implement a queue where a certain item lives in memory for only X amount of time, and as soon as that item expires, the next item should be pulled from the database, and when it is pulled, it should also be deleted so that it won't be pulled again.
I'm trying to see if there's a way to do this without having a separate EC2 process that's just keeping track of timing.
Is redis+lambda+dynamoDB a good setup for what I'm trying to accomplish, or are there better ways?
A Redis server will execute commands (or transactions, or scripts) atomically. But a sequence of operations involving separate services (e.g. Redis and DynamoDB) will not be atomic.
One approach is to make them atomic by adding some kind of lock around your business logic. This can be done with Redis, for example.
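A minimal sketch of such a lock with redis-py, using SET with the NX and EX options (the key name, timeout, and retry strategy are made up here):

```python
# Sketch of a Redis-based lock wrapped around the read-through logic.
# Key names and timeouts are illustrative, not from the original post.
import contextlib
import time
import uuid

import redis

r = redis.Redis()

@contextlib.contextmanager
def redis_lock(lock_key, timeout=10):
    token = str(uuid.uuid4())
    # Spin until SET ... NX EX succeeds; real code would back off or give up.
    while not r.set(lock_key, token, nx=True, ex=timeout):
        time.sleep(0.05)
    try:
        yield
    finally:
        # Best-effort release: only delete the lock if we still own it.
        if r.get(lock_key) == token.encode():
            r.delete(lock_key)

# Usage: wrap the whole "check Redis, read DynamoDB, write back" sequence.
# with redis_lock("queue-item-lock"):
#     ...the business logic from the question...
```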
However, that's a costly and rather cumbersome solution, so if possible it's better to simply design your business logic to be resilient in the face of concurrent operations. To do that you have to look at the steps and imagine what can happen if multiple clients are running at the same time.
In your case, the flaw I can see is that two values can be read and deleted from DynamoDB, one writing over the other in Redis. That can be avoided by using Redis's SETNX (SET if Not eXists) command. Something like this (a Python sketch follows the list below):
1. GET the key from Redis
2. If the value exists:
    - Serve the value to the client
3. If the value does not exist:
    - Get the most recent item from DynamoDB
    - Insert that item into Redis with SETNX
    - If the key already exists, go back to step 1
    - Set an expiration time with EXPIRE
    - Delete that item from DynamoDB
    - Serve the value to the client
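Here is a rough Python sketch of that flow using redis-py and boto3. The table name, key names, and the way the "most recent item" is looked up are all assumptions made for illustration; note that redis-py's set(..., nx=True, ex=...) performs the SETNX and EXPIRE steps as a single atomic command:

```python
# Sketch of the read-through flow above. Table name, key names, and the
# "most recent item" query are hypothetical placeholders.
import boto3
import redis

r = redis.Redis()
table = boto3.resource("dynamodb").Table("queue-items")  # hypothetical table

CACHE_KEY = "current-item"
TTL_SECONDS = 60

def fetch_most_recent_item():
    # Hypothetical helper: how "most recent" is defined depends on your schema.
    resp = table.scan(Limit=1)  # placeholder query
    return resp["Items"][0]

def get_current_item():
    while True:
        value = r.get(CACHE_KEY)
        if value is not None:
            return value  # steps 1-2: cache hit, serve it

        item = fetch_most_recent_item()  # step 3: read from DynamoDB
        # SET with NX only succeeds if no other Lambda won the race first;
        # EX sets the expiration at the same time.
        if r.set(CACHE_KEY, item["payload"], nx=True, ex=TTL_SECONDS):
            # We won the race, so it is safe to delete the item from DynamoDB.
            table.delete_item(Key={"id": item["id"]})
            return item["payload"]
        # Someone else set the key between our GET and SET: go back to step 1.
```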
According to the Amazon Kinesis Streams documentation, a record can be delivered multiple times.
The only way to be sure of processing every record just once is to temporarily store the records in a database that supports integrity checks (e.g. DynamoDB, ElastiCache or MySQL/PostgreSQL), or just checkpoint the RecordId for each Kinesis shard.
Do you know a better / more efficient way of handling duplicates?
We had exactly that problem when building a telemetry system for a mobile app. In our case we were also unsure whether producers were sending each message exactly once, so for each received record we calculated its MD5 on the fly and checked whether it was already present in some form of persistent storage; but indeed, which storage to use is the trickiest bit.
First, we tried a plain relational database, but it quickly became a major bottleneck of the whole system, as this is not just a read-heavy but also a write-heavy case, since the volume of data going through Kinesis was quite significant.
We ended up with a DynamoDB table storing the MD5 of each unique message. The issue we had was that it wasn't so easy to delete the messages: even though our table had partition and sort keys, DynamoDB does not allow dropping all records with a given partition key, so we had to query all of them to get their sort key values (which wastes time and capacity). Unfortunately, we had to simply drop the whole table once in a while. Another, similarly suboptimal, solution is to regularly rotate the DynamoDB tables which store the message identifiers.
However, DynamoDB recently introduced a very handy feature, Time To Live, which means we can now control the size of the table by enabling auto-expiry on a per-record basis. In that sense DynamoDB seems quite similar to ElastiCache; however, ElastiCache (at least a Memcached cluster) is much less durable: there is no redundancy there, and all data residing on terminated nodes is lost during a scale-in operation or on failure.
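For illustration, a minimal sketch of that dedup check with boto3 (the table and attribute names are made up): a conditional put makes the "have we seen this MD5 before?" test atomic, and a TTL attribute lets old entries expire on their own:

```python
# Sketch of the DynamoDB-based dedup described above. Table and attribute
# names are hypothetical; the table's TTL feature is assumed to be enabled
# on the "expires_at" attribute.
import hashlib
import time

import boto3
from botocore.exceptions import ClientError

dedup_table = boto3.resource("dynamodb").Table("processed-records")  # hypothetical
TTL_SECONDS = 24 * 3600

def seen_before(record_data: bytes) -> bool:
    digest = hashlib.md5(record_data).hexdigest()
    try:
        dedup_table.put_item(
            Item={
                "md5": digest,                                 # partition key
                "expires_at": int(time.time()) + TTL_SECONDS,  # TTL attribute
            },
            ConditionExpression="attribute_not_exists(md5)",
        )
        return False  # first delivery of this record
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return True  # duplicate delivery, skip processing
        raise
```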
The thing you mentioned is a general problem of all queue systems with an "at least once" approach. And it is not just the queue systems; both producers and consumers may process the same message multiple times (due to ReadTimeout errors, etc.). Kinesis and Kafka both use that paradigm. Unfortunately, there is no easy answer for it.
You may also try to use an "exactly-once" message queue with a stricter transactional approach. For example, AWS SQS does that: https://aws.amazon.com/about-aws/whats-new/2016/11/amazon-sqs-introduces-fifo-queues-with-exactly-once-processing-and-lower-prices-for-standard-queues/ . Be aware that SQS throughput is far smaller than that of Kinesis.
To solve your problem, you should be aware of your application domain and try to solve it internally as you suggested (database checks). Especially when you communicate with an external service (say, an email server), you should be able to recover the operation state in order to prevent double processing (double sending, in the email server example, may result in multiple copies of the same message in the recipient's mailbox).
See also the following concepts:
At-least-once Delivery: http://www.cloudcomputingpatterns.org/at_least_once_delivery/
Exactly-once Delivery: http://www.cloudcomputingpatterns.org/exactly_once_delivery/
Idempotent Processor: http://www.cloudcomputingpatterns.org/idempotent_processor/
I am using RDS's read replica mechanism for a schema update to a very large MySQL table.
I ran an ALTER command which locks the table for a long period of time (more than 24 hours).
In that period of time my read replica was not getting updated and I noticed the Replica lag value was slowly increasing.
When the table update was complete I saw that the Replica lag was slowly decreasing until the read replica finally caught up with the original DB.
While my ALTER command was running, I did a small experiment and occasionally updated a specific row so I could follow it on my read replica. My experiment showed that the updates to this specific row did indeed eventually happen on the read replica as well (after the table was unlocked).
Based on the above experiment, I assume all updates that were blocked while my read replica was applying the change were eventually also performed on my replicated DB after the table modification, but it would be hard to prove something like that for such a big table and such a long period of time.
I couldn't find any official documentation on how this mechanism works, and I was wondering where exactly all these updates are buffered and what the limit of this buffer is (e.g. when will I start losing changes that occurred on my master DB)?
This is covered in the documentation. Specifically, the replica ("slave") server's relay log is the place where the changes usually wait until they are actually executed on the replica.
http://dev.mysql.com/doc/refman/5.6/en/slave-logs.html
But the limit to how far behind a replica can fall -- while still, eventually, ending up with data identical to the master's -- is a combination of factors. It should not ever quietly "misplace" any of the buffered changes, as long as it is being monitored.
Each time the data on the master database changes, the master writes a replication event to its binary log, and these logs are delivered to the replica, usually in near-real-time, where they are stored, pretty much as-sent, in the relay logs, as the first step in a 2-step process.
The second step is for the replica to read through those logs, sequentially, and modify its local data set, according to what the master server sent. The statements are typically executed sequentially.
The two biggest factors that determine how far behind a replica can safely become are the amount of storage available for relay logs on the replica and the amount of storage plus log retention time on the master. RDS has additional logic on top of "stock" MySQL Server to prevent the master from purging its copy of the log until the replica(s) have received them.
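If you want to watch the buffering happen, one rough way (sketched below with PyMySQL; the connection details are placeholders) is to poll SHOW SLAVE STATUS on the replica and watch Seconds_Behind_Master and Relay_Log_Space grow while the ALTER holds replication back, then shrink again once the replica catches up:

```python
# Rough monitoring sketch: poll the replica and print how far behind it is
# and how much relay log space it has accumulated. Endpoint and credentials
# are placeholders.
import time
import pymysql

conn = pymysql.connect(
    host="my-replica.xxxxxxxx.us-east-1.rds.amazonaws.com",  # placeholder endpoint
    user="monitor",
    password="placeholder",
    cursorclass=pymysql.cursors.DictCursor,
)

while True:
    with conn.cursor() as cur:
        cur.execute("SHOW SLAVE STATUS")
        status = cur.fetchone()
    print(
        "Seconds_Behind_Master:", status["Seconds_Behind_Master"],
        "Relay_Log_Space:", status["Relay_Log_Space"],
    )
    time.sleep(60)
```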