RDS - max replica lag when using read replica - amazon-web-services

I am using RDS's read replica mechanism for a schema update to a very large Mysql table.
I ran an Alter command which locks the table for a long period of time (more than 24 hours).
In that period of time my read replica was not getting updated and I noticed the Replica lag value was slowly increasing.
When the table update was complete I saw that the Replica lag was slowly decreasing until the read replica finally caught up with the original DB.
While my Alter command was running, I did a small experiment and occasionally updated a specific row so I can follow it on my read replica. My experiment showed that the updates to this specific row indeed eventually happened also in the read replica (after the table was unlocked).
Based on the above experiment result I assume all updates that were blocked while my read replica was updating eventually were also performed on my replicated DB after the table modification but it would be hard to prove something like that for such a big table and such a long period of time.
I couldn't find any official documentation on how this mechanism works and I was wondering where exactly all these updates are buffered and what would be the limit of this buffer (e.g. when will I start loosing changes that occured on my master DB)?

This is covered in the documentation. Specifically, the replica ("slave") server's relay log is the place where the changes usually wait until they are actually executes on the replica.
http://dev.mysql.com/doc/refman/5.6/en/slave-logs.html
But, the limit to how far behind a replica can be -- but still, eventually, have data identical to the master -- is a combination of factors. It should not ever quietly "misplace" any of the buffered changes, as long as it's being monitored.
Each time the data on the master database changes, the master writes a replication event to its binary log, and these logs are delivered to the replica, usually in near-real-time, where they are stored, pretty much as-sent, in the relay logs, as the first step in a 2-step process.
The second step is for the replica to read through those logs, sequentially, and modify its local data set, according to what the master server sent. The statements are typically executed sequentially.
The two biggest factors that determine how far behind a replica can safely become are the amount of storage available for relay logs on the replica and the amount of storage plus log retention time on the master. RDS has additional logic on top of "stock" MySQL Server to prevent the master from purging its copy of the log until the replica(s) have received them.

Related

Is it possible to write to DynamoDB only when spare capacity is available?

I am working on an application which receives very predictable, heavy traffic during working hours. Users typically interact with the app for about 40 minutes at a time. DynamoDB table A receives a steady stream of writes throughout user sessions and handles things without difficulty. We attempt to write a large amount of data to table B at the end of each session, however, and early in the day this can result in throttling. Our tables are billed on-demand (no, this is not something I am able to change), but the sudden spike in writes still causes throttling, which is expected.
The data being written to table A is both critical and time sensitive. The data going to table B is critical and must not be lost, but delays in data availability from table B on the order of a few hours is acceptable, but not ideal. So I'm looking for a way to say "please write this to the table ASAP, but only as long as it won't cause throttling". Provisioning for the expected capacity is not an option (don't ask). An SQS queue with a long message delay doesn't really fit the bill because (a) 15 minutes may not be long enough and (b) it doesn't meet the "ASAP" part of the story. I've considered pre-warming the table, but that's just cludgy.
So... you take all the expected ways to handle this that were designed and provided by AWS then say you can't use them. That... doesn't leave you much options.
You're pretty much left with designing some custom architecture. Throttling, provisioning, burst provisioning, on demand, and all are all part of the package for handling these kinds of bursts. If you can't use them, then you'll have to do something like write the entry as a json to an s3 bucket and have some cron event pick them up in an hour or something one a time and batch write them to the table.
You may want to take a look at how your table is arranged. If you are having to make a lot of writes all at once (ie, because you have to duplicate data through multiple PK/SK combinations in order to be able to recall it with a single query) then an RDS may be better suited for the task at hand. Dynamo is more for quick and snappy queries and not really for extended data logging or storage.
Here's the secret to DDB on-demand...
From the page you linked to
For new on-demand tables, you can immediately drive up to 4,000 write
request units or 12,000 read request units, or any linear combination
of the two. For an existing table that you switched to on-demand
capacity mode, the previous peak is half the previous provisioned
throughput for the tableā€”or the settings for a newly created table
with on-demand capacity mode, whichever is higher. For more
information, see Initial throughput for on-demand capacity mode.
And the Inital throughput for on-demand capacity mode page says:
Initial Throughput for On-Demand Capacity Mode If you recently
switched an existing table to on-demand capacity mode for the first
time, or if you created a new table with on-demand capacity mode
enabled, the table has the following previous peak settings, even
though the table has not served traffic previously using on-demand
capacity mode:
Newly created table with on-demand capacity mode: The previous peak is
2,000 write request units or 6,000 read request units. You can drive
up to double the previous peak immediately, which enables newly
created on-demand tables to serve up to 4,000 write request units or
12,000 read request units, or any linear combination of the two.
Existing table switched to on-demand capacity mode: The previous peak
is half the maximum write capacity units and read capacity units
provisioned since the table was created, or the settings for a newly
created table with on-demand capacity mode, whichever is higher. In
other words, your table will deliver at least as much throughput as it
did prior to switching to on-demand capacity mode.
The key thing to realize is that DDB on-demand "peaks" are never lowered..
So if you have a table that at some point peaked at 20K WCU, you can scale cleanly from 1-20K without throttling.
In other words, you shouldn't continue to see throttling in an app unless you hit a new peak.
You can also artificially set the peak by changing the table to provisioned at double the expected peak. Then when you convert it back to on-demand, you'll have a "peak" set for half the provisioned capacity.

Amazon DynamoDB read latency while writing

I have an Amazon DynamoDB table which is used for both read and write operations. Write operations are performed only when the batch job runs at certain intervals whereas Read operations are happening consistently throughout the day.
I am facing a problem of increased Read latency when there is significant amount of write operations are happening due to the batch jobs. I explored a little bit about having a separate read replica for DynamoDB but nothing much of use. Global tables are not an option because that's not what they are for.
Any ideas how to solve this?
Going by the Dynamo paper, the concept of a read-replica for a record or a table does not exist in Dynamo. Within the same region, you will have multiple copies of a record depending on the replication factor (R+W > N) where N is the replication factor. However when the client reads, one of those records are returned depending on the cluster health.
Depending on how the co-ordinator node is chosen either at the client library or at the cluster, the client can only ask for a record (get) or send a record(put) to either the cluster co-ordinator ( 1 extra hop ) or to the node assigned to the record (single hop to record). There is just no way for the client to say 'give me a read replica from another node'. The replicas are there for fault-tolerance, if one of the nodes containing the master copy of the record dies, replicas will be used.
I am researching the same problem in the context of hot keys. Every record gets assigned to a node in Dynamo. So a million reads on the same record will lead to hot keys, loss of reads/writes etc. How to deal with this ? A read-replica will work great because I can now manage the hot keys at the application and move all extra reads to read-replica(s). This is again fraught with issues.

How does the snapshot size affects restore process in Amazon Redshift?

I am doing some POC around creating a cluster from a snapshot. But I am uncertain about the time it takes to restore from an existing snapshot. Sometimes it takes around 10 mins but sometimes it also takes as long as 30 min.
Is there any data(size of snapshot) vs time breakup is available?
What operations does redshift perform in the background during the restore process?
Redshift restore from snapshot does not require a full repopulate of data before the cluster is available. Cluster availability is based on having the hardware, OS, and application up alone with populating the leader node (blocklist mostly). Once these are in place the cluster can take queries and if the table data is not yet loaded into the cluster from the snapshot the restore of the data blocks needed will be prioritized and the query will run slow until these blocks are populated. Since most queries are based on a minority of "hot" blocks the query speed for most will be as fast as usual fairly quickly.
I know this just complicates the analysis you are performing but this is how restore works. I expect you are seeing variability based on many factors and a small one of these is the size of the blocklist table on the leader node. How does the time for creating an empty cluster compare? How variable is this?

Concurrent Queries, COPY and Connections in AWS Redshift

I am trying to understand the difference between concurrent connections and concurrent queries in Redshift. As per documents, We can make 500 concurrent connections to a Redshift cluster but it says maximum 15 queries can be run at the same time in a cluster. Now what is the exact value?
How many queries can be in running state in a cluster at the same time ? If it is 15, does it include RETURNING state queries as well ?
How many concurrent COPY statement can run in a cluster ?
We are evaluating Redshift as our primary reporting data store. If we cannot run a large number of queries simultaneously it may be difficult for us to go with this model.
I think, you have misread somewhere, Max concurrent queries are 50 per WLM. Refer below thread for Amazon support response for more detail.
How many queries can be in running state in a cluster at the same time ? If it is 15, does it include RETURNING state queries as well ?
At a time, Max 50 queries could be running concurrently. Yes it does include INSERT/UPDATE/DELETE etc all.
How many concurrent COPY statement can run in a cluster ?
Ideally, you could Max go up to 50 concurrently, but Copy works bit differently.
Amazon Redshift automatically loads in parallel from multiple data files.
If you use multiple concurrent COPY commands to load one table from multiple files, Amazon Redshift is forced to perform a serialized load, which is much slower and requires a VACUUM at the end if the table has a sort column defined. For more information about using COPY to load data in parallel, see Loading Data from Amazon S3.
Meaning, you could run concurrent Copy commands but make sure one copy command at a time per table.
So practically, it doesn't depend on Nodes on cluster, but Number of tables as well.
So if you have only 1 table, you would like to execute 50 insert concurrently, it will result only 1 Copy concurrently.

Update DynamoDB table for Provisioned Throughput

In the API of DynamoDB there is a way to increase/decrease table Provisioned Throughput but there is some Active mode that needs to be updated, what if there is two scripts that running on the same table at the same time and one of them read and the other update the table, What's going to happened with the one that reading? Is it going to failed?
I think maybe before every reading I can check if the table is on Active mode and if not just wait until it does but each time that I'm Query/Scan the database I need to make this check. Maybe it's not necessary.
Is anyone know about this?
It's not necessary, you can still read from the table when it's been updated.
EDIT:
from http://aws.amazon.com/dynamodb/faqs/
Q: Does Amazon DynamoDB remain available when I ask it to scale up or down by changing the provisioned throughput?
Yes. Amazon DynamoDB is designed to scale its provisioned throughput up or down while still remaining available.
DynamoDB reads are "eventually consistent", so the query/scan may not see the updated rows but the request will not fail. You can request consistent reads if you need them though (though they consume slightly more Read Capacity).
See the docs for more information.