I ran a query which resulted in the below stats.
Elapsed time: 12.1 sec
Slot time consumed: 14 hr 12 min
total_slot_ms: 51147110 (which is 14 hr 12 min)
We are on an on-demand pricing plan, so the maximum should be 2000 slots. That being said, if I had used 2000 slots for the whole 12.1-second span, I should end up with a total_slot_ms of 24200000 (which is 2000 x 12.1 x 1000). However, the total_slot_ms is 51147110. The average number of slots used is 51147110 / 12100 ≈ 4227 (which is way above 2000). Can someone explain to me how I ended up using more than 2000 slots?
In one of Google's courses, there is an example where a query shows 13 seconds of "elapsed time" and 50 minutes of "slot time consumed". They say:
Hey, across all of our workers, we did essentially 50 minutes of work massively in parallel, 50 minutes so that your query could be returned back in 13 seconds. Best of all for you, you don't need to worry about spinning up those workers, moving data in-between them, making sure they're sharing all their results between their aggregations. All you care about is writing the SQL, finding the insights, and then running that query in a very fast turnaround. But there is abstracted from you a lot of distributed parallel processing that's happening.
Increasing BigQuery slot capacity significantly improves overall query performance. Although the number of slots is subject to quota restrictions under the BigQuery on-demand pricing plan, bursting beyond the slot limit does not incur additional costs:
BigQuery slots are shared among all queries in a single project.
BigQuery might burst beyond this limit to accelerate your queries.
To check how many slots you're using, see Monitoring BigQuery using
Cloud Monitoring.
BigQuery on-demand supports limited bursting. https://cloud.google.com/bigquery/docs/release-notes#December_10_2019
You might want to check the execution plan for the query and look at the slot time spent on wait, read, compute, and write activities at each stage. Since these are on-demand slots, you may see a lot of wait time, which adds up into the total.
Besides bursting, the per-stage breakdown in the explain plan will help you understand that total slot time is not necessarily actual slot consumption but equivalent slot consumption.
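With the stats above, the average works out to total_slot_ms / elapsed_ms = 51147110 / 12100 ≈ 4227 slots, i.e. the query genuinely bursted past the 2000-slot on-demand baseline. A minimal sketch of pulling those figures programmatically, assuming a reasonably recent google-cloud-bigquery Python client (the job ID and location are placeholders):

```python
from google.cloud import bigquery

client = bigquery.Client()
job = client.get_job("your-job-id", location="US")  # placeholder job ID/location

# elapsed wall-clock time in milliseconds, from the job's start/end timestamps
elapsed_ms = (job.ended - job.started).total_seconds() * 1000
print(f"total_slot_ms = {job.slot_millis}")
print(f"elapsed_ms    = {elapsed_ms:.0f}")
print(f"average slots ~ {job.slot_millis / elapsed_ms:.0f}")
```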
Related
We have created an AWS Lambda function in Python and added code to measure the execution time, because we noticed that the runtime varies significantly based on how long it has been since the last Lambda call, and provisioned concurrency seems to have little impact. For all the runs in the table below, we were running the same Lambda with the same inputs, and the program is deterministic, so everything related to our code is held constant. Below is a table of what we found. Interval means how long we waited before making another call to the Lambda; for example, 2 hours means that if we called the Lambda at 1:00 PM, we called it again at 3:00 PM and 5:00 PM, etc. (but not in between). # Invocations is the total number of times the Lambda was called in succession for a given interval.
Provisioned Concurrency | Interval   | # Invocations | Avg Duration | p90 Duration
off                     | 2 hours    | 42            | 17.9 sec     | 32.0 sec
off                     | 10 minutes | 304           | 14.6 sec     | 21.2 sec
off                     | 1 minute   | 364           | 4.5 sec      | 6.3 sec
on                      | 2 hours    | 50            | 18.1 sec     | 31.2 sec
on                      | 10 minutes | 49            | 10.2 sec     | 29.6 sec
on                      | 1 minute   | 404           | 4.6 sec      | 6.3 sec
So, a few questions that seem odd that maybe someone else has experience or knows something about:
We thought there would be a "hot" versus "cold" state for the Lambdas, where provisioned concurrency would keep the Lambdas "hot" all the time. However, both the 10-minute and 2-hour intervals seem to be in a "warm" state (i.e. 10 minutes is faster than 2 hours, but not much faster). Then, at 1 minute, the Lambda appears to be "hot" and consistently fast. Any idea what is going on here?
Also, we thought that provisioned concurrency would essentially keep the Lambda in the "hot" state regardless of invocation interval; however, that is not the case, as can be seen with the 2-hour interval actually taking slightly longer. Having said that, the 10-minute interval is slightly shorter with provisioned concurrency on.
Any help/insight on this is greatly appreciated.
Extra information based on comments:
The Lambda is triggered via API Gateway from an HTTP POST request.
The runtime is a custom Docker image. We are doing this so we can use a particular version of TensorFlow.
The Lambda does use other services: S3 and DynamoDB. We added logs to time each section, and we have identified that the majority of the time is spent inside our self-contained section. Meaning, it doesn't appear that the S3 download/upload or the DynamoDB reads/writes are taking much time at all. Most of the time is spent in our CPU-intensive algorithm, which does image processing with TensorFlow. For those familiar with deep learning, the program loads a trained model and we run inference with that model; each test used the same image as input.
We’ve also tried adjusting the CPU/Memory of the lambda, and found it doesn’t seem to improve much beyond a certain point, and it does not fix the problem. The experiments from the table were gathered using 4GB Lambdas, and our logs report that less than 1GB of memory is ever used.
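For context, the per-section timing mentioned above was collected roughly like the sketch below (the section names and structure are illustrative, not our exact code):

```python
import time

def handler(event, context):
    timings = {}

    t0 = time.perf_counter()
    # ... download the input image / trained model from S3 ...
    timings["s3"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    # ... load the model and run TensorFlow inference (the CPU-heavy part) ...
    timings["inference"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    # ... DynamoDB reads/writes ...
    timings["dynamodb"] = time.perf_counter() - t0

    # emit the per-section durations to CloudWatch Logs
    print({"timings_sec": {k: round(v, 3) for k, v in timings.items()}})
    return {"statusCode": 200}
```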
Introduction
We are trying to "measure" the cost of a specific use case on one of our Aurora DBs that is not used very often (we use it for staging).
Yesterday at 18:18 UTC we issued some representative queries to it, and today we examined the resulting graphs via Amazon CloudWatch Insights.
Since we are being billed USD 0.22 per million read/write IOs, we need to know how many of those there were during our little experiment yesterday.
A complicating factor is that in the Cost Explorer it is not possible to group the final billed costs for read/write IOs per DB instance! Therefore, the only thing we can think of to estimate the cost is the read/write volume IO graphs in CloudWatch Insights.
So we went to CloudWatch Insights and selected the graphs for read/write IOs. Then we selected the period of time in which we did our experiment. Finally, we examined the graphs with different options: "Number" and "Lines".
Graph with "number"
This shows us the picture below, suggesting a total billable IO count of 266 + 510 = 776. Since we have chosen the "Sum" statistic, we assume this would indicate a cost of about USD 0.00017 in total.
Graph with "lines"
However, if we choose the "Lines" option, then we see another picture, with 5 points on the line: the first ones at around 500 (for read IOs) and the last one at approx. 750, suggesting a total of some 5000 read/write IOs.
Our question
We are not really sure which interpretation to go with, and the difference is significant.
So our question is: how much did our little experiment cost us and, equivalently, how should we interpret these graphs?
Edit:
Using 5-minute intervals (as suggested in the comments) we get (see below) a horizontal line with points at 255 (read IOs) for a whole hour around the time we did our experiment. But the experiment took less than 1 minute, at 19:18 (UTC).
Will the (read) billing be for 12 * 255 IOs, or just 255... (or something else altogether)?
Note: This question triggered another follow-up question created here: AWS CloudWatch insights graph — read volume IOs are up much longer than actual reading
From Aurora RDS documentation
VolumeReadIOPs
The number of billed read I/O operations from a cluster volume within
a 5-minute interval.
Billed read operations are calculated at the cluster volume level,
aggregated from all instances in the Aurora DB cluster, and then
reported at 5-minute intervals. The value is calculated by taking the
value of the Read operations metric over a 5-minute period. You can
determine the amount of billed read operations per second by taking
the value of the Billed read operations metric and dividing by 300
seconds. For example, if the Billed read operations returns 13,686,
then the billed read operations per second is 45 (13,686 / 300 =
45.62).
You accrue billed read operations for queries that request database
pages that aren't in the buffer cache and must be loaded from storage.
You might see spikes in billed read operations as query results are
read from storage and then loaded into the buffer cache.
Imagine AWS reports these data points every 5 minutes:
[100,150,200,70,140,10]
and that you used the "Sum" statistic over 15-minute periods, as in your screenshot.
Edit: First, the "number" visualization represents the whole selected duration, aggregated; here that would be the total (100+150+200+70+140+10) = 670.
The "line" visualization will represent all the aggregated groups. which would in this case be 2 points (100+150+200) and (70+140+10)
It can be a little hard to understand at first if you are not used to data points and aggregations. So I suggest that you set your "line" chart to Sum over 5-minute periods; take the value of each point and divide by 300 to get the per-second rate, as suggested by the docs, and sum all the points to get the total.
Added images for easier visualization
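If it helps to double-check the console, the same numbers can be pulled programmatically. Here is a sketch with boto3 (the cluster identifier, the time window, and the DbClusterIdentifier dimension are assumptions to adapt) that sums the 5-minute Sum datapoints for VolumeReadIOPs and prices them at USD 0.22 per million:

```python
from datetime import datetime, timezone
import boto3

cw = boto3.client("cloudwatch")
resp = cw.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="VolumeReadIOPs",
    Dimensions=[{"Name": "DbClusterIdentifier", "Value": "my-staging-cluster"}],
    StartTime=datetime(2021, 1, 1, 18, 0, tzinfo=timezone.utc),  # experiment window
    EndTime=datetime(2021, 1, 1, 20, 0, tzinfo=timezone.utc),
    Period=300,               # one datapoint per 5 minutes, matching the billing doc
    Statistics=["Sum"],
)
total = sum(dp["Sum"] for dp in resp["Datapoints"])
print(f"billed read IOs in window: {total:.0f}, cost ~ USD {total / 1e6 * 0.22:.6f}")
```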
Let's say I have:
A table with 100 RCUs
This table has 200 items
Each item is 4 KB
As far as I understand, RCUs are calculated per second and you spend 1 full RCU per 4 KB (with a strongly consistent read).
1) Because of this, if I spend more than 100 RCUs in one second I should get a throttling error, right?
2) How can I predict that a certain request will require more than my provisioned throughput? It feels scary that at any time I could compromise the whole database by making an expensive request.
3) Let's say I want to do a scan of the whole table (get all items), so that should require 200 RCUs. But that will depend on how fast DynamoDB does it, right? If it's too fast it will give me an error, but if it takes 2 seconds or more it should be fine. How do I account for this? How do I take DynamoDB's speed into account to know how many RCUs I will need? What is DynamoDB's "speed"?
4) What's the difference between throttling and throughput limit exceeded?
Most of your questions are theoretical at this point, because you now (as of Nov 2018) have the option of simply telling DynamoDB to use 'on demand' mode, where you no longer need to calculate or worry about RCUs. Simply enable this option and forget about it. I had similar problems in the past because of very uneven workloads (periods of no activity and then periods where I needed to do full table scans to generate a report) and struggled to get it all working seamlessly.
I turned on 'on demand' mode, cost went down by about 70% in my case, and there were no more throttling errors. Your cost profile may be different, but I would definitely check out this new option.
https://aws.amazon.com/blogs/aws/amazon-dynamodb-on-demand-no-capacity-planning-and-pay-per-request-pricing/
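For what it's worth, switching an existing table over is a single call; here is a minimal sketch with boto3 (the table name is a placeholder), and the same toggle is available in the console:

```python
import boto3

dynamodb = boto3.client("dynamodb")
dynamodb.update_table(
    TableName="my-table",           # placeholder table name
    BillingMode="PAY_PER_REQUEST",  # 'on demand' mode: no RCU/WCU provisioning
)
```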
I'm using Ganglia + RRDTool for monitoring a web farm. Many graphs are very clear, but when I look at the load_one metric, there is no Y-axis legend.
So, what does the Y-axis mean?
Thanks.
Load_one is the load average over one minute. It is the number of threads (kernel level) that are runnable and queued while waiting for CPU resources, averaged over one minute.
The number should be interpreted in relation to the number of hardware threads available on the machine and the time it takes to drain the run queue. The latter can be evaluated by looking at the five- and fifteen-minute load averages; as long as these stay reasonable, you should be OK.
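As a quick way to apply that rule of thumb on a host, a small sketch (Python, Unix-only, since os.getloadavg is not available everywhere):

```python
import os

# one-, five-, and fifteen-minute load averages: the same values Ganglia plots
load1, load5, load15 = os.getloadavg()
cpus = os.cpu_count()

print(f"load averages: {load1:.2f} / {load5:.2f} / {load15:.2f} "
      f"on {cpus} hardware threads")
if load1 > cpus:
    print("more runnable threads than hardware threads: work is queuing")
```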
Is it averaged per second? Per minute? Per hour?
For example, if I pay for 10 "read units", which allows for 10 highly consistent reads per second, will I be throttled if I try to do 20 reads in a single second, even if those were the only 20 reads that occurred in the last hour? The Amazon documentation and FAQ do not answer this critical question anywhere that I could find.
The only related response I could find in the FAQ completely ignores the issue of how usage is calculated and when throttling may happen:
Q: What happens if my application performs more reads or writes than
my provisioned capacity?
A: If your application performs more
reads/second or writes/second than your table’s provisioned throughput
capacity allows, requests above your provisioned capacity will be
throttled and you will receive 400 error codes. For instance, if you
had asked for 1,000 write capacity units and try to do 1,500
writes/second of 1 KB items, DynamoDB will only allow 1,000
writes/second to go through and you will receive error code 400 on
your extra requests. You should use CloudWatch to monitor your request
rate to ensure that you always have enough provisioned throughput to
achieve the request rate that you need.
It appears that they track writes in a five-minute window and will throttle you when your average over the last five minutes exceeds your provisioned throughput.
I did some testing. I created a test table with throughput of 1 write/second. If I don't write to it for a while and then send a stream of requests, Amazon seems to accept about 300 before it starts throttling.
The caveat, of course, is that this is not stated in any official Amazon documentation and could change at any time.
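For anyone who wants to repeat the experiment, a rough sketch of it (boto3; the table name and key schema are placeholders, and SDK retries are disabled so the first throttle surfaces as an error instead of being retried away):

```python
import boto3
from botocore.config import Config
from botocore.exceptions import ClientError

# max_attempts=1 in standard mode means one attempt, no automatic retries
ddb = boto3.resource(
    "dynamodb", config=Config(retries={"max_attempts": 1, "mode": "standard"})
)
table = ddb.Table("burst-test")  # placeholder: a table provisioned at 1 WCU

count = 0
try:
    while True:
        table.put_item(Item={"pk": str(count)})  # placeholder key schema
        count += 1
except ClientError as err:
    if err.response["Error"]["Code"] == "ProvisionedThroughputExceededException":
        print(f"throttled after {count} writes")
    else:
        raise
```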
DynamoDB provides 'Burst Capacity', which allows for spikes in the amount of data read from a table. You can read more about it under: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html#GuidelinesForTables.Bursting
Basically it's what @abjennings noticed: DynamoDB uses a 5-minute window to average the number of reads from a table.
If I pay for 10 "read units" which allows for 10 highly consistent
reads per second, will I be throttled if I try to do 20 reads in a
single second, even if it was the only 20 reads that occurred in the
last hour?
Yes. This is due to the very concept of Amazon DynamoDB, which is fast and predictable performance with seamless scalability. The quoted FAQ is actually addressing this correctly already (i.e. you have to take operations/second literally), though the calculation is better illustrated in Provisioned Throughput in Amazon DynamoDB:
A unit of Write Capacity enables you to perform one write per second
for items of up to 1KB in size. Similarly, a unit of Read Capacity
enables you to perform one strongly consistent read per second (or two
eventually consistent reads per second) of items of up to 1KB in size.
Larger items will require more capacity. You can calculate the number
of units of read and write capacity you need by estimating the number
of reads or writes you need to do per second and multiplying by the
size of your items (rounded up to the nearest KB).
Units of Capacity required for writes = Number of item writes per second x item size (rounded up to the nearest KB)

Units of Capacity required for reads* = Number of item reads per second x item size (rounded up to the nearest KB)

*If you use eventually consistent reads you'll get twice the throughput in terms of reads per second.
[emphasis mine]
Getting these calculations right for real-world use cases is potentially complex, though; please make sure to check further details such as the Provisioned Throughput Guidelines in Amazon DynamoDB accordingly.
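To make those formulas concrete, here is a small worked sketch (plain Python; the numbers are purely illustrative):

```python
import math

def capacity_units(ops_per_second, item_size_kb, eventually_consistent=False):
    # item size is rounded up to the nearest KB (1 KB units, per the quote above)
    units = ops_per_second * math.ceil(item_size_kb)
    # eventually consistent reads get twice the throughput per unit
    return units / 2 if eventually_consistent else units

# 10 strongly consistent reads/second of 3.2 KB items -> 10 x 4 = 40 units
print(capacity_units(10, 3.2))                              # 40
# the same reads done eventually consistently need half as much:
print(capacity_units(10, 3.2, eventually_consistent=True))  # 20.0
```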
My guess would be that they don't state it explicitly on purpose. It's probably liable to change, have regional differences, or depend on the position of the moon and stars; or releasing the information would encourage abuse. I would do my calculations on a worst-case basis.
From AWS:
DynamoDB currently retains up to five minutes (300 seconds) of unused read and write capacity
DynamoDB provides some flexibility in the per-partition throughput provisioning. When you are not fully utilizing a partition's throughput, DynamoDB retains a portion of your unused capacity for later bursts of throughput usage. DynamoDB currently retains up to five minutes (300 seconds) of unused read and write capacity. During an occasional burst of read or write activity, these extra capacity units can be consumed very quickly—even faster than the per-second provisioned throughput capacity that you've defined for your table. However, do not design your application so that it depends on burst capacity being available at all times: DynamoDB can and does use burst capacity for background maintenance and other tasks without prior notice.
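That figure lines up with the test described earlier: a table provisioned at 1 write/second that has sat idle for at least five minutes accrues roughly 300 s x 1 WCU = 300 writes of burst capacity, which matches the ~300 requests accepted before throttling kicked in.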
We set our write limit to 10 units/sec for one of the tables. The CloudWatch graph (see image) shows we exceeded this by one unit (11 writes/sec). I'm assuming there's a small amount of wiggle room (<= 10%). Again, I'm just assuming...
https://aws.amazon.com/blogs/developer/rate-limited-scans-in-amazon-dynamodb/
Using the Google Guava library's RateLimiter class to limit the consumed capacity is possible.
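That post is Java (Guava's RateLimiter). A rough Python equivalent of the same idea, assuming boto3 and placeholder table/rate values, is to pace each scan page by the capacity it reports as consumed:

```python
import time
import boto3

TABLE = "my-table"      # placeholder table name
MAX_RCU_PER_SEC = 25.0  # placeholder: the read rate you want to stay under

client = boto3.client("dynamodb")
kwargs = {"TableName": TABLE, "ReturnConsumedCapacity": "TOTAL"}

while True:
    page = client.scan(**kwargs)
    # ... process page["Items"] here ...
    consumed = page["ConsumedCapacity"]["CapacityUnits"]
    time.sleep(consumed / MAX_RCU_PER_SEC)  # crude pacing: sleep off what we spent
    if "LastEvaluatedKey" not in page:
        break
    kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```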