What is the formula or algorithm used to figure out my data transfer rate?

I'm currently learning the fundamentals of Technical Support and one thing has got me confused. I understand that 1 byte equals 8 bits, so to transfer a 1MB file in one second I need an 8Mb per second transfer rate. My question is: how do I figure out what my data transfer rate would be if I were to transfer a 40MB file? I know it's 320Mb per second, but I want to know the formula used to reach this conclusion.

If you have a 40MB file (equivalent to 40 x 8 = 320Mb), the transfer rate is 40MB/period, where the period is the time needed to do the transfer.
For example, if the period is 4 seconds, then 40MB / 4s = 10MB/s, which is equivalent to 10MB/s x 8 = 80Mb/s.
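A minimal Python sketch of that formula (the function and variable names here are just for illustration):

def transfer_rate(file_size_mb, period_seconds):
    # Rate in megabytes per second.
    rate_mb_per_s = file_size_mb / period_seconds
    # 1 byte = 8 bits, so multiply by 8 for megabits per second.
    rate_mbit_per_s = rate_mb_per_s * 8
    return rate_mb_per_s, rate_mbit_per_s

print(transfer_rate(40, 1))   # -> (40.0, 320.0), i.e. 320Mb/s
print(transfer_rate(40, 4))   # -> (10.0, 80.0), i.e. 80Mb/s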


RRD database does not store historical data

I don't think it's a bug, but it's tough to find the correct answer on the Internet to understand what's happening. I created an RRD database (1-minute step) with 3 RRAs:
RRA:AVERAGE:0.5:1m:1d
RRA:AVERAGE:0.5:1h:6M
RRA:AVERAGE:0.5:1d:1y
So I assume that when I update the data points I should be able to save 1 year of data. However, I can only see 24 hours of data, no matter how long I emit data points to the RRD database.
This is the rrdtool info output from one RRD database I created: https://gist.github.com/meow-watermelon/206a10a83c937c771f6cfc5fa7a2e948
Is there anything I missed, or any unknown corner case that I hit, which causes only 24 hours of data to be shown?
Thanks.
The RRA consolidated data points (CDPs) are only written to the RRA when there are sufficient primary data points to make one. Thus, with a 1-minute interval and an xff of 0.5, you would need to be collecting data every minute for more than 12 hours (plus 1 minute!) to make up a full CDP.
In addition, the CDPs update on boundaries relative to UTC; this means that for your largest 1d-size RRA, you would need to have at least 12 hours of data collected in the 24 hours prior to 00:00 UTC, and then the next update would write the CDP.
This means that you should collect data at the standard interval (60s) for more than 24 hours before you can be certain of seeing your CDP appear in the largest-granularity RRA; the best test is to collect data every minute for 48 hours and then check your 1d-granularity RRA.
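The arithmetic above can be sketched in a few lines of Python (the helper name is made up for illustration; the 1440 steps and the 0.5 xff come from the 1d RRA defined above):

def min_known_pdps(cdp_steps, xff):
    # A CDP can only be built once no more than xff * cdp_steps of its
    # primary data points are unknown, i.e. more than (1 - xff) * cdp_steps
    # points must be known.
    return int(cdp_steps * (1 - xff)) + 1

# 1d CDP from 1-minute steps (1440 PDPs) with xff 0.5: you need more than
# 720 known points, i.e. just over 12 hours of minute-by-minute data.
print(min_known_pdps(1440, 0.5))   # -> 721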

Do multiple smaller SQL queries result in more IOPs & more cost using Amazon Aurora?

The Amazon Aurora pricing page mentions that:
For I/O charges, let’s assume the same database reads 100 data pages
from storage per second to satisfy the queries running on it. This
would result in 262.8 million read I/Os per month (100 pages per
second x 730 hours x 60 minutes x 60 seconds).
What is meant by "data pages" here?
Similarly, let’s assume your application makes changes to the database affecting an average of 10 data pages per second. Aurora will charge one I/O operation for up to 4 KB of changes on each data page. If the volume of data changed per page is less than 4 KB, this would result in 10 write I/Os per second.
Do multiple smaller SQL queries result in more IOPs than a single large SQL query?
What is meant by "data pages" here?
Each database page is 16 KB for MySQL-compatible Aurora & 8 KB for PostgreSQL-compatible Aurora (source: Amazon Aurora FAQs).
Do multiple smaller SQL queries result in more IOPs than a single large SQL query?
Not necessarily but it is possible.
The key here is that you optimise your queries to read/write only as much as you need, and not to split them unnecessarily.
Too many small writes of less than 4 KB will mean that you pay more in the long run for no reason; you'll be better off making changes of at least 4 KB to get the most 'bang for your buck'.
Example
Let's say we want to write 22KB of data to the database.
If done in one query, you would be charged for 6 I/O operations.
-> 22KB / 4KB = 5, remainder 2
-> 5 I/O operations with an extra 1 I/O op. (to account for the remaining 2KB left over)
If done in 5 different queries, split by you, you would also be charged for only 6 I/O operations (hence why I said not necessarily).
However, if done in queries split so that each query is less than 4 KB, you would then be paying more than needed, as you would be consuming more I/O operations.
e.g. if your queries each write only 2 KB to the database, you would theoretically¹ be charged for 11 I/O operations, which would be an extra 5 I/O operations.
-> 22KB / 2KB = 11 I/O operations
¹ As the documentation mentions, the number of write operations can potentially be fewer, but that is subject to internal Aurora write I/O optimizations that can combine write operations of less than 4 KB in size under certain circumstances. In other words, they may combine operations for you, but it is not guaranteed.
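A rough Python sketch of this billing arithmetic, assuming (per the quoted pricing page) one write I/O per 4 KB of changes per query and no cross-query merging (the function name is illustrative only):

import math

def charged_write_ios(query_sizes_kb, io_size_kb=4):
    # Each query is billed ceil(size / 4 KB) write I/O operations.
    return sum(math.ceil(size / io_size_kb) for size in query_sizes_kb)

print(charged_write_ios([22]))             # one 22 KB query            -> 6
print(charged_write_ios([4, 4, 4, 4, 6]))  # five 4 KB-aligned queries  -> 6
print(charged_write_ios([2] * 11))         # eleven 2 KB queries        -> 11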

AWS CloudWatch interpreting insights graph -- how many read/write IOs will be billed?

Introduction
We are trying to "measure" the cost of usage of a specific use case on one of our Aurora DBs that is not used very often (we use it for staging).
Yesterday at 18:18 hrs. UTC we issued some representative queries to it and today we were examining the resulting graphs via Amazon CloudWatch Insights.
Since we are being billed USD 0.22 per million read/write IOs, we need to know how many of those there were during our little experiment yesterday.
A complicating factor is that in the Cost Explorer it is not possible to group the final billed costs for read/write IOs per DB instance! Therefore, the only way we can think of to estimate the cost is from the read/write volume IO graphs on CloudWatch Insights.
So we went to CloudWatch Insights and selected the graphs for read/write IOs. Then we selected the period of time in which we did our experiment. Finally, we examined the graphs with different options: "Number" and "Lines".
Graph with "number"
This shows us the picture below, suggesting a total billable IO count of 266 + 510 = 776. Since we have chosen the "Sum" statistic, we assume this would indicate a cost of about USD 0.00017 in total.
Graph with "lines"
However, if we choose the "Lines" option, then we see another picture, with 5 points on the line: the first ones around 500 (for read IOs) and the last one at approx. 750, suggesting a total of 5,000 read/write IOs.
Our question
We are not really sure which interpretation to go with and the difference is significant.
So our question is now: How much did our little experiment cost us and, equivalently, how to interpret these graphs?
Edit:
Using 5-minute intervals (as suggested in the comments) we get (see below) a horizontal line with points at 255 (read IOs) for a whole hour around the time we did our experiment. But the experiment took less than 1 minute at 19:18 (UTC).
Will the (read) billing be for 12 * 255 IOs, or 255 ... (or something else altogether)?
Note: This question triggered another follow-up question created here: AWS CloudWatch insights graph — read volume IOs are up much longer than actual reading
From the Aurora RDS documentation:
VolumeReadIOPs
The number of billed read I/O operations from a cluster volume within
a 5-minute interval.
Billed read operations are calculated at the cluster volume level,
aggregated from all instances in the Aurora DB cluster, and then
reported at 5-minute intervals. The value is calculated by taking the
value of the Read operations metric over a 5-minute period. You can
determine the amount of billed read operations per second by taking
the value of the Billed read operations metric and dividing by 300
seconds. For example, if the Billed read operations returns 13,686,
then the billed read operations per second is 45 (13,686 / 300 =
45.62).
You accrue billed read operations for queries that request database
pages that aren't in the buffer cache and must be loaded from storage.
You might see spikes in billed read operations as query results are
read from storage and then loaded into the buffer cache.
Imagine AWS reports these data points every 5 minutes:
[100,150,200,70,140,10]
and that you use the "Sum of 15 minutes" statistic, like what you had in the image.
Edit: First, the "number" visualization represents the whole selected duration aggregated, which would be the total of (100+150+200+70+140+10).
The "line" visualization will represent all the aggregated groups, which in this case would be 2 points: (100+150+200) and (70+140+10).
It can be a little hard to understand at first if you are not used to data points and aggregations, so I suggest that you set your "line" chart to Sum of 5 minutes; you will then need to take the value of each point, divide it by 300 as suggested by the doc, and sum them all.
Added images for easier visualization
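A small Python sketch of the two aggregations, using the sample series above (purely illustrative):

# Data points reported every 5 minutes.
points = [100, 150, 200, 70, 140, 10]

# "Number" visualization: one Sum over the whole selected duration.
print(sum(points))                          # -> 670

# "Lines" visualization with a 15-minute Sum: one point per group of three.
groups = [points[i:i + 3] for i in range(0, len(points), 3)]
print([sum(g) for g in groups])             # -> [450, 220]

# Billed IOs per second, per the Aurora docs: each 5-minute value / 300 s.
print([round(p / 300, 3) for p in points])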

DynamoDB read capacity metric explanation

From this question, DynamoDB read/write capacity explanation, someone answered that each query of DynamoDB would take 3 read capacity units.
However, after viewing the metrics I got this:
The latest point shows 0.3333333
However, I used 2 GetItem calls in a single script, so is there any explanation for this? Shouldn't it be 2 read capacity units?
Thanks! I'm new to DynamoDB and the read/write capacity can be confusing :(
What you are looking at is averaged over 1 minute, so that is a capacity of 60 reads per minute for 1 read unit.
If you only run one test of 2 reads, it will smear out to a small number. You need to run over a longer period to get a true measure of your read requirements.
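A tiny Python illustration of that smearing (the numbers are made up apart from the 1-minute CloudWatch period):

period_seconds = 60                      # 1-minute CloudWatch average

burst_reads = 2                          # e.g. two GetItem calls in one script
print(burst_reads / period_seconds)      # -> 0.033..., a tiny average

sustained_reads = 60                     # 1 read/second for the whole minute
print(sustained_reads / period_seconds)  # -> 1.0, a full read unit in use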

How is Amazon DynamoDB throughput calculated and limited?

Is it averaged per second? Per minute? Per hour?
For example, if I pay for 10 "read units", which allows for 10 strongly consistent reads per second, will I be throttled if I try to do 20 reads in a single second, even if those were the only 20 reads that occurred in the last hour? The Amazon documentation and FAQ do not answer this critical question anywhere that I could find.
The only related response I could find in the FAQ completely ignores the issue of how usage is calculated and when throttling may happen:
Q: What happens if my application performs more reads or writes than
my provisioned capacity?
A: If your application performs more
reads/second or writes/second than your table’s provisioned throughput
capacity allows, requests above your provisioned capacity will be
throttled and you will receive 400 error codes. For instance, if you
had asked for 1,000 write capacity units and try to do 1,500
writes/second of 1 KB items, DynamoDB will only allow 1,000
writes/second to go through and you will receive error code 400 on
your extra requests. You should use CloudWatch to monitor your request
rate to ensure that you always have enough provisioned throughput to
achieve the request rate that you need.
It appears that they track writes in a five-minute window and will throttle you when your average over the last five minutes exceeds your provisioned throughput.
I did some testing. I created a test table with throughput of 1 write/second. If I don't write to it for a while and then send a stream of requests, Amazon seems to accept about 300 before it starts throttling.
The caveat, of course, is that this is not stated in any official Amazon documentation and could change at any time.
DynamoDB provides 'burst capacity', which allows for spikes in the amount of data read from a table. You can read more about it under: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html#GuidelinesForTables.Bursting
Basically it's what @abjennings noticed: it uses a 5-minute window to average the number of reads from a table.
If I pay for 10 "read units" which allows for 10 strongly consistent reads per second, will I be throttled if I try to do 20 reads in a single second, even if it was the only 20 reads that occurred in the last hour?
Yes. This is due to the very concept of Amazon DynamoDB: fast and predictable performance with seamless scalability. The quoted FAQ is actually addressing this correctly already (i.e. you have to take operations/second literally), though the calculation is better illustrated in Provisioned Throughput in Amazon DynamoDB:
A unit of Write Capacity enables you to perform one write per second
for items of up to 1KB in size. Similarly, a unit of Read Capacity
enables you to perform one strongly consistent read per second (or two
eventually consistent reads per second) of items of up to 1KB in size.
Larger items will require more capacity. You can calculate the number
of units of read and write capacity you need by estimating the number
of reads or writes you need to do per second and multiplying by the
size of your items (rounded up to the nearest KB).
Units of Capacity required for writes = Number of item writes per second x item size (rounded up to the nearest KB)
Units of Capacity required for reads* = Number of item reads per second x item size (rounded up to the nearest KB)
* If you use eventually consistent reads you'll get twice the throughput in terms of reads per second.
[emphasis mine]
Getting these calculations right for real-world use cases is potentially complex though; please make sure to check further details such as the Provisioned Throughput Guidelines in Amazon DynamoDB accordingly.
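As a minimal Python sketch of the quoted formula (the function name is made up; it assumes the 1 KB rounding and the eventually-consistent halving described above):

import math

def capacity_units(ops_per_second, item_size_kb, eventually_consistent=False):
    # ops/second x item size rounded up to the nearest KB.
    units = ops_per_second * math.ceil(item_size_kb)
    if eventually_consistent:
        units /= 2   # eventually consistent reads give twice the throughput
    return units

print(capacity_units(10, 1))                              # -> 10
print(capacity_units(10, 1, eventually_consistent=True))  # -> 5.0
print(capacity_units(1000, 1))                            # 1,000 x 1KB writes/s -> 1000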
My guess would be that they don't state it explicitly on purpose. It's probably liable to change, to have regional differences, or to depend on the position of the moon and stars; or releasing the information would encourage abuse. I would do my calculations on a worst-case basis.
From AWS:
DynamoDB currently retains up to five minutes (300 seconds) of unused read and write capacity
DynamoDB provides some flexibility in the per-partition throughput provisioning. When you are not fully utilizing a partition's throughput, DynamoDB retains a portion of your unused capacity for later bursts of throughput usage. DynamoDB currently retains up to five minutes (300 seconds) of unused read and write capacity. During an occasional burst of read or write activity, these extra capacity units can be consumed very quickly—even faster than the per-second provisioned throughput capacity that you've defined for your table. However, do not design your application so that it depends on burst capacity being available at all times: DynamoDB can and does use burst capacity for background maintenance and other tasks without prior notice.
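A hedged Python sketch of that burst behaviour as a simple token bucket capped at 300 seconds of provisioned capacity (a simplification; AWS does not document the exact algorithm, and the names and numbers are illustrative):

def burst_allowed(provisioned_per_s, idle_seconds, burst_requests):
    # Unused capacity accrues while idle, capped at 300 s worth.
    bucket = min(idle_seconds, 300) * provisioned_per_s
    # A 1-second burst can spend the bucket plus that second's capacity;
    # anything beyond that is throttled.
    return min(burst_requests, bucket + provisioned_per_s)

# Table provisioned at 1 write/s, idle for an hour, then flooded: roughly
# 300 writes are accepted before throttling, matching the test above.
print(burst_allowed(1, 3600, 1000))   # -> 301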
We set our 'write limit' to 10 units/sec for one of the tables. The CloudWatch graph (see image) shows we exceeded this by one unit (11 writes/sec). I'm assuming there's a small amount of wiggle room (<= 10%). Again, I'm just assuming...
https://aws.amazon.com/blogs/developer/rate-limited-scans-in-amazon-dynamodb/
Using the Google Guava library's RateLimiter class to limit the consumed capacity is possible.
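The linked AWS post uses Java and Guava's RateLimiter; as a rough illustration of the same idea, here is a minimal Python stand-in that throttles work to a target rate of capacity units per second (all names here are made up):

import time

class SimpleRateLimiter:
    # Very small stand-in for Guava's RateLimiter: each acquired permit
    # (capacity unit) pushes the next free slot forward, and callers sleep
    # until their slot arrives.
    def __init__(self, permits_per_second):
        self.interval = 1.0 / permits_per_second
        self.next_free = time.monotonic()

    def acquire(self, permits=1.0):
        now = time.monotonic()
        wait = self.next_free - now
        self.next_free = max(self.next_free, now) + permits * self.interval
        if wait > 0:
            time.sleep(wait)

limiter = SimpleRateLimiter(10)      # cap at ~10 capacity units/second
for _ in range(5):
    limiter.acquire(permits=2.5)     # charge the units the last page consumed
    # ... issue the next Scan request here ...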