Suggestion regarding Cost reduction on Using Dynamodb - amazon-web-services

I'm new to DynamoDB. I have just 10 items and 1 global secondary index, with read/write capacity set to 5 units (the minimum I found), and it costs around $6 per 2 days, which is unacceptable because my actual usage is only about 0.01% of that. I have gone through the AWS documentation on DynamoDB price reduction, but none of it helped me: AWS recommends avoiding sudden read spikes from Query or Scan, yet it is impossible to get more than one item with the partition key alone.
My tables are as follows:
Add_Employee
Add_Stocks
Add_vendor
All of the above tables have read/write capacity of 1 unit, and each has one global secondary index, also with read/write capacity of 1 unit. All tables are configured in the Asia Pacific (Mumbai) region.
Here is my billing for reference
$0.00 per hour for 25 units of read capacity for a month (free tier): 18,600 ReadCapacityUnit-Hrs, $0.00
$0.00 per hour for 25 units of write capacity for a month (free tier): 18,600 WriteCapacityUnit-Hrs, $0.00
$0.000148 per hour for units of read capacity beyond the free tier: 6,723 ReadCapacityUnit-Hrs, $1.00
$0.00074 per hour for units of write capacity beyond the free tier: 6,723 WriteCapacityUnit-Hrs, $4.98
Thanks in advance

You're not just paying for actual throughput; you're paying for provisioned throughput.
Looking at the DynamoDB pricing page, this means you are paying $0.0065 per capacity-unit-hour for every hour each table exists, minus the free-tier hours.
Based on your table names, I'm guessing you are not following the best practice of using one de-normalized table for everything. You may be better off using an RDS instance, which is not charged per table but per hour (it's an EC2 instance behind the scenes).
Cost Breakdown
The default is 5 provisioned read/write units, and there are 720 hours in a 30-day month
$0.0065 * 5 * 720 = $23.40 a month per table
The free tier generally allows one table for free a month.
Per AWS docs you must have at least 1 provisioned unit.
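As a quick sanity check, here is that formula in a couple of lines of Python, using the $0.0065 per capacity-unit-hour rate quoted above and ignoring the free tier:

```python
# Monthly cost of one provisioned table at the rate quoted above
# ($0.0065 per capacity-unit-hour), free tier ignored.
RATE_PER_UNIT_HOUR = 0.0065
HOURS_PER_MONTH = 720  # 30-day month

def monthly_table_cost(units):
    return RATE_PER_UNIT_HOUR * units * HOURS_PER_MONTH

print(f"{monthly_table_cost(5):.2f}")  # 23.40, the default of 5 units
print(f"{monthly_table_cost(1):.2f}")  # 4.68, the dev setting of 1 unit
```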
How to Save
Make sure you're following the best practice of using 1 de-normalized table
For any dev work, make sure both read and write provisions are set to 1 ($0.0065 * 1 * 720 = $4.68 a month per table)
If you know you're going to be away for a while, remove the stack from AWS. You're only charged while the table(s) exist.
By limiting read/write units you should be able to bring the cost down to ~$5.00 a table per dev.
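If you'd rather apply that setting from code than from the console, a minimal boto3 sketch could look like the following. The table names are taken from the question; the GSI names are placeholders, since they aren't given:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Table names from the question; the GSI names are placeholders.
tables = {
    "Add_Employee": "Employee_GSI",
    "Add_Stocks": "Stocks_GSI",
    "Add_vendor": "Vendor_GSI",
}

for table_name, index_name in tables.items():
    # Note: UpdateTable fails if the values are already what you request.
    dynamodb.update_table(
        TableName=table_name,
        ProvisionedThroughput={"ReadCapacityUnits": 1, "WriteCapacityUnits": 1},
        GlobalSecondaryIndexUpdates=[
            {
                "Update": {
                    "IndexName": index_name,
                    "ProvisionedThroughput": {
                        "ReadCapacityUnits": 1,
                        "WriteCapacityUnits": 1,
                    },
                }
            }
        ],
    )
```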
DO NOT TURN ON AUTO-SCALING
A commenter suggested auto-scaling. Per docs, you'll be charged for at least 5 units, which is what you are paying now.
This AWS forum link is about the same thing.

Related

Read/Write Capacity Unit in AWS DynamoDB - Price per hour

I have recently explored AWS DynamoDB and was reading about Read/Write Capacity Units. I understood that they are as under:
WCU: A write capacity unit represents one write per second, for an item up to 1 KB in size.
RCU: A read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size.
So, my table has 1 WCU and 1 RCU. This means I would be charged for every read and every write I do to my table. Good so far. Plus I would be charged additionally for data storage.
However, when I look at this link (scroll down to DynamoDB detailed feature pricing/Read and Write Requests), it shows me WCU and RCU in Price per hour which is $0.00065 per WCU or $0.00013 per RCU.
What is the meaning of Price per hour?
Would I be wrong to assume $0.00065 per WCU or $0.00013 per RCU and ignore the hour part completely? Meaning, per write would cost me $0.00065 and per read would cost me $0.00013.
There are two pricing models for DynamoDB: Provisioned and On-Demand.
Provisioned capacity means that you will be charged each hour for the provisioned unit whether you consume it or not -- and your requests will be throttled if you go over provisioned capacity in a given second. This model allows for considerable savings when you have stable demand and you have a method within your solution to deal with throttles gracefully.
If you are in dev mode, or your application has unpredictable peaks, you should consider On-Demand mode, where you will be billed a flat rate per request irrespective of the rate of requests.
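For example, switching an existing table to on-demand is a single UpdateTable call; a boto3 sketch with a hypothetical table name:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Hypothetical table name; flips an existing provisioned table to on-demand billing.
dynamodb.update_table(TableName="my-dev-table", BillingMode="PAY_PER_REQUEST")
```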
Hope this helps.
Would I be wrong to assume $0.00065 per WCU or $0.00013 per RCU and ignore the hour part completely? Meaning, per write would cost me $0.00065 and per read would cost me $0.00013.
Yes, you would be wrong to ignore the hour part. For this configuration (1 WCU and 1 RCU provisioned) you would be charged as follows:
Writes
$0.00065 * 730 = $0.4745 per month
Reads
$0.00013 * 730 = $0.0949 per month
In provisioned mode you pay for the capacity regardless of how many requests you make. Having 1 WCU and 1 RCU allows you to write one 1 KB item per second and read one 4 KB item per second. Moreover, the DynamoDB free tier gives you 25 WCU and 25 RCU per region per month. You can apply this capacity to a single table or spread it across multiple tables as you wish. This is enough capacity to serve roughly 200M requests per month.
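As a rough sanity check of that ~200M figure, assuming eventually consistent reads (1 RCU covers 2 such reads per second) and items within the size limits:

```python
# Rough check: 25 RCU + 25 WCU over a 30-day month.
seconds_per_month = 30 * 24 * 3600        # 2,592,000

reads = 25 * 2 * seconds_per_month        # ~129.6M eventually consistent reads
writes = 25 * seconds_per_month           # ~64.8M writes

print(f"{(reads + writes) / 1e6:.1f}M")   # ~194.4M, i.e. roughly 200M per month
```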
In contrast, On-demand mode allows you to pay per request, however it's not as cost effective as provisioned mode. It simplifies using DynamoDB without needing to think about setting capacity or auto-scaling. It does not have a free tier.
DynamoDB free tier also provides 25GB of storage per month per region.
Have a look at the docs; they're filled with valuable information to help you understand the difference between the capacity modes.

Why does DynamoDB set read/write capacity for an on-demand table?

I created an on-demand DynamoDB table, and as I understand it, DynamoDB automatically scales read/write capacity in on-demand mode.
But my AWS Glue job fails with "An error occurred while calling o201.pyWriteDynamicFrame. DynamoDB write exceeds max retry 10" because of the write capacity. How is this possible if the table is in on-demand mode? I didn't set any read/write capacity, and the table isn't even in provisioned mode.
(Screenshots: DynamoDB table configuration, AWS Glue job output, DynamoDB table throttling metrics.)
Thanks.
Here is what you need to know about On-Demand mode tables
On Demand
If you recently switched an existing table to on-demand capacity mode for the first time, or if you created a new table with on-demand capacity mode enabled, the table has the following previous peak settings, even though the table has not served traffic previously using on-demand capacity mode:
Following are examples of possible scenarios.
A provisioned table configured as 100 WCU and 100 RCU. When this table is switched to on-demand for the first time, DynamoDB will ensure it is scaled out to instantly sustain at least 4,000 write units/sec and 12,000 read units/sec.
A provisioned table configured as 8,000 WCU and 24,000 RCU. When this table is switched to on-demand, it will continue to be able to sustain at least 8,000 write units/sec and 24,000 read units/sec at any time.
A provisioned table configured with 8,000 WCU and 24,000 RCU, that consumed 6,000 write units/sec and 18,000 read units/sec for a sustained period. When this table is switched to on-demand, it will continue to be able to sustain at least 8,000 write units/sec and 24,000 read units/sec. The previous traffic may further allow the table to sustain much higher levels of traffic without throttling.
A table previously provisioned with 10,000 WCU and 10,000 RCU, but currently provisioned with 10 RCU and 10 WCU. When this table is switched to on-demand, it will be able to sustain at least 10,000 write units/sec and 10,000 read units/sec.
Important
If you need more than double your previous peak on table, DynamoDB automatically allocates more capacity as your traffic volume increases to help ensure that your workload does not experience throttling. However, throttling can occur if you exceed double your previous peak within 30 minutes. For example, if your application’s traffic pattern varies between 25,000 and 50,000 strongly consistent reads per second where 50,000 reads per second is the previously reached traffic peak, DynamoDB recommends spacing your traffic growth over at least 30 minutes before driving more than 100,000 reads per second.
The information above is taken directly from the AWS docs (src).
Glue Workers
Now, when you begin writing with AWS Glue, you will very quickly exceed 4,000 WCU, which means you have broken the rule of staying within double your previous peak (4,000) within 30 minutes. So what now?
Pre-warming your table
DynamoDB provides you capacity in the form of partitions, where each partition can provide 1,000 WCU and 3,000 RCU. DynamoDB only ever scales partitions out; it never merges them back in.
For that reason, we can "pre-warm" our DynamoDB table by creating it in provisioned mode and allocating our peak WCU. For example, let's imagine we expect Glue to consume 40,000 WCU; then we make sure our table can handle that by following these steps:
Create table in provisioned mode
No Autoscaling
40,000 WCU
40,000 RCU
When table is marked as Active (1-2 mins)
Switch capacity mode to On-Demand
Now you have a new DynamoDB table in on-demand mode that is capable of providing 40,000 WCU out of the gate, not the 4,000 WCU provided by default. This will eliminate throttling from Glue.
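Here is a minimal boto3 sketch of those steps; the table name and key schema are placeholders:

```python
import boto3

dynamodb = boto3.client("dynamodb")
TABLE = "glue_target"  # placeholder table name and key schema

# 1. Create the table in provisioned mode at the expected Glue peak.
dynamodb.create_table(
    TableName=TABLE,
    AttributeDefinitions=[{"AttributeName": "pk", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "pk", "KeyType": "HASH"}],
    BillingMode="PROVISIONED",
    ProvisionedThroughput={"ReadCapacityUnits": 40000, "WriteCapacityUnits": 40000},
)

# 2. Wait until the table is Active (typically 1-2 minutes).
dynamodb.get_waiter("table_exists").wait(TableName=TABLE)

# 3. Switch to on-demand; the partitions created for 40,000 WCU are retained.
dynamodb.update_table(TableName=TABLE, BillingMode="PAY_PER_REQUEST")
```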
DynamoDB sets read/write capacity for its on-demand tables in order to balance performance and cost. The read/write capacity units determine the rate at which DynamoDB can read and write data to the table, with a larger number of units allowing for a higher rate of read/write operations. By setting these values, users can control the performance of their DynamoDB table and ensure that it meets the demands of their application. Additionally, setting the capacity units helps DynamoDB automatically manage the distribution of data and traffic, ensuring low latency and high reliability.

AWS DynamoDB - Free tier confusion

According to this page, DynamoDB is always free for 25 RCU and 25 WCU with 25 GB of storage.
However, in the capacity tab of a table, it shows me an estimated cost of $5.81 / month for 10 RCU and 10 WCU.
Will I be charged or not charged for this amount?
The estimation that you see within the DynamoDB page is not directly related to the billing calculation, therefore it will not take free tier into account. It is a simple calculator that calculates the AWS charge based on the configuration that you provided for DynamoDB.
Free tier calculations and deductions are applied at billing time: as long as your usage is at or below the free-tier allowance for a service, you will not be billed for it. If you exceed it, you will either be fully charged (in the case of EC2) or pay the difference (as is the case with DynamoDB).
In DynamoDB's case this is a cumulative deduction across all regions and tables, and, if your account is part of an organization, across all billed accounts under that organization.

For DynamoDB DAX, are requests charged even when there is a cache hit, i.e. the item is fetched from the DAX cache?

Let's suppose I have a DynamoDB table with 10 frequently accessed items of around 8 KB each.
I decided to use DAX in front of the table.
I get a total of 1 million read requests for the items.
a. Will I be charged for 10 DynamoDB requests, since only 10 requests made it to DynamoDB and the rest were served from the DAX cache,
or
b. Will I be charged for all 1 million DynamoDB requests?
I had a similar question, and asked AWS. The answer I received was:
Whenever DAX has the item available (a cache hit), DAX returns the item to the application without accessing DynamoDB. In that case, the request will not consume read capacity units (RCUs) from the DynamoDB table and hence there will not be any DynamoDB cost for that request. Therefore, if you have 10k requests and out of that only 2k requests go to DynamoDB, the total charge will be the 2k read request charge for DynamoDB, the running cost for the DAX cluster, and data transfer charges (if applicable).
DynamoDB charges for DAX capacity by the hour and your DAX instances run with no long-term commitments. Pricing is per node-hour consumed and is dependent on the instance type you select. Each partial node-hour consumed is billed as a full hour. Pricing applies to all individual nodes in the DAX cluster. For example, if you have a three-node DAX cluster, you are billed for each of the separate nodes (three nodes in total) on an hourly basis.
https://aws.amazon.com/dynamodb/pricing/on-demand/
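To put rough numbers on that for scenario (a), here is a back-of-the-envelope sketch. All prices below are assumptions; check the pricing page linked above for your region and instance type:

```python
# Scenario (a): 1M application reads, only 10 cache misses reach DynamoDB.
cache_misses = 10
rru_per_read = 2                 # 8 KB strongly consistent read = 2 read request units
price_per_million_rru = 0.25     # assumed on-demand read price (USD)

dynamodb_cost = cache_misses * rru_per_read * price_per_million_rru / 1_000_000
dax_cost = 3 * 730 * 0.04        # assumed 3-node cluster at ~$0.04/node-hour

print(f"DynamoDB reads: ${dynamodb_cost:.6f}, DAX cluster: ${dax_cost:.2f}")
```

In other words, the DynamoDB read charge becomes negligible; the DAX node-hours dominate the bill.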

Are S3 storage costs based on total current usage or on total volume ingested

Suppose I have a script which uploads a 100GB object every day to my S3 bucket. This same script will delete any file older than 1 week from the bucket. How much will I be charged at the end of the month?
Let's use pricing from the us-west-2 region. Suppose this is a 30-day month and I start with no data in the bucket at the beginning of the month.
If charged for maximum bucket volume per month, I would have 700GB at the end of the month and be charged $0.023 * 7 * 100 = $16.10. Also some money for my PUT requests ($0.005 per 1,000 requests so effectively 0).
If charged for total amount of data that had transited through the bucket over the course of that month, I would be charged $0.023 * 30 * 100 = $69. (again +effectively $0 for PUT requests)
I'm not clear on which of these two cases Amazon bills. This becomes very important for me, since I expect to have a high amount of churn in my bucket.
Both of your calculations are incorrect, although the first one comes close to the right answer, for the wrong reason. It is neither peak nor end-of-month that matters.
The charge for storage is calculated hourly. For all practical purposes, this is the same as saying that you are billed for your average storage over the course of a month -- not your maximum, and not the amount you uploaded.
Storing 30 GB for 30 days or storing 900 GB for 1 day would cost the same amount, $0.69.
The volume of storage billed in a month is based on the average storage used throughout the month. This includes all object data and metadata stored in buckets that you created under your AWS account. We measure your storage usage in “TimedStorage-ByteHrs,” which are added up at the end of the month to generate your monthly charges.
https://aws.amazon.com/s3/faqs/#billing
This is true for STANDARD storage.
STANDARD_IA and GLACIER are also billed hourly, but there is a notable penalty for early deletion: Each object stored in these classes has a minimum billable lifetime of 30 days in IA or 90 days in Glacier, no matter when you delete it. Both of these alternate storage classes are only appropriate for data you do not intend to delete soon or retrieve often, by design.
REDUCED_REDUNDANCY storage follows the same rules as STANDARD (hourly billing, no early delete penalty) but after the most recent round of price decreases, it is now only less expensive than STANDARD in regions with higher costs. It is an older offering that is no longer competitively priced in regions where STANDARD pricing is lowest.
Your bill for storage will be closer to your #1 example, perhaps a bit higher because, for brief amounts of time while uploading the 8th day's object, you still have 7 days of storage accruing charges; but you won't be charged anywhere near your #2 example.
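Here is a tiny sketch of that averaging, using the pattern from the question (100 GB/day, 7-day retention) and the assumed $0.023/GB-month STANDARD rate; it uses day-level granularity for simplicity, whereas real metering is hourly:

```python
# Day-level sketch of "TimedStorage-ByteHrs" billing for +100 GB/day,
# objects deleted after 7 days; $0.023/GB-month (us-west-2 STANDARD) assumed.
PRICE_PER_GB_MONTH = 0.023
DAYS = 30

gb_days = 0
stored_gb = 0
for day in range(1, DAYS + 1):
    stored_gb += 100          # today's upload
    if day > 7:
        stored_gb -= 100      # the object from 8 days ago is deleted
    gb_days += stored_gb

average_gb = gb_days / DAYS                        # ~630 GB average over the month
print(f"${average_gb * PRICE_PER_GB_MONTH:.2f}")   # ~$14.49, close to example #1
```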
Firstly, you don't need a script to delete files older than 1 week. You can set a lifecycle rule on the bucket that does this automatically, or have it transition objects to Glacier (at roughly 10% of the cost) if you might need them later; a sketch of such a rule follows at the end of this answer.
Secondly, the storage cost might not be huge. A better idea would be for the script to first delete data from S3 (if you want the script to do that) and then add the new data, so that your bucket never holds more data than it needs and you are always charged on a consistent storage basis.
Thirdly, your main charge could be bandwidth (if not handled well), which can be really large since you are transferring so much data. If all this data is generated internally from your grid, make sure you create a VPC endpoint to S3 so that you don't pay bandwidth charges, as the data transfer is then considered to happen on the internal network.
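For reference, setting up such a lifecycle rule is a single API call; a boto3 sketch with a hypothetical bucket name:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket name; every object expires 7 days after creation,
# replacing the deletion script described in the question.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-churn-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-after-7-days",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},     # apply to the whole bucket
                "Expiration": {"Days": 7},
            }
        ]
    },
)
```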