AWS Backup DynamoDB billing - amazon-web-services

I'd like to better understand the billing composition regarding AWS Backup on DynamoDB resources, since I got an unexpected increase in my bill.
I'm aware of AWS Backup's own billing thanks to the documentation. However, when I open the Billing service I see a steep increase under the DynamoDB service. In the section Amazon DynamoDB USE1-TimedBackupStorage-ByteHrs, the description tells me I'll be paying $0.10 per GB-month of storage used for on-demand backups, and that I've used 14,247.295 GB-Month (this is consistent with the bill I got). What I don't understand is where all those GB come from: the last snapshot size shows only 175.5 GB.
I've configured my backup plan with the following parameters:
{
  "ruleName": "hourly-basis",
  "scheduleExpression": "cron(0 * ? * * *)",
  "startWindowMinutes": 60,
  "completionWindowMinutes": 180,
  "lifecycle": {
    "deleteAfterDays": 30
  }
}
I'm also copying snapshots to a second region, us-west-2.
As you can see, I'm running the backup rule on an hourly schedule because of compliance requirements. Is that enough to justify the high bill? I'm aware that backups with a low RPO are commonly expensive, but I just want to be sure this bill isn't higher than it should be because of some wrong backup configuration.
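For reference, here is roughly how the plan above (including the us-west-2 copy) maps to a boto3 create_backup_plan call; the vault names and the destination vault ARN below are placeholders, not my real ones:

import boto3

# Sketch only: vault names and the destination ARN are placeholders.
backup = boto3.client("backup", region_name="us-east-1")

backup.create_backup_plan(
    BackupPlan={
        "BackupPlanName": "dynamodb-hourly",
        "Rules": [
            {
                "RuleName": "hourly-basis",
                "TargetBackupVaultName": "my-primary-vault",   # placeholder
                "ScheduleExpression": "cron(0 * ? * * *)",     # every hour
                "StartWindowMinutes": 60,
                "CompletionWindowMinutes": 180,
                "Lifecycle": {"DeleteAfterDays": 30},
                "CopyActions": [
                    {
                        # Cross-region copy to us-west-2 (placeholder ARN)
                        "DestinationBackupVaultArn":
                            "arn:aws:backup:us-west-2:111122223333:backup-vault:my-dr-vault",
                        "Lifecycle": {"DeleteAfterDays": 30},
                    }
                ],
            }
        ],
    }
)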
Thanks in advance!

Just for the record, and for anyone who may have a similar problem, the root cause was as described by Caldazar: keeping hourly snapshots for a whole month means 24 snapshots/day × 30 days = 720 retained snapshots, and at 175 GB each that is roughly 126,000 GB of backup storage billed per GB-month.
In addition, at the time of writing this answer, AWS Backup doesn't support incremental snapshots for DynamoDB, which is another cause of the high bill.
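For anyone who wants to sanity-check that math, here is a minimal sketch, assuming a flat 175 GB per snapshot and the $0.10 per GB-month warm backup storage rate mentioned in the question (this is the steady state once a full 30 days of hourly snapshots are retained):

# Rough back-of-the-envelope estimate; assumes every snapshot is a full
# 175 GB copy (no incrementals) and a $0.10 per GB-month storage rate.
snapshot_gb = 175
snapshots_per_day = 24          # hourly backup rule
retention_days = 30             # deleteAfterDays

retained_snapshots = snapshots_per_day * retention_days   # 720 at steady state
steady_state_gb = retained_snapshots * snapshot_gb        # ~126,000 GB

price_per_gb_month = 0.10
print(f"Steady-state backup storage: {steady_state_gb:,} GB")
print(f"Approximate monthly cost:    ${steady_state_gb * price_per_gb_month:,.2f}")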
Depending on your compliance requirements, you can handle DynamoDB snapshots as suggested by Ross Williams: rely on DynamoDB PITR and use AWS Backup only for weekly/monthly snapshots. In my case, we use PITR alongside hourly, daily and weekly backups in case the entire Region goes down and we cannot access the DynamoDB service; again, this is mostly for our compliance requirements.
Hope it helps somebody!

Related

AWS Backup copy job costs when existing snapshot exists

I have been trying to contact AWS and look for information in their own knowledge articles but haven't been successful.
I'm trying to figure out how the billing works for AWS Backup.
Let's say I have a 100 GB bucket and I back it up daily with a retention of 31 days in the region eu-central-1.
I then also create a copy job that copies the backup to a secondary vault in the region eu-north-1.
On the 1st day I pay the full price for copying 100 GB from eu-central-1 to eu-north-1.
On the 2nd day I have added 10 GB of data and made some modifications to existing files.
Will my copy job on the 2nd day be billed for a transfer of 110 GB to eu-north-1, or only for the delta (the 10 GB plus the changes)?
Billing in AWS is complex. Everything costs money; even the things you think probably won't, probably will. In this example you are likely to be charged ingress/egress charges, backup storage charges for the services you use, and for all the supporting services.
Take a look at https://calculator.aws/ for an idea of what things cost. It's often easiest to get a rough guide from the pricing pages and this calculator, then turn something on and keep a close eye on it in the early days to make sure it matches your expectations.
For finer-grained control over billing, make sure you tag your resources so you can break costs down against your financial metrics and keep track of them more easily.
Not much of a specific answer, but I hope it helps to get you going in the right direction.
So, to answer my own question: the copy job on the second day will only be billed for the extra data, not the full size.
I used AWS support as well as my own tests to confirm the result.
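For the example above, that works out to something like this (the per-GB cross-region transfer rate is an assumed placeholder for illustration, not a quoted price):

# Illustration of full-copy vs. incremental-copy billing for the example
# above. The per-GB transfer price is an assumed placeholder, not a quote.
price_per_gb_transfer = 0.02   # assumed cross-region rate, for illustration

day1_copy_gb = 100             # the first copy is always the full backup
day2_delta_gb = 10             # only new/changed data is copied afterwards

print(f"Day 1 copy cost: ${day1_copy_gb * price_per_gb_transfer:.2f}")
print(f"Day 2 copy cost: ${day2_delta_gb * price_per_gb_transfer:.2f}  (delta only, not 110 GB)")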

EBS storage for Amazon Elasticsearch

I'm learning about AWS for a subject at university.
About 20 days ago I started to learn about Elasticsearch because I need queries that DynamoDB can't do.
I'm trying to use only the Free Tier. I created some domains, put data in through Lambda (around 100 KiB) and then deleted it.
Then I checked the billing and realized that 4.9 GB of EBS storage had been used. The Free Tier provides 10 GB per month, but the problem is that I don't know how I used all that storage, and whether there is a way to limit it, because I don't want to exceed the usage limits.
I would be grateful for any kind of explanation or advice on how not to exceed the limit.
I'm not aware of a preventive step that can restrict your billing.
However, with a CloudWatch billing alarm you'd be notified immediately as soon as your spend breaches the billing threshold.
Please have a look here for the detailed AWS documentation on it.
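Here is a minimal sketch of such a billing alarm with boto3; the threshold and the SNS topic ARN are placeholders, and note that billing metrics are only published in us-east-1 and require billing alerts to be enabled in the account preferences:

import boto3

# Sketch of a CloudWatch billing alarm; the threshold and SNS topic ARN
# below are placeholders. Billing metrics are only published in us-east-1.
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="estimated-charges-over-10-usd",
    Namespace="AWS/Billing",
    MetricName="EstimatedCharges",
    Dimensions=[{"Name": "Currency", "Value": "USD"}],
    Statistic="Maximum",
    Period=21600,                 # 6 hours; billing metrics update a few times a day
    EvaluationPeriods=1,
    Threshold=10.0,               # alert once the estimated bill exceeds $10
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:billing-alerts"],  # placeholder
)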

AWS Glue pricing against AWS EMR

I am doing a pricing comparison of AWS Glue against AWS EMR so as to choose between EMR and Glue.
I have considered 6 DPUs (4 vCPUs + 16 GB memory each) with the ETL job running for 10 minutes a day for 30 days. Expected crawler requests are assumed to be 1 million above the free tier, calculated at $1 for the additional 1 million requests.
On EMR I have considered m3.xlarge for both EC2 and EMR (priced at $0.266 and $0.070 per hour respectively) with 6 nodes, running for 10 minutes a day for 30 days.
Calculating for a month, I see that AWS Glue works out to around $14.64, whereas EMR works out to around $10.08. I have not taken into account other additional expenses such as S3, RDS, Redshift, etc., or the dev endpoint, which is optional, since my objective is to compare the ETL job price benefits.
It looks like EMR is cheaper compared to AWS Glue. Is the EMR pricing correct? Can someone please point out if anything is missing? I have tried the AWS price calculator for EMR, but I'm confused and it's not clear whether normalized instance hours are factored into it.
Regards
Yuva
Yes, EMR does work out to be cheaper than Glue. This is because Glue is meant to be serverless and fully managed by AWS, so the user doesn't have to worry about the infrastructure running behind the scenes, whereas EMR requires a whole lot of configuration to set up. So it's a trade-off between user-friendliness and cost, and for more technical users EMR can be the better option.
#user2889316 - Did you check my question, where I had provided the comparison numbers?
Also, please note Glue is roughly $0.44 per DPU-hour for a job. I don't think you will have any AWS Glue job that is expected to run throughout the day? Are you talking about the Glue dev endpoint or the job?
An AWS Glue job requires a minimum of 2 DPUs to run, which means $0.88 per hour, or roughly $21 per day if it runs around the clock. This is only for the Glue job; there are additional charges such as S3, any database/connection charges, crawler charges, etc.
The corresponding instance for EMR is m3.xlarge, and its charges are $0.266 (EC2) and $0.070 (EMR) per hour. That would be approximately $16 per day for 2 instances, plus other S3, database charges, etc. I'm comparing 2 EMR instances against the default 2 DPUs for an AWS Glue job.
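For what it's worth, a quick sketch of that per-day comparison, using the on-demand prices quoted in this thread (they may not match current list prices):

# Per-day cost comparison using the prices quoted in this thread;
# these are assumptions for illustration, not current list prices.
glue_price_per_dpu_hour = 0.44
glue_min_dpus = 2
glue_per_day = glue_price_per_dpu_hour * glue_min_dpus * 24              # ~$21.12

emr_ec2_per_hour = 0.266   # m3.xlarge EC2 on-demand
emr_fee_per_hour = 0.070   # EMR surcharge for m3.xlarge
emr_nodes = 2
emr_per_day = (emr_ec2_per_hour + emr_fee_per_hour) * emr_nodes * 24     # ~$16.13

print(f"Glue, 2 DPUs, 24h/day:       ${glue_per_day:.2f}")
print(f"EMR, 2 x m3.xlarge, 24h/day: ${emr_per_day:.2f}")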
Hope this would give you an idea.
Thanks
If your infrastructure doesn't need drastic scaling (and mostly has a fixed configuration), use EMR. But if it does, Glue is the better choice as it is serverless: your infrastructure scales just by changing the number of DPUs. In EMR, by contrast, you have to decide on the cluster type, the number of nodes and the auto-scaling rules. For every change you need to modify the cluster creation script, test it and deploy it, which basically adds the overhead of a standard release cycle to each change. With a change in infra configuration you may also want to change the Spark config to optimize jobs accordingly, so the time to release a new version is higher whenever the infra configuration changes. If you start with a high configuration, it will cost more; if you start with a low configuration, you will need frequent changes to the script.
Having said that, AWS Glue has a fixed infra configuration per DPU, e.g. 16 GB of memory per DPU. If your ETL demands more memory per core, you may have to shift to EMR. However, if your ETL is designed in such a way that it will not exceed 11 GB of driver memory with 1 executor, or 5.5 GB with 2 executors (e.g. take additional data volume in parallel on a new core, or divide the volume into 5 GB / 11 GB batches and run them in a loop on the same core), Glue is the right choice.
If your ETL is complex and the jobs are going to keep the cluster busy throughout the day, I would recommend going with EMR, with a dedicated DevOps team to manage the EMR infra.
If you use EMR Spot Instances instead of On-Demand, they can cost around a third of the On-Demand price and will turn out much cheaper. AWS Glue doesn't have that pricing benefit.

What are hidden costs or "NOT obvious" costs on AWS

AWS says that everything is "pay as you use". But are there any hidden costs or "NOT obvious" costs on AWS?
Costs which are generally ignored by people and can come as a shock:
It is recommended to deploy your application across multiple AZs for high availability. We assume that data transfer between these servers will be free since it's like an intranet, but that is not true: there are charges (around 10% of internet bandwidth charges) for data transfer across AZs in the same region.
Data transfer within AWS and across AWS regions is also charged.
On AWS Aurora, provisioned IOPS are enabled by default, which can lead to a huge bill.
If versioning is enabled on S3, then you need to pay for every version of every object.
These are not hidden charges, but they can still give you a shock:
Even on other RDS engines, if you use provisioned IOPS it can lead to a huge bill depending on usage.
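To put a rough number on the cross-AZ point, here is a small sketch; the $0.01 per GB per direction rate and the traffic volume are assumptions for illustration, so check the current data transfer pricing for your region:

# Rough estimate of cross-AZ data transfer cost for a chatty Multi-AZ app.
# The per-GB rate and the traffic volume are assumptions for illustration.
gb_per_day_between_azs = 500            # example replication/chatter traffic
rate_per_gb_each_direction = 0.01       # assumed cross-AZ rate
days = 30

monthly_cost = gb_per_day_between_azs * rate_per_gb_each_direction * 2 * days
print(f"Approx. cross-AZ transfer cost per month: ${monthly_cost:,.2f}")   # ~$300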
I think one of the most confusing parts of AWS is the 'EC2-Other' cost category. Most of these costs are based on utilization and can get out of control quickly. I did a write up on how to break down EC2-Other here: EC2-Other Cost Breakdown

Does the AWS Billing Management Dashboard take into account Free Tier usage

About a month ago I opened an AWS account to try out Amazon's own tutorial for EC2 services, only to give up after encountering an error.
Today I accessed my account once again, only to find out that three tasks have been running in the background the whole month. My Billing Management Dashboard shows a hefty total in the upper right, but in the "free usage" tier the only exceeded entry is S3 Puts, by about 10%.
I can't seem to find a source anywhere in the documentation explaining whether the total billing in the upper right takes the Free Tier into account or not. At the end of this month, will I be billed the entire amount or only for the usage above the Free Tier limits? I'm more or less okay with the latter, but I can't really afford the former.
I obviously opened a support ticket right away, but since I'm on the Basic plan I'm afraid they might only answer me after the current bill becomes due.
Thank you for any answers.
You will be billed only for the usage above the Free Tier limits.
All services that offer a free tier have limits on what you can use without being charged. Many services have multiple types of limits. For example, Amazon EC2 has limits on both the type of instance you can use and how many hours you can use in one month. Amazon S3 has a limit on how much storage you can use, and also on how often you can call certain operations each month. For example, the free tier covers the first 20,000 times you retrieve a file from Amazon S3, but you are charged for additional file retrievals. Each service has limits that are unique to that service.
Source: http://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/free-tier-limits.html
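As a tiny illustration of being billed only for the usage above the free allowance, here is a sketch; the request counts and the per-request price are made-up example values, not actual S3 pricing:

# Illustration of free-tier overage billing: you pay only for usage
# beyond the free allowance. The numbers below are made up for the example.
free_tier_get_requests = 20_000
used_get_requests = 22_000
price_per_1000_requests = 0.0004        # placeholder price, not a real quote

billable_requests = max(0, used_get_requests - free_tier_get_requests)
cost = billable_requests / 1000 * price_per_1000_requests
print(f"Billable requests: {billable_requests}")
print(f"Charge for the overage only: ${cost:.4f}")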