Cost breakdown for a Cloud Data Lake Implementation - amazon-web-services

We have a client in need of a data lake on the cloud.
We need to provide the client the chance to breakdown costs between their areas in just one AWS Account.
We are talking about query and data transfer costs also.

You can achieve this, at least most of it, with so calle dcost allocation tags:
https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/cost-alloc-tags.html
It is a way of assigning tags to resources that allows you to split up costs later on in the Cost explorer, allowing you to see a projects or "areas" share of the overall cost.
But not everything can be tagged yet, so there might be "blind spots" where you do not have data. You could split up the remaining difference evenly between all areas.

Related

Which one is better among DynamoDB and AWS ElasticSearchService for querying and storing logs?

I'm building a GUI tool for querying logs and looking for a cheaper option. DDB will fetch logs from an S3 bucket using lambda whereas ES will get the same logs streamed from CloudWatch. The thing is my queries are gonna be simple, not complex ones so I'm inclining towards DDB. Any inputs will be appreciated.
If you have fixed access patterns that can be queried using the partition key and sort key, staying within the limits of querying on a sort key, then DynamoDB is certainly a very good option. There are other factors, like size of the data, and number of records in a partition.
If you can do most of the filtering with the above, but need to further reduce the data based on values outside the key you still can use DynamoDB, but your milage may vary on how good it is. It becomes very dependent on data size and filtering complexity.
There is certainly a point where the complexity of the queries goes beyond what DynamoDB is designed for. At that point ES is often a good answer. Keep in mind that ES isn't a fully managed service, and it's a paid for the time it's running, regardless of use. I tend to try to avoid these types of services when I can, but if cost is not a significant factor for you, and you feel comfortable managing the ES cluster, then ES is a great option for advanced querying.

bigstore increasing almost linearly Google Cloud

I use many api's from Google Cloud. Recently I noticed that the bigstore is gradually increasing on a daily basis. I am worried that if this continues I wont be able to pay the bill.
I do not know however how to check where this increase is coming from. Is there a way to see which cloud functions are causing this increased traffic?
The reason I am surprised about the increase in the traffic of bigstore is because I have cron jobs that are running multiple times per day to store the data in BigQuery. I have not changed these settings, so I would assume that this traffic should not increase as shown on the chart.
One other explanation I can think of is that the amount of data that I am storing has increased, which is indeed true on a daily basis. But why does this increase the traffic?
What is the way to check this?
There are two main data sources you should use:
GCP-wide billing export. This will tell you an exact breakdown of your costs. This is important to make sure you target your effort where the cost is largest to you. It also provides some level of detail about what the usage is.
Enable access & storage logging. The access log will give you an exact accounting of incoming requests down to the number of bytes transferred. The storage logs give you similar granularity into the cost of storage itself.
In addition, if you have a snapshot of your bigstore, as time goes on and you replace or even rename files, your storage charges will increase because where once you had 2 views of the same storage, as the files change each file forks in 2 copies (one is the current view of your storage, one is the snapshot.)

Estimate AWS cost

The company which I work right now planning to use AWS to host a new website for a client. Their old website had roughly 75,000 sessions and 250,000 page views per year. We haven't used AWS before and I need to give a rough cost estimate to my project manager.
This new website is going to be mostly content-driven with a cms backend (probably WordPress) + a cost calculator for their services. Can anyone give me a rough idea about the cost to host such kind of a website in aws?
I have used simple monthly calculator with a single Linux t2.small 3 Year upfront which gave me around 470$.
(forgive my English)
The only way to know the cost is to know the actual services you will consume (Amazon EC2, Amazon EBS, database, etc). It is not possible to give an accurate "guess" of these requirements because it really does depend upon the application and usage patterns.
It is normally recommended that you implement the system and run it for a while before committing to Reserved Instances so that you have a chance to measure performance and test a few different instance types.
Be careful using T2 instances for production workloads. They are very powerful instances, but if the CPU Credits run out, the amount of CPU is limited.
Bottom line: Implement, measure, test. Then you'll know what is right for your needs.
Take Note
When you are new in AWS you have a 1 year free tier on a single t2.micro
Just pulled it out, looking into your requirement you may not need this
One load balancer and App server should be fine (Just use route53 to serve some static pages from s3 while upgrading or scalling )
Use of email subscription and processing of Some document can be handled with AWS Lambda, SNS and SWQ which may further reduce the cost ( you may reduce the server size and do all the hevay lifting from Lambda)
A simple webpage with 3000 request/monthly can be handled by T2 micro which is almost free for one year as mentioned above in the note
You don't have a lot of details in your question. AWS has a wide variety of services that you could be using in that scenario. To accurately estimate costs, you should gather these details:
What will the AWS storage be used for? A database, applications, file storage?
How big will the objects be? Each type of storage has different limits on individual file size, estimate your largest object size.
How long will you store these objects? This will help you determine static, persistent or container storage.
What is the total size of the storage you need? Again, different products have different limits.
How often do you need to do backup snapshots? Where will you store them?
Every cloud vendor has a detailed calculator to help you determine costs. However, to use them effectively you need to have all of these questions answered and you need to understand what each product is used for. If you would like to get a quick estimate of costs, you can use this calculator by NetApp.

Does AWS S3 offer any kind of rate limiting or protection against abuse for publicly accessible files?

I have a web app which serves media files (in other words pretty large) with public access. The files are hosted on S3. I'm wondering if AWS offers any kind of abuse-protection, for example detection or prevention against download hogs via some type of rate limiting. A scenario might be a single source re-downloading the same content repeatedly. I was hoping there might be some mechanism to detect that behavior and either take preventative action or notify me.
I'm looking at AWS docs and don't see anything but perhaps I'm not looking smartly enough.
How do folks who host files which are available publicly handle this?
S3 is mostly a file storage service, with elementary web server capabilities. I would highly recommend you place a CDN between your end users and S3. A good CDN will provide protection from the sort of abuse you are talking about, while also serving the files to the user more quickly.
If you are mostly worried about how the abuse will affect your bills (and they can get very large so its good to be concerned about this), I would suggest that you put in some billing alerts on your account that alarm when certain thresholds are reached.
I have a step alarms set on my account so that I know when it hits 25%, 50%, 75% and 100% of what I budget each month. That way, for example, if I hit an alarm that tells me I have used 25% of my budget in the first two days of the month, I know I better look into it.

Is there a way to realisticlly model or estimate AWS usage?

This question is specifically for aws and s3 but it could be for other cloud services as well
Amazon charges for s3 by storage (which is easily estimated with the amount of data stored times the price)
But also charges for requests which is really hard to estimate.. a page that has one image stored in s3 technically gets 1 request per user per visit, but using cache it reduces it. Further more, how can I understand the costs with 1000 users?
Are there tools that will extrapolate data of the current usage to give me estimates?
As you mention, its depending on a lot of different factors. Calculating the cost per GB is not that hard, but estimating the amount of requests is a lot more difficult.
There are no tools that I know of that will calculate the AWS S3 costs based on historic access logs or the like. These calculations would also not be that accurate.
What you can best do is calculate the costs based on the worst case scenario. In this calculation, you assume that nothing will be cached and will assume you will get peak requests all the time. In 99% of the cases, the outcome of that calculation will be lower than what will happen in reality.
If the outcome of that calculation is acceptable pricing wise, you're good to go. If it is way more than your budget allows, then you should think about various ways you could lower these costs (caching being one of them).
Cost calculation beforehand is purely to indicate if the project or environment could realistically stay below budget. Its not meant to provide a 100% accurate estimate beforehand. Most important thing to do is to keep track of the costs after everything has been deployed. Setup billing/budget alerts and check for possible savings.
The AWS pricing calculator should help you get started: https://calculator.aws/
Besides using the calculator, I tend to prefer the actual pricing pages of each individual service and calculate it within a spreadsheet. This gives me a more in-depth overview of the actual costs.