Amazon cloude front and unexpected billing amount - amazon-web-services

I did some experiment with amazon cloud front and it seems that i done some basic things wrong as the total expenses are beyond my expectation and calculation.
I configured S3 bucket and upload all my content to this bucket and configured Amazon cloude front to pick data from here, while for some other files (only css and js) , i took original pull approach.
Avg traffic for my blog was around 100-200 Unique visit per day and which cost me around USD 1-2 per month.
I am hosting some contest this month and traffic has increased significant to 600-700 unique visit per day.
I checked Google analytic as well State counter
StateCounter
Page Loads (Total): 22,728
Google analytic :25,935 (Page view)
But Amazon showing me a bill of around 150+ USD with following information
$0.0075 per 10,000 HTTP Requests 179,211,964 Requests (For US zone)
i am not sure, where i did wrong.
Can anyone point or guide me to resource where i can learn about configuring it in proper way :(

You didn't mention looking at the Cloufront logs.
It would seem like the first step would be enabling the Cloudfront logs... then looking at the logs to identify the source of all of that traffic.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/AccessLogs.html
No matter what you're doing, 179,211,962 requests in one month works out to ~69 requests per second every second for the entire month (if my math is good), which does seem dramatically out of proportion to your page loads and views.
You can also monitor your estimated charges on a continuous basis with Cloudwatch to avoid future surprises.
http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/monitor_estimated_charges_with_cloudwatch.html

Related

Why am I billed for hourly AWS workspace when it's stopped?

I configured two hourly performance AWS workspaces about 2 months ago. The fee for each is 9.75/mo + .47/hour.
I used each maybe only 3 hours each so I would expect a bill of about $22.32 ((9.75 x 2) + (.47 x 6))but my bill was over $70 (which equals about 100 hours). I reached out to support and this is what they concluded:
As per checking with the Service Team, they have advised that WorkSpaces are billed on a monthly basis, and you pay only for the WorkSpaces you launch that allow end-users to access the documents, applications and resources they need with the device of their choice, including laptops, iPad, Kindle Fire, or Android tablets. So even if the service is on a stop mode, as long as users keep on accessing to the documents, desktops or even domains you have the WorkSpace associated to, will incur in charges.
I am the only user and I didn't interact with the stopped workspaces. I don't have any other AWS services interacting with these workspaces. I don't even understand how users could access "documents, desktops or even domains you have the WorkSpace associated to" if the workspace is stopped.
I have trouble drilling down to the necessary level of detail using the AWS billing dashboard - so I just feel like I have a blindspot here. Why am I getting billed so much? How can I get more details about these Workspace charges?
AWS Support actually called me. They were a big help in demystifying the charges. The short answer is that I'm a dummy. But I wanted to provide an explanation, info and links for others who want to get more details about their own usage and bill.
AWS has a few other helpful ways to get more info. The first was the bill itself (From your billing home page click on 'Bills' in the top left). The first thing I learned was that (1) my bill was $50 not $70. I might have combined my Jan and Feb bill or thought their 'estimate' was the bill. Either way - my baseline was wrong. (2) I also had an RDS instance running which accounted for $16. (3) Finally I could see an exact breakdown of workspace charges. There was the base monthly charge of 9.75. Then there was the .47 hourly charge for 22 hours which accounted for 10.34. The charges we're adding up - but the hours seemed too high.
This was great but I asked if there was a way to see when I used those 22 hours because that was still more than I had recorded myself. He directed me to the cost explorer. On the cost explorer specifically there was a histogram with a button on the top right to "View in Cost Explorer".
There I was able to view how much I was billed per day. Using the group by options on the top we grouped by 'Service' to see this
This showed that nearly all of those hours were on one day. I think that's when I set things up and might not have had AutoStop toggled. So just make sure you have your workspace configured for AutoStop if that's best for you.

How to use CloudFront efficiently for less popular website?

We are building a website which contains a lot of images and data. We have optimized a lot to make the website faster. Then we decided to use AWS CloudFront also to make it faster for all regions around the world. The app works faster after the integration of CloudFront.
But later we found that the data will load to CloudFront cache only when the website asks for it. So we are afraid that the initial load will take the same time as it used to take without the CDN because it loads from S3 to CDN first and then to the user.
Also, we used the default TTL values (ie., 24 hours). In our case, a user may log in once or twice per week to this website. So in that case also, the advantage of caching won't work here as well because the caching expires after 24 hours. Will raising the time of TTL (Maximum TTL) to a larger value solve the issue? Does it cost more money? And I also read that, increasing to a longer TTL is not a good idea as it has some disadvantages also for updating the data in s3.
Cloudfront will cache the response only after the first user requests for it. So it will be slow for the first user, but it will be significantly faster for every other user after the first user. So it does make sense to use Cloudfront.
Using the default TTL value is okay. Since most users will see the same content and the website has a lot of static components as well. Every user except the first user will see a fast response from your website. You could even reduce this to 10-12 hours depending on how often you expect your data to change.
There is no additional cost to increasing your TTL. However invalidation requests are charged. So if you want to remove a cache, there will be a cost added to it. So I would prefer to keep a short TTL as short as your data is expected to change, so you dont have to invalidate existing caches when your data changes. At the same time, maximum number of users can benefit from your CDN.
No additional charge for the first 1,000 paths requested for invalidation each month. Thereafter, $0.005 per path requested for invalidation.
UPDATE: In the event that you only have 1 user using the website over a long period of time (1 week or so), it might not be of much benefit to use CloudFront at all. CloudFront and all caching services are only effective when there are multiple users requesting for the same resources.
However you might still have a marginal benefit using CloudFront, as the requests will be routed from the edge location to S3 over AWS's backbone network which is much faster than the internet. But whether this is cost effective for you or not depends on how many users are using the website and how slow it is.
Aside from using CloudFront, you could also try S3 Cross Region Replication to increase your overall speed. Cross Region Replication can replicate your buckets to a different region as and when they are added in one region. This can help to minimize latency for users from other regions.

Audit Report API Limit Increase

I need to access Google Docs Audit Activity for my domain. The limit for the same is 1000 records in a single API call. Also, the number of API calls per day is 10K.
What is the way to increase the limits for API calls per day? Google Support is unable to answer this question and redirected me to Stack Overflow.
You may want to refer with this thread regarding quota increase for Report API:
There are several quotas for the Google Analytics APIs and Google APIs in general.
requests/day 0 of 50,000
requests/100seconds/user 100
requests/perView 10000
Your application can make 50000 requests per day by default. This can be extended but it takes a while to get permission when you are getting close to this limit around 80% its best to request an extension at that time.
Your user can max make 100 requests a second which must be something that has just gone up last I knew it was only 10 requests a second. User is denoted by IP address. There is no way to extend this quota more then the max you cant apply for it or pay for it.
Then there is the last quota the one you asked about. You can make max 10000 requests a day to a view. This isn't just application based if the user runs my application and your application then together we have only 10000 requests that can be made. This quota is a pain if you ask me. Now for the bad news there is no way to extend this quota you cant apply for it you cant pay for it and you cant beg the Google Analytics dev team (I have tried)
Answer: No you cant extend the per view per day quota limit.
If you encountered error, it is recommended to catch the exception and, using an exponential backoff algorithm, wait for a small delay before retrying the failed call.

How to reduce Amazon Cloudfront costs?

I have a site that has exploded in traffic the last few days. I'm using Wordpress with W3 Total Cache plugin and Amazon Cloudfront to deliver the images and files from the site.
The problem is that the cost of Cloudfront is quite huge, near $500 just the past week. Is there a way to reduce the costs? Maybe using another CDN service?
I'm new to CDN, so I might not be implementing this well. I've created a cloudfront distribution and configured it on W3 Total Cache Plugin. However, I'm not using S3 and don't know if I should or how. To be honest, I'm not quite sure what's the difference between Cloudfront and S3.
Can anyone give me some hints here?
I'm not quite sure what's the difference between Cloudfront and S3.
That's easy. S3 is a data store. It stores files, and is super-scalable (easily scaling to serving 1000's of people at once.) The problem is that it's centralized (i.e. served from one place in the world.)
CloudFront is a CDN. It caches your files all over the world so they can be served faster. If you squint, it looks like they are 'storing' your files, but the cache can be lost at any time (or if they boot up a new node), so you still need the files at your origin.
CF may actually hurt you if you have too few hits per file. For example, in Tokyo, CF may have 20 nodes. It may take 100 requests to a file before all 20 CF nodes have cached your file (requests are randomly distributed). Of those 100 requets, 20 of them will hit an empty cache and see an additional 200ms latency as it fetches the file. They generally cache your file for a long time.
I'm not using S3 and don't know if I should
Probably not. Consider using S3 if you expect your site to massively grow in media. (i.e. lots of use photo uploads.)
Is there a way to reduce the costs? Maybe using another CDN service?
That entirely depends on your site. Some ideas:
1) Make sure you are serving the appropriate headers. And make sure your expires time isn't too short (should be days or weeks, or months, ideally).
The "best practice" is to never expire pages, except maybe your index page which should expire every X minutes or hours or days (depending on how fast you want it updated.) Make sure every page/image says how long it can be cached.
2) As stated above, CF is only useful if each page is requested > 100's of times per cache time. If you have millions of pages, each requested a few times, CF may not be useful.
3) Requests from Asia are much more expensive than the from the US. Consider launching your server in Toyko if you're more popular there.
4) Look at your web server log and see how often CF is requesting each of your assets. If it's more often than you expect, your cache headers are setup wrong. If you setup "cache this for months", you should only see a handful of requests per day (as they boot new servers, etc), and a few hundred requests when you publish a new file (i.e. one request per CF edge node).
Depending on your setup, other CDNs may be cheaper. And depending on your server, other setups may be less expensive. (i.e. if you serve lots of small files, you might be better off doing your own caching on EC2.)
You could give cloudflare a go. It's not a full CDN so it might not have all the features as cloudfront, but the basic package is free and it will offload a lot of traffic from your server.
https://www.cloudflare.com
Amazon Cloudfront costs Based on 2 factor
Number of Requests
Data Transferred in GB
Solution
Reduce image requests. For that combine small images into one image and use that image
https://www.w3schools.com/css/tryit.asp?filename=trycss_sprites_img (image sprites)
Don't use CDN for video file because video size is high and this is responsible for too high in CDN coast
What components make up your bill? One thing to check with W3 Total Cache plugin is the number of invalidation requests it is sending to CloudFront. It's known to send a large amount of invalidations paths on each change, which can add up.
Aside from that, if your spend is predictable, one option is to use CloudFront Security Savings Bundle to save up to 30% by committing to a minimum amount for a one year period. It's self-service, so you can sign up in the console and purchase additional commitments as your usage grows.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/savings-bundle.html
Don't forget that cloudfront has 3 different price classes, which will influence how far your data is being replicated, but at the same time, it will make it cheaper.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/PriceClass.html
The key here is this:
"If you choose a price class that doesn’t include all edge locations, CloudFront might still occasionally serve requests from an edge location in a region that is not included in your price class. When this happens, you are not charged the rate for the more expensive region. Instead, you’re charged the rate for the least expensive region in your price class."
It means that you could use price class 100 (the cheapest one) and still get replication on regions you are not paying for <3

What good alternatives are there to Copperegg for monitoring EC2 instances?

I've been using Copperegg for a while now and have generally been happy with it until lately, where I have had a few issues. It's being used to monitor a number of EC2 instances that must be up 24/7.
Last week I was getting phantom alerts that servers had gone down when they hadn't, which I can cope with, but also I didn't get an alert when I should have done. One server had high CPU for over 5 mins when the alert should be triggered after 1 minute. The Copperegg support weren't not all that helpful, merely agreeing that an alert should have been triggered.
The latter of those problems is unacceptable and if it were to happen again outside of working hours then serious problems will follow.
So, I'm looking for alternative services that will do that same job. I've looked at Datadog and New Relic, but both have a significant problem in that they will only alert me of a problem 5 minutes after it's occurred, rather than the 1 minute I can get with Copperegg.
What else is out there that can do the same job and will also integrate with Pager Duty?
tl;dr : Amazon CloudWatch will do what you want and probably much much more.
I believe that Amazon actually offers a service that would accomplish your goal - CloudWatch (pricing). I'm going to take your points one by one. Note that I haven't actually used it before, but the documentation is fairly clear.
One server had high CPU for over 5 mins when the alert should be triggered after 1 minute
It looks like CloudWatch can be configured to send an alert (which I'll get to) after one minute of a condition being met:
One can actually set conditions for many other metrics as well - this is what I see on one of my instances, and I think that detailed monitoring (I use free), might have even more:
What else is out there that can do the same job and will also integrate with Pager Duty?
I'm assuming you're talking about this. It turns out the Pager Duty has a helpful guide just for integrating CloudWatch. How nice!
Pricing
Here's the pricing page, as you would probably like to parse it instead of me telling you. I'll give a brief overview, though:
You don't want basic monitoring, as it only gives you metrics once per five minutes (which you've indicated is unacceptable.) Instead, you want detailed monitoring (once every minute).
For an EC2 instance, the price for detailed monitoring is $3.50 per instance per month. Additionally, every alarm you make is $0.10 per month. This is actually very cheap if compared to CopperEgg's pricing - $70/mo versus maybe $30 per month for 9 instances and copious amounts of alarms. In reality, you'll probably be paying more like $10/mo.
Pager Duty's tutorial suggests you use SNS, which is another cost. The good thing: it's dirt cheap. $0.60 per million notifications. If you ever get above a dollar in a year for SNS, you need to perform some serious reliability improvements on your servers.
Other shiny things!
You're not just limited to Amazon's pre-packaged metrics! You can actually send custom metrics (time it took to complete a cronjob, whatever) to Cloudwatch via a PUT request. Quite handy.
Submit Custom Metrics generated by your own applications (or by AWS resources not mentioned above) and have them monitored by Amazon CloudWatch. You can submit these metrics to Amazon CloudWatch via a simple Put API request.
(from here)
Conclusion
So all in all: CloudWatch is quite cheap, can do 1-minute frequency stats, and will integrate with Pager Duty.
tl;dr: Server Density will do what you want, on top of that it has web checks and custom metrics too.
In short Server Density is a monitoring tool that will monitor all the relevant server metrics. You can take a look at this page where it’s all described.
One server had high CPU for over 5 mins when the alert should be triggered after 1 minute
Server Density’s open source agent collects and posts the data to their server every minute and you can decide yourself when that alert should be triggered. In the alert below you can see that the alert will alert 1 person after 1 minute and then repeatedly alert every 5 minutes.
There is a lot of other metrics that you can alert on too.
What else is out there that can do the same job and will also integrate with Pager Duty?
Server Density also integrates with PagerDuty. The only thing you need to do is to generate an api key at PagerDuty and then provide that in the settings.
Just provide the API key in the settings and you can then in check pagerduty as one of the alert recipients.
Pricing
You can find the pricing page here. I’ll give you a brief overview of it. The pricing starts at $10 for one server plus one web check and then get’s cheaper per server the more servers you add.
Everything will be monitored once every minute and there is no fees added for the amount of alerts added or triggered, even if that is an SMS to your phone number. The cost is slightly more expensive than the Cloudwatch example, but the support is good. If you used copperegg before they have a migration tool too.
Other shiny things!
Server Density allows you to monitor all the things! Then only thing you need to do is to send us custom metrics which you can do with a plugin written by yourself or by someone else.
I have to say that the graphs that Server Density provides is somewhat akin to eye candy too. Most other monitoring solutions I’ve seen out there have quite dull dashboards.
Conclusion
It will do the job for you. Not as cheap as CloudWatch, but doesn’t lock you in into AWS. It’ll give you 1 minute frequency metrics and integrate with pagerduty + a lot more stuff.