aws dynamodb free tier practical limit - amazon-web-services

As per AWS Dynamodb pricing
it allows 25 read capacity units which translates to 50 GetItem requests per second ( with eventual consistency and each item being less than 4kb).
Free Tier*
As part of AWS’s Free Tier, AWS customers can get started with Amazon
DynamoDB for free. DynamoDB customers get 25 GB of free storage, as
well as up to 25 write capacity units and 25 read capacity units of
ongoing throughput capacity (enough throughput to handle up to 200
million requests per month) and 2.5 million read requests from
DynamoDB Streams for free.
How does this translate to online web site? If more than 50 users make GET calls at the same time, do the requests gets throttled ? and eventually 400 response is returned? Does it mean free tier practical limits are, during bursts if 100 users log in the site and make GET calls at same time, we may see 400 responses from DynamoDB. Is this valid conclusion?
I understand that there may hot be 100 requests per every second. But if the site has active users of more than 200 at any time, does Dydnamodb free tier still work?

First, read up on DynamoDB burst capacity. Your DynamoDB tables should be able to sustain short bursts of higher throughput without throttling.
Second, the question of how database throughput capacity will "translate to online web site" is way too broad to answer directly. It entirely depends on how many database calls your web application makes per web request. Your question sounds like you are assuming every page load on your website results in exactly one DynamoDB request. That seems like an extremely unrealistic assumption.
You should be using a CDN to prevent as many requests as possible from even hitting your web server. You should be using a cache like Redis to prevent as many data lookups from hitting your database as possible. Then you should perform some benchmarking to determine the database throughput you are going to need, and evaluate that against the DynamoDB free tier.
How often does your web app content change? Do users submit changes to your content and if so, how often? These questions will directly affect how well your site can be cached, which will directly affect how often your app will have to go back to the database to get the latest content.
As for what response is returned to the user in the scenario where the DynamoDB request is throttled, that would be entirely dependent how you capture and handle errors in your web app.

Related

Apache SuperSet is very slow

Any recommendation on how to make superset faster?
Cache seems to load full data from the cache, I thought it load only old data from the cache, and real-time data from the database, isn't it like this?
What about some parallel processing?
This answer is valid as of Superset 0.37.0.
At the moment, dashboard performance is affected by a few different factors. I'll enumerate them below along with methods to improve performance:
Database concurrency limits can have an impact on dashboard performance. Dashboards load their information in parallel via concurrent web requests. Make sure that the database user provided allows enough concurrency that queries aren't being queued at the database layer.
Cache performance your caching layer should be able to return multiple results, if not in parallel, extremely quickly. We've had success leveraging S3 for our cache.
Cache hit percentage Superset will hit the cache only for queries that exactly match one that has been run recently. Otherwise the full query will fall through to the underlying analytical DB (Druid in this case). You can reduce the query load on Druid by using a less granular resolution on your dashboard - if it's possible to have it update less frequently, say a couple of times a day rather than in real-time, this can hit cache for all requests other than the first request in the new period under consideration.
Python Web Process Concurrency Limits make sure that your web application server can handle enough parallel requests. The browser will request multiple charts' data at the same time, and the system will need to be able to handle these requests in parallel.
Chart Query Performance As data is frequently requested, especially for real-time data from a database like Druid, optimizing the queries run by the charts can be very useful. I'd take a look at any virtual datasources that are being leveraged to see if they can be materialized or made more efficient.
Web browser concurrent request limits By default most web browsers limit the number of concurrent requests that can be made to the same FQDN. If you have more than 6 charts on the same dashboard, it can be helpful to balance requests across multiple FQDNs running Superset to get around this browser limitation. There's more information on the approach to that in the issue history on Github, but Superset does support this type of configuration.
The community is very interested in improving performance over time, and as such there have been recommendations to move all analytical queries to Celery as well as making other architectural changes to improve performance. I hope this description helps and that something in here will help you track down the issue!

How to use CloudFront efficiently for less popular website?

We are building a website which contains a lot of images and data. We have optimized a lot to make the website faster. Then we decided to use AWS CloudFront also to make it faster for all regions around the world. The app works faster after the integration of CloudFront.
But later we found that the data will load to CloudFront cache only when the website asks for it. So we are afraid that the initial load will take the same time as it used to take without the CDN because it loads from S3 to CDN first and then to the user.
Also, we used the default TTL values (ie., 24 hours). In our case, a user may log in once or twice per week to this website. So in that case also, the advantage of caching won't work here as well because the caching expires after 24 hours. Will raising the time of TTL (Maximum TTL) to a larger value solve the issue? Does it cost more money? And I also read that, increasing to a longer TTL is not a good idea as it has some disadvantages also for updating the data in s3.
Cloudfront will cache the response only after the first user requests for it. So it will be slow for the first user, but it will be significantly faster for every other user after the first user. So it does make sense to use Cloudfront.
Using the default TTL value is okay. Since most users will see the same content and the website has a lot of static components as well. Every user except the first user will see a fast response from your website. You could even reduce this to 10-12 hours depending on how often you expect your data to change.
There is no additional cost to increasing your TTL. However invalidation requests are charged. So if you want to remove a cache, there will be a cost added to it. So I would prefer to keep a short TTL as short as your data is expected to change, so you dont have to invalidate existing caches when your data changes. At the same time, maximum number of users can benefit from your CDN.
No additional charge for the first 1,000 paths requested for invalidation each month. Thereafter, $0.005 per path requested for invalidation.
UPDATE: In the event that you only have 1 user using the website over a long period of time (1 week or so), it might not be of much benefit to use CloudFront at all. CloudFront and all caching services are only effective when there are multiple users requesting for the same resources.
However you might still have a marginal benefit using CloudFront, as the requests will be routed from the edge location to S3 over AWS's backbone network which is much faster than the internet. But whether this is cost effective for you or not depends on how many users are using the website and how slow it is.
Aside from using CloudFront, you could also try S3 Cross Region Replication to increase your overall speed. Cross Region Replication can replicate your buckets to a different region as and when they are added in one region. This can help to minimize latency for users from other regions.

Audit Report API Limit Increase

I need to access Google Docs Audit Activity for my domain. The limit for the same is 1000 records in a single API call. Also, the number of API calls per day is 10K.
What is the way to increase the limits for API calls per day? Google Support is unable to answer this question and redirected me to Stack Overflow.
You may want to refer with this thread regarding quota increase for Report API:
There are several quotas for the Google Analytics APIs and Google APIs in general.
requests/day 0 of 50,000
requests/100seconds/user 100
requests/perView 10000
Your application can make 50000 requests per day by default. This can be extended but it takes a while to get permission when you are getting close to this limit around 80% its best to request an extension at that time.
Your user can max make 100 requests a second which must be something that has just gone up last I knew it was only 10 requests a second. User is denoted by IP address. There is no way to extend this quota more then the max you cant apply for it or pay for it.
Then there is the last quota the one you asked about. You can make max 10000 requests a day to a view. This isn't just application based if the user runs my application and your application then together we have only 10000 requests that can be made. This quota is a pain if you ask me. Now for the bad news there is no way to extend this quota you cant apply for it you cant pay for it and you cant beg the Google Analytics dev team (I have tried)
Answer: No you cant extend the per view per day quota limit.
If you encountered error, it is recommended to catch the exception and, using an exponential backoff algorithm, wait for a small delay before retrying the failed call.

Azure SQL database type

I am making a MVC website that has a SQL database for storage on Azure. Potentially there will be many hundreds and possibly thousands of transactions per day to the website via a web service.
What type of database should I use? Should it be the web retired version, standard or any of the other types? What is the cheapest, that still works well with the traffic of many hundreds and possibly thousands of transactions per day.
Thanks in advance.
Your description of your requirements equates to a light transaction workload - "hundreds and possibly thousands per day". It's difficult to give a concrete answer with such little information. However, assuming you're OK with the 2GB database size, the Basic tier is where I would start. You can always change your service tier and/or performance level if you find you need more.
Be sure to check the Overview of the Performance Model chart to find the service tier and performance level that meets your requirements. Notice that Basic benchmarks are per hour, Standard are per minute, and Premium are per second.
Also, look at the SQL Database pricing page and you will see the pricing for a Basic is going to be cheaper than an equivalent Web Tier.

How to reduce Amazon Cloudfront costs?

I have a site that has exploded in traffic the last few days. I'm using Wordpress with W3 Total Cache plugin and Amazon Cloudfront to deliver the images and files from the site.
The problem is that the cost of Cloudfront is quite huge, near $500 just the past week. Is there a way to reduce the costs? Maybe using another CDN service?
I'm new to CDN, so I might not be implementing this well. I've created a cloudfront distribution and configured it on W3 Total Cache Plugin. However, I'm not using S3 and don't know if I should or how. To be honest, I'm not quite sure what's the difference between Cloudfront and S3.
Can anyone give me some hints here?
I'm not quite sure what's the difference between Cloudfront and S3.
That's easy. S3 is a data store. It stores files, and is super-scalable (easily scaling to serving 1000's of people at once.) The problem is that it's centralized (i.e. served from one place in the world.)
CloudFront is a CDN. It caches your files all over the world so they can be served faster. If you squint, it looks like they are 'storing' your files, but the cache can be lost at any time (or if they boot up a new node), so you still need the files at your origin.
CF may actually hurt you if you have too few hits per file. For example, in Tokyo, CF may have 20 nodes. It may take 100 requests to a file before all 20 CF nodes have cached your file (requests are randomly distributed). Of those 100 requets, 20 of them will hit an empty cache and see an additional 200ms latency as it fetches the file. They generally cache your file for a long time.
I'm not using S3 and don't know if I should
Probably not. Consider using S3 if you expect your site to massively grow in media. (i.e. lots of use photo uploads.)
Is there a way to reduce the costs? Maybe using another CDN service?
That entirely depends on your site. Some ideas:
1) Make sure you are serving the appropriate headers. And make sure your expires time isn't too short (should be days or weeks, or months, ideally).
The "best practice" is to never expire pages, except maybe your index page which should expire every X minutes or hours or days (depending on how fast you want it updated.) Make sure every page/image says how long it can be cached.
2) As stated above, CF is only useful if each page is requested > 100's of times per cache time. If you have millions of pages, each requested a few times, CF may not be useful.
3) Requests from Asia are much more expensive than the from the US. Consider launching your server in Toyko if you're more popular there.
4) Look at your web server log and see how often CF is requesting each of your assets. If it's more often than you expect, your cache headers are setup wrong. If you setup "cache this for months", you should only see a handful of requests per day (as they boot new servers, etc), and a few hundred requests when you publish a new file (i.e. one request per CF edge node).
Depending on your setup, other CDNs may be cheaper. And depending on your server, other setups may be less expensive. (i.e. if you serve lots of small files, you might be better off doing your own caching on EC2.)
You could give cloudflare a go. It's not a full CDN so it might not have all the features as cloudfront, but the basic package is free and it will offload a lot of traffic from your server.
https://www.cloudflare.com
Amazon Cloudfront costs Based on 2 factor
Number of Requests
Data Transferred in GB
Solution
Reduce image requests. For that combine small images into one image and use that image
https://www.w3schools.com/css/tryit.asp?filename=trycss_sprites_img (image sprites)
Don't use CDN for video file because video size is high and this is responsible for too high in CDN coast
What components make up your bill? One thing to check with W3 Total Cache plugin is the number of invalidation requests it is sending to CloudFront. It's known to send a large amount of invalidations paths on each change, which can add up.
Aside from that, if your spend is predictable, one option is to use CloudFront Security Savings Bundle to save up to 30% by committing to a minimum amount for a one year period. It's self-service, so you can sign up in the console and purchase additional commitments as your usage grows.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/savings-bundle.html
Don't forget that cloudfront has 3 different price classes, which will influence how far your data is being replicated, but at the same time, it will make it cheaper.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/PriceClass.html
The key here is this:
"If you choose a price class that doesn’t include all edge locations, CloudFront might still occasionally serve requests from an edge location in a region that is not included in your price class. When this happens, you are not charged the rate for the more expensive region. Instead, you’re charged the rate for the least expensive region in your price class."
It means that you could use price class 100 (the cheapest one) and still get replication on regions you are not paying for <3