I recently started using Spot block instances because they are guaranteed for a set number of hours.
When I submit a request with the on-demand price as the max price and a block duration of a few hours, the request gets stuck in the capacity-not-available status.
I kept trying from around noon until the evening, and the request stayed in the capacity-not-available status the whole time.
Then I requested regular Spot instances with the same parameters, and the request was fulfilled immediately.
Does anyone know whether this behavior is expected? If so, I don't see much value in Spot block instances.
I'm using the us-west-2 region, by the way.
Thanks in advance for any advice.
Source: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-bid-status.html
Holding:
If one or more request constraints are valid but can't be met yet,
or if there is not enough capacity, the request goes into a holding
state waiting for the constraints to be met.
The request options affect the likelihood of the request being fulfilled.
For example, if you specify a maximum price below the current Spot price,
your request stays in a holding state until the Spot price goes below
your maximum price. If you specify an Availability Zone group,
the request stays in a holding state until the Availability Zone constraint is met.
Maybe you could try another Availability Zone?
You could also check the current prices at https://aws.amazon.com/ec2/spot/pricing/ to see whether your bid is in range.
Related
I use Cloud Run for my apps and am trying to predict the costs using the GCP pricing calculator. I can't figure out why it's cheaper with CPU always allocated than with CPU allocated only during request processing, when the documentation says "When you opt in to 'CPU always allocated', you are billed for the entire lifetime of container instances".
Any explanation?
Thank you!
Cloud Run is serverless by default: you pay as you use. When a request comes in, an instance is created and started (the cold start) and your request is processed; that's when the timer starts. When your web server sends the response, the timer stops.
You pay for the memory and the CPU used during request processing, rounded up to the nearest 100ms. The instance then continues to live for about 15 minutes (by default; this can change at any moment) so that it is ready to process another request without starting a new instance (and waiting through another cold start).
So the instance keeps running even though you are no longer paying for it, because you pay only while a request is being processed.
When you set the CPU to always allocated, you pay for the full time the instance runs, whether it is handling requests or not. Google no longer has to eat the cost of idle instances waiting for requests, as it does under the pay-per-use model. You pay for that idle time instead, and at a lower rate.
It's like a Compute Engine instance running full time, and, as with Compute Engine, you get something similar to a sustained-use discount. That's why it's cheaper.
In general it depends on how you use cloud run. Google is giving some hints here: https://cloud.google.com/run/docs/configuring/cpu-allocation
To summarize the biggest pricing differences:
CPU only allocated during request processing: you pay for
- every request, on a per-request basis
- CPU and memory allocation time during the request
CPU always allocated: you pay for
- CPU and memory allocation, at a cheaper rate, for the time the instance is active
Compare the pricing here: https://cloud.google.com/run/pricing
So if you have a lot of requests that don't use many resources each, and not much variance in traffic, then "always allocated" might be a lot cheaper.
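To make the tradeoff concrete, here is a rough back-of-the-envelope comparison in Python. The rates and traffic numbers below are made-up placeholders, not real GCP prices (check the pricing page above for current values); the point is only the shape of the two cost models.

```python
# Illustrative comparison of the two Cloud Run billing models.
# All rates below are placeholder numbers, NOT real GCP prices.

def cost_request_based(requests_per_month, cpu_seconds_per_request,
                       cpu_rate, request_rate):
    """Pay per request, plus CPU time during request handling (higher rate)."""
    cpu_cost = requests_per_month * cpu_seconds_per_request * cpu_rate
    return cpu_cost + requests_per_month * request_rate

def cost_always_allocated(instance_hours_per_month, cpu_rate_discounted):
    """Pay for the whole instance lifetime, at a cheaper per-second rate."""
    return instance_hours_per_month * 3600 * cpu_rate_discounted

# A steady stream of cheap requests: 50 million/month, 100 ms of CPU each,
# served by roughly 2 instances running all month (~730 h each).
on_demand = cost_request_based(50_000_000, 0.1,
                               cpu_rate=0.000024, request_rate=0.0000004)
always_on = cost_always_allocated(2 * 730, cpu_rate_discounted=0.000018)

print(f"request-based:    ${on_demand:.2f}")   # $140.00
print(f"always-allocated: ${always_on:.2f}")   # $94.61
```

With a steady, high-volume workload the discounted always-on rate wins; with bursty or low traffic, the idle hours you pay for under "always allocated" flip the result.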
I created some AWS CloudWatch alarms, typically with a period of 1 hour / 1 datapoint. When an alarm fires, our service team is notified. During a "normal" workday, someone takes care of it and does the work of resetting some programs etc. But it also happens that no one has the time (or notices), and the alarm stays in the ALARM state.
Now I want the alarm to repeat if there has been no state change in the past 24 hours. Is that possible? I still haven't found the "easy" answer.
Thx!
EDIT:
I added a "daily_occurence_alert" that is controlled by event rules / time control. An additional alarm for each observed alarm, combined with an AND, works well.
It is a workaround, not a solution. I hope this feature will be added as standard in the future.
We have an AWS Elasticsearch cluster set up. However, our error rate alarm goes off at regular intervals. We calculate our error rate as:
((sum(4xx) + sum(5xx))/sum(ElasticsearchRequests)) * 100
However, if you look at the screenshot below, at 7:15 the 4xx count was 4 while the ElasticsearchRequests value was only 2. Based on the metrics info on the AWS Elasticsearch documentation page, ElasticsearchRequests should be the total number of requests, so it should clearly be greater than or equal to the 4xx count.
Can someone please help me understand what I am doing wrong here?
AWS definitions of these metrics are:
OpenSearchRequests (previously ElasticsearchRequests): The number of requests made to the OpenSearch cluster. Relevant statistics: Sum
2xx, 3xx, 4xx, 5xx: The number of requests to the domain that resulted in the given HTTP response code (2xx, 3xx, 4xx, 5xx). Relevant statistics: Sum
Please note the different terms used for the subjects of the metrics: cluster vs domain
To my understanding, OpenSearchRequests only counts requests that actually reach the underlying OpenSearch/Elasticsearch cluster, so some of the 4xx requests might not (e.g. 403 errors), hence the difference in the metrics.
Also, AWS only recommends comparing 5xx to OpenSearchRequests:
5xx alarms >= 10% of OpenSearchRequests: One or more data nodes might be overloaded, or requests are failing to complete within the idle timeout period. Consider switching to larger instance types or adding more nodes to the cluster. Confirm that you're following best practices for shard and cluster architecture.
I know this was posted a while back, but I've also struggled with this issue, so maybe I can add a few pointers.
First off, make sure your metrics are properly configured. For instance, some responses (4xx, for example) can take up to 5 minutes to register, while OpenSearchRequests is refreshed every minute. This makes for a very wonky graph that will definitely throw off your error rate.
In the picture above, I send a request that returns 400 every 5 seconds and a request that returns 200 every 0.5 seconds. The period in this case is 1 minute, so on average the error rate should be around 10%. As the green line shows, the requests sent are summed every minute, whereas the 4xx responses are summed every 5 minutes and are 0 for each minute in between. This produces an error rate spike every 5 minutes (since the OpenSearch requests are not multiplied by 5).
In the next image, the period is set to 5 minutes. Notice how this time the error rate stays around 10 percent.
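The period mismatch those two images show can be reproduced with a small simulation. The per-minute counts below are assumptions matching the 0.5 s / 5 s request pattern described above:

```python
# Simulate how mismatched metric refresh periods distort an error-rate graph.
# Assumption (from the description above): 4xx counts only land every
# 5 minutes, while total requests are summed every minute.

OK_PER_MIN = 120   # one 200 response every 0.5 s
BAD_PER_MIN = 12   # one 400 response every 5 s

minutes = 10
requests = [OK_PER_MIN + BAD_PER_MIN] * minutes           # refreshed each minute
fourxx = [BAD_PER_MIN * 5 if (m + 1) % 5 == 0 else 0      # lands every 5th minute
          for m in range(minutes)]

def error_rate(period):
    """Sum both metrics over `period` minutes, then divide."""
    rates = []
    for start in range(0, minutes, period):
        req = sum(requests[start:start + period])
        err = sum(fourxx[start:start + period])
        rates.append(100.0 * err / req)
    return rates

print(error_rate(1))  # spikes to ~45% every 5th minute, 0% elsewhere
print(error_rate(5))  # steady ~9% with a 5-minute period
```

The true error rate is always around 10%, but with a 1-minute period the graph alternates between 0% and a spike, which is exactly the sawtooth pattern that falsely triggers the alarm.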
When I look at your graph, the metrics look like they are based on different periods.
The second pointer is to account for periods when no data is coming in. The alarm's behavior may vary based on how you define the "treat missing data" parameter. In some cases, if no data comes in, your expression might keep the alarm in the ALARM state when in fact no new data is arriving. Some metrics return no value when no requests are made, while others return 0. In the former case, you can use the FILL(metric, value) function to specify what to return when no value is reported. Also experiment with what happens to your error rate when you divide by zero.
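Here is a quick sketch of what FILL(metric, value) does to the expression, using a hypothetical series where None stands in for a missing datapoint:

```python
# Sketch of CloudWatch's FILL(metric, value): replace missing datapoints
# before doing metric math, so the error-rate expression stays defined
# during quiet periods. The series below are made-up examples.

def fill(series, value):
    return [value if point is None else point for point in series]

errors   = [3, None, None, 0]    # no error datapoints in quiet minutes
requests = [30, None, None, 10]  # no requests at all in quiet minutes

def error_rate(errors, requests):
    rates = []
    for e, r in zip(fill(errors, 0), fill(requests, 0)):
        rates.append(None if r == 0 else 100.0 * e / r)  # undefined on 0/0
    return rates

print(error_rate(errors, requests))  # [10.0, None, None, 0.0]
```

Without the fill, a missing error datapoint in an otherwise busy minute would drop the whole expression; with it, quiet minutes at least become an explicit, predictable value you can feed to "treat missing data".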
Hope this message helps clarify a bit.
I have a path (mysite.com/myapiendpoint for sake of example) that is both resource intensive to service, and very prone to bot abuse. I need to rate limit access to that specific path to something like 10 requests per minute per client IP address. How can this be done?
I'm hosting off an EC2 instance with CloudFront and AWS WAF in front. I have the standard "Rate Based Rule" enabled, but its 2,000 requests per minute per IP address minimum is absolutely unusable for my application.
I was considering using API Gateway for this, and have used it in the past, but its rate limiting, as I understand it, is not based on IP address, so bots would simply use up the limit and legitimate users would constantly be denied access to the endpoint.
My site does not use sessions of any sort, so I don't think I could do any sort of rate limiting in the server itself. Also, please bear in mind my site is a one-man operation and I'm somewhat new to AWS :)
How can I limit the usage per IP to something like 10 requests per minute, preferably in WAF?
[Edit]
After more research I'm wondering if I could enable header forwarding to the origin (running node/express) and use a rate-limiter package. Is this a viable solution?
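For what it's worth, the core logic such a rate-limiter package implements is small. Below is a sketch of a per-IP sliding-window limiter in Python (the class name and the 10-per-minute numbers are just illustrative); an express package like express-rate-limit does essentially this for you at the origin.

```python
import time
from collections import defaultdict, deque

class PerIpRateLimiter:
    """Sliding-window limiter: allow at most `limit` requests per
    `window` seconds per client IP. Illustrative sketch only."""

    def __init__(self, limit=10, window=60.0):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        while q and now - q[0] >= self.window:  # evict requests outside window
            q.popleft()
        if len(q) >= self.limit:
            return False                        # over the limit: reject
        q.append(now)
        return True

limiter = PerIpRateLimiter(limit=10, window=60.0)
# 12 requests from one IP, one per second: the 11th and 12th are denied.
results = [limiter.allow("203.0.113.7", now=float(t)) for t in range(12)]
print(results)  # [True] * 10 + [False] * 2
```

One caveat if you go this route: behind CloudFront, the connection IP your origin sees is CloudFront's, so you would need to key on the forwarded viewer address (e.g. the X-Forwarded-For header) rather than the socket peer.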
I don't know if this is still useful to you - but I just got a tip from AWS support. If you add the rate limit rule multiple times, it effectively reduces the number of requests each time. Basically what happens is each time you add the rule, it counts an extra request for each IP. So say an IP makes a single request. If you have 2 rate limit rules applied, the request is counted twice. So basically, instead of 2000 requests, the IP only has to make 1000 before it gets blocked. If you add 3 rules, it will count each request 3 times - so the IP will be blocked at 667 requests.
The other thing they clarified is that the "window" is 5 minutes, but if the total is breached anywhere in that window, the IP is blocked immediately. I had thought WAF would only evaluate the requests after a full 5-minute period. For example, say you have a single rule for 2,000 requests in 5 minutes, and an IP makes 2,000 requests in the 1st minute, then only 10 requests over the next 4 minutes. I initially understood that the IP would only be blocked after minute 5 (because WAF evaluates a 5-minute window). But apparently, if the IP exceeds the limit anywhere in that window, it is blocked immediately. So if that IP makes 2,000 requests in minute 1, it will actually be blocked for minutes 2, 3, 4 and 5, and then allowed again from minute 6 onward.
This clarified a lot for me. Having said that, I haven't tested this yet. I assume the AWS support techie knows what he's talking about - but definitely worth testing first.
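If the support tip holds (an answer further down disputes it), the arithmetic for the effective per-IP threshold is simply the minimum divided by the number of stacked rules, rounded up:

```python
import math

# Effective per-IP threshold if each added rate-based rule counts every
# request once more, per the support tip above (untested, as noted).
WAF_MINIMUM = 2000  # the 2,000-requests-per-5-minutes floor at the time

def effective_limit(num_rules):
    return math.ceil(WAF_MINIMUM / num_rules)

for n in (1, 2, 3, 4):
    print(n, effective_limit(n))  # 2000, 1000, 667, 500
```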
AWS has now finally released an update that allows the rate limit to go as low as 100 requests every 5 minutes.
Announcement post: https://aws.amazon.com/about-aws/whats-new/2019/08/lower-threshold-for-aws-waf-rate-based-rules/
Adding the rule twice will not work: WAF rate-based rules count requests on a CloudWatch-logs basis, so each rule counts the 2,000 requests separately, and it would not help you.
You can use the AWS WAF security automations CloudFront template and choose the Lambda/Athena log parser; that way, request counting is performed against the S3 access logs, and you will also be able to block SQL injection, XSS, and bad-bot requests.
I am trying to learn how writes/updates work internally in DynamoDB. This is what I could find.
AWS Tutorial Link
"When your application writes data to a DynamoDB table and receives an HTTP 200 response (OK), all copies of the data are updated. The data will eventually be consistent across all storage locations, usually within one second or less."
For example: if my DynamoDB table has 50 partitions and is replicated across 3 Availability Zones in a region, what happens in DynamoDB
After it receives an API request to create an item
After it sends the 200 OK response to the client
I would really appreciate any document that talks about this, or hearing from you directly.
Thanks
DynamoDB, per this, replicates its data across 3 availability zones within the region.
So the question is how it manages the availability of the data.
Assume there is one receiver that accepts requests from users.
For write requests, the receiver uses an m/n quorum for data consistency, where:
n is the number of availability zones
m is ((n+1)/2), enough to maintain consistency.
In this case, that's 2/3.
When the receiver gets a write request, it sends the write to all 3 zones but waits for only 2 of them to respond. Once 2 zones have written the value, the receiver sends 200 OK to the user without waiting for zone 3.
Say the user now immediately wants to read the data that was just written.
For read requests the receiver uses 1/(number of availability zones), in this case 1/3.
The receiver asks all zones for the data; say zone A responds first, so its response is sent straight to the user.
Given the 2/3 write, assume the data is currently stored in zones A and B, and zone C is not yet updated.
Now, if zone A or B responds to the read, we get the value; if zone C responds, the result is not found. This is why AWS says DynamoDB is eventually consistent.
When we query with a strongly consistent read, the read quorum changes to 2/3, which guarantees the updated value is returned, because at any time 2 availability zones hold the newest value.
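The quorum behavior described above can be sketched as a toy model (purely illustrative; DynamoDB's actual internals are not public):

```python
import random

# Toy quorum model of the explanation above: 3 replicas, write quorum 2,
# eventually-consistent read quorum 1, strongly-consistent read quorum 2.

class Replica:
    def __init__(self):
        self.version, self.value = 0, None

def write(replicas, value, acks=2):
    version = max(r.version for r in replicas) + 1
    for r in random.sample(replicas, acks):  # only `acks` replicas apply it now
        r.version, r.value = version, value
    return version  # 200 OK is sent here; the 3rd replica catches up later

def read(replicas, quorum):
    polled = random.sample(replicas, quorum)
    return max(polled, key=lambda r: r.version).value  # newest version wins

replicas = [Replica() for _ in range(3)]
write(replicas, "item-v1")

# Eventually consistent (quorum 1): may hit the stale replica and miss.
# Strongly consistent (quorum 2): any 2 polled replicas must overlap with
# the 2 that acknowledged the write (2 + 2 > 3), so the new value is found.
print(read(replicas, quorum=2))  # always "item-v1"
```

The overlap argument in the last comment is the whole trick: with write quorum 2 and read quorum 2 out of 3 replicas, the two sets always share at least one replica, so a strongly consistent read can never miss an acknowledged write.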
Note: this is just a simplified explanation, and I am not associated with Amazon; they might be using other mechanisms behind the scenes.
Hope that helps