Create an alarm based on a CloudWatch insight query - amazon-web-services

My problem:
I would like to blacklist IPs that access my public AWS API Gateway endpoint more than 5 times an hour.
My proposed solution:
Requests are logged to CloudWatch
Requests are counted and grouped by IP
An alarm monitors the per-IP counts and sends a message to an SNS topic when the threshold is met
Lambda is triggered by the message and blacklists the IP
I am able to log and count the IPs by using the Insight query below:
fields ip
| stats count() as ipCount by ip
| filter ispresent(ip)
| sort ipCount desc
What I am struggling with is creating a CloudWatch alarm based on this query.
I have searched a lot without success. Any ideas on how to create such a metric or alarm?

I know you planned to write a custom Lambda, but check whether WAF already covers your use case. For example, the rate limit section of this article lets you define a request rate per 5-minute period for a given IP:
https://docs.aws.amazon.com/waf/latest/developerguide/classic-web-acl-rules-creating.html
If you are not doing anything else, a custom Lambda function may not be needed.
EDIT
If you want to go down the path of CloudWatch alarms, I think you can define a metric filter to create a CloudWatch metric. Then you can create the alarm based on the metric.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/MonitoringLogData.html
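For the metric-filter route, a rough sketch of the two API calls might look like the following. It assumes the access log is JSON with an `ip` field (as in the question's query) and uses boto3's `put_metric_filter` and `put_metric_alarm`. The log group, namespace, metric name, and topic ARN are placeholders, and the filter pattern is my best reading of the JSON filter syntax, so test it in the console first. One caveat: with an `ip` dimension CloudWatch creates a separate metric per IP, and an alarm watches a single metric, so the alarm built here covers one known IP only.

```python
def metric_filter_params(log_group, namespace="ApiAccess", metric="RequestsByIp"):
    """Build kwargs for logs.put_metric_filter (names here are placeholders)."""
    return {
        "logGroupName": log_group,
        "filterName": f"{metric}-filter",
        # Assumption: matches JSON events that carry an "ip" field.
        "filterPattern": "{ $.ip = * }",
        "metricTransformations": [{
            "metricName": metric,
            "metricNamespace": namespace,
            "metricValue": "1",              # each matching event counts as 1
            "dimensions": {"ip": "$.ip"},    # one metric per distinct IP
        }],
    }


def alarm_params(ip, topic_arn, namespace="ApiAccess", metric="RequestsByIp",
                 threshold=5):
    """Build kwargs for cloudwatch.put_metric_alarm for ONE specific IP."""
    return {
        "AlarmName": f"too-many-requests-{ip}",
        "Namespace": namespace,
        "MetricName": metric,
        "Dimensions": [{"Name": "ip", "Value": ip}],
        "Statistic": "Sum",
        "Period": 3600,                      # one hour, in seconds
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no requests means no alarm
        "AlarmActions": [topic_arn],         # SNS topic that triggers the Lambda
    }


def apply(log_group, ip, topic_arn):
    """Send both requests to AWS (needs boto3 and credentials)."""
    import boto3  # imported lazily so the builders above work without it
    boto3.client("logs").put_metric_filter(**metric_filter_params(log_group))
    boto3.client("cloudwatch").put_metric_alarm(**alarm_params(ip, topic_arn))
```

Because of the one-alarm-per-IP limitation, this fits a handful of known addresses better than blocking arbitrary callers.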

The best approach is to use the managed service AWS WAF, which integrates directly with your APIs.
The problems with a custom solution are latency (the time needed to aggregate and count the logs) and cost, because a Lambda has to run queries every time.
In API Gateway you can attach a WAF Web ACL directly and specify the rate you need (per 5 minutes, per 10 minutes, and so on). Rate limiting is exactly WAF's job.
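As an illustration, a rate-based rule could be described like this. Note that this sketch uses the newer WAFv2 rule shape (the dict you would pass in `Rules` to boto3's `wafv2.create_web_acl`), not WAF Classic from the linked article, and the rule name and limit are placeholders. WAF enforces a minimum value for `Limit`, so "5 per hour" cannot be expressed directly.

```python
def rate_limit_rule(name="PerIpRateLimit", limit=100, priority=0):
    """Build a WAFv2 rate-based rule that blocks IPs exceeding `limit`
    requests within WAF's evaluation window (5 minutes by default)."""
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {
            "RateBasedStatement": {
                "Limit": limit,
                "AggregateKeyType": "IP",   # count requests per source IP
            }
        },
        "Action": {"Block": {}},            # block while the IP is over the limit
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }
```

The resulting Web ACL would then be associated with the API Gateway stage, for example through `wafv2.associate_web_acl`.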

Related

AWS API Gateway monitoring usage per Lambda and per client (public ip)

My UI/application is using AWS API gateway functionality (with Lambdas).
I wanted to use AWS functionality in order to track different clients (public ip address) using different API gateway resources (Lambdas).
That is supposed to give me some insight into how my application is used. The plan is to track it per user/tenant, and the simplest solution would be to use the public IP address (and then move to something more sophisticated).
That address is in the CloudWatch/LogGroups/{lambda function name}. Each lambda is logging all headers from the http request:
def handler(event, context):
    logger.info(str(event))
I was planning to use CloudWatch metric filters for JSON (example here: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/FilterAndPatternSyntax.html), but I guess that will only let me create a numeric metric (based on the number of matches determined by the metric filter), so it will not let me achieve my goal.
My desired state would be:
A CloudWatch dashboard with a time-series chart showing the number of Lambda executions per client (public IP) per Lambda name, OR a CloudWatch dashboard with a text/table view showing the number of Lambda executions grouped by Lambda name and client (public IP) for a desired time range.
My request seems to be very basic and common. Is there a better way to achieve it? (I do not want to use sophisticated solutions like RUM or X-Ray.)
Thanks,
Indeed (thanks, Mark B), CloudWatch Logs Insights was the best solution.
I have built the following query:
fields @message
| filter requestContext.path like /Test/
| display `headers.X-Forwarded-For`, `requestContext.path`
| stats count() as ExecutionPerIpPerLambda by `headers.X-Forwarded-For`, `requestContext.path`
This generic query lets me visualize and chart Lambda executions per public IP and Lambda function.
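If the same numbers are needed outside the console, the rows returned by the Logs Insights API could be flattened roughly like this. The sketch assumes the result shape of boto3's `logs.get_query_results` (a list of rows, each a list of `{"field", "value"}` cells) and that the field names come back exactly as written in the query above. The helper name is made up.

```python
def rows_to_counts(results,
                   ip_field="headers.X-Forwarded-For",
                   path_field="requestContext.path",
                   count_field="ExecutionPerIpPerLambda"):
    """Turn Logs Insights result rows into {(client_ip, path): count}."""
    counts = {}
    for row in results:
        cells = {cell["field"]: cell["value"] for cell in row}
        if ip_field not in cells or path_field not in cells:
            continue  # skip rows without both grouping fields
        # X-Forwarded-For may hold a comma-separated chain of addresses;
        # by convention the leftmost entry is the original client.
        client_ip = cells[ip_field].split(",")[0].strip()
        key = (client_ip, cells[path_field])
        # Values arrive as strings, so convert before summing.
        counts[key] = counts.get(key, 0) + int(float(cells[count_field]))
    return counts


sample = [
    [{"field": "headers.X-Forwarded-For", "value": "203.0.113.7, 10.0.0.1"},
     {"field": "requestContext.path", "value": "/Test/a"},
     {"field": "ExecutionPerIpPerLambda", "value": "3"}],
    [{"field": "headers.X-Forwarded-For", "value": "203.0.113.7"},
     {"field": "requestContext.path", "value": "/Test/a"},
     {"field": "ExecutionPerIpPerLambda", "value": "2"}],
    [{"field": "headers.X-Forwarded-For", "value": "198.51.100.9"},
     {"field": "requestContext.path", "value": "/Test/b"},
     {"field": "ExecutionPerIpPerLambda", "value": "1"}],
]
print(rows_to_counts(sample))
```

Merging on the leftmost address means requests that passed through different proxies still count toward the same client.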
Thanks,

How can I add ip-based rate limits with longer intervals on API Gateway?

I have an API Gateway endpoint that I would like to limit access to. For anonymous users, I would like to set both daily and monthly limits (based on IP address).
AWS WAF has the ability to set rate limits, but the interval for them is a fixed 5 minutes, which is not useful in this situation.
API Gateway has the ability to add usage plans with longer term rate quotas that would suit my needs, but unfortunately they seem to be based on API keys, and I don't see a way to do it by IP.
Is there a way to accomplish what I'm trying to do using AWS Services?
Is it maybe possible to use a usage plan and automatically generate an api key for each user who wants to access the api? Or is there some other solution?
Without more context on your specific use-case, or the architecture of your system, it is difficult to give a “best practice” answer.
Like most things tech, there are a few ways you could accomplish this. One way would be to use a combination of CloudWatch API logging, Lambda, DynamoDB (with Streams) and WAF.
At a high level (and regardless of this specific need), I’d protect my API using WAF and the AWS security automations quickstart, found here, and associate it with my API Gateway as guided in the docs here. Once my WAF is set up and associated with my API Gateway, I’d enable CloudWatch API logging for API Gateway, as discussed here. Now that I have things set up, I’d create two Lambdas.
The first will parse the CloudWatch API logs and write the data I’m interested in (IP address and request time) to a DynamoDB table. To avoid unnecessary storage costs, I’d set the TTL on the record I’m writing to my DynamoDB table to twice the length of my analysis window; i.e., if I’m looking to limit it to 1000 requests per month, I’d set the TTL on my DynamoDB record to 2 months. From there, my CloudWatch API log group will have a subscription filter that sends log data to this Lambda, as described here.
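A minimal sketch of the record the first Lambda might write, assuming the IP has already been extracted from the log line. The attribute names (`ip`, `requested_at_ms`, `expires_at`) are hypothetical. The sketch relies on two facts: CloudWatch log event timestamps are in epoch milliseconds, and DynamoDB TTL expects a numeric attribute in epoch seconds.

```python
SECONDS_PER_DAY = 86400


def build_item(ip, timestamp_ms, window_days=30):
    """Build a DynamoDB item whose TTL is twice the analysis window."""
    return {
        "ip": ip,                                   # hypothetical partition key
        "requested_at_ms": int(timestamp_ms),       # hypothetical sort key
        # TTL attribute: epoch seconds, two windows after the request time.
        "expires_at": int(timestamp_ms) // 1000 + 2 * window_days * SECONDS_PER_DAY,
    }


print(build_item("203.0.113.7", 1_700_000_000_000))
```

With a boto3 table resource this would be written with `table.put_item(Item=build_item(...))`, and `expires_at` would be configured as the table's TTL attribute.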
My second Lambda is going to be doing the actual analysis and handling what happens when my metric is exceeded. This Lambda is going to be triggered by the write event to my DynamoDB table, as described here. I can have this Lambda run whatever analysis I want, but I’m going to assume that I want to limit access to 1000 requests per month for a given IP. When the new DynamoDB item triggers my Lambda, the Lambda is going to query the DynamoDB table for all records that were created in the preceding month from that moment, and that contain the IP address. If the number of records returned is less than or equal to 1000, it is going to do nothing. If it exceeds 1000 then the Lambda is going to update the WAF WebACL, and specifically UpdateIPSet to reject traffic for that IP, and that’s it. Pretty simple.
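The decision step of the second Lambda could be sketched as below, with the DynamoDB query and the WAF call left out. The function names are made up, and timestamps are assumed to be epoch seconds. WAFv2 IP sets take addresses in CIDR notation, which the standard `ipaddress` module can produce.

```python
import ipaddress

SECONDS_PER_DAY = 86400


def requests_in_window(timestamps, now, window_seconds):
    """Count requests that fall inside the trailing window ending at `now`."""
    return sum(1 for t in timestamps if now - window_seconds <= t <= now)


def should_block(timestamps, now, limit=1000, window_seconds=30 * SECONDS_PER_DAY):
    """True only when the IP exceeded `limit` requests in the window."""
    return requests_in_window(timestamps, now, window_seconds) > limit


def to_cidr(ip):
    """Format a single address the way a WAF IP set expects it."""
    return str(ipaddress.ip_network(ip))  # a bare IPv4 address becomes /32


now = 10_000_000
recent = [now - i for i in range(1001)]   # 1001 requests, all in the window
print(should_block(recent, now), to_cidr("203.0.113.7"))
```

If `should_block` returns True, the Lambda would add `to_cidr(ip)` to the IP set. In WAFv2, `update_ip_set` replaces the whole address list and needs the lock token from `get_ip_set`, so the existing addresses have to be read first.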
With the above process I have near real-time monitoring of requests to my API Gateway, in an efficient, cost-effective, scalable manner that can be deployed entirely serverless.
This is just one way to handle this. There are definitely other ways you could accomplish it, say with Kinesis and Elasticsearch, by analyzing CloudTrail events instead of logs, or by using a third-party solution that integrates with AWS.

Google Cloud Stackdriver: Metric grouped by ip

I want to create Stackdriver metrics based on the IP and the frequency of requests an IP makes.
Therefore I would like to group my load balancer logs by IP (the IP address of the requesting client) and send a notification if the number of requests exceeds a threshold.
Edit:
A workaround to achieve this.
Go to Stackdriver Logging and create a User-defined Metric that counts the total requests.
Fire an alarm when requests exceed a threshold.
The alarm calls a lambda function that creates a sink from Stackdriver to BigQuery
Execute queries to find the IP that is causing the trouble
In Stackdriver Logging, create a User-defined Metric (myMetric) [1] filtered on the desired IP address.
In Stackdriver Monitoring, find the resource type and metric by locating myMetric, then create the chart.
[1] https://cloud.google.com/logging/docs/logs-based-metrics/
There is no out-of-the-box solution, but there is a workaround using BigQuery:
Go to Stackdriver Logging and create a User-defined Metric that counts the total requests.
Fire an alarm when requests exceed a threshold.
The alarm calls a lambda function that creates a sink from Stackdriver to BigQuery
Execute queries to find the IP that is causing the trouble
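For the last step, the query might look something like the one built below. This assumes the exported load balancer logs keep the client address in `httpRequest.remoteIp`, and the table id is a placeholder. The helper only builds the SQL text. Running it would need a BigQuery client, which is not shown.

```python
import re


def top_ips_query(table, threshold):
    """Build a BigQuery query listing IPs with more than `threshold` requests."""
    # Only allow plain project.dataset.table ids, since the id is
    # interpolated into the SQL text.
    if not re.fullmatch(r"[A-Za-z0-9_.\-]+", table):
        raise ValueError(f"unexpected table id: {table!r}")
    return (
        "SELECT httpRequest.remoteIp AS ip, COUNT(*) AS requests\n"
        f"FROM `{table}`\n"
        "GROUP BY ip\n"
        f"HAVING requests > {int(threshold)}\n"
        "ORDER BY requests DESC"
    )


print(top_ips_query("my-project.lb_logs.requests", 100))
```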

How to scale an aws ecs service based on multiple alarms

We have a service running in aws ecs that we want to scale in and out based on 2 metrics.
Scale out when: cpu > 80% or connection_count > 9500
Scale in when: cpu < 50% and connection_count < 5000
We have access to both the CPU and connection count metrics and alarms in CloudWatch. However, we can't figure out how to set up a dynamic scaling policy like this based on both of them.
Using the standard AWS console interface for creating the auto scaling rules, I don't see any option for multiple alarms. Any links to a tutorial or AWS docs on this would be appreciated.
Based on the responses posted in the AWS support forums, nothing can be done for AND/OR/IF conditions (https://forums.aws.amazon.com/thread.jspa?threadID=94984).
It does mention, however, that a feature request has already been filed with the CloudWatch team.
The following is mentioned as a workaround:
"In the meantime, a possible workaround can be to create a custom metric using a custom script which would run after every five minutes and get the data points from the CloudWatch metrics, then perform the AND or OR operation and then push the output to a custom metric. You can then create a CloudWatch alarm which would monitor this custom metric and then trigger actions accordingly."

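The combining step of that workaround could be sketched as follows, using the thresholds from the question. The metric name and namespace are placeholders, and the +1 / -1 / 0 encoding is just one possible choice. The script would fetch the two datapoints, compute the signal, and publish it with boto3's `cloudwatch.put_metric_data`.

```python
def scaling_signal(cpu_percent, connection_count):
    """Collapse both metrics into one value: 1 = scale out, -1 = scale in, 0 = hold."""
    if cpu_percent > 80 or connection_count > 9500:     # OR condition
        return 1
    if cpu_percent < 50 and connection_count < 5000:    # AND condition
        return -1
    return 0


def metric_datum(signal, name="ScalingSignal"):
    """Build one MetricData entry for cloudwatch.put_metric_data."""
    return {"MetricName": name, "Value": signal, "Unit": "None"}


for cpu, conns in [(85, 100), (40, 9600), (40, 4000), (60, 4000)]:
    print(cpu, conns, scaling_signal(cpu, conns))
```

Two ordinary alarms on the custom metric (one for values of 1 and above, one for values of -1 and below) can then drive the scale-out and scale-in policies.
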
How does Amazon CloudWatch batch logs when streaming to AWS Lambda?

The AWS documentation indicates that multiple log event records are provided to Lambda when streaming logs from CloudWatch.
logEvents
The actual log data, represented as an array of log event records. The "id" property is a unique identifier for every log event.
How does CloudWatch group these logs?
Time? Count? Randomly, from my perspective?
Currently you get one Lambda invocation for every PutLogEvents batch that CloudWatch Logs receives for that log group. However, you should probably not rely on that, because AWS could change it at any time (for example, by batching more).
You can observe this behavior by running the CWL -> Lambda example in the AWS docs.
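Whatever the batching turns out to be, a handler can simply loop over whatever arrives. A minimal sketch, based on the documented payload format (base64-encoded, gzip-compressed JSON under `event["awslogs"]["data"]`); the function names are made up:

```python
import base64
import gzip
import json


def decode_log_events(event):
    """Return the list of log event records from a CloudWatch Logs invocation."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    payload = json.loads(gzip.decompress(compressed))
    # Control messages carry no application log data, so skip them.
    if payload.get("messageType") != "DATA_MESSAGE":
        return []
    return payload["logEvents"]


def handler(event, context):
    # Do not assume a batch size: process however many records arrive.
    for log_event in decode_log_events(event):
        print(log_event["id"], log_event["timestamp"], log_event["message"])
```

Logging `len(decode_log_events(event))` per invocation is an easy way to see how the batches are actually grouped in your account.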
Some AWS services, such as Elastic Load Balancing, let you configure the log interval; there you can choose between five- and sixty-minute intervals. You may not see a specific increment or parameter in the docs because the interval is configured per service.