AWS load balancer log analyzer - amazon-web-services

I'm new to AWS wolrd. My purpose is to find as soon as possible in case of problems using Elastic Load Balancer logs top ips from requests, if possible who they are or some inspection on it. I only found paid services. Does anyone know a free application or maybe a website that analyzes AWS ELB logs?

Completely free solution isn't available as I know. Btw, there are cheap solutions.
You can monitor your load balancer by "Access logs", "CloudWatch metrics", "Request tracing" and "CloudTrail logs".
I don't understand exactly what you want, but there are some possible solutions.
If you're afraid of being attacked and you need immediate protection (against security scans, DDoS etc), you can use AWS's own services. "AWS Shield Standard" is automatically included at no extra cost. Btw, "For added protection against DDoS attacks, AWS offers AWS Shield Advanced". https://docs.aws.amazon.com/shield/
WAF is also good against attacks. You can create rules, rule-actions etc. Sadly it's not completely free. It runs "pay-as-you-use" style. https://aws.amazon.com/waf/pricing/
you can store the access log in S3 and analyse it later, but this can be costly in the end (and it's not real time)
you can analyse your log records with Lambda function. In this case, you need to use some NoSQL or something to store states or logics. (Lambda and DynamoDB is "pay-as-you-use" style and cheap, but not for free)
Keep in mind that:
The load balancer and lambda also increments the corresponding CloudWatch metric (it's cheap, but not for free)
You will pay for the outgoing data transfer. I mean from AWS to internet 1TB/month/account is always free (through CloudFront): https://aws.amazon.com/free/
you should use AWS's own services if you want a cheap and good solution

Elastic Load Balancing provides access logs that capture detailed information about requests sent to your load balancer. Each log contains information such as the time the request was received, the client's IP address, latencies, request paths, and server responses.
But keep in mind that access logging is an optional feature of Elastic Load Balancing that is disabled by default. After you enable access logging for your load balancer, Elastic Load Balancing captures the logs and stores them in the Amazon S3 bucket that you specify as compressed files. You can disable access logging at any time.
There are many complex and paid application that returns information regard access log but i advise you a simple, easy to use website that i use when i want to see top requester on our load balancer.
Website is https://vegalog.net
You shoud only upload your log file taken from S3 bucket and it returns to you a report with top requester, who they are (using whois function), response time and other useful informations.

Related

Usage monitoring from whitelisted IPs

I need to setup a shared processing service that uses a load balancer and several EC2 instances to process incoming requests using a custom .NET application. My issue is that I need to be able to bill based on usage. Only white-listed IPs will be able to call the application, but each IP only gets a set number of calls before each call is a billable event.
Since the AWS documentation for the ELB states "We recommend that you use access logs to understand the nature of the requests, not as a complete accounting of all requests", I do not feel the Access Logs on the ELB is what I'm looking for.
The question I have is how to best manage this so that the accounting team has an easy report each month that says how many calls each client made.
Actually you can use Access logs and since access logs will be written to S3, you can query each IP with Athena by using standard SQL. You can analyze your logs and extract reports.
References:
https://docs.aws.amazon.com/athena/latest/ug/what-is.html
https://aws.amazon.com/premiumsupport/knowledge-center/athena-analyze-access-logs/

Universal bucket in Google Cloud Platform for content, which would determine the user's location and serve content from the closest server?

I am a mobile application developer. I use Google Cloud bucket to store 10-second videos and photos that I use in the application. Thousands of users use the application every day, and I want to use a CDN to ensure that the content of the application is delivered to users with minimum delays and maximum speed.
At the moment, I have only found an opportunity to create a bucket within one region to choose from: the USA, Europe, and Asia. How to create a universal bucket in Google Cloud Platform for storing application content, which would determine the user's location and serve content from the server closest to the user?
Thank you!
You can take advantage of Cloud CDN (fully) when you use an HTTP(S) Load Balancer since it's a global resource, meaning that your bucket traffic would be forwarded to a GCP Point of Presence (PoP) near all the clients worldwide.
If interested, here's how to do it, but keep in mind that you would (potentially) need to change your DNS value from current bucket to the new Load Balancer, and HTTP --> HTTPS redirection is not implemented by default (you can achieve an automatic redirect with this, but this is something you would need to setup separatedly).
On an additional note, depending on how many files (similar) are requested from your bucket, you will be charged less than it would be considered CDN traffic and not full GCP traffic.
So in short, your bucket data would be on your selected region, but CDN would be all around the globe meaning less latency, less price and your backend not being overwhelmed (this doesn't apply here since you are using buckets, but would apply for backend GCE instances).

What happens if a botnet uses http requests to drain my Amazon AWS server?

Im going to launch an app and Im worried if my competitors would just kill me by draining my Amazon AWS resources by using a botnet to send gibberish http requests to my Amazon AWS Account. I only got a few thousand dollars and I can not afford to be slaughtered like that.
In what other ways my competitors or haters could drain my server resources to drain my bank balance and how to prevent it?
please help. Im in very stressful situation where I cant get any answer for this question. Any suggestion is welcome.
Thanks.
As pointed by #morras, AWS Shield + WAF is good combination to protect your resources from spam requests. Since you have not given your architecture about what aws services you are actually using, I am trying to answer based on general term.
In AWS Shield there are two types
Standard - Automated mitigation techniques are built-into AWS Shield Standard, giving you protection against common, most frequently occurring infrastructure attacks. If you have technical expertise to create rules based on your request, you can go with this.
Advanced - AWS WAF comes free with this, and you will have 24x7 access to the AWS DDoS Response Team (DRT), support experts who apply manual mitigations for more complex and sophisticated DDoS attacks, directly create or update AWS WAF rules, and can recommend improvements to your AWS architectures.It also includes some cost protection against Amazon EC2, Elastic Load Balancing, Amazon CloudFront, and Amazon Route 53 usage spikes that could result from scaling during a DDoS attack
Please take a look at design resilient architecture in aws to mitigate DDOS.
update: If the AWS Shield Advanced team determines that the incident is a valid DDoS attack and that the underlying services scaled to absorb the attack, AWS provides account credit for charges incurred due to the attack. For example, if your legitimate CloudFront data transfer usage during the attack period was 20 GB, but due to the attack you incurred charges for 200 GB of incremental data transfer, AWS provides credit to offset the incremental data transfer charges. AWS automatically applies all credits toward your future monthly bills. Credits are applied towards AWS Shield and cannot be used for payment for other AWS services. Credits are valid for 12 months.
The services covered as per doc are Amazon CloudFront, Elastic Load Balancing, Route 53 or Amazon EC2 . Please check with AWS support, whether your services are covered or not.
There are a couple of options available. First of AWS provides AWS Shield which is a DDoS protection service. The standard subscription is free and covers most frequently occurring network and transport layer DDoS attacks.
On top of that you can consider using AWS WAF - Web Application Firewall which allows you to setup rules for what traffic to allow to your servers.
You can also use API gateway in front of your service and set throttling limits on how much traffic to allow through.
However I would question if you really need this? It sounds like you are worried that you would run up a huge AWS bill if competitors start sending you millions of requests. You can setup billing alerts so when your forecasted bill exceeds a specific threshold you are warned and you can either manually shut down the services that are being bombarded and figure out what the attacks look like, or you can have an automatic response via CloudWatch. I believe that you will find that you will not be under attack and that you should not worry too much on this attack vector at this time.

How to make a HTTP call reaching all instances behind amazon AWS load balancer?

I have a web app which runs behind Amazon AWS Elastic Load Balancer with 3 instances attached. The app has a /refresh endpoint to reload reference data. It need to be run whenever new data is available, which happens several times a week.
What I have been doing is assigning public address to all instances, and do refresh independently (using ec2-url/refresh). I agree with Michael's answer on a different topic, EC2 instances behind ELB shouldn't allow direct public access. Now my problem is how can I make elb-url/refresh call reaching all instances behind the load balancer?
And it would be nice if I can collect HTTP responses from multiple instances. But I don't mind doing the refresh blindly for now.
one of the way I'd solve this problem is by
writing the data to an AWS s3 bucket
triggering a AWS Lambda function automatically from the s3 write
using AWS SDK to to identify the instances attached to the ELB from the Lambda function e.g. using boto3 from python or AWS Java SDK
call /refresh on individual instances from Lambda
ensuring when a new instance is created (due to autoscaling or deployment), it fetches the data from the s3 bucket during startup
ensuring that the private subnets the instances are in allows traffic from the subnets attached to the Lambda
ensuring that the security groups attached to the instances allow traffic from the security group attached to the Lambda
the key wins of this solution are
the process is fully automated from the instant the data is written to s3,
avoids data inconsistency due to autoscaling/deployment,
simple to maintain (you don't have to hardcode instance ip addresses anywhere),
you don't have to expose instances outside the VPC
highly available (AWS ensures the Lambda is invoked on s3 write, you don't worry about running a script in an instance and ensuring the instance is up and running)
hope this is useful.
While this may not be possible given the constraints of your application & circumstances, its worth noting that best practice application architecture for instances running behind an AWS ELB (particularly if they are part of an AutoScalingGroup) is ensure that the instances are not stateful.
The idea is to make it so that you can scale out by adding new instances, or scale-in by removing instances, without compromising data integrity or performance.
One option would be to change the application to store the results of the reference data reload into an off-instance data store, such as a cache or database (e.g. Elasticache or RDS), instead of in-memory.
If the application was able to do that, then you would only need to hit the refresh endpoint on a single server - it would reload the reference data, do whatever analysis and manipulation is required to store it efficiently in a fit-for-purpose way for the application, store it to the data store, and then all instances would have access to the refreshed data via the shared data store.
While there is a latency increase adding a round-trip to a data store, it is often well worth it for the consistency of the application - under your current model, if one server lags behind the others in refreshing the reference data, if the ELB is not using sticky sessions, requests via the ELB will return inconsistent data depending on which server they are allocated to.
You can't make these requests through the load balancer, So you will have to open up the security group of the instances to allow incoming traffic from source other than the ELB. That doesn't mean you need to open it to all direct traffic though. You could simply whitelist an IP address in the security group to allow requests from your specific computer.
If you don't want to add public IP addresses to these servers then you will need to run something like a curl command on an EC2 instance inside the VPC. In that case you would only need to open the security group to allow traffic from some server (or group of servers) that exist in the VPC.
I solved it differently, without opening up new traffic in security groups or resorting to external resources like S3. It's flexible in that it will dynamically notify instances added through ECS or ASG.
The ELB's Target Group offers a feature of periodic health check to ensure instances behind it are live. This is a URL that your server responds on. The endpoint can include a timestamp parameter of the most recent configuration. Every server in the TG will receive the health check ping within the configured Interval threshold. If the parameter to the ping changes it signals a refresh.
A URL may look like:
/is-alive?last-configuration=2019-08-27T23%3A50%3A23Z
Above I passed a UTC timestamp of 2019-08-27T23:50:23Z
A service receiving the request will check if the in-memory state is at least as recent as the timestamp parameter. If not, it will refresh its state and update the timestamp. The next health-check will result in a no-op since your state was refreshed.
Implementation notes
If refreshing the state can take more time than the interval window or the TG health timeout, you need to offload it to another thread to prevent concurrent updates or outright service disruption as the health-checks need to return promptly. Otherwise the node will be considered off-line.
If you are using traffic port for this purpose, make sure the URL is secured by making it impossible to guess. Anything publicly exposed can be subject to a DoS attack.
As you are using S3 you can automate your task by using the ObjectCreated notification for S3.
https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
https://docs.aws.amazon.com/cli/latest/reference/s3api/put-bucket-notification.html
You can install AWS CLI and write a simple Bash script that will monitor that ObjectCreated notification. Start a Cron job that will look for the S3 notification for creation of new object.
Setup a condition in that script file to curl "http: //127.0.0.1/refresh" when the script file detects new object created in S3 it will curl the 127.0.0.1/refresh and done you don't have to do that manually each time.
I personally like the answer by #redoc, but wanted to give another alternative for anyone that is interested, which is a combination of his and the accepted answer. Using SEE object creation events, you can trigger a lambda, but instead of discovering the instances and calling them, which requires the lambda to be in the vpc, you could have the lambda use SSM (aka Systems Manager) to execute commands via a powershell or bash document on EC2 instances that are targeted via tags. The document would then call 127.0.0.1/reload like the accepted answer has. The benefit of this is that your lambda doesn't have to be in the vpc, and your EC2s don't need inbound rules to allow the traffic from lambda. The downside is that it requires the instances to have the SSM agent installed, which sounds like more work than it really is. There's AWS AMIs already optimized with SSM agent stuff, but installing it yourself in the user data is very simple. Another potential downside, depending on your use case, is that it uses an exponential ramp up for simultaneous executions, which means if you're targeting 20 instances, it runs one 1, then 2 at once, then 4 at once, then 8, until they are all done, or it reaches what you set for the max. This is because of the error recovery stuff it has built in. It doesn't want to destroy all your stuff if something is wrong, like slowly putting your weight on some ice.
You could make the call multiple times in rapid succession to call all the instances behind the Load Balancer. This would work because the AWS Load Balancers use round-robin without sticky sessions by default, meaning that each call handled by the Load Balancer is dispatched to the next EC2 Instance in the list of available instances. So if you're making rapid calls, you're likely to hit all the instances.
Another option is that if your EC2 instances are fairly stable, you can create a Target Group for each EC2 Instance, and then create a listener rule on your Load Balancer to target those single instance groups based on some criteria, such as a query argument, URL or header.

What is the expectation of privacy on an EC2 instance?

If I turn on a machine in EC2, what expectation of privacy do I have for my running processes, command line history, data stored on ephemeral disk, etc?
Can people at Amazon decide to take a look at what I'm running?
Could Amazon decide to do some profiling for the purposes of upselling?
Hi there! Looks like you're running Cassandra! Here's the optimal
tuning requirements for Cassandra on your m1.xlarge machine!
I can't seem to find anything in the docs...
This is the most applicable thing I found:
AWS only uses each a customer's content to provide the AWS services
selected by that customer and does not use customer content for any
other purposes. AWS treats all customer content the same and has no
insight into what type of content the customer chooses to store in
AWS. AWS simply makes available the compute, storage, database,
mobile, and network services selected by the customer. AWS does not
require access to customer content to provide its services.
http://aws.amazon.com/compliance/data-privacy-faq/
What you are asking about should be addressed in their "Data Privacy" policy (http://aws.amazon.com/agreement/) in their Customer Agreement page:
3.2 Data Privacy. We participate in the safe harbor programs described in the Privacy Policy. You may specify the AWS regions in which Your
Content will be stored and accessible by End Users. We will not move
Your Content from your selected AWS regions without notifying you,
unless required to comply with the law or requests of governmental
entities. You consent to our collection, use and disclosure of
information associated with the Service Offerings in accordance with
our Privacy Policy, and to the processing of Your Content in, and the
transfer of Your Content into, the AWS regions you select.
Here's a link to their "Privacy Policy":
http://aws.amazon.com/privacy/
So in essence, it's saying that you need to consent for them to gather information stored in your server. Now that's different from poking at the TCP ports on your machines from the outside. Amazon constantly runs port checking and traffic checking from the outside (it could be in their intranet too) to make sure you are complying with their customer agreement. For example, they can monitor that you are not hosting something illegal (through public content) or that you are not sending spam or robot traffic to hack into other servers.
Having said that, it's quite possible that they use some of these monitoring tools to check: ok this person has port so and so open. So he/she must be running this application and we can suggest something better for them.
Hope it helps.