High number of aws target group health checks - amazon-web-services

I have an application load balancer with several registered target groups (and 6 availability zones in case it is important to mention).
There is one ec2 instance which is the registered target for all target groups. On the ec2 instance there is an nginx running.
For each target group I defined a health check with a custom url and with an interval of 60 seconds.
When I look at the nginx logs I expect to see the health check url for a particular target group every 60 seconds. But to my surprise I see that in 60 seconds there are groups of 8 calls like this:
172.31.25.32 - - [14/Feb/2022:16:00:29 +0000] "GET /path/target-group-X/ HTTP/1.1" 200 4 "-" "ELB-HealthChecker/2.0" rt=0.118 uct="0.000" uht="0.120" urt="0.120"
172.31.89.13 - - [14/Feb/2022:16:00:35 +0000] "GET /path/target-group-X/ HTTP/1.1" 200 4 "-" "ELB-HealthChecker/2.0" rt=0.080 uct="0.000" uht="0.080" urt="0.080"
172.31.75.210 - - [14/Feb/2022:16:00:43 +0000] "GET /path/target-group-X/ HTTP/1.1" 200 4 "-" "ELB-HealthChecker/2.0" rt=0.050 uct="0.000" uht="0.052" urt="0.052"
172.31.88.219 - - [14/Feb/2022:16:00:44 +0000] "GET /path/target-group-X/ HTTP/1.1" 200 4 "-" "ELB-HealthChecker/2.0" rt=0.059 uct="0.000" uht="0.060" urt="0.060"
172.31.9.236 - - [14/Feb/2022:16:00:51 +0000] "GET /path/target-group-X/ HTTP/1.1" 200 4 "-" "ELB-HealthChecker/2.0" rt=0.059 uct="0.000" uht="0.060" urt="0.060"
172.31.15.138 - - [14/Feb/2022:16:01:02 +0000] "GET /path/target-group-X/ HTTP/1.1" 200 4 "-" "ELB-HealthChecker/2.0" rt=0.010 uct="0.000" uht="0.008" urt="0.008"
172.31.49.23 - - [14/Feb/2022:16:01:07 +0000] "GET /path/target-group-X/ HTTP/1.1" 200 4 "-" "ELB-HealthChecker/2.0" rt=0.062 uct="0.000" uht="0.064" urt="0.064"
172.31.47.189 - - [14/Feb/2022:16:01:13 +0000] "GET /path/target-group-X/ HTTP/1.1" 200 4 "-" "ELB-HealthChecker/2.0" rt=0.094 uct="0.000" uht="0.092" urt="0.092"
172.31.25.32 - - [14/Feb/2022:16:01:29 +0000] "GET /path/target-group-X/ HTTP/1.1" 200 4 "-" "ELB-HealthChecker/2.0" rt=0.050 uct="0.000" uht="0.048" urt="0.048"
172.31.89.13 - - [14/Feb/2022:16:01:35 +0000] "GET /path/target-group-X/ HTTP/1.1" 200 4 "-" "ELB-HealthChecker/2.0" rt=0.049 uct="0.000" uht="0.048" urt="0.048"
172.31.75.210 - - [14/Feb/2022:16:01:43 +0000] "GET /path/target-group-X/ HTTP/1.1" 200 4 "-" "ELB-HealthChecker/2.0" rt=0.280 uct="0.000" uht="0.280" urt="0.280"
172.31.88.219 - - [14/Feb/2022:16:01:44 +0000] "GET /path/target-group-X/ HTTP/1.1" 200 4 "-" "ELB-HealthChecker/2.0" rt=0.050 uct="0.000" uht="0.048" urt="0.048"
172.31.9.236 - - [14/Feb/2022:16:01:52 +0000] "GET /path/target-group-X/ HTTP/1.1" 200 4 "-" "ELB-HealthChecker/2.0" rt=0.508 uct="0.000" uht="0.508" urt="0.508"
172.31.15.138 - - [14/Feb/2022:16:02:02 +0000] "GET /path/target-group-X/ HTTP/1.1" 200 4 "-" "ELB-HealthChecker/2.0" rt=0.176 uct="0.000" uht="0.172" urt="0.172"
172.31.49.23 - - [14/Feb/2022:16:02:07 +0000] "GET /path/target-group-X/ HTTP/1.1" 200 4 "-" "ELB-HealthChecker/2.0" rt=0.061 uct="0.000" uht="0.060" urt="0.060"
172.31.47.189 - - [14/Feb/2022:16:02:13 +0000] "GET /path/target-group-X/ HTTP/1.1" 200 4 "-" "ELB-HealthChecker/2.0" rt=0.057 uct="0.000" uht="0.056" urt="0.056"
There are 8 different local IP-s from which the calls are coming. If I take each such IP separately (e.g. 172.31.25.32), then indeed the health checks calls from that IP are arriving after exactly 60 seconds. But what is about the other calls? Why are so many?

I think at a minimum the target group is going to do a health check from each availability zone, or maybe each VPC subnet. You can probably map those IPs back to specific subnets in your VPC.
It definitely seems excessive, but you have to realize that behind the scenes a multi-az load balancer is really multiple servers, and each one is doing its own health check against your target server(s).

Related

Error 4xx AWS Elastic Beanstalk - Severe integrity

Good afternoon people,
I created an environment in Elastic Beanstalk and uploaded a NODEjs application an api with express.
She's working fine, all right.
But the integrity of the environment is reported as serious, and this monitoring attempt appears in the logs.
----------------------------------------
/var/log/nginx/access.log
----------------------------------------
172.31.46.198 - - [03/Nov/2021:19:14:13 +0000] "GET / HTTP/1.1" 404 139 "-" "ELB-HealthChecker/2.0" "-"
172.31.1.181 - - [03/Nov/2021:19:14:13 +0000] "GET / HTTP/1.1" 404 139 "-" "ELB-HealthChecker/2.0" "-"
172.31.30.127 - - [03/Nov/2021:19:14:13 +0000] "GET / HTTP/1.1" 404 139 "-" "ELB-HealthChecker/2.0" "-"
172.31.46.198 - - [03/Nov/2021:19:14:28 +0000] "GET / HTTP/1.1" 404 139 "-" "ELB-HealthChecker/2.0" "-"
172.31.1.181 - - [03/Nov/2021:19:14:28 +0000] "GET / HTTP/1.1" 404 139 "-" "ELB-HealthChecker/2.0" "-"
172.31.30.127 - - [03/Nov/2021:19:14:28 +0000] "GET / HTTP/1.1" 404 139 "-" "ELB-HealthChecker/2.0" "-"
172.31.46.198 - - [03/Nov/2021:19:14:43 +0000] "GET / HTTP/1.1" 404 139 "-" "ELB-HealthChecker/2.0" "-"
172.31.30.127 - - [03/Nov/2021:19:14:43 +0000] "GET / HTTP/1.1" 404 139 "-" "ELB-HealthChecker/2.0" "-"
172.31.1.181 - - [03/Nov/2021:19:14:43 +0000] "GET / HTTP/1.1" 404 139 "-" "ELB-HealthChecker/2.0" "-"
172.31.30.127 - - [03/Nov/2021:19:14:58 +0000] "GET / HTTP/1.1" 404 139 "-" "ELB-HealthChecker/2.0" "-"
172.31.1.181 - - [03/Nov/2021:19:14:58 +0000] "GET / HTTP/1.1" 404 139 "-" "ELB-HealthChecker/2.0" "-"
172.31.46.198 - - [03/Nov/2021:19:14:58 +0000] "GET / HTTP/1.1" 404 139 "-" "ELB-HealthChecker/2.0" "-"
172.31.30.127 - - [03/Nov/2021:19:15:13 +0000] "GET / HTTP/1.1" 404 139 "-" "ELB-HealthChecker/2.0" "-"
Does anyone know how I can fix this, without turning off the monitoring?
Good night people,
I found the problem, I didn't have anything set in my API's root on "/", so EB tried to monitor the api state and took a 404.
I set up a HealthCheck on the root "/" and normalized the 404 errors and integrity issue in the environment.

Regex in fail2ban not matching

Should be a simple thing, but with regex nothing is simple.
My fail2ban filter for wordpress sites:
[Definition]
#failregex = <HOST>.*POST.*(wp-login\.php|xmlrpc\.php).* 200
#failregex = <HOST>.*POST.*(wp-login\.php|xmlrpc\.php).* 200[ 0-9]*
failregex = ^"<HOST> .* "POST .*wp-login.php
#failregex = <HOST>.*POST.*wp-login.php .*
#failregex = ^"<HOST> .* "POST .*(wp-login.php|xmlrpc.php) HTTP/.*" (200|401)
ignoreregex =
As you can see I have tested multiple things, but I just don't get a match. Odly I do get a match on regex101.
And this is my logfile (those entires should be found):
"hostname 172.70.34.43 - - [18/May/2021:05:58:22 +0000] "POST //wp-login.php HTTP/1.1" 200 3069"
"hostname 172.70.34.43 - - [18/May/2021:05:58:22 +0000] "POST //wp-login.php HTTP/1.1" 200 3069"
"hostname 172.70.34.43 - - [18/May/2021:05:58:21 +0000] "POST //wp-login.php HTTP/1.1" 200 3069"
The logfile could also contain entries like this:
"hostname 172.69.63.84 - - [19/May/2021:09:23:01 +0000] "GET /feed/ HTTP/1.1" 200 14872"
"hostname 172.69.63.84 - - [19/May/2021:09:23:00 +0000] "GET /feed HTTP/1.1" 301 0"
"hostname 162.158.91.10 - - [19/May/2021:09:23:01 +0000] "POST /wp-cron.php?doing_wp_cron=1621416181.1017169952392578125000 HTTP/1.1" 200 0"
"hostname 172.68.57.138 - - [19/May/2021:09:22:34 +0000] "GET /versand/ HTTP/1.1" 200 27456"
"hostname 172.68.110.69 - - [19/May/2021:09:22:34 +0000] "POST /wp-cron.php?doing_wp_cron=1621416154.5001699924468994140625 HTTP/1.1" 200 0"
"hostname 172.69.34.217 - - [19/May/2021:09:19:48 +0000] "GET / HTTP/1.1" 200 32986"
And I have tested with fail2ban-regex, but with no success. I have also tried to replace < HOST > with the actual hostname, but in this case fail2ban will not accept the regex.
Running tests
=============
Use failregex filter file : wordpress, basedir: /etc/fail2ban
Use log file : /home/runcloud/logs/tmp.log
Use encoding : UTF-8
Results
=======
Failregex: 0 total
Ignoreregex: 0 total
Date template hits:
|- [# of hits] date format
| [3] Day(?P<_sep>[-/])MON(?P=_sep)ExYear[ :]?24hour:Minute:Second(?:\.Microseconds)?(?: Zone offset)?
`-
Lines: 3 lines, 0 ignored, 0 matched, 3 missed
this regex match (in this example the first 3 lines)
"POST request on either wp-login.php or xmlrp.php" as rapsli wanted
"POST\b.+\b(wp-login|xmlrp)\.php
in
"hostname 172.70.34.43 - - [18/May/2021:05:58:22 +0000] "POST //wp-login.php HTTP/1.1" 200 3069"
"hostname 172.70.34.43 - - [18/May/2021:05:58:22 +0000] "POST //wp-login.php HTTP/1.1" 200 3069"
"hostname 172.70.34.43 - - [18/May/2021:05:58:21 +0000] "POST //wp-login.php HTTP/1.1" 200 3069"
"hostname 172.69.63.84 - - [19/May/2021:09:23:01 +0000] "GET /feed/ HTTP/1.1" 200 14872"
"hostname 172.69.63.84 - - [19/May/2021:09:23:00 +0000] "GET /feed HTTP/1.1" 301 0"
"hostname 162.158.91.10 - - [19/May/2021:09:23:01 +0000] "POST /wp-cron.php?doing_wp_cron=1621416181.1017169952392578125000 HTTP/1.1" 200 0"
"hostname 172.68.57.138 - - [19/May/2021:09:22:34 +0000] "GET /versand/ HTTP/1.1" 200 27456"
"hostname 172.68.110.69 - - [19/May/2021:09:22:34 +0000] "POST /wp-cron.php?doing_wp_cron=1621416154.5001699924468994140625 HTTP/1.1" 200 0"
"hostname 172.69.34.217 - - [19/May/2021:09:19:48 +0000] "GET / HTTP/1.1" 200 32986"
https://regexr.com/5t8e3
needs to stand for the place with the IP. So this regex should work with fail2ban
failregex = "[a-z]* <HOST>.*(wp-login\.php|xmlrpc.php).*

Suddenly I am getting 502 on an End point, rest are working fine

I am serving django project with gunicorn, It running fine but after some time on one specific endpoint start giving 502. Other api end point still okay and giving proper response.
I already tried with gunicorn service settings
Current setting
ExecStart=/var/virtualenv/d/bin/gunicorn --workers 5 proj.wsgi:application -b :9008 --threads 8 -k gthread --timeout 120
47.247.243.245 - - [14/Jun/2019:15:02:11 +0000] "GET /api/user/1263/league/81/ HTTP/1.1" 200 291 "-" "okhttp/3.12.1"
157.32.16.35 - - [14/Jun/2019:15:02:11 +0000] "GET /api/v1/xyc-leaderboard/?contest=4973 HTTP/1.1" 200 43714 "-" "okhttp/3.12.1""
157.32.16.35 - - [14/Jun/2019:15:02:16 +0000] "GET /api/v1/xy/xy-team/15543 HTTP/1.1" 301 5 "-" "okhttp/3.12.1"
157.32.16.35 - - [14/Jun/2019:15:02:16 +0000] "GET /api/v1/xy/xy-team/15543/ HTTP/1.1" 502 5517 "-" "okhttp/3.12.1"
171.76.167.124 - - [14/Jun/2019:14:39:56 +0000] "GET /api/v1/xy/xy-team/15343/ HTTP/1.1" 502 182 "-" "okhttp/3.12.1"

AWS elasticbeanstalk worker received not allowed requests and stop to work. It need to be restart manually

I use AWS Elastic Beanstalk worker environment with SQS and cronjobs to do what I want.
But sometimes, my environment bug and stop to work (it needs to be restarted manually) because it received some unknown requests (not send by me of course) :
196.52.43.55 (-) - - [09/Jun/2017:00:33:11 +0000] "GET / HTTP/1.1" 400 226 "-" "-"
81.196.3.208 (-) - - [09/Jun/2017:01:45:30 +0000] "GET / HTTP/1.0" 200 4576 "-" "-"
195.154.214.162 (-) - - [09/Jun/2017:03:43:21 +0000] "GET //recordings/modules/phonefeatures.module HTTP/1.1" 404 471 "-" "python-requests/2.6.0 CPython/2.6.6 Linux/2.6.32-696.1.1.el6.x86_64"
195.154.214.162 (-) - - [09/Jun/2017:04:54:27 +0000] "GET //recordings/modules/phonefeatures.module HTTP/1.1" 404 471 "-" "python-requests/2.6.0 CPython/2.6.6 Linux/2.6.32-696.1.1.el6.x86_64"
Example of cron job I executed every minute
127.0.0.1 (-) - - [09/Jun/2017:00:14:59 +0000] "POST /workers/cron/search/detailsHTTP/1.1" 200 - "-" "aws-sqsd/2.3"
127.0.0.1 (-) - - [09/Jun/2017:00:14:59 +0000] "POST /workers/cron/positions HTTP/1.1" 200 60 "-" "aws-sqsd/2.3"
127.0.0.1 (-) - - [09/Jun/2017:00:15:01 +0000] "POST /queue/received HTTP/1.1" 200 10 "-" "aws-sqsd/2.3"
Do you have a solution for me? Do I need to change my VPC and/or EC2 group security?
My architecture is one Elasticsearch Application and one Elasticsearch Worker.
Thank you very much

AWS elastic beanstalk cannot return custom response code using resteasy?

I'm working on a web service using RESTEASY to set the response status code when get some exception.
First I tried resteasy exception mapper which works fine locally. The mapper code attached below. However, when I upload that WS into elastic beanstalk, that always return 500 (internal server error).
#Provider
public class LoadGridTileFailedExceptionMapper extends BaseExceptionMapper implements ExceptionMapper<LoadGridTileFailedException>
{
#Override
public Response toResponse(LoadGridTileFailedException e)
{
log(e.getMessage(), e);
return printMsg(e.getMessage(), DtmWebServiceReturnStatus.LOAD_GRID_TILE_FAILED_EXCEPTION_CODE);
}
}
Then I try just throw exception WebApplicationException(ex, DtmWebServiceReturnStatus.LOAD_GRID_TILE_FAILED_EXCEPTION_CODE) to get around exception mapping. The result is that I got a response status 498(LOAD_GRID_TILE_FAILED_EXCEPTION_CODE) wrapped in status code 500.
Apache Tomcat/7.0.27 - Error report HTTP Status 498 - type Status reportmessage description http.498Apache Tomcat/7.0.27
It seems that elastic beanstalk wrapped all exceptions throw out in the server side with status code 500?The question is how can I get around that feature and return the status code I set in response? Thank you.
UPDATE
Try more requests this morning and find something interesting:
Get the right return status in elastic beanstalk log snapshot
/var/log/tomcat7/localhost_access_log.txt
127.0.0.1 - - [09/Jan/2013:15:06:28 +0000] "GET /published/tile/003331330031 HTTP/1.1" 498 22
127.0.0.1 - - [09/Jan/2013:15:06:31 +0000] "GET /published/tile/003331330031 HTTP/1.1" 498 22
127.0.0.1 - - [09/Jan/2013:15:06:34 +0000] "GET /published/tile/003331330031 HTTP/1.1" 498 22
127.0.0.1 - - [09/Jan/2013:15:06:37 +0000] "GET /published/tile/003331330031 HTTP/1.1" 498 22
127.0.0.1 - - [09/Jan/2013:15:06:39 +0000] "GET /published/tile/003331330031 HTTP/1.1" 498 22
127.0.0.1 - - [09/Jan/2013:15:06:41 +0000] "GET /published/tile/003331330031 HTTP/1.1" 498 22
127.0.0.1 - - [09/Jan/2013:15:06:44 +0000] "GET /published/tile/003331330031 HTTP/1.1" 498 22
127.0.0.1 - - [09/Jan/2013:15:06:48 +0000] "GET /published/tile/003331330031 HTTP/1.1" 498 22
127.0.0.1 - - [09/Jan/2013:15:06:51 +0000] "GET /published/tile/003331330031 HTTP/1.1" 498 22
127.0.0.1 - - [09/Jan/2013:15:06:54 +0000] "GET /published/tile/003331330031 HTTP/1.1" 498 22
127.0.0.1 - - [09/Jan/2013:15:06:57 +0000] "GET /published/tile/003331330031 HTTP/1.1" 498 22
/var/log/httpd/elasticbeanstalk-access_log
10.28.215.233 (65.167.11.254, 10.28.215.233) - - [09/Jan/2013:15:06:28 +0000] "GET /published/tile/003331330031 HTTP/1.1" 498 22 "-" "-"
10.28.215.233 (65.167.11.254, 10.28.215.233) - - [09/Jan/2013:15:06:31 +0000] "GET /published/tile/003331330031 HTTP/1.1" 498 22 "-" "-"
10.28.215.233 (65.167.11.254, 10.28.215.233) - - [09/Jan/2013:15:06:34 +0000] "GET /published/tile/003331330031 HTTP/1.1" 498 22 "-" "-"
10.28.215.233 (65.167.11.254, 10.28.215.233) - - [09/Jan/2013:15:06:37 +0000] "GET /published/tile/003331330031 HTTP/1.1" 498 22 "-" "-"
10.28.215.233 (65.167.11.254, 10.28.215.233) - - [09/Jan/2013:15:06:39 +0000] "GET /published/tile/003331330031 HTTP/1.1" 498 22 "-" "-"
10.28.215.233 (65.167.11.254, 10.28.215.233) - - [09/Jan/2013:15:06:41 +0000] "GET /published/tile/003331330031 HTTP/1.1" 498 22 "-" "-"
10.28.215.233 (65.167.11.254, 10.28.215.233) - - [09/Jan/2013:15:06:44 +0000] "GET /published/tile/003331330031 HTTP/1.1" 498 22 "-" "-"
10.28.215.233 (65.167.11.254, 10.28.215.233) - - [09/Jan/2013:15:06:48 +0000] "GET /published/tile/003331330031 HTTP/1.1" 498 22 "-" "-"
10.28.215.233 (65.167.11.254, 10.28.215.233) - - [09/Jan/2013:15:06:51 +0000] "GET /published/tile/003331330031 HTTP/1.1" 498 22 "-" "-"
10.28.215.233 (65.167.11.254, 10.28.215.233) - - [09/Jan/2013:15:06:54 +0000] "GET /published/tile/003331330031 HTTP/1.1" 498 22 "-" "-"
10.28.215.233 (65.167.11.254, 10.28.215.233) - - [09/Jan/2013:15:06:57 +0000] "GET /published/tile/003331330031 HTTP/1.1" 498 22 "-" "-"
However in client side, still got 500 :-(
printMsg method:
protected Response printMsg(String msg, int intStatus)
{
// Need this due to the Resteasy bug
ServiceDataCollector.processRequest(true);
ResponseBuilder builder = Response.status(intStatus);
builder.type("text/plain");
builder.entity("ERROR: " + msg);
Response rep = builder.build();
LOG.error(rep.getStatus() + ":" + rep.toString());
return rep;
}
Some one help me to work the problem out. I had the httpd deployed in my AMI before tomcat server at 80. So the load balancer will interact with httpd server, which change the status code from tomcat to 500. Disable that httpd server will solve the problem. Thx for everyone's help.