I have created an Amazon EC2 instance and I am hosting a Flask server on it (the public IP of the server is known only to another server; it is not meant to be used by clients, only by that other machine).
For some reason, I am seeing some weird network activity.
From the logs:
162.142.125.10 - - [18/Apr/2022 19:45:39] "GET / HTTP/1.1" 200 -
118.123.105.85 - - [18/Apr/2022 20:06:30] "GET / HTTP/1.0" 200 -
198.235.24.20 - - [18/Apr/2022 22:37:16] "GET / HTTP/1.1" 200 -
128.14.209.250 - - [19/Apr/2022 01:24:07] "GET / HTTP/1.1" 200 -
128.14.209.250 - - [19/Apr/2022 01:24:15] code 400, message Bad request version ('À\x14À')
128.14.209.250 - - [19/Apr/2022 07:05:32] "▬♥☺ ±☺ ♥♥Ýfé$0±6nu♀¤♫ëe éSV∟É#☼ß↨♠\ VÀ◄ÀÀ‼À À¶À" HTTPStatus.BAD_REQUEST -
I have looked up all of these IPs and they are spread across the globe.
Why am I getting these kinds of requests? What are they probably trying to achieve?
[EDIT]
162.142.125.10 -> https://about.censys.io/
118.123.105.85 -> ChinaNet Sichuan Province Network
198.235.24.20 -> Palo Alto Networks Inc
128.14.209.250 -> zl-dal-us-gp1-wk123.internet-census.org
As others said, it's common for bots and (ethical?) hackers around the world to scan your machine if it's on a public network.
Your assumption that "the public ip of the server is known only to another server" simply isn't true.
If you want to achieve that, you should place your server inside a private VPC subnet and/or allow the traffic only from the specific server via Security Group configuration.
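For example, here is a minimal sketch with boto3 that restricts inbound traffic on the Flask port to a single peer; the security group ID, port, and CIDR below are placeholders for your own values:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",  # the security group attached to your instance
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 5000,            # the port your Flask server listens on
        "ToPort": 5000,
        "IpRanges": [{
            "CidrIp": "203.0.113.10/32",  # the one server that should reach it
            "Description": "calling server only",
        }],
    }],
)

Remember to also remove any existing 0.0.0.0/0 rule for that port, otherwise the scanners will still get through.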
We're currently experiencing an issue with our GCP Kubernetes cluster, which is forwarding client requests to pods whose IPs were previously assigned to other pods in the cluster. We can see this by using the following query in Logs Explorer:
resource.type="http_load_balancer"
httpRequest.requestMethod="GET"
httpRequest.status=404
Snippet from one of the logs:
httpRequest: {
  latency: "0.017669s"
  referer: "https://asdf.com/"
  remoteIp: "5.57.50.217"
  requestMethod: "GET"
  requestSize: "34"
  requestUrl: "https://[asdf.com]/api/service2/[...]"
  responseSize: "13"
  serverIp: "10.19.160.16"
  status: 404
  userAgent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"
}
...where the requestUrl property indicates the incoming URL to the load balancer.
Then I search for the IP 10.19.160.16 to find out which pod the IP is assigned to:
c:\>kubectl get pods -o wide | findstr 10.19.160.16
service1-675bfc4f97-slq6g 1/1 Terminated 0 40h 10.19.160.16 gke-namespace-te-namespace-te-153a9649-p2mg
service2-574d69cf69-c7knp 0/1 Error 0 3d16h 10.19.160.16 gke-namespace-te-namespace-te-153a9649-p2mg
service3-6db4c97784-428pq 1/1 Running 0 16h 10.19.160.16 gke-namespace-te-namespace-te-153a9649-p2mg
So based on requestUrl, the request should have been sent to service2. Instead, what we see is that it gets sent to service3, because service3 has been given the IP that service2 used to have; in other words, it seems that the cluster still thinks service2 is holding on to the IP 10.19.160.16. The effect is that service3 returns status code 404 because it doesn't recognize the endpoint.
This behavior only stops if we manually delete the pods in a failed state (e.g. Error or Terminated) using the kubectl delete pod ... command.
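For reference, failed pods can also be cleaned up in bulk; a rough sketch using the official Kubernetes Python client, with the namespace name as a placeholder:

from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

namespace = "my-namespace"  # placeholder
# Pods displayed by kubectl as Error/Terminated are typically in the Failed phase.
failed = v1.list_namespaced_pod(namespace, field_selector="status.phase=Failed")
for pod in failed.items:
    print(f"deleting {pod.metadata.name}")
    v1.delete_namespaced_pod(pod.metadata.name, namespace)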
We suspect that this behavior started since we upgraded our cluster to v1.23 which required us to migrate away from extensions/v1beta1 to networking.k8s.io/v1 as described in https://cloud.google.com/kubernetes-engine/docs/deprecations/apis-1-22.
Our test environment uses pre-emptible VMs, and whilst we're not 100% sure (but pretty close), it seems that the pods end up in the Error state after a node is pre-empted.
Why does the cluster still think that a dead pod has the IP it used to have? Why does the problem go away after deleting the failed pods? Shouldn't they have been cleaned up after a node pre-emption?
Gari Singh provided the answer in the comment.
In Slack, I have set up an app with a slash command. The app works well when I use a local ngrok server.
However, when I deploy the app server to PCF, it is returning 502 errors:
[CELL/0] [OUT] Downloading droplet...
[CELL/SSHD/0] [OUT] Exit status 0
[APP/PROC/WEB/0] [OUT] Exit status 143
[CELL/0] [OUT] Cell e6cf018d-0bdd-41ca-8b70-bdc57f3080f1 destroying container for instance 28d594ba-c681-40dd-4514-99b6
[PROXY/0] [OUT] Exit status 137
[CELL/0] [OUT] Downloaded droplet (81.1M)
[CELL/0] [OUT] Cell e6cf018d-0bdd-41ca-8b70-bdc57f3080f1 successfully destroyed container for instance 28d594ba-c681-40dd-4514-99b6
[APP/PROC/WEB/0] [OUT] ⚡️ Bolt app is running! (development server)
[OUT] [APP ROUTE] - [2021-12-23T20:35:11.460507625Z] "POST /slack/events HTTP/1.1" 502 464 67 "-" "Slackbot 1.0 (+https://api.slack.com/robots)" "10.0.1.28:56002" "10.0.6.79:61006" x_forwarded_for:"3.91.15.163, 10.0.1.28" x_forwarded_proto:"https" vcap_request_id:"7fe6cea6-180a-4405-5e5e-6ba9d7b58a8f" response_time:0.003282 gorouter_time:0.000111 app_id:"f1ea0480-9c6c-42ac-a4b8-a5a4e8efe5f3" app_index:"0" instance_id:"f46918db-0b45-417c-7aac-bbf2" x_cf_routererror:"endpoint_failure (use of closed network connection)" x_b3_traceid:"31bf5c74ec6f92a20f0ecfca00e59007" x_b3_spanid:"31bf5c74ec6f92a20f0ecfca00e59007" x_b3_parentspanid:"-" b3:"31bf5c74ec6f92a20f0ecfca00e59007-31bf5c74ec6f92a20f0ecfca00e59007"
Besides endpoint_failure (use of closed network connection), I also see:
x_cf_routererror:"endpoint_failure (EOF (via idempotent request))"
x_cf_routererror:"endpoint_failure (EOF)"
In PCF, I created an https:// route for the app. This is the URL I put into my Slack App's "Redirect URLs" section as well as my Slash command URL.
In Slack, the URLs end with /slack/events
This configuration all works well locally, so I guess I missed a configuration point in PCF.
Manifest.yml:
applications:
- name: kafbot
  buildpacks:
    - https://github.com/starkandwayne/librdkafka-buildpack/releases/download/v1.8.2/librdkafka_buildpack-cached-cflinuxfs3-v1.8.2.zip
    - https://github.com/cloudfoundry/python-buildpack/releases/download/v1.7.48/python-buildpack-cflinuxfs3-v1.7.48.zip
  instances: 1
  disk_quota: 2G
  # health-check-type: process
  memory: 4G
  routes:
    - route: "kafbot.apps.prod.fake_org.cloud"
  env:
    KAFKA_BROKER: 10.32.17.182:9092,10.32.17.183:9092,10.32.17.184:9092,10.32.17.185:9092
    SLACK_BOT_TOKEN: ((slack_bot_token))
    SLACK_SIGNING_SECRET: ((slack_signing_key))
  command: python app.py
When x_cf_routererror says endpoint_failure it means that the application has not handled the request sent to it by Gorouter for some reason.
From there, you want to look at response_time. If the response time is high (typically almost exactly the timeout value, e.g. 60s), it means your application is not responding quickly enough. If the value is low, it could mean that there is a connection problem, such as Gorouter trying to make a TCP connection and failing.
Normally this shouldn't happen: the platform runs a health check that makes sure the application is up and listening for requests, and if it isn't, the application is not considered to have started correctly.
In this particular case, the manifest has health-check-type: process, which disables the standard port-based health check and uses a process-based health check instead. This allows the application to start up even if it isn't listening on the expected port, so when Gorouter sends a request to the application on that port, it cannot connect. Side note: typically, you'd only use process-based health checks if your application is not listening for incoming requests at all.
The platform passes in a $PORT environment variable (currently it is always 8080, but that could change in the future). You need to make sure your app is listening on that port. Also, you want to listen on 0.0.0.0, not localhost or 127.0.0.1.
This should ensure that Gorouter can deliver requests to your application on the agreed-upon port.
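For a Bolt for Python app like the one in the logs, a minimal sketch of binding to the platform-assigned port could look like this (falling back to 8080 is just a convenience for local runs; the token and secret names match the manifest above):

import os
from slack_bolt import App

app = App(
    token=os.environ["SLACK_BOT_TOKEN"],
    signing_secret=os.environ["SLACK_SIGNING_SECRET"],
)

if __name__ == "__main__":
    # PCF injects the port Gorouter routes to via $PORT (currently 8080);
    # binding to anything else means Gorouter's requests never reach the app.
    app.start(port=int(os.environ.get("PORT", 8080)))

If you front the Bolt app with Flask or gunicorn instead of the built-in development server, make sure that server also binds to 0.0.0.0 and the same $PORT.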
What is the best way to identify the object that consumes the most bandwidth in an S3 bucket containing thousands of other objects?
By "bandwidth" I will assume that you mean the bandwidth consumed by delivering files from S3 to some place on the Internet (as when you use S3 to serve static assets).
To track this, you'll need to enable S3 access logs, which creates logfiles in a different bucket that show all of the operations against your primary bucket (or a path in it).
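If you prefer to enable it programmatically rather than via the console, a minimal boto3 sketch (bucket names and prefix are placeholders; the target bucket must already allow the S3 log-delivery service to write to it):

import boto3

s3 = boto3.client("s3")
s3.put_bucket_logging(
    Bucket="com-example-mybucket",                    # the bucket you want to analyze
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "com-example-mybucket-logs",  # where the log files land
            "TargetPrefix": "access/",
        }
    },
)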
Here are two examples of logged GET operations. The first is from anonymous Internet access using a public S3 URL, while the second uses the AWS CLI to download the file. I've redacted or modified any identifying fields, but you should be able to figure out the format from what remains.
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx com-example-mybucket [04/Feb/2020:15:50:00 +0000] 3.4.5.6 - XXXXXXXXXXXXXXXX REST.GET.OBJECT index.html "GET /index.html HTTP/1.1" 200 - 90 90 9 8 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0" - xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx - ECDHE-RSA-AES128-GCM-SHA256 - com-example-mybucket.s3.amazonaws.com TLSv1.2
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx com-example-mybucket [05/Feb/2020:14:51:44 +0000] 3.4.5.6 arn:aws:iam::123456789012:user/me XXXXXXXXXXXXXXXX REST.GET.OBJECT index.html "GET /index.html HTTP/1.1" 200 - 90 90 29 29 "-" "aws-cli/1.17.7 Python/3.6.9 Linux/4.15.0-76-generic botocore/1.14.7" - xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx SigV4 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader com-example-mybucket.s3.amazonaws.com TLSv1.2
So, to get what you want:
Enable logging
Wait for a representative amount of data to be logged. At least 24 hours unless you're a high-volume website (and note that it can take up to an hour for log records to appear).
Extract all the lines that contain REST.GET.OBJECT
From these, extract the filename and the number of bytes (in this case, the file is 90 bytes).
For each file, multiply the number of bytes by the number of times that it appears in a given period (a small parsing sketch follows).
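A rough parsing sketch for the extraction and aggregation steps, assuming the log files have already been downloaded to a local logs/ directory (shlex keeps the quoted request and user-agent fields as single tokens, which puts the operation, key, and bytes-sent at zero-based positions 7, 8, and 12 for lines like the examples above):

import glob
import shlex
from collections import Counter

bytes_per_key = Counter()
for path in glob.glob("logs/*"):          # wherever you downloaded the log files
    with open(path) as f:
        for line in f:
            fields = shlex.split(line)    # quoted request/user-agent stay as one token
            if len(fields) > 12 and fields[7] == "REST.GET.OBJECT":
                key, bytes_sent = fields[8], fields[12]
                if bytes_sent.isdigit():  # bytes-sent can be "-"
                    bytes_per_key[key] += int(bytes_sent)

# largest bandwidth consumers first
for key, total in bytes_per_key.most_common(10):
    print(f"{total:>15}  {key}")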
Beware: because every access is logged, the logfiles can grow quite large, quite fast, and you will pay storage charges for them. You should create a lifecycle rule on the destination bucket to delete old logs.
Update: you could also use Athena to query this data. Here's an AWS blog post that describes the process.
The servers are accessible normally; the health check is checking / (the default page).
When we have some load, the instances respond a little slower than the health checker likes, and they are then taken out of our load balancer.
Because the application doesn't actually fail, I cannot "reboot" the instances via EC2, and I can often access the webpage / IP directly myself while an instance is "out of service".
This isn't a general failure or a misconfiguration: an instance can be up for 12-2400 hours and then randomly fail 3 times in 3 hours, under medium-low load.
The health check is set to a 10s response timeout, a 30s interval, 5 failures to mark an instance unhealthy, and 2 successes to mark it healthy again.
Any ideas?
The health check logs look normal, and there is nothing in the error log. Here's a sample from the access log:
10.0.100.30 - - [25/Nov/2016:06:49:22 +0000] "GET /index.html HTTP/1.1" 200 11415 "-" "ELB-HealthChecker/1.0"
::1 - - [25/Nov/2016:06:49:26 +0000] "OPTIONS * HTTP/1.0" 200 126 "-" "Apache/2.4.20 (Ubuntu) (internal dummy connection)"
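For reference only, the health check described above expressed with boto3 against a Classic Load Balancer (the ELB-HealthChecker/1.0 user agent suggests classic ELB; the load balancer name and target are placeholders):

import boto3

elb = boto3.client("elb", region_name="us-east-1")
elb.configure_health_check(
    LoadBalancerName="my-load-balancer",
    HealthCheck={
        "Target": "HTTP:80/index.html",  # the page being checked
        "Interval": 30,                  # seconds between checks
        "Timeout": 10,                   # seconds to wait for a response
        "UnhealthyThreshold": 5,         # failures before marking out of service
        "HealthyThreshold": 2,           # successes before marking healthy again
    },
)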
I have a server on Elastic Beanstalk (running a Tomcat application), and I also have a CloudFront distribution set up to cache duplicate requests so that they don't go to the server.
I have two behaviours set up
/artist/search
/Default(*)
and Default(*) is set to:
Allowed HTTP Methods: GET, PUT
Forward Headers: None
Headers: Customize
Timeout: 84,0000
Forward Cookies: None
Forward Query Strings: Yes
Smooth Streaming: No
Restrict Viewer Access: No
so there is no timeout, and the only thing it forwards is query strings.
Yet I can see from looking at the localhost_access_log file that my server is receiving duplicate requests:
127.0.0.1 - - [22/Apr/2015:10:58:28 +0000] "GET /artist/cee3e39e-fb10-414d-9f11-b50fa7d6fb7a HTTP/1.1" 200 1351114
127.0.0.1 - - [22/Apr/2015:10:58:29 +0000] "GET /artist/cee3e39e-fb10-414d-9f11-b50fa7d6fb7a HTTP/1.1" 200 1351114
127.0.0.1 - - [22/Apr/2015:10:58:38 +0000] "GET /artist/cee3e39e-fb10-414d-9f11-b50fa7d6fb7a HTTP/1.1" 200 1351114
I can also see from my CloudFront Popular Objects page that there are many objects that sometimes hit and sometimes miss, including these artist URLs; I was expecting only one miss and all the rest to be hits.
Why would this be?
Update
Looking more carefully, it seems (although I'm not sure about this) that a page is less likely to be cached as the size of the artist page increases. Even more weirdly, when the main artist page is large, CloudFront also seems to re-fetch everything referenced by that page, such as icons (PNGs), but not when the artist page is small. This is the worst outcome for me, because it is the large artist pages that need the most processing to create on the server; avoiding recreating these pages is why I am using CloudFront in the first place.
What you are seeing is a combination of two things:
Each individual CloudFront POP (point of presence) requests objects separately, so if your viewers are in different locations you can expect multiple requests to your origin server (and they will show up as misses).
I'm not sure what report date range you are looking at, but CloudFront eventually evicts less popular objects to make room in the cache for new objects.
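If you want to confirm from the client side which requests are hits and which are misses, CloudFront reports this in the X-Cache response header (and Age shows how long the object has sat in that POP's cache); a quick sketch with a placeholder URL:

import requests

url = "https://dxxxxxxxxxxxxx.cloudfront.net/artist/cee3e39e-fb10-414d-9f11-b50fa7d6fb7a"
for _ in range(3):
    resp = requests.get(url)
    print(resp.status_code, resp.headers.get("X-Cache"), resp.headers.get("Age"))

# Typical output: the first request shows "Miss from cloudfront",
# subsequent ones from the same POP show "Hit from cloudfront".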