Why is google trying to access my backend server? - django

I have a productionized Django backend server running on Kubernetes (Deployment/Service/Ingress) on GCP.
My django is configured with something like
ALLOWED_HOSTS = [BACKEND_URL, INGRESS_IP, THIS_POD_IP, HOST_IP]
Everything is working as expected.
However, my backend server logs intermittent errors like these (about 7 per day):
DisallowedHost: Invalid HTTP_HOST header: 'www.google.com'. You may need to add 'www.google.com' to ALLOWED_HOSTS.
DisallowedHost: Invalid HTTP_HOST header: 'xxnet-f23.appspot.com'. You may need to add 'xxnet-f23.appspot.com' to ALLOWED_HOSTS.
DisallowedHost: Invalid HTTP_HOST header: 'xxnet-301.appspot.com'. You may need to add 'xxnet-301.appspot.com' to ALLOWED_HOSTS.
DisallowedHost: Invalid HTTP_HOST header: 'www.google.com'. You may need to add 'www.google.com' to ALLOWED_HOSTS.
DisallowedHost: Invalid HTTP_HOST header: 'narutobm1234.appspot.com'. You may need to add 'narutobm1234.appspot.com' to ALLOWED_HOSTS.
DisallowedHost: Invalid HTTP_HOST header: 'z-h-e-n-116.appspot.com'. You may need to add 'z-h-e-n-116.appspot.com' to ALLOWED_HOSTS.
DisallowedHost: Invalid HTTP_HOST header: 'www.google.com'. You may need to add 'www.google.com' to ALLOWED_HOSTS.
DisallowedHost: Invalid HTTP_HOST header: 'xxnet-131318.appspot.com'. You may need to add 'xxnet-131318.appspot.com' to ALLOWED_HOSTS.
DisallowedHost: Invalid HTTP_HOST header: 'www.google.com'. You may need to add 'www.google.com' to ALLOWED_HOSTS.
DisallowedHost: Invalid HTTP_HOST header: 'stoked-dominion-123514.appspot.com'. You may need to add 'stoked-dominion-123514.appspot.com' to ALLOWED_HOSTS.
My primary question is: why? What are all of these hosts?
I certainly don't want to allow those hosts without understanding their purpose.
Bonus question: What's the best way to silence unwanted hosts within my techstack?

My primary question is: why? What are all of these hosts?
Some of them are web crawlers that gather information for various purposes. For example, the www.google.com address is most likely one of the web crawlers that populate the search engine databases for Google search, etcetera.
Google probably reached your back-end site by accident, following a chain of links from some other searchable page, e.g. your front-end website. You could try to identify that path. I believe there is also a page where you can request the removal of URLs from search, though I'm not sure how effective that would be in quieting your logs.
Others may be robots probing your site for vulnerabilities.
I certainly don't want to allow those hosts without understanding their purpose.
Well, you can never entirely know their purpose, and in some cases you may never be able to find out.
Bonus question: What's the best way to silence unwanted hosts within my techstack?
One way is to simply block access using a manually managed blacklist or whitelist.
A second way is to have your back-end publish a "/robots.txt" document; see About /robots.txt. Note that not all crawlers will respect a "robots.txt" page, but the reputable ones will; see How Google interprets the robots.txt specification.
Note that it is easy to craft a "/robots.txt" that says "nobody crawl this site".
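For instance, a "/robots.txt" that asks every crawler to stay away is just two lines (served from your back-end at the path /robots.txt):

```
User-agent: *
Disallow: /
```

Again, only well-behaved crawlers honor this; it does nothing against vulnerability scanners.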
Other ways would include putting your backend server behind a firewall or giving it a private IP address. (It seems a bit of an odd decision to expose your back-end services to the internet.)
Finally, requests from the hosts you are seeing are already being rejected, and Django is telling you that. Perhaps what you should be asking is how to mute the log messages for these events.

Django checks the Host header of every request against the ALLOWED_HOSTS setting. When the value is not listed there, Django raises the Invalid HTTP_HOST header error. See the documentation.
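Roughly, that check can be sketched like this; note this is a simplified re-implementation for illustration, not Django's actual code (which additionally lowercases, strips a trailing dot, and validates the header format):

```python
def host_matches(host, pattern):
    """Simplified sketch of ALLOWED_HOSTS matching.

    '*' matches everything; a pattern with a leading dot matches the
    domain itself and any subdomain; anything else must match exactly.
    """
    host, pattern = host.lower(), pattern.lower()
    if pattern == '*':
        return True
    if pattern.startswith('.'):
        return host == pattern[1:] or host.endswith(pattern)
    return host == pattern

# A request with Host: www.google.com fails against these entries,
# so Django rejects it before any view runs.
allowed = ['example.com', '.mysite.com']
print(any(host_matches('www.google.com', p) for p in allowed))  # False
print(any(host_matches('api.mysite.com', p) for p in allowed))  # True
```

So the errors in the logs simply mean the spoofed Host values never matched any entry, which is the desired behavior.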
These HTTP requests could be coming from bots sending bogus Host header values. You may want to consider Cloud Armor to block traffic by specific Host header/domain.

Related

Can I stop the significant amount of [Django] ERROR (EXTERNAL IP): Invalid HTTP_HOST header from strange sites I'm getting?

Since adding the option to email me (the admin) when there are problems with my Django server, I keep getting a LOT of the following emails (20 in the last hour alone).
[Django] ERROR (EXTERNAL IP): Invalid HTTP_HOST header: 'staging.menthanhgiadalat.com'. You may need to add 'staging.menthanhgiadalat.com' to ALLOWED_HOSTS.
I've set my server up to have the following at the top of the file in my sites-enabled nginx config, as I read (somewhere on SO) that this may prevent me from getting these types of emails:
server {
    server_name _;
    return 444;
}
But it hasn't done anything.
In the next server block I have the IP address and domain names for my site. Could this be causing the problem?
This 'staging' site isn't the only domain I'm being asked to add to my ALLOWED_HOSTS. But it is, by far, the most frequent.
Can I stop this type of alert being sent? Can I stop it from being raised? Is there something I've configured incorrectly on my server (I'm ashamed to admit I'm pretty new at this).
Thanks for any help you might be able to give.
You can configure LOGGING in your settings.py to silence django.security.DisallowedHost as directed at https://docs.djangoproject.com/en/3.2/topics/logging/#django-security
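As a sketch, a settings.py fragment along those lines might look like this (the handler name 'null' is arbitrary; logging.NullHandler discards records):

```python
# settings.py (fragment) -- silence DisallowedHost by routing its
# log records to a handler that discards them, so mail_admins never fires.
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'null': {'class': 'logging.NullHandler'},
    },
    'loggers': {
        'django.security.DisallowedHost': {
            'handlers': ['null'],
            'propagate': False,  # do not pass records up to the 'django' logger
        },
    },
}
```

This silences only DisallowedHost; other django.security.* loggers still behave normally.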

Invalid HOST Header from router IP

I keep getting an Invalid HOST Header error which I am trying to find the cause of. It reads as such:
Report at /GponForm/diag_Form
Invalid HTTP_HOST header: '192.168.0.1:443'. You may need to add '192.168.0.1' to ALLOWED_HOSTS
I do not know what /GponForm/diag_Form is, but from the looks of it, it may be a vulnerability probe by malware.
I am also wondering why the Host is a router IP (192.168.0.1), and why the request is coming over SSL (:443).
Should I consider putting a HoneyPot and blocking this IP address? Before I do, why does the IP look like a local router?
The full Request URL in the report looks like this:
Request URL: https://192.168.0.1:443/GponForm/diag_Form?style/
I am getting this error at least ~10x/day now so I would like to stop it.
Yes, this surely represents a vulnerability probe: someone tried to access this URL on a router (which usually has the IP 192.168.0.1).
It looks that way because the attacker's request contains a Host header with that value.
Perhaps Django is being run locally with DEBUG=True.
You may consider running it in a more production-ready setup, with a web server (e.g. nginx) in front filtering unwanted requests via its config, plus fail2ban parsing the nginx error logs and banning offending IPs.
Or make the site available only from specific IPs, or add simple authorization, e.g. Basic Auth at the web-server level.
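As a sketch of the nginx-in-front approach, a catch-all server block that drops requests with unrecognized Host headers might look like this (ports and certificate paths are assumptions; the key detail is default_server, without which nginx routes unmatched hosts to the first server block it finds, which is likely why the earlier `server_name _;` attempt did nothing):

```nginx
# Catch-all: any request whose Host matches no other server_name lands here.
server {
    listen 80 default_server;
    listen 443 ssl default_server;
    server_name _;

    # A certificate is still required to accept the TLS handshake on 443;
    # a self-signed one is fine for a block that only rejects.
    ssl_certificate     /etc/nginx/snakeoil.crt;   # placeholder path
    ssl_certificate_key /etc/nginx/snakeoil.key;   # placeholder path

    return 444;  # nginx-specific: close the connection without a response
}
```

With this in place, bogus-Host traffic never reaches Django, so DisallowedHost is never raised.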
Previous irrelevant answer
The ALLOWED_HOSTS setting specifies the domains a Django project can serve.
When running locally - with python manage.py runserver or with DEBUG=True - it defaults to localhost, 127.0.0.1 and similar.
If you access Django via a different URL, it will complain in this manner.
To allow access from another domains - add them to ALLOWED_HOSTS: ALLOWED_HOSTS = ['localhost', '127.0.0.1', '[::1]', '192.168.0.1'].

unknown Invalid HTTP_HOST header in Django logs: api-keyboard.cmcm.com

I have been testing a new Django application on aws beanstalk. While looking through the httpd error logs I see thousands of lines like this:
... Invalid HTTP_HOST header: 'api-keyboard.cmcm.com'. You may need to add 'api-keyboard.cmcm.com' to ALLOWED_HOSTS.
Normally this is because I didn't add my own hostname to ALLOWED_HOSTS but this domain is completely foreign to me and I can't find references to it online.
So I'm wondering what this means, how random hosts like this end up in the header, and whether anyone recognizes this one.
Thanks!

Django on Elastic Beanstalk getting too many Invalid HTTP_HOST header errors

I have deployed my Django application on an AWS Elastic Beanstalk server. Now I am getting many Invalid HTTP_HOST errors from different IP addresses, including localhost, such as the following:
SuspiciousOperation: Invalid HTTP_HOST header (you may need to set ALLOWED_HOSTS): 172.31.0.67
SuspiciousOperation: Invalid HTTP_HOST header (you may need to set ALLOWED_HOSTS): 172.31.22.203
SuspiciousOperation: Invalid HTTP_HOST header (you may need to set ALLOWED_HOSTS): check.proxyradar.com
SuspiciousOperation: Invalid HTTP_HOST header (you may need to set ALLOWED_HOSTS): testp2.czar.bielawa.pl
'PATH_TRANSLATED': '/opt/python/current/app/coinn/coinn/wsgi.py/testproxy.php',
In the Elastic Beanstalk security group I have allowed the following access:
Type | Protocol | Port | Source
HTTP | TCP | 80 | Anywhere (0.0.0.0/0)
Are these errors coming from the automatic health checks performed by the load balancer, or from someone trying to hack my AWS instance?
I suspect the former because when I remove localhost and 127.0.0.1 from the ALLOWED_HOSTS list in the Django settings, I start getting the same error from localhost as well:
SuspiciousOperation: Invalid HTTP_HOST header (you may need to set ALLOWED_HOSTS): localhost
Please suggest the possible cause and resolution of this issue.
Are these error coming from automatic health check performed by Load balancer or some one trying to hack my aws instance system?
The load balancer is certainly not going to be setting the HTTP_HOST header to values like "check.proxyradar.com" and "testp2.czar.bielawa.pl" so I think we can definitely rule out the ELB health checks.
As to if they are someone trying to hack your system, or something more benign, that is more difficult to answer. You might want to look at this related question, and the answer which states that this is probably someone probing your site for vulnerabilities.

How to remove old subdomain from Google bot

I can't seem to find an answer to this question.
I had an old subdomain, lets say asdasd.example.com
This subdomain site no longer exists. However, I keep getting error emails from Django about an invalid HTTP_HOST:
SuspiciousOperation: Invalid HTTP_HOST header (you may need to set ALLOWED_HOSTS): asdasd.example.com
Since the subsite no longer exists, I cannot use robots.txt.
So how can I stop the crawler from trying to index this site that no longer exists?
It doesn't just try to index asdasd.example.com but also asdasd.example.com/frontpage and other URLs which used to be valid.