How to remove old subdomain from Google bot - django

I can't seem to find an answer to this question.
I had an old subdomain, let's say asdasd.example.com.
This subdomain site no longer exists. However, I keep getting error emails from Django about an invalid HTTP_HOST header:
SuspiciousOperation: Invalid HTTP_HOST header (you may need to set ALLOWED_HOSTS): asdasd.example.com
Since the subsite no longer exists, I cannot use robots.txt.
So how can I stop the crawler from trying to index this page that no longer exists?
It doesn't just try to index asdasd.example.com but also asdasd.example.com/frontpage and other URLs that used to be valid.

Related

Why is google trying to access my backend server?

I have a productionized Django backend server running on Kubernetes (Deployment/Service/Ingress) on GCP.
My django is configured with something like
ALLOWED_HOSTS = [BACKEND_URL, INGRESS_IP, THIS_POD_IP, HOST_IP]
Everything is working as expected.
However, my backend server logs intermittent errors like these (about 7 per day):
DisallowedHost: Invalid HTTP_HOST header: 'www.google.com'. You may need to add 'www.google.com' to ALLOWED_HOSTS.
DisallowedHost: Invalid HTTP_HOST header: 'xxnet-f23.appspot.com'. You may need to add 'xxnet-f23.appspot.com' to ALLOWED_HOSTS.
DisallowedHost: Invalid HTTP_HOST header: 'xxnet-301.appspot.com'. You may need to add 'xxnet-301.appspot.com' to ALLOWED_HOSTS.
DisallowedHost: Invalid HTTP_HOST header: 'www.google.com'. You may need to add 'www.google.com' to ALLOWED_HOSTS.
DisallowedHost: Invalid HTTP_HOST header: 'narutobm1234.appspot.com'. You may need to add 'narutobm1234.appspot.com' to ALLOWED_HOSTS.
DisallowedHost: Invalid HTTP_HOST header: 'z-h-e-n-116.appspot.com'. You may need to add 'z-h-e-n-116.appspot.com' to ALLOWED_HOSTS.
DisallowedHost: Invalid HTTP_HOST header: 'www.google.com'. You may need to add 'www.google.com' to ALLOWED_HOSTS.
DisallowedHost: Invalid HTTP_HOST header: 'xxnet-131318.appspot.com'. You may need to add 'xxnet-131318.appspot.com' to ALLOWED_HOSTS.
DisallowedHost: Invalid HTTP_HOST header: 'www.google.com'. You may need to add 'www.google.com' to ALLOWED_HOSTS.
DisallowedHost: Invalid HTTP_HOST header: 'stoked-dominion-123514.appspot.com'. You may need to add 'stoked-dominion-123514.appspot.com' to ALLOWED_HOSTS.
My primary question is: why - what are all of these hosts?
I certainly don't want to allow those hosts without understanding their purpose.
Bonus question: What's the best way to silence unwanted hosts within my techstack?
My primary question is: why - what are all of these hosts?
Some of them are web crawlers that gather information for various purposes. For example, the www.google.com address most likely belongs to the web crawlers that populate the search databases for Google Search, etcetera.
Google probably reached your back-end site by accident, by following a chain of links from some other searchable page, e.g. your front-end website. You could try to identify that path. I believe there is also a page where you can request the removal of URLs from search ... though I'm not sure how effective that would be in quieting your logs.
Others may be robots probing your site for vulnerabilities.
I certainly don't want to allow those hosts without understanding their purpose.
Well, you can never entirely know their purpose. And in some cases, you may never be able to find out.
Bonus question: What's the best way to silence unwanted hosts within my techstack?
One way is to simply block access using a manually managed blacklist or whitelist.
A second way is to have your back-end publish a "/robots.txt" document; see About /robots.txt. Note that not all crawlers will respect a "robots.txt" page, but the reputable ones will; see How Google interprets the robots.txt specification.
Note that it is easy to craft a "/robots.txt" that says "nobody crawl this site".
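For example, a minimal sketch of serving such a deny-all /robots.txt from a Django URLconf might look like the following; the view name and URL layout are illustrative, not taken from the question.
# urls.py - a sketch of serving a deny-all robots.txt from Django;
# the view name and routing here are illustrative.
from django.http import HttpResponse
from django.urls import path
from django.views.decorators.http import require_GET

@require_GET
def robots_txt(request):
    # "Disallow: /" asks well-behaved crawlers not to fetch any URL on this host.
    return HttpResponse("User-agent: *\nDisallow: /\n", content_type="text/plain")

urlpatterns = [
    path("robots.txt", robots_txt),
]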
Other ways would include putting your backend server behind a firewall or giving it a private IP address. (It seems a bit of an odd decision to expose your back-end services to the internet.)
Finally, requests from the hosts you are seeing are already being rejected, and Django is telling you that. Perhaps what you should be asking is how to mute the log messages for these events.
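One hedged way to do that in Django (assuming an otherwise default logging setup) is to route the django.security.DisallowedHost logger to a null handler in settings.py:
# settings.py - a sketch of muting DisallowedHost noise; merge this into your
# existing LOGGING configuration rather than replacing it wholesale.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "null": {"class": "logging.NullHandler"},
    },
    "loggers": {
        # Requests rejected by the ALLOWED_HOSTS check are logged here;
        # sending them to a null handler stops the error emails.
        "django.security.DisallowedHost": {
            "handlers": ["null"],
            "propagate": False,
        },
    },
}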
Django checks the Host header of every incoming request against the ALLOWED_HOSTS setting. When the value is not listed there, Django raises the Invalid HTTP_HOST header error. See the documentation.
These HTTP requests could be coming from bots sending a bogus Host header value. You may want to consider Google Cloud Armor to block traffic from a specific host header/domain.
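For completeness, a sketch of what that check keys on: Django compares the request's Host header against entries like these (the values below are illustrative, not taken from the question).
# settings.py - illustrative ALLOWED_HOSTS values; a leading dot matches any
# subdomain, and an IP entry matches requests addressed directly to that IP.
ALLOWED_HOSTS = [
    "api.example.com",   # the backend's public hostname (hypothetical)
    ".example.com",      # any subdomain of example.com
    "203.0.113.7",       # an ingress or pod IP, if requests arrive by IP
]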

Cookie “PHPSESSID” will be soon treated as cross-site cookie against <file> because the scheme does not match

I've just noticed my console is littered with this warning, appearing for every single linked resource. This includes all referenced CSS files, javascript files, SVG images, and even URLs from ajax calls (which respond in JSON). But not images.
The warning, for example in case of a style.css file, will say:
Cookie “PHPSESSID” will be soon treated as cross-site cookie against “http://localhost/style.css” because the scheme does not match.
But the scheme doesn't match what? The document? Because that one does match.
The URL of my site is http://localhost/.
The site and its resources are all on http (no https on localhost)
The domain name is definitely not different because everything is referenced relative to the domain name (meaning the filepaths start with a slash href="/style.css")
The Network inspector just reports a green 200 OK response, showing everything as normal.
It's only Mozilla Firefox that is complaining about this. Chromium seems to not be concerned by anything. I don't have any browser add-ons. The warnings seem to originate from the browser, and each warning links to view the corresponding file source in Debugger.
Why is this appearing?
The exact same thing was happening to me. The issue was that Firefox keeps showing me cookies of different websites hosted at the same URL, "localhost:port number", that are stored in the browser's memory.
In my case, I have two projects configured to run at http://localhost:62601. When I run the first project, it saves its cookie in the browser's memory. When I run the second project at the same URL, that cookie is also available inside the second project's console.
What you can do is delete all of the cookies for that URL from the browser.
@Paramjot Singh's answer is correct and got me most of the way to where I needed to be. I also wasted a lot of time staring at those warnings.
But to clarify a little, you don't have to delete ALL of your cookies to resolve this. In Firefox, you can delete individual site cookies, which will keep your settings on other sites.
To do so, click the hamburger menu in the top right, then Options->Privacy & Security or Settings->Privacy & Security.
From here, scroll down about half-way and find Cookies and Site Data. Don't click Clear Data. Instead, click Manage Data. Then search for the site you are having the notices on, highlight it, and click Remove Selected.
Simple, I know, but I made the mistake of clearing everything the first time - maybe this will prevent someone from doing the same.
The warning is given because, according to MDN web docs:
Standards related to the Cookie SameSite attribute recently changed such that:
The cookie-sending behaviour if SameSite is not specified is SameSite=Lax. Previously the default was that cookies were sent for all requests.
Cookies with SameSite=None must now also specify the Secure attribute (they require a secure context/HTTPS).
Which indicates that a secure context/HTTPS is required in order to allow cross site cookies by setting SameSite=None Secure for the cookie.
According to Mozilla, you should explicitly communicate the intended SameSite policy for your cookie (rather than relying on browsers to apply SameSite=Lax automatically), otherwise you might get a warning like this:
Cookie “myCookie” has “SameSite” policy set to “Lax” because it is missing a “SameSite” attribute, and “SameSite=Lax” is the default value for this attribute.
The suggestion to simply delete localhost cookies is not actually solving the problem. The solution is to properly set the SameSite attribute of cookies being set by the server and use HTTPS if needed.
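For instance, PHP 7.3+ exposes this through the session.cookie_samesite ini setting; as a hedged illustration in Django (Python is used here only because the other questions in this thread are Django-based), a server-set cookie would declare the attribute explicitly:
# A sketch of a server explicitly declaring SameSite when it sets a cookie,
# using Django's response API; the cookie name and value are illustrative.
from django.http import HttpResponse

def my_view(request):
    response = HttpResponse("ok")
    # SameSite=None is only honoured over HTTPS, so Secure must be set as well.
    response.set_cookie("sessionid", "abc123", samesite="None", secure=True)
    return response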
Firefox is not the only browser making these changes. Apparently the version of Chrome I am using (84.0.4147.125) has already implemented the changes, as I got a similar message in its console.
The previously mentioned MDN article and this article by Mike Conca have great information about changes to SameSite cookie behavior.
I guess you are using WAMP or LAMP, etc. The first thing you need to do is enable SSL on WAMP, since many references say you need to adjust the cookie settings to SameSite=None; Secure, and that entails your local connection being secure. There are instructions at https://articlebin.michaelmilette.com/how-to-add-ssl-https-to-wampserver/ as well as some YouTube videos.
The important thing to note is that when creating the SSL certificate you should use SHA-256 signing, as SHA-1 is now deprecated and will throw another warning.
There is a good explanation of SameSite cookies on https://web.dev/samesite-cookies-explained/
I was struggling with the same issue and solved it by making sure the Apache 2.4 headers module was enabled and then adding one line of configuration:
Header always edit Set-Cookie ^(.*)$ $1;HttpOnly;Secure
I wasted lots of time staring at the same sets of warnings in the Inspector until it dawned on me that the cookies were persisting and needed purging.
Apparently Chrome was going to introduce the new rules by now, but Covid-19 meant a lot of websites might have been broken while people worked from home. The major browsers are working together on the SameSite attribute, so it will be in force soon.

How can I force trailing slash in static site hosted on Google Cloud Storage?

I have a website hosted on a Google Cloud Storage bucket, following the instructions on https://cloud.google.com/storage/docs/hosting-static-website. The site works, but navigating to any subdirectory page directly, such as https://example.com/blog, will redirect me to https://example.com/blog/index.html, and sometimes this results in another redirect to my 404 page. If I start at https://example.com, and navigate elsewhere, the site works fine.
This is with the MainPageSuffix set to index.html and NotFoundPage set to 404.html.
If I navigate to a subdirectory page with a trailing slash at the end (e.g. https://example.com/blog/), the site works fine. I've also looked at the troubleshooting advice for 301s, and running through the steps did not work for me.
Is there any way to enforce the trailing slash for GCS buckets as a static site? If not, how can I get around the issues I am seeing with redirects to index.html?
If your MainPageSuffix is index.html, when you try to access a subdirectory directly, such as https://example.com/blog as you indicated, the service tries to look up the target object, i.e. https://example.com/blog/index.html. The same is true for https://example.com/blog/, assuming no zero-byte object exists for /blog/. In case a zero-byte empty object exists for /blog/, see the Troubleshooting topic for removing this zero-byte object. When the zero-byte object is removed, the system will serve https://example.com/blog/index.html. If no such object exists, the system will show the error page "404.html" if it is set as the NotFoundPage.
In your case, if you include an index.html file under the subdirectory /blog/, it should resolve the issue by displaying the https://example.com/blog/index.html page in both scenarios, https://example.com/blog and https://example.com/blog/. Alternatively, you need to provide the full path to access any particular object within the subdirectory.
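As a hedged illustration with the google-cloud-storage Python client (the bucket name and local path are placeholders), removing a zero-byte /blog/ placeholder and uploading the subdirectory index could look like this:
# A sketch using the google-cloud-storage client: delete a zero-byte "blog/"
# placeholder object if one exists, then upload blog/index.html.
# The bucket name and local file path are hypothetical.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("www.example.com")

placeholder = bucket.blob("blog/")   # a zero-byte object named exactly "blog/"
if placeholder.exists():
    placeholder.delete()             # see the GCS troubleshooting topic

bucket.blob("blog/index.html").upload_from_filename("site/blog/index.html")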
For further info on how subdirectories work, see the following links.
How Subdirectories Work
From Recommended: Assigning specialty pages:
An index page (also called a webserver directory index) is a file served to visitors when they request a URL that doesn't have an associated file. When you assign a MainPageSuffix, Cloud Storage looks for a file with that name whose prefix matches the URL the visitor requested.
For example, say you set the MainPageSuffix of your static website to index.html. Additionally, say you have no file named directory in your bucket www.example.com. In this situation, if a user requests the URL http://www.example.com/directory, Cloud Storage attempts to serve the file www.example.com/directory/index.html. If that file doesn't exist, Cloud Storage returns an error page.
The MainPageSuffix also controls the file served when users request the top level site. Continuing the above example, if a user requests http://www.example.com, Cloud Storage attempts to serve the file www.example.com/index.html.
If you are still experiencing any issues, please provide a breakdown of your website so that a specific solution for your problem can be provided, and also indicate what specific outcome you are expecting.

I want to remove cookies by its domain on firefox extension

I'm developing a Firefox extension, but I can't remove cookies for a specified domain. I want to remove the cookies of a specified domain from within the extension.
example:
remove cookies of domain https://www.facebook.com
And I want cookies... Wait what?!
More seriously, you likely got your downvotes by saying "I want..." and not demonstrating that you made any attempt or at least did any research to solve this on your own.
Anyway:
Use nsICookieManager2.getCookiesFromHost and/or nsICookieManager.enumerator to get a list of cookies. See also: Reading existing cookies
Filter the cookies by your criteria, making sure your code doesn't remove more cookies than it needs to.
Remove the cookies you collected with nsICookieManager.remove.
Bonus: Use the notifications to listen for any new cookies and get rid of them.

How to delete Firefox Cookies and cache while program runs on Selenium Grid?

I am running Selenium Grid and most of my scripts fail due to an inability to delete Firefox cookies.
Each test case needs to delete the browser cookies. If anyone knows how to do this, please let me know.
You can use the deleteCookie function with Selenium to get rid of the cookies, and you can put that in your test setup. The documentation for deleteCookie is below:
deleteCookie(name, optionsString)
Arguments:
* name - the name of the cookie to be deleted
* optionsString - options for the cookie. Currently supported options include 'path', 'domain' and 'recurse'. The optionsString's format is "path=/path/, domain=.foo.com, recurse=true". The order of options is irrelevant. Note that specifying a domain that isn't a subset of the current domain will usually fail.
Delete a named cookie with the specified path and domain. Be careful; to delete a cookie, you need to delete it using the exact same path and domain that were used to create the cookie. If the path is wrong, or the domain is wrong, the cookie simply won't be deleted. Also note that specifying a domain that isn't a subset of the current domain will usually fail. Since there's no way to discover at runtime the original path and domain of a given cookie, we've added an option called 'recurse' to try all sub-domains of the current domain with all paths that are a subset of the current path. Beware; this option can be slow. In big-O notation, it operates in O(n*m) time, where n is the number of dots in the domain name and m is the number of slashes in the path.
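If you are on the newer WebDriver API rather than Selenium RC, a minimal Python sketch of clearing cookies in a per-test setup against a grid node (the hub URL and site are placeholders) would be:
# A sketch of clearing Firefox cookies before each test on a Selenium Grid.
# The hub URL is hypothetical; delete_all_cookies() only affects cookies for
# the domain the browser is currently on.
from selenium import webdriver

options = webdriver.FirefoxOptions()
driver = webdriver.Remote(
    command_executor="http://selenium-hub:4444/wd/hub",
    options=options,
)

driver.get("https://example.com")   # navigate to the site under test first
driver.delete_all_cookies()         # clear all cookies for that domain
# driver.delete_cookie("name")      # or remove a single named cookie
driver.quit()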