Avoiding double caching of items available from different URIs using Varnish - regex

The Varnish Cache wiki gives an example of using regsub to avoid caching requests to www.example.com and example.com separately. The example from https://www.varnish-cache.org/trac/wiki/RedirectsAndRewrites is:
set req.http.host = regsub(req.http.host, "^www\.example\.com$","example.com");
"Requests to www.example.com and example.com will all go to the backend as "example.com" and end up cached by that string." This means duplicate caching does not occur.
I have multiple sites using the same Varnish server (VCL), so I am looking to replace "example.com" with a statement that will work for multiple hostnames, e.g.:
www.example1.co.uk > example1.co.uk
www.example2.com > example2.com
What would be the appropriate regex (if that is the correct term) for this?
There are multiple separate domains (different sites with different content on different domains) using this VCL, and I am hoping to avoid having to alter the VCL when sites are added or removed. I am therefore after a generic solution that can be applied to any domain, so that Varnish never stores or serves duplicate copies for the hostname with and without the www alias. (Having trouble phrasing this, hope it is clearer!)
I am aware that redirecting can be done outside of Varnish, in Apache etc., but I am not looking for that as a solution.

set req.http.host = regsub(req.http.host,
"^www\.(.*)$",
"\1");
This will strip www off any domain. (I do feel reluctant to give you this answer, as it goes against my religion)
You might get penalized by search engines for serving the same content on multiple URLs, but SEO is a different topic.

Instead of what Chris suggested, you can just remove the www part:
set req.http.host = regsub(req.http.host, "^www\.", "");
It should be a tiny bit faster, too, since nothing has to be captured and copied back.
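To see how the two substitutions above behave on a few hostnames, here is a minimal sketch that uses Python's re module in place of VCL's regsub. The function names are made up for illustration, and it assumes Python's regex dialect agrees with Varnish's for patterns this simple.

```python
import re

def strip_www_capture(host):
    # Mirrors regsub(req.http.host, "^www\.(.*)$", "\1"):
    # match the whole host and keep only what follows "www."
    return re.sub(r"^www\.(.*)$", r"\1", host)

def strip_www_prefix(host):
    # Mirrors regsub(req.http.host, "^www\.", ""):
    # delete a leading "www." and leave the rest alone
    return re.sub(r"^www\.", "", host, count=1)

for host in ("www.example1.co.uk", "www.example2.com",
             "example2.com", "wwwexample.com"):
    print(host, "->", strip_www_capture(host), "|", strip_www_prefix(host))
```

Both variants leave a host without the prefix untouched, and neither strips "www" when it is not followed by a dot. As written the sketch is case-sensitive, so a host sent as "WWW.example.com" would not be normalised.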


Consistent user authorization across url with/without www

I need to clarify a fundamental concept (beginner here).
In a Django web app I maintain, I notice that if a user logs in at example.com, they remain logged out on www.example.com (and can then go on to create a clone account).
1) Why does this happen?
2) What's the standard practice to iron out this issue? I.e., give one consistent experience across www and no-www.
In case the answer is as basic as just a redirection, I could use some pointers and an illustrative example there too - I'm using nginx reverse proxy with gunicorn.
1) By default, Django's session cookie is scoped to the exact host that set it, so a cookie set by example.com is not sent to www.example.com, and Django treats the two as different sessions.
2) You can set the PREPEND_WWW setting to redirect xyz.com to www.xyz.com:
PREPEND_WWW = True
Or, if you need the same cookie on both hostnames, you can use SESSION_COOKIE_DOMAIN:
SESSION_COOKIE_DOMAIN = ".yoursite.com"
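For the opposite direction (www to non-www), which PREPEND_WWW does not cover, the redirect is usually done in the web server. As a rough sketch of the same idea at the application level, the wrapper below is plain WSGI middleware that gunicorn could serve. The name redirect_www_to_bare is hypothetical, and the sketch assumes the Host header and wsgi.url_scheme seen by the app reflect what the client used, which behind a reverse proxy depends on how the proxy forwards them.

```python
from urllib.parse import quote

def redirect_www_to_bare(app):
    """Wrap a WSGI app so that www.<host> requests get a 301 to <host>."""
    def wrapper(environ, start_response):
        host = environ.get("HTTP_HOST", "")
        if host.lower().startswith("www."):
            scheme = environ.get("wsgi.url_scheme", "http")
            # PATH_INFO is already decoded, so re-quote it for the header
            path = quote(environ.get("PATH_INFO", "") or "/")
            query = environ.get("QUERY_STRING", "")
            location = f"{scheme}://{host[4:]}{path}"
            if query:
                location += "?" + query
            start_response("301 Moved Permanently",
                           [("Location", location), ("Content-Length", "0")])
            return [b""]
        # Not a www host: hand the request to the wrapped application
        return app(environ, start_response)
    return wrapper
```

In a Django project this would wrap the application object in wsgi.py, for example application = redirect_www_to_bare(get_wsgi_application()). With a single canonical hostname, the session cookie only ever needs to be valid for one host.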

"URL with WWW and URL without WWW" -Is there any difference between them?

I have noticed that when websites are opened in a browser, some show a URL like
http://www.something.com
while others show
http://something.com
Here the www is missing. The same thing happens with my blog URL.
If I type this in the URL bar:
http://www.shareprogrammingtips.com/
then it is automatically converted to
http://shareprogrammingtips.com/
I don't understand why this is happening. Is there any difference between a URL with www and a URL without www?
Edit:
One more thing I have noticed is that the URL with www takes longer to open the website than the URL without www.
It does not matter whether you have www in the URL or not, as long as you always use the same URL. This is probably happening because your server is set up to redirect http://www.shareprogrammingtips.com/ to http://shareprogrammingtips.com/.
This makes sure that all requests end up at http://shareprogrammingtips.com/ and that search engines index your site as http://shareprogrammingtips.com/. If your site is accessible from both http://www.shareprogrammingtips.com/ and http://shareprogrammingtips.com/ without a redirect, search engines will index both versions, and the page rank of your site will be divided between the two, because to a search engine they are different sites.
In the past, websites were by convention served from a hostname with the www. prefix (e.g. www.hello.com). Nowadays many sites use the naked domain without this prefix (e.g. hello.com). Many domains still use the www. prefix for legacy reasons.
When a company registers a domain name, it gets the naked domain and can set up the www. prefix as a sub-domain, then configure both to load the same website. There are technical reasons for choosing the www. prefix (it allows for certain cookie blocking policies) or the naked domain (a shorter URL).
Usually, one of the two will be the canonical (real) domain, while the other will only redirect to the real domain. This redirect causes a delay but it's there for a reason.
If you code this redirect the right way, search engines will understand that both are the same website. Otherwise if you skip the redirect and point both domains to your files directly, search engines will think they are separate websites which will hurt your SEO (Search Engine Optimization).

A regular expression to find domains that will be cached by Varnish

I am asking this here because I think it is more a regex question than an actual Varnish question.
What I basically want to do is define a list of domains that Varnish will cache. There is already an answer to this, but I want to use a different approach.
Varnish: cache only specific domain
Now the code that is used in this answer is the following:
sub vcl_recv {
    # Don't cache foo.com or bar.com (www prefix optional).
    # The dot sits inside the optional group so the bare domain matches too.
    if (req.http.host ~ "^(www\.)?(foo|bar)\.com$") {
        pass;
    }
    # Cache foobar.com (www prefix optional).
    if (req.http.host ~ "^(www\.)?foobar\.com$") {
        lookup;
    }
}
What I want is a little different. I have domain names with different TLDs, but I only want the non-www version of each domain cached.
So only mydomain.com, or myotherdomain.nl, or yadomain.net.
Any other subdomain can be passed to the backend.
If I'm reading your question correctly, you have a list with multiple versions of the same website, so www.foobar.com and/or foobar.com, and you want to match ONLY foobar.com.
If so, you need a negative lookbehind, (?<!www\.). A search that only matches domains without the preceding www would be (?<!www\.)((?:[^.\s]+)\.(?:com|net|nl))
Hope that helps
EDIT
(?<!www\.)((?:[^.\s]+)\.(?:\w{2,3})) if domains are unknown
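One caveat with the lookbehind pattern: in an unanchored search the engine can still start a match one character later, after the point where the lookbehind would have rejected it. The sketch below uses Python's re module (assumed here to behave like Varnish's regex engine for these patterns) to compare it with an anchored alternative that only accepts a single label followed by one of the listed TLDs. The helper name is_bare_domain is made up for illustration.

```python
import re

# The lookbehind pattern from the answer above, used as an unanchored search
LOOKBEHIND = re.compile(r"(?<!www\.)((?:[^.\s]+)\.(?:com|net|nl))")

# Anchored alternative: exactly one label, one dot, one of the listed TLDs
BARE = re.compile(r"[^.\s]+\.(?:com|net|nl)")

def is_bare_domain(host):
    # fullmatch anchors at both ends, so any extra label (www, shop, ...)
    # makes the host fail
    return BARE.fullmatch(host) is not None

for host in ("mydomain.com", "www.mydomain.com", "shop.yadomain.net"):
    found = LOOKBEHIND.search(host)
    print(host, "| bare:", is_bare_domain(host),
          "| lookbehind search:", found.group(1) if found else None)
```

In VCL the anchored form would presumably be written with explicit ^ and $ around the same pattern. Note that it does not cover two-part suffixes such as .co.uk.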

django www vs non-www issue with middleware authentication

I have been having inconsistent behavior with my Django app.
If I log in without www and then prepend www, I'm not authenticated, and the same goes for every other combination. (www.mydomain.com and mydomain.com behave like different sites in terms of auth.)
If the authentication code is important, I wrote a middleware based on the tutorial here: http://onecreativeblog.com/post/59051248/django-login-required-middleware
So far I have fixed the issue by forcing the www prefix with PREPEND_WWW = True, but I would still like to understand the issue ;)
Does anyone have an idea of what may be going on?
Thanks in advance!
What Zaha Zorg said: Cookies from Django won't work for both a prepended www and non-www domain by default.
However, the deeper issue here is that you're allowing both www and non-www domains of your site to serve identical content. Besides the obvious SEO consequences of having traffic split between the two, you run into issues like these. The proper way to handle this is to redirect all traffic from one to the other (whichever you prefer). The PREPEND_WWW setting you found works perfectly for this. For the opposite (forcing all traffic to non-www), it's recommended to just do a re-write at the server configuration level, such as Apache or Nginx.
You need to look at https://docs.djangoproject.com/en/dev/ref/settings/?from=olddocs#session-cookie-domain
SESSION_COOKIE_DOMAIN
Default: None
The domain to use for session cookies. Set this to a string such as ".lawrence.com" for cross-domain cookies, or use None for a standard domain cookie. See the How to use sessions.
Could it be that cookies depend on the hostname of the server? This would explain why the two domain names are treated as different.
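Cookies are indeed scoped by hostname. The sketch below is a simplified model of the browser-side rule (loosely following the domain-matching rules of RFC 6265, ignoring IP addresses, paths, and public-suffix checks). The function name and arguments are hypothetical.

```python
def cookie_sent_to(request_host, set_by_host, domain_attr=None):
    """Return True if a browser would send the cookie to request_host.

    domain_attr=None models a cookie set without a Domain attribute
    (what Django emits when SESSION_COOKIE_DOMAIN is None).
    """
    request_host = request_host.lower()
    if domain_attr is None:
        # Host-only cookie: sent back only to the exact host that set it
        return request_host == set_by_host.lower()
    # A leading dot in the Domain attribute is ignored
    domain = domain_attr.lstrip(".").lower()
    # Domain cookie: sent to the domain itself and to any subdomain of it
    return request_host == domain or request_host.endswith("." + domain)

# Logged in on mydomain.com, then visiting www.mydomain.com
print(cookie_sent_to("www.mydomain.com", "mydomain.com"))
print(cookie_sent_to("www.mydomain.com", "mydomain.com", ".mydomain.com"))
```

Under this model the default session cookie never reaches the other hostname, which matches the behaviour described in the question. Setting SESSION_COOKIE_DOMAIN widens the scope to both hosts.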

Sitecore Multiple Sites - Using Wildcards With hostName Attribute

I'm having an issue getting a site set up in the web.config file for a Sitecore site. Specifically, I can't figure out how to use the hostName attribute to capture the "www" subdomain for a domain (e.g. www.mydomain.com) as well as no subdomain at all (e.g. mydomain.com).
I've experimented a little and found that I can do something like *.mydomain.com and it works. But the problem is that we want users to also be able to go to just mydomain.com and have the site come up. When I have the hostName configured as *.mydomain.com this apparently is not possible.
Any ideas? The Sitecore developer network doesn't say too much on this (unless it's hidden somewhere I couldn't find).
Craig
For a bit more precision than Mark's removing the dot (which will work) you can use pipe separation to list alternative names:
<site hostName="mydomain.com | *.mydomain.com" ... />
That would allow you to configure a second site reallymydomain.com without it being caught by the hostName above. Remember the sites list will be processed in order, so the first match counts even if there's a second match that is more specific.
Try no dot in the hostName:
<site hostName="*mydomain.com" ... />
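To illustrate the difference between the two hostName suggestions and the first-match ordering, here is a small sketch that uses Python's fnmatch as a stand-in for wildcard matching. It is not Sitecore's actual matcher. The site names and the helper first_matching_site are invented for the example, and splitting on "|" with surrounding spaces trimmed is an assumption about how the attribute is read.

```python
from fnmatch import fnmatchcase

def first_matching_site(sites, host):
    """Return the name of the first site whose hostName patterns match host."""
    host = host.lower()
    for name, host_name_attr in sites:
        # Split the pipe-separated alternatives and trim surrounding spaces
        patterns = [p.strip().lower() for p in host_name_attr.split("|")]
        if any(fnmatchcase(host, p) for p in patterns):
            return name
    return None

piped = [("main", "mydomain.com | *.mydomain.com"),
         ("other", "reallymydomain.com")]
no_dot = [("main", "*mydomain.com"),
          ("other", "reallymydomain.com")]

for host in ("mydomain.com", "www.mydomain.com", "reallymydomain.com"):
    print(host, "| piped:", first_matching_site(piped, host),
          "| no dot:", first_matching_site(no_dot, host))
```

In this model the pipe-separated form keeps reallymydomain.com for the second site, while the no-dot wildcard claims it for the first one because that entry is checked first.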
Both Mark's and James's answers are correct and will help you resolve multiple domain/subdomain names to a single Sitecore site.
You may instead want to consider setting up a redirect in IIS from the non-www domain to the www sub-domain or vice versa. Having more than one definitive URL for your site can negatively affect your page rank.
This is a handy module for IIS 7 to help you define redirects. http://www.iis.net/download/urlrewrite