A regular expression to find domains that will be cached by Varnish - regex

I am asking this question here because I think it is more of a regex question than an actual Varnish question.
What I basically want to do is define a list of domains that Varnish will cache. There is already an answer to this, but I want to use a different approach.
Varnish: cache only specific domain
Now the code that is used in this answer is the following:
sub vcl_recv {
    # don't cache foo.com or bar.com - optional www
    if (req.http.host ~ "^(www\.)?(foo|bar)\.com$") {
        pass;
    }
    # cache foobar.com - optional www
    if (req.http.host ~ "^(www\.)?foobar\.com$") {
        lookup;
    }
}
What I want is a little different. I have domains with different TLDs, but I only want the non-www version of each domain to be cached.
So only mydomain.com, myotherdomain.nl, or yadomain.net.
Any subdomain, including www, can be passed to the backend.

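If I'm reading your question correctly, you have a list containing multiple versions of the same website (www.foobar.com and/or foobar.com) and you want to match only foobar.com.
If so, you need a negative lookbehind, (?<!www\.), rather than a back reference. A search that only matches domains not preceded by www. would be (?<!www\.)((?:[^.\s]+)\.(?:com|net|nl)). This suits scanning a list of names; tested unanchored against a single host such as www.foobar.com it still finds a match further in (oobar.com), so for a check on req.http.host anchor the whole value instead, e.g. ^[^.\s]+\.(?:com|net|nl)$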
Hope that helps
EDIT
If the TLDs are not known in advance: (?<!www\.)((?:[^.\s]+)\.(?:\w{2,3})) (this only covers TLDs of two or three characters)
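To check how the lookbehind pattern behaves on a single host name, here is a minimal sketch using Python's re module rather than Varnish itself. LOOKBEHIND is the answer's pattern copied verbatim; BARE_DOMAIN and should_cache are hypothetical names for an anchored alternative, and the TLD list is just the one from the question.

```python
import re

# The answer's pattern, copied verbatim (unanchored).
LOOKBEHIND = re.compile(r"(?<!www\.)((?:[^.\s]+)\.(?:com|net|nl))")

# Hypothetical alternative: anchor to the whole host and allow exactly
# one label before the TLD, so every subdomain (www included) is rejected.
BARE_DOMAIN = re.compile(r"^[^.\s]+\.(?:com|net|nl)$")


def should_cache(host: str) -> bool:
    """Return True only for bare domains such as mydomain.com."""
    return BARE_DOMAIN.match(host) is not None


if __name__ == "__main__":
    # The unanchored lookbehind still finds a match inside a www host,
    # starting one character after the position the lookbehind rejects.
    print(LOOKBEHIND.search("www.foobar.com").group(1))
    for host in ("mydomain.com", "myotherdomain.nl", "www.mydomain.com"):
        print(host, should_cache(host))
```

The anchored form is what a test against req.http.host would need, since the whole header value is a single host name rather than free text.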

Related

How to set routing between nginx locations based on regex with wildcard

I have an HTTP API behind nginx, and I want to filter requests to the API based on the values of request parameters. Parameters are passed directly in the URL, like
https://api.com/api/v1/action?param1=value1&param2=value2&etc...
Let's assume that I want to route requests with a particular value of param2 to some other URL.
I thought it would be as easy as
location ~* /api/.*param2=somevalue.* { #location; }
But nginx can't find the match, even when there is no alternative location at all.
I'm confused. Are these wildcards truly wildcards, or am I missing something? If so, what?
I have already tried escaping and different modifiers, but no luck. :(
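One likely cause: nginx tests a location against the request path only, and the query string is exposed separately (for example through $args or $arg_param2). The Python sketch below only imitates that split to show why the regex cannot match; location_matches and param_matches are hypothetical helper names, not nginx APIs.

```python
import re
from urllib.parse import parse_qs, urlsplit

# The regex from the location block, case-insensitive like "~*".
LOCATION_RE = re.compile(r"/api/.*param2=somevalue.*", re.IGNORECASE)


def location_matches(url: str) -> bool:
    """Apply the location regex to the path only (no query string)."""
    path = urlsplit(url).path
    return LOCATION_RE.search(path) is not None


def param_matches(url: str, name: str, value: str) -> bool:
    """Check a single query parameter, roughly what $arg_<name> exposes."""
    query = parse_qs(urlsplit(url).query)
    return value in query.get(name, [])


if __name__ == "__main__":
    url = "https://api.com/api/v1/action?param1=value1&param2=somevalue"
    print(location_matches(url))
    print(param_matches(url, "param2", "somevalue"))
```

Under that assumption, the routing decision would have to be made on the parameter value inside a location that matches /api/, not in the location regex itself.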

Avoiding double caching of items available from different URIs using Varnish

The Varnish Cache wiki gives an example of how to use regsub to avoid caching requests to www.example.com and example.com separately. The example from https://www.varnish-cache.org/trac/wiki/RedirectsAndRewrites is:
set req.http.host = regsub(req.http.host, "^www\.example\.com$","example.com");
"Requests to www.example.com and example.com will all go to the backend as "example.com" and end up cached by that string." This means duplicate caching does not occur.
I have multiple sites using the same Varnish server (VCL), so I am looking to replace "example.com" with a statement that will work for multiple domains, e.g.:
www.example1.co.uk > example1.co.uk
www.example2.com > example2.com
What would be the appropriate regex (if that is the correct term) for this?
There are multiple separate domains (different sites with different content on different domains) using this VCL, and I am hoping to avoid having to alter the VCL when sites are added or removed. I am therefore after a generic solution, something that can be applied to any domain to remove the possibility of a duplicate with/without the www alias being stored/served by Varnish. (Having trouble phrasing this, hope it is clearer!!)
I am aware that redirecting can be done outside of varnish, in Apache etc, but not looking for that as a solution.
set req.http.host = regsub(req.http.host,
"^www\.(.*)$",
"\1");
This will strip www off any domain. (I do feel reluctant to give you this answer, as it goes against my religion)
You might get penalized by search engines for serving the same content on multiple URLs, but SEO is a different topic.
Instead of what Chris suggested, you can just remove the www part:
set req.http.host = regsub(req.http.host, "^www\.", "");
Should be a teeny tiny bit faster, too
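To compare the two regsub calls above outside Varnish, here is a small Python sketch. re.sub with count=1 stands in for regsub, which replaces only the first match; the function names are hypothetical.

```python
import re


def strip_www_capture(host: str) -> str:
    """Mirror regsub(req.http.host, "^www\\.(.*)$", "\\1")."""
    return re.sub(r"^www\.(.*)$", r"\1", host, count=1)


def strip_www_prefix(host: str) -> str:
    """Mirror regsub(req.http.host, "^www\\.", "")."""
    return re.sub(r"^www\.", "", host, count=1)


if __name__ == "__main__":
    for host in ("www.example1.co.uk", "www.example2.com", "example2.com"):
        print(host, "->", strip_www_capture(host), strip_www_prefix(host))
```

Both forms leave hosts without a leading www. untouched; the second simply avoids capturing and re-emitting the rest of the name.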

What regular expression will match a domain name without a 3rd level?

What is the most efficient regex that will match these domains, without having to specify any rules to ignore?
Example matches:
domain.com
test.com
example.net
company.org
Example Ignore:
dev.domain.com
m.domain.com
www.domain.com
Any top-level domain is possible. Essentially, I am trying to make sure the domain doesn't already have a 3rd level.
To match a domain with any TLD use this:
^[^.\s]+\.[^.\s]+$
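A quick way to try this pattern against the examples from the question, sketched in Python; is_second_level is a hypothetical helper name.

```python
import re

# Exactly two dot-separated labels, no whitespace, anchored at both ends.
SECOND_LEVEL = re.compile(r"^[^.\s]+\.[^.\s]+$")


def is_second_level(host: str) -> bool:
    """Return True when the host has exactly one dot, e.g. domain.com."""
    return SECOND_LEVEL.match(host) is not None


if __name__ == "__main__":
    hosts = ("domain.com", "test.com", "example.net", "company.org",
             "dev.domain.com", "m.domain.com", "www.domain.com")
    for host in hosts:
        print(host, is_second_level(host))
```

Because the pattern only counts labels, a name under a two-part suffix such as example.co.uk is rejected as well.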

"URL with WWW and URL without WWW" -Is there any difference between them?

I have noticed one thing: when websites are opened in a browser, some URLs in the address bar look like
http://www.something.com
while others look like
http://something.com
where the www is missing. The same thing happens with my blog URL.
If I type
http://www.shareprogrammingtips.com/
in the address bar, it is automatically converted to
http://shareprogrammingtips.com/
I don't understand why this happens. Is there any difference between a URL with www and a URL without www?
Edit:
One more thing I have noticed is that the URL with www takes longer to open than the URL without www.
It does not matter whether you have www in the URL or not, as long as you always use the same URL. This is probably happening because your server is set up to redirect http://www.shareprogrammingtips.com/ to http://shareprogrammingtips.com/.
This will make sure that all the pages will always come to http://shareprogrammingtips.com/ and also search engines would index your site as http://shareprogrammingtips.com/. If your site is accessible from both http://www.shareprogrammingtips.com/ and http://shareprogrammingtips.com/ then the search engines would index both versions of your site, but the page rank of your site will be divided between these 2 versions as for search engines both these sites are different sites.
In the past, most websites used the www. prefix by convention (e.g. www.hello.com). Nowadays many sites are served from the naked domain, without the prefix (e.g. hello.com). Many domains still use the www. prefix for legacy reasons.
When a company registers a domain name, it registers the naked domain; www. is a sub-domain that can be set up under it. Either one, or both, can be configured to load the same website. There are technical reasons for choosing a www. prefix (it allows for certain cookie blocking policies) or a naked domain (shorter URL).
Usually, one of the two will be the canonical (real) domain, while the other only redirects to it. This redirect causes a small delay, but it's there for a reason.
If you code this redirect the right way, search engines will understand that both are the same website. If you skip the redirect and point both domains at your files directly, search engines may treat them as separate websites, which will hurt your SEO (Search Engine Optimization).
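As an illustration of such a redirect, here is a minimal Python sketch that decides whether a request URL should be redirected to the canonical host. It assumes a permanent (301) redirect is wanted, which is a common choice but not something the answers above specify; canonical_redirect is a hypothetical helper and it ignores ports and credentials in the URL.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit


def canonical_redirect(url: str, canonical_host: str) -> Optional[Tuple[int, str]]:
    """Return (301, new_url) if url uses the www alias of canonical_host.

    Returns None when the URL is already canonical or belongs to an
    unrelated host. Ports and userinfo are dropped (assumption: not used).
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host != "www." + canonical_host:
        return None
    target = urlunsplit(
        (parts.scheme, canonical_host, parts.path, parts.query, parts.fragment)
    )
    return 301, target


if __name__ == "__main__":
    print(canonical_redirect("http://www.shareprogrammingtips.com/",
                             "shareprogrammingtips.com"))
    print(canonical_redirect("http://shareprogrammingtips.com/",
                             "shareprogrammingtips.com"))
```

The same idea works in the other direction (naked domain to www.) by swapping which host is treated as canonical.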

Sitecore Multiple Sites - Using Wildcards With hostName Attribute

I'm having an issue with getting a site set up in the web.config file for a Sitecore site. Specifically I can't figure how to use the hostName property to capture the "www" subdomain for a domain (e.g. www.mydomain.com) as well as no specified subdomain (e.g. mydomain.com).
I've experimented a little and found that I can do something like *.mydomain.com and it works. But the problem is that we want users to also be able to go to just mydomain.com and have the site come up. When I have the hostName configured as *.mydomain.com this apparently is not possible.
Any ideas? The Sitecore developer network doesn't say too much on this (unless it's hidden somewhere I couldn't find).
Craig
For a bit more precision than Mark's removing the dot (which will work) you can use pipe separation to list alternative names:
<site hostName="mydomain.com | *.mydomain.com" ... />
That would allow you to configure a second site reallymydomain.com without it being caught by the hostName above. Remember the sites list will be processed in order, so the first match counts even if there's a second match that is more specific.
Try no dot in the hostName:
<site hostName="*mydomain.com" ... />
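To see how the two hostName suggestions differ, here is a rough Python model of pipe-separated wildcard matching. This is not Sitecore's implementation; it assumes that * behaves like a shell-style wildcard, that matching is case-insensitive, and that whitespace around | is ignored. host_matches is a hypothetical name.

```python
from fnmatch import fnmatchcase


def host_matches(host_name_attr: str, host: str) -> bool:
    """Model a hostName attribute as '|'-separated wildcard patterns."""
    patterns = [p.strip().lower() for p in host_name_attr.split("|")]
    return any(fnmatchcase(host.lower(), p) for p in patterns)


if __name__ == "__main__":
    hosts = ("mydomain.com", "www.mydomain.com", "reallymydomain.com")
    for attr in ("*.mydomain.com",
                 "mydomain.com | *.mydomain.com",
                 "*mydomain.com"):
        print(attr, [host_matches(attr, h) for h in hosts])
```

Under these assumptions the pipe-separated form accepts mydomain.com and its subdomains only, while *mydomain.com also accepts reallymydomain.com.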
Both Mark's and James's answers are correct and will help you resolve multiple domain/subdomain names to a single Sitecore site.
You may instead want to consider setting up a redirect in IIS from the non-www domain to the www sub-domain, or vice versa. Having more than one definitive URL for your domain can negatively affect your page rank.
This is a handy module for IIS 7 to help you define redirects. http://www.iis.net/download/urlrewrite