Varnish: Remove Some Cookies Issue - regex

I am using Varnish 3.0.5 and Apache 2.4.6 with PHP 5.4.21.
I have read the documentation here, which says:
Varnish will, in the default configuration, not cache an object coming from the backend with a Set-Cookie header present. Also, if the client sends a Cookie header, Varnish will bypass the cache and go directly to the backend.
So, in an effort to have Varnish cache pages, I need to remove the non-important cookies being sent to Varnish from the client. At present, there is only one cookie being sent as depicted here:
My default.vcl file has the following code, which is supposed to remove the cookie(s) whose name starts with the underscore character, or whose name is "has_js":
sub vcl_recv {
# //Remove all cookies that begin with an underscore
set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(_[_\.a-z0-9]+|has_js)=[^;]*", "");
# //Remove a ";" prefix, if present.
set req.http.Cookie = regsub(req.http.Cookie, "^;\s", "");
# unset req.http.Cookie;
...
I have tested the regex from this application and it finds a match for the cookie being sent from the client as noted in the image above.
When I run
# varnishstat
from the command line, I find that I have no "hits", only "misses". However, if I uncomment the unset req.http.Cookie; line, so that it removes all the cookies (of which there should be only one, I assume from the image above), I get the hits I'd expect.
I'm hoping someone can point me in the right direction as to what I may be missing?
Thanks.

The problem with the code above is this line in the default.vcl that gets called later on:
if (req.http.Authorization || req.http.Cookie) {
# /* Not cacheable by default */
return (pass);
}
It is commented out in default.vcl, but Varnish still runs it as part of its built-in default logic whenever your own vcl_recv does not return first.
As you can see, it tests whether req.http.Cookie exists. In the code provided in the question, the header will still exist, but will be an empty string. An empty header still passes this test in the default Varnish logic. As a result, the following check must be added after the code that removes the undesirable cookies:
set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(_[_\.a-z0-9]+|has_js)=[^;]*", "");
# Remove a ";" prefix, if present.
set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
# Unset the header entirely if no cookies are left.
if (req.http.Cookie == "") {
    unset req.http.Cookie;
}
Now, if req.http.Cookie is empty, the header is unset and Varnish will cache as expected.
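To see why the extra check matters, the two substitutions can be replayed outside Varnish. This is a minimal Python sketch, not VCL: clean_cookie_header is a hypothetical helper, Python's re module stands in for Varnish's regex engine, and None stands in for an unset header.

```python
import re

# Same pattern as in the VCL: cookies whose name starts with "_" or is "has_js".
STRIP = re.compile(r"(^|;\s*)(_[_\.a-z0-9]+|has_js)=[^;]*")


def clean_cookie_header(cookie):
    """Return the cleaned Cookie header, or None when nothing is left.

    None models `unset req.http.Cookie`; an empty string would model a
    header that still exists and therefore still triggers return (pass).
    """
    if cookie is None:
        return None
    cleaned = STRIP.sub("", cookie)            # plays the role of regsuball
    cleaned = re.sub(r"^;\s*", "", cleaned)    # plays the role of regsub
    return cleaned if cleaned != "" else None


if __name__ == "__main__":
    print(clean_cookie_header("_ga=GA1.2.1; has_js=1"))
    print(clean_cookie_header("_ga=GA1.2.1; session=abc"))
```

Without the final empty-string check, the first call would yield an empty but still present header, which is exactly the case that keeps Varnish from caching.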

Related

Varnish regular expression regsuball to uppercase (support Replacement Text Case Conversion)

I am looking for proper handling of URLs in Varnish (5.2.1), here is what I do (trying to redirect to lowercase URLs):
set req.url = std.tolower(req.url); //this is new.url
//if original.url != new.url => redirect
This produces a good URL, but some client libraries (and there are quite a few) convert %[hex] to %[HEX], as recommended by https://www.rfc-editor.org/rfc/rfc3986#section-2.1, and end up in a redirect loop.
Example:
req.url = "/query=mythbusters%20-%20die%20wissensj%c3%A4ger"
is redirected to
"/query=mythbusters%20-%20die%20wissensj%c3%a4ger"
and client redirects it to
"/query=mythbusters%20-%20die%20wissensj%c3%A4ger"
I am trying to solve this with regular expressions, but for some reason I cannot get uppercase results. According to PCRE/PCRE2/Perl regex syntax, it should be possible like this:
set req.url = std.tolower(req.url);
set req.url = regsuball(req.url, "(%[0-9a-f][0-9a-f])", "\U\1");
Does anybody have an idea how to solve this?
I posted an issue on the Varnish GitHub; the answer was that this is not supported.
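For reference, the normalization being asked for (lowercase everything, then uppercase the hex digits of percent-escapes) can be sketched in Python. This only illustrates the target behaviour and the loop-free redirect test; it is not VCL, and normalize_url and needs_redirect are hypothetical names.

```python
import re


def normalize_url(url):
    """Lowercase the URL, then uppercase the hex digits of each %xx escape."""
    lowered = url.lower()
    return re.sub(r"%[0-9a-f]{2}", lambda m: m.group(0).upper(), lowered)


def needs_redirect(url):
    """Redirect only when the URL differs from its normalized form."""
    return url != normalize_url(url)


if __name__ == "__main__":
    original = "/query=mythbusters%20-%20die%20wissensj%c3%A4ger"
    print(normalize_url(original))
    print(needs_redirect(normalize_url(original)))
```

Because a client that uppercases escapes leaves the normalized form unchanged, comparing against this form would not trigger a second redirect.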

Way to have browsers ignore a specific string of characters in middle of a URL?

I have a very narrow, specific case where an application takes a URL and appends the name of a specific server to the end of it from a variable passed to it, but it does it as host:servername001. The problem is that the URL doesn't work if host: is there. Is there a string I can add to the URL prior to the variable that will tell it to ignore the next 5 characters (eg, the host:) and then use the resulting URL?
The URL it would pass is like:
https://website1/path/server.do?sysparm_query=name=$host.name
Which results in an actual URL output of:
https://website1/path/server.do?sysparm_query=name=host:servername001
I am looking for a way that I can have it ignore the 5 characters (host:) before the servername. I can control the URL string, but not the $host.name variable.
Thoughts or suggestions?
EDIT for clarification:
I'm passing a URL and a variable to an output. This is all in a 3rd party app that I don't have a lot of control over. I can edit the URL. Right now, it's:
https://website1/path/server.do?sysparm_query=name=
I need to append the servername from the variable at the end of that URL. The only one I can use for that is
$host.name
Which adds "host:" right before the server name in the resulting URL. I need to know if there is something I can add to the URL string above, that would tell a browser to ignore "the next 5 characters" so the result is just that the URL looks like this:
https://website1/path/server.do?sysparm_query=name=servername001
Instead of like this:
https://website1/path/server.do?sysparm_query=name=host:servername001
Hope that makes sense...
Here is a code snippet that appends the host name without the host: prefix:
var url = "https://website1/path/server.do?sysparm_query=name=";
var $host = {
name: 'host:servername001'
};
var newUrl = url + $host.name.replace(/^[^:]*:/, '');
console.log('- url: ' + url);
console.log('- newUrl: ' + newUrl);

Prevent URL encoding that is removing equals signs from URL

I am working on a Django/React app. I have some verification email links that look like the following:
https://test.example.com/auth/security_questions/f=ru&i=101083&k=7014c315f3056243534741610545c8067d64d747a981de22fe75b78a03d16c92
In dev env this works fine, but now that I am getting it ready for production, it isn't working. When I click on it, it converts it to:
https://test.example.com/auth/security_questions/f%3Dru&i%3D101083&k%3D7014c315f3056243534741610545c8067d64d747a981de22fe75b78a03d16c92/
This prevents react-router-dom from matching the correct URL, so a portion of the web application does not load properly.
The link is constructed using the following.
link = '%s/auth/security_questions/f=%s&i=%s&k=%s' % \
('https://test.example.com', 'ru', user.id, user.key)
Also, here is the url() that is catching the route:
url(r'^(?:.*)/$', TemplateView.as_view(template_name='index.html')),
These variables are supposed to be query parameters in a GET request. When you construct the link, you'll need to have a question mark in there somewhere separating the URL from the query string:
https://test.example.com/auth/security_questions/?f=ru&i=101083&k=7014c315...
                                                 ^
                                                 |___ here
The conversion of = to url-encoded %3D etc is correct, and equivalent. Sometimes variables are part of the URL directly, but webapps don't use &-separated key/value pairs in that case.
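As a sketch of that fix, the link can be built with the standard library's urlencode so the parameters end up in a real query string. build_verification_link is a hypothetical helper name, and the values below are placeholders rather than the real user id and key.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_verification_link(base, flow, user_id, key):
    """Build the verification link with f, i and k as query parameters."""
    query = urlencode({"f": flow, "i": user_id, "k": key})
    # The "?" separates the path from the query string.
    return "%s/auth/security_questions/?%s" % (base, query)


if __name__ == "__main__":
    link = build_verification_link("https://test.example.com", "ru", 101083, "abc123")
    print(link)
    # The parameters can be read back as ordinary query parameters.
    print(parse_qs(urlsplit(link).query))
```

With this shape the path still ends in a slash, so the existing catch-all url() pattern keeps matching, and the React side would read f, i and k from the query string rather than from the path.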

Regex capture group in Varnish VCL

I have a URL in the form of:
http://some-site.com/api/v2/portal-name/some/webservice/call
The data I want to fetch needs
http://portal-name.com/webservices/v2/some/webservice/call
(Yes, I can rewrite the application so it uses other URLs, but we are testing Varnish at the moment, so for now it cannot be intrusive.)
But I'm having trouble getting the URL right in Varnish VCL. Replacing the api part with an empty string is no problem, but extracting the portal-name is.
Things I've tried:
if (req.url ~ ".*/(.*)/") {
set req.http.portalhostname = re.group.0;
set req.http.portalhostname = $1;
}
From https://docs.fastly.com/guides/vcl/vcl-regular-expression-cheat-sheet and Extracting capturing group contents in Varnish regex
And yes, std is imported.
But this gives me either a
Syntax error at
('/etc/varnish/default.vcl' Line 36 Pos 35)
set req.http.portalhostname = $1;
or a
Symbol not found: 're.group.0' (expected type STRING_LIST):
So: how can I do this? When I have extracted the portalhostname I should be able to simply do a regsub to replace that value with an empty string and then prepend "webservices" and my URL is complete.
The Varnish version I'm using: varnish-4.1.8 revision d266ac5c6
Sadly, re.group does not seem to be available in this version of Varnish; the cheat sheet linked in the question documents Fastly's VCL. Similar functionality appears to be accessible via one of several vmods. See https://varnish-cache.org/vmods/
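The rewrite itself only needs a capture group and a backreference. The following Python sketch shows the pattern on the URL from the question; rewrite is a hypothetical helper, and appending ".com" to the portal name is an assumption taken from the example URLs. In VCL, regsub accepts \1-style backreferences in its replacement string, so the same pattern should carry over, although that is untested here.

```python
import re

# Group 1 is the portal name, group 2 is the remainder of the path.
API_URL = re.compile(r"^/api/v2/([^/]+)/(.*)$")


def rewrite(url):
    """Return (backend_host, backend_url), or None if the URL does not match."""
    match = API_URL.match(url)
    if match is None:
        return None
    portal, rest = match.group(1), match.group(2)
    return portal + ".com", "/webservices/v2/" + rest


if __name__ == "__main__":
    print(rewrite("/api/v2/portal-name/some/webservice/call"))
```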

How to redirect in nginx if request URI does not contain certain words

I'm using nginx to serve static news-like pages.
On the top-level there is
https://example.com/en/news/ with an overview of the articles.
Individual items have a URL similar to this: https://example.com/en/news/some-article
All URLs contain the language, i.e. /en/ or /de/.
I would like to create a rule that redirects requests that don't contain the language to the correct URL (the language is mapped based on IP and available via $lang).
The following should work (example for en):
/news/ --- redirect ---> /en/news/
/news/some-article --- redirect ---> /en/news/some-article
My attempts looked something like this
location ~* /news/.*$ {
if ($request_uri !~* /(de|en)/$) {
return 302 https://example.com/$lang/$request_uri;
}
}
So far this resulted in infinite redirects.
Your solution seems overly complicated to me. And testing $request_uri with a trailing $ will never match the rewritten URIs (hence the loop).
You could use a prefix location to only match URIs that begin with /news/.
Assuming that you have calculated a value for $lang elsewhere, this may work for you:
location ^~ /news/ {
return 302 /$lang$request_uri;
}
The ^~ modifier is only necessary if you have regular expression location blocks within your configuration that may conflict. See this document for more.
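The loop and the fix can be checked with a small simulation. This is a Python sketch of the matching logic only, not nginx configuration; the function names are hypothetical, and re.IGNORECASE mirrors the case-insensitive !~* operator.

```python
import re


def buggy_needs_redirect(uri):
    """Mirror `$request_uri !~* /(de|en)/$` from the question.

    The trailing $ means only URIs that END in /de/ or /en/ count as
    already localized, so /en/news/ is redirected again.
    """
    return re.search(r"/(de|en)/$", uri, re.IGNORECASE) is None


def prefix_location_matches(uri):
    """Mirror `location ^~ /news/`: only URIs that begin with /news/."""
    return uri.startswith("/news/")


def redirect_target(uri, lang):
    """Mirror `return 302 /$lang$request_uri`."""
    return "/" + lang + uri


if __name__ == "__main__":
    target = redirect_target("/news/some-article", "en")
    print(target)
    print(buggy_needs_redirect(target))      # still wants to redirect: loop
    print(prefix_location_matches(target))   # prefix location stops matching
```

Since the redirected URI starts with the language segment, the prefix location no longer applies to it and the redirect happens only once.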