Regex capture group in Varnish VCL

I have a URL of the form:
http://some-site.com/api/v2/portal-name/some/webservice/call
The data I want to fetch lives at:
http://portal-name.com/webservices/v2/some/webservice/call
(Yes, I could rewrite the application to use other URLs, but we are only testing Varnish at the moment, so for now the change cannot be intrusive.)
But I'm having trouble building the URL correctly in Varnish VCL. Replacing the api part with an empty string is no problem, but extracting the portal-name is.
Things I've tried:
if (req.url ~ ".*/(.*)/") {
    set req.http.portalhostname = re.group.0;
    set req.http.portalhostname = $1;
}
These attempts come from https://docs.fastly.com/guides/vcl/vcl-regular-expression-cheat-sheet and from the question "Extracting capturing group contents in Varnish regex".
And yes, std is imported.
But this gives me either a
Syntax error at
('/etc/varnish/default.vcl' Line 36 Pos 35)
set req.http.portalhostname = $1;
or a
Symbol not found: 're.group.0' (expected type STRING_LIST):
So: how can I do this? When I have extracted the portalhostname I should be able to simply do a regsub to replace that value with an empty string and then prepend "webservices" and my URL is complete.
The Varnish version I'm using: varnish-4.1.8 revision d266ac5c6

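Sadly, re.group does not appear to be available in stock Varnish 4.1; the cheat sheet linked in the question is Fastly's, and its VCL differs from open-source Varnish. Similar functionality appears to be accessible via one of several vmods. See https://varnish-cache.org/vmods/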
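For what it is worth, VCL's built-in regsub() accepts \1-style backreferences in its replacement string, so the portal name may be extractable without re.group at all. The rewrite logic can be prototyped outside Varnish first. The sketch below uses Python's re module; the function name rewrite_request and the ^/api/v2/ pattern are assumptions derived from the example URLs in the question, not tested VCL.

```python
import re

# Hypothetical pattern derived from the example URL in the question:
# /api/v2/<portal-name>/<rest of the path>
API_PATTERN = re.compile(r"^/api/v2/([^/]+)/(.*)$")

def rewrite_request(url):
    """Return (backend_host, backend_url) for an incoming request URL,
    or None when the URL does not look like an API call."""
    match = API_PATTERN.match(url)
    if match is None:
        return None
    portal, rest = match.group(1), match.group(2)
    # Group 1 becomes the host name, group 2 is appended to the new prefix.
    return portal + ".com", "/webservices/v2/" + rest

print(rewrite_request("/api/v2/portal-name/some/webservice/call"))
```

In VCL the two halves would presumably become two regsub() calls on req.url with the same pattern, one with "\1.com" as the replacement for the Host header and one with "/webservices/v2/\2" for the URL.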

Related

Varnish regular expression regsuball to uppercase (support Replacement Text Case Conversion)

I am looking for a proper way to handle URLs in Varnish (5.2.1). Here is what I do (trying to redirect to lowercase URLs):
set req.url = std.tolower(req.url); //this is new.url
//if original.url != new.url => redirect
This produces a good URL, but client libraries (and there are quite a few) that convert %[hex] to %[HEX], as recommended by https://www.rfc-editor.org/rfc/rfc3986#section-2.1, end up in a redirect loop.
Example:
req.url = "/query=mythbusters%20-%20die%20wissensj%c3%A4ger"
is redirected to
"/query=mythbusters%20-%20die%20wissensj%c3%a4ger"
and client redirects it to
"/query=mythbusters%20-%20die%20wissensj%c3%A4ger"
I am trying to solve this with regular expressions, but for some reason I cannot get upper-case results. According to the PCRE/PCRE2/Perl regexp documentation it should be possible like this:
set req.url = std.tolower(req.url);
set req.url = regsuball(req.url, "(%[0-9a-f][0-9a-f])", "\U\1");
Does anybody have an idea how to solve this?
I posted an issue on the Varnish GitHub; the answer was that this is not supported.
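Since the replacement string cannot change case, the normalization the question is after has to be expressed some other way. As a reference for the intended result, here is a minimal Python sketch: lowercase everything, then upper-case only the percent-escapes. The function name normalize_url is hypothetical, and this is not something that runs inside Varnish as written.

```python
import re

# A percent-escape is "%" followed by exactly two hex digits.
PERCENT_ESCAPE = re.compile(r"%[0-9a-fA-F]{2}")

def normalize_url(url):
    """Lowercase the URL but keep percent-escapes in upper case,
    so RFC 3986 style clients see the same URL they would send."""
    lowered = url.lower()
    # A callable replacement is used because the escape has to be upper-cased.
    return PERCENT_ESCAPE.sub(lambda m: m.group(0).upper(), lowered)

print(normalize_url("/Query=Mythbusters%20-%20die%20wissensj%c3%A4ger"))
```

Redirecting only when the request URL differs from this normalized form (rather than from the fully lowercased form) would avoid the loop, because a client that upper-cases the hex digits already sends the normalized URL.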

Nginx rewrite rule is not working if hash in the url

I have written an nginx rewrite rule to redirect all requests for /path/category except those for subcategory1. I am using the regular expression below for the match, and it works fine in a regex tester. However, when I put the same regex in the Nginx conf, the negative lookahead does not work if the URL contains the # character. Do you have any suggestions?
Regex tried so far:
^\/path\/category(?!.*(\bsubcategory1\b)).*$
^\/path\/category(([\/#]*)(?!.*(subcategory1))).*$
Rewrite Rule:
rewrite ^\/path\/category(?!.*(\bsubcategory1\b)).*$ https://new.host.com permanent;
Path Details:
These should redirect to https://new.host.com, which is working fine:
/path/category
/path/category/
/path/category#/
/path/category/#/
These should skip the redirection because of subcategory1. It is not working for the last 3 URLs, which contain a hash:
/path/category/subcategory1
/path/category/subcategory1/
/path/category/subcategory1/dsadasd
/path/category#/subcategory1
/path/category/#/subcategory1
/path/category#/subcategory1/dadsd
Anything in the URI after # is ignored because it is meant for the client side, so it never gets to the HTTP server (Nginx in this case).
For /path/category#/subcategory1, Nginx therefore only receives /path/category; the negative lookahead never sees subcategory1, and the redirect fires.
The part after # is called the fragment.
The fragment can only be processed on the client side.
You can use window.location.hash to access and process fragments.
This JavaScript example turns the fragment into query parameters of a request to process.html:
let param = window.location.hash;
param = param.substring(1); // remove #
param = '?' + param;
console.log('param=',param);
location.href = '/process.html' + param;
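To see what the server side actually has to work with, the split between path and fragment can be reproduced with Python's urllib.parse. This is only an illustrative sketch: the host old.host.com and the function name server_side_decision are made up, and the regex is the first one from the question with the unnecessary escapes removed.

```python
import re
from urllib.parse import urlsplit

# First regex from the question (slashes need no escaping here).
RULE = re.compile(r"^/path/category(?!.*\bsubcategory1\b).*$")

def server_side_decision(typed_url):
    """Return (path sent to the server, fragment kept by the browser,
    whether the rewrite rule would match the path)."""
    parts = urlsplit(typed_url)
    # Only the path reaches the server; the fragment stays in the browser.
    redirect = RULE.match(parts.path) is not None
    return parts.path, parts.fragment, redirect

print(server_side_decision("https://old.host.com/path/category#/subcategory1"))
print(server_side_decision("https://old.host.com/path/category/subcategory1/"))
```

The first call reports a redirect even though the typed URL contains subcategory1, because that text sits entirely in the fragment.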

my nginx regular expression routing is not working

location ~* "/mypath/([a-zA-Z0-9_.-]{12}$)" {
    return 301 https://new-domain.com;
}
With the regular expression above, when a user types https://mywebsite.com/mypath/uy2hgy12jer2 in the browser, they are redirected to https://new-domain.com. The problem is that when they type https://mywebsite.com/mypath/uy2hgy12jer2?params=1287612, they are redirected as well. I want the redirect to apply only to https://mywebsite.com/mypath/uy2hgy12jer2. Please let me know how to do it. Thanks.
Location blocks in NGINX match only the path part of the URI, not the query string.
To skip the redirect when a query string is present, you can put the following inside the location block, before the return:
if ($is_args) {
    break;
}
I found this behavior after a few trials in https://nginx.viraptor.info/. Any character typed after the 12th character prevents the match, unless it belongs to a query string. I then found the alternative mentioned above and the link below.
For more info - https://serverfault.com/questions/237517/nginx-query-keyword-matching-in-location
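The behavior can be modeled in a few lines of Python: the location regex is tested against the path only, and the query string is checked separately, which is the role $is_args plays above. The function name should_redirect is hypothetical; re.IGNORECASE stands in for the ~* modifier.

```python
import re

# Same pattern as the location block; ~* means case-insensitive matching.
LOCATION = re.compile(r"/mypath/([a-zA-Z0-9_.-]{12}$)", re.IGNORECASE)

def should_redirect(request_uri):
    """Redirect only when the path matches and no query string is present."""
    path, separator, _query = request_uri.partition("?")
    if separator:
        # Mirrors $is_args being "?" when the request has arguments.
        return False
    # The location regex only ever sees the path.
    return LOCATION.search(path) is not None

print(should_redirect("/mypath/uy2hgy12jer2"))
print(should_redirect("/mypath/uy2hgy12jer2?params=1287612"))
```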

How to configure Fiddler's Autoresponder to "map" a host to a folder?

I'm already using Fiddler to intercept requests for specific remote files while I'm working on them (so I can tweak them locally without touching the published contents).
For example, I use many rules like this:
match: regex:(?insx).+/some_file([?a-z0-9-=&]+\.)*
respond: c:\somepath\some_file
This works perfectly.
What I'd like to do now is take this a step further, with something like this:
match: regex:http://some_dummy_domain/(anything)?(anything)
respond: c:\somepath\(anything)?(anything)
or, in plain text,
Intercept any http request to 'some_dummy_domain', go inside 'c:\somepath' and grab the file with the same path and name that was requested originally. Query string should pass through.
Some scenarios to further clarify:
http://some_domain/somefile --> c:\somepath\somefile
http://some_domain/path1/somefile --> c:\somepath\path1\somefile
http://some_domain/path1/somefile?querystring --> c:\somepath\path1\somefile?querystring
I tried to leverage what I already had:
match: regex:(?insx).+//some_dummy_domain/([?a-z0-9-=&]+\.)*
respond: ...
Basically, I'm looking for //some_dummy_domain/ in requests. This seems to match correctly when testing, but I'm missing how to respond.
Can Fiddler use matches in responses, and how could I set this up properly?
I tried to respond c:\somepath\$1 but Fiddler seems to treat it verbatim:
match: regex:(?insx).+//some_domain/([?a-z0-9-=&]+\.)*
respond: c:\somepath\$1
request: http://some_domain/index.html
response: c:\somepath\$1html <-----------
The problem is your use of insx at the front of your expression; the n means that you want to require explicitly-named capture groups, meaning that a group $1 isn't automatically created. You can either omit the n or explicitly name the capture group.
From the Fiddler Book:
Use RegEx Replacements in Action Text
Fiddler’s AutoResponder permits you to use regular expression group replacements to map text from the Match Condition into the Action Text. For instance, the rule:
Match Text: REGEX:.+/assets/(.*)
Action Text: http://example.com/mockup/$1
...maps a request for http://example.com/assets/Test1.gif to http://example.com/mockup/Test1.gif.
The following rule:
Match Text: REGEX:.+example\.com.*
Action Text: http://proxy.webdbg.com/p.cgi?url=$0
...rewrites the inbound URL so that all URLs containing example.com are passed as a URL parameter to a page on proxy.webdbg.com.
Match Text: REGEX:(?insx).+/assets/(?'fname'[^?]*).*
Action Text: C:\src\${fname}
...maps a request for http://example.com/assets/img/1.png?bunnies to C:\src\img\1.png.
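The named-group rule can be tried out in Python as a rough model of what the AutoResponder does. Note the differences, which are assumptions of this sketch rather than Fiddler behavior: Python spells the group (?P<fname>...) instead of .NET's (?'fname'...), it has no n option, and the conversion of forward slashes to backslashes is done explicitly here. The function name map_to_local is hypothetical.

```python
import re

# Python equivalent of REGEX:(?insx).+/assets/(?'fname'[^?]*).*
RULE = re.compile(r".+/assets/(?P<fname>[^?]*).*", re.IGNORECASE | re.DOTALL)

def map_to_local(url, root="C:\\src\\"):
    """Map a request URL to a local file path, dropping the query string."""
    match = RULE.match(url)
    if match is None:
        return None
    # [^?]* stops at the first "?", so the query string is not captured.
    return root + match.group("fname").replace("/", "\\")

print(map_to_local("http://example.com/assets/img/1.png?bunnies"))
```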

regex quandary - for word matching

I am out of my depth here, currently reading the tutorials and using python to learn regex.
I have a website where a php file http://www.example.com/showme.php?user=JOHN will load the visitor page of JOHN. However I want to let John have his own vanity URL like john.example.com and rewrite it to http://www.example.com/showme.php?user=JOHN .
I know it can be done and after fiddling with it it seems lighttpd mod_rewrite is the way to go. Now I am stumped as I am trying to come up with regex to match!
rewrite ("^![www]\.example\.com" => "www\.example\.com\?user=###");
I am playing with the Python re module to test several ways of getting the john out of john.example.com, recognizing when the first segment of the URL is not www, and then redirecting. The above was my attempt. Am I even on the right continent?
Any help will be appreciated with:
1. recognizing when the first part of the URL, before the first dot, is not www but something else, so that example.com won't stump it;
2. getting the first part of the URL, before the first dot, and passing it as user=###.
Thanks a bunch
Use lighttpd's mod_rewrite module. Add this to your lighttpd.conf file:
$HTTP["host"] != "www.example.com" {
    $HTTP["host"] =~ "^([^.]+)\.example\.com$" {
        url.rewrite-once = (
            "^/?$" => "/showme.php?user=%1"
        )
    }
}
For an href value like /dir/page.php, the domain part of the link is filled in automatically from the current request, as shown in the browser's address bar. So if you had used www.example.com, the link would point to http://www.example.com/dir/page.php, and likewise for john.example.com.
For all your links to point at www.example.com, the page has to be accessed via www. That is only possible with an external redirect from the vanity URL to the actual one, i.e. users can still use the shortened URL but they get redirected to the actual one:
$HTTP["host"] != "www.example.com" {
    $HTTP["host"] =~ "^([^.]+)\.example\.com$" {
        url.redirect = (
            "^/?$" => "http://www.example.com/showme.php?user=%1"
        )
    }
}
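Since the question mentions testing with Python's re module, the host-matching part of the configuration above can be checked there too. This is a small sketch of the same two conditions, not of lighttpd itself; the function name target_for is hypothetical.

```python
import re

# Same host pattern as in the lighttpd conditional.
VANITY_HOST = re.compile(r"^([^.]+)\.example\.com$")

def target_for(host):
    """Return the redirect target for a vanity host, or None when the
    host is www.example.com or is not a single-label subdomain."""
    if host == "www.example.com":
        return None
    match = VANITY_HOST.match(host)
    if match is None:
        # Covers the bare example.com, which has no subdomain label.
        return None
    # Group 1 plays the role of %1 in the lighttpd rule.
    return "http://www.example.com/showme.php?user=" + match.group(1)

print(target_for("john.example.com"))
```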