Rewrite all sub directories except a few select paths - regex

Say I have a current setup like this:
domain.com/some-path/
domain.com/some-path/locations/
domain.com/some-path/locations/some-location
domain.com/some-path/locations/locb
domain.com/some-path/locations/infinite-location-list
Those are the URL patterns I don't want to rewrite: the some-path sub-directory, its child locations sub-directory, and anything under locations.
All other child URLs under the some-path directory should be 301 redirected to the /some-path/ directory.
I tried something like this:
location ~* ^/some-path/(.+)? {
if ($request_uri !~ "^/some-path/locations/(.*)$") {
return 301 http://domain.com/some-path/;
}
}
But /some-path/ gets stuck in a redirect loop, and /some-path/locations/ throws a server side nginx 404.
Thoughts on how to accomplish my goal?

This is a bad idea, design-wise. If a user mistypes domain.com/some-path/locations/some-location, which is valid, according to you, as domain.com/some-path/location/some-location, which is not valid, then they'll end up having all of their typing wiped out, and they'll have to start from scratch (or maybe just go to your competitor).
However, if you insist on a regular expression:
if ($request_uri !~ "^/some-path/($|locations/.*$)") {
return 301 http://domain.com/some-path/;
}
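You'll probably have to put this within an existing location, since only a single location gets to handle a given request. Keeping it at the top level is also possible, depending on the rest of your config, but note that at server level the condition is also true for every URI outside /some-path/, so those requests would be redirected as well.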
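As a minimal sketch of the "within an existing location" variant, the condition can be wrapped in a prefix location for /some-path/. The listen and server_name values are assumptions, and the comment placeholder stands for whatever already serves that directory. Because $request_uri holds the URI as the client sent it, an internal redirect (for example from the index directive) does not retrigger the redirect.

```nginx
server {
    listen 80;                      # assumed
    server_name domain.com;         # assumed

    location /some-path/ {
        # Redirect everything under /some-path/ except /some-path/ itself
        # and anything below /some-path/locations/.
        if ($request_uri !~ "^/some-path/($|locations/.*$)") {
            return 301 http://domain.com/some-path/;
        }

        # ... existing handling for /some-path/ (root, proxy_pass, etc.)
    }
}
```

Two caveats with this sketch: a request such as /some-path/?a=1 is also redirected, because $request_uri includes the query string, and a request that is picked up by a regex location elsewhere in the config (for example one matching \.php$) never reaches this block.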

Related

How to replace characters in an nginx variable string?

Is there a way I can replace non alphanumeric characters returned with $request_uri with a space (or a +)?
What I'm trying to do is redirect all 404s in one of my sites to its search engine, where the query is the URI requested. So, I have a block in my nginx.conf containing:
error_page 404 = @notfound;
location @notfound {
return 301 $scheme://$host/?s=$request_uri;
}
While this does indeed work, the URLs it returns contain the actual URIs, complete with -, _ and / characters, which causes the search to always return 0 results.
For instance... give this url: https://example.com/my-articles, the redirect ends up as this: https://example.com/?s=/my-articles
What I would like is to end up (ultimately) with this: https://example.com/?s=my+articles (though a + at the beginning works fine too: https://example.com/?s=+my+articles).
I will need to do this without Lua or Perl modules. So, how can I accomplish this?
You may need to tweak this depending upon how far down your directory structure you want the replacement to go, but this is the basic concept.
Named location for initial capture of 404s:
location @notfound {
rewrite (.*) /search$1 last;
}
Named locations are a bit limiting, so all this does is add /search/ to the beginning of the URI which returned 404. The last flag tells Nginx to break out of the current location and select the best location to process the request based on the rewritten URI, so we need a block to catch that:
location ^~ /search/ {
internal;
rewrite ^/search/(.*)([^a-z0-9\+])(.*)$ /search/$1+$3 last;
rewrite ^/search/(.*)$ /?s=$1 permanent;
}
The internal directive makes this location only accessible to the Nginx process itself, any client requests to this block will return 404.
The first rewrite will change the last character that is not a lowercase letter, a digit or a + into a + and then ask Nginx to reevaluate the rewritten URI.
The location block is defined with the ^~ modifier, which means requests matching this location will not be evaluated against any regex defined location blocks, so this block should keep catching the rewritten requests.
Once all the non word characters are gone the first rewrite will no longer match so the request will be passed to the next rewrite, which removes the /search from the front of the URI and adds the query string.
My logs look like this:
>> curl -L -v http://127.0.0.1/users-forum-name.1
<< "GET /?s=users+forum+name+1 HTTP/1.1"
>> curl -L -v http://127.0.0.1/users-forum-name/long-story/some_underscore
<< "GET /?s=users+forum+name+long+story+some+underscore"
You get the idea..
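Put together, the pieces above could look like the following sketch. The listen, server_name and root values are assumptions. The character class is widened to A-Z, which is my change rather than the original answer's: rewrite patterns are case-sensitive, so without it uppercase letters would also be turned into +.

```nginx
server {
    listen 80;                      # assumed
    server_name example.com;        # assumed
    root /var/www/html;             # assumed document root

    # Hand every 404 to the named location below.
    error_page 404 = @notfound;

    location @notfound {
        # Prefix the failed URI with /search and re-run location matching.
        rewrite (.*) /search$1 last;
    }

    location ^~ /search/ {
        internal;
        # Replace one disallowed character per pass, then re-evaluate.
        rewrite ^/search/(.*)([^a-zA-Z0-9+])(.*)$ /search/$1+$3 last;
        # Nothing left to replace: redirect to the search page.
        rewrite ^/search/(.*)$ /?s=$1 permanent;
    }
}
```

Each pass replaces a single character, and nginx stops after 10 URI changes per request with a 500 error, so URIs with many separators will hit that limit before the final redirect is issued.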
You can use the Lua module and transform this variable into what you need using Lua string functions. I'm using OpenResty, which is basically nginx with Lua enabled, but the nginx Lua module will do fine. There are directives that allow you to use Lua inside the nginx configuration, either inline inside a location using content_by_lua_block / access_by_lua_block, or in a separate file using content_by_lua_file / access_by_lua_file. The documentation is at https://github.com/openresty/lua-nginx-module#content_by_lua .
Here is an example from my app.
location ~/.*\.jpg$ {
set $test '';
access_by_lua_block {
ngx.var.test = string.sub(ngx.var.uri, 2)
}
root /var/www/luaProject/img/;
try_files $uri /index.html;
}
It is generally a bad idea to automatically issue redirects from 404 Not Found pages to elsewhere — the user might have simply mistyped a single character in the URL (e.g., on a mobile phone whilst copying the URL from a flier and having a "fat finger"), which would be very easy to correct once they see a 404 and the obvious typo in the address bar, yet may require starting from scratch if your search-engine doesn't deliver.
If you still want to do it, it might be more efficient to do it within the search engine itself — after all, if your search engine isn't capable of searching by URL, and correcting typos, then it doesn't sound like a very useful search engine, now does it?
If you still want to do it within the nginx alone in front of the search engine, then you can use the fact that http://nginx.org/r/rewrite directives essentially let you implement any sort of a DFA — Deterministic Finite Automaton — but, depending on the number of replacements required, it may result in too many cycles and somewhat inflexible rulesets.
Take a look at the following resources on recursive replacements of given characters within the URL for other characters:
How to replace underscore to dash with Nginx
nginx rewrite rule to remove - and _
https://serverfault.com/questions/477103/how-do-i-verify-site-ownership-on-google-webmaster-tools-through-nginx-conf
http://mdoc.su/

nginx - URL encode query string

I have an nginx reverse-proxy which needs to pass on the query string it receives. However this query string it receives is not well formatted and can contain JSON that is not URL encoded i.e. it contains curly brackets i.e. {}, commas, colons and double quotes! Unfortunately, I have no control over this and this causes the downstream server to barf when parsing the string.
Is there a way to correctly URL encode this string before proxying it?
I can replace the curly brackets as I know there will only be one instance of each using the config:
if ($args ~* '(.*){(.*)}(.*)') {
set $args $1%7B$2%7D$3;
rewrite (.*)$ $1;
}
proxy_pass http://127.0.0.1:8080;
However, I don't know in advance how many fields the JSON will have so it's difficult to use the same logic as above for the rest of the object.
I should also mention that I don't think this is related to nginx url-decoding parameters as I am not using a URI in the proxy_pass.
Thanks!
UPDATE: For the time being, the JSON object seems to be sending the same properties so this is what I've used as a workaround. It's pretty hideous and will break if the number of properties changes but does the job for now.
if ($args ~* '(.*){"(.*)":"(.*)","(.*)":"(.*)","(.*)":"(.*)","(.*)":"(.*)","(?<group10>.*)":"(?<group11>.*)"}(?<group12>.*)') {
set $args $1%7B%22$2%22%3A%22$3%22%2C%22$4%22%3A%22$5%22%2C%22$6%22%3A%22$7%22%2C%22$8%22%3A%22$9%22%2C%22${group10}%22%3A%22${group11}%22%7D${group12};
rewrite (.*)$ $1;
}
proxy_pass http://127.0.0.1:8080;
Note that since this returns more than 9 regex groups, I had to name groups 10, 11 and 12 otherwise they get interpreted as $1 + the digit 0, 1 or 2.
Is there a more robust way of doing this?
Personally, I don't like a solution with a single if statement, because it doesn't look very readable, flexible or maintainable. You may see whether having a combination of location or rewrite statements, where each one handles a specific encoding case, may work; see http://mdoc.su/ for a fun project that's very heavy with internal redirects, although I believe at one point nginx may have a limit on the total number of indirections.
Otherwise, provided that you cannot modify the backend, another option is to automatically redirect misbehaving clients and/or requests to an auxiliary backend, whose only purpose is to re-encode the string correctly, providing an X-Accel-Redirect HTTP Response Header as its output (as per http://nginx.org/r/proxy_ignore_headers), which nginx will use to make a subsequent internal redirect / request to the actual backend.
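The nginx side of that auxiliary-backend idea could be sketched roughly as follows. The helper on port 8081 and the /_backend/ prefix are hypothetical names introduced for illustration; the helper itself (any small service that percent-encodes the query string and replies with an X-Accel-Redirect header) is not shown.

```nginx
server {
    listen 80;                               # assumed

    # Public entry point: forward the raw request to the hypothetical
    # helper, whose only job is to re-encode the query string.
    location / {
        proxy_pass http://127.0.0.1:8081;    # hypothetical re-encoding helper
    }

    # Reached only via the helper's response header, for example:
    #   X-Accel-Redirect: /_backend/some/path?q=%7B%22a%22%3A%221%22%7D
    location /_backend/ {
        internal;
        # The /_backend/ prefix is replaced by /, the query string is kept.
        proxy_pass http://127.0.0.1:8080/;   # the real backend
    }
}
```

This only works while X-Accel-Redirect is not listed in proxy_ignore_headers for the first location, and it costs one extra upstream round trip per request.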

Letsencrypt Renewal + Nginx + owncloud config = failed because of regular expression

I am running an owncloud server with nginx on Debian 8.
I use an SSL certificate from letsencrypt for that domain.
Now I want to use an auto-renewal script that runs periodically and renews my certs. This works with all domains except the owncloud one.
There is one location block in the nginx owncloud config that prevents letsencrypt from entering the subfolder domain.org/.well-known/acme-challenge:
location ~ ^/(?:\.|autotest|occ|issue|indie|db_|console) {
deny all;
}
By god, I am no expert in regular expressions and have no clue how to solve this (or what this expression actually means).
Below that block I included a location block for the letsencrypt renewal:
# Letsencrypt auto-renewal
location '/.well-known/acme-challenge' {
default_type text/plain;
root /var/www/;
try_files $uri /$1;
}
I think I tried something like:
location ~ ^/(?:\.(?!well-known/acme-challenge)|autotest|occ|issue|indie|db_|console) {
deny all;
}
...not knowing whether this would affect the expression.
The only thing that works for me is commenting out the "deny all". I have in mind to extend the renewal script so that it stops the server, changes the owncloud conf, restarts the server, fetches the new certs, stops the server again, changes the owncloud conf back and restarts the server...
But maybe there is a simpler way. And I may learn something more about regex...
Does anyone have a tip for me?
The location ~ ^/(?:\.|autotest|occ|issue|indie|db_|console) denies access to any URI beginning with /. such as /.well-known.
Firstly, do you have any files and directories in the root which begin with a period (other than /.well-known)?
One option is to make the regex more specific, for example:
location ~ ^/(?:\.ht|autotest|occ|issue|indie|db_|console)
would deny access to any URI beginning with /.ht.
Another option is to make location '/.well-known/acme-challenge' take precedence by adding the ^~ modifier. See this document.
location ^~ /.well-known/acme-challenge
This would make the location take precedence over all regex locations. So if the location contained .php files, they may cease to work.
A final option would be to turn it into a regex location:
location ~ ^/\.well-known/acme-challenge
In which case it would have equal precedence and you could order it above the deny location.
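As a sketch of the second option (the ^~ modifier), the two blocks could sit side by side like this. The trailing slash on the prefix is my addition, and the try_files line from the question is dropped on the assumption that the challenge files are plain files under /var/www/.well-known/acme-challenge/.

```nginx
# Letsencrypt auto-renewal: ^~ stops nginx from checking regex
# locations once this prefix matches, so the deny block below
# is never consulted for challenge requests.
location ^~ /.well-known/acme-challenge/ {
    default_type text/plain;
    root /var/www/;
}

# Unchanged owncloud rule: still denies every other URI that starts
# with a dot, plus the listed owncloud paths.
location ~ ^/(?:\.|autotest|occ|issue|indie|db_|console) {
    deny all;
}
```

With this variant the order of the two blocks does not matter, unlike the regex-location variant, where the acme-challenge block has to come first.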

Nginx request_uri without args

How do I get the value of request_uri without the args appended on the end. I know there is a uri variable but I need the original value as the Nginx documentation states:
request_uri
This variable is equal to the original request URI as received from
the client including the args. It cannot be modified. Look at $uri for
the post-rewrite/altered URI. Does not include host name. Example:
"/foo/bar.php?arg=baz"
I use this map, which works without lua:
map $request_uri $request_uri_path {
"~^(?P<path>[^?]*)(\?.*)?$" $path;
}
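For context, map is only valid at the http level, while the variable it defines can be used in any server or location. A minimal sketch, in which the X-Request-Path header is a hypothetical way of making the value visible:

```nginx
http {
    # Everything before the first "?" of the original request line.
    map $request_uri $request_uri_path {
        "~^(?P<path>[^?]*)(\?.*)?$" $path;
    }

    server {
        listen 80;                                        # assumed

        location / {
            # Hypothetical use: echo the stripped path in a response header.
            add_header X-Request-Path $request_uri_path;
        }
    }
}
```

The map is evaluated lazily, only when $request_uri_path is actually read, so declaring it costs nothing for requests that never use it.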
You're looking for $uri. It does not include the arguments. In fact, $request_uri is almost equivalent to $uri$is_args$args, except that $uri is normalized (decoded) and reflects any rewrites.
If you really want exactly $request_uri with the args stripped, you can do this.
local uri = string.gsub(ngx.var.request_uri, "%?.*", "")
You will need to have lua available but that will do exactly what you're asking.
The accepted answer pointed me in the right direction, but I had to figure out where to add that directive. After some time investigating I found set_by_lua_block:
set_by_lua_block $request_uri_path {
return (string.gsub(ngx.var.request_uri, "%?.*", ""))
}
I hope it saves some time for those who come here.
Ryan Olson's answer is really good but map cannot be used everywhere.
If you need to get that information at a block level that does not accept the use of map, it can be done in an if statement. But remember that if is evil.
set $new_uri "";
if ($request_uri ~ "^([^?]*)(\?.*)?$") {
set $new_uri $1;
}
This code works out of the box (it does not use Lua or anything else).

How to create a regex in nginx to break the browser cache?

I'm a programmer so not sure if I'm dreaming if I think I can do this kind of logic in what should be a configuration file. Simply what I need, is an nginx response such that requests which include a /cacheXXX should strip out that part of the url.
So:
/cache123/editor/editor.js -> /editor/editor.js
/cache456/admin/editor.css -> /admin/editor.css
/cache987/editor/editor.js -> /editor/editor.js
And so anything else should be ignored i.e:
/hello/editor/editor.js -> /hello/editor/editor.js
The key point here is that if the url matches:
/cacheXXX
Then to strip that part out and return the rest.
What would an nginx location entry look like to achieve this?
For context as to the point of this, I am trying to break the browser cache by supplying new URLs for updated resources by changing the path to the resource, rather than changing URL parameters, which isn't guaranteed to work.
One solution is to use a regular expression location to both select the URIs that begin with /cache and extract the filename part for returning the correct file.
For example (assuming that the root is defined somewhere):
location ~ ^/cache[0-9]+(/.*) {
try_files $1 =404;
}
The regular expression location is evaluated in order, so its relative position in the configuration file is significant. See this document for details.
You could also use a rewrite rule, for example:
rewrite ^/cache[0-9]+(/.*) $1 last;
See this document for details.
For maximum efficiency, wrap either technique within a prefix location, for example:
location ^~ /cache {
rewrite ^/cache[0-9]+(/.*) $1 break;
}
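In context, the prefix-location variant might look like the sketch below. The listen and root values are assumptions, and the expires line is an optional addition of mine, not part of the answer: it reflects the usual reason for versioned paths, which is that the files can then be cached for a long time.

```nginx
server {
    listen 80;                       # assumed
    root /var/www/site;              # assumed document root

    # ^~ means regex locations are skipped for anything starting with /cache.
    location ^~ /cache {
        # /cache123/editor/editor.js -> /editor/editor.js, served from root.
        # "break" keeps processing in this location instead of re-matching.
        rewrite ^/cache[0-9]+(/.*) $1 break;

        expires max;                 # optional, hypothetical addition
    }
}
```

Note that the prefix also catches URIs such as /cachefoo/x.js. Those do not match the rewrite pattern and are served unchanged, but they do pick up any other directives placed in this block.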