Rewrite engine. How to translate URL - regex

I am new to regular expressions and the rewrite engine.
I want to translate:
domain.com/type/id
into
domain.com/index.php?type=type&id=id
I use:
RewriteRule (\w+)/(\d+)$ ./index.php?type=$1&id=$2
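To check which capture group lands in which parameter, here is a small Go sketch that applies the same pattern with Go's regexp package (RE2 also understands \w and \d) to the per-directory path Apache would match, e.g. "ee/9". It assumes the goal stated above: group 1 (\w+) is the type and group 2 (\d+) is the id. The function name rewrite is only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule mirrors the RewriteRule pattern (\w+)/(\d+)$ from the question.
var rule = regexp.MustCompile(`(\w+)/(\d+)$`)

// rewrite builds the query string in the order of the target URL:
// group 1 (\w+) is the type, group 2 (\d+) is the id.
func rewrite(path string) (string, bool) {
	m := rule.FindStringSubmatch(path)
	if m == nil {
		return path, false // no match: the URL would be left alone
	}
	return "index.php?type=" + m[1] + "&id=" + m[2], true
}

func main() {
	out, ok := rewrite("ee/9")
	fmt.Println(out, ok) // index.php?type=ee&id=9 true
}
```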
It works almost fine and I am able to get the two variables, but the website has a problem including other files. My main URL is http://domain.com/repos/site, and after typing a URL like http://domain.com/repos/site/ee/9, Firebug says:
"NetworkError: 404 Not Found - http://domain.com/repos/site/ee/lib/geoext/script/geoext.js"
It seems the site treats "ee" as part of the URL path, not as a GET variable.

Yes, you will certainly have to change your paths. This is how they behave:
- href="mypath": resolves relative to the directory of the current URL, so "mypath" replaces everything after the last slash
- href="./mypath": same as before
- href="/mypath": resolves from the root of the domain. This is the behavior you want (in your case the path must then start with /repos/site/)
Note: you can also use "../" to go up to the parent directory of where you are.
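The behavior above is ordinary relative-URL resolution (RFC 3986), which the browser performs against the address shown in the address bar, not against the rewritten index.php. The Go sketch below uses net/url's ResolveReference to reproduce the 404 from the question; the script path is taken from the Firebug message and the page address from the question, and the helper name resolve is only for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolve turns a reference found in a page into an absolute URL,
// relative to the address of that page, the way a browser does.
func resolve(page, ref string) (string, error) {
	base, err := url.Parse(page)
	if err != nil {
		return "", err
	}
	r, err := url.Parse(ref)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(r).String(), nil
}

func main() {
	page := "http://domain.com/repos/site/ee/9" // the pretty URL in the address bar
	for _, ref := range []string{
		"lib/geoext/script/geoext.js",             // relative: lands under /ee/
		"./lib/geoext/script/geoext.js",           // same as above
		"../lib/geoext/script/geoext.js",          // one directory up
		"/repos/site/lib/geoext/script/geoext.js", // root-relative: always the same
	} {
		abs, err := resolve(page, ref)
		if err != nil {
			panic(err)
		}
		fmt.Println(ref, "->", abs)
	}
}
```

The first two references produce exactly the URL Firebug reports, while the root-relative one works no matter how many pretty segments the page URL has.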

Related

How to add regex constraints to Gin framework's router?

With Rails' routing, for a URL like https://www.amazon.com/posts/1, I can do it this way:
get 'posts/:url', to: 'posts#search', constraints: { url: /.*/ }
With Go's gin framework, I didn't find a regex constraint method for such a route:
r.GET("/posts/search/:url", post.Search)
In the post controller:
func Search(c *gin.Context) {
fmt.Println(c.Param("url"))
}
When I call http://localhost:8080/posts/search/https://www.amazon.com/posts/1, it returns a 404.
Here is a reproduction: https://play.golang.org/p/dsB-hv8Ugtn
➜ ~ curl http://localhost:8080/site/www.google.com
Hello www.google.com%
➜ ~ curl http://localhost:8080/site/http://www.google.com/post/1
404 page not found%
➜ ~ curl http://localhost:8080/site/https%3A%2F%2Fwww.google.com%2Fpost%2F1
404 page not found%
➜ ~ curl http://localhost:8080/site/http:\/\/www.google.com\/post\/1
404 page not found%
Gin does not support regular expressions in the router. This is probably because it builds a tree of paths so that it does not have to allocate memory while traversing, which results in excellent performance.
The parameter support for paths is also not very powerful, but you can work around the issue by using a catch-all (wildcard) parameter like
r.GET("/posts/search/*url", ...)
Now c.Param("url") can contain slashes. There are two unsolved problems, though:
Gin's router decodes percent-encoded characters (%2F), so if the original URL had such encoded parts, they would wrongly end up decoded and would not match the original URL you wanted to extract. See the corresponding GitHub issue: https://github.com/gin-gonic/gin/issues/2047
You would only get the scheme+host+path part of the URL in your parameter; the query string would still be separate unless you also encode it. E.g. /posts/search/http://google.com/post/1?foo=bar would give you a "url" param of "/http://google.com/post/1".
As seen in the example above, catch-all parameters in Gin also (wrongly) always contain a slash at the beginning of the string.
I would recommend passing the URL as an encoded query string instead. This will cause far fewer headaches. Otherwise I'd recommend looking for a different router or framework that is less restrictive, because I don't think Gin will resolve these issues anytime soon; they have been open for years.

Apache mod_rewrite mapping path to parameters

I'm moving over from IIS to Apache (on Windows) and struggling with adapting a rewrite rule (using Helicon ISAPI_Rewrite 3 in IIS).
The rule maps what looks like a directory structure path back into a set of query string parameters. There could be any number of parameters in the path.
E.g.
/basket/param1/value1/param2/value2/param3/value3 ...and so on...
Becomes...
/basket?param1=value1&param2=value2&param3=value3 ...and so on...
Rule in ISAPI_Rewrite:
# This rule simply reverts parameters that appear as folders back to standard parameters
# e.g. /search-results/search-value/red/results/10 becomes /search-results?search-value=red&results=10
RewriteRule ^/(.*?)/([^/]*)/([^/]*)(/.+)? /$1$4?$2=$3 [NC,LP,QSA]
I first spotted that Apache doesn't have the 'LP' flag, so I swapped it for N=10 as a test of the looping...
RewriteRule ^(.*?)/([^/]*)/([^/]*)(/.+)? $1$4?$2=$3 [NC,N=10,QSA]
However, the Apache error logs show the same parameters being added over and over again until the loop limit of the N flag is reached, ending in an HTTP 500 error.
Any ideas where I'm going wrong?!?
After much head scratching and engaging my Google-fu, I found the solution to all my problems in another Stack Overflow answer...
https://stackoverflow.com/a/5520004/14054970
Essentially...
apparently there's been an issue with mod_rewrite re-appending post-fix part in certain cases https://issues.apache.org/bugzilla/show_bug.cgi?id=38642
The problem:
If multiple RewriteRules within a .htaccess file match, unwanted copies of PATH_INFO may accumulate at the end of the URI.
If you are on Apache 2.2.12 or later, you can use the DPI flag to prevent this http://httpd.apache.org/docs/2.2/rewrite/flags.html
I'm using Apache 2.4, so my Rewrite rule now looks as follows (and I'll be adding the DPI flag to all rules to be safe)...
RewriteRule ^(.*?)/([^/]*)/([^/]*)(/.+)? $1$4?$2=$3 [NC,N=1000,QSA,DPI]
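To see why the rule terminates on its own once the duplicated PATH_INFO is out of the picture, here is a simplified Go model of the [N] loop: each pass applies the same pattern to the per-directory path (no leading slash), moves one name/value pair into the query string, and, assuming QSA treats the previous pass's query string as the existing one, puts the new pair in front of it. It deliberately ignores RewriteBase, PATH_INFO and the rest of mod_rewrite's per-directory processing, so treat it as an illustration rather than an emulation; simulate is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as the RewriteRule (per-directory paths have no leading slash).
var pairRule = regexp.MustCompile(`^(.*?)/([^/]*)/([^/]*)(/.+)?`)

// simulate applies the substitution $1$4?$2=$3 with QSA until the pattern
// stops matching or maxLoops passes (the N=... limit) have run.
func simulate(path string, maxLoops int) (string, int) {
	query := ""
	passes := 0
	for passes < maxLoops {
		m := pairRule.FindStringSubmatch(path)
		if m == nil {
			break // fewer than two slashes left: nothing more to rewrite
		}
		pair := m[2] + "=" + m[3]
		if query == "" {
			query = pair
		} else {
			query = pair + "&" + query // QSA: new pair first, old query appended
		}
		path = m[1] + m[4] // an unmatched optional group is ""
		passes++
	}
	if query == "" {
		return path, passes
	}
	return path + "?" + query, passes
}

func main() {
	out, passes := simulate("basket/param1/value1/param2/value2/param3/value3", 1000)
	fmt.Println(out, passes)
}
```

In this model three passes are enough for three pairs, and the pairs come out in reverse order, which is harmless as long as the application looks parameters up by name.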

Rewrite path segments with name value pairs to query parameters

I want to rewrite pretty URLs to query parameters in my .htaccess file using regex.
So that:
https://www.example.com/s/file/name1/value1/name2/value2/name3/value3
gets rewritten to:
https://www.example.com/file.htm?name1=value1&name2=value2&name3=value3
It needs to handle a variable number of name/value pairs. It will include at least one name/value pair. E.g. it needs to work with https://www.example.com/s/file/name1/value1 and https://www.example.com/s/file/name1/value1/name2/value2
The s in https://www.example.com/s/file/name1/value1/name2/value2/name3/value3 indicates that the rewrite rule should trigger; all other URLs should be left alone. The file value is the name of the .htm file, so it can have different values.
In regex101.com I have tried:
pattern: \/([^\/]+)(\/)([^\/]+)
substitution: $1=$3&
On string: /s/new/v/123/c/42
And it returns: s=new&v=123&c=42&
But it should return: new.htm?v=123&c=42
So I have successfully gotten it to work with a variable number of name value pairs. But I just can't get my head around how to make it first move past s and new and then dynamically replace name value pairs.
I did not include https://www.example.com/ in regex101 because in a .htaccess file the initial domain is assumed.
I found this method, but it seems to work only with a fixed number of value pairs.
I also reviewed this post which contains great information, but no solution to this specific issue.
In the end I simplified what I needed. In the htaccess file I put:
RewriteRule ^v/([^/]+)/([^/]+) /voucher/$1.htm?v=$2 [NC,R,L]
So now a pretty url that looks like
https://www.example.com/v/page/id
gets redirected (the R flag issues a 302 by default) to
https://www.example.com/voucher/page.htm?v=id
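A single RewriteRule substitution can only refer to a fixed set of backreferences ($1 to $9), which is why one regex could not both skip past s/new and emit every pair. The two-step logic the question describes (match the /s/<file> prefix once, then turn each /name/value pair into name=value) is easy to express in code; the Go sketch below only illustrates that logic, for example for a front-controller script, and is not an .htaccess rule. The names rewrite, prettyURL and pairRe are hypothetical, and values are copied as-is without re-encoding.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// /s/<file> followed by one or more /<name>/<value> pairs, nothing else.
	prettyURL = regexp.MustCompile(`^/s/([^/]+)((?:/[^/]+/[^/]+)+)$`)
	// One /<name>/<value> pair.
	pairRe = regexp.MustCompile(`/([^/]+)/([^/]+)`)
)

// rewrite turns "/s/new/v/123/c/42" into "new.htm?v=123&c=42".
// Paths that are not /s/<file>/<name>/<value>[/<name>/<value>...] are left alone.
func rewrite(path string) (string, bool) {
	m := prettyURL.FindStringSubmatch(path)
	if m == nil {
		return path, false
	}
	var params []string
	for _, p := range pairRe.FindAllStringSubmatch(m[2], -1) {
		params = append(params, p[1]+"="+p[2])
	}
	return m[1] + ".htm?" + strings.Join(params, "&"), true
}

func main() {
	for _, p := range []string{"/s/new/v/123/c/42", "/s/file/name1/value1", "/about/us"} {
		out, ok := rewrite(p)
		fmt.Println(p, "->", out, ok)
	}
}
```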

How to replace characters in an nginx variable string?

Is there a way I can replace non-alphanumeric characters returned by $request_uri with a space (or a +)?
What I'm trying to do is redirect all 404s on one of my sites to its search engine, where the query is the requested URI. So I have a block in my nginx.conf containing:
error_page 404 = @notfound;
location @notfound {
return 301 $scheme://$host/?s=$request_uri;
}
While this does work, the URLs it returns contain the actual URI, complete with -, _ and / characters, causing the search to always return 0 results.
For instance, given this URL: https://example.com/my-articles, the redirect ends up as https://example.com/?s=/my-articles
What I would like is to end up (ultimately) with https://example.com/?s=my+articles (though a + at the beginning works fine too: https://example.com/?s=+my+articles).
I will need to do this without LUA or Perl modules. So, how can I accomplish this?
You may need to tweak this depending upon how far down your directory structure you want the replacement to go, but this is the basic concept.
Named location for initial capture of 404s:
location @notfound {
rewrite (.*) /search$1 last;
}
Named locations are a bit limiting, so all this does is add /search/ to the beginning of the URI which returned 404. The last flag tells Nginx to break out of the current location and select the best location to process the request based on the rewritten URI, so we need a block to catch that:
location ^~ /search/ {
internal;
rewrite ^/search/(.*)([^a-z0-9\+])(.*)$ /search/$1+$3 last;
rewrite ^/search/(.*)$ /?s=$1 permanent;
}
The internal directive makes this location accessible only to internal requests generated by Nginx itself; any client request to this block will return 404.
The first rewrite replaces the last character that is not a lowercase letter, digit or + with a + (the greedy first (.*) makes it the last one), and then asks Nginx to reevaluate the rewritten URI.
The location block is defined with the ^~ modifier, which means requests matching this location will not be evaluated against any regex defined location blocks, so this block should keep catching the rewritten requests.
Once all the non word characters are gone the first rewrite will no longer match so the request will be passed to the next rewrite, which removes the /search from the front of the URI and adds the query string.
My logs look like this:
>> curl -L -v http://127.0.0.1/users-forum-name.1
<< "GET /?s=users+forum+name+1 HTTP/1.1"
>> curl -L -v http://127.0.0.1/users-forum-name/long-story/some_underscore
<< "GET /?s=users+forum+name+long+story+some+underscore"
You get the idea..
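The rewrite loop is easier to follow when replayed step by step, so here is a Go model of the two rewrites above (RE2 accepts the same pattern; the redundant backslash before + inside the character class is dropped). It counts the `last` cycles, which matters because the ngx_http_rewrite_module documentation says such location-search cycles can repeat up to 10 times before nginx returns a 500 error; the model itself does not enforce that limit or include the initial error_page redirect, and simulate is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// rewrite ^/search/(.*)([^a-z0-9\+])(.*)$ /search/$1+$3 last;
	oneChar = regexp.MustCompile(`^/search/(.*)([^a-z0-9+])(.*)$`)
	// rewrite ^/search/(.*)$ /?s=$1 permanent;
	toQuery = regexp.MustCompile(`^/search/(.*)$`)
)

// simulate mimics the @notfound location prefixing /search, then the
// /search/ location replacing one character per cycle until none is left.
func simulate(uri string) (string, int) {
	uri = "/search" + uri
	cycles := 0
	for oneChar.MatchString(uri) {
		// The greedy first (.*) means the LAST offending character is replaced.
		uri = oneChar.ReplaceAllString(uri, "/search/${1}+${3}")
		cycles++
	}
	return toQuery.ReplaceAllString(uri, "/?s=${1}"), cycles
}

func main() {
	for _, u := range []string{"/users-forum-name.1", "/users-forum-name/long-story/some_underscore"} {
		out, cycles := simulate(u)
		fmt.Println(u, "->", out, "cycles:", cycles)
	}
}
```

Each replaced character costs one cycle, so URIs with many separators are the ones most likely to hit the limit mentioned in the answers below.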
You can use the Lua module and transform this variable into what you need using Lua string functions. I'm using OpenResty, which is basically nginx with Lua enabled, but the nginx Lua module will do fine. There are directives that let you use Lua inside the nginx configuration: inside a location using content_by_lua_block / access_by_lua_block, or in a separate file using content_by_lua_file / access_by_lua_file. The documentation is here: https://github.com/openresty/lua-nginx-module#content_by_lua .
Here is an example from my app.
location ~/.*\.jpg$ {
set $test '';
access_by_lua_block {
ngx.var.test = string.sub(ngx.var.uri, 2)
}
root /var/www/luaProject/img/;
try_files $uri /index.html;
}
It is generally a bad idea to automatically issue redirects from 404 Not Found pages to elsewhere — the user might have simply mistyped a single character in the URL (e.g., on a mobile phone whilst copying the URL from a flier and having a "fat finger"), which would be very easy to correct once they see a 404 and the obvious typo in the address bar, yet may require starting from scratch if your search-engine doesn't deliver.
If you still want to do it, it might be more efficient to do it within the search engine itself — after all, if your search engine isn't capable of searching by URL, and correcting typos, then it doesn't sound like a very useful search engine, now does it?
If you still want to do it within the nginx alone in front of the search engine, then you can use the fact that http://nginx.org/r/rewrite directives essentially let you implement any sort of a DFA — Deterministic Finite Automaton — but, depending on the number of replacements required, it may result in too many cycles and somewhat inflexible rulesets.
Take a look at the following resources on recursive replacements of given characters within the URL for other characters:
How to replace underscore to dash with Nginx
nginx rewrite rule to remove - and _
https://serverfault.com/questions/477103/how-do-i-verify-site-ownership-on-google-webmaster-tools-through-nginx-conf
http://mdoc.su/

How to guess full file name, having only first 2 letters

I have a directory full of files whose names are prefixed with a sequential, unique number, like so:
/01 - Gruppe #1 - Potatisvalsen.mp3
/02 - Gruppe #1 - Wondrous Love & Hell Broke Loose in Georgia.mp3
Those are accessible at http://mysite/01 - Gruppe #1 - Potatisvalsen.mp3 etc.
I would like to rewrite calls like http://mysite/01.mp3 to the correct full URL as above.
I have tried the "obvious":
RewriteRule ^/(\d+)*\.mp3$ ./$1(.*)\.mp3
But that probably just shows my ignorance :)
Is this possible using mod_rewrite?
mod_rewrite cannot do this kind of shell-style filename expansion. You will be better off forwarding these requests to a PHP script and loading the actual file there.
Step 1: Forward to PHP
RewriteRule ^\d{2}\.mp3$ fileloader.php?f=$0 [L,QSA,NC]
Step 2: Inside fileloader.php
Load a list of files from the current directory into an associative array
Perform a lookup on those filenames using $_GET['f']
Serve the found file
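Step 2 is written for PHP in the answer; as an illustration of the same lookup, here is a Go sketch. The helper name findByPrefix is hypothetical, and it assumes, as the example filenames suggest, that the two-digit number is followed by a space. A real handler would then serve the match (in Go, for instance, with http.ServeFile).

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// findByPrefix returns the first .mp3 file whose name starts with the
// requested number followed by a space, e.g. "01.mp3" -> "01 - Gruppe #1 - ...".
func findByPrefix(names []string, req string) (string, bool) {
	// The rule has NC, so "01.MP3" may arrive too; digits are unaffected by ToLower.
	num := strings.TrimSuffix(strings.ToLower(req), ".mp3")
	for _, name := range names {
		if strings.HasPrefix(name, num+" ") && strings.HasSuffix(strings.ToLower(name), ".mp3") {
			return name, true
		}
	}
	return "", false
}

func main() {
	// Load the list of files in the current directory.
	entries, err := os.ReadDir(".")
	if err != nil {
		panic(err)
	}
	var names []string
	for _, e := range entries {
		if !e.IsDir() {
			names = append(names, e.Name())
		}
	}
	if name, ok := findByPrefix(names, "01.mp3"); ok {
		fmt.Println("would serve:", name)
	} else {
		fmt.Println("no match")
	}
}
```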