With Rails' routing, for a URL like https://www.amazon.com/posts/1, I can do this:
get 'posts/:url', to: 'posts#search', constraints: { url: /.*/ }
With Go's Gin framework, I couldn't find a regex-constraint option for such a route, so I tried:
r.GET("/posts/search/:url", post.Search)
In the post controller:
func Search(c *gin.Context) {
fmt.Println(c.Param("url"))
}
When I call http://localhost:8080/posts/search/https://www.amazon.com/posts/1, it returns a 404.
For example (https://play.golang.org/p/dsB-hv8Ugtn):
➜ ~ curl http://localhost:8080/site/www.google.com
Hello www.google.com%
➜ ~ curl http://localhost:8080/site/http://www.google.com/post/1
404 page not found%
➜ ~ curl http://localhost:8080/site/https%3A%2F%2Fwww.google.com%2Fpost%2F1
404 page not found%
➜ ~ curl http://localhost:8080/site/http:\/\/www.google.com\/post\/1
404 page not found%
Gin does not support regular expressions in its router. This is probably because it builds a tree of paths so that it can route without allocating memory while traversing, which results in excellent performance.
Path parameter support is also not very powerful, but you can work around the issue by using a catch-all (wildcard) parameter like
r.GET("/posts/search/*url", ...)
Now c.Param("url") can contain slashes. There are two unsolved problems though:
Gin's router decodes percent-encoded characters (%2F), so if the original URL had such encoded parts, they would wrongly end up decoded and no longer match the original URL that you wanted to extract. See the corresponding GitHub issue: https://github.com/gin-gonic/gin/issues/2047
You would only get the scheme+host+path part of the URL in your parameter; the query string would still be separate unless you also encode it. E.g. /posts/search/http://google.com/post/1?foo=bar would give you a "url" param of "/http://google.com/post/1"
As seen in the example above, catch-all parameters in Gin also (wrongly) always contain a slash at the beginning of the string.
I would recommend you pass the URL as an encoded query string instead; this will result in a lot less headache. Otherwise I'd recommend looking for a different router or framework that is less restrictive, because I don't think Gin will resolve these issues anytime soon; they have been open for years.
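For completeness, here is a minimal sketch of the query-string approach, reusing the question's setup where it can (the route shape, the url parameter name and the handler body are illustrative assumptions, not the asker's actual code):

package main

import (
	"fmt"
	"net/http"

	"github.com/gin-gonic/gin"
)

func main() {
	r := gin.Default()
	// The client sends the target URL percent-encoded in the query string,
	// e.g. GET /posts/search?url=https%3A%2F%2Fwww.amazon.com%2Fposts%2F1
	r.GET("/posts/search", func(c *gin.Context) {
		// c.Query returns the already percent-decoded value.
		target := c.Query("url")
		if target == "" {
			c.String(http.StatusBadRequest, "missing url parameter")
			return
		}
		fmt.Println(target) // https://www.amazon.com/posts/1
		c.String(http.StatusOK, "%s", target)
	})
	r.Run(":8080")
}

A client would then call, for example, curl 'http://localhost:8080/posts/search?url=https%3A%2F%2Fwww.amazon.com%2Fposts%2F1', and the full URL, including any query string of its own, survives intact.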
Is there a way I can replace the non-alphanumeric characters returned by $request_uri with a space (or a +)?
What I'm trying to do is redirect all 404s on one of my sites to its search engine, where the query is the requested URI. So, I have a block in my nginx.conf containing:
error_page 404 = @notfound;
location @notfound {
return 301 $scheme://$host/?s=$request_uri;
}
While this does indeed work, the URLs it returns are the actual URIs, complete with -, _ and / characters, causing the search to always return 0 results.
For instance, given this URL: https://example.com/my-articles, the redirect ends up as this: https://example.com/?s=/my-articles
What I would like it to end up as (ultimately) is this: https://example.com/?s=my+articles (though a + at the beginning works fine too: https://example.com/?s=+my+articles).
I need to do this without the Lua or Perl modules. So, how can I accomplish this?
You may need to tweak this depending upon how far down your directory structure you want the replacement to go, but this is the basic concept.
Named location for initial capture of 404s:
location @notfound {
rewrite (.*) /search$1 last;
}
Named locations are a bit limiting, so all this does is prepend /search to the URI that returned a 404. The last flag tells Nginx to break out of the current location and select the best location to process the request based on the rewritten URI, so we need a block to catch that:
location ^~ /search/ {
internal;
rewrite ^/search/(.*)([^a-z0-9\+])(.*)$ /search/$1+$3 last;
rewrite ^/search/(.*)$ /?s=$1 permanent;
}
The internal directive makes this location accessible only to the Nginx process itself; any client request to this location will return 404.
The first rewrite changes the last character that is not a lowercase letter, digit or + into a +, and then asks Nginx to re-evaluate the rewritten URI.
The location block is defined with the ^~ modifier, which means requests matching this location will not be evaluated against any regex-defined location blocks, so this block keeps catching the rewritten requests.
Once all the non-word characters are gone, the first rewrite no longer matches, so the request is passed to the second rewrite, which removes /search from the front of the URI and adds the query string.
My logs look like this:
>> curl -L -v http://127.0.0.1/users-forum-name.1
<< "GET /?s=users+forum+name+1 HTTP/1.1"
>> curl -L -v http://127.0.0.1/users-forum-name/long-story/some_underscore
<< "GET /?s=users+forum+name+long+story+some+underscore"
You get the idea..
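Purely as an illustration (this is not part of the nginx configuration), the loop that the two rewrites implement can be simulated outside nginx; here is a small Go sketch of the same replacement, which you can check against the log examples above:

package main

import (
	"fmt"
	"regexp"
)

// Matches any path that still contains a character other than a lowercase
// letter, digit or +, capturing everything before and after the last one.
var nonWord = regexp.MustCompile(`(.*)([^a-z0-9+])(.*)`)

// searchQuery takes the request path without its leading slash (the /search/
// prefix swallows it in the nginx version) and mimics the rewrite loop:
// keep turning the last disallowed character into a +, then build the query.
func searchQuery(path string) string {
	for nonWord.MatchString(path) {
		path = nonWord.ReplaceAllString(path, "$1+$3")
	}
	return "/?s=" + path
}

func main() {
	fmt.Println(searchQuery("users-forum-name.1"))                          // /?s=users+forum+name+1
	fmt.Println(searchQuery("users-forum-name/long-story/some_underscore")) // /?s=users+forum+name+long+story+some+underscore
}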
You can use the Lua module and transform this variable into what you need using Lua string functions. I'm using OpenResty, which is basically nginx with Lua enabled, but the nginx Lua module will do fine. There are directives that allow you to use Lua inside the nginx configuration: inside a location with content_by_lua_block / access_by_lua_block, or in a separate file with content_by_lua_file / access_by_lua_file. The documentation is here: https://github.com/openresty/lua-nginx-module#content_by_lua
Here is an example from my app.
location ~ /.*\.jpg$ {
set $test '';
access_by_lua_block {
ngx.var.test = string.sub(ngx.var.uri, 2)
}
root /var/www/luaProject/img/;
try_files $uri /index.html;
}
It is generally a bad idea to automatically issue redirects from 404 Not Found pages to elsewhere — the user might have simply mistyped a single character in the URL (e.g., on a mobile phone whilst copying the URL from a flier and having a "fat finger"), which would be very easy to correct once they see a 404 and the obvious typo in the address bar, yet may require starting from scratch if your search-engine doesn't deliver.
If you still want to do it, it might be more efficient to do it within the search engine itself — after all, if your search engine isn't capable of searching by URL, and correcting typos, then it doesn't sound like a very useful search engine, now does it?
If you still want to do it within nginx alone, in front of the search engine, then you can use the fact that rewrite directives (http://nginx.org/r/rewrite) essentially let you implement any sort of DFA (Deterministic Finite Automaton); but, depending on the number of replacements required, it may result in too many cycles and a somewhat inflexible ruleset.
Take a look at the following resources on recursive replacements of given characters within the URL for other characters:
How to replace underscore to dash with Nginx
nginx rewrite rule to remove - and _
https://serverfault.com/questions/477103/how-do-i-verify-site-ownership-on-google-webmaster-tools-through-nginx-conf
http://mdoc.su/
I'm trying to re-write a URL such as
http://ourdomain.com/hotels/vegas?cf=0
to
http://ourdomain.com?d=vegas&cf=0
using HAProxy.
We used to do it with Apache using:
RewriteRule ^hotels/([^/]+)/?\??(.*)$ ?d=$1&$2 [QSA]
I've tried
reqrep ^([^\ :]*)\ /hotels/(.*) \1\ /?d=\2
But that gives me http://ourdomain.com?d=vegas?cf=0
And
reqrep ^([^\ :]*)\ /hotels/([^/]+)/?\??(.*) \1\ /?d=\2&\3
Just gives me a 400 error.
It would be nice to do it with ACLs, but I can't see how that would work.
reqrep ^([^\ :]*)\ /hotels/([^/]+)/?\??(.*) \1\ /?d=\2&\3
Just gives me a 400 error.
([^/]+) is too greedy when everything following it (/?\??(.*)) is optional. It's mangling the last part of the request, leading to the 400.
Remember what sort of data you're working with:
GET /path?query HTTP/1.(0|1)
Replace ([^/]+) with ([^/\ ]+) so that anything after and including the space will be captured by \3, not \2.
Update: it seems that the above is not quite perfect, since the alignment of the ? still doesn't work out. Both this and the original 400 error highlight some of the pitfalls of req[i]rep -- it's very low-level request munging.
HAProxy 1.6 introduced several new capabilities that make request tweaking much cleaner, and this is actually a good case for illustrating several of them together. Note that these examples also use anonymous ACLs, wrapped in { }. The documentation seems to discourage these a little bit, but only because they're unwieldy to maintain when you need to test the same set of conditions in multiple places (named ACLs can of course be reused more easily); they're perfect for a case like this. Note that the braces must be surrounded by at least one whitespace character due to configuration parser limitations.
Variables can be used to stash values. They can be scoped to the request (they go out of scope as soon as a back-end is selected), the response (they come into scope only after the back-end responds), the transaction (persistent from request to response, so they can be used before the trip to the back-end and are still in scope when the response comes back), or the session (in scope across multiple requests from this browser during this connection, if the browser reuses the connection).
The regsub() converter takes the preceding value as its input and returns that value passed through a simple regex replacement.
If the path starts with /hotels/, capture the path, scrub out ^/hotels/ (replacing it with the empty string that appears after the comma in regsub()), and stash the result in a request variable called req.hotel.
http-request set-var(req.hotel) path,regsub(^/hotels/,) if { path_beg /hotels/ }
Processing of most http-request steps is done in configuration-file order, so at the next instruction, if (and only if) that variable has a value, we use http-request set-path with an argument of / to reset the path. Testing the variable is needed so that we don't do this on every request -- only on the ones for /hotels/. It might be that you actually need something more like if { path_reg /hotels/.+ }, since /hotels/ by itself might be a valid path we should leave alone.
http-request set-path / if { var(req.hotel) -m found }
Then, we use http-request set-query to set the query string to a value created by concatenating the value of the req.hotel variable with & and the original query string, which we obtain using the query fetch.
http-request set-query d=%[var(req.hotel)]&%[query] if { var(req.hotel) -m found }
Note that the query fetch and http-request set-query both have some magical behavior -- they take care of the ? for you. The query fetch does not return it, and http-request set-query does not expect you to provide it. This is helpful because we may need to be able to handle requests correctly whether or not the ? is present in the original request, without having to manage it ourselves.
With the above configuration, GET /hotels/vegas?&cf=0 HTTP/1.1 becomes GET /?d=vegas&cf=0 HTTP/1.1.
If the initial query string is completely empty, GET /hotels/vegas HTTP/1.1 becomes GET /?d=vegas& HTTP/1.1. That looks a little strange, but it should be completely valid. A slightly more convoluted configuration that tests for the presence of an initial query string could prevent that, but I don't see it being an issue.
So, we've turned 1 line of configuration into 3, but I would argue that those three lines are much more intuitive about what they are accomplishing and it's certainly a less delicate operation than massaging the entire start line of the request with a regex. Here they are, together, with some optional whitespace:
http-request set-var(req.hotel) path,regsub(^/hotels/,) if { path_beg /hotels/ }
http-request set-path / if { var(req.hotel) -m found }
http-request set-query d=%[var(req.hotel)]&%[query] if { var(req.hotel) -m found }
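Not HAProxy itself, but to make the intended end result concrete, here is a small Go sketch of the same path-and-query transformation (the function and names are illustrative only); it can double as a quick reference when testing the proxy:

package main

import (
	"fmt"
	"net/url"
	"strings"
)

// rewriteHotels turns /hotels/<city>?<query> into /?d=<city>&<query>,
// mirroring the three http-request directives above.
func rewriteHotels(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if !strings.HasPrefix(u.Path, "/hotels/") {
		return raw, nil // leave every other request untouched
	}
	city := strings.TrimPrefix(u.Path, "/hotels/")
	u.Path = "/"
	// Note: with an empty original query this leaves a trailing &,
	// the same harmless quirk described above.
	u.RawQuery = "d=" + city + "&" + u.RawQuery
	return u.String(), nil
}

func main() {
	out, _ := rewriteHotels("http://ourdomain.com/hotels/vegas?cf=0")
	fmt.Println(out) // http://ourdomain.com/?d=vegas&cf=0
}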
This is a working solution using reqrep:
acl is_destination path_beg /hotels/
reqrep ^([^\ :]*)\ /hotels/([^/\ \?]+)/?\??([^\ ]*)(.*)$ \1\ /?d=\2&\3\4 if is_destination
I'm hoping that the ACL will remove the need to run the regex on everything (hence lightening the load a bit), but I'm not sure that's the case.
I have an nginx reverse proxy which needs to pass on the query string it receives. However, this query string is not well formed and can contain JSON that is not URL-encoded, i.e. it contains curly brackets ({}), commas, colons and double quotes! Unfortunately, I have no control over this, and it causes the downstream server to barf when parsing the string.
Is there a way to correctly URL-encode this string before proxying it?
I can replace the curly brackets, as I know there will only be one instance of each, using this config:
if ($args ~* '(.*){(.*)}(.*)') {
set $args $1%7B$2%7D$3;
rewrite (.*)$ $1;
}
proxy_pass http://127.0.0.1:8080;
However, I don't know in advance how many fields the JSON will have so it's difficult to use the same logic as above for the rest of the object.
I should also mention that I don't think this is related to nginx url-decoding parameters as I am not using a URI in the proxy_pass.
Thanks!
UPDATE: For the time being, the JSON object seems to contain the same properties each time, so this is what I've used as a workaround. It's pretty hideous and will break if the number of properties changes, but it does the job for now.
if ($args ~* '(.*){"(.*)":"(.*)","(.*)":"(.*)","(.*)":"(.*)","(.*)":"(.*)","(?<group10>.*)":"(?<group11>.*)"}(?<group12>.*)') {
set $args $1%7B%22$2%22%3A%22$3%22%2C%22$4%22%3A%22$5%22%2C%22$6%22%3A%22$7%22%2C%22$8%22%3A%22$9%22%2C%22${group10}%22%3A%22${group11}%22%7D${group12};
rewrite (.*)$ $1;
}
proxy_pass http://127.0.0.1:8080;
Note that since this uses more than 9 regex groups, I had to name groups 10, 11 and 12; otherwise they get interpreted as $1 followed by the literal digit 0, 1 or 2.
Is there a more robust way of doing this?
Personally, I don't like a solution with a single if statement, because it doesn't look very readable, flexible or maintainable. You might see whether a combination of location or rewrite statements, where each one handles a specific encoding case, would work; see http://mdoc.su/ for a fun project that's very heavy with internal redirects, although I believe nginx may have a limit on the total number of indirections.
Otherwise, provided that you cannot modify the backend, another option is to automatically route misbehaving clients and/or requests to an auxiliary backend whose only purpose is to re-encode the string correctly, providing an X-Accel-Redirect HTTP response header as its output (see http://nginx.org/r/proxy_ignore_headers), which nginx will use to make a subsequent internal redirect / request to the actual backend.
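To make that auxiliary-backend option concrete, here is a rough Go sketch of such a re-encoding service (the /reencoded location name and the port are assumptions for illustration; in nginx, /reencoded would be an internal location that proxy_passes to the real backend):

package main

import (
	"log"
	"net/http"
	"net/url"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// r.URL.RawQuery holds the query string exactly as received, possibly
		// containing raw {, }, ", : and , characters.
		values, err := url.ParseQuery(r.URL.RawQuery)
		if err != nil {
			http.Error(w, "unparseable query string", http.StatusBadRequest)
			return
		}
		// values.Encode() percent-encodes every key and value, turning the raw
		// JSON into a well-formed query string. nginx follows X-Accel-Redirect
		// with an internal redirect, carrying the re-encoded query along.
		w.Header().Set("X-Accel-Redirect", "/reencoded"+r.URL.Path+"?"+values.Encode())
		w.WriteHeader(http.StatusOK)
	})
	log.Fatal(http.ListenAndServe(":9090", nil))
}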
I'm using the Moovweb SDK and Tritium. I want my mobile site to behave like my desktop site. I have different URLs pointing to my homepage. Should I use regex? A common element? And what's the best syntax for matching the path?
The mappings.ts file in the scripts directory is where particular pages are matched. The file is imported in html.ts and allows us to say "when a certain page is matched, make the following transformations."
Most projects already have a mappings file generated. A simple layout will be as so:
match($path) {
with(/home/) {
log("--> Importing pages/homes.ts in mappings.ts")
#import pages/home.ts
}
}
Every time you start working on a new page, you need to set up a new "map".
First: Match with a unique path
The Tritium above matches the path for the homepage. The path is the bit of a URL after the domain. For example, in www.example.com/search/item, "www.example.com" is the domain and "search/item" is the path.
The /home/ specifies the "home" part with a regular expression. You could also use a plain string if necessary:
with("home")
If Tritium matches the path with the matcher, it will import the home page.
It's probably true that the homepage of a site doesn't actually contain the word "home". Most homepages are just the domain with an empty path. A better string matcher could be:
match($path) {
with ("/")
}
Or, using regex:
with(/index|^\/$/) {
As you can see, the with() function of the mappings file is where knowledge of regex can really come in handy. Check out our short guide on regex. Sometimes the matcher will be simpler, such as /search/.
Remember to come up with the most unique aspect of the URL possible. If two with() functions match the same URL, the one that appears first in the mappings file will be used. If you cannot find a unique URL matcher for different page types, you may have to match via other means.
Why Use Regex?
It might seem easier to use a string rather than a regex matcher. However, regex provides a lot more flexibility over which URLs are matched.
For example, a site could use a string of numbers in its product page URLs. Using a normal string matcher would not be practical; you'd have to list out every possible number for every item on the site. An easier way is to use regex to say, "If there's a string of 5 digits, continue!" (The code for matching 5 digits: /\d{5}/.)
Second: Log the match
When matching a particular path, you should also use log() statements so you know exactly what's getting imported. The log statement will be printed in the command-line window, so you can see whether your regular expression accurately matches your path.
match($path) {
with(/index|^\/$/) {
log("--> importing pages/home.ts in mappings.ts")
}
}
Third: Import the file
Finally, use the #import function to include the page-specific Tritium file.
match($path) {
with(/index|^\/$/) {
log("--> importing pages/home.ts in mappings.ts")
#import pages/home.ts
}
}
I am trying to build a homebrew web browser to get more proficient with Cocoa. I need a good way to validate whether the user has entered a valid URL. I have tried some regular expressions, but NSString has some interesting quirks and doesn't like some of the back-quoting that most regular expressions I've seen use.
You could start with the + (id)URLWithString:(NSString *)URLString method of NSURL, which returns nil if the string is malformed.
If you need further validation, you can use the baseURL, host, parameterString, path, etc methods to give you particular components of the URL, which you can then evaluate in whatever way you see fit.
I've found that it is possible to enter some URLs that seem to be OK but are rejected by the NSURL creation methods. So we have a method to escape the string first to make sure it's in a good format. Here is the meat of it:
NSString *escapedURLString =
    NSMakeCollectable(CFURLCreateStringByAddingPercentEscapes(NULL,
                                                              (CFStringRef)URLString,
                                                              (CFStringRef)@"%+#", // Characters to leave unescaped
                                                              NULL,
                                                              kCFStringEncodingUTF8));