Nginx: Escaping # in url rewrite - facebook-graph-api

I have an MVC JavaScript application that needs to support Facebook sharing, which means it needs to serve unique OG meta HTML tags.
I'm using an Nginx rewrite that detects the Facebook crawler in order to serve a custom version of the app with the proper OG tags for that section, but Apache ignores everything after the # sign (as a server should, since fragments are a browser feature). I would like to escape the "#" in my rewrite but am not sure how to do it in Nginx:
location / {
    if ($http_user_agent ~* 'facebookexternalhit') {
        rewrite ^(.*)$ /og.php?url=http://$host$uri;
        proxy_pass http://127.0.0.1:8080;
        break;
    }
    root /var/www/html/site.net;
}
Thanks for taking a look!

You cannot, and you don't have to. If you have a URL in your browser like http://www.example.tld/site.html#anchor, the browser's request only contains the non-anchor part: http://www.example.tld/site.html. After receiving the content, the browser looks for a named anchor called anchor and scrolls the page so that it is visible.
This means nginx will never see the # character.
If, on the other hand, a link contains a # as part of the URL's path (which is rather rare), it has to be percent-encoded as %xx, where xx is the character's hexadecimal code: %23 in the case of #.
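To make the two cases above concrete, here is a minimal Python sketch using only the standard library. The helper names request_target and escape_hash are made up for this illustration; the code only mimics how a URL splits into the part that is sent and the fragment that is not, it is not what a browser or nginx actually runs.

```python
from urllib.parse import urlsplit, quote


def request_target(url: str) -> str:
    """Return the path (plus query, if any) that ends up on the request line.

    The fragment is deliberately dropped: it stays in the browser.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


def escape_hash(segment: str) -> str:
    """Percent-encode a path segment so a literal '#' becomes '%23'."""
    return quote(segment, safe="")


if __name__ == "__main__":
    print(request_target("http://www.example.tld/site.html#anchor"))
    print(escape_hash("issue#42"))
```

With the example URL from the answer, request_target yields /site.html, and a literal # inside a segment comes out as %23.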

Related

django url pattern strips forward slash when running in nginx+uwsgi

For the following url pattern:
re_path(r'proxy/(?P<url>.*)', myview)
when I request proxy/http://www.google.com
the myview function receives url as http:/www.google.com (with a single /).
This happens with the uwsgi+nginx setup; when running with runserver, url is http://www.google.com.
This is because nginx will automatically merge double slashes in URLs into a single one:
http://nginx.org/en/docs/http/ngx_http_core_module.html#merge_slashes
Enables or disables compression of two or more adjacent slashes in a URI into a single slash.
Note that compression is essential for the correct matching of prefix string and regular expression locations. Without it, the “//scripts/one.php” request would not match
You should disable it in your nginx.conf:
merge_slashes off;
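A rough Python sketch of what the default merge_slashes on behaviour does to the request path, which is why the view sees a single slash. The function name and the regex are my own approximation for illustration, not nginx's actual implementation.

```python
import re


def merge_slashes(uri: str) -> str:
    """Collapse every run of two or more slashes into one (approximation)."""
    return re.sub(r"/{2,}", "/", uri)


if __name__ == "__main__":
    # The "//" after "http:" is collapsed before the request reaches uwsgi/Django.
    print(merge_slashes("/proxy/http://www.google.com"))
```

Percent-encoding the embedded URL on the client side, or passing it as a query parameter, would be an alternative to turning merge_slashes off.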

Nginx rewrite: add trailing slash, preserve anchors and query strings

I need to ensure that all permalinks on a given site end with a trailing slash. That is, any URL that refers to a location that doesn't correspond to an actual, individual file. I also need to preserve any query strings and/or anchors that are passed with the URL.
Example
Say I have a page at the following location:
example.com/about/
If I get the following requests, I want them to rewrite as shown:
example.com/about > example.com/about/
example.com/about?src=fb > example.com/about/?src=fb
example.com/about#contact > example.com/about/#contact
example.com/about#contact?src=fb > example.com/about/#contact?src=fb
However, I want to make sure that I do not rewrite for any actual file paths - anything with a file extension.
What I have so far
This is the regex I have come up with thus far, which only addresses excluding real file paths, and adding a trailing slash when the end of the string doesn't have one:
^([^\.]*$(?<!\/))
I have not yet been able to figure out how to determine whether a trailing slash is present when there are anchors or query strings, and, once that's established, how to separately capture the parts that should come before and after the trailing slash in order to assemble the final rewrite.
As it turns out, the regex I came up with does in fact address all of my rewrite needs. Here is the final result in my Nginx server configuration:
location / {
    try_files $uri $uri/ @rewrites;
}
# Rewrite rules to sanitize and canonicalize URLs
location @rewrites {
    rewrite ^([^\.]*$(?<!\/)) $1/ last;
}
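For anyone who wants to check the pattern outside nginx, here is a small Python sketch of the same regex applied to the path only. The function name add_trailing_slash is invented for this example. The regex never needs to handle query strings or anchors: nginx matches rewrite patterns against the URI without the query string and re-appends the arguments by default, and anchors are never sent to the server in the first place.

```python
import re

# Same idea as the nginx rule: no dot anywhere in the path,
# and the path must not already end with a slash.
NEEDS_SLASH = re.compile(r"^([^.]*$(?<!/))")


def add_trailing_slash(path: str) -> str:
    """Append '/' to extension-less paths that lack one; leave others alone."""
    return NEEDS_SLASH.sub(r"\1/", path)


if __name__ == "__main__":
    for p in ("/about", "/about/", "/css/site.css"):
        print(p, "->", add_trailing_slash(p))
```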

rewrite any combination of index.html

I want to redirect any letter-case combination of index.html to the all-lowercase index.html.
For example:
/foo/bar/INDEX.html
to
/foo/bar/index.html
or
/hello/world/funk/indeX.HTML
to
/hello/world/funk/index.html
I have tried a couple of regexes but had no luck. I want the redirect to happen only if index.html contains at least one uppercase letter, so
/hello/there/index.html
should not redirect anywhere.
I have access to httpd.conf, so I am using RewriteMap lc int:tolower
Try this: (?!index\.html)(?i)index\.html(?-i). The negative lookahead first checks that the string is not already the lowercase index.html, and the rest then matches index.html case-insensitively. Try it here: https://regex101.com/r/GNhAwG/1
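Here is a quick Python sketch of the same idea, in case it helps to test it locally. Python does not accept (?i) in the middle of a pattern, so this version uses the scoped form (?i:...) instead; the function name redirect_target and the trailing $ anchor are my own additions, not part of the regex above.

```python
import re
from typing import Optional

# Negative lookahead rejects the exact lowercase name, then the scoped
# case-insensitive group accepts every other casing of index.html.
MIXED_CASE_INDEX = re.compile(r"(?!index\.html)(?i:index\.html)$")


def redirect_target(path: str) -> Optional[str]:
    """Return the lowercase-index path to redirect to, or None if no redirect."""
    match = MIXED_CASE_INDEX.search(path)
    if match is None:
        return None
    return path[:match.start()] + "index.html"


if __name__ == "__main__":
    for p in ("/foo/bar/INDEX.html", "/hello/there/index.html"):
        print(p, "->", redirect_target(p))
```

Only the file name is lowercased here; the directory part of the path is left exactly as requested.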

Stripping "blog" from wordpress URLs with Regex

I've managed to get myself totally mired in the world of regex and could do with a hand out.
I've recently moved my WordPress install from just the /blog subdirectory to the whole site, so I need to strip /blog/ from all incoming URLs except an exact match, as /blog is still the blog.
For example I need:
http://foo.com/blog/bar
http://foo.com/blog/foobar/bar
foo.com/blog/bar
to all lose the /blog/ but I need
http://foo.com/blog
foo.com/blog
to keep theirs.
I'm using the wordpress Redirections plugin to manage this as it tracks 404 errors which put me on to this.
Can anyone help!?
You need to replace \/blog\/(.+) with /$1
Only replacing /blog/ with / would also replace http://foo.com/blog/
Update:
That works perfectly except that my previous post structure
was /y/m/d/postname. How would I strip that as well when it's present
but not affect the other redirect when it isn't?
In this case you could use \/blog\/(\d{4}\/\d{2}\/\d{2}\/(.+)).
Result:
url: http://foo.com/blog/2014/03/09/bar
$1: 2014/03/09/bar
$2: bar
Update 2:
In case you wanted to have both ways stripped use \/blog\/(?:\d{4}\/\d{2}\/\d{2}\/)?(.+)
Result:
url: http://foo.com/blog/bar
url: http://foo.com/blog/2014/03/09/bar
$1: bar
url: http://foo.com/blog/tag/sometag
$1: tag/sometag
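If you want to try the combined pattern before putting it into the plugin, here is a small Python sketch. The function name strip_blog is made up for the example, and Python writes the replacement as \1 where the plugin uses $1.

```python
import re

# /blog/ followed by an optional yyyy/mm/dd/ prefix, then the part to keep.
BLOG_PREFIX = re.compile(r"/blog/(?:\d{4}/\d{2}/\d{2}/)?(.+)")


def strip_blog(url: str) -> str:
    """Drop /blog/ (and an old date prefix, if present) from a URL."""
    return BLOG_PREFIX.sub(r"/\1", url)


if __name__ == "__main__":
    for u in (
        "http://foo.com/blog/bar",
        "http://foo.com/blog/2014/03/09/bar",
        "http://foo.com/blog/tag/sometag",
        "http://foo.com/blog",
    ):
        print(u, "->", strip_blog(u))
```

Because (.+) needs at least one character after /blog/, both http://foo.com/blog and http://foo.com/blog/ are left untouched.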
Why not try a simple replace()? Unless I misunderstood the question, isn't the output shown below what you need?
public class StripBlog {
    public static void main(String[] args) {
        String s = "http://foo.com/blog/bar";
        String s1 = "http://foo.com/blog";
        // replace() does a literal (non-regex) substitution
        System.out.println(s.replace("/blog/", "/"));
        // No slash after "blog" here, so s1 is printed unchanged
        System.out.println(s1.replace("/blog/", "/"));
    }
}
Output:
http://foo.com/bar
http://foo.com/blog

Explain this mod_rewrite rule

Can anyone explain what this mod_rewrite rule is doing?
I'm trying to comment the file, but the code seems to state the opposite of what I think it's doing
# Enable rewriting of URLs
RewriteEngine on
# Allow specified file types to be accessed
# Thing to test = URL
# Condition = not starting with
RewriteCond $1 !^(index\.php|images|css|js|robots\.txt)
# RewriteRule will only be performed if the preceding RewriteCond is fulfilled
# Remove index.php from all URLs
# Pattern = anything (0 or more of any character)
# Substitution = index.php + the rest of the URL
RewriteRule ^(.*)$ /index.php/$1 [L]
The browser sends a request to the server (Apache, since you're using mod_rewrite):
GET profile/edit
Apache accepts this request and sees in its configuration files that you've configured it to pass all requests through mod_rewrite. So, it sends the string 'profile/edit' to mod_rewrite. Mod_rewrite then applies the rules you specified to it, which then transforms the request (in the way I explained in my previous post) to 'index.php/profile/edit'. After mod_rewrite is done, Apache continues processing the request, and sees 'oh, this guy is requesting the file index.php'. So it calls the php interpreter which then parses and executes index.php - and gets '/profile/edit' as arguments. The php code (CI in your case) parses these arguments and knows how to call the right module in your application.
So basically, it's a way to always call index.php, even when the url doesn't specify index.php. In that way, index.php works as the front controller: it routes all requests to the right location in your application.
^ = beginning of line
( = begin group
.* = any character, any number of times
) = end group
$ = end of line
The $1 in the second part is replaced by whatever the group in the first part matched.
Is this a Symfony rule? The idea is to pass the whole request path to index.php (the front controller) as a parameter, so that the front controller can parse and route it.
If the URL does not start with index.php, images, css, js or robots.txt, the string "/index.php/" is prefixed to it.
As index.php is probably an executable PHP app, it can then read the rest of the URL from its CGI environment (it is stored in ${PATH_INFO}).
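To see the condition and the rule working together, here is a rough Python simulation. It only mimics the string handling; the function name apply_rule is invented, and the input is assumed to be the path without a leading slash, as in the GET profile/edit example above.

```python
import re

# Mirrors: RewriteCond $1 !^(index\.php|images|css|js|robots\.txt)
EXCLUDED = re.compile(r"^(index\.php|images|css|js|robots\.txt)")


def apply_rule(path: str) -> str:
    """Prefix /index.php/ unless the path starts with an excluded name."""
    if EXCLUDED.search(path):
        return path  # condition not fulfilled, RewriteRule is skipped
    # Mirrors: RewriteRule ^(.*)$ /index.php/$1 [L]
    return "/index.php/" + path


if __name__ == "__main__":
    for p in ("profile/edit", "images/logo.png", "index.php/profile/edit"):
        print(p, "->", apply_rule(p))
```

Note that the condition is a plain prefix test, so a path such as jsonapi/list would also be skipped because it starts with js.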