WordPress Rewrite Rules - regex

UPDATE
I've gone through the Codex docs on the Rewrite API and now have the following in my functions.php:
function my_rewrite_rules() {
    add_rewrite_rule('(a|b|c|d)/?$', 'index.php?pagename=$matches[1]-overview&myVar=var', 'top');
}
add_action('init', 'my_rewrite_rules');
Yes, I am going to the permalinks page to flush the rules after adjusting. The behavior is the same, the rule above 404s even though the page does exist and I can access it by typing directly into the address bar. However, if I hardcode one of the regex matches like so:
function my_rewrite_rules() {
    add_rewrite_rule('(a|b|c|d)/?$', 'index.php?pagename=a-overview&myVar=var', 'top');
}
add_action('init', 'my_rewrite_rules');
then all works as expected, with query vars set correctly. Ideas?
ORIGINAL QUESTION
I've been trying to get WordPress rewrite rules to work for quite some time now and am absolutely stumped as to why the following code (in functions.php) doesn't work:
function my_rewrite_rules($rules) {
    $my_rules = array('(a|b|c|d)/?$' => 'index.php?pagename=$matches[1]-overview&my_var=somevar');
    return array_merge($my_rules, $rules);
}
add_filter('page_rewrite_rules', 'my_rewrite_rules');
I have canonical redirects disabled and the rewrite just 404s. If redirecting is enabled, it does go to the correct page, but my query variable is stripped. If I replace '$matches[1]' with a, b, c, or d, everything works as expected with canonical redirecting disabled. I realize there are a few workarounds, but I just want to understand why the code above doesn't work. Thanks!

Apparently, having the $matches variable directly after the pagename query variable is treated as a special case in WordPress's url_to_postid() function (the same check appears in WP::parse_request(), which handles incoming requests). Here is a snippet from that code:
if ( $wp_rewrite->use_verbose_page_rules && preg_match( '/pagename=\$matches\[([0-9]+)\]/', $query, $varmatch ) ) {
    // this is a verbose page match, lets check to be sure about it
    if ( ! get_page_by_path( $matches[ $varmatch[1] ] ) )
        continue;
}
If I read this correctly, WordPress assumes (incorrectly, in your case) that the value captured by $matches[N] is the complete page path, and looks it up with get_page_by_path() while ignoring whatever follows it in the template (here, -overview). So in your example, if you do not have pages named a, b, c or d, your rewrite rule is skipped entirely (continue is called).
I've deduced this from reading the WordPress code, but I haven't tested my theory (I've actually never worked with WordPress at all). You could test it by creating pages named a, b, c and d and running your code again; if I am correct, your rule should then work. I would suggest dropping the -overview suffix from your page names, so that the captured value is the full page name, which would solve the problem.
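To make the theory concrete, here is a rough Python model of that guard (not WordPress code). It assumes, hypothetically, that the existing pages are a-overview through d-overview, uses a set lookup in place of get_page_by_path(), and mimics PHP's $matches array by putting the full match at index 0.

```python
import re

# Hypothetical pages that exist on the site; a set lookup stands in for get_page_by_path().
EXISTING_PAGES = {"a-overview", "b-overview", "c-overview", "d-overview"}

# The same test the WordPress snippet applies to a rule's query template.
VERBOSE_PAGENAME = re.compile(r"pagename=\$matches\[([0-9]+)\]")


def rule_survives_guard(query: str, matches: list) -> bool:
    """Return False when the guard would `continue` past this rewrite rule."""
    varmatch = VERBOSE_PAGENAME.search(query)
    if varmatch is None:
        return True  # no pagename=$matches[N] in the template, so no page lookup happens
    # Only the captured value is looked up; the "-overview" suffix plays no part here.
    captured = matches[int(varmatch.group(1))]
    return captured in EXISTING_PAGES


m = re.match(r"(a|b|c|d)/?$", "a/")
matches = [m.group(0), *m.groups()]  # like PHP's $matches: full match first, then groups

dynamic = rule_survives_guard("index.php?pagename=$matches[1]-overview&myVar=var", matches)
hardcoded = rule_survives_guard("index.php?pagename=a-overview&myVar=var", matches)
print(dynamic, hardcoded)  # False True
```

Under these assumptions the model reproduces both observations from the question: the hardcoded rule passes, while the $matches[1] rule is skipped unless a page literally named a exists.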

Related

Searching for URL paths containing "/" in Kibana/Elasticsearch

I'm trying to write a regex in Kibana (v 7.9.1) and I want to get all paths that are like /rest/requirements/<ID_HERE>/ and nothing else at the end. I would expect that the following would work:
"/rest/requirements/[0-9]*/"
After several tests, I noticed that the following query doesn't work either: "/rest/requirements/"
While if I do .*requirements.*, for example, it works.
So there is something with "/" that I cannot understand. I tried the following as well without success:
.rest.requirements.*
//rest//requirements.*
\/rest\/requirements.*
\\rest\\requirements.*
Btw, I am entering the filter as Query DSL.
Problem solved:
For some reason, the specific regex I mentioned in the question was not working with the JSON I was passing, although it worked for other types. No idea why yet.
The following, however, worked just fine:
{
  "regexp": {
    "path.keyword": ".*/requirements/[0-9]*/"
  }
}
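A likely explanation is that Elasticsearch regexp queries must match the whole term, and on an analyzed text field the terms are individual words with the slashes already stripped out. The Python sketch below approximates both effects: re.fullmatch stands in for the anchored Lucene matching (the two regex dialects differ in general but agree on these simple patterns), and splitting on non-alphanumeric characters is a rough, assumed stand-in for the standard analyzer. The sample paths are hypothetical.

```python
import re

# Hypothetical values of the "path" field.
PATHS = [
    "/rest/requirements/42/",
    "/api/rest/requirements/42/",
    "/rest/requirements/42/comments",
]


def keyword_regexp(pattern, values):
    """Approximate a regexp query on path.keyword: each whole value is one term,
    and the pattern must match that entire term (hence fullmatch, not search)."""
    return [v for v in values if re.fullmatch(pattern, v)]


def text_tokens(value):
    """Very rough stand-in for the standard analyzer: lowercase, keep word chars only."""
    return re.findall(r"[a-z0-9]+", value.lower())


print(keyword_regexp(r"/rest/requirements/[0-9]*/", PATHS))
# ['/rest/requirements/42/']
print(keyword_regexp(r".*/requirements/[0-9]*/", PATHS))
# ['/rest/requirements/42/', '/api/rest/requirements/42/']

tokens = text_tokens(PATHS[0])  # ['rest', 'requirements', '42']
print(any(re.fullmatch(r"/rest/requirements/", t) for t in tokens))  # False
print(any(re.fullmatch(r".*requirements.*", t) for t in tokens))     # True
```

In this model, "/rest/requirements/" can never match a token of the analyzed field, while .*requirements.* matches because "requirements" is itself a token; on the keyword field, the leading .* is what lets the pattern tolerate anything stored before /requirements/.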

How to replace characters in an nginx variable string?

Is there a way I can replace the non-alphanumeric characters in $request_uri with a space (or a +)?
What I'm trying to do is redirect all 404s on one of my sites to its search engine, where the query is the requested URI. So I have a block in my nginx.conf containing:
error_page 404 = @notfound;
location @notfound {
    return 301 $scheme://$host/?s=$request_uri;
}
While this does work, the URLs it returns contain the raw URI, complete with -, _ and / characters, which causes the search to always return 0 results.
For instance, given this URL: https://example.com/my-articles, the redirect ends up as: https://example.com/?s=/my-articles
What I would like to end up with (ultimately) is: https://example.com/?s=my+articles (though a + at the beginning works fine too: https://example.com/?s=+my+articles)
I need to do this without the Lua or Perl modules. So, how can I accomplish this?
You may need to tweak this depending upon how far down your directory structure you want the replacement to go, but this is the basic concept.
Named location for initial capture of 404s:
location @notfound {
    rewrite (.*) /search$1 last;
}
Named locations are a bit limiting, so all this does is add /search/ to the beginning of the URI which returned 404. The last flag tells Nginx to break out of the current location and select the best location to process the request based on the rewritten URI, so we need a block to catch that:
location ^~ /search/ {
    internal;
    rewrite ^/search/(.*)([^a-z0-9\+])(.*)$ /search/$1+$3 last;
    rewrite ^/search/(.*)$ /?s=$1 permanent;
}
The internal directive makes this location accessible only to Nginx itself; any client request to this block will return 404.
The first rewrite changes the last character that is not a lowercase letter, digit or + into a + and then asks Nginx to re-evaluate the rewritten URI.
The location block is defined with the ^~ modifier, which means requests matching this location will not be evaluated against any regex defined location blocks, so this block should keep catching the rewritten requests.
Once all the non word characters are gone the first rewrite will no longer match so the request will be passed to the next rewrite, which removes the /search from the front of the URI and adds the query string.
My logs look like this:
>> curl -L -v http://127.0.0.1/users-forum-name.1
<< "GET /?s=users+forum+name+1 HTTP/1.1"
>> curl -L -v http://127.0.0.1/users-forum-name/long-story/some_underscore
<< "GET /?s=users+forum+name+long+story+some+underscore"
You get the idea.
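To see how the two rewrites interact, here is a small Python simulation of the loop (a model, not nginx itself). It assumes PCRE and Python's re treat these patterns the same way, and it counts the `last` rewrite cycles, because nginx documents a limit of 10 internal redirects per request and returns a 500 error beyond that; the real count also includes the error_page and @notfound steps, so treat the number as approximate.

```python
import re

# The two rewrites from the "location ^~ /search/" block.
STEP = re.compile(r"^/search/(.*)([^a-z0-9+])(.*)$")  # greedy (.*) grabs the LAST separator
FINAL = re.compile(r"^/search/(.*)$")


def simulate(request_uri):
    """Return (redirect target, number of `last` rewrite cycles) for a 404'd URI."""
    uri = "/search" + request_uri  # what "rewrite (.*) /search$1 last;" produces
    cycles = 0
    while (m := STEP.match(uri)) is not None:
        uri = f"/search/{m.group(1)}+{m.group(3)}"  # replace one separator with "+"
        cycles += 1
    return "/?s=" + FINAL.match(uri).group(1), cycles


print(simulate("/users-forum-name.1"))
# ('/?s=users+forum+name+1', 3)
print(simulate("/users-forum-name/long-story/some_underscore"))
# ('/?s=users+forum+name+long+story+some+underscore', 6)
```

Each separator costs one cycle, so a URI with roughly ten or more separators would hit the limit. The character class is also case-sensitive, so uppercase letters are turned into + as well unless A-Z is added to it.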
You can use the Lua module and transform this variable into what you need with Lua string functions. I'm using OpenResty, which is basically nginx with Lua enabled, but the nginx Lua module will do fine. The Lua code can live inside a location via content_by_lua_block / access_by_lua_block, or in a separate file via content_by_lua_file / access_by_lua_file. Here is the documentation: https://github.com/openresty/lua-nginx-module#content_by_lua
Here is an example from my app.
location ~ /.*\.jpg$ {
    set $test '';
    access_by_lua_block {
        -- copy the URI without its leading "/" into $test
        ngx.var.test = string.sub(ngx.var.uri, 2)
    }
    root /var/www/luaProject/img/;
    try_files $uri /index.html;
}
It is generally a bad idea to automatically issue redirects from 404 Not Found pages to elsewhere — the user might have simply mistyped a single character in the URL (e.g., on a mobile phone whilst copying the URL from a flier and having a "fat finger"), which would be very easy to correct once they see a 404 and the obvious typo in the address bar, yet may require starting from scratch if your search-engine doesn't deliver.
If you still want to do it, it might be more efficient to do it within the search engine itself — after all, if your search engine isn't capable of searching by URL, and correcting typos, then it doesn't sound like a very useful search engine, now does it?
If you still want to do it within the nginx alone in front of the search engine, then you can use the fact that http://nginx.org/r/rewrite directives essentially let you implement any sort of a DFA — Deterministic Finite Automaton — but, depending on the number of replacements required, it may result in too many cycles and somewhat inflexible rulesets.
Take a look at the following resources on recursively replacing specific characters in the URL with other characters:
How to replace underscore to dash with Nginx
nginx rewrite rule to remove - and _
https://serverfault.com/questions/477103/how-do-i-verify-site-ownership-on-google-webmaster-tools-through-nginx-conf
http://mdoc.su/

GoogleTagManager | Parsing URL - With or Without regex

I want to store the user's language in a variable.
But my client can't (or didn't) pass this information through the dataLayer, so the only solution I have is to use the URL path.
The structure is:
http://www.website.be/en/subcategory/subsubcategory
I want to extract "en" information
I have no idea how to get this. I checked Stack Overflow and Google; some people talk about regex, others about Custom JavaScript, but I found nothing that fits my specific setup.
Do you have an idea how to proceed?
Many thanks !!
Ludo
Make sure the built-in {{Page Path}} variable is enabled, then create a Custom JavaScript variable:
function() {
  var parts = {{Page Path}}.split("/");
  return parts[1];
}
This splits the path on the "/" delimiter and gives you an array of its parts. Since the page path starts with a slash, the first part is an empty string, so you return the second one (array indexing starts at 0, so the second element has index 1).
This might need a bit of refinement (for pages that do not start with a language signifier, if any), but that's the basic idea.
Regex is an alternative (via the regex table variable), but the above solution is a little easier to implement.
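Both approaches can be sketched in Python (illustrative only; in GTM this logic would live in the Custom JavaScript or RegEx Table variable). The split version mirrors the function above. The regex version assumes, hypothetically, that language codes are exactly two lowercase letters, and returns None for paths that do not start with one, which is one way to handle pages without a language prefix.

```python
import re
from typing import Optional


def lang_by_split(page_path: str) -> str:
    # "/en/sub/subsub".split("/") -> ["", "en", "sub", "subsub"]; index 0 is the empty
    # string produced by the leading slash, so the language is at index 1.
    return page_path.split("/")[1]


# Assumption: language codes are exactly two lowercase letters at the start of the path.
LANG_PREFIX = re.compile(r"^/([a-z]{2})(?:/|$)")


def lang_by_regex(page_path: str) -> Optional[str]:
    m = LANG_PREFIX.match(page_path)
    return m.group(1) if m else None


print(lang_by_split("/en/subcategory/subsubcategory"))  # en
print(lang_by_split("/contact"))                        # contact (no language segment)
print(lang_by_regex("/en/subcategory/subsubcategory"))  # en
print(lang_by_regex("/contact"))                        # None
```

The split version is simpler but returns whatever the first segment is; the regex version trades that simplicity for an explicit "no language" result.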

Rewrite engine. How to translate URL

I am new to regular expressions and the rewrite engine.
I want to translate:
domain.com/type/id
into
domain.com/index.php?type=type&id=id
I use
RewriteRule (\w+)/(\d+)$ ./index.php?type=$1&id=$2
It works almost fine and I am able to get both variables, but the website has a problem including other files. My main URL is http://domain.com/repos/site, and after opening a URL like http://domain.com/repos/site/ee/9, Firebug says:
"NetworkError: 404 Not Found - http://domain.com/repos/site/ee/lib/geoext/script/geoext.js"
It seems the site treats "ee" as part of the URL, not as a GET variable.
Yes, you will certainly have to change your paths. Paths behavior:
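- href="mypath": resolved relative to the directory of the current URL (everything after its last "/" is replaced by mypath)
- href="./mypath": same as above
- href="/mypath": resolved from the site root (http://domain.com/mypath). This is the behavior you want; here that means writing /repos/site/lib/geoext/script/geoext.js
Note: you can also use "../" to go up to the parent directory of the current one.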
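Python's urllib.parse.urljoin follows the same RFC 3986 resolution rules browsers use, so it is a quick way to check where each form ends up. The page URL and script path are taken from the question; the list of candidate hrefs is illustrative.

```python
from urllib.parse import urljoin

# URL the browser shows after the rewrite; the server answers with index.php,
# but the browser still resolves relative references against this address.
PAGE = "http://domain.com/repos/site/ee/9"

CANDIDATES = [
    "lib/geoext/script/geoext.js",              # relative: resolved against /repos/site/ee/
    "./lib/geoext/script/geoext.js",            # same as above
    "/repos/site/lib/geoext/script/geoext.js",  # root-relative: independent of the page URL
    "../lib/geoext/script/geoext.js",           # up one level from /repos/site/ee/
]

resolved = {href: urljoin(PAGE, href) for href in CANDIDATES}
for href, url in resolved.items():
    print(f"{href:42} -> {url}")
```

The first two reproduce the 404 from the question. The ../ form works for this particular URL but breaks as soon as a rewritten path has a different depth, which is why the root-relative form is the more robust choice.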

Nginx request_uri without args

How do I get the value of $request_uri without the args appended to the end? I know there is a $uri variable, but I need the original value; as the Nginx documentation states:
request_uri
This variable is equal to the original request URI as received from the client including the args. It cannot be modified. Look at $uri for the post-rewrite/altered URI. Does not include host name. Example: "/foo/bar.php?arg=baz"
I use this map, which works without Lua:
# map is only valid in the http {} context
map $request_uri $request_uri_path {
    "~^(?P<path>[^?]*)(\?.*)?$" $path;
}
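The map's regex can be checked outside nginx, since Python's re accepts the same (?P<name>...) named-group syntax as PCRE. The sketch below applies it to a few hypothetical request URIs: everything from the first ? onward is dropped, while the path itself stays exactly as the client sent it, percent-encoding included.

```python
import re

# Same pattern as in the map block: everything up to the first "?" is the path.
REQUEST_URI_PATH = re.compile(r"^(?P<path>[^?]*)(\?.*)?$")


def strip_args(request_uri: str) -> str:
    return REQUEST_URI_PATH.match(request_uri).group("path")


for uri in ["/foo/bar.php?arg=baz", "/foo/bar.php", "/a%20b/?x=1?y=2"]:
    print(repr(uri), "->", repr(strip_args(uri)))
```

The same pattern works unchanged in the if-based variant, where the path ends up in $1 instead of the named $path capture.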
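You're looking for $uri. It does not include $args. In fact, $request_uri is almost equivalent to $uri?$args; the difference is that $uri is normalized (for example, percent-encoded characters are decoded) and can change during request processing, such as after a rewrite or an internal redirect.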
If you really want exactly $request_uri with the args stripped, you can do this.
local uri = string.gsub(ngx.var.request_uri, "%?.*", "")
You will need to have Lua available, but that will do exactly what you're asking.
The accepted answer pointed me in the right direction, but I had to figure out where to put that code. After some time investigating, I found set_by_lua_block:
set_by_lua_block $request_uri_path {
    -- the parentheses keep only the first value string.gsub returns (the string)
    return (string.gsub(ngx.var.request_uri, "%?.*", ""))
}
I hope this saves some time for those who come here.
Ryan Olson's answer is really good, but the map directive is only allowed in the http context.
If you need to get that information in a block that does not accept map, it can be done with an if statement. But remember that "if is evil".
set $new_uri "";
if ($request_uri ~ "^([^?]*)(\?.*)?$") {
    set $new_uri $1;
}
This code works out of the box (it does not use Lua or anything else).