product/1 instead of product.php?id=1 - regex

This is not really a problem but a question.
I have this file that shows the product information when you go to URL/product.php?id=1. How can I make it show the same when going to URL/product/1? id is a variable that changes.
Sorry, I have no clue how htaccess works, or how to rewrite URLs.

You need to internally rewrite the request from /product/1 to /product.php?id=1. On Apache, you need to do this with mod_rewrite. In .htaccess this would take the form of:
# We must enable the rewrite engine before using mod_rewrite
RewriteEngine On
# Internally rewrite a request from "/product/1"
RewriteRule ^product/1$ /product.php?id=1 [L]
Note that this literally rewrites /product/1 to /product.php?id=1 (as stated in the question) and nothing else. The rewrite is also internal to the server: the URL in the browser's address bar does not change.
The arguments to Apache directives are space separated:
^product/1$ - The first argument (pattern) to the RewriteRule directive is a regular expression (regex) that matches against the URL-path (only) of the request. Note that in .htaccess this URL-path does not start with a slash, so the URL-path that is matched is product/1 not /product/1, even though you are requesting example.com/product/1.
/product.php?id=1 - The second argument (substitution) is the string that is substituted for the requested URL, ie. the target URL. This is an "ordinary" string, not a regex.
[L] - The third argument (flags) contains additional options that can change how the RewriteRule directive behaves. The argument must be surrounded by square brackets and contains a comma-separated list of flags. The L (or last) flag signifies that this is the last directive in this round of processing. If this is the last directive in the file then the L flag is not required. If you omit the L flag then processing continues and the request could be further rewritten (if you have more directives). Another common flag is the R (or redirect) flag. This changes the internal rewrite into an external redirect, which sends a Location HTTP response header back to the client and results in the browser being externally redirected to the new URL (the URL in the browser's address bar changes).
Additional Note: In this instance, since you are requesting "product" and a file with that basename exists (in fact, that is the file you are rewriting to) you also need to make sure that MultiViews is disabled (it is by default). If MultiViews is enabled (some shared hosts enable this for some reason) then mod_negotiation will trigger an internal subrequest for product.php before your mod_rewrite directive gets to rewrite the request and this will be missing the id URL parameter. (Numerous rewriting issues on SO are caused by conflicts with MultiViews.) To disable MultiViews, you can include this at the top of your .htaccess file:
Options -MultiViews
More generic
To make this more generic and rewrite /product/<number> to /product.php?id=<number>, where <number> is 1 or more digits, you can modify the regex (first argument) and create a backreference that you use in the substitution string (second argument). For example:
# Internally rewrite a request from "/product/<number>"
RewriteRule ^product/(\d+)$ /product.php?id=$1 [L]
This would rewrite a URL of the form /product/123 to /product.php?id=123, where 123 is 1 or more digits, denoted by the regex subpattern \d+. (\d is a shorthand character class and is the same as the marginally more verbose [0-9]. + is a quantifier that indicates 1 or more of the preceding pattern, in this case digits.) By surrounding this in parentheses, we create a capturing group, which we can refer to in the substitution. That's what the $1 backreference is: $1 is essentially a variable that contains whatever value the regex captured.
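For illustration only, the same matching and backreference logic can be reproduced with Python's re module. This is a sketch, not how Apache executes the rule: the function name rewrite is hypothetical, and re.ASCII is used so that \d means exactly [0-9] as described above.

```python
import re

# Mirrors: RewriteRule ^product/(\d+)$ /product.php?id=$1 [L]
# In .htaccess the URL-path is matched without its leading slash.
PATTERN = re.compile(r"^product/(\d+)$", re.ASCII)

def rewrite(url_path):
    """Return the internal rewrite target, or None if the rule does not match."""
    m = PATTERN.search(url_path)
    if m is None:
        return None
    # $1 in the substitution is capturing group 1, ie. the digits
    return "/product.php?id=" + m.group(1)

print(rewrite("product/123"))  # /product.php?id=123
print(rewrite("product/abc"))  # None
```

Because the pattern is anchored with ^ and $, something like product/12/x is not rewritten at all.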
In summary:
# Disable MultiViews
Options -MultiViews
# We must enable the rewrite engine before using mod_rewrite
RewriteEngine On
# Internally rewrite a request to "/product/<number>"
RewriteRule ^product/(\d+)$ /product.php?id=$1 [L]
Reference:
https://httpd.apache.org/docs/2.4/mod/mod_rewrite.html#rewriterule

Related

Simplifying redirect in htaccess

I have redirects:
RewriteRule ^(.*)/thema(.*)$ https://www.newurl.com [R=301,L]
RewriteRule ^(.*)/stichpunkt(.*)$ https://newurl.com [R=301,L]
RewriteRule ^(.*)/author(.*)$ https://www.newurl.com [R=301,L]
RewriteRule ^(.*)/2023(.*)$ https://www.newurl.com [R=301,L]
Is there a way to simplify these into one line?
I need to disable category, tag, author and date archives in WordPress.
This should really be done in WordPress itself. Otherwise WP is still going to generate and publish these URLs (eg. Sitemap, RSS feed, etc.).
Otherwise, if .htaccess is your only option then you should serve a 404, rather than redirect to the homepage. Whilst a redirect to the homepage is likely to be treated as a soft-404 by Google (and possibly other search engines) it runs the risk of being indexed under these "archive" URLs (and accessible with a site: search).
For example, at the top of the root .htaccess file (before any existing WP directives):
# Whatever your custom 404 page is (could be WordPress)
ErrorDocument 404 /404.php
# Force a 404 for "category, tag, author and date archives"
RewriteRule (^|/)(thema|author|stichpunkt|2\d{3})(/|$) - [R=404]
2\d{3} matches any 4-digit year starting with 2 (ie. 2000 to 2999).
The regex matches any of those "words" only when they occur as a whole path segment (not partial matches).
R=404 - This is not a "redirect" (despite the use of the R flag). The 404 error document is served via an internal subrequest and a 404 HTTP response code is set on the initial response. If these URLs have previously been indexed then consider changing this to a "410 Gone" instead, ie. R=410 or simply G (shorthand flag).
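A quick way to check which URL-paths that pattern catches is to test it with Python's re module, which accepts this pattern with the same meaning. The helper is_archive and the sample paths are hypothetical.

```python
import re

# Same pattern as: RewriteRule (^|/)(thema|author|stichpunkt|2\d{3})(/|$) - [R=404]
ARCHIVE = re.compile(r"(^|/)(thema|author|stichpunkt|2\d{3})(/|$)", re.ASCII)

def is_archive(url_path):
    """True if a whole path segment is one of the words or a 2xxx year.

    url_path is matched as in .htaccess, ie. without a leading slash.
    """
    return ARCHIVE.search(url_path) is not None

print(is_archive("blog/author/jane"))  # True
print(is_archive("authority/page"))    # False (partial segment)
```

The (^|/) and (/|$) groups are what restrict the match to whole segments, so a 5-digit segment such as 20234 is not caught either.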
You can use a simple alternation with |:
RewriteRule ^.*/(thema|author|stichpunkt|2023) https://www.newurl.com [R=301,L]
You don't need to capture parts that you don't need to refer back to, so I removed the () around the .*. The parentheses around the alternation are still needed, even though you are not interested in capturing that part; otherwise it would not be clear where the first alternative starts and the last one ends.
And you don't need to match the trailing (.*)$ either; you can simply leave off the part that anchors the pattern at the end.
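The simplified pattern can be tested the same way. This Python sketch (the helper name redirects is hypothetical) also shows two side effects it shares with the original four rules: it needs a slash before the keyword, and it matches partial words such as authority.

```python
import re

# Same pattern as:
# RewriteRule ^.*/(thema|author|stichpunkt|2023) https://www.newurl.com [R=301,L]
SIMPLIFIED = re.compile(r"^.*/(thema|author|stichpunkt|2023)")

def redirects(url_path):
    """True if the simplified rule would redirect this URL-path (no leading slash)."""
    return SIMPLIFIED.search(url_path) is not None

print(redirects("blog/thema/news"))  # True
print(redirects("blog/authority"))   # True (partial word)
print(redirects("thema/news"))       # False (no slash before "thema")
```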

How to match string if it doesn't contain only numbers after slash?

I am redirecting certain URLs with paths to GET variables, like the following:
localhost2/post/myTitle => localhost2/post.php?title=myTitle
localhost2/post/123 => localhost2/post.php?id=123
So in my .htaccess file, I use
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule ^post/(\d+) post.php?id=$1
RewriteRule ^post/(.*) post.php?title=$1
</IfModule>
This works no problem. But I want to learn how to write the negative of ^post/(\d+), that is ^post/(NEGATE-ONLY-NUMBERS). In other words, I want a regex that matches the whole input string if there are not only digits after post/. So post/abc, post/a23, post/ab3, post/12c and post/a2c should all pass, but not post/123. I referred to this post, which suggests using:
(?!^\d+$)^.+$
I can't use ^post/(?!^\d+$)^.+$, because there can be only one ^ and one $. I don't know what regex anchor specifies first position in a substring. My best guess is
post\/(?!\d++).*
I think (?!\d++), with the possessive ++, would eat all the following characters and check that they are all digits. But this fails at post/1ab.
Another guess is:
post\/(?![\d,\/]+$).*
This works best, but it allows post/3455/X.
Secondly, I eventually need to convert localhost2/post/myTitle/123 => localhost2/post.php?title=myTitle&repeat=123 as well. I have come up with the following:
^post/(?!\d+($|/))(.+?($|/))(\d+$)?
Note: +? makes the quantifier lazy; otherwise .+ would match across multiple slashes.
and
^post/(?!\d+($|/))([^/\n\r]+($|/))(\d+$)?
Here I use [^/\n\r] instead of .+?
Patterns inside zero-width assertions like (?!\d++) are non-consuming, they do not "eat" chars, they only check the context while keeping the regex index at the same location as before matching the zero-width assertion pattern.
You can use any of the following:
^post/(?!\d+(?:/|$)).*
^post/(?!\d+(?=/|$)).*
^post/(?!\d+(?![^/])).*
See the regex demo. Details:
^post/ - start of input, post/ literal string
(?!\d+(?=/|$)) - a negative lookahead that fails the match if, immediately to the right of the current location, there are one or more digits followed by / or the end of the string
.* - the rest of the input.
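To confirm that the three patterns behave the same on the examples from the question (including post/3455/X, which the second attempt let through), here is a small Python check. Python's re supports the same lookahead syntax; the test lists are taken from the question.

```python
import re

# The three equivalent patterns from the answer above
PATTERNS = [
    r"^post/(?!\d+(?:/|$)).*",
    r"^post/(?!\d+(?=/|$)).*",
    r"^post/(?!\d+(?![^/])).*",
]

SHOULD_MATCH = ["post/abc", "post/a23", "post/ab3", "post/12c", "post/a2c"]
SHOULD_FAIL = ["post/123", "post/3455/X"]

for p in PATTERNS:
    rx = re.compile(p, re.ASCII)
    # Every "mixed" path passes, every all-digit segment is rejected
    assert all(rx.search(s) for s in SHOULD_MATCH), p
    assert not any(rx.search(s) for s in SHOULD_FAIL), p

print("all three patterns agree")
```

For post/12c the lookahead body first tries \d+ = 12, finds c instead of / or the end, backtracks to 1, fails again, and so the negative lookahead succeeds.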
Do not overcomplicate things when you can keep them simple with separate rewrite rules. Since your query parameters are named differently, you will need 3 separate rewrite rules anyway.
Consider:
Options -MultiViews
RewriteEngine On
RewriteRule ^post/(\d+) post.php?id=$1 [L,QSA,NC]
RewriteRule ^post/([^/]+)/(\d+) post.php?title=$1&repeat=$2 [L,QSA,NC]
RewriteRule ^post/([^/]*) post.php?title=$1 [L,QSA,NC]
Take note of Options -MultiViews. If MultiViews is not already disabled in your Apache config, you must include this line here; otherwise all $_GET parameters will be empty in your PHP file.
The MultiViews option (see http://httpd.apache.org/docs/2.4/content-negotiation.html) is used by Apache's content negotiation module, which runs before mod_rewrite and makes Apache match requests to files by extension. So if /file is the URL, Apache will serve /file.html.
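To see why the order of the three rules matters, here is a rough Python emulation in which the first matching rule wins, as the L flag achieves here. It is a hypothetical sketch: it handles the NC flag but not QSA, and it does not model MultiViews.

```python
import re
from typing import Optional

# Hypothetical emulation of the three RewriteRule directives above, in order.
RULES = [
    (r"^post/(\d+)", "post.php?id=$1"),
    (r"^post/([^/]+)/(\d+)", "post.php?title=$1&repeat=$2"),
    (r"^post/([^/]*)", "post.php?title=$1"),
]

def rewrite(url_path: str) -> Optional[str]:
    """Return the target of the first matching rule, or None."""
    for pattern, substitution in RULES:
        # NC flag: match case-insensitively
        m = re.search(pattern, url_path, re.IGNORECASE | re.ASCII)
        if m:
            # Replace each $N in the substitution with capturing group N
            return re.sub(r"\$(\d)", lambda ref: m.group(int(ref.group(1))), substitution)
    return None

print(rewrite("post/123"))          # post.php?id=123
print(rewrite("post/myTitle/123"))  # post.php?title=myTitle&repeat=123
print(rewrite("post/myTitle"))      # post.php?title=myTitle
```

Because the first rule has no $ anchor, a path such as post/123/456 is also handled by the first rule (id=123); add $ anchors if that is not what you want.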

Redirect the URL from one query string to another

I have spent a great many hours trying to find a solution to this and tried many different approaches but nothing I have tried has worked so far.
I would like to redirect a URL with a query string to another URL that contains the value of that query string.
I want to redirect:
https://example.com/component/search/?searchword=XXXXXXXXX&searchwordsugg=&option=com_search
to
https://example.com/advanced-search?search=XXXXXXXXX
You can do something like the following using mod_rewrite at the top of your root .htaccess file:
RewriteEngine On
RewriteCond %{QUERY_STRING} (?:^|&)searchword=([^&]*)
RewriteRule ^component/search/?$ /advanced-search?search=%1 [NE,R=302,L]
The RewriteRule pattern matches against the URL-path only, which notably excludes the query string. To match against the query string we need a separate condition that checks against the QUERY_STRING server variable.
%1 is a backreference to the first capturing group in the preceding CondPattern, ie. the value of the searchword URL parameter.
The regex (?:^|&)searchword=([^&]*) matches the searchword URL parameter anywhere in the query string, not just at the start (as in your example). This also permits an empty value for the URL parameter.
The NE flag is required to prevent the captured URL parameter value from being doubly encoded in the response, since the QUERY_STRING server variable is not %-decoded.
The L flag prevents further processing during this pass of the rewrite engine.
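The logic of the condition and rule can be sketched in Python as follows (the function redirect_target is hypothetical). As in mod_rewrite, the RewriteRule pattern is checked against the URL-path first, and only then is the condition evaluated against the query string.

```python
import re
from typing import Optional

# Same CondPattern as: RewriteCond %{QUERY_STRING} (?:^|&)searchword=([^&]*)
SEARCHWORD = re.compile(r"(?:^|&)searchword=([^&]*)")
# Same RewriteRule pattern (URL-path without leading slash, optional trailing slash)
PATH = re.compile(r"^component/search/?$")

def redirect_target(url_path: str, query_string: str) -> Optional[str]:
    """Return the redirect target, or None if the rule would not apply."""
    if PATH.search(url_path) is None:
        return None
    m = SEARCHWORD.search(query_string)
    if m is None:
        return None
    # %1 is copied verbatim; the NE flag stops mod_rewrite escaping it again
    return "/advanced-search?search=" + m.group(1)

print(redirect_target("component/search/",
                      "searchword=XXXXXXXXX&searchwordsugg=&option=com_search"))
# /advanced-search?search=XXXXXXXXX
```

Note that searchwordsugg= is not mistaken for searchword=, because the = must follow immediately after the parameter name.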
Reference:
Apache docs: RewriteRule Directive
Apache docs: RewriteCond Directive

How to rewrite URLs from UPPERCASE to lowercase in .htaccess

I'm trying to assemble a regular expression to rewrite a URL containing uppercase characters to the same URL but in all lowercase.
Example:
example.com/foO-BAR-bAz rewrite to example.com/foo-bar-baz
example.com/FOO-BAR-BAZ rewrite to example.com/foo-bar-baz
example.com/foo-bar-baz does not match
I tried ^\/(?=.*[A-Z]) to match a string with at least one uppercase character but it doesn't match the full string. I also know that I need to use a "capturing group" but I'm not sure how.
I would be implementing this redirect rule in an .htaccess file on an Apache server.
If you are on Apache 2.4 then in .htaccess you can use mod_rewrite with an Apache expression and make use of the tolower() function. (This doesn't require the use of a RewriteMap in the server config, as mentioned in comments.) For example:
RewriteEngine On
RewriteCond expr "tolower(%{REQUEST_URI}) =~ /(.*)/"
RewriteRule [A-Z] %1 [L]
The RewriteRule pattern simply checks there is at least one uppercase letter in the requested URL-path. The RewriteCond directive then calls the tolower() function on the URL-path (REQUEST_URI server variable) which is then effectively captured using the regex. The %1 backreference in the substitution string then holds the result of the tolower() function call, ie. the lowercased URL-path, which is internally rewritten to.
To "correct" the URL and issue an external redirect, then just add the R flag to the RewriteRule directive. For example:
:
RewriteRule [A-Z] %1 [R=301,L]
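The two-step logic (only act when there is an uppercase letter, then use the lowercased path) can be sketched in Python. This is an illustration only: lowercase_target is a hypothetical name, and Python's str.lower() stands in for Apache's tolower().

```python
import re
from typing import Optional

def lowercase_target(request_uri: str) -> Optional[str]:
    """Return the lowercase redirect target, or None if no redirect is needed."""
    # RewriteRule pattern [A-Z]: only act if there is at least one uppercase letter
    if re.search(r"[A-Z]", request_uri) is None:
        return None
    # %1 captures tolower(%{REQUEST_URI}); it keeps the leading slash
    return request_uri.lower()

print(lowercase_target("/foO-BAR-bAz"))  # /foo-bar-baz
print(lowercase_target("/foo-bar-baz"))  # None
```

Because the lowercased URL contains no uppercase letters, the follow-up request no longer matches [A-Z], so the redirect cannot loop.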
UPDATE: To eliminate a double redirect when redirecting HTTP to HTTPS (and/or non-www vs www), include the full canonical URL as part of this rule and implement the canonical (scheme + hostname) redirects second.
For example:
# 1 - Upper to lowercase conversion (and HTTPS and WWW)
RewriteCond expr "tolower(%{REQUEST_URI}) =~ /(.*)/"
RewriteRule [A-Z] https://www.example.com%1 [R=301,L]
# 2 - HTTP to HTTPS
:
Note that the %1 backreference already includes the slash prefix at the start of the URL-path, so this is omitted in the substitution string.
HOWEVER, it is not necessarily incorrect to have a double redirect in this situation. ie. Redirect HTTP to HTTPS (same hostname and URL-path) first then canonicalise other elements of the requested URL (hostname, upper/lowercase URL-path etc.) second. These should be edge cases to begin with, so the real-world impact is minimal.
Note that if you are implementing HSTS then it is a requirement that you first redirect from HTTP to HTTPS on the same hostname, before canonicalising the hostname (ie. www vs non-www). In this case you should use the HTTP_HOST server variable (ie. %{HTTP_HOST}) as the hostname in the above redirect. A double redirect cannot be avoided in this scenario.

Unknown number of regex replacements, how?

I need to change a large number of URIs in the following way:
substitute %20 separators with dashes -,
substitute the old root with a new domain.
Examples:
/old_root/first/second.html -> http://new_domain.com/first/second
/old_root/first/second%20third.html -> http://new_domain.com/first/second-third
/old_root/first/second%20third%20fourth.html -> http://new_domain.com/first/second-third-fourth
The best I came up with using regex is to write as many pattern-replacement rules as the maximum number of %20 separators that can occur in my URIs:
old_root/(.*?)/(.*?)\.html -> http://new_domain.com/$1/$2
old_root/(.*?)/(.*?)%20(.*?)\.html -> http://new_domain.com/$1/$2-$3
old_root/(.*?)/(.*?)%20(.*?)%20(.*?)\.html -> http://new_domain.com/$1/$2-$3-$4
My question is: is it possible to obtain the same result using a single regular expression rule?
I thought I could use a two-step approach: first change all %20 separators to - and then use the rule old_root/(.*?)/(.*?)\.html -> http://new_domain.com/$1/$2/. However, I need to apply this rule in a .htaccess file as a RedirectMatch directive and, as far as I know, it is not possible to use two successive rules for the same redirect directive.
It turns out that Apache recursively applies all regex rules until they stop matching. Therefore, I am allowed to write two rules rather than one to solve my problem.
The following rules do what I was looking for, and more; I have tested them on my local Apache server and they work fine. Note that for them to work, you need to first turn on the rewrite engine by prepending
RewriteEngine on
Options +FollowSymlinks -MultiViews
in the local .htaccess file or in the global httpd.conf file.
Replace all spaces with hyphens
Replace both literal spaces and %20 with hyphens:
RewriteRule ^(.+)(\s|%20)(.+)$ /$1-$3 [R=301,NE,L]
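The answer relies on the rule being applied again after each redirect. This Python sketch (redirect_chain is a hypothetical helper) replays the space-to-hyphen rule until it stops matching. Because the first (.+) is greedy, each pass replaces the last remaining separator, so a path with n separators takes n redirects.

```python
import re

# Same pattern as: RewriteRule ^(.+)(\s|%20)(.+)$ /$1-$3 [R=301,NE,L]
SPACE_RULE = re.compile(r"^(.+)(\s|%20)(.+)$")

def redirect_chain(url_path):
    """Replay the rule until it stops matching; return every redirect target.

    url_path is the path as matched in .htaccess, ie. without a leading slash.
    """
    targets = []
    while True:
        m = SPACE_RULE.match(url_path)
        if m is None:
            return targets
        # Substitution /$1-$3; the greedy (.+) leaves the LAST separator for $2
        target = "/" + m.group(1) + "-" + m.group(3)
        targets.append(target)
        # The browser requests the new URL, matched again without its slash
        url_path = target[1:]

for hop in redirect_chain("old_root/first/second%20third%20fourth.html"):
    print(hop)
# /old_root/first/second%20third-fourth.html
# /old_root/first/second-third-fourth.html
```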
Replace all apostrophes with hyphens
Replace all literal apostrophes, backticks and %60 (an encoded backtick) with hyphens:
RewriteRule ^(.+)('|`|%60)(.+)$ /$1-$3 [R=301,NE,L]
Delete the trailing .html extension
RewriteRule (.+)\.html$ $1 [R=301,L]
Convert the last field in the URL to lower case
Convert the last field in the URL to lower case and prepend the new domain:
RewriteRule /whatever/(.*?)/(.*?)/(.*) http://new.domain.com/$1/$2/${lc:$3} [R=301,L]
Important: The lowercase conversion will only work if you include the following lines at the end of the Apache configuration file httpd.conf, which is usually located in the etc directory on the server:
RewriteEngine on
RewriteMap lc int:tolower
A last note: I recommend prepending each rule with a RewriteCond directive to restrict the rule's field of application. For example, to apply the space-to-hyphen rule only to those URIs that match a certain regex, you should write the following in your .htaccess file:
RewriteCond %{REQUEST_URI} regex_for_URIs
RewriteRule ^(.+)(\s|%20)(.+)$ /$1-$3 [R=301,NE,L]
where regex_for_URIs is the regular expression that the URI must match in order for the next RewriteRule to be applied; it can also be a simple string.
Well, you were almost done.
Problems
Don't return "%20": we'll use them as the "delimiters" between parts of the path.
Make the third and fourth groups optional (because you might pass a shorter URL, as in your examples).
Solution
\/old_root\/(.*?)\/(\w*)(?:%20)?(\w*)?(?:%20)?(\w*)?\.html
See Demo
Explanation
(?:%20)? means "%20" is a non-capturing group that can occur 0 or 1 times.
The same logic applies to the possible 3rd and 4th parts.
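Here is a Python sketch that applies the single pattern to the three example paths (new_url is a hypothetical helper). It joins only the non-empty word groups; a plain $1/$2-$3-$4 substitution would instead leave trailing hyphens for paths with fewer than three words.

```python
import re

# The single pattern from the answer above
OLD_URL = re.compile(r"\/old_root\/(.*?)\/(\w*)(?:%20)?(\w*)?(?:%20)?(\w*)?\.html")

def new_url(old_path):
    """Map an old path to the new URL, or return None if it doesn't match."""
    m = OLD_URL.search(old_path)
    if m is None:
        return None
    first, *words = m.groups()
    # Optional groups that matched nothing are '' or None; skip them
    return "http://new_domain.com/" + first + "/" + "-".join(w for w in words if w)

print(new_url("/old_root/first/second%20third.html"))
# http://new_domain.com/first/second-third
```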