.htaccess redirect still goes to 404 - regex

I have the following rewrite in my .htaccess file, but it still lands on a 404 instead of redirecting.
RewriteCond %{QUERY_STRING} tab=auto_data(.*)$
RewriteRule ^(.*)$ https://test.example.com/automobile-data/ [L,R=301]
There are multiple pages that can possibly have the tab=auto_data query string parameter, and there are quite possibly other QSP appended behind tab=auto_data as well.
I need to redirect any URL that contains the QSP of tab=auto_data to a new page in the site. The domain would remain the same, just the page name is changing.
What am I doing wrong here?

The only other directives are the standard WordPress directives.
In that case, your external redirect should come before any WordPress routing directives. The RewriteEngine directive only needs to appear once, anywhere, in the file, although it is more logical to place it once at the top.
You also need to remove the query string from the redirect target, otherwise you'll get a redirect loop: the rule matches any URL-path on the same domain, and the original query string (including tab=auto_data) is passed through to the target by default. If the domain/host remains the same then the scheme and hostname can be omitted from the substitution.
Try the following:
RewriteCond %{QUERY_STRING} tab=auto_data
RewriteRule ^polk/$ /automobile-data/? [R=301,L]
This specifically matches only the URL-path /polk/ (as mentioned in comments), unless this needs to be more general? And tab=auto_data must match anywhere in the query string.
The ? on the end of the substitution removes the query string and so prevents a redirect loop. (Presumably the query string should be removed from the target?) Although since the source and target URL paths are different, this is not strictly necessary anymore.
If the "domain remains the same", then there is no specific need to include the scheme and host in the substitution. Unless you are hosting multiple domains etc.?
Make sure the browser cache is cleared before testing, as 301s are cached persistently by the browser. Testing with 302s can be preferable for this reason.
UPDATE: To specifically remove this query string parameter, but copy the remaining query string onto the target, try something like:
RewriteCond %{QUERY_STRING} ^tab=auto_data(?:&(.+))?
RewriteRule ^polk/$ /automobile-data/?%1 [R=301,L]
(?:&(.+))? - grabs any remaining query string (if any), but excludes the & prefix (param delimiter) from the captured group. %1 is a backreference to this captured group.
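If tab=auto_data is not guaranteed to be the first parameter, a variation along the following lines might help. This is an untested sketch that assumes the same /polk/ source path as above and uses a 302 for testing. A request such as ?a=1&tab=auto_data would leave a harmless trailing & on the target.

```apache
# %1 = everything before tab=auto_data (including its trailing &), if any
# %2 = everything after the & that follows tab=auto_data, if any
RewriteCond %{QUERY_STRING} ^(.*&)?tab=auto_data(?:&(.*))?$
RewriteRule ^polk/$ /automobile-data/?%1%2 [R=302,L]
```

Requiring either the start of the string or an & before the name, and either an & or the end of the string after the value, means that something like subtab=auto_data or tab=auto_data2 does not trigger the redirect.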

Related

.htaccess and regex: trying to convert parts of my url with mod_rewrite doesn't work as expected

I'm a bit stuck trying to figure out .htaccess and mod_rewrite properly.
I know that 90% of the problem is my terrible regex skill, 10% is due to apache (or my knowledge around its mod_rewrite best practices).
We have a web service that will soon be replaced by a new one, similar in functionality but different in terms of urls, params and other things.
What needs to happen for our users: most of them can't perform this update on their end, so we have to do it on our side. We also don't build the tool directly or have access to its source code, and we agreed with the vendor that these redirects will not be done in the new web service.
What I need apache to do, with mod_rewrite is to be able to replace parameters in the querystring one by one, based on a mapping I provide
Then it should replace certain separators; ultimately, it should replace the HTTP_REFERER as well and redirect with 301.
Here's the code I have so far:
RewriteEngine On
RewriteBase /
RewriteRule ^(.*/)?\.svn/ - [F,L]
ErrorDocument 403 "Access Forbidden"
# One group of RewriteCond/RewriteRule per parameter
RewriteCond %{QUERY_STRING} ^([^&]*(?:&.*)?)?param=([^&]*(?:&.*)?)$ [NC]
RewriteRule ^ %{REQUEST_URI}?%1param_changed=%2 [N,NE]
RewriteCond %{QUERY_STRING} ^([^&]*(?:&.*)?)?another_param=([^&]*(?:&.*)?)$ [NC]
RewriteRule ^ %{REQUEST_URI}?%1another_param_modified=%2 [N,NE]
...
# This is meant to replace all | with , within the url
RewriteRule ^(.*)|(.*)$ $1,$2 [N]
# This is the one that should finalise the url replace
RewriteCond %{REQUEST_URI} ^http://this.old.url/api/1/access/activity.(xml|csv)([^&]*(?:&.*)?))$ [NC]
RewriteRule ^ https://the.new.one/api/activities/?format=1&%2$ [R=301,NE,L]
This is the result I'm expecting from an example call:
input:
http://this.old.url/api/1/access/activity.xml?reporting-org=GB-GOV-1|GB-1&recipient-country=BD&stream=True
output:
https://the.new.one/api/activities/?reporting_organisation=GB-GOV-1,GB-1&recipient_country_id=BD&format=xml
I'm trying it with the htaccess tester found here and these are the issues I am still facing:
rewrite of parameters works fine, but each parameter's modified version does not get propagated to the next RewriteCond/RewriteRule group
I can't have that | matched (it gets converted in %7C in the url, but regardless, I can't have it match).
The resulting url, at the end is:
https://the.new.one/api/activities/?format=1&%2$ which leads me to think that the regex I specify in the associated RewriteCond is wrong and doesn't match, so this works partially as a side effect (it's basically replacing the whole url I think) but I need it to get that .xml/csv format and the query string afterwards. I can't seem to be able to fix that regex to work as I need it to.
I know there's a lot in this post, so thanks in advance to whoever can help me sort out the 3 issues I'm still facing.

Redirect "ugly" URL to new URL using htaccess

I have already rewritten my old "ugly" URL:
http://example.com/ppd-brands/generic/?gen_id=Mjky
to
http://example.com/ppd-brands/generic/gen_id/Mjky
using the code below
RewriteRule ^ppd-brands/generic/gen_id/([^/]*)$ /ppd-brands/generic/?gen_id=$1 [L]
and it's working.
Now my problem is how can I redirect the old "ugly" URL to the new URL when the user visits the old "ugly" URL?
RewriteRule ^ppd-brands/generic/gen_id/([^/]*)$ /ppd-brands/generic/?gen_id=$1 [L]
Just a precursor... whilst your old "ugly" URL was of the form /ppd-brands/generic/?gen_id=Mjky, you should ideally be rewriting to the actual file that handles the request, eg. index.php, instead of allowing mod_dir to issue an additional internal subrequest to the directory index - which is what I assume is happening here.
For example:
RewriteRule ^ppd-brands/generic/gen_id/([^/]*)$ /ppd-brands/generic/index.php?gen_id=$1 [L]
Now, your main question... to externally redirect from the old "ugly" URL to the new URL. In this case, you need to be careful of a redirect loop, since if we simply redirect then the above rewrite will rewrite it back again in an endless loop. You can't use a mod_alias Redirect (as the other answer suggests) for this reason. (And a mod_alias Redirect can't match the query string either - another reason.)
Aside: Since we changed the above rewrite to include index.php in the rewritten URL, which would appear to differ from the old "ugly" URL, we could perhaps get away with a simple redirect if you are on Apache 2.4 (but Apache 2.2 would result in a conflict because mod_dir would issue an internal subrequest for index.php before we can process the URL with mod_rewrite).
We need to only redirect initial requests, not requests that we have already rewritten. We can do this by checking against the REDIRECT_STATUS environment variable, which is empty on the initial request and set to "200" (as in 200 OK HTTP status) after the first successful rewrite. (Another way is to check against THE_REQUEST, instead of the dynamic/rewritable URL-path.)
For example, try the following before your existing rewrite:
# Redirect "old" to "new"
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} ^gen_id=([^/&]*)
RewriteRule ^(ppd-brands/generic)/(?:index\.php)?$ /$1/gen_id/%1 [QSD,R=302,L]
Note that in order to match the query string we need a condition (RewriteCond directive) that checks against the QUERY_STRING server variable. The URL-path matched by the RewriteRule pattern notably excludes the query string.
The index.php in the request URL is optional, so it matches /ppd-brands/generic/?gen_id=Mjky or /ppd-brands/generic/index.php?gen_id=Mjky (if that is the actual URL).
The $1 backreference is simply to save typing/duplication. This will always contain ppd-brands/generic when the directive matches. We could have done the same with "gen_id", but that could make the substitution string look a bit too cryptic.
The %1 backreference (note the % prefix) is a backreference to the captured group in the last matched CondPattern (as opposed to $1 which refers to the RewriteRule pattern), ie. the value of the gen_id URL parameter.
The QSD flag (Apache 2.4+) strips the query string from the redirected URL. Otherwise gen_id=XYZ would be passed through to the target URL. If you are still on Apache 2.2 then you would need to append a ? to the end of the substitution string instead (essentially an empty query string). eg. /$1/gen_id/%1?
The "magic" is really the first condition that checks the REDIRECT_STATUS env var. As mentioned above, this ensures that we only process initial requests and not the rewritten request, thus avoiding a redirect loop.
Note that this is currently a 302 (temporary) redirect. Only change to a 301 (permanent) once you have tested this works OK. 301s are cached persistently by the browser so can make testing problematic.
And just to clarify... a redirect like this should only be implemented once you have already changed all the URLs in your application. This redirect is to simply redirect search engines, backlinks and anyone who should manually type the URL (unlikely).
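The THE_REQUEST alternative mentioned above could look something like the following. This is a sketch only, assuming the same URL structure as in the question and Apache 2.4+ for the QSD flag; it would replace the REDIRECT_STATUS version rather than be used alongside it.

```apache
# THE_REQUEST holds the original request line, e.g.
#   GET /ppd-brands/generic/?gen_id=Mjky HTTP/1.1
# It is not changed by internal rewrites, so the rewritten request cannot match.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/ppd-brands/generic/(?:index\.php)?\?gen_id=([^/&\s]*)
RewriteRule ^ppd-brands/generic/(?:index\.php)?$ /ppd-brands/generic/gen_id/%1 [QSD,R=302,L]
```

Because the condition checks what the client actually requested, no separate check against REDIRECT_STATUS is needed here.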
Redirect 301 /oldurl.htm /newurl.htm
Change the old and new URLs according to your needs. Hope it helps.

How to write RewriteRules for .htaccess?

I have a PHP file named as otp.php.
When URL in the URL bar is
http://localhost/college/otp/MTA=/teacher
It should be treated as
http://localhost/college/otp.php?user=MTA=&role=teacher
For this, I created .htaccess file in http://localhost/college/:
RewriteEngine On # Turn on the rewriting engine
RewriteRule ^otp/?$ otp.php [NC,L]
RewriteRule ^otp/[A-Za-z-]+/([A-Za-z0-9-]+)/?$ otp.php?user=$1&role=$2 [NC,L]
But, otp.php file says:
Notice: Undefined index: role in C:\wamp\www\college\otp.php on line 11
Notice: Undefined index: user in C:\wamp\www\college\otp.php on line 11
UPDATE
When URL in the URL bar is
http://localhost/college/otp/MTA/teacher
It should be treated as
http://localhost/college/otp.php?user=MTA&role=teacher
How do I solve this problem?
URL: /college/otp/MTA=/teacher
Target: /college/otp.php?user=MTA=&role=teacher
.htaccess: /college/.htaccess
RewriteRule ^otp/[A-Za-z-]+/([A-Za-z0-9-]+)/?$ otp.php?user=$1&role=$2 [NC,L]
Your RewriteRule pattern needs a slight modification to match your example URL, since it will currently fail on the = in MTA=. (Although I've just noticed that the "update" to your question does not show a = in the URL?) The first subpattern also needs to be a capturing group in order for $1 to pick it up. As written, the only captured group is the role, so $1 contains the role and $2 is empty.
So, the above directive should read something like:
RewriteRule ^otp/([A-Za-z-]+=)/([A-Za-z0-9-]+)/?$ otp.php?user=$1&role=$2 [NC,L]
This assumes that = always appears at the end of the path segment, as in your initial example (include it inside the character class if it can occur anywhere, although that would be a bit confusing). The NC flag is probably unnecessary, unless you also need to allow mixed-case versions of otp (inadvisable). You already allow for mixed case in your regex.
UPDATE#1: It seems the second path segment is a base64 encoded string/integer. For this you will need to include digits in the regex and there could be 0, 1 or 2 trailing = characters. There is also no need to match a hyphen. For example:
RewriteRule ^otp/([A-Za-z0-9]+={0,2})/([A-Za-z0-9-]+)/?$ otp.php?user=$1&role=$2 [NC,L]
However, the other problem you seem to be experiencing (and the one which you are actually "seeing") is a conflict with MultiViews (part of mod_negotiation). This needs to be disabled for the above mod_rewrite directive to work (in fact, to do anything). If you are not enabling this in .htaccess then disable it by including the following at the top of your .htaccess file:
Options -MultiViews
If MultiViews is enabled then when you request otp (where a file with the same basename exists which would also return an appropriate mime-type) mod_negotiation issues an internal subrequest for otp.php. The problem here is that this occurs before mod_rewrite, so otp.php ends up being called without any URL parameters.
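Putting the pieces so far together, the /college/.htaccess file might look roughly like this. It is a sketch that assumes the base64-style user segment described above and drops the NC flag, which is probably not needed.

```apache
# Disable content negotiation so /otp/... is not resolved to otp.php by mod_negotiation
Options -MultiViews

RewriteEngine On

# /college/otp  ->  otp.php (no parameters)
RewriteRule ^otp/?$ otp.php [L]

# /college/otp/MTA=/teacher  ->  otp.php?user=MTA=&role=teacher
RewriteRule ^otp/([A-Za-z0-9]+={0,2})/([A-Za-z0-9-]+)/?$ otp.php?user=$1&role=$2 [L]
```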
Aside:
Your code should not be generating these "undefined index" notices. Since this is essentially "user provided data", you should check for it in your script. For example:
$role = isset($_GET['role']) ? $_GET['role'] : null;
RewriteEngine On # Turn on the rewriting engine
Note that Apache does not support line-end comments, so you should remove the # Turn on the rewriting engine text from the first line. (Line-end comments can appear to "work", but only by coincidence, when the directive happens to ignore the extra arguments; in other cases they result in a 500 Internal Server Error.)
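If you want to keep the comment, it can go on a line of its own, for example:

```apache
# Turn on the rewriting engine
RewriteEngine On
```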
UPDATE#2:
If the URL bar has http://localhost/college/otp.php?user=MTA=&role=teacher, can it be changed to http://localhost/college/otp/MTA/teacher?
Yes this can be done. Although I assume that MTA= should appear in both places? (You have MTA= in the source and MTA in the target, which would presumably corrupt the base64 encoding?) I assume you are already linking to the correct URL internally and this is only to benefit stray requests (search engines, backlinks, etc.?)
You can implement an external redirect before the above rewrite, being careful not to redirect the rewritten URL and trigger a redirect loop. For example:
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} ^user=([A-Za-z0-9]+={0,2})&role=([A-Za-z0-9-]+)
RewriteRule ^(otp)\.php$ /college/$1/%1/%2 [QSD,R=302,L]
This is basically the reverse of the internal rewrite (that appears later in the .htaccess file). The condition that checks against the REDIRECT_STATUS environment variable ensures that it only triggers for direct requests and not rewritten requests.
Note that since this is an external redirect, you need to include a root-relative URL-path in the substitution argument. ie. include the /college subdirectory. (Or, you can use a relative substitution and set the RewriteBase - although you'd only do this if you have several of these directives.)
$1 is a backreference to the RewriteRule pattern (ie. always otp) and %1 and %2 are backreferences to the preceding CondPattern, ie. the value of the user and role URL parameters respectively.
The QSD flag (Apache 2.4+) discards the original query string from the request.
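For comparison, the RewriteBase variant mentioned above might look something like the following. This is a sketch, assuming the .htaccess file lives in /college/ and that the other directives in the file are happy with the same base.

```apache
# Relative substitutions in this .htaccess file are prefixed with /college/
RewriteBase /college/

RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} ^user=([A-Za-z0-9]+={0,2})&role=([A-Za-z0-9-]+)
RewriteRule ^(otp)\.php$ $1/%1/%2 [QSD,R=302,L]
```

With only one such redirect, hardcoding /college in the substitution is arguably simpler; RewriteBase starts to pay off when several directives share the same prefix.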
This tool might help you to write correct expressions for your RewriteRules. Maybe, this expression would give you an idea, where the problems may be:
(.*otp)\/([A-Za-z0-9-]+=)\/([A-Za-z0-9-]+)
RegEx Descriptive Graph
This link helps you visualize the expressions for your RewriteRule:
Then, you can write a RewriteRule, maybe something similar to:
<IfModule mod_rewrite.c>
# Turn on the rewriting engine
RewriteEngine On
RewriteCond %{HTTP_HOST} localhost [NC]
RewriteRule (.*otp)\/([A-Za-z0-9-]+=)\/([A-Za-z0-9-]+) $1.php?user=$2&role=$3 [NC,L]
</IfModule>
You might want to clear your browser cache every time you modify your .htaccess file.

match multiple slashes in url, but not in protocol

I'm trying to catch multiple slashes at the end of or inside a URL, but not the slashes (two or more) that follow the protocol (http://, https://, ftp://, file:///).
I tried many suggestions from similar SO threads, like ^(.*)//+(.*)$ or [^:](\/{2,}), or ^(.*?)(/{2,})(.*)$ in http://www.regexr.com/, https://regex101.com and http://www.regextester.com/. But nothing worked cleanly for me.
It's pretty weird that I can't find a working example; this goal isn't that rare. Could somebody share a working regex?
Here is a rule that you can use in your site root .htaccess to strip out multiple slashes anywhere from input URLs:
RewriteEngine On
RewriteCond %{THE_REQUEST} //
RewriteRule ^.*$ /$0 [R=301,L,NE]
THE_REQUEST variable represents original request received by Apache from your browser and it doesn't get overwritten after execution of some rewrite rules. Example value of this variable is GET /index.php?id=123 HTTP/1.1.
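The URL-path that the RewriteRule pattern matches against has already had multiple slashes collapsed into single ones by Apache, so redirecting to /$0 sends the browser to the cleaned-up URL.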

What's wrong with this regular expression in a .htaccess file?

I'm trying to understand why this regular expression isn't working in my .htaccess file. I want it so whenever a user goes to the job_wanted.php?jid=ID, they will be taken to job/ID.
What's wrong with this?
RewriteEngine On
RewriteCond %{QUERY_STRING} jid=([0-9]+)
RewriteRule ^job_wanted\.php?$ job/%1? [R]
I want it so that when a user clicks on http://localhost/jobwehave.co.za/jobs/ID, they are shown the same results as http://localhost/jobwehave.co.za/jobs?id=ID would show.
Sorry for the mix-up. I'm still very confused about how this works.
The primary problem is that you can't match the query string as part of RewriteRule. You need to move that part into a RewriteCond statement.
RewriteEngine On
RewriteCond %{QUERY_STRING} jid=([0-9]+)
RewriteRule ^job_wanted\.php$ /job/%1?
Editing to reflect your updated question, which is the opposite of what I've shown here. For the reverse, to convert /job/123 into something your PHP script can consume, you'll want:
RewriteEngine On
RewriteRule ^/job/([0-9]+)$ /path/to/job_wanted.php?jid=$1
But you're probably going to have trouble putting this in an .htaccess file anywhere except the root, and maybe even there. If it works at the root, you'll likely need to strip the leading / from the RewriteRule I show here.
Second edit to reflect your comment: I think what you want is complicated, but this might work:
RewriteEngine On
RewriteRule ^/job/([0-9]+)$ /path/to/job_wanted.php?jid=$1 [L]
RewriteCond %{QUERY_STRING} jid=([0-9]+)
RewriteRule ^job_wanted\.php$ http://host.name/job/%1? [R]
Your fundamental problem is that you want to "fix" existing links, presumably out of your control. In order to change the URL in the browser address bar, you must redirect the browser. There is no other way to do it.
That's what the second cond+rule does: it matches incoming old URLs and redirects to your pretty URL format. This either needs to go in a VirtualHost configuration block or in the .htaccess file in the same directory as your PHP script.
The first rule does the opposite: it converts the pretty URL back into something that Apache can use, but it does so using an internal sub-request that hopefully will not trigger another round of rewriting. If it does, you have an infinite loop. If it works, this will invoke your PHP script with a query string parameter for the job ID and your page will work as it has all along. Note that because this rule assumes a different, probably non-existent file system path, it must go in a VirtualHost block or in the .htaccess file at your site root, i.e. a different location.
Spreading the configuration around different places sounds like a recipe for future problems to me and I don't recommend it. I think you'll be better off to change the links under your control to the pretty versions and not worry about other links.
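If both directions really are needed, one way to keep them in a single .htaccess file and guard against the loop is to make the redirect check the original request line. The following is an untested sketch; it assumes the .htaccess file and job_wanted.php both sit in the document root and that Apache 2.4+ is available for the QSD flag.

```apache
RewriteEngine On

# Redirect only direct requests for the old URL.
# THE_REQUEST is the original request line and is not changed by internal rewrites.
RewriteCond %{THE_REQUEST} \s/job_wanted\.php\?jid=([0-9]+)
RewriteRule ^job_wanted\.php$ /job/%1 [QSD,R=302,L]

# Internally map the pretty URL back to the script
RewriteRule ^job/([0-9]+)$ job_wanted.php?jid=$1 [L]
```

A request for /job/123 is rewritten internally to job_wanted.php?jid=123, but its request line still reads /job/123, so the redirect condition does not match again.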
The ^ anchors the regex at the beginning of the string.
RewriteRule matches the URI beginning with a / (unless it's in some per-directory configuration area).
Either prefix the / or remove the anchor ^ (depending on what you want to achieve)
You haven't captured the job ID in the regex, so you can't reference it in the rewritten URL. Something like this (not tested, caveat emptor, may cause gastric distress, etc.):
RewriteRule ^job/([0-9]+) job_wanted.php?jid=$1
See Start Rewriting for a tutorial on this.
You need to escape the ? and . marks if you want those to be literals.
^job_wanted\.php\?jid=9\?$
But although that explains why your pattern isn't matching, it doesn't address the issue of your URL rewriting. I'm also not sure why the ^ and $ are there, since that will prevent it from matching most URLs (e.g. http://www.yoursite.com/job_wanted.php?jid=9 won't work because it doesn't start with job_wanted.php).
I don't know htaccess well, so I can only address the regex portion of your question. In traditional regex syntax, you'd be looking for something like this:
s/job_wanted\.php\?jid=(\d*)/job\/$1/i
Hope that helps.
Did you try to escape special characters (like ?)?
The ? and . characters have a special meaning in regular expressions. You probably just need to escape them.
Also, you need to capture the jid value and use it in the rule.
Try to change your rules to this:
RewriteEngine On
RewriteRule ^job_wanted\.php\?jid=([0-9]+)$ /job/$1
Something like
ReWriteRule ^job\_wanted\.php\?jid\=([0-9-]+)$ /job/$1
should do the trick.