HTACCESS rule to encode get params (%3f) - regex

The title pretty much summarizes what I am trying to achieve.
This is a server-only thing: no intermediate languages, just
Apache. The purpose is to serve downloaded web pages that
have GET query strings appended to their filenames, and to treat them as independent
web pages.
RewriteEngine On
RewriteRule ^(.*)$ /foo/bar/$1 [R=301,NC,L]
I know that this is possible as I did it by accident yesterday, but forgot to take notes on how it was accomplished.

You can try these rules. The first redirects /index.asp?id=12345 externally to /index.asp%3fid=12345, and the second rewrites /index.asp%3fid=12345 internally back to /index.asp?id=12345:
RewriteCond %{THE_REQUEST} \s/+([^?]*)\?([^=]+=[^\s&]+)
RewriteRule ^ /%1\%3f%2? [L,NE,R]
RewriteRule ^([^.]+\.(?:php|asp))[^=]+=(.+)$ /$1?id=$2 [NC,L,QSA]

Related

Apache Rewrite to remove index.php?

I am trying to rewrite my URLs to remove index.php? but I'm struggling a little to get it to work. The closest I can get is the answer here: remove question mark from 301 redirect using htaccess when the user enters the old URL
I need to convert the URLs to pretty URLs on the way out, and rewrite them back to the proper URL on the way in. The structure of the URLs is as follows:
https://sub.domain.com/index.php?/folder1/folder2-etc
Using the code from the referenced answer results in a double forward slash:
https://sub.domain.com//folder1/folder2-etc
The rewrite rules I'm using from the referenced answer are:
RewriteEngine On
RewriteCond %{THE_REQUEST} /index\.php [NC]
RewriteRule ^(.*?)index\.php$ /$1 [L,R=301,NC,NE]
RewriteCond %{THE_REQUEST} \s/+\?([^\s&]+) [NC]
RewriteRule ^ /%1? [R=301,L]
# internal forward from pretty URL to actual one
RewriteRule ^((?!web/)[^/.]+)/?$ /index.php?$1 [L,QSA,NC]
I suspect I know how to solve the first bit, but I'm struggling to understand the second rule for the internal forward.
Additionally, I'm wondering if this is the best way to do this. I'm currently running an Apache backend behind an Nginx reverse proxy. Would I be better off doing the redirect on the Nginx side and the internal forward on Apache?
EDIT:
Complication: I've noticed an additional URL structure. Some URLs have the form https://sub.domain.com/picture.php?/folder1/folder2-etc
For these, I'd be quite happy to keep 'picture' and just remove the .php? bit.
I'm guessing that for the first bit, I'd need to do something like the following:
RewriteCond %{THE_REQUEST} \s/+index\.php\?/([^\s&]+) [NC]
RewriteRule ^ /%1? [R=301,L]
RewriteCond %{THE_REQUEST} \s/+picture\.php\?/([^\s&]+) [NC]
RewriteRule ^(.*)$ /picture/%1 [R=301,L]
But I have no idea where to start with the opposite, i.e. converting pretty URLs back to standard ones. It would help if the following rule could be explained to me:
RewriteRule ^((?!web/)[^/.]+)/?$ /index.php?$1 [L,QSA,NC]
RewriteRule ^/*picture/(.*)$ /picture.php?/$1 [L]
RewriteRule ^/*(?!/*index\.php$)(.*)$ /index.php?/$1 [L]
should do the trick. I wasn't able to test it yet though.
I only used the [L] last flag to stop applying rules on match. The QSA query string append flag doesn't seem to make sense as you don't seem to use ?key=value&... syntax anyway. Also dunno if you actually need the NC case-insensitive flag...
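Since the question also asked what the internal-forward rule from the referenced answer does, here is a commented reading of that one line. It is only an annotation, assuming the rule sits in the .htaccess in the document root (so the matched path has no leading slash); the example paths in the comments are made up for illustration.

```apache
# Pattern: ^((?!web/)[^/.]+)/?$
#   ^          start of the per-directory path (no leading slash in .htaccess)
#   (?!web/)   negative lookahead: fail if the path begins with "web/"
#   [^/.]+     one or more characters that are neither "/" nor "."
#   ( ... )    the part captured as $1
#   /?$        optional trailing slash, then end of path
#
# So "folder1" and "folder1/" match, but "folder1/folder2-etc" does not
# (it contains a slash) and neither does "picture.php" (it contains a dot).
#
# Substitution: /index.php?$1
#   The captured segment becomes the query string.
#   QSA appends the original request's query string instead of dropping it,
#   NC makes the match case-insensitive, L stops rule processing for this pass.
RewriteRule ^((?!web/)[^/.]+)/?$ /index.php?$1 [L,QSA,NC]
```

Because [^/.]+ cannot cross a slash, this pattern only handles single-segment paths, which is why the rules above use (.*) instead.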
Side note:
I hope your php files don't serve paths with .. in them, as that would allow people to read arbitrary files from disk, e.g. /picture/../../../etc/passwd
Apologies, but as it turns out, the main reason I can't get anything to work is due to the use of relative URLs and dynamically generated links within the PHP. Not something I can change unfortunately. The not perfect URLs are something I'm going to have to live with. For reference, the app I'm using is Piwigo

Cannot understand the mixed outcomes of an .htaccess rewrite rule / regex

I have a simple website comprised of one page with a div that gets populated with ajax content based on the links the user selects. This site is running on an Apache server with an .htaccess file in the domain's root directory. Requests to www.mydomain.com are directed to scripts/index.php while requests for dynamic content (but not resource files) are directed to the same .php script with the requested content passed as a parameter (e.g., www.mydomain.com/myProject will be rewritten as scripts/index.php?dynContent=myProject).
My rewrite rules are below and for the most part they are performing those described tasks properly; however, I've encountered some URLs that do not match the second condition even though I would expect them to -- though this is the first time I've had to write rules for an .htaccess file so I don't really know what I'm talking about... A good example of a URL that fails the second condition is www.mydomain.com/about, but I've encountered many more just by testing random words/letters.
Can you tell me why www.mydomain.com/about fails the second condition? Also, if there is a more elegant way to achieve the objectives I described above, I would love to learn about it. Thank you!!
RewriteCond %{HTTP_HOST} ^(www.)?mydomain.com$ [NC]
RewriteRule ^(/)?$ scripts/index.php [L]
RewriteCond %{REQUEST_URI} .*[^index.php|.css|.js|.jpg|.html|.swf]$
RewriteRule .* scripts/index.php?dynContent=$1 [L]
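This is because the regex in your second RewriteCond is incorrect. Square brackets define a character class, so [^index.php|.css|.js|.jpg|.html|.swf] matches any single character that is not one of those letters or symbols; it does not exclude those extensions. /about ends in t, which is in the set (from .html), so the condition fails. Also, your second RewriteRule pattern has no capture group, so $1 is empty.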
Change your code to:
RewriteCond %{HTTP_HOST} ^(www\.)?mydomain\.com$ [NC]
RewriteRule ^(/)?$ scripts/index.php [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !\.(php|css|js|jpe?g|html|swf)$
RewriteRule ^(.*)$ scripts/index.php?dynContent=$1 [L]

I need to rewrite all my URLs. Should I put R=301 in RewriteRule?

Currently my URLs are horrible.
They look like this:
http://www.racebooking.net/single_news.php?id=211
I want them to look better and be more SEO-friendly, like
http://www.racebooking.net/news/video-122.html
I am going to do it through Apache .htaccess. Surfing the web I found many different opinions about SEO. Some people say it's not good to use RewriteRule because it creates duplicate content and kills PageRank, and that you have to send a 301 response.
Here comes the question: is it better to use
RewriteRule Pattern Substitution
or
RewriteRule Pattern Substitution [R=301,L]
to make my URLs look better without worsening my SEO?
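You need both: a 301 redirect from the old URL to the pretty one, so that only one URL per page is visible from outside, and an internal rewrite (no R flag) from the pretty URL back to the actual script. Place this code in your DOCUMENT_ROOT/.htaccess file: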
RewriteEngine On
# external redirect from actual URL to pretty one
RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s/+single_news\.php\?id=([^\s&]+) [NC]
RewriteRule ^ /news/%1.html? [R=301,L]
# internal forward from pretty URL to actual one
RewriteRule ^news/([^/.]+)\.html$ /single_news.php?id=$1 [L,QSA,NC]

RewriteCond in .htaccess with negated regex condition doesn't work?

I'm trying to prevent, in this case WordPress, from rewriting certain URLs. In this case I'm trying to prevent it from ever handling a request in the uploads directory, and instead leave those to the server's 404 page. So I'm assuming it's as simple as adding the rule:
RewriteCond %{REQUEST_URI} !^/wp-content/uploads/
This rule should evaluate to false and make the chain of rules fail for those requests, thus stopping the rewrite. But no... Perhaps I need to cover the full string in my expression?
RewriteCond %{REQUEST_URI} !^/wp-content/uploads/.*$
Nope, that's not it either. So after scratching my head I do a sanity check. Perhaps something is wrong with the actual pattern. So I make a simple test case.
RewriteCond %{REQUEST_URI} ^/xyz/$
In this case, the rewrite happens if and only if the requested URL is /xyz/ and shows the server's 404 page for any other page. This is exactly what I expected. So I'll just stick in a ! to negate that pattern.
RewriteCond %{REQUEST_URI} !^/xyz/$
Now I'm expecting to see the exact opposite of the above condition. The rewrite should not happen for /xyz/ but for every other possible URL. Instead, the rewrite happens for every URL, both /xyz/ and others.
So, either the use of negated regexes in RewriteConds is broken in Apache, or there's something fundamental I don't understand about it. Which one is it?
The server is Apache2.
The file in its entirety:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^/wp-content/uploads/
RewriteRule . /index.php [L]
</IfModule>
WordPress's default file plus my rule.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_URI} !^/wp-content/uploads/ [OR]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
So, after a lot of irritation, I figured out the problem, sort of. As it turned out, the rule in my original question actually did exactly what it was supposed to. So did a number of other ways of doing the same thing, such as
RewriteRule ^wp-content/uploads/.*$ - [L]
(Mark rule as last if pattern matches) or
RewriteRule ^wp-content/uploads/.*$ - [S=1]
(Skip the next rule if pattern matches) as well as the negated rule in the question, as mentioned. All of those rules worked just fine, and returned control to Apache without rewriting.
The problem happened after those rules were processed: I had deleted the default 404.shtml, 403.shtml etc. templates that my host provided. If you don't have any .htaccess rewrites, that works just fine; the server will dish up its own default 404 page and everything works. (At least that's what I thought, but in actual fact it was the double error "Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.")
When you do have a .htaccess, on the other hand, it is executed a second time for the 404 page. If the page is there, it will be used; but since it was missing, the request for 404.shtml was caught by the catch-all rule and rewritten to index.php. For this reason, all the other suggestions I've gotten here, or elsewhere, failed: in the end the 404 page was rewritten to index.php.
So, the solution was simply to restore the error templates. In retrospect it was pretty stupid to delete them, but I have this "start from scratch" mentality. Don't want anything seemingly unnecessary lying around. At least now I understand what was going on, which is what I wanted.
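For anyone who would rather not depend on the host's templates being present, one possible guard is to exempt the error documents from the catch-all explicitly. This is an untested sketch, assuming the templates are named 403.shtml, 404.shtml and 500.shtml and live in the document root (adjust the names to whatever your host uses); everything else is the WordPress block from the question.

```apache
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
# Assumed file names: pass requests for the host's error templates through
# untouched, so the internal redirect to the ErrorDocument is never
# rewritten to index.php, whether or not the file exists
RewriteRule ^(403|404|500)\.shtml$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^/wp-content/uploads/
RewriteRule . /index.php [L]
</IfModule>
```

With the guard in place, a missing template should fall back to the server's built-in error message rather than to WordPress.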
Finally a comment to Cecil: I never wanted to forbid access to anything, just stop the rewrite from taking place. Not that it matters much now, but I just wanted to clarify this.
If /wp-content/uploads/ is really the prefix of the requested URI path, your rule was supposed to work as expected.
But since it evidently doesn’t work, try matching not against the full URI path but against the path with the per-directory prefix stripped. For an .htaccess file in the document root directory, that is the URI path without the leading /:
RewriteCond $0 !^wp-content/uploads/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule .+ /index.php [L]
If that doesn’t work either, it would certainly help to get some insight into mod_rewrite’s rewriting process by using its logging feature. Set RewriteLogLevel to at least 4, make your request, and take a look at the entries in the log file specified with RewriteLog. There you can see how mod_rewrite handles your request, and with a RewriteLogLevel greater than or equal to 4 you will also see the values of variables like %{REQUEST_URI}.
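Note that RewriteLog and RewriteLogLevel exist only in Apache 2.2 and earlier; Apache 2.4 replaced them with per-module LogLevel settings. In both cases the directives belong in the main server or virtual host configuration, not in .htaccess. A sketch of both variants follows; the log file path is a placeholder, and you should use only the variant that fits your version.

```apache
# Apache 2.2 and earlier (server or virtual host config)
RewriteLog "/var/log/apache2/rewrite.log"
RewriteLogLevel 4

# Apache 2.4 and later: mod_rewrite trace output is written to the ErrorLog
LogLevel alert rewrite:trace4
```

On 2.4 the trace levels run from trace1 to trace8; higher levels are more verbose, so turn this off again once you are done debugging.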
I have found many examples like this when taking a "WordPress First" approach. For example, adding:
ErrorDocument 404 /error-docs/404.html
to the .htaccess file takes care of the message ("Additionally, a 404 Not Found error...").
Came across this trying to do the same thing in a Drupal site, but might be the same for WP since it all goes through index.php. Negating index.php was the key. This sends everything to the new domain except old-domain.org/my_path_to_ignore:
RewriteCond %{REQUEST_URI} !^/my_path_to_ignore$
RewriteCond %{REQUEST_URI} !index.php
RewriteCond %{HTTP_HOST} ^old-domain\.org$ [NC]
RewriteRule ^(.*)$ http%{ENV:protossl}://new-domain.org/$1 [L,R=301]

Temporary redirect 302 with .htaccess and mod-rewrite matching expression

I'm trying to set up a bunch of redirects for my website, which has basically moved to a different folder on the server. I need to make http://www.site.com/index.php?page=anypage go to http://www.site.com/newfolder/index.php?page=anypage. The thing is, http://www.site.com/index.php and http://www.site.com/index.php?page=home should remain untouched. How can I accomplish this?
I was trying the following in the .htaccess file, but I am afraid of making a mistake. I really don't know how to test this, either.
Options +FollowSymlinks
RewriteEngine on
RewriteRule ^/index.php?page=(.*)$ http://www.site.com/newfolder/index.php?page=$1 [R=302,NC]
RewriteRule ^/index.php?page=home http://www.site.com/index.php?page=home [R=302,NC,L]
Since this is temporary, I should also know how to reverse it! Next week, the links will have to redirect back to the server root. What should I do to re-establish the normal redirection?
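If I've followed your scenario correctly, you want something like the rules below. Note that a RewriteRule pattern is matched against the URL path only, never the query string, so page=... has to be tested with RewriteCond %{QUERY_STRING}; in .htaccess the path also has no leading slash.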
RewriteEngine On
RewriteCond %{QUERY_STRING} !=""
RewriteCond %{QUERY_STRING} !page=home
RewriteRule ^index\.php$ /newfolder/index.php [R,L]
As far as testing goes, I prefer to try rules out on a local test server. If you have full control over the server (as is the case locally), there are some mod_rewrite directives that help you log what's going on, and that can be helpful in debugging. The module documentation has more information about this.
Edit: When you want to switch back, modify the RewriteRule above like so:
RewriteRule ^newfolder/index\.php /index.php [R,L]