Disallow strings in the URL - regex

I'm trying to figure out how to disallow query strings in my URLs (an old host allowed ?PHPSESSID strings), and I want to keep any page with a query string from being indexed. Here is my current .htaccess. I tried a few rules at the beginning that didn't work. Any other ideas for redirecting such URLs back to the last path segment?
http://pastie.org/pastes/8660658/text

Have it this way:
RewriteEngine On
RewriteBase /
## force a trailing slash ##
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)([^/])$ /$1$2/ [L,R=301]
# strip all query strings
RewriteCond %{QUERY_STRING} .+
RewriteRule ^ %{REQUEST_URI}? [L,R=301]
## BEGIN Expression Engine Rewrite ##
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond $1 !\.(gif|jpe?g|png)$ [NC]
RewriteRule ^(.*)$ /index.php/$1 [L]
## END Expression Engine Rewrite ##
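
To make the intended net effect of the two 301 rules concrete, here is a minimal Python sketch. It is not mod_rewrite itself, only an approximation of the final URL a visitor should land on. The function name `canonical_url`, the `is_file` flag (a stand-in for the `!-f` test) and the example.com URLs are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url, is_file=False):
    """Approximate the combined result of the trailing-slash and
    strip-query-string redirects."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Mirrors the first rule: add a slash unless the request is an existing file
    if not is_file and not path.endswith("/"):
        path += "/"
    # Mirrors the second rule: the trailing "?" in the substitution empties the query
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


print(canonical_url("http://example.com/blog/post?PHPSESSID=abc123"))
# http://example.com/blog/post/
```

In the real configuration this takes two separate redirects (one per rule); the sketch only shows where they end up.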

Related

Redirect to default language except for /amp/ URLs

I'm struggling to get a redirection working, so far without success.
I changed the URLs of my site to force a default language: before it was site.com/help/ and now it's site.com/en/help/. Thanks to help from Stack Overflow I made the redirection, but then I faced a new problem with the AMP pages: site.com/amp/help/ is now redirected to site.com/en/amp/help/ while it is supposed to be site.com/amp/en/help/.
Again, thanks to help on this site, I changed the structure of the URLs to site.com/en/help/amp/ (amp always at the end). To achieve this, I had to drop the .php extension I had on some pages, and I also decided to remove the trailing slash.
I'm now facing two new issues: the 301 redirect from a .php URL to its extensionless URL doesn't work, and neither does the redirect from URLs with a trailing slash to URLs without one. Below is my .htaccess code.
RewriteEngine on
# amp
RewriteRule ^(.*/)?amp/(.+?)/?$ /$1$2/amp [R=301,NC,L]
## redirect to default language (fr)
RewriteCond %{ENV:REDIRECT_STATUS} !200
RewriteCond %{REQUEST_URI} !/inc
RewriteCond %{REQUEST_URI} !/ajax/
RewriteCond %{REQUEST_URI} !/img/
RewriteRule ^(?![a-z]{2}(?:[/-]|$))(.*)$ /fr/$1 [R=301,L,NE]
## Unless directory, remove trailing slash
RewriteCond %{ENV:REDIRECT_STATUS} !200
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.+)/$ /$1 [R=301,NE,L]
## add trailing slash in front of directories
RewriteCond %{DOCUMENT_ROOT}/$1 -d
RewriteRule ^[a-z]{2}(?:-[a-z]{2})?/(.+)$ /$1/ [L]
# remove .php
RewriteCond %{ENV:REDIRECT_STATUS} !200
RewriteCond %{REQUEST_FILENAME} !global.js
RewriteCond %{REQUEST_URI} !/ajax/
RewriteCond %{REQUEST_URI} !results.php
RewriteRule ^(.+)\.php(.*)$ /$1$2 [R=301,NC,NE,L]
## amp pages
RewriteRule ^(.*)/amp$ /$1?amp=1 [NC,QSA,L]
## folders of languages
#RewriteRule ^([a-z]{2}(?:-[a-z]{2})?)/(.*)$ /$2?lang=$1 [QSA,L]
RewriteRule ^([a-z]{2}|[a-z]{2}-[a-z]{2})$ /$2?lang=$1 [QSA,L]
RewriteRule ^([a-z]{2}|[a-z]{2}-[a-z]{2})/(.*)$ /$2?lang=$1 [QSA,L]
## hide .php extension
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteCond %{REQUEST_FILENAME} !global.js
RewriteRule ^(.+?)/?$ $1.php [L]
Have it this way:
RewriteEngine on
# changed amp URLs
RewriteRule ^(.*/)?amp/(.+?)/?$ /$1$2/amp/ [R=301,NC,L]
## redirect to default language (en)
RewriteCond %{ENV:REDIRECT_STATUS} !200
RewriteCond %{REQUEST_URI} !/img/
RewriteRule ^(?![a-z]{2}(?:[/-]|$))(.*)$ /en/$1 [R=301,L,NE]
## Unless directory, remove trailing slash
RewriteCond %{ENV:REDIRECT_STATUS} !200
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !/amp/$ [NC]
RewriteRule ^(.+)/$ /$1 [R=301,NE,L]
## add trailing slash in front of directories after lang rewrite
RewriteCond %{DOCUMENT_ROOT}/$1 -d
RewriteRule ^[a-z]{2}(?:-[a-z]{2})?/(.+[^/])$ /$0/ [L]
# remove .php
RewriteCond %{ENV:REDIRECT_STATUS} !200
RewriteRule ^(.+)\.php$ /$1/ [R=301,NC,NE,L]
## amp pages
RewriteRule ^(.+/)amp/?$ /$1?amp=1 [NC,QSA,L]
## folders of languages
RewriteRule ^([a-z]{2}(?:-[a-z]{2})?)/(.*)$ /$2?lang=$1 [QSA,L]
## hide .php extension
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteCond %{REQUEST_FILENAME} !global.js
RewriteRule ^(.+?)/?$ $1.php [L]
Explanation of this trailing slash rule:
## add trailing slash in front of directories after lang rewrite
RewriteCond %{DOCUMENT_ROOT}/$1 -d
RewriteRule ^[a-z]{2}(?:-[a-z]{2})?/(.+[^/])$ /$0/ [L]
Take an example URI: /fr/cart.
A later rule removes the language component from the URL and passes it as the lang=<fr|en> query parameter. The part after the language component, e.g. /cart, has no trailing slash, so if it is a real directory, Apache's mod_dir module redirects /cart?lang=fr to /cart/?lang=fr and your internal URL is exposed in the browser.
So this rule captures the part after the language component and, if it has no trailing slash and is a directory, internally rewrites the URI to /fr/cart/ with a trailing slash. The later rule then rewrites it to /cart/?lang=fr, and mod_dir no longer redirects.
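
The behaviour described above can be sketched in Python. The pattern is copied from the rule; the `REAL_DIRECTORIES` set is a hypothetical stand-in for the `-d` test against `%{DOCUMENT_ROOT}/$1`, and the function name is made up for this illustration.

```python
import re

# Same pattern as the RewriteRule (per-directory paths have no leading slash).
LANG_DIR = re.compile(r"^[a-z]{2}(?:-[a-z]{2})?/(.+[^/])$")

# Hypothetical stand-in for: RewriteCond %{DOCUMENT_ROOT}/$1 -d
REAL_DIRECTORIES = {"cart"}


def add_directory_slash(path):
    """Return the internally rewritten path, or the input if the rule does not apply."""
    match = LANG_DIR.match(path)
    if match and match.group(1) in REAL_DIRECTORIES:
        # /$0/ is the whole match plus a slash, so the language prefix is kept
        return "/" + match.group(0) + "/"
    return path


print(add_directory_slash("fr/cart"))   # /fr/cart/  (directory, slash added)
print(add_directory_slash("fr/help"))   # fr/help    (not a directory, unchanged)
print(add_directory_slash("fr/cart/"))  # fr/cart/   (already ends in a slash)
```

Because `[^/]` must match the last character, a path that already ends in a slash never triggers the rule.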
@anubhava's solution works perfectly well except for one little case: /fr/amp/page.php redirects to /fr/page.php/amp. I had to make some changes to the code and managed to make it work. Below is the updated code with the small changes I made:
1- removed the slashes at the end of some rules, as I don't need them any more
2- removed this condition: RewriteCond %{REQUEST_URI} !/amp/$ [NC]
3- to fix the .php problem, I replaced RewriteRule ^(.+)\.php$ /$1/ [R=301,NC,NE,L] with RewriteRule ^(.+)\.php(.*)$ /$1$2 [R=301,NC,NE,L].
RewriteEngine on
## changed amp URLs
RewriteRule ^(.*/)?amp/(.+?)/?$ /$1$2/amp [R=301,NC,L]
## redirect to default language (en)
RewriteCond %{ENV:REDIRECT_STATUS} !200
RewriteCond %{REQUEST_URI} !/img/
RewriteRule ^(?![a-z]{2}(?:[/-]|$))(.*)$ /en/$1 [R=301,L,NE]
## Unless directory, remove trailing slash
RewriteCond %{ENV:REDIRECT_STATUS} !200
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.+)/$ /$1 [R=301,NE,L]
## add trailing slash in front of directories
RewriteCond %{DOCUMENT_ROOT}/$1 -d
RewriteRule ^[a-z]{2}(?:-[a-z]{2})?/(.+)$ /$1/ [L]
# remove .php
RewriteCond %{ENV:REDIRECT_STATUS} !200
RewriteRule ^(.+)\.php(.*)$ /$1$2 [R=301,NC,NE,L]
## amp pages
RewriteRule ^(.+/)amp$ /$1?amp=1 [NC,QSA,L]
## folders of languages
RewriteRule ^([a-z]{2}(?:-[a-z]{2})?)/(.*)$ /$2?lang=$1 [QSA,L]
## hide .php extension
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteCond %{REQUEST_FILENAME} !global.js
RewriteRule ^(.+?)/?$ $1.php [L]
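
To see why the .php pattern had to change, here is a small Python sketch that replays the two redirects for /fr/amp/page.php. The patterns are copied from the rules; `re.IGNORECASE` stands in for `[NC]`, and the helper names are hypothetical.

```python
import re

# Patterns copied from the rules above; re.IGNORECASE stands in for [NC].
AMP_MOVE = re.compile(r"^(.*/)?amp/(.+?)/?$", re.IGNORECASE)
PHP_AT_END = re.compile(r"^(.+)\.php$", re.IGNORECASE)
PHP_ANYWHERE = re.compile(r"^(.+)\.php(.*)$", re.IGNORECASE)


def move_amp(path):
    """Simulate the amp redirect: <prefix>amp/<rest> becomes /<prefix><rest>/amp."""
    m = AMP_MOVE.match(path)
    if m is None:
        return path
    return "/" + (m.group(1) or "") + m.group(2) + "/amp"


def strip_php(path):
    """Simulate the updated .php redirect: drop '.php' but keep what follows it."""
    m = PHP_ANYWHERE.match(path)
    if m is None:
        return path
    return "/" + m.group(1) + m.group(2)


# First redirect; the follow-up request is matched without its leading slash.
step1 = move_amp("fr/amp/page.php").lstrip("/")
print(step1)                            # fr/page.php/amp
print(PHP_AT_END.match(step1) is None)  # True: the old pattern never fires
print(strip_php(step1))                 # /fr/page/amp
```

After the amp redirect, ".php" is no longer at the end of the path, so only the pattern with the extra `(.*)` group can remove it.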

Remove trailing slash and create query string with .htaccess

In my .htaccess file, I would like to remove the trailing slash from the URL without modifying the current setup for the query string.
I tried this in a two-step fashion, but it's not working as I expect:
# Remove trailing slash
RewriteRule ^(.+)/$ $1 [R=301]
# Create query string from canonical URL
RewriteRule ^(.+)$ index.php?url=$1 [QSA,L]
Cheers
You can use:
RewriteEngine On
RewriteBase /site/dev/
## Unless directory, remove trailing slash
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+?)/$ $1 [R=302,L,NE]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.+)$ index.php?url=$1 [QSA,L]
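
As a rough model of how these two rules divide the work, here is a Python sketch. The `is_dir` and `is_file` flags are hypothetical stand-ins for the `-d` and `-f` tests, `BASE` mirrors the `RewriteBase` above, and the function name is an assumption.

```python
import re

BASE = "/site/dev/"  # mirrors RewriteBase
TRAILING_SLASH = re.compile(r"^(.+?)/$")


def handle(path, query="", is_dir=False, is_file=False):
    """Return (action, target) for a per-directory path and its query string."""
    m = TRAILING_SLASH.match(path)
    if m and not is_dir:
        # External redirect: the query string is passed through unchanged
        return ("redirect", BASE + m.group(1) + ("?" + query if query else ""))
    if path and not is_dir and not is_file:
        # Internal rewrite: QSA appends the original query string
        return ("rewrite", BASE + "index.php?url=" + path + ("&" + query if query else ""))
    return ("pass", path)


print(handle("products/", "page=2"))  # ('redirect', '/site/dev/products?page=2')
print(handle("products", "page=2"))   # ('rewrite', '/site/dev/index.php?url=products&page=2')
```

The redirect happens first and ends processing for that request; the front-controller rewrite only runs on the follow-up request without the slash.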

Conditionally redirecting directory structure URL to html file

My goal is to redirect any URL such as http://mydomain.com/somename to http://mydomain.com/somename.html, except for http://mydomain.com/name1 and http://mydomain.com/name2. For these two, I wish to redirect to http://mydomain.com/main.php?g1=name1 (or name2, etc.). If these latter two have two or three more directories in the URL (e.g. http://mydomain.com/name1/val2/val3), I wish to add them as individual GET values, such as http://mydomain.com/main.php?g1=name1&g2=val2&g3=val3. I would like the browser to keep showing the directory path, and not something like http://mydomain.com/somename.html.
Below is my unsuccessful attempt. How can I accomplish this? Thank you
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteBase /
## If the request is for a valid directory, file, or link, don't do anything
RewriteCond %{REQUEST_FILENAME} -d [OR]
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -l
RewriteRule ^ - [L]
RewriteCond %{REQUEST_URI} !^/name1 [OR] RewriteCond %{REQUEST_URI} !^/name2
RewriteRule ^(.*)$ $1.html [L]
RewriteRule ^([^/]+)/([^/]+)/([^/]+)/?$ ?p=$1&c=$2&v=$3 [L,QSA]
RewriteRule ^([^/]+)/([^/]+)/?$ p=$1&c=$2 [L,QSA]
RewriteRule ^([^/]+)/?$ ?p=$1 [L,QSA]
</IfModule>
You have the right idea in most places, but you need to change the order of your rules to match more specific stuff before less specific stuff. Generally, I'm allowing a trailing / in the rules below via /?. Remove that from the end if you won't permit a trailing / to be matched.
RewriteEngine on
RewriteBase /
# This should be fine as you have it....
## If the request is for a valid directory, file, or link, don't do anything
RewriteCond %{REQUEST_FILENAME} -d [OR]
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -l
RewriteRule ^ - [L]
# Reverse your rule order here.
# First match name1|name2 with an optional trailing /
# but nothing else following...
RewriteRule ^(name1|name2)/?$ main.php?g1=$1 [L,QSA]
# Two additional dirs
RewriteRule ^(name1|name2)/([^/]+)/([^/]+)/?$ main.php?g1=$1&g2=$2&g3=$3 [L,QSA]
# Three additional dirs
RewriteRule ^(name1|name2)/([^/]+)/([^/]+)/([^/]+)/?$ main.php?g1=$1&g2=$2&g3=$3&g4=$4 [L,QSA]
# Last, do the generic rule to rewrite to .html
# using [^.]+ to match anything not including a .
# You could be more specific with something like [a-z]+ if that
# corresponds to your expected input
RewriteRule ^([^.]+)$ $1.html [L,QSA]
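If you wanted to check in a condition that the URI is neither name1 nor name2, rely on the implicit AND: consecutive RewriteCond lines are ANDed unless one carries [OR] (there is no [AND] flag to write):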
RewriteCond %{REQUEST_URI} !/name1
RewriteCond %{REQUEST_URI} !/name2
RewriteRule ^([^.]+)$ $1.html [L,QSA]
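
The "specific before generic" ordering can be checked with a short Python sketch. The patterns and substitutions are copied from the first rule set above (with `$1` written as `\1`); the `rewrite` helper is hypothetical and returns the first match, the way `[L]` stops processing for the current pass.

```python
import re

# (pattern, substitution) pairs, in the same order as the rules above.
RULES = [
    (r"^(name1|name2)/?$", r"main.php?g1=\1"),
    (r"^(name1|name2)/([^/]+)/([^/]+)/?$", r"main.php?g1=\1&g2=\2&g3=\3"),
    (r"^(name1|name2)/([^/]+)/([^/]+)/([^/]+)/?$",
     r"main.php?g1=\1&g2=\2&g3=\3&g4=\4"),
    (r"^([^.]+)$", r"\1.html"),
]


def rewrite(path):
    """Return the substitution of the first matching rule, or the path unchanged."""
    for pattern, template in RULES:
        m = re.match(pattern, path)
        if m:
            return m.expand(template)
    return path


print(rewrite("name1"))            # main.php?g1=name1
print(rewrite("name1/val2/val3"))  # main.php?g1=name1&g2=val2&g3=val3
print(rewrite("somename"))         # somename.html
```

If the generic `^([^.]+)$` rule came first, "name1" would be rewritten to name1.html before the name1|name2 rules were ever tried.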

htaccess replace character in query and redirect

I need to replace '_' with '+' in the query string, then redirect:
site.com/abc_def/
to
site.com/search.php?q=abc+def
I tried this
RewriteRule ^([^/]+)/((.*)\_(.*))?$ /search.php?q=$1+$2 [R=301,L]
These are 2 rules that should work for you:
RewriteEngine On
# first replace _ by + recursively
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^_]*)_(.*)$ /$1+$2 [L]
# once all _s are gone, rewrite to /search.php?q=<search>
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^_]+)$ /search.php?q=$1 [L,QSA]
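
Here is a Python sketch of how the "recursive" replacement plays out. In .htaccess context the rule set is re-run after each internal rewrite; the loop below imitates that, under the assumption that the requested path is not an existing file. The function name and the pass counter are made up for illustration.

```python
import re

REPLACE_ONE = re.compile(r"^([^_]*)_(.*)$")
TO_SEARCH = re.compile(r"^([^_]+)$")


def rewrite(path):
    """Return (final target, number of underscore-replacing passes)."""
    passes = 0
    while True:
        m = REPLACE_ONE.match(path)
        if m is None:
            break
        # /$1+$2: [^_]* cannot cross an underscore, so each pass
        # replaces only the first remaining one
        path = m.group(1) + "+" + m.group(2)
        passes += 1
    m = TO_SEARCH.match(path)
    target = "/search.php?q=" + m.group(1) if m else "/" + path
    return target, passes


print(rewrite("abc_def"))  # ('/search.php?q=abc+def', 1)
print(rewrite("a_b_c"))    # ('/search.php?q=a+b+c', 2)
```

Each underscore costs one pass through the rules, so very long inputs with many underscores may run into the server's internal redirect limit.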

Using htaccess to force a trailing slash before the ? with a query string?

I have the following in my htaccess file:
RewriteEngine On
RewriteBase /
# Check to see if the URL points to a valid file
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Trailing slash check
RewriteCond %{REQUEST_URI} !(.*)/$
# Add slash if missing & redirect
RewriteRule ^(.*)$ $1/ [L,R=301]
# Check to see if the URL points to a valid file
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Send to index.php for clean URLs
RewriteRule ^(.*)$ index.php?/$1 [L]
This does work. It hides index.php, and it adds a trailing slash... except when there is a query string.
This URL:
http://example.com/some-page
gets redirected to:
http://example.com/some-page/
but this URL:
http://example.com/some-page?some-var=foo&some-other-var=bar
does not get redirected. I would like for the above to be sent to:
http://example.com/some-page/?some-var=foo&some-other-var=bar
I've reached the limits of my understanding of redirects with this. If you have a working answer, I would really appreciate a walkthrough of what every line is doing and why it works. Double bonus awesomeness for an explanation of why what I have right now doesn't work when there is a query string involved.
Try adding [QSA] to the flags of the last RewriteRule to preserve the original query string, as below:
# Send to index.php for clean URLs, preserve original query string
RewriteRule ^(.*)$ index.php?/$1 [L,QSA]
"a walkthrough of what every line is doing and why it works."
See my comments below:
#turn mod_rewrite engine on.
RewriteEngine On
#set the base for urls here to /
RewriteBase /
### if this is not a request for an existing file or directory
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
### and the URI does not end with a /
RewriteCond %{REQUEST_URI} !(.*)/$
### redirect and add the slash.
RewriteRule ^(.*)$ $1/ [L,R=301]
### if this is not a request for an existing file or directory
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# rewrite to index.php passing the URI as a path, QSA will preserve the existing query string
RewriteRule ^(.*)$ index.php?/$1 [L,QSA]
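
The effect of [QSA] on that last rule can be illustrated with a short Python sketch. It models only the query-string handling of an internal rewrite whose substitution contains a `?`; the function name and its parameters are hypothetical.

```python
def internal_rewrite(path, original_query, qsa):
    """Simulate RewriteRule ^(.*)$ index.php?/$1 with or without [QSA]."""
    new_query = "/" + path
    if qsa and original_query:
        # QSA appends the request's query string after the new one
        new_query += "&" + original_query
    # Without QSA, a substitution containing "?" replaces the original query
    return "index.php?" + new_query


query = "some-var=foo&some-other-var=bar"
print(internal_rewrite("some-page/", query, qsa=False))
# index.php?/some-page/
print(internal_rewrite("some-page/", query, qsa=True))
# index.php?/some-page/&some-var=foo&some-other-var=bar
```

Without the flag, index.php never sees some-var or some-other-var, because the new query string written in the substitution takes the place of the old one.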
I believe that if you change this:
RewriteCond %{REQUEST_URI} !(.*)/$
RewriteRule ^(.*)$ $1/ [L,R=301]
to this:
RewriteCond %{REQUEST_URI} !^([^?]*)/($|\?)
RewriteRule ^([^?]*) $1/ [L,R=301]
then it should do what you want.
The changes I made are:
In both rewrite-condition and -rule, I changed (.*) and ^(.*) to ^([^?]*), to ensure that, if there's a query-string, then it is not included in either regex. ([^…] means "any character that is not in …", so [^?] means "any character that is not a question mark".)
In the rewrite-condition, I changed $ to ($|\?), so as to match either end-of-URL or end-of-part-before-the-query-string.
In the rewrite-rule, I dropped the $, since it was no longer needed.