htaccess clean urls & replacing whitespaces and %20 with - - regex

I'm struggling to make this work. At the moment my .htaccess contains the following code:
#Debugging - Error reporting
php_flag display_startup_errors on
php_flag display_errors on
php_flag html_errors on
#Compression
<IfModule mod_deflate.c>
<FilesMatch "\.(js|css|html|png|jpg|jpeg|swf|bmp|gif|tiff|ico|eot|svg|ttf|woff|pdf)$">
SetOutputFilter DEFLATE
</FilesMatch>
</IfModule>
Options All -Indexes +FollowSymLinks -MultiViews
<IfModule mod_rewrite.c>
# Turn mod_rewrite on
RewriteEngine On
RewriteBase /
#RewriteCond %{THE_REQUEST} (\s|%20)
RewriteRule ^([^\s%20]+)(?:\s|%20)+([^\s%20]+)((?:\s|%20)+.*)$ $1-$2$3 [N,DPI]
RewriteRule ^([^\s%20]+)(?:\s|%20)+(.*)$ /$1-$2 [L,R=301,DPI]
#RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !^.*\.(png|jpg|bmp|gif|css|js)$ [NC]
RewriteRule ^([^/]+/?.+)$ /index.php?req=$1 [L,QSA]
</IfModule>
Everything works great except for one thing. If I try this URL, for example:
http://www.domain.com/ test/
the browser translates it to: http://www.domain.com/%20test/
Basically, if the path after the domain starts with a whitespace or a %20, it fails.
Can anyone please point me to a solution where the leading spaces are removed?
UPDATE
The goal:
www.domain.com/ this is a test / hello there /
or
www.domain.com/ this is a test
to
www.domain.com/this-is-a-test/ or www.domain.com/this-is-a-test/hello-there

I am guilty of writing that code more than 2 years back :P
That can be hugely simplified by this code:
# (?:\s|%20) is used instead of [\s%20], which would also match the single characters %, 2 and 0
# remove spaces from start or after /
RewriteRule ^(.*/|)(?:\s|%20)+(.+)$ $1$2 [L]
# remove spaces from end or before /
RewriteRule ^(.+?)(?:\s|%20)+(/.*|)$ $1$2 [L]
# replace spaces by - in between
RewriteRule ^(.*?)(?:\s|%20)+(.*)$ $1-$2 [L,R]
PS: Must add that you need to fix the source of these URLs also because it is really not normal to be getting URLs like this.
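To make the intended result concrete, here is a minimal Python sketch of the same clean-up. It assumes the path has already been percent-decoded, which is what mod_rewrite does before matching in .htaccess. The function name clean_path is made up for illustration; it models the goal, not the way Apache applies the rules one pass at a time. The last line also shows why a bracket expression such as [\s%20] is risky: inside brackets, %20 is read as the three separate characters %, 2 and 0.

```python
import re
from urllib.parse import unquote


def clean_path(raw_path: str) -> str:
    """Trim whitespace around every path segment and hyphenate what is left."""
    path = unquote(raw_path)  # "%20" becomes a plain space
    segments = path.split("/")
    # strip() drops leading/trailing whitespace; each inner run becomes one "-"
    return "/".join(re.sub(r"\s+", "-", seg.strip()) for seg in segments)


print(clean_path("/%20this is a test / hello there /"))  # -> /this-is-a-test/hello-there/
print(clean_path("/ test/"))                             # -> /test/

# Pitfall: the character class also eats the digits 2 and 0.
print(re.sub(r"^[\s%20]+", "", "2020-report"))           # -> -report
```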

This works fine:
<IfModule pagespeed_module>
ModPagespeed on
ModPagespeedEnableFilters collapse_whitespace,remove_comments
</IfModule>

Related

How to search and replace url using .htaccess (regex)

Below is the code from my .htaccess file:
RewriteEngine On
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} ^www\. [NC]
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
RewriteRule ^ https://%1%{REQUEST_URI} [L,NE,R=301]
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
# php -- BEGIN cPanel-generated handler, do not edit
# Set the “ea-php74” package as the default “PHP” programming language.
<IfModule mime_module>
AddHandler application/x-httpd-ea-php74 .php .php7 .phtml
</IfModule>
# php -- END cPanel-generated handler, do not edit
Below is what I want to search and replace:
https://example.com/video-17i8mp51/27628401/0/ok-ask-me-right
to
https://example.com/video-17i8mp51/ok-ask-me-right
https://example.com/search/full+movie?top&id=57448561
to
https://example.com/search/full+movie
This URL appears in over 10k pieces of my site's content:
https://anothersiteurl.com/search/full+movie
to
https://mysiteurl.com/search/full+movie
I'm assuming these are static one-to-one redirects, as seemingly confirmed in comments.
Both the following rules should go after the first rule (the canonical HTTP to HTTPS and www to non-www redirect) and before the front-controller pattern.
https://example.com/video-17i8mp51/27628401/0/ok-ask-me-right
to
https://example.com/video-17i8mp51/ok-ask-me-right
RewriteRule ^(video-17i8mp51)/27628401/0/(ok-ask-me-right)$ /$1/$2 [R=302,L]
Where the $1 and $2 backreferences contain the captured subgroups from the RewriteRule pattern, ie. video-17i8mp51 and ok-ask-me-right respectively. This simply saves repetition in the RewriteRule substitution string.
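If it helps to see the backreferences at work outside Apache, the same pattern can be tried in Python, where \1 and \2 play the role of $1 and $2. This is only a sketch: redirect_target is a hypothetical helper, and the path is passed without its leading slash because that is how a RewriteRule in .htaccess sees it.

```python
import re

# Same pattern as the RewriteRule above.
PATTERN = re.compile(r"^(video-17i8mp51)/27628401/0/(ok-ask-me-right)$")


def redirect_target(path: str):
    """Return the redirect target for a matching path, or None if the rule does not apply."""
    match = PATTERN.match(path)
    if match is None:
        return None
    # \1 and \2 are filled from the two captured groups
    return match.expand(r"/\1/\2")


print(redirect_target("video-17i8mp51/27628401/0/ok-ask-me-right"))  # -> /video-17i8mp51/ok-ask-me-right
print(redirect_target("video-17i8mp51/ok-ask-me-right"))             # -> None
```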
https://example.com/search/full+movie?top&id=57448561
to
https://example.com/search/full+movie
RewriteCond %{QUERY_STRING} ^top&id=57448561$
RewriteRule ^search/full\+movie$ /$0 [QSD,R=302,L]
The $0 backreference contains the full match of the RewriteRule pattern (i.e. search/full+movie). Note that the literal + needs to be backslash-escaped in the regex to negate its special meaning.
The QSD (Query String Discard) flag removes the original query string from the redirect response.
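As a rough model of what this condition and rule do together, the Python sketch below checks the query string, matches the path with the escaped +, and rebuilds the URL without a query string (the QSD part). The helper name redirect_without_query is an assumption for illustration. The final check shows what happens if the + is left unescaped: it becomes a quantifier on the preceding l.

```python
import re
from urllib.parse import urlsplit, urlunsplit

PATH_RE = re.compile(r"^search/full\+movie$")  # "\+" is a literal plus sign


def redirect_without_query(url: str):
    """Return the URL with its query string dropped if both checks pass, else None."""
    parts = urlsplit(url)
    path = parts.path.lstrip("/")  # .htaccess patterns see no leading slash
    if parts.query == "top&id=57448561" and PATH_RE.match(path):
        # Equivalent of QSD: same scheme, host and path, empty query string
        return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return None


print(redirect_without_query("https://example.com/search/full+movie?top&id=57448561"))
# -> https://example.com/search/full+movie

# Unescaped, "l+" means "one or more l", so the literal URL no longer matches.
print(re.match(r"^search/full+movie$", "search/full+movie"))  # -> None
```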
You should not repeat the RewriteEngine directive.
Note that these are currently 302 (temporary) redirects. If these are intended to be permanent then change to 301 but only after you have tested that they work as intended, to avoid potential caching issues.
This URL appears in over 10k pieces of my site's content:
https://anothersiteurl.com/search/full+movie
to
https://mysiteurl.com/search/full+movie
This is not something you should be trying to do with .htaccess. If this URL appears in the site "content" then you need to modify the content of your pages before sending the response.
(Technically, you can use mod_substitute to do this - to modify the response body - but really that would be a last resort.)
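Modifying the content itself can be as simple as a string replacement applied wherever the pages are generated or stored. How your content is stored is unknown to me, so the following Python is only a sketch of the idea; rewrite_links and the sample HTML are made up for illustration.

```python
OLD_BASE = "https://anothersiteurl.com/"
NEW_BASE = "https://mysiteurl.com/"


def rewrite_links(html: str) -> str:
    """Point every link at the old host to the same path on the new host."""
    return html.replace(OLD_BASE, NEW_BASE)


sample = '<a href="https://anothersiteurl.com/search/full+movie">Full movie</a>'
print(rewrite_links(sample))
# -> <a href="https://mysiteurl.com/search/full+movie">Full movie</a>
```

Running something like this once against the stored content (after a backup) fixes the links at the source, so nothing has to be rewritten on every request.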
Aside: The RewriteBase directive is not being used here and can therefore be removed.
Summary
Your resulting .htaccess file would then look like this:
RewriteEngine On
# Canonical redirect (HTTP to HTTPS and www to non-www)
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} ^www\. [NC]
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
RewriteRule ^ https://%1%{REQUEST_URI} [L,NE,R=301]
# Point#1
RewriteRule ^(video-17i8mp51)/27628401/0/(ok-ask-me-right)$ /$1/$2 [R=302,L]
# Point#2
RewriteCond %{QUERY_STRING} ^top&id=57448561$
RewriteRule ^search/full\+movie$ /$0 [QSD,R=302,L]
# Front-controller pattern
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
# php -- BEGIN cPanel-generated handler, do not edit
# Set the “ea-php74” package as the default “PHP” programming language.
<IfModule mime_module>
AddHandler application/x-httpd-ea-php74 .php .php7 .phtml
</IfModule>
# php -- END cPanel-generated handler, do not edit
I was able to meet all three of your criteria with the following rules:
RewriteEngine On
RewriteBase /
# First request:
# Convert https://example.com/video-17i8mp51/27628401/0/ok-ask-me-right to
# https://example.com/video-17i8mp51/ok-ask-me-right
RewriteRule ^(video-[^/]+)/.+/(.+)/?$ $1/$2 [L]
# Second request:
# Convert https://example.com/search/full+movie?top&id=57448561 to
# https://example.com/search/full+movie
RewriteRule ^ %{REQUEST_URI}?
# Third request:
# Convert https://anothersiteurl.com/search/full+movie to
# https://mysiteurl.com/search/full+movie
RewriteRule ^(.*)$ https://mysiteurl.com/$1 [R=301,L]
You can see them in action here.

.htaccess rules resulting in 404 error for /page/ if /page.[ext] is present

The problem: the presence of an identical URL to /page/, but with some file extension, i.e., /page.xml, results in a 404 for /page/.
So for example, my HTML sitemap, example.com/sitemap will 404 if example.com/sitemap.xml is present.
The .htaccess file of my WordPress site contains rewrite rules that, as expected, append a trailing slash to pages in the form of example.com/page so they are rewritten as example.com/page/.
.htaccess as follows:
RewriteEngine On
RewriteCond %{HTTPS} !=on
RewriteRule .* https://example.com%{REQUEST_URI} [R=301,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)([^/])$ /$1$2/ [L,R=301]
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
# BEGIN MainWP
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^wp-content/plugins/mainwp-child/(.*)$ /wp-content/plugins/THIS_PLUGIN_DOES_NOT_EXIST [QSA,L]
</IfModule>
# END MainWP
So after some digging, I found the solution, which was to simply disable MultiViews in my .htaccess file, like so:
Options -MultiViews
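As far as I understand it, this works because with MultiViews enabled Apache looks for files named page.* when /page itself does not exist, so a request for /sitemap gets tied to sitemap.xml before the WordPress rules can treat it as a virtual page. A very simplified Python model of that name search follows; multiviews_candidates is a made-up helper, and real negotiation also weighs media types and languages.

```python
def multiviews_candidates(requested: str, existing_files: list) -> list:
    """Files a name-based search could pick for a request with no exact match."""
    if requested in existing_files:
        return [requested]  # an exact match needs no negotiation
    prefix = requested + "."
    return [name for name in existing_files if name.startswith(prefix)]


files = ["index.php", "sitemap.xml"]
print(multiviews_candidates("sitemap", files))  # -> ['sitemap.xml']
print(multiviews_candidates("about", files))    # -> []
```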

RegEx to find whole word, but not abbreviation

I am using RegEx in my .htaccess file to determine what URIs get sent to my router file. I have a problem though because one page that I need to route contains a string that I'm filtering out, causing that URI not to be sent to the router. I don't want the URIs with "adm" in them to be sent to the router, but this also means that it filters out URIs with strings like "admonish" or "administrate".
.htaccess:
<IfModule mod_rewrite.c>
Options +FollowSymlinks
# Options +SymLinksIfOwnerMatch
RewriteEngine On
RewriteBase /
RewriteCond %{HTTPS} off
#RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI}
</IfModule>
<IfModule mod_rewrite.c>
RewriteCond %{HTTPS} !=on
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^http://%1%{REQUEST_URI} [R=301,L]
</IfModule>
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule !(^adm|^ajax|^google([a-z0-9])|^tools|^swf|^confirm|^style) index.php [nc]
I've tried things like RewriteRule !(^adm(![in])|^ajax|^google([a-z0-9])|^tools|^swf|^confirm|^style) index.php [nc] and RewriteRule !(^adm(!in)|^ajax|^google([a-z0-9])|^tools|^swf|^confirm|^style) index.php [nc], but with no success.
What is the correct way to match a portion of a word if it is not followed by characters other than "/"?
EDIT - This is the current Rewrite as suggested:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule !(^(?i)\badm(?=[a-z])|^ajax|^google([a-z0-9])|^tools|^swf|^confirm|^style) index.php [nc]
Still no luck with this, though.
UPDATE - Full .htaccess file:
DirectoryIndex index.php
<IfModule mod_rewrite.c>
Options +FollowSymlinks
# Options +SymLinksIfOwnerMatch
RewriteEngine On
RewriteBase /
RewriteCond %{HTTPS} off
#RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI}
</IfModule>
<IfModule mod_rewrite.c>
RewriteCond %{HTTPS} !=on
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^http://%1%{REQUEST_URI} [R=301,L]
</IfModule>
RewriteCond %{REQUEST_URI} !/(adm|ajax|google([a-z0-9])|tools|swf|confirm|style) [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule . index.php [L]
# Rewrite requests for sitemap.xml
RewriteRule sitemap.xml$ sitemap.php?target=google [L]
# Rewrite requests for urllist.txt
RewriteRule urllist.txt$ sitemap.php?target=yahoo [L]
Options -MultiViews
# ----------------------------------------------------------------------
# Custom 404 page
# ----------------------------------------------------------------------
# You can add custom pages to handle 500 or 403 pretty easily, if you like.
# If you are hosting your site in subdirectory, adjust this accordingly
# e.g. ErrorDocument 404 /subdir/404.html
ErrorDocument 400 /error.php?e=400
ErrorDocument 401 /error.php?e=401
ErrorDocument 403 /error.php?e=403
ErrorDocument 404 /error.php?e=404
ErrorDocument 500 /error.php?e=500
# ----------------------------------------------------------------------
# UTF-8 encoding
# ----------------------------------------------------------------------
# Use UTF-8 encoding for anything served text/plain or text/html
AddDefaultCharset utf-8
# Force UTF-8 for a number of file formats
AddCharset utf-8 .atom .css .js .json .rss .vtt .xml
# ----------------------------------------------------------------------
# A little more security
# ----------------------------------------------------------------------
# To avoid displaying the exact version number of Apache being used, add the
# following to httpd.conf (it will not work in .htaccess):
# ServerTokens Prod
# "-Indexes" will have Apache block users from browsing folders without a
# default document Usually you should leave this activated, because you
# shouldn't allow everybody to surf through every folder on your server (which
# includes rather private places like CMS system folders).
<IfModule mod_autoindex.c>
Options -Indexes
</IfModule>
# Block access to "hidden" directories or files whose names begin with a
# period. This includes directories used by version control systems such as
# Subversion or Git.
<IfModule mod_rewrite.c>
RewriteCond %{SCRIPT_FILENAME} -d [OR]
RewriteCond %{SCRIPT_FILENAME} -f
RewriteRule "(^|/)\." - [F]
</IfModule>
# Block access to backup and source files. These files may be left by some
# text/html editors and pose a great security danger, when anyone can access
# them.
<FilesMatch "(\.(bak|config|sql|fla|psd|ini|log|sh|inc|swp|dist)|~)$">
Order allow,deny
Deny from all
Satisfy All
</FilesMatch>
# Increase cookie security
<IfModule php5_module>
php_value session.cookie_httponly true
php_value error_log /logs/php_errors.log
</IfModule>
# prevent access to PHP error log
<Files php_errors.log>
Order allow,deny
Deny from all
Satisfy All
</Files>
EDIT AGAIN:
I have also tried:
RewriteCond %{REQUEST_URI} !((adm[^/]+)/|ajax|google([a-z0-9])|tools|swf|confirm|style) [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule . index.php [L,NC]
RewriteCond %{REQUEST_URI} !/((.*)/adm/(.*)|ajax|google([a-z0-9])|tools|swf|confirm|style) [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule . index.php [L,NC]
Negative Lookahead
If I'm understanding correctly, the basic pattern you're looking for (with possible refinements) is:
adm(?![a-z])
(?![a-z]) is a lookahead that ensures that the following character is not a letter.
In mod_rewrite, you can make this case-insensitive with (?i)adm(?![a-z])
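A quick way to check the lookahead against the URIs from the question is to try it in Python, whose re module handles this construct the same way. The sketch anchors the pattern at the start of the path and uses a flag instead of (?i); is_adm_path is a hypothetical helper name.

```python
import re

# "adm" at the start of the path, not followed by another letter
ADM_ONLY = re.compile(r"^adm(?![a-z])", re.IGNORECASE)


def is_adm_path(path: str) -> bool:
    return ADM_ONLY.search(path) is not None


for path in ["adm", "adm/", "ADM/users", "admonish", "administrate/page"]:
    print(path, is_adm_path(path))
# adm True, adm/ True, ADM/users True, admonish False, administrate/page False
```

Note that a digit still counts as "not a letter", so adm2 would also be treated as an adm path; widen the lookahead to (?![a-z0-9]) if that matters.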
You can just add one more negative RewriteCond here to exclude /adm/ URIs from this rewrite:
RewriteCond %{REQUEST_URI} !/(adm|ajax|google([a-z0-9])|tools|swf|confirm|style) [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule !adm index.php [L,NC]
How about doing the opposite?
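If it contains "adm/" as a whole path segment (including the slash), then stop.
Otherwise, rewrite everything to index.php.
Like this:
# In .htaccess the path has no leading slash, hence (^|/) rather than a bare /
RewriteRule (^|/)adm/ - [L]
# A RewriteCond only applies to the rule directly after it
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule (.*) index.php [QSA,L]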

MOD_REWRITE is not behaving as expected

This doesn't work:
RewriteRule ^([^/]+)([/]?)$ /index.cgi?l=$1 [NC,L]
This doesn't work:
RewriteRule ^([^/]+)/?$ /index.cgi?l=$1 [NC,L]
There are no other rules in the .htaccess file. Here's the complete version:
Options -Indexes
Options ExecCGI
AddHandler cgi-script .cgi .pl .q
ErrorDocument 500 /error500.cgi
ErrorDocument 404 /error404.cgi
Options +FollowSymlinks
RewriteEngine On
RewriteBase /
RewriteRule ^([^/]+)/?$ /index.cgi?l=$1 [NC,L]
This DOES work, but it's not what I want.
RewriteRule ^([^/]+)/([^/]+)$ /index.cgi?l=$1&a=$2 [NC,L]
I want both the first slash and second directory to be optional. Why won't the question mark match 0 or 1 instances like it's supposed to? I am freaking here...
By using a $, you're specifying that it's the end of the text, so that won't match anything after a /. (in regex, ^ specifies the beginning of a string and $ specifies the end)
You could remove the $, then it will make the second parameter optional - that sounds like what you're looking for.
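To see what the single-segment pattern accepts, it can be tried in Python, which is close enough to PCRE for these patterns. The sketch also includes a hypothetical alternative, TWO_OPTIONAL, where the second segment is optional as well. One more thing it shows: the single-segment pattern matches index.cgi itself, which may be worth ruling out as a cause, since .htaccess rules are applied again to the rewritten path.

```python
import re

ONE_SEGMENT = re.compile(r"^([^/]+)/?$")                 # pattern from the question
TWO_OPTIONAL = re.compile(r"^([^/]+)(?:/([^/]+))?/?$")   # assumed alternative


def groups(pattern, path):
    """Return the captured groups, or None when the pattern does not match."""
    match = pattern.match(path)
    return match.groups() if match else None


print(groups(ONE_SEGMENT, "en"))         # -> ('en',)
print(groups(ONE_SEGMENT, "en/"))        # -> ('en',)
print(groups(ONE_SEGMENT, "en/about"))   # -> None
print(groups(ONE_SEGMENT, "index.cgi"))  # -> ('index.cgi',)

print(groups(TWO_OPTIONAL, "en"))        # -> ('en', None)
print(groups(TWO_OPTIONAL, "en/about"))  # -> ('en', 'about')
```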
Try this to see if your mod_rewrite is working correctly:
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php/$1 [R,L]

HTACCESS redirection with a word replacement in url

I'm having trouble with this regular expression, which I believe is correct, but it is not working.
What I'm trying to do is redirect a bunch of URLs containing a specific string, like this:
http://www.example.com/**undesired-string**_another-string to http://www.example.com/**new-string**_another-string
and
http://www.example.com/folder/**undesired-string**/another-string to http://www.example.com/folder/**new-string**/another-string
So I have this code in the .htaccess:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule (.+)+(undesired-string)+(.+) $1new-string$2 [R=301,L]
</IfModule>
This should replace ANY undesired-string in any URL with new-string, but it is not working. Any idea why?
Thank you
Marwen: Try this:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^(.*)undesired-string(.*)$ yoursite.com/$1new-string$2 [R=301,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?/$1 [L]
RewriteCond %{HTTP_HOST} !^www.yoursite.com$ [NC]
RewriteRule ^(.*)$ yoursite.com//$1 [L,R=301]
</IfModule>
In your 'updated' code in the comments above, you had the rewrite conditions applying to the undesired-string rule, so if the actual file or directory was valid it would not rewrite.
Doing it this way, though, will always replace undesired-string with new-string, even if it's a file name. If that is what you want, then all you had to do was move your rewrite conditions below that rewrite rule.
Also, just an FYI: if everything is on yoursite.com, you don't need to list yoursite.com,
i.e.
yoursite.com/$1new-string$2
just needs to be
/$1new-string$2
which does the same thing: it rewrites to the base directory of yoursite.com.
Now, if they are going from mysite.com to yoursite.com, then you would want to include the domain name, because you are redirecting across domain names.
Edit: You may also want to use:
[QSA,L,R=301]
instead of just [L,R=301]
Your regex is not really correct. Try:
RewriteRule ^(.*)undesired-string(.*)$ $1new-string$2 [R=301,L]
Or if this doesn't work, try:
RewriteRule ^(.*)undesired-string(.*)$ http://yoursite.com/$1new-string$2 [R=301,L]
Explanation:
^ marks the beginning; $ marks the end; the first (..) goes to $1, the second (..) goes to $2 and so on; * is 0 or more chars; + is 1 or more chars.
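The difference between the two patterns can be checked in Python, with \1 and \2 standing in for $1 and $2. This is a sketch under one simplification: the question's (.+)+(undesired-string)+(.+) is tested as (.+)(undesired-string)(.+), because the nested (.+)+ adds nothing and can backtrack very badly. The helper name substitute is made up.

```python
import re

# Simplified form of the question's pattern: here group 2 is the unwanted word itself
BROKEN = re.compile(r"(.+)(undesired-string)(.+)")
# Pattern from this answer: groups 1 and 2 are the text before and after the word
FIXED = re.compile(r"^(.*)undesired-string(.*)$")

TEMPLATE = r"\1new-string\2"


def substitute(pattern, path):
    """Return the rewritten path, or None when the pattern does not match."""
    match = pattern.search(path)
    return match.expand(TEMPLATE) if match else None


print(substitute(BROKEN, "folder/undesired-string/another-string"))
# -> folder/new-stringundesired-string
print(substitute(BROKEN, "undesired-string_another-string"))
# -> None, because (.+) needs at least one character before the word

print(substitute(FIXED, "folder/undesired-string/another-string"))
# -> folder/new-string/another-string
print(substitute(FIXED, "undesired-string_another-string"))
# -> new-string_another-string
```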
To answer my own question. Laravel already redirects the trailing slashes. Problem was that Laravel was installed into a sub-directory. I added the location of the sub-directory to the redirect. My location in this case is: "/lumen/public/". See the fixed htaccess below.
<IfModule mod_rewrite.c>
<IfModule mod_negotiation.c>
Options -MultiViews
</IfModule>
RewriteEngine On
# Redirect Trailing Slashes If Not A Folder...
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ /lumen/public/$1 [L,R=301]
# Handle Front Controller...
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^ index.php [L]
</IfModule>