.htaccess: param url to clean url - regex

I am trying to turn an ugly url with parameters into a nice url. At the moment I have:
http://myasite.com/index.php?reg=uk&area=london&id=16
Which I would like to have like so:
http://myasite.com/uk/london/16
I have tried using this .htaccess:
RewriteEngine On
RewriteRule ^/?$/?$ index.php?reg=$1&area=london&id=$2 [L,QSA]
Which I got from an online generator; however, when I load the page with /uk/16 in the URL, it just crashes.
What am I doing wrong?
In reply to Chris's reply below. All of these are optional.
Structure of url is like so:
myasite.com
myasite.com/uk (if set, This will always be text and always 2 chars long)
myasite.com/uk/london (if set, This will always be text, this will be any char length )
myasite.com/uk/london/16 (if set, This will always be integer and any char length)

Your regex is incorrect. In ^/?$/?$ the first $ is already an end-of-string anchor, so the pattern only matches an empty path or a single /. You also aren't using any capture groups, so $1 and $2 have nothing to refer to. Here's a regex that would work for your provided example:
^/(uk)/(\d+)$
If uk can be any 2 lowercase letters you could use:
^/([a-z]{2})/(\d+)$
You can use regex101 to see how your regexes will function.
https://regex101.com/r/VyJE9d/1 (your rule)
https://regex101.com/r/VyJE9d/2
The right side of the page gives explanations.
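As a rewrite rule (in a .htaccess file the leading slash is stripped from the path before matching, so the pattern must not require it):
RewriteEngine On
RewriteRule ^([a-z]{2})/(\d+)$ index.php?reg=$1&id=$2 [L,QSA]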
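To check the patterns outside Apache, here is a minimal sketch that uses Python's re module as a stand-in for mod_rewrite's regex engine (the two agree on patterns this simple). It assumes the rule lives in a .htaccess file in the document root, so the path is matched without its leading slash; the helper name rewrite is made up for illustration.

```python
import re

# Path as a RewriteRule in a root .htaccess would see it (assumption:
# per-directory context, so the leading "/" has already been stripped).
path = "uk/16"

broken = re.compile(r"^/?$/?$")            # the generator's pattern
fixed = re.compile(r"^([a-z]{2})/(\d+)$")  # two captures: region and id


def rewrite(path):
    """Return the substituted target, or None when the rule does not apply."""
    m = fixed.search(path)
    if m is None:
        return None
    return "index.php?reg={}&id={}".format(m.group(1), m.group(2))


print(broken.search(path))  # None: the pattern only matches "" or "/"
print(rewrite(path))        # index.php?reg=uk&id=16
```

Feeding the sketch "/uk/16" (with the leading slash) returns None, which is the same reason a pattern starting with ^/ never fires in .htaccess.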

All you need to use is this in your .htaccess file:
RewriteEngine On
RewriteRule ^([^/]*)/([^/]*)/([^/]*)$ /index.php?reg=$1&area=$2&id=$3 [L]
This will leave you with your desired URL of: http://myasite.com/uk/london/16. Just make sure you clear your cache before testing this.

RewriteEngine On
RewriteRule ^([a-zA-Z0-9_-]+)/([0-9]+)/?$ index.php?reg=$1&id=$2 [L,QSA]
We are rewriting $1/$2/ and $1/$2 to index.php?reg=$1&id=$2
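None of the rules above handles the case from the question where every segment is optional. A possible sketch of a single pattern for that, again in Python as a stand-in for the RewriteRule regex. The character classes are assumptions read off the question (two lowercase letters, alphabetic area, numeric id), and to_query is a hypothetical helper.

```python
import re

# Assumed shape, from the question: 2-letter region, alphabetic area,
# numeric id. Each segment is optional but only valid after the previous one.
PATTERN = re.compile(r"^(?:([a-z]{2})(?:/([a-zA-Z]+)(?:/(\d+))?)?)?/?$")


def to_query(path):
    """Map a clean path such as 'uk/london/16' to a target URL, or None."""
    m = PATTERN.match(path)
    if m is None:
        return None
    names = ("reg", "area", "id")
    pairs = [f"{n}={v}" for n, v in zip(names, m.groups()) if v is not None]
    return "index.php" + ("?" + "&".join(pairs) if pairs else "")


for p in ("", "uk", "uk/london", "uk/london/16", "uk/16"):
    print(repr(p), "->", to_query(p))
```

Note that "uk/16" is rejected by this pattern, because the question says the id only ever follows an area. In an actual RewriteRule the substitution would list all three parameters, and an unmatched group should simply expand to an empty string.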

Related

How to write expression to grab all after an expression and then rewrite in htaccess

I'm new to the rewriting of urls and regex in general. I'm trying to rewrite a URL to make it a 'pretty url'
The original URL was
/localhost/house/category.php?cat=lounge&page=1
I want the new url to look like this:
/localhost/house/category?lounge&page=1
(like I say, I'm new so not trying to take it too far at the moment)
the closest I've managed to get it to is this:
RewriteRule ^category/(.*)$ ./category.php?cat=$1 [NC,L]
but that copies the whole URL and creates:
/localhost/house/category/house/category/lounge&page=1
I'm sure, there must be an easy way to say copy all after that expression, but I haven't managed to get there yet.
I will try to help you:
You probably have already, but try a mod rewrite generator and htaccess tester.
As another answer puts it: the query string (everything after the ?) is not part of the URL path, so a RewriteRule pattern never sees it. [QSA] only appends the original query string to the substitution; to match on the query string you need a RewriteCond.
I propose using RewriteCond and using %1 instead of $1 for query string matches as opposed to doing it all in RewriteRule.
For your solution, try:
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^house/category$ house/category.php?cat=%1 [NC,L]
This will insert the .php and cat= while retaining the &page=
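A rough simulation of that pair of directives, to show where %1 comes from. Python's re and urllib.parse stand in for Apache here, and the sketch assumes the .htaccess file sits in the document root so the path is matched without a leading slash.

```python
import re
from urllib.parse import urlsplit

url = "http://localhost/house/category?lounge&page=1"
parts = urlsplit(url)

path = parts.path.lstrip("/")  # what RewriteRule matches: "house/category"
query = parts.query            # what %{QUERY_STRING} holds: "lounge&page=1"

cond = re.match(r"^(.*)$", query)                          # RewriteCond, fills %1
rule = re.match(r"^house/category$", path, re.IGNORECASE)  # RewriteRule with [NC]

target = None
if cond and rule:
    # %1 is the whole query string, so "&page=1" rides along after cat=lounge
    target = "house/category.php?cat=" + cond.group(1)

print(target)  # house/category.php?cat=lounge&page=1
```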
Anticipating your next step, the below mod rewrite may help get started in converting
http://localhost/house/category/lounge/1
to
http://localhost/house/category.php?cat=lounge&page=1
Only RewriteRule necessary here, no query string:
RewriteRule ^house/category/([^/]*)/([0-9]*)/?$ house/category.php?cat=$1&page=$2 [NC,L]
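The same rule as a small Python sketch, useful for trying paths before touching the server. The function name rewrite is hypothetical, and re.IGNORECASE plays the role of [NC].

```python
import re

RULE = re.compile(r"^house/category/([^/]*)/([0-9]*)/?$", re.IGNORECASE)


def rewrite(path):
    """Return the internal target for a pretty path, or None if no match."""
    m = RULE.match(path)
    if m is None:
        return None
    return "house/category.php?cat={}&page={}".format(*m.groups())


print(rewrite("house/category/lounge/1"))   # house/category.php?cat=lounge&page=1
print(rewrite("house/category/lounge/1/"))  # same target, trailing slash allowed
print(rewrite("house/category/lounge"))     # None: the page segment is missing
```

The * quantifiers matter: without them each group would match exactly one character. They also accept empty segments, so house/category// maps to cat=&page=; switching to + would reject that.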
Use regex101 for more help and detailed description on what these regexes do.
If it is still not working, keep making the regex more lenient until it matches correctly:
Try to remove the ^ in RewriteRule so it becomes
RewriteRule category$ category.php?cat=%1 [NC,L]
Then it will match that page at any directory level. Then add back in house/ and add /? wherever an optional leading/trailing slash may cause a problem, etc.
Thanks for all your suggestions, I took it back to this
RewriteRule category/([^/]*)/([0-9]*)/?$ category.php?cat=$1&page=$2 [NC,L]
which has done the trick, and I'll leave it at this for now.

mod_rewrite change a variable content

I am having a little trouble with apache mod_rewrite, I need to be able to modify (append) a variable name to something else depending the regex in another variable in the URL:
https://localhost:85/fight?shoes=baby.firstlove&type=textype&awesome=23481234
By this i mean that if "awesome=" is 234[8,7]1234, shoes=baby.firstlove should become shoes=baby.firstlovefirsttry, OR if awesome=234[1,2]1234, then shoes=baby.firstlove, should become shoes=baby.firstlovesecondtry .
My rewrites rule are something like this (trying to capture awesome=23411234 or awesome=23425678):
RewriteCond %{QUERY_STRING} shoes=(.+)\&awesome=(\b234(1|2)\d{4}\b)
RewriteRule ^(.*)$ http://localhost:85/fight?shoes=baby.firstloveactual&subscriber=%2 [P]
But they are not changing the "shoes=" variable content as expected.
The URL remains the same:
http://localhost:85/fight?shoes=baby.firstlove&type=textype&awesome=23481234
Please what am I doing wrong?
RewriteCond %{QUERY_STRING} shoes=(.*)\&awesome=(\b234(1|2)\d{4}\b)
Your regex does not match your example URL:
https://localhost:85/fight?shoes=baby.firstlove&type=textype&awesome=23481234
You have an 8 where your regex is expecting a 1 or 2. However, if you need to match one of a series of characters then you should use a character class (eg. [12]) rather than a parenthesised/capturing group. Also, I'm not sure what you are trying to do with the word boundaries (ie. \b)? What is the intention of using the P flag? Presumably you need to externally redirect?
But also, your code sample does not seem to match your textual description of the problem?
if "awesome=" is 234[8,7]1234, shoes=baby.firstlove should become shoes=baby.firstlovefirsttry
Try the following (assuming "1234" is the literal string, rather than any 4 digits):
RewriteCond %{QUERY_STRING} shoes=.+&awesome=(234[87]1234)
RewriteRule ^/?fight http://localhost:85/fight?shoes=baby.firstloveactual&subscriber=%1 [R,L]
You've used https in your example URLs, but http in your rewrite?
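For reference, the mapping described in the question (7 or 8 gives firsttry, 1 or 2 gives secondtry) can be sketched in Python to test the character classes before writing the RewriteCond lines. This assumes "1234" is literal, as in the answer above; SUFFIX, AWESOME, SHOES and new_shoes are names invented for the sketch.

```python
import re

# Suffix chosen by the 4th digit of "awesome", per the question's description.
SUFFIX = {"8": "firsttry", "7": "firsttry", "1": "secondtry", "2": "secondtry"}

AWESOME = re.compile(r"(?:^|&)awesome=234([1278])1234(?:&|$)")
SHOES = re.compile(r"(?:^|&)shoes=([^&]+)")

# The pattern from the question, for comparison.
ORIGINAL = re.compile(r"shoes=(.+)\&awesome=(\b234(1|2)\d{4}\b)")


def new_shoes(query):
    """Return the rewritten shoes value, or None if the query does not qualify."""
    a = AWESOME.search(query)
    s = SHOES.search(query)
    if a is None or s is None:
        return None
    return s.group(1) + SUFFIX[a.group(1)]


q = "shoes=baby.firstlove&type=textype&awesome=23481234"
print(ORIGINAL.search(q))  # None: the 8 is neither 1 nor 2
print(new_shoes(q))        # baby.firstlovefirsttry
```

In mod_rewrite this would most likely become two RewriteCond/RewriteRule pairs, one per character class ([87] and [12]).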

htaccess regex to find image and image number

I have such a url:
/keyword1/keyword2/slugged-title-8286-1.jpg?wx=292&hx=164
I would like to forward in this case to:
/images/8286-1.jpg?wx=292&hx=164
the listing number (here 8286) can be 4 or 5 digits and could perhaps contain letters. Also the parameters after ? could be different.
Could you please help me to get this solved?
I haven't done a lot with regex and not sure how this can be done.
You can use this rule in your site root .htaccess:
RewriteEngine On
RewriteRule -(\w+(?:-\d+)?\.jpe?g)$ /images/$1 [L,NE,R=302]
If you don't want a full redirect then use:
RewriteRule -(\w+(?:-\d+)?\.jpe?g)$ /images/$1 [L]
QUERY_STRING is automatically carried over to target URL.
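To see which part of the path ends up in $1, the pattern can be tried in Python first (a sketch; image_target is a made-up helper, and the query string is left out because RewriteRule never matches it).

```python
import re

IMAGE = re.compile(r"-(\w+(?:-\d+)?\.jpe?g)$")


def image_target(path):
    """Return the /images/ target for a slugged image path, or None."""
    m = IMAGE.search(path)
    return "/images/" + m.group(1) if m else None


print(image_target("keyword1/keyword2/slugged-title-8286-1.jpg"))
# /images/8286-1.jpg
print(image_target("keyword1/keyword2/slugged-title-82a6b-1.jpeg"))
# /images/82a6b-1.jpeg
print(image_target("keyword1/keyword2/slugged-title-8286.jpg"))
# /images/title-8286.jpg
```

The last case is worth knowing about: when the file name has no trailing image number, the optional group lets the match start one hyphen earlier, so part of the slug is kept. If such names can occur, the pattern would need tightening.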

Understanding RegEx - SEO Duplication on last term

I have a problem with duplicate pages for SEO on a website I'm trying to fix. www.example.com/category/c1234 loads just the same as www.example.com/category/c1234garbage
I've been reading online and testing the code and so far I narrowed it down to a possible regex problem. I have the following lines
# url rewrites
RewriteCond %{REQUEST_URI} ^/index\.cfm/.+ [NC]
RewriteRule ^/index.cfm/(([^/]+)/?([^/]+)?)/?(.*)? /index.cfm/$4?$2=$3 [NS,NC,QSA,N,E=SESDONE:true]
I added an R flag to the rule so I could see whether requests pass through it. They do, and after passing through it the garbage at the end disappears.
Can someone help me understand this and figure out a way to fix it so when you go to www.example.com/category/c1234garbage it redirects to www.example.com/category/c1234
I've been searching online for quite a while now and thought it might be time to post here since I can't seem to find a solution. I'm reading "Mastering Regular Expressions" but it might take a while for me to find the answers I'm looking for.
I appreciate any help you can give me. Thank you.
EDIT: This is what i have before that
RewriteEngine On
Rewritebase /
# remove trailing index.cfm
RewriteCond %{QUERY_STRING} ^$
RewriteRule ^index.cfm(\?)?$ / [R=301,L]
# remove trailing slash
RewriteCond %{QUERY_STRING} ^$
RewriteRule (.*)/$ /$1 [R=301,L]
# Remove trailing ?
RewriteCond %{THE_REQUEST} \?\ HTTP [NC]
RewriteRule ^/?(index\.cfm)? /? [R=301,L]
# SEF URLs
SetEnv SEF_REQUEST false
RewriteRule ^[a-z\d\-]+/[a-z]\d+/? /index.cfm/$0 [NC,PT,QSA,E=SEF_REQUEST:true]
RequestHeader add SEF-Request %{SEF_REQUEST}e
RewriteCond %{HTTP:SEF_REQUES} ^true$ [NC]
RewriteRule . - [L]
EDIT: I was reading the htaccess again and found this that I don't understand but it might have some connection. It's located at the bottom of the file.
# lowercase the hostname, and set the TLD name to an environment variable
RewriteCond ${lowercase:%{SERVER_NAME}|NONE} ^(.+)$
RewriteCond %1 ^[a-z0-9.-]*?[.]{0,1}([a-z0-9-]*?\.[a-z.]{2,6})$
RewriteRule .? - [E=TLDName:%1]
From your description and your code, it sounds like this is the transformation that's happening here:
www.example.com/category/c1234garbage
↓
www.example.com/index.cfm?category=c1234garbage
So the problem, I think, is not your rewriting rules. The problem is how you're handling querystring parameters on the server side. If you have an actual page called index.cfm that's interpreting those parameters, you should tweak the code behind that page to validate them and redirect to /category/c1234 where appropriate.
I think the code in index.cfm is looking at the parameter, checking to see if it starts with something recognizable, and going from there. You need to make it more strict.
Alternatively, you could add another .htaccess rule to parse the c1234garbage part and decide which part is valid, and which part (if any) is garbage. I can't give you a regex for that, though, since I don't know the rules for a valid input in your application.
Edit:
I think I found the problem. This part here:
RewriteRule ^[a-z\d\-]+/[a-z]\d+/? /index.cfm/$0 [NC,PT,QSA,E=SEF_REQUEST:true]
You specify the beginning of the relative URL with ^, but you don't specify that you want it to match all the way to the end. So I think what's happening is that it's taking the part of the string that matches, throwing out everything else, and appending it to /index.cfm/. So it takes only the /category/c1234 part from /category/c1234garbage, because that's the part that matches ^[a-z\d\-]+/[a-z]\d+/?.
You can probably fix this with just a word break:
RewriteRule ^[a-z\d\-]+/[a-z]\d+\b/? /index.cfm/$0 [NC,PT,QSA,E=SEF_REQUEST:true]
If that doesn't work, I'm afraid we've reached the end of my htaccess knowledge. I'm more of a regex guy.
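The effect of the missing end anchor, and of the word break, can be reproduced in Python (a sketch only; backref0 is a hypothetical helper that returns what $0 would contain, and re.IGNORECASE stands in for [NC]).

```python
import re

LOOSE = re.compile(r"^[a-z\d\-]+/[a-z]\d+/?", re.IGNORECASE)
STRICT = re.compile(r"^[a-z\d\-]+/[a-z]\d+\b/?", re.IGNORECASE)


def backref0(pattern, path):
    """Return what $0 would hold, or None if the rule does not match."""
    m = pattern.match(path)
    return m.group(0) if m else None


print(backref0(LOOSE, "category/c1234garbage"))   # category/c1234
print(backref0(STRICT, "category/c1234garbage"))  # None
print(backref0(STRICT, "category/c1234"))         # category/c1234
```

With the word break the garbage URL no longer matches at all, so it falls through to whatever handles unmatched requests instead of silently loading the category page.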
Just BTW, this still seems a little awkward. If I understand this right, part of the URL path will still get thrown out if it doesn't fit your exact pattern. E.g. /category/c1234/extra will lose its /extra segment (the query string itself survives the rewrite). You might want to redesign how your rules are set up.
I partially solved the problem. I added
# Remove garbage from after category
RewriteCond %{REQUEST_URI} [a-z\d\-]+/[a-z]\d+(.+)
RewriteRule ^([a-z\d\-]+/[a-z]\d+)/? $1 [R=301]
on top of the SEF rules. It's doing what I want, which is to remove the garbage from the URL, but it gives me an infinite loop because it's redirecting even when the URL is clean. Any hints?
EDIT: So I realized that the .+ at the end is matching the numbers as well... How do I change it to match anything other than numbers after the numbers? Basically, where I have the .+ I need to have a "match any character except for numbers".
EDIT: I finally got it to work with the following code:
# Remove garbage from after category
RewriteCond %{REQUEST_URI} [a-z\d\-]+/[a-z]\d+[A-Za-z-.]+
RewriteRule ^([a-z\d\-]+/[a-z]\d+)/? $1 [R=301]
The (.+) I was using previously was matching the trailing digits of the number (e.g. the 4 in c1234), so the condition was always true unless the URL was something like c1.
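The backtracking behind the redirect loop can be checked in Python (a sketch; OLD_COND and NEW_COND are just labels for the two RewriteCond patterns above, matched case-sensitively against the request URI).

```python
import re

OLD_COND = re.compile(r"[a-z\d\-]+/[a-z]\d+(.+)")
NEW_COND = re.compile(r"[a-z\d\-]+/[a-z]\d+[A-Za-z-.]+")

clean = "/category/c1234"
dirty = "/category/c1234garbage"

# \d+ gives back its last digit so that (.+) has something to match,
# which is why the clean URL kept satisfying the old condition.
print(OLD_COND.search(clean).group(1))  # 4
print(OLD_COND.search("/category/c1"))  # None: no digit to give back

print(NEW_COND.search(clean))               # None: nothing non-numeric follows
print(NEW_COND.search(dirty) is not None)   # True
```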

mod_rewrite regexp

I'm working on some rewrite rules, and for some reason a regexp I'm not expecting to pass (and which does not pass on any of my regexp testers) is passing in mod_rewrite.
The URL in question is:
http://url.com/api/projects.json?division=aa
And the rewrite rule is:
RewriteEngine On
RewriteBase /
RewriteRule ^api\/([^.?#/%\s]+)\.([^#?\s]+)$ api.php?type=$1&format=$2 [NC,L]
Because the second capture is immediately followed by $ I'd expect that URL to fail because of the query string, but it seems to accept just fine and pass the two parameters to GET.
Any thoughts?
Note: Query String
The Pattern will not be matched against the query string. Instead, you must use a RewriteCond with the %{QUERY_STRING} variable.
Snip from the bottom of the docs
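A short Python sketch of the difference between what a generic regex tester sees and what mod_rewrite sees. It assumes a root .htaccess, so the path is matched without its leading slash; re.IGNORECASE stands in for [NC].

```python
import re
from urllib.parse import urlsplit

RULE = re.compile(r"^api\/([^.?#/%\s]+)\.([^#?\s]+)$", re.IGNORECASE)

url = "http://url.com/api/projects.json?division=aa"
parts = urlsplit(url)
path = parts.path.lstrip("/")  # "api/projects.json"

whole = RULE.match(path + "?" + parts.query)  # what a regex tester is given
path_only = RULE.match(path)                  # what the RewriteRule is given

print(whole)               # None: the "?" blocks the match before "$"
print(path_only.groups())  # ('projects', 'json')
```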