Exclude a slash in capture group of regex - regex

I have a string with a value like:
"urls":[
{
"url":"https:\/\/t.co\/OjiDUThEvK",
"expanded_url":"http:(escape sequence slash)/(escape sequence slash)/fb.me\/7Wnh0hMLL",
"display_url":"fb.me(escape sequence slash)/7Wnh0hMLL",
"indices":[48,71]}],
"user_mentions":[],
"symbols":[]
}
]
I need to capture only "expanded_url". I tried the following regex:
"expanded_url"\:\"http\:\\\/\\\/(.*?)\"
This gave the result:
"fb.me(escape sequence slash)/7Wnh0hMLL"
But I want to exclude the escape-sequence slash from the URL. Is that possible? Please let me know what changes need to be made to the regex.

I'm not 100% sure this is what you're after. Can you post the raw input without the "(escape sequence slash)" part? I'm assuming that it is actually a literal backslash (\) before each / in the text you're matching against.
match:
\"expanded_url\":\"http:\\\/\\\/([^\\]*)\\\/([^\\"]*)\"
replace with:
$1/$2
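As a rough illustration of the match-and-replace above, here is a minimal Python sketch. It assumes the raw text really contains a backslash before each slash (as in JSON-escaped output); the function name `expanded_url_without_escapes` and the sample string are made up for this example.

```python
import re
from typing import Optional

# Same pattern as above: "\\" matches one literal backslash in the input.
PATTERN = re.compile(r'"expanded_url":"http:\\/\\/([^\\]*)\\/([^\\"]*)"')

def expanded_url_without_escapes(text: str) -> Optional[str]:
    """Return host/path of expanded_url with the escaping backslash dropped."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # Equivalent of the "$1/$2" replacement: join the two captured parts.
    return f"{m.group(1)}/{m.group(2)}"

# Hypothetical sample; the raw string keeps the backslashes literally.
sample = r'"expanded_url":"http:\/\/fb.me\/7Wnh0hMLL",'
print(expanded_url_without_escapes(sample))
```

Note that this only handles a single escaped slash after the host, exactly like the pattern it is based on.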

Related

Regex URI portion: Remove hyphens

I have to split URIs on the second portion:
/directory/this-part/blah
The issue I'm facing is that I have 2 URIs which logically need to be one
/directory/house-&-home/blah
/directory/house-%26-home/blah
This comes back as:
house-&-home and house-%26-home
So logically I need a regex to retrieve the second portion but also remove everything between the hyphens.
I have this, so far:
/[^(/;\?)]*/([^(/;\?)]*).*
(?<=directory\/)(.+?)(?=\/)
Does this solve your issue? This returns:
house-&-home and house-%26-home
If you want to get the result:
house--home
then you should use a replace method. Because I am not sure what language you are using, I will give my example in Java:
// requires: import java.util.regex.*;
String str = "/directory/house-&-home/blah";
Matcher m = Pattern.compile("(?<=directory/)(.+?)(?=/)").matcher(str);
String result = m.find() ? m.group(1).replace("&", "") : str; // "house--home"
The replace method replaces every occurrence of a given string (here the & symbol) with nothing (""), which removes it from the matched segment.
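The same idea can be sketched in Python so that both variants from the question collapse to one key. This is only a sketch under the assumption that "&" and its percent-encoded form "%26" are the only differences to normalise; the function name `second_segment_key` is hypothetical.

```python
import re

def second_segment_key(uri: str) -> str:
    """Extract the segment after /directory/ and strip "&" or "%26" from it."""
    m = re.search(r"(?<=directory/)(.+?)(?=/)", uri)
    if m is None:
        return ""
    # Remove both the literal ampersand and its percent-encoded form.
    return re.sub(r"&|%26", "", m.group(1))

print(second_segment_key("/directory/house-&-home/blah"))
print(second_segment_key("/directory/house-%26-home/blah"))
```

Both calls should yield the same string, which is what lets the two URIs be treated as one.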

RegEx for REST url substitutions

I have a URL like this:
http://www.url.me/en/cats/dogs/potatoes/tomatoes/
I need to replace the first two REST parameters to get a result URL like this:
http://www.url.me/FIRST/cats/dogs/potatoes/tomatoes/
I tried this regex \/([^/]+)\/ but it's not working as expected in CF:
<cfset ret.REDIRECT = reReplace(currentUrl, "\/([^/]+)\/", "FIRST", "all") />
What do you suggest, both for the regex and the cf code?
Thank you.
Firstly, you do not need to escape / in regex. (Sometimes you'll see it escaped, such as in JavaScript regex literals, but that is the JS side being escaped, not the regex.)
However, even with that change it won't do what you want: you'll be replacing every other /-delimited segment instead of just the first one after the host part.
To do what you want, use something like this:
reReplace(CurrentUrl, "^(https?://[^/]+/)[^/]+/", "\1FIRST/")
The ^ anchors the replace to the start of the input.
The (..) part captures the protocol and hostname so they can be re-inserted with \1 in the replacement string.
The final [^/]+/ matches the first part of the request URI (it is not a capture group), and that part is what gets replaced by the FIRST/ in the replacement string.
(You can omit the trailing / if it's not required, or use (?=/) to assert that it is there without needing to put it in the replace side.)
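For readers outside ColdFusion, the same anchored replacement can be sketched in Python. This is an illustration only; the function name `replace_first_segment` is made up, and the pattern is the one from the answer above.

```python
import re

def replace_first_segment(url: str, new_segment: str = "FIRST") -> str:
    """Replace only the first path segment after the host."""
    # Group 1 keeps the protocol and host; [^/]+/ consumes the first segment.
    return re.sub(r"^(https?://[^/]+/)[^/]+/", r"\g<1>" + new_segment + "/", url, count=1)

print(replace_first_segment("http://www.url.me/en/cats/dogs/potatoes/tomatoes/"))
```

Because the pattern is anchored with ^, later segments such as cats/ or dogs/ are left untouched.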

Regex match multiple data in URL for IIS Rewrite rule

I'm looking for some help with a regex pattern for rewriting a URL. My URL structure is:
http://domain.com/[username]/[token]/[userid]/
The data types are:
username = alphanumeric
token = alphanumeric
userid = numeric
An example with data:
http://domain.com/john1975/aBc123/123456789/
Using a regular expression, I'm trying to get a reference for each piece of data, so I can rewrite to:
index.asp?username={R:1}&token={R:2}&userid={R:3}
Also keep in mind the regex shouldn't be too greedy, so I can still access files such as:
http://domain.com/about.asp
http://domain.com/images/logo.png
The regex I've tried is:
^[0-9a-z]+/[0-9a-z]+/[0-9]+$
This doesn't match my example URL.
You're missing the trailing forward slash. The regex should be:
^([0-9a-z]+)/([0-9a-z]+)/([0-9]+)/$
I'm assuming you're flagging it as case insensitive. If not then you need
^([0-9a-zA-Z]+)/([0-9a-zA-Z]+)/([0-9]+)/$
You also need the brackets so you can call your back references, which are also wrong - you want to match on 1,2 and 3, not 0, which is the match of the whole expression. They should read:
index.asp?username={R:1}&token={R:2}&userid={R:3}
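To check the pattern outside IIS, here is a small Python sketch. It assumes the rule is matched against the path without the leading slash and with case-insensitive matching; the function name `rewrite` is hypothetical and only mimics the {R:n} substitution.

```python
import re
from typing import Optional

RULE = re.compile(r"^([0-9a-z]+)/([0-9a-z]+)/([0-9]+)/$", re.IGNORECASE)

def rewrite(path: str) -> Optional[str]:
    """Return the rewritten target, or None when the rule does not apply."""
    m = RULE.match(path)
    if m is None:
        return None
    # Groups 1..3 play the role of {R:1}, {R:2} and {R:3}.
    username, token, userid = m.groups()
    return f"index.asp?username={username}&token={token}&userid={userid}"

print(rewrite("john1975/aBc123/123456789/"))
print(rewrite("about.asp"))
print(rewrite("images/logo.png"))
```

The last two calls return None, so ordinary files such as about.asp and images/logo.png are not caught by the rule.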

Regex lookahead with multiple negative conditions

I am running a regex on an HTML string to fetch URLs. I want to fetch all hrefs and srcs that are not JavaScript. From another SO post I have the following pattern:
/(href|src)?\="http:\/\/www\.mydomain\.com\/(?:(?!\.js).)*"/
Which fetches me results like:
src="http://www.mydomain.com/path/to/resource/image.gif" alt="" border="0"
This is good because it is missing the .js results. It's bad because it's fetching additional tags in the element. I tried the following amendment to stop at the first ":
/(href|src)?\="http:\/\/www\.mydomain\.com\/(?:(?!\.js).)[^"]*"/
It works in that it returns href="$url", but it returns results ending in .js. Is there a way to combine a negative lookahead that says:
Match string until it comes across another " - i.e. [^"]*; and
Do not match string if it ends in .js"
Thanks in advance for any help/tips/pointers.
Add a "?" to the "*" before the last quote. This makes the "*" non-greedy, i.e. it will stop matching at the first quote, not the last:
/(href|src)?\="http:\/\/www\.mydomain\.com\/(?:(?!\.js).)*?"/
Here's something a bit different. I used Debuggex with this expression:
(?:src|href)=(?&.quotStr)(?<!\.js")
which compiled it to this one:
$regex = '/(?:src|href)=(?:"((?:\\\\.|[^"\\\\]){0,})")(?<!\\.js")/';
If you only want to reject .js at the end of the string, you can use the following for the last part of the string match:
"(?![^"]*\.js").*?"
per this Rubular
EDIT
See: https://stackoverflow.com/a/18838123/1163653 for a better solution.
Fixed it:
/(href|src)?\="http:\/\/www\.mydomain\.com\/(?:(?!\.js"|").)*"/
Note that the lookahead only lets the match continue (after the domain) while the next characters are neither .js" nor ", both of which end the repetition. A URL ending in .js" then fails because no closing " can follow at that position. Hrefs ending in .css are allowed through because the repetition only stops when it reaches the first ", which is the behaviour needed.
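A quick way to see the behaviour of the final pattern is to run it over a small HTML fragment. The following Python sketch uses made-up sample markup and a hypothetical helper name `non_js_links`; the pattern is the fixed one above, with the group made non-capturing and non-optional so that the full attribute is returned.

```python
import re

PATTERN = re.compile(r'(?:href|src)="http://www\.mydomain\.com/(?:(?!\.js"|").)*"')

def non_js_links(html: str) -> list:
    """Return href/src attributes on the domain that do not end in .js."""
    return PATTERN.findall(html)

# Hypothetical sample markup for illustration only.
html = (
    '<img src="http://www.mydomain.com/path/image.gif" alt="" border="0">'
    '<script src="http://www.mydomain.com/js/app.js"></script>'
    '<link href="http://www.mydomain.com/css/site.css">'
)
for attr in non_js_links(html):
    print(attr)
```

The image and stylesheet attributes are returned, each cut at its own closing quote, while the script attribute is skipped.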

Regex to match anything after /

I'm basically clueless about regex, but I need a regex statement that will recognise anything after the / in a URL.
I'm developing a site for someone, and a page's URL (a local URL, of course) is, say, (http://)localhost/sweettemptations/available-sweets. This page is filled with custom post types (it's a WordPress site), which have URLs of the form (http://)localhost/sweettemptations/sweets/sweet-name.
What I want to do is redirect the URL (http://)localhost/sweettemptations/sweets back to (http://)localhost/sweettemptations/available-sweets which is easy to do, but I also need to redirect any type of sweet back to (http://)localhost/sweettemptations/available-sweets. So say I need to redirect (http://)localhost/sweettemptations/sweets/* back to (http://)localhost/sweettemptations/available-sweets.
If anyone could help by telling me how to write a proper regex statement to match everything after sweets/ in the URL, it would be hugely appreciated.
To do what you ask, you need to use groups. In regular expressions, groups allow you to isolate parts of the whole match.
for example:
input string of: aaaaaaaabbbbcccc
regex: a*(b*)
The parentheses mark a group; in this case it is group 1, since it is the first in the pattern.
Note: group 0 is implicit and is the complete match.
So the matches in my above case will be:
group 0: aaaaaaaabbbb
group 1: bbbb
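The group numbering described here can be checked with a couple of lines of Python, shown purely as an illustration of the example above:

```python
import re

m = re.match(r"a*(b*)", "aaaaaaaabbbbcccc")
assert m is not None
whole_match = m.group(0)   # implicit group 0: the complete match
first_group = m.group(1)   # group 1: what the parentheses captured
print(whole_match)
print(first_group)
```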
In order to achieve what you want with the sweets pattern above, you just need to put a group around the end.
possible solution: /sweets/(.*)
The more precise you are with the pattern before the group, the less likely you are to get a false positive.
If what you really want is to match anything after the last / you can take another approach:
possible other solution: /([^/]*)
The pattern above will find a / followed by a string of characters that are NOT another / and keep that string in group 1. The issue here is that you could match things that do not have sweets in the URL.
Note: if you do not mind the / at the beginning, then just remove the ( and ) and you do not have to worry about groups.
I like to use http://regexpal.com/ to test my regex. It marks the different matches in different colors.
Hope this helps.
I may have misunderstood your requirement in my original post.
If you just want to change any string that matches
(http://)localhost/sweettemptations/sweets/*
into the other one you provided (without adding the part matched by your * at the end), I would use a regular expression to match the pattern in the URL but then just blindly replace the whole string with the desired one:
(http://)localhost/sweettemptations/available-sweets
So if you want the URL:
http://localhost/sweettemptations/sweets/somethingmore.html
to turn into:
http://localhost/sweettemptations/available-sweets
and not into:
localhost/sweettemptations/available-sweets/somethingmore.html
Then the solution is simpler, no groups required :).
When doing this, I would make sure you do not hard-code the "localhost" part in the match. Also, I am assuming the (http://) really means an optional http:// in front, as (http://) is not a valid protocol prefix.
so if that is what you want then this should match the pattern:
(http://)?[^/]+/sweettemptations/sweets/.*
This regular expression will match the http:// part optionally with a host (be it localhost, an IP or the host name). You could omit the .* at the end if you want.
If that pattern matches just replace the whole URL with the one you want to redirect to.
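The match-then-replace-everything approach can be sketched in Python as follows. This is a sketch only: the function name `redirect_target` and the constant `AVAILABLE_SWEETS` are made up, and in a real WordPress site the redirect would normally be configured on the server side rather than in code like this.

```python
import re

SWEETS = re.compile(r"(http://)?[^/]+/sweettemptations/sweets/.*")
# Assumed redirect destination, taken from the question.
AVAILABLE_SWEETS = "http://localhost/sweettemptations/available-sweets"

def redirect_target(url: str) -> str:
    """Return the listing page for any sweets/... URL, else the URL unchanged."""
    # fullmatch: the whole URL must fit the pattern, host part included.
    if SWEETS.fullmatch(url):
        return AVAILABLE_SWEETS
    return url

print(redirect_target("http://localhost/sweettemptations/sweets/somethingmore.html"))
print(redirect_target("http://localhost/sweettemptations/available-sweets"))
```

As written, the pattern requires the slash after sweets, so a URL ending in /sweets with no trailing slash is not matched and would need its own rule.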
Use this regular expression: (?<=://).+