Regex to extract second word from URL

I want to extract the second word from my URL.
Examples:
/search/acid/all - extract acid
/filter/ion/all/sss - extract ion
I tried a few approaches, such as
/.*/(.*?)/
but had no luck.

A couple of things:
The forward slashes / have to be escaped inside a /.../-delimited regex, like this: \/
The (.*?) matches as few characters as possible, including zero. Because nothing follows it in your pattern, it will always match the empty string here.
The .* takes as many characters as it can, including forward slashes.
A simple solution would be:
/.+?\/(.*?)\//
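To see why the original attempt fails, here is a quick comparison in JavaScript. It assumes the intended pattern was .*/(.*?) (the outer slashes being delimiters), written with the inner slash escaped; the variable names are illustrative.

```javascript
// Assumed reading of the asker's attempt: .*/(.*?) with the inner slash escaped.
// Nothing follows the lazy (.*?), so the engine is happy to let it match "".
const attempt = /.*\/(.*?)/;

// The suggested fix: .+? lazily consumes at least one character up to a "/",
// then (.*?) captures lazily up to the next "/".
const fixed = /.+?\/(.*?)\//;

for (const url of ["/search/acid/all", "/filter/ion/all/sss"]) {
  // JSON.stringify makes the empty capture visible as "".
  console.log(url, JSON.stringify(attempt.exec(url)[1]), fixed.exec(url)[1]);
}
// /search/acid/all "" acid
// /filter/ion/all/sss "" ion
```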
Update:
Since you are using JavaScript, try the following code:
var url = "/search/acid/all";
var regex = /.+?\/(.*?)\//g;
var match = regex.exec(url);
console.log(match[1]);
The variable match is an array. Its first element is the full match (everything that was matched); you can ignore it, since you are interested in the specific group we wanted to capture (the part we put in parentheses in the regex).

This regex will do the trick:
(?:[^\/]*.)\/([^\/]*)\/

I had difficulties with the answers above for URLs without a trailing forward slash:
/search/acid/all/ /* works */
/search/acid /* doesn't work */
To extract the second word from both URLs, this is what worked for me:
var url = "/search/acid";
var regex = /(?:[^\/]*.)\/([^\/]*)/g;
var match = regex.exec(url);
console.log(match[1]);
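The snippet above reads match[1] directly, which throws a TypeError when exec() finds nothing. A slightly more defensive sketch wraps the same regex in a function; secondWord is a hypothetical name, and the g flag is dropped because a g-flagged regex reused across calls carries its lastIndex from one call to the next.

```javascript
// Same regex as above, without the g flag so repeated exec() calls start at index 0.
const SECOND_SEGMENT = /(?:[^\/]*.)\/([^\/]*)/;

// Returns the second path segment, or null when the pattern does not match (e.g. "/search").
function secondWord(url) {
  const match = SECOND_SEGMENT.exec(url);
  // exec() returns null on no match, so guard before reading group 1.
  return match ? match[1] : null;
}

for (const url of ["/search/acid", "/search/acid/all/", "/filter/ion/all/sss", "/search"]) {
  console.log(url, "->", secondWord(url));
}
```

With a trailing slash or without one, the captured group stops at the next / or at the end of the string, so both forms give the same word.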

Related

Regex match last substring among same substrings in the string

For example, we have a string:
asd/asd/asd/asd/1#s_
I need to match this part: /asd/1#s_ or asd/1#s_
How is this possible with plain regex?
I've tried a negative lookahead like this, but it didn't work:
\/(?:.(?!\/))?(asd)(\/(([\W\d\w]){1,})|)$
It matches '/asd/asd/asd/asd/asd/asd/1#s_'
from 'prefix/asd/asd/asd/asd/asd/asd/1#s_',
but I need to match '/asd/1#s_' without all the preceding /asd/'s.
The match should work with plain regex, without any helper functions from a programming language.
I use https://regexr.com/ to check whether a regex matches or not.
Here are the possible strings:
prefix/asd/asd/asd/1#s
prefix/asd/asd/asd/1s#
prefix/asd/asd/asd/s1#
prefix/asd/asd/asd/s#1
prefix/asd/asd/asd/#1s
prefix/asd/asd/asd/#s1
The asd part could be replaced with any word, for example:
prefix/a1sd/a1sd/a1sd/1#s
prefix/a1sd/a1sd/a1sd/1s#
...
So I need to match the last repeating part together with everything to its right,
and everything to the right could be letters, non-letter characters, or digits, in any order.
A more complicated string example:
prefix/a1sd/a1sd/a1sd/1s#/ds/dsse/a1sd/22$$#!/123/321/asd
this should match that part:
/a1sd/22$$#!/123/321/asd
Try this one. It works in Python.
import re
reg = re.compile(r"\/[a-z]{1,}\/\d+[#a-z_]{1,}")
s = "asd/asd/asd/asd/1#s_"
print(reg.findall(s))
# ['/asd/1#s_']
Update:
Since the question is not fully specified, this only works with the given order (digits followed by #, letters or _); I suppose any other combination simply fails.
Edit: a new regex:
reg = r"\/\w+(\/\w*\d+\W*)*(\/\d+\w*\W*)*(\/\d+\W*\w*)*(\/\w*\W*\d+)*(\/\W*\d+\w*)*(\/\W*\w*\d+)*$"
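A different approach, assuming (the question does not say so) that the repeated word is always the segment right after the first slash: capture that word, then let a greedy (?:\/.*)? run as far right as possible so that a backreference to the word lands on its last occurrence. This JavaScript sketch reads the result from capture group 2 rather than from the whole match; lastRepeat is a hypothetical helper name.

```javascript
// Assumption: the repeated word is the second path segment (right after "prefix/").
// Group 1 captures it; the greedy (?:\/.*)? then pushes group 2 to the LAST "/<word>/".
const LAST_REPEAT = /^[^\/]*\/([^\/]+)(?:\/.*)?(\/\1\/.*)$/;

function lastRepeat(path) {
  const m = LAST_REPEAT.exec(path);
  // null when the word does not occur again after its first appearance
  return m ? m[2] : null;
}

console.log(lastRepeat("prefix/asd/asd/asd/1#s"));    // /asd/1#s
console.log(lastRepeat("prefix/a1sd/a1sd/a1sd/s#1")); // /a1sd/s#1
console.log(lastRepeat("prefix/a1sd/a1sd/a1sd/1s#/ds/dsse/a1sd/22$$#!/123/321/asd"));
// /a1sd/22$$#!/123/321/asd
```

Because the tail is matched with .*, the characters after the last occurrence can come in any order, which is what the question asks for.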

Regex wrapping word

Regex example
How can I exclude the first space in every match?
The regex I'm using: (?:^|\W)#(\w+)(?!\w)
Is this what you're looking for?
http://regexr.com/3ca98
From the information you have given us so far, this regex should also be sufficient: #(\w+)(?!\w).
But maybe there's more to it than we know. What did you want to achieve with the (?:^|\W)?
Edit: Thinking about what you probably want to achieve, it occurred to me that you might want to match your pattern only when it is not in the middle of another word (e.g. test#case). You probably don't want to match that.
To exclude such cases, you have to ensure that there is some kind of whitespace character in front of it, or in other words: nothing but whitespace, or nothing at all.
I assume you use JavaScript because regexr.com does, and at the time of writing JavaScript's regex implementation had no lookbehind (it was added in ES2018). So there was no real way to make sure there is only nothing or whitespace in front of your pattern.
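Since ES2018, JavaScript does support lookbehind, so in current engines the hashtag can be matched directly without a capture group. A minimal sketch (HASHTAG is an illustrative name):

```javascript
// "#" must be at the start of the string or right after a whitespace character.
// The lookbehind only asserts that whitespace; it is not consumed, so it is not in the match.
const HASHTAG = /(?<=^|\s)#\w+/g;

console.log("#first some#thing #second".match(HASHTAG)); // [ '#first', '#second' ]
```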
One solution would be to work with capture groups. Take this regex:
(?:^|\s+)(#\w+)
It looks for one or more whitespace characters, or the start of the text, in front of your pattern, but doesn't capture them. Then comes your pattern, which is the first capture group in the whole expression.
To use this in JavaScript, create a RegExp with the g flag and call its exec method repeatedly until it returns null, saving the first capture group of each match to a result array.
JS code:
var txt = text.innerHTML;
var re = /(?:^|\s+)(#\w+)/g;
var res = [];
var tmpresult = [];
while ((tmpresult = re.exec(txt)) !== null) {
res.push(tmpresult[1]); // push first capture group to result stack
}
result.innerHTML = JSON.stringify(res, null, 2);
JSFiddle: https://jsfiddle.net/j41tw4hm/1/
Updated regexr.com: http://regexr.com/3ca9n
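Outside of a page (no text or result elements), the same loop can be written with String.prototype.matchAll, available since ES2020. This is a sketch; hashtags is a hypothetical helper name.

```javascript
// Same regex as in the JS code above; matchAll requires the g flag.
// Group 1 holds the hashtag without the leading whitespace.
function hashtags(text) {
  return Array.from(text.matchAll(/(?:^|\s+)(#\w+)/g), (m) => m[1]);
}

console.log(hashtags("#one\ttwo#three  #four")); // [ '#one', '#four' ]
```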

Regex: Negative lookahead after list match

Consider the following input string (part of a CSS file):
url('data:image/png;base64,iVBORw0KGgoAAAAN...');
url(example.png);
The objective is to extract the URL part using a regex and do something with it. The first part is easy:
url\(['"]?(.+?)['"]?\)
Basically, it takes the contents from inside url(...), with optional quote characters. Using this regex I get the following matches:
data:image/png;base64,iVBORw0KGgoAAAAN...
example.png
So far so good. Now I want to exclude the URLs that include 'data:image' in their text. I think a negative lookahead is the proper tool for that, but using it like this:
url\(['"]?(?!data:image)(.+?)['"]?\)
gives me the following result for the first url:
'data:image/png;base64,iVBORw0KGgoAAAAN...
Not only does it fail to exclude this match, but the matched string itself now includes a quote character at the beginning. If I use + instead of the first ? like this:
url\(['"]+(?!data:image)(.+?)['"]?\)
it works as expected: the URL is not matched. But this no longer allows the quote to be optional (since + means 1 or more). How should I change the regex to exclude the given URL?
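The problem is that ['"]? is optional, so the engine can backtrack and match zero quotes; the lookahead is then tested at the quote character, where data:image does not start, so it passes and the quote ends up in (.+?). Capture the optional quote instead and test the negative lookahead before every character of the URL: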
url\((['"]?)((?:(?!data:image).)+?)\1?\)
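A sketch of applying this regex in JavaScript, for example to CSS read into a string. nonDataUrls and the extra quoted.png line are illustrative additions; matchAll requires the g flag, and the URL is in capture group 2.

```javascript
// Group 1 is the optional opening quote, group 2 the URL.
// (?:(?!data:image).)+? checks the lookahead before every character, so dropping
// the quote through backtracking no longer lets a data URI slip through.
const NON_DATA_URL = /url\((['"]?)((?:(?!data:image).)+?)\1?\)/g;

function nonDataUrls(css) {
  // Keep capture group 2 of each match.
  return Array.from(css.matchAll(NON_DATA_URL), (m) => m[2]);
}

const css = [
  "url('data:image/png;base64,iVBORw0KGgoAAAAN...');",
  "url(example.png);",
  'url("quoted.png");',
].join("\n");

console.log(nonDataUrls(css)); // [ 'example.png', 'quoted.png' ]
```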

Regular expression to find specific text within a string enclosed in two strings, but not the entire string

I have this type of text:
string1_dog_bit_johny_bit_string2
string1_cat_bit_johny_bit_string2
string1_crocodile_bit_johny_bit_string2
string3_crocodile_bit_johny_bit_string4
string4_crocodile_bit_johny_bit_string5
I want to find all occurrences of “bit” that occur only between string1 and string2. How do I do this with regex?
I found the question Regex Match all characters between two strings, but the regex there matches the entire string between string1 and string2, whereas I want to match just parts of that string.
I am doing a global replacement in Notepad++. I just need a regex; code will not work.
Thank you in advance.
Roman
If I understand correctly, here is some C# code that does what you want:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
var input = new List<string>
{
"string1_dog_bit_johny_bit_string2",
"string1_cat_bit_johny_bit_string2",
"string1_crocodile_bit_johny_bit_string2",
"string3_crocodile_bit_johny_bit_string4",
"string4_crocodile_bit_johny_bit_string5"
};
// Verbatim string literals in C# start with @, not #
Regex regex = new Regex(@"(?<bitGroup>bit)");
var allMatches = new List<string>();
foreach (var str in input)
{
// Only lines that start with string1 and end with string2 are searched
if (str.StartsWith("string1") && str.EndsWith("string2"))
{
var matchCollection = regex.Matches(str);
allMatches.AddRange(matchCollection.Cast<Match>().Select(match => match.Groups["bitGroup"].Value));
}
}
Console.WriteLine("All matches {0}", allMatches.Count);
This regex will do the job:
^string1_(?:.*(bit))+.*_string2$
^ means the start of the text (or of a line if you use the m option, like so: /<regex>/m)
$ means the end of the text
. means any character
* means the previous character/expression is repeated 0 or more times
(?:<stuff>) means a non-capturing group (<stuff> won't be captured as a result of the matching)
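You could use ^string1_(.*(bit).*)*_string2$ if you don't care about performance or don't have large or many strings to check. The outer parentheses allow multiple occurrences of "bit" (note that in most engines a repeated capturing group keeps only its last iteration, so group 2 still holds a single "bit").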
If you provide us with the language you want to use, we could give more specific solutions.
Edit: since you added that you're doing a replacement in Notepad++, I propose the following:
Use (?<=string1_)(.*)bit(.*)(?=_string2) as the regex and $1xyz$2 as the replacement pattern (replace xyz with your string). Then run "Replace All" repeatedly until Notepad++ doesn't find any more matches. The catch is that this regex only matches one bit per line per iteration, so it has to be applied repeatedly.
By the way, even if a regex matches the whole line, you can still replace only parts of it by using capturing groups.
You can use the regex:
(?:string1|\G)(?:(?!string2).)*?\Kbit
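I tested it on regex101 and in Notepad++, and it works. Briefly: \G matches where the previous match ended (or at the very start of the text), so matching can only continue along a line after string1; (?:(?!string2).)*? lazily skips characters without crossing string2; and \K drops everything matched so far, so only bit gets replaced.
If you want more explanation, let me know and I'll elaborate!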
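JavaScript has no \G or \K, but since ES2018 it supports variable-length lookbehind, so the same "every bit between string1 and string2" idea can be written as a single pattern there. This is a JavaScript sketch, not a Notepad++ recipe (many engines, likely including Notepad++'s, only allow fixed-length lookbehind); BIT_BETWEEN is an illustrative name and xyz is a placeholder replacement.

```javascript
// A "bit" matches only if string1 appears earlier on the same line with no string2 in
// between (lookbehind) and string2 appears later with no string1 in between (lookahead).
// "." does not match "\n", so neither assertion can reach into a neighboring line.
const BIT_BETWEEN = /(?<=string1(?:(?!string2).)*)bit(?=(?:(?!string1).)*string2)/g;

const text = [
  "string1_dog_bit_johny_bit_string2",
  "string1_cat_bit_johny_bit_string2",
  "string1_crocodile_bit_johny_bit_string2",
  "string3_crocodile_bit_johny_bit_string4",
  "string4_crocodile_bit_johny_bit_string5",
].join("\n");

// "xyz" stands in for the real replacement, as in the Notepad++ answer above.
console.log(text.replace(BIT_BETWEEN, "xyz"));
```

The first three lines become, e.g., string1_dog_xyz_johny_xyz_string2, while the string3 and string4 lines are left alone, all in a single pass.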

ColdFusion - How to get only the URLs in this block of text?

How can I extract only the URLs from the given block of text?
background(http://w1.sndcdn.com/f15ikDS9X_m.png)
background-image(http://w1.sndcdn.com/5ikDIlS9X_m.png)
background('http://w1.sndcdn.com/m1kDIl9X_m.png')
background-image('http://w1.sndcdn.com/fm15iIlS9X_m.png')
background("http://w1.sndcdn.com/fm15iklS9X_m.png")
background-image("http://w1.sndcdn.com/m5iIlS9X_m.png")
Perhaps Regex would work, but I'm not advanced enough to work it out!
Many thanks!
Mikey
You're overthinking the problem: all you need to do is match the URLs, which is a simple match:
rematch('\bhttps?:[^)''"]+',input)
That works for the input provided; it might need tweaking if different input is used.
(e.g. you can optionally add \s to the character class if whitespace might be a factor.)
The regex itself is simple:
\bhttps?: ## look for http: or https: with no alphanumeric chars beforehand.
[^)'"]+ ## match characters that are NOT ) or ' or "
## match as many as possible, at least one required.
If this matches false positives, you can of course look for a more refined URL regex.
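For comparison, the same pattern in JavaScript, run over the sample block from the question (HTTP_URL is an illustrative name):

```javascript
// The rematch() pattern from above in regex-literal syntax; inside a JS literal
// the single quote needs no doubling, unlike inside a single-quoted CFML string.
const HTTP_URL = /\bhttps?:[^)'"]+/g;

const block = [
  "background(http://w1.sndcdn.com/f15ikDS9X_m.png)",
  "background-image(http://w1.sndcdn.com/5ikDIlS9X_m.png)",
  "background('http://w1.sndcdn.com/m1kDIl9X_m.png')",
  "background-image('http://w1.sndcdn.com/fm15iIlS9X_m.png')",
  'background("http://w1.sndcdn.com/fm15iklS9X_m.png")',
  'background-image("http://w1.sndcdn.com/m5iIlS9X_m.png")',
].join("\n");

// With the g flag, match() returns every match (or null when there are none).
const urls = block.match(HTTP_URL) || [];
console.log(urls.length); // 6
```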
background(?:-image)?\((["']?)(?<url>http.*)\1\)
Explanation:
background(?:-image)? -> matches background or background-image (without capturing)
\( -> matches a literal opening parenthesis
(["']?) -> matches an optional ' or " before the URL and captures it (or nothing) as group 1
(?<url>http.*) -> matches the URL and captures it in the named group url
\1\) -> matches the same quote captured by group 1 (or nothing), then a literal closing parenthesis
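JavaScript also supports named groups (since ES2018), so this regex can be tried there as well. The sketch reads the url group from each match; backgroundUrls is a hypothetical helper name.

```javascript
// \1 must repeat whatever quote group 1 captured (possibly nothing), so quotes must pair up.
// http.* is greedy, so this assumes at most one background(...) per line.
const BACKGROUND_RE = /background(?:-image)?\((["']?)(?<url>http.*)\1\)/g;

function backgroundUrls(css) {
  // matchAll needs the g flag; m.groups.url is the named capture.
  return Array.from(css.matchAll(BACKGROUND_RE), (m) => m.groups.url);
}

console.log(backgroundUrls(`background-image("http://w1.sndcdn.com/m5iIlS9X_m.png")`));
// [ 'http://w1.sndcdn.com/m5iIlS9X_m.png' ]
```

Because of the backreference, a line with mismatched quotes, such as background('http://x.png"), is not matched at all.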
If you want an answer without regular expressions, something like this will work.
YourString = "background(http://w1.sndcdn.com/f15ikDS9X_m.png)";
YourString = ListLast(YourString, "("); // yields http://w1.sndcdn.com/f15ikDS9X_m.png)
YourString = replace(YourString, ")", ""); // http://w1.sndcdn.com/f15ikDS9X_m.png
Since you are doing it more than once, you can make it a function. Also, you might need some other replace commands to handle the quotes that are in some of your strings.
Having said all that, getting a regex to work would be better.
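The reusable function suggested above could look roughly like this, written in JavaScript rather than ColdFusion: slicing after the last "(" plays the role of ListLast, and stripping ), ' and " plays the role of the replace() calls. It assumes the URL itself contains no parentheses or quotes; urlFromBackground is a hypothetical name.

```javascript
// Hypothetical helper; assumes the URL contains no "(", ")" or quote characters.
function urlFromBackground(line) {
  // Like ListLast(line, "("): keep everything after the last "(".
  let url = line.slice(line.lastIndexOf("(") + 1);
  // Like the replace() calls: strip the closing parenthesis and both quote styles.
  for (const ch of [")", "'", '"']) {
    url = url.split(ch).join("");
  }
  return url;
}

console.log(urlFromBackground("background-image('http://w1.sndcdn.com/fm15iIlS9X_m.png')"));
// http://w1.sndcdn.com/fm15iIlS9X_m.png
```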