RegEx to match text and enclosing braces

I need to loop through all the matches in say the following string:
<a href='/Product/Show/{ProductRowID}'>{ProductName}</a>
I am looking to capture the values in the {} including them, so I want {ProductRowID} and {ProductName}
Here is my code so far:
Dim r As Regex = New Regex("{\w*}", RegexOptions.IgnoreCase)
Dim m As Match = r.Match("<a href='/Product/Show/{ProductRowID}'>{ProductName}</a>")
Is my RegEx pattern correct? How do I loop through the matched values? I feel like this should be super easy but I have been stumped on this this morning!

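Your pattern is close; a more defensive version is:
\{\w*?\}
Escaping the curly braces guarantees they are read as literals rather than as a quantifier such as {2}. The non-greedy star is harmless but not required: \w cannot match ' or >, so even a greedy \w* stops at the first closing brace and can never produce "{ProductRowID}'>{ProductName}".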
Dim r As Regex = New Regex("\{\w*?\}")
Dim input As String = "<a href='/Product/Show/{ProductRowID}'>{ProductName}</a>"
' Call Matches on the Regex instance; the shared overload expects a pattern string, not a Regex
Dim mc As MatchCollection = r.Matches(input)
For Each m As Match In mc
    MsgBox(m.Value)
Next m
RegexOptions.IgnoreCase is not needed, because the pattern contains no literal letters and \w already matches both upper and lower case.

You can just group your matches using a regex like the following:
<a href='/Product/Show/(.+)'\>(.+)</a>
In this way you have $1 and $2 matching the values you want to get.
You can also give your groups names so that they are retrieved by name rather than by position:
<a href='/Product/Show/(?<rowid>.+)'\>(?<name>.+)</a>
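As a rough sketch of how the named groups above might be read back in VB.NET (the module name is made up for illustration, and the input is the sample string from the question):

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NamedGroupDemo ' hypothetical name, for illustration only
    Sub Main()
        Dim input As String = "<a href='/Product/Show/{ProductRowID}'>{ProductName}</a>"
        Dim m As Match = Regex.Match(input, "<a href='/Product/Show/(?<rowid>.+)'\>(?<name>.+)</a>")
        If m.Success Then
            ' Named groups are looked up by name...
            Console.WriteLine(m.Groups("rowid").Value)
            Console.WriteLine(m.Groups("name").Value)
            ' ...but are still reachable by position (1 is the first group)
            Console.WriteLine(m.Groups(1).Value)
        End If
    End Sub
End Module
```

Note that this approach ties the pattern to one specific anchor layout, whereas a brace-only pattern finds placeholders anywhere in the string.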

Change your RegEx pattern to \{\w*\} then it will match as you expect.
You can test it with an online .net RegEx tester.
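If no online tester is at hand, a small console program can serve as the check. The sketch below (module name invented for illustration) runs both the greedy and the lazy form of the pattern against the sample string from the question and prints every match, so the two can be compared directly:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module BracePatternCheck ' hypothetical name, for illustration only
    Sub Main()
        Dim input As String = "<a href='/Product/Show/{ProductRowID}'>{ProductName}</a>"
        ' Try the greedy and the lazy variant side by side
        For Each pattern As String In {"\{\w*\}", "\{\w*?\}"}
            Console.WriteLine("Pattern: " & pattern)
            For Each m As Match In Regex.Matches(input, pattern)
                Console.WriteLine("  " & m.Value)
            Next
        Next
    End Sub
End Module
```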

Related

C# Regex not matching string spanning multiple lines

In c# I have a regex and I can't get my head around why it is not matching.
The pattern (abc\r\n)* should match the abc\r\nabc\r\n in the string 123\r\nabc\r\nabc\r\n345
Regex regex = new Regex("(abc\r\n)*", RegexOptions.Compiled);
var mat = regex.Match("123\r\nabc\r\nabc\r\n345");
The funny thing is that mat.Success returns true.
The same pattern matches online
The Match method works as expected.
Actually, the pattern (abc\r\n)* finds many matches in that input, most of them empty, because * also allows zero repetitions. The Match method returns only the first match, which is the empty string at index 0.
So if you want to match abc\r\nabc\r\n exactly, use this pattern:
Regex regex = new Regex("(abc\r\nabc\r\n)", RegexOptions.Compiled);
If you want to match every abc\r\n, use:
Regex regex = new Regex("(abc\r\n)", RegexOptions.Compiled);
var mat = regex.Matches("123\r\nabc\r\nabc\r\n345");
And so on. The bottom line is that the problem is in the pattern itself.
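The empty first match is easy to see by printing the match position and length. The following is a minimal VB.NET sketch of that check (VB.NET rather than the question's C#, and the module name is invented). Switching the quantifier from * to + is one possible fix, offered here as an assumption rather than something from the answer above; it requires at least one repetition, so empty matches disappear.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EmptyMatchDemo ' hypothetical name, for illustration only
    Sub Main()
        ' VB.NET string literals have no \r\n escapes, so the input is built with vbCrLf;
        ' inside the pattern, \r\n is interpreted by the regex engine itself
        Dim input As String = "123" & vbCrLf & "abc" & vbCrLf & "abc" & vbCrLf & "345"

        ' With *, the first match is the zero-length match at the start of the input
        Dim first As Match = Regex.Match(input, "(abc\r\n)*")
        Console.WriteLine("Success={0}, Index={1}, Length={2}", first.Success, first.Index, first.Length)

        ' With +, only the non-empty run of abc\r\n lines is reported
        For Each m As Match In Regex.Matches(input, "(abc\r\n)+")
            Console.WriteLine("Index={0}, Length={1}", m.Index, m.Length)
        Next
    End Sub
End Module
```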

Regex ignore optional HTML tags inside captured group

I just wanted to start off by saying I am a VB.Net user and I know all the concerns regarding HTML and regular expressions. This is simply for my own learning, so please don't suggest alternative ways.
Now for the HTML
<td class="alt1 username">Stack
<td class="alt1 username"><font color="#FF0000"><strong>Overflow</strong></font>
Now you can see the optional font and strong tags. My current pattern captures the first example fine, but in the second it also captures the optional tags. I know why my pattern fails; I'm just unsure how to allow for optional tags. Maybe it's not possible?
(?<=).+?(?=)
Thanks as always
Use this in case-insensitive mode:
[^<>]+(?=(?:\s*</(?!a>)[^>]*>)*\s*</a>)
See the matches in the regex demo.
To get all the matches in VB.NET:
Dim ResultList As StringCollection = New StringCollection()
Try
    Dim RegexObj As New Regex("[^<>]+(?=(?:</(?!a>)[^>]*>)*</a>)", RegexOptions.IgnoreCase)
    Dim MatchResult As Match = RegexObj.Match(SubjectString)
    While MatchResult.Success
        ResultList.Add(MatchResult.Value)
        MatchResult = MatchResult.NextMatch()
    End While
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try
Explanation
[^<>]+ matches chars that are neither < nor > (this is your match)
the lookahead (?=(?:</(?!a>)[^>]*>)*</a>) asserts that what follows is...
(?:</(?!a>)[^>]*>)* zero or more of tags that are not </a>, i.e. </ not followed by a>, then non-> chars, then >
then the closing </a>
Extended Spec
If you want the regex to only match when the class username is present, use this instead:
(?<=<td class="[^"]*username"><a(?:(?!</a).)+)[^<>]+(?=(?:\s*</(?!a>)[^>]*>)*\s*</a>)

Regex match a specific string

I am trying to extract the string <Num> from within Barcode(_<Num>_).PDF using Regex. I am looking at Regular Expression Language - Quick Reference but it is not easy. Thanks for any help.
Dim pattern As String = "^Barcode(_+_)\.pdf"
If Regex.IsMatch("Barcode(_abc123_).pdf", pattern) Then
Debug.Print("match")
End If
If you are trying not only to match but also to read the value of <Num> into a variable, you need to call the Regex.Match method instead of the Boolean IsMatch method. Match returns a Match object that gives you access to the groups and captures from your pattern.
Your pattern would need to be something like "Barcode\(_(.*)_\)\.pdf". Note the inner parentheses, which create a capture group holding the string between the underscores. See the MSDN docs for examples of almost exactly what you are doing.
I don't know the regex in VB, but I can offer you a website to check the correctness of your regex: Regex Tester. In this case, if <Num> is numeric, you can use "Barcode\(_(\d+)_\)\.pdf". The literal parentheses and the dot must be escaped, and the inner group captures the digits.
Just for the record, this is what I ended up using:
'set up regex
'I'm using + instead of * in the pattern to ensure that if no value is
'present the match will fail
Dim pattern As String = "Barcode\(_(.+)_\)\.pdf"
Dim r As Regex = New Regex(pattern, RegexOptions.IgnoreCase)
'get match
Dim mat As Match
mat = r.Match("Barcode(_abc123_).pdf")
'output the matched string
If mat.Success Then
    Dim g As Group = mat.Groups(1)
    Dim cc As CaptureCollection = g.Captures
    Dim c As Capture = cc(0)
    Debug.Print(c.ToString)
End If
.NET Framework Regular Expressions

Regular expression to find specific text within a string enclosed in two strings, but not the entire string

I have this type of text:
string1_dog_bit_johny_bit_string2
string1_cat_bit_johny_bit_string2
string1_crocodile_bit_johny_bit_string2
string3_crocodile_bit_johny_bit_string4
string4_crocodile_bit_johny_bit_string5
I want to find all occurrences of “bit” that occur only between string1 and string2. How do I do this with regex?
I found the question Regex Match all characters between two strings, but the regex there matches the entire string between string1 and string2, whereas I want to match just parts of that string.
I am doing a global replacement in Notepad++. I just need regex, code will not work.
Thank you in advance.
Roman
If I understand correctly, here is some code that does what you want:
// requires: using System; using System.Collections.Generic; using System.Linq; using System.Text.RegularExpressions;
var input = new List<string>
{
    "string1_dog_bit_johny_bit_string2",
    "string1_cat_bit_johny_bit_string2",
    "string1_crocodile_bit_johny_bit_string2",
    "string3_crocodile_bit_johny_bit_string4",
    "string4_crocodile_bit_johny_bit_string5"
};
// Verbatim string literal (@"...") so the pattern needs no extra escaping
Regex regex = new Regex(@"(?<bitGroup>bit)");
var allMatches = new List<string>();
foreach (var str in input)
{
    // Only search lines that start with string1 and end with string2
    if (str.StartsWith("string1") && str.EndsWith("string2"))
    {
        var matchCollection = regex.Matches(str);
        allMatches.AddRange(matchCollection.Cast<Match>().Select(match => match.Groups["bitGroup"].Value));
    }
}
Console.WriteLine("All matches {0}", allMatches.Count);
This regex will do the job:
^string1_(?:.*(bit))+.*_string2$
^ means the start of the text (or line if you use the m option like so: /<regex>/m )
$ means the end of the text
. means any character
* means the previous character/expression is repeated 0 or more times
(?:<stuff>) means a non-capturing group (<stuff> won't be captured as a result of the matching)
You could use ^string1_(.*(bit).*)*_string2$ if you don't care about performance or don't have large/many strings to check. The outer parentheses allow multiple occurrences of "bit".
If you provide us with the language you want to use, we could give more specific solutions.
edit: As you added that you're trying a replacement in Notepad++ I propose the following:
Use (?<=string1_)(.*)bit(.*)(?=_string2) as the regex and $1xyz$2 as the replacement pattern (replace xyz with your string). Then perform a "replace all" operation repeatedly until N++ doesn't find any more matches. The drawback is that this regex matches only one bit per line per pass, so it has to be applied more than once.
Btw. even if a regexp matches the whole line, you can still only replace parts of it using capturing groups.
You can use the regex:
(?:string1|\G)(?:(?!string2).)*?\Kbit
regex101 demo. Tried it on notepad++ as well and it's working.
There is a description on the demo site, but if you want more explanation, let me know and I'll elaborate!
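For readers who need the same replacement from .NET code rather than Notepad++: System.Text.RegularExpressions does not support \K, but it does allow variable-length lookbehind, so a different pattern can express the same idea. The sketch below is an assumption-laden alternative, not one of the answers above; the pattern, the module name and the replacement text xyz are all illustrative. It requires "bit" to be preceded by string1_ at the start of the line and followed by _string2 at the end of the line.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module BitBetweenDemo ' hypothetical name, for illustration only
    Sub Main()
        ' Two sample lines from the question, joined with LF so that $ sits directly after "string2"
        Dim text As String = "string1_dog_bit_johny_bit_string2" & vbLf &
                             "string3_crocodile_bit_johny_bit_string4"

        ' Lookbehind: line starts with string1_ ; lookahead: line ends with _string2
        Dim pattern As String = "(?<=^string1_.*)bit(?=.*_string2$)"

        ' Multiline makes ^ and $ match at line boundaries instead of only at the ends of the input
        Dim result As String = Regex.Replace(text, pattern, "xyz", RegexOptions.Multiline)

        ' Only the first line should have its "bit" occurrences replaced
        Console.WriteLine(result)
    End Sub
End Module
```

With CRLF line endings the trailing \r would sit between string2 and the line break, so the input would need normalising (or the pattern would need an optional \r before $).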

RegEx pattern to extract URLs

I have to extract everything between these characters:
<a href="/url?q=(text to extract whatever it is)&amp
I tried this pattern, but it's not working for me:
/(?<=url\?q=).*?(?=&amp)/
I'm programming in Vb.net, this is the code, but I think that the problem is that the pattern is wrong:
Dim matches As MatchCollection
matches = regex.Matches(TextBox1.Text)
For Each Match As Match In matches
listbox1.items.add(Match.Value)
Next
Could you help me please?
Your regex seems to be correct except for the slashes (/) at the beginning and end of the expression. Remove them:
Dim regex = New Regex("(?<=url\?q=).*?(?=&amp)")
and it should work.
Some utilities and languages use / (forward slash) to delimit the search expression; others may use single quotes. With System.Text.RegularExpressions.Regex you don't need any delimiters.
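Put together, a minimal console version of the loop from the question might look like the sketch below. The sample HTML string and the module name are invented for illustration, and Console.WriteLine stands in for adding the value to the list box.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module UrlExtractDemo ' hypothetical name, for illustration only
    Sub Main()
        ' Made-up sample input in the shape described in the question
        Dim html As String = "<a href=""/url?q=http://example.com/page&amp;sa=U"">Example</a>"

        ' No leading or trailing slash: the constructor takes the bare pattern
        Dim rx As New Regex("(?<=url\?q=).*?(?=&amp)")

        For Each m As Match In rx.Matches(html)
            Console.WriteLine(m.Value) ' the text between "url?q=" and "&amp"
        Next
    End Sub
End Module
```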
The regex below will extract all URLs from your text (or any other):
(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?