Regular Expressions (regex) in vb.net - regex

Regular Expressions in vb.net 2010
I want to extract the number between the font tags of a web page and show it in my VB.NET form:
<html>
....
When asked enter the code: <font color=blue>24006 </font>
....
</html>
The number is auto-generated.
I use:
Dim str As String = New WebClient().DownloadString(("http://www.example.com"))
Dim pattern = "When asked enter the code: <font color=blue>\d{5,}\s</font>"
Dim r = New Regex(pattern, RegexOptions.IgnoreCase)
Dim m As Match = r.Match(str)
If m.Success Then
Label1.Text = "Code" + m.Groups(1).ToString()
m = m.NextMatch()
Else
Debug.Print("Failed")
End If
But the output is only:
Code
Thanks, and sorry for my bad English.

Something like this should help you. Exception handling is up to you.
Dim matchCollection As MatchCollection = Regex.Matches("When asked enter the code: <font color=blue>24006 </font>", "<font color=.*?>(.*?)</font>", RegexOptions.None)
For Each match As Match In matchCollection
    ' Groups(0) is the whole match; Groups(1) is the text between the font tags
    If match.Groups.Count > 1 Then
        Console.WriteLine(match.Groups(1).Value)
    End If
Next
Or with a bit of LINQ:
Dim matchCollection As MatchCollection = Regex.Matches("When asked enter the code: <font color=blue>24006 </font>", "<font color=.*?>(.*?)</font>", RegexOptions.None)
For Each match As Match In From match1 As Match In matchCollection Where match1.Groups.Count > 1
    Console.WriteLine(match.Groups(1).Value)
Next
For more information, see VB.NET Regex.Match and VB.NET Regex.Matches.

You should not use regex to parse HTML.
Options:
A parser like HTML Agility Pack
The parser exposed in HTMLDocument.GetElementsByTagName
Any other HTML parser

Related

regex .NET to find and replace underscores only if found between > and <

I have a list of strings looking like this:
Title_in_Title_by_-_Mr._John_Doe
and I need to replace each _ with a space, but ONLY in the text between html"> and </a>,
so that the result looks like this:
Title in Title by - Mr. John Doe
I've tried to do it in two steps:
first isolate that part with one of .*html">(.*)<\/a.* or ^.*>(.*)<.* or .*>.*<.* or ^.*>.*<.*
and then do the replacement, but the returned string is always unchanged and now I'm stuck.
Any help with this is much appreciated.
I would Split it and then Replace it; no regex needed.
Dim line As String = "Title_in_Title_by_-_Mr._John_Doe"
' Assumes line holds the full anchor markup, so the title follows the first ">"
Dim split As String() = line.Split(">"c)
Dim correctString As String = split(1).Replace("_"c, " "c)
Done. See the String.Replace article for details.
If you have to use regex, though, this would probably be a better way of doing it:
Dim inputString = "Title_in_Title_by_-_Mr._John_Doe"
Dim reg As New Regex("(?<=\>).*?(?=\<)")
Dim correctString = reg.Match(inputString).Value.Replace("_"c, " "c)
Dim line as string = "Title_and_Title_by_-_Mr._John_Doe"
line = Regex.Replace(line, "(?<=\.html"">)[^<>]+(?=</a>)", _
Function (m) m.Value.Replace("_", " "))
This uses a regex with lookarounds to isolate the title, and a MatchEvaluator delegate in the form of a lambda expression to replace the underscores in the title, then it plugs the result back into the string.
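As a self-contained sketch of the lookaround approach: the markup in the question did not survive, so the sample line below is an assumed anchor whose href ends in .html, and the module and helper names (TitleCleaner, ReplaceTitleUnderscores) are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module TitleCleaner
    ' Hypothetical helper: replaces underscores only in the link text that
    ' sits between .html"> and </a>, leaving the href untouched.
    Function ReplaceTitleUnderscores(line As String) As String
        Return Regex.Replace(line, "(?<=\.html"">)[^<>]+(?=</a>)",
                             Function(m As Match) m.Value.Replace("_", " "))
    End Function

    Sub Main()
        ' Assumed sample input; the real markup is not shown in the question.
        Dim line As String = "<a href=""Title_in_Title.html"">Title_in_Title_by_-_Mr._John_Doe</a>"
        Console.WriteLine(ReplaceTitleUnderscores(line))
    End Sub
End Module
```

Because the lookbehind and lookahead are zero-width, only the title text is passed to the lambda, so the underscores inside the href stay as they are.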

Regular expression for extracting Classic ASP include file names

I am searching for a Regular Expression that can help me extract filename.asp from the below string. It seems like a simple task, but I am unable to find a solution.
This is my input:
<!-- #include file="filename.asp" -->
I want output of regular expression like this:
filename.asp
I did some research and found the following solution.
Regular Expression:
/#include\W+file="([^"]+)"/g
Example code (VB.NET):
Dim list As New List(Of String)
Dim regex = New System.Text.RegularExpressions.Regex("#include\W+file=""([^""]+)""")
Dim matchResult = regex.Match(filetext)
While matchResult.Success
list.Add(matchResult.Groups(1).Value)
matchResult = matchResult.NextMatch()
End While
Example code (C#):
var list = new List<string>();
var regex = new Regex("#include\\W+file=\"([^\"]+)\"");
var matchResult = regex.Match(fileContent);
while (matchResult.Success) {
list.Add(matchResult.Groups[1].Value);
matchResult = matchResult.NextMatch();
}
Improved Regular Expression (ignores spaces):
#include\W+file[\s]*=[\s]*"([^"]+)"
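A minimal sketch of the improved pattern in use, assuming the page source is already in a string. The module and helper names (IncludeFinder, FindIncludes) and the two sample directives are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module IncludeFinder
    ' Hypothetical helper: returns every file name referenced by an
    ' include directive, tolerating spaces around "=".
    Function FindIncludes(fileText As String) As List(Of String)
        Dim names As New List(Of String)
        For Each m As Match In Regex.Matches(fileText, "#include\W+file\s*=\s*""([^""]+)""")
            names.Add(m.Groups(1).Value)
        Next
        Return names
    End Function

    Sub Main()
        ' Assumed sample input with and without spaces around "=".
        Dim sample As String = "<!-- #include file=""header.asp"" -->" & Environment.NewLine &
                               "<!-- #include file = ""footer.asp"" -->"
        For Each fileName As String In FindIncludes(sample)
            Console.WriteLine(fileName)
        Next
    End Sub
End Module
```

Using Regex.Matches with For Each is equivalent to the Match/NextMatch loop above; it is only a shorter way to walk the same matches.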

vb.net Regex - Replace a tags without replacing span tags

My function needs to strip the a tags from a string when the text inside them is a URL.
For example:
<a href=www.cnn.com>www.cnn.com</a>
will be replaced with:
www.cnn.com
That works fine, but when I have a string like:
www.cnn.com</span>
I get only:
www.cnn.com
when I actually want to keep:
<span style="color: rgb(255, 0, 0);">www.cnn.com</span>
What do I need to add to the code to make it work?
This is my function:
Dim ret As String = text
'If it looks like a URL
Dim regURL As New Regex("(www|\.org\b|\.com\b|http)")
'Gets a Tags regex
Dim rxgATags = New Regex("<[^>]*>", RegexOptions.IgnoreCase)
'Gets all matches of <a></a> and adds them to a list
Dim matches As MatchCollection = Regex.Matches(ret, "<a\b[^>]*>(.*?)</a>")
'for each <a></a> in the text check it's content, if it looks like URL then delete the <a></a>
For Each m In matches
'tmpText holds the data extracted within the a tags. /visit at.../www.applyhere.com
Dim tmpText = rxgATags.Replace(m.ToString, "")
If regURL.IsMatch(tmpText) Then
ret = ret.Replace(m.ToString, tmpText)
End If
Next
Return ret
The following regex will remove all HTML tags:
string someString = "www.visitus.com</span>";
string target = System.Text.RegularExpressions.Regex.Replace(someString, @"<[^>]*>", "", RegexOptions.Compiled);
This is the regex you want: <[^>]*>
Result of my code: www.visitus.com
You may use the following regex, <a\s*[^<>]*>|</a>, which matches all <a> tags, both opening and closing ones.
You do not need regURL; it can be built into the rxgATags regex. We can make sure it is a URL-referencing <a> tag by checking href against the regURL alternatives, then capture everything between the opening and closing tags and keep only that.
Dim ret As String = "www.visitus.com</span>"
'Gets a Tags regex
Dim rxgATags = New Regex("(<a\s*[^<>]*href=[""']?(?:www|\.org\b|\.com\b|http)[^<>]*>)((?>\s*<(?<t>[\w.-]+)[^<>]*?>[^<>]*?</\k<t>>\s*)+)(</a>)", RegexOptions.IgnoreCase)
Dim replacement As String = "$2"
ret = rxgATags.Replace(ret, replacement)
I added this to my code:
'Selects only the A tags without the data extracted between them
Dim rxgATagsOnly = New Regex("</?a\b[^>]*>", RegexOptions.IgnoreCase)
For Each m In matches
'tmpText holds the data extracted within the a tags. /visit at.../www.applyhere.com
Dim tmpText = rxgATagsContent.Replace(m.ToString, "")
'if the data extract between the tags looks like a URL then take off the a tags without touching the span tags.
If regURL.IsMatch(tmpText) Then
'select everything but a tags
Dim noATagsStr As String = rxgATagsOnly.Replace(m.ToString, Environment.NewLine)
'replaces string with a tag to non a tag string keeping it's span tags
ret = ret.Replace(m.ToString, noATagsStr)
End If
Next
So from the string:
www.cnn.com</span>
I selected only the a tags with Avinash Raj's regex
and then replaced them with "".
Thank you all for answering.
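The same idea can be sketched in one pass with a MatchEvaluator: strip the a tags only when the visible text looks like a URL, and keep whatever markup sits inside them. The module and helper names (AnchorStripper, UnwrapUrlAnchors) and the sample strings are assumptions for illustration, and nested a tags are not handled.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module AnchorStripper
    ' Matches one <a ...>...</a> pair and captures what is between the tags.
    Private ReadOnly AnchorRegex As New Regex("<a\b[^>]*>(.*?)</a>",
                                              RegexOptions.IgnoreCase Or RegexOptions.Singleline)
    ' Matches any tag; used only to get the visible text for the URL check.
    Private ReadOnly AnyTagRegex As New Regex("<[^>]*>")
    ' The URL heuristic from the question.
    Private ReadOnly UrlRegex As New Regex("(www|\.org\b|\.com\b|http)")

    ' Hypothetical helper: drops the a tags around URL-like text but keeps
    ' inner markup such as span tags. Other anchors are left unchanged.
    Function UnwrapUrlAnchors(text As String) As String
        Return AnchorRegex.Replace(text,
            Function(m As Match)
                Dim inner As String = m.Groups(1).Value
                Dim visibleText As String = AnyTagRegex.Replace(inner, "")
                Return If(UrlRegex.IsMatch(visibleText), inner, m.Value)
            End Function)
    End Function

    Sub Main()
        ' Assumed sample input; the original markup in the question was lost.
        Dim sample As String = "<a href=""http://www.cnn.com""><span style=""color: rgb(255, 0, 0);"">www.cnn.com</span></a>"
        Console.WriteLine(UnwrapUrlAnchors(sample))
    End Sub
End Module
```

Returning the captured group instead of the tag-stripped text is what preserves the span.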

Extract variables from pattern matching

I'm not a pattern-matching expert and I've been working on this for a few hours with no luck :/
I have an input string just like this:
Dim text As String = "32 Barcelona {GM C} 2 {*** Some ""cool"" text here}"
And I just want to extract 3 things:
Barcelona
GM C
*** Some "cool" text here
The pattern I'm trying is something like this:
Dim pattern As String = "^32\s(?<city>[^]].*\s)\{(?<titles>.*\})*"
Dim m As Match = Regex.Match(text, pattern)
If (m.Success) Then
Dim group1 As Group = m.Groups.Item("city")
Dim group2 As Group = m.Groups.Item("titles")
If group1.Success Then
MsgBox("City:" + group1.Value + ":", MsgBoxStyle.Information)
End If
If group2.Success Then
MsgBox(group2.Value, MsgBoxStyle.Information)
End If
Else
MsgBox("fail")
End If
But it's not working :(
What should the pattern be to extract these 3 variables ?
^\d*(?<City>[A-Z a-z0-9]*)\s*\{(?<Titles>[A-Z a-z0-9]*)\}.*?\{(?<Cool>.*?)\}$
Seems to match your sample input.
Expresso is a great tool for designing regular expressions.
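A short sketch of reading the three named groups from that pattern, using the sample string from the question. The module and helper names (NamedGroupsDemo, ExtractParts) are hypothetical. Note that the City group also captures the spaces around the name, so the sketch trims it.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NamedGroupsDemo
    ' Hypothetical helper: returns city, titles and the free text,
    ' or Nothing when the input does not match the pattern.
    Function ExtractParts(text As String) As String()
        Dim pattern As String = "^\d*(?<City>[A-Z a-z0-9]*)\s*\{(?<Titles>[A-Z a-z0-9]*)\}.*?\{(?<Cool>.*?)\}$"
        Dim m As Match = Regex.Match(text, pattern)
        If Not m.Success Then Return Nothing
        ' City includes the leading and trailing spaces, so trim it.
        Return New String() {m.Groups("City").Value.Trim(),
                             m.Groups("Titles").Value,
                             m.Groups("Cool").Value}
    End Function

    Sub Main()
        Dim text As String = "32 Barcelona {GM C} 2 {*** Some ""cool"" text here}"
        Dim parts As String() = ExtractParts(text)
        If parts IsNot Nothing Then
            For Each part As String In parts
                Console.WriteLine(part)
            Next
        End If
    End Sub
End Module
```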

How to Display Value between html tags in Vb.net Form

I want to extract the number between the font tags of a web page and show it in my VB.NET form:
<html>
...
When asked enter the code: <font color=blue>24006 </font>
...
</html>
The 24006 is an auto-generated number that changes automatically.
I use:
Dim str As String = New WebClient().DownloadString(("http://www.example.com"))
Dim pattern = "When asked enter the code: <font color=blue>\d{5,}\s</font>"
Dim r = New Regex(pattern, RegexOptions.IgnoreCase)
Dim m As Match = r.Match(str)
If m.Success Then
Label1.Text = "Code" + m.Groups(1).ToString()
m = m.NextMatch()
Else
Debug.Print("Failed")
End If
But I only get this output in Label1:
Code
You have to add a capturing group. The regex should be "When asked enter the code: <font color=blue>(\d{5,})\s<\/font>" (notice the parentheses around \d{5,}). Without them the pattern has no group 1, so m.Groups(1) is empty; with them it holds the digits.
Regards
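A minimal sketch of the corrected pattern, run against the HTML fragment from the question as a string literal instead of a downloaded page. The module and helper names (CodeExtractor, ExtractCode) are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CodeExtractor
    ' Hypothetical helper: returns the digits between the font tags,
    ' or Nothing when the page does not contain the expected text.
    Function ExtractCode(html As String) As String
        ' The parentheses create group 1, which holds only the digits.
        Dim pattern As String = "When asked enter the code: <font color=blue>(\d{5,})\s</font>"
        Dim m As Match = Regex.Match(html, pattern, RegexOptions.IgnoreCase)
        Return If(m.Success, m.Groups(1).Value, Nothing)
    End Function

    Sub Main()
        ' Stand-in for the string returned by WebClient.DownloadString.
        Dim html As String = "<html>When asked enter the code: <font color=blue>24006 </font></html>"
        Dim code As String = ExtractCode(html)
        If code IsNot Nothing Then
            Console.WriteLine("Code " & code)
        Else
            Console.WriteLine("Failed")
        End If
    End Sub
End Module
```

In the form itself, the returned value would be assigned to Label1.Text instead of being written to the console.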