How to extract text using regex in VB6

Dim MyText As String
MyText = "anything [*]textToExtract[*] anything"
The result should be:
textToExtract

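Sub test()
    ' Requires a project reference to "Microsoft VBScript Regular Expressions 5.5"
    Dim re As RegExp, m As MatchCollection
    Dim MyText As String, extractedText As String
    Set re = New RegExp
    MyText = "anything textToExtract anything"
    re.Pattern = "anything (.*) anything"
    Set m = re.Execute(MyText)
    ' SubMatches(0) holds the text captured by the first parenthesized group
    If m.Count > 0 Then extractedText = m(0).SubMatches(0)
End Sub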
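The answer above matches the surrounding words rather than the literal [*] markers from the question. As a rough illustration only, here is a Python sketch (not VB6) of extracting the text between two literal markers. The function name extract_between is hypothetical. The pattern it builds, \[\*\](.*?)\[\*\], uses only escaped literals and a lazy capture group.

```python
import re

def extract_between(text, marker="[*]"):
    """Return the text between the first pair of literal markers, or None."""
    # re.escape makes "[*]" match literally; (.*?) captures as little as possible
    pattern = re.escape(marker) + r"(.*?)" + re.escape(marker)
    match = re.search(pattern, text)
    return match.group(1) if match else None

print(extract_between("anything [*]textToExtract[*] anything"))  # textToExtract
```

The lazy quantifier matters when a line contains more than one marked region: a greedy (.*) would run from the first marker to the last one.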

Related

Replace part of String, using regex and vb.net

My application stores the path of the userdata.dll in a String.
I need to convert this string: C:\Applications\User\userdata.dll
into this: C:\\Applications\\User\\userdata.dll
Every \ needs to be doubled, regardless of how many \ the path contains.
Something like:
Dim defaultPath As String = "C:\Applications\User\userdata.dll"
' Regex
Dim r As Regex = New Regex( ... )
' This is the replacement string
Dim Replacement As String = " ... $1 ... "
' Replace the matched text in the InputText using the replacement pattern
Dim modifiedPath As String = r.Replace(defaultPath,Replacement)
Any help on this? I am trying to follow this question:
How to replace some part of this string with vb.net?
But I can't figure out how to write this Regex...
You can use
Dim pattern As String = "\\"
Dim rgx As New Regex(pattern)
Dim input As String = "C:\Applications\User\userdata.dll"
Dim result As String = rgx.Replace(input, "\\")
Console.WriteLine(result)
If you mean to replace any run of consecutive \ characters with \\, then you can use
Dim pattern As String = "\\+"
Dim rgx As New Regex(pattern)
Dim input As String = "C:\\\\Applications\User\userdata.dll"
Dim result As String = rgx.Replace(input, "\\")
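For comparison, the same two replacements can be sketched in Python (not VB.NET). The function names are hypothetical. One difference to keep in mind: in .NET a backslash has no special meaning in the replacement string, while Python's re.sub does process backslash escapes there, so this sketch passes a function as the replacement to keep the two backslashes literal.

```python
import re

def double_backslashes(path):
    # The regex \\ matches one literal backslash; each one becomes two
    return re.sub(r"\\", lambda m: "\\\\", path)

def collapse_to_double(path):
    # \\+ matches a run of one or more backslashes; each run becomes exactly two
    return re.sub(r"\\+", lambda m: "\\\\", path)

print(double_backslashes(r"C:\Applications\User\userdata.dll"))
print(collapse_to_double("C:\\\\\\\\Applications\\User\\userdata.dll"))
# both lines print C:\\Applications\\User\\userdata.dll
```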

Replace the second char of a string

I have a string variable.
Dim str As String = "ABBCD"
I want to replace only the second 'B' character of str (that is, the second occurrence).
My code:
Dim regex As New Regex("B")
Dim result As String = regex.Replace(str, "x", 2)
'result: AxxCD
'but I want: ABxCD
What's the easiest way to do this with regular expressions?
Thanks.
Dim str As String = "ABBCD"
Dim matches As MatchCollection = Regex.Matches(str, "B")
If matches.Count >= 2 Then
    str = str.Remove(matches(1).Index, matches(1).Length)
    str = str.Insert(matches(1).Index, "x")
End If
First we declare the string str, then find all matches of "B". If there are two or more matches, we remove the second one and insert "x" at the same index.
How about:
' $+ inserts the last captured group, so "BB" becomes "Bx"
resultString = Regex.Replace(subjectString, "(B)\1", "$+x")
Use a positive lookbehind:
Dim regex As New Regex("(?<=B)B")
If ABCABCABC should produce ABCAxCABC, then the following regex will work:
(?<=^[^B]*B[^B]*)B
Usage:
Dim result As String = Regex.Replace(str, "(?<=^[^B]*B[^B]*)B", "x")
I assume BB was just an example; it could be CC, DD, EE, etc.
Based on that, the regex below replaces the second character of every doubled word character in the string.
resultString = Regex.Replace(subjectString, "(\w)\1", "$1x")
'Alternative way to replace the second occurrence
'only of B in the string with X
Dim str As String = "ABBCD"
Dim pattern As String = "B"
Dim reg As Regex = New Regex(pattern)
Dim replacement As String = "X"
'find position of second B
Dim secondBpos As Integer = Regex.Matches(str, pattern)(1).Index
'replace that B with X
Dim result As String = reg.Replace(str, replacement, 1, secondBpos)
MessageBox.Show(result)
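The "find the second match, then replace at its position" idea used in several answers above generalizes to the n-th occurrence. A minimal Python sketch (not VB.NET), with a hypothetical helper name replace_nth:

```python
import re

def replace_nth(text, pattern, replacement, n):
    """Replace only the n-th (1-based) match of pattern; return text unchanged if there is none."""
    for i, match in enumerate(re.finditer(pattern, text), start=1):
        if i == n:
            # Rebuild the string around the matched span
            return text[:match.start()] + replacement + text[match.end():]
    return text

print(replace_nth("ABBCD", "B", "x", 2))      # ABxCD
print(replace_nth("ABCABCABC", "B", "x", 2))  # ABCAxCABC
```

Unlike the (B)\1 and (?<=B)B patterns, this does not require the two occurrences to be adjacent.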

vb.net how to check if a string has a certain word

I want to check whether a string contains a whole word from an array, not just a matching run of characters. The Contains method only matches characters, not whole words. Here is my code:
Dim builder As New StringBuilder()
Dim reader As New StringReader(txtOCR.Text)
Dim titles() As String = {"the", "a", "an", "of"}
Dim regex As New Regex(String.Join("|", titles), RegexOptions.IgnoreCase)
While True
    Dim line As String = reader.ReadLine()
    If line Is Nothing Then Exit While
    Dim WordCount = New Regex("\w+").Matches(line).Count
    If WordCount = 1 AndAlso Not line.ToLower().Contains("by") Then
        builder.AppendLine(line)
    ElseIf regex.IsMatch(line) Then
        builder.AppendLine(line)
    End If
End While
txtTitle.Text = builder.ToString()
You can use word boundaries (\b) around the individual words, keeping the IgnoreCase option from your original code:
New Regex("\b" & String.Join("\b|\b", titles) & "\b", RegexOptions.IgnoreCase)
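To see what the word boundaries change, here is a small Python sketch (not VB.NET) that builds an equivalent pattern from the same word list. The names word_re and has_title_word are hypothetical. It groups the alternatives so that a single pair of \b anchors covers all of them, and escapes each word in case it contains regex metacharacters.

```python
import re

titles = ["the", "a", "an", "of"]

# \b(?:the|a|an|of)\b : one boundary pair around the whole alternation
pattern = r"\b(?:" + "|".join(re.escape(t) for t in titles) + r")\b"
word_re = re.compile(pattern, re.IGNORECASE)

def has_title_word(line):
    return word_re.search(line) is not None

print(has_title_word("Another banana"))  # False: "an" and "a" occur only inside words
print(has_title_word("The Art of War"))  # True: "The" and "of" are whole words
```

Without the boundaries, "Another banana" would match because the plain alternation finds "a" inside both words.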

regex code to extract html between 2 comments in vb.net not working

I'm trying to extract a portion of HTML between two comments.
Here is the test code:
Sub Main()
    Dim base_dir As String = "D:\"
    Dim test_file As String = base_dir & "72.htm"
    Dim start_comment As String = "<!-- start of content -->"
    Dim end_comment As String = "<!-- end of content -->"
    Dim regex_pattern As String = start_comment & ".*" & end_comment
    Dim input_text As String = start_comment & "some more html text" & end_comment
    Dim match As Match = Regex.Match(input_text, regex_pattern)
    If match.Success Then
        Console.WriteLine("found {0}", match.Value)
    Else
        Console.WriteLine("not found")
    End If
    Console.ReadLine()
End Sub
The above works.
When I load the actual data from disk, the code below fails:
Sub Main()
    Dim base_dir As String = "D:\"
    Dim test_file As String = base_dir & "72.htm"
    Dim start_comment As String = "<!-- start of content -->"
    Dim end_comment As String = "<!-- end of content -->"
    Dim regex_pattern As String = start_comment & ".*" & end_comment
    Dim input_text As String = System.IO.File.ReadAllText(test_file).Replace(vbCrLf, "")
    Dim match As Match = Regex.Match(input_text, regex_pattern)
    If match.Success Then
        Console.WriteLine("found {0}", match.Value)
    Else
        Console.WriteLine("not found")
    End If
    Console.ReadLine()
End Sub
The HTML file contains the start and end comments and a good amount of HTML in-between.
Some content in the HTML file is in Arabic.
With thanks and regards.
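Try passing RegexOptions.Singleline to Regex.Match(...) like this:
Dim match As Match = Regex.Match(input_text, regex_pattern, RegexOptions.Singleline)
This makes the dot (.) match newline characters as well. Note that Replace(vbCrLf, "") removes only CR+LF pairs, so a file with bare LF line endings still contains newlines that . does not match by default.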
I don't know VB.NET, but does . match newlines, or is there an option you have to set for that? Consider using [\s\S] instead of . to include newlines.
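The effect of that option can be checked with a short Python sketch (not VB.NET), where re.DOTALL plays the role of RegexOptions.Singleline. The function name extract_content and the sample HTML are made up for illustration. The sketch also uses a lazy (.*?) so the match stops at the first end comment.

```python
import re

START = "<!-- start of content -->"
END = "<!-- end of content -->"

def extract_content(html, flags=re.DOTALL):
    """Return the text between the two comments, or None if they are not found."""
    pattern = re.escape(START) + r"(.*?)" + re.escape(END)
    match = re.search(pattern, html, flags)
    return match.group(1) if match else None

sample = "<html>\n" + START + "\n<p>line one</p>\n<p>line two</p>\n" + END + "\n</html>"

print(repr(extract_content(sample)))           # '\n<p>line one</p>\n<p>line two</p>\n'
print(repr(extract_content(sample, flags=0)))  # None: without DOTALL, . stops at each newline
```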

VB.Net Regular Expressions - Extracting Wildcard Value

I need help extracting the value of a wildcard from a Regular Expressions match. For example:
Regex: "I like *"
Input: "I like chocolate"
I would like to be able to extract the string "chocolate" from the Regex match (or whatever else is there). If possible, I also want to be able to retrieve several wildcard values from a single wildcard match. For example:
Regex: "I play the * and the *"
Input: "I play the guitar and the bass"
I want to be able to extract both "guitar" and "bass". Is there a way to do it?
In general, regexes use the concept of groups. Groups are indicated by parentheses.
So "I like *" would be written as I like (.*), where . matches any character and * means zero or more of the preceding element.
Sub Main()
    Dim s As String = "I Like hats"
    Dim rxstr As String = "I Like(.*)"
    Dim m As Match = Regex.Match(s, rxstr)
    Console.WriteLine(m.Groups(1))
End Sub
The above code works for any string that contains I Like and prints every character after it, including the leading space, because . matches whitespace too.
Your second case is more interesting: because (.*) is greedy and would match everything up to the end of the string, you need something more restrictive.
I Like (\w+) and (\w+): this matches I Like, a space, one or more word characters, then " and ", and one or more word characters again.
Sub Main()
    Dim s2 As String = "I Like hats and dogs"
    Dim rxstr2 As String = "I Like (\w+) and (\w+)"
    Dim m As Match = Regex.Match(s2, rxstr2)
    Console.WriteLine("{0} : {1}", m.Groups(1), m.Groups(2))
End Sub
For a more complete treatment of regex take a look at this site which has a great tutorial.
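The step from a wildcard template to capture groups can also be automated. The following Python sketch (not VB.NET) is one possible approach under simple assumptions: * is the only wildcard, each * must match at least one character, and the template must match the whole input. The helper names wildcard_to_regex and extract_wildcards are hypothetical.

```python
import re

def wildcard_to_regex(template):
    # Escape the literal pieces, then join them with a lazy capture group per *
    parts = [re.escape(piece) for piece in template.split("*")]
    return "^" + "(.+?)".join(parts) + "$"

def extract_wildcards(template, text):
    """Return the list of wildcard values, or None if the text does not fit the template."""
    match = re.match(wildcard_to_regex(template), text)
    return list(match.groups()) if match else None

print(extract_wildcards("I like *", "I like chocolate"))
# ['chocolate']
print(extract_wildcards("I play the * and the *", "I play the guitar and the bass"))
# ['guitar', 'bass']
```

Because the groups are lazy and the pattern is anchored at both ends, each wildcard takes only as much text as the following literal piece allows.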
Here is my RegexExtract function in VBA. It returns just the submatch you specify (only the part in parentheses). So in your case, you'd write:
=RegexExtract(A1, "I like (.*)")
Here is the code.
Function RegexExtract(ByVal text As String, _
                      ByVal extract_what As String) As String
    Application.ScreenUpdating = False
    Dim allMatches As Object
    Dim RE As Object
    ' Late binding, so no project reference is needed
    Set RE = CreateObject("vbscript.regexp")
    RE.Pattern = extract_what
    RE.Global = True
    Set allMatches = RE.Execute(text)
    ' Return the first captured group of the first match, or "" when nothing matches
    If allMatches.Count > 0 Then RegexExtract = allMatches.Item(0).SubMatches.Item(0)
    Application.ScreenUpdating = True
End Function
Here is a version that will allow you to use multiple groups to extract multiple parts at once:
Function RegexExtract(ByVal text As String, _
                      ByVal extract_what As String) As String
    Application.ScreenUpdating = False
    Dim allMatches As Object
    Dim RE As Object
    Set RE = CreateObject("vbscript.regexp")
    Dim i As Long
    Dim result As String
    RE.Pattern = extract_what
    RE.Global = True
    Set allMatches = RE.Execute(text)
    ' Concatenate every captured group of the first match, if there is one
    If allMatches.Count > 0 Then
        For i = 0 To allMatches.Item(0).SubMatches.Count - 1
            result = result & allMatches.Item(0).SubMatches.Item(i)
        Next
    End If
    RegexExtract = result
    Application.ScreenUpdating = True
End Function