VB NET regexp matching numerical substrings - regex

I'm trying to make a vb function that takes as input a String and returns, if exist, the string made of numeric digits from the beginning until the first non numerical char, so:
123 -> 123
12f -> 12
12g34 -> 12
f12 -> ""
"" -> ""
I wrote a function that incrementally compares the result matching the regex, but it goes on even on non numeric characters...
This is the function:
Public Function ParseValoreVelocita(ByVal valoreRaw As String) As String
Dim result As New StringBuilder
Dim regexp As New Regex("^[0-9]+")
Dim tmp As New StringBuilder
Dim stringIndex As Integer = 0
Dim out As Boolean = False
While stringIndex < valoreRaw.Length AndAlso Not out
tmp.Append(valoreRaw.ElementAt(stringIndex))
If regexp.Match(tmp.ToString).Success Then
result.Append(valoreRaw.ElementAt(stringIndex))
stringIndex = stringIndex + 1
Else
out = True
End If
End While
Return result.ToString
End Function
The output always equals the input string, so there's something wrong and I can't get out of it...

Here's a LINQ solution that doesn't need regex and increases readability:
Dim startDigits = valoreRaw.TakeWhile(AddressOf Char.IsDigit)
Dim result As String = String.Concat(startDigits)

Try this instead. You need to use a capture group:
Public Function ParseValoreVelocita(ByVal valoreRaw As String) As String
Dim result As New StringBuilder
Dim regexp As New Regex("^([0-9]+)")
Dim tmp As New StringBuilder
Dim stringIndex As Integer = 0
Dim out As Boolean = False
While stringIndex < valoreRaw.Length AndAlso Not out
tmp.Append(valoreRaw.ElementAt(stringIndex))
If regexp.Match(tmp.ToString).Success Then
result.Append(regexp.Match(tmp.ToString).Groups(1).Value)
stringIndex = stringIndex + 1
Else
out = True
End If
End While
Return result.ToString
End Function
The expression:
Dim regexp As New Regex("^([0-9]+)")
and the result appending lines have been updated:
result.Append(regexp.Match(tmp.ToString).Groups(1).Value)

You have made your code very complex for a simple task.
Your loop keeps trying to build a longer string and it keeps checking if it is still working with digits, and if so keep appending results.
So and input string of "123x" would, if your code worked, produce a string of "112123" as output. In other words it matches the "1", then "12", then "123"and concatenates each before exiting after it finds the "x".
Here's what you should be doing:
Public Function ParseValoreVelocita(valoreRaw As String) As String
Dim regexp As New Regex("^([0-9]+)")
Dim match = regexp.Match(valoreRaw)
If match.Success Then
Return match.Groups(1).Captures(0).Value
Else
Return ""
End If
End Function
No loop and you let the regex do the work.

Related

VBA function Regex

Do you have an idea of what is wrong in this code please? It should extract all caps and the pattern "1WO" if available.
For example in "User:399595:Account:ETH:balance", i should have "UAETH" and in "User:197755:Account:1WO:balance" i should have "UA1WO"
Thank you
Option Explicit
Function ExtractCap(Txt As String) As String
Application.Volatile
Dim xRegEx As Object
Set xRegEx = CreateObject("VBSCRIPT.REGEXP")
If xRegEx.Pattern = "[^A-Z]" Then
xRegEx.Global = True
xRegEx.MultiLine = False
ExtractCap = xRegEx.Replace(Txt, "")
Set xRegEx = Nothing
Else: xRegEx.Pattern = "1WO"
ExtractCap = xRegEx.Execute(Txt)
End If
End Function
I'm not a "RegEx" expert, so you may want to try an alternative:
Function ExtractCap(Txt As String) As String
Application.Volatile
Dim i As Long
For i = 1 To Len(Txt)
Select Case Asc(Mid(Txt, i, 1))
Case 65 To 90
ExtractCap = ExtractCap & Mid(Txt, i, 1)
End Select
Next
End Function
while, should the pattern of your data strictly be as you showed, you could also consider:
Function ExtractCap(Txt As String) As String
Application.Volatile
ExtractCap = "UA" & Split(Txt, ":")(3)
End Function
Your RegEx works like this:
Function ExtractCap(Txt As String) As String
Application.Volatile
Dim xRegEx As Object
Set xRegEx = CreateObject("VBScript.RegExp")
With xRegEx
.Pattern = "[^A-Z]"
.Global = True
.MultiLine = False
ExtractCap = .Replace(Txt, vbNullString)
End With
If Txt = ExtractCap Then ExtractCap = "1WO"
End Function
Public Sub TestMe()
Debug.Print ExtractCap("User:399595:Account:ETH:balance")
End Sub
In your code, there were 2 errors, which stopped the execution:
xRegEx was set to Nothing and then it was asked to provide a value;
the check If xRegEx.Pattern = "[^A-Z]" does not actually mean a lot to VBA. E.g., you are setting a Pattern and making a condition out of it. If you want to know whether a pattern exists in a RegEx, you should compare the two strings - before and after the execution of the pattern.
Your problem can be easily solved.
Firstly, I assumed that 1WO can appears at most once in your string.
Based on that assumption, logic is as follows:
Define function, which extracts all capital letters from strings.
Now, in the main function, you split your string first using 1WO as delimeter. Now, pass every string (after splitting) to function, get all the caps from those strings and concatenate them again with 1WO in its place.
Option Explicit
Public Function Extract(str As String) As String
Dim s As Variant
For Each s In Split(str, "1WO")
'append extracted caps with 1WO at the end
Extract = Extract & ExtractCaps(s) & "1WO"
Next
'delete lest 1WO from result
Extract = Left(Extract, Len(Extract) - 3)
End Function
Function ExtractCaps(str As Variant) As String
Dim i As Long, char As String
For i = 1 To Len(str)
char = Mid(str, i, 1)
If Asc(char) > 64 And Asc(char) < 91 And char = UCase(char) Then
ExtractCaps = ExtractCaps & char
End If
Next
End Function
If you put this code in inserted Module, you can use it in a worksheet in formula: =Extract(A1).

Regular Expression only returns 1 match

My VBA function should take a string referencing a range of units (i.e. "WWW1-5") and then return another string.
I want to take the argument, and put it in a comma separated string,
So "WWW1-5" should become "WWW1, WWW2, WWW3, WWW4, WWW5".
It's not always going to be a single digit. For example, I might need to separate "XXX11-18" or something similar.
I have never used regular expressions, but keep trying different things to make this work and it seems to only be finding 1 match instead of 3.
Any ideas? Here is my code:
Private Function split_group(ByVal group As String) As String
Dim re As Object
Dim matches As Object
Dim result As String
Dim prefix As String
Dim startVar As Integer
Dim endVar As Integer
Dim i As Integer
Set re = CreateObject("vbscript.regexp")
re.Pattern = "([A-Z]+)(\d+)[-](\d+)"
re.IgnoreCase = False
Set matches = re.Execute(group)
Debug.Print matches.Count
If matches.Count <> 0 Then
prefix = matches.Item(0)
startVar = CInt(matches.Item(1)) 'error occurs here
endVar = CInt(matches.Item(2))
result = ""
For i = startVar To endVar - 1
result = result & prefix & i & ","
Next i
split_group = result & prefix & endVar
Else
MsgBox "There is an error with splitting a group."
split_group = "ERROR"
End If
End Function
I tried setting global = true but I realized that wasn't the problem. The error actually occurs on the line with the comment but I assume it's because there was only 1 match.
I tried googling it but everyone's situation seemed to be a little different than mine and since this is my first time using RE I don't think I understand the patterns enough to see if maybe that was the problem.
Thanks!
Try the modified Function below:
Private Function split_metergroup(ByVal group As String) As String
Dim re As Object
Dim matches As Variant
Dim result As String
Dim prefix As String
Dim startVar As Integer
Dim endVar As Integer
Dim i As Integer
Set re = CreateObject("VBScript.RegExp")
With re
.Global = True
.IgnoreCase = True
.Pattern = "[0-9]{1,20}" '<-- Modified the Pattern
End With
Set matches = re.Execute(group)
If matches.Count > 0 Then
startVar = CInt(matches.Item(0)) ' <-- modified
endVar = CInt(matches.Item(1)) ' <-- modified
prefix = Left(group, InStr(group, startVar) - 1) ' <-- modified
result = ""
For i = startVar To endVar - 1
result = result & prefix & i & ","
Next i
split_metergroup = result & prefix & endVar
Else
MsgBox "There is an error with splitting a meter group."
split_metergroup = "ERROR"
End If
End Function
The Sub I've tested it with:
Option Explicit
Sub TestRegEx()
Dim Res As String
Res = split_metergroup("DEV11-18")
Debug.Print Res
End Sub
Result I got in the immediate window:
DEV11,DEV12,DEV13,DEV14,DEV15,DEV16,DEV17,DEV18
Another RegExp option, this one uses SubMatches:
Test
Sub TestRegEx()
Dim StrTst As String
MsgBox WallIndside("WAL7-21")
End Sub
Code
Function WallIndside(StrIn As String) As String
Dim objRegex As Object
Dim objRegMC As Object
Dim lngCnt As Long
Set objRegex = CreateObject("VBScript.RegExp")
With objRegex
.Global = True
.IgnoreCase = True
.Pattern = "([a-z]+)(\d+)-(\d+)"
If .test(StrIn) Then
Set objRegMC = .Execute(StrIn)
For lngCnt = objRegMC(0).submatches(1) To objRegMC(0).submatches(2)
WallIndside = WallIndside & (objRegMC(0).submatches(0) & lngCnt & ", ")
Next
WallIndside = Left$(WallIndside, Len(WallIndside) - 2)
Else
WallIndside = "no match"
End If
End With
End Function
#Shai Rado 's answer worked. But I figured out on my own WHY my original code was not working, and was able to lightly modify it.
The pattern was finding only 1 match because it was finding 1 FULL MATCH. The full match was the entire string. The submatches were really what I was trying to get.
And this is what I modified to make the original code work (asking for each submatch of the 1 full match):

Split comma delimited string to array using regex

I have a string as below, which needs to be split to an array, using VB.NET
10,"Test, t1",10.1,,,"123"
The result array must have 6 rows as below
10
Test, t1
10.1
(empty)
(empty)
123
So:
1. quotes around strings must be removed
2. comma can be inside strings, and will remain there (row 2 in result array)
3. can have empty fields (comma after comma in source string, with nothing in between)
Thanks
Don't use String.Split(): it's slow, and doesn't account for a number of possible edge cases.
Don't use RegEx. RegEx can be shoe-horned to do this accurately, but to correctly account for all the cases the expression tends to be very complicated, hard to maintain, and at this point isn't much faster than the .Split() option.
Do use a dedicated CSV parser. Options include the Microsoft.VisualBasic.TextFieldParser type, FastCSV, linq-to-csv, and a parser I wrote for another answer.
You can write a function yourself. This should do the trick:
Dim values as New List(Of String)
Dim currentValueIsString as Boolean
Dim valueSeparator as Char = ","c
Dim currentValue as String = String.Empty
For Each c as Char in inputString
If c = """"c Then
If currentValueIsString Then
currentValueIsString = False
Else
currentValueIsString = True
End If
End If
If c = valueSeparator Andalso not currentValueIsString Then
If String.IsNullOrEmpty(currentValue) Then currentValue = "(empty)"
values.Add(currentValue)
currentValue = String.Empty
End If
currentValue += c
Next
Here's another simple way that loops by the delimiter instead of by character:
Public Function Parser(ByVal ParseString As String) As List(Of String)
Dim Trimmer() As Char = {Chr(34), Chr(44)}
Parser = New List(Of String)
While ParseString.Length > 1
Dim TempString As String = ""
If ParseString.StartsWith(Trimmer(0)) Then
ParseString = ParseString.TrimStart(Trimmer)
Parser.Add(ParseString.Substring(0, ParseString.IndexOf(Trimmer(0))))
ParseString = ParseString.Substring(Parser.Last.Length)
ParseString = ParseString.TrimStart(Trimmer)
ElseIf ParseString.StartsWith(Trimmer(1)) Then
Parser.Add("")
ParseString = ParseString.Substring(1)
Else
Parser.Add(ParseString.Substring(0, ParseString.IndexOf(Trimmer(1))))
ParseString = ParseString.Substring(ParseString.IndexOf(Trimmer(1)) + 1)
End If
End While
End Function
This returns a list. If you must have an array just use the ToArray method when you call the function
Why not just use the split method?
Dim s as String = "10,\"Test, t1\",10.1,,,\"123\""
s = s.Replace("\"","")
Dim arr as String[] = s.Split(',')
My VB is rusty so consider this pseudo-code

match date pattern in the string vba excel

Edit:
Since my string became more and more complicated looks like regexp is the only way.
I do not have a lot experience in that and your help is much appreciated.
Basically from what I read on the web I construct the following exp to try matching occurrence in my sample string:
"My very long long string 12Mar2012 is right here 23Apr2015"
[0-9][0-9] + [a-zA-Z] + [0-9][0-9][0-9][0-9]
and trying this code. I do not have any match. Any good link on regexp tutorial much appreciated.
Dim re, match, RegExDate
Set re = CreateObject("vbscript.regexp")
re.Pattern = "(^[0-9][0-9] + [a-zA-Z] + [0-9][0-9][0-9][0-9]$)"
re.Global = True
For Each match In re.Execute(str)
MsgBox match.Value
RegExDate = match.Value
Exit For
Next
Thank you
This code validates the actual date from the Regexp using DateValuefor robustness
Sub Robust()
Dim Regex As Object
Dim RegexMC As Object
Dim RegexM As Object
Dim strIn As String
Dim BDate As Boolean
strIn = "My very long long string 12Mar2012 is right here 23Apr2015 and 30Feb2002"
Set Regex = CreateObject("vbscript.regexp")
With Regex
.Pattern = "(([0-9])|([0-2][0-9])|([3][0-1]))(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(\d{4})"
.Global = True
If .test(strIn) Then
Set RegexMC = .Execute(strIn)
On Error Resume Next
For Each RegexM In RegexMC
BDate = False
BDate = IsDate(DateValue(RegexM.submatches(0) & " " & RegexM.submatches(4) & " " & RegexM.submatches(5)))
If BDate Then Debug.Print RegexM
Next
On Error GoTo 0
End If
End With
End Sub
thanks for all your help !!!
I managed to solve my problem using this simple code.
Dim rex As New RegExp
Dim dateCol As New Collection
rex.Pattern = "(\d|\d\d)(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(\d{4})?"
rex.Global = True
For Each match In rex.Execute(sStream)
dateCol.Add match.Value
Next
Just note that on my side I'm sure that I got valid date in the string so the reg expression is easy.
thnx
Ilya
The following is a quick attempt I made. It's far from perfect.
Basically, it splits the string into words. While looping through the words it cuts off any punctuation (period and comma, you might need to add more).
When processing an item, we try to remove each month name from it. If the string gets shorter we might have a date.
It checks to see if the length of the final string is about right (5 or 6 characters, 1 or 2 + 4 for day and year)
You could instead (or also) check to see that there all numbers.
Private Const MonthList = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC"
Public Function getDates(ByVal Target As String) As String
Dim Data() As String
Dim Item As String
Dim Index As Integer
Dim List() As String
Dim Index2 As Integer
Dim Test As String
Dim Result As String
List = Split(MonthList, ",")
Data = Split(Target, " ")
Result = ""
For Index = LBound(Data) To UBound(Data)
Item = UCase(Replace(Replace(Data(Index), ".", ""), ",", ""))
For Index2 = LBound(Data) To UBound(Data)
Test = Replace(Item, List(Index2), "")
If Not Test = Item Then
If Len(Test) = 5 Or Len(Test) = 6 Then
If Result = "" Then
Result = Item
Else
Result = Result & ", " & Item
End If
End If
End If
Next Index2
Next
getDates = Result
End Function

Regular expression to extract numbers from long string containing lots of punctuation

I am trying to separate numbers from a string which includes %,/,etc for eg (%2459348?:, or :2434545/%). How can I separate it, in VB.net
you want only the numbers right?
then you could do it like this
Dim theString As String = "/79465*44498%464"
Dim ret = Regex.Replace(theString, "[^0-9]", String.Empty)
hth
edit:
or do you want to split by all non number chars?
then it would go like this
Dim ret = Regex.Split(theString, "[^0-9]")
You could loop through each character of the string and check the .IsNumber() on it.
This should do:
Dim test As String = "%2459348?:"
Dim match As Match = Regex.Match(test, "\d+")
If match.Success Then
Dim result As String = match.Value
' Do something with result
End If
Result = 2459348
Here's a function which will extract all of the numbers out of a string.
Public Function GetNumbers(ByVal str as String) As String
Dim builder As New StringBuilder()
For Each c in str
If Char.IsNumber(c) Then
builder.Append(c)
End If
Next
return builder.ToString()
End Function