VBA RegEx getting String with only Number and Hyphen - regex

I have a string with something like
Bl. 01 - 03
I want this to be reduced to only
01-03
Everything other than digits & hyphen should be removed. Any ideas how to do it using regex or any other method?

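You can use this pattern in a replace expression, where reg is a VBScript.RegExp object. Set Global to True, otherwise only the first run of unwanted characters is removed:
reg.Global = True
reg.Pattern = "[^\d-]+"
Debug.Print reg.Replace(yourstring, "")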
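To check the pattern outside VBA, here is a minimal Python sketch of the same replacement. The function name strip_to_digits_and_hyphen is made up for illustration. Python's re.sub replaces every match by default, which corresponds to Global = True on the VBScript object.

```python
import re

def strip_to_digits_and_hyphen(text: str) -> str:
    """Remove every character that is not a digit or a hyphen."""
    # [^\d-]+ matches one or more characters that are neither digits nor '-'
    return re.sub(r"[^\d-]+", "", text)

print(strip_to_digits_and_hyphen("Bl. 01 - 03"))  # 01-03
```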

Barring a more complete description of exactly what you mean by something like "Bl. 01 - 03", this:
^.*(\d{2}\s?-\s?\d{2}).*$
Will capture the portion you seem to be interested in as group 1. If you want to get rid of the spaces as well, then something like:
^.*(\d{2})\s?-\s?(\d{2}).*$
might be better suited: the two numbers end up in groups 1 and 2, and you can reinsert the hyphen in the output, for example with the replacement string $1-$2.

Here's a function with a non-RegEx approach to remove anything but digits and the hyphen from a given input string:
Function removeBadChars(sInput As String) As String
    Dim i As Integer
    Dim sResult As String
    Dim sChr As String
    For i = 1 To Len(sInput)
        sChr = Mid(sInput, i, 1)
        If IsNumeric(sChr) Or sChr = "-" Then
            sResult = sResult & sChr
        End If
    Next
    removeBadChars = sResult
End Function

Related

Extract an 8 digits number from a string with additional conditions

I need to extract a number from a string with several conditions.
It has to start with 1-9, not with 0, and it will have 8 digits, like 23242526 or 65478932.
There will be either an empty space or a text variable before it, like MMX: 23242526 or bgr65478932.
In rare cases it could contain commas: 23,242,526.
It ends with an empty space or a text variable.
Here are several examples:
From RE: Markitwire: 120432889: Mx: 24,693,059 I need to get 24693059
From Automatic reply: Auftrag zur Übertragung IRD Ref-Nr. MMX_23497152 need to get 23497152
From FW: CGMSE 2019-2X A1AN XS2022418672 Contract 24663537 need to get 24663537
From RE: BBVA-MAD MMX_24644644 + MMX_24644645 need to get 24644644, 24644645
Right now I'm using the RegexExtract function (found on this website), which extracts any 8-digit number starting with 2. However, it would also extract a number from, say, TGF00023242526, which is incorrect. Moreover, I don't know how to add additional conditions to the code.
=RegexExtract(A11, "(2\d{7})\b", ", ")
Thank you in advance.
Function RegexExtract(ByVal text As String, _
                      ByVal extract_what As String, _
                      Optional seperator As String = "") As String
    Dim i As Long, j As Long
    Dim result As String
    Dim allMatches As Object
    Dim RE As Object
    Set RE = CreateObject("vbscript.regexp")
    RE.Pattern = extract_what
    RE.Global = True
    RE.IgnoreCase = True
    Set allMatches = RE.Execute(text)
    For i = 0 To allMatches.Count - 1
        For j = 0 To allMatches.Item(i).SubMatches.Count - 1
            result = result & seperator & allMatches.Item(i).SubMatches.Item(j)
        Next
    Next
    If Len(result) <> 0 Then
        result = Right(result, Len(result) - Len(seperator))
    End If
    RegexExtract = result
End Function
You may create a custom boundary using a non-capturing group before the pattern you have:
(?:[\D0]|^)(2\d{7})\b
^^^^^^^^^^^
The (?:[\D0]|^) part matches either a non-digit (\D) or 0 or (|) start of string (^).
Alternatively, to also match the 8 digits in values like 23,242,526 and require the first digit to be 1-9, you might use
\b[1-9](?:,?\d){7}\b
\b Word boundary
[1-9] Match the first digit 1-9
(?:,?\d){7} Repeat 7 times matching an optional comma and a digit
\b Word boundary
Afterwards you could replace the commas with an empty string.
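One caveat with \b: the underscore counts as a word character, so \b does not match between _ and a digit, and values such as MMX_23497152 or bgr65478932 are skipped. The following Python sketch is not the answerer's code. It combines the two ideas above: a consumed non-digit (or the start of the string) as the left boundary, optional commas between digits, and a lookahead that rejects a following digit. The names ID_PATTERN and extract_ids are hypothetical.

```python
import re

# Left boundary: start of string or any non-digit (consumed, not captured).
# Body: a first digit 1-9, then 7 more digits, each optionally preceded by a comma.
# Right boundary: must not be followed by another digit.
ID_PATTERN = re.compile(r"(?:^|\D)([1-9](?:,?\d){7})(?!\d)")

def extract_ids(text: str) -> list:
    """Return every 8-digit number found in text, with commas removed."""
    return [m.replace(",", "") for m in ID_PATTERN.findall(text)]

subjects = [
    "RE: Markitwire: 120432889: Mx: 24,693,059",
    "FW: CGMSE 2019-2X A1AN XS2022418672 Contract 24663537",
    "RE: BBVA-MAD MMX_24644644 + MMX_24644645",
    "TGF00023242526",
]
for subject in subjects:
    print(extract_ids(subject))
```

The pattern uses only a non-capturing group, one capturing group and a negative lookahead, which VBScript.RegExp should also accept, so it could be passed to RegexExtract as the extract_what argument. The commas would still need to be removed afterwards.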

Replace 2 step Regex with 1 step Regex to get one upper case letter between underscores

I have a string, myFile, that looks like: Name_2019-11-29_D_HPSeries.txt. I need to extract the letter D between the underscores...the letter could be any uppercase letter. Right now I am using a 2 step Regex code.
Dim bC As String = Regex.Match(myFile, "_[A-Z]+_").ToString
boatClass = Regex.Match(bC, "[A-Z]+").ToString
This works but I believe it could be done with one line. I tried the code below but it doesn't work.
boatClass = Regex.Replace(myFile, "_[A-Z]_", "[A-Z]").ToString
You can use positive lookarounds to avoid a 2-step process, checking that the characters before and after the letter are underscores without capturing them:
Dim myFile As String = "Name_2019-11-29_D_HPSeries.txt"
Dim bC As String = Regex.Match(myFile, "(?<=_)[A-Z](?=_)").ToString
Console.WriteLine(bC)
Output:
D
You were almost there with a single char A-Z, but you could wrap it in a capturing group and then use the Match.Groups property.
_([A-Z])_
For example
Dim myFile As String = "Name_2019-11-29_D_HPSeries.txt"
Dim bC As String = Regex.Match(myFile, "_([A-Z])_").Groups(1).Value
Console.WriteLine(bC)
Result
D
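The two one-step approaches above can be compared side by side. This is a small Python sketch rather than VB.Net. The variable names are illustrative, and the patterns are the ones from the two answers.

```python
import re

my_file = "Name_2019-11-29_D_HPSeries.txt"

# Lookarounds: the underscores are checked but are not part of the match.
with_lookarounds = re.search(r"(?<=_)[A-Z](?=_)", my_file).group()

# Capturing group: the underscores are matched, the letter is read from group 1.
with_group = re.search(r"_([A-Z])_", my_file).group(1)

print(with_lookarounds, with_group)  # D D
```

Both give the same letter here. The capturing-group form also works in engines without lookbehind support.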

Extract largest numeric sequence from string (regex, or?)

I have strings similar to the following:
4123499-TESCO45-123
every99999994_54
And I want to extract the largest numeric sequence in each string, respectively:
4123499
99999994
I have previously tried regex (I am using VB6)
Set rx = New RegExp
rx.Pattern = "[^\d]"
rx.Global = True
StringText = rx.Replace(StringText, "")
Which gets me partway there, but it only removes the non-numeric values, and I end up with the first string looking like:
412349945123
Can I find a regex that will give me what I require, or will I have to try another method? Essentially, my pattern would have to be anything that isn't the longest numeric sequence. But I'm not actually sure if that is even a reasonable pattern. Could anyone with a better handle of regex tell me if I am going down a rabbit hole? I appreciate any help!
You cannot get the result by just a regex. You will have to extract all numeric chunks and get the longest one using other programming means.
Here is an example:
Dim strPattern As String: strPattern = "\d+"
Dim str As String: str = "4123499-TESCO45-123"
Dim regEx As New RegExp
Dim matches As MatchCollection
Dim m As Match
Dim result As String
With regEx
    .Global = True
    .MultiLine = False
    .IgnoreCase = False
    .Pattern = strPattern
End With
Set matches = regEx.Execute(str)
For Each m In matches
    ' Keep the match only if it is longer than the longest one seen so far
    If Len(result) < Len(m.Value) Then result = m.Value
Next
Debug.Print result
The \d+ with RegExp.Global=True will find all digit chunks and then only the longest will be printed after all matches are processed in a loop.
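The same "collect all digit chunks, then pick the longest" idea can be sketched in a few lines of Python. The function name longest_digit_run is made up for this example.

```python
import re

def longest_digit_run(text: str) -> str:
    """Return the longest run of consecutive digits, or "" if there is none."""
    runs = re.findall(r"\d+", text)
    # max() keeps the first item among equals, so the first run wins a tie.
    return max(runs, key=len, default="")

print(longest_digit_run("4123499-TESCO45-123"))  # 4123499
print(longest_digit_run("every99999994_54"))     # 99999994
```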
That's not solvable with an RE on its own.
Instead you can simply walk along the string tracking the longest consecutive digit group:
For i = 1 To Len(StringText)
    If IsNumeric(Mid$(StringText, i, 1)) Then
        a = a & Mid$(StringText, i, 1)
    Else
        a = ""
    End If
    If Len(a) > Len(longest) Then longest = a
Next
MsgBox longest
(first result wins a tie)
If the two examples you gave follow a standard format, that is, the strings always come in one of these two forms:
<long_number>-<some_other_data>-<short_number>
<text><long_number>_<short_number>
then there are some solutions.
However, if you are searching strings in arbitrary formats for the longest number, these will not work.
Solution 1
([0-9]+)[_-].*
In the first capture group, you should have the longest number for those 2 formats.
Note: This assumes that the longest number will be the first number it encounters with an underscore or a hyphen next to it, matching those two examples given.
Solution 2
\d{6,}
Note: This assumes that the shortest number will never exceed 5 characters in length, and the longest number will never be shorter than 6 characters in length
Please try the following.
Pure VB, with no external libraries or objects.
No brain-breaking regexp patterns.
No string manipulation inside the loop, so it is fast: about 30 times faster than regexp :)
It is easy to adapt to various needs, for example concatenating all digits from the source string into a single string.
Moreover, if the target string is only an intermediate step, it is possible to work with the numbers directly.
Public Sub sb_BigNmb()
Dim sSrc$, sTgt$
Dim taSrc() As Byte, taTgt() As Byte, tLB As Byte, tUB As Byte
Dim s As Byte, t As Byte, tLenMin As Byte
tLenMin = 4
sSrc = "every99999994_54"
sTgt = vbNullString
taSrc = StrConv(sSrc, vbFromUnicode)
tLB = LBound(taSrc)
tUB = UBound(taSrc)
ReDim taTgt(tLB To tUB)
t = 0
For s = tLB To tUB
Select Case taSrc(s)
Case 48 To 57
taTgt(t) = taSrc(s)
t = t + 1
Case Else
If CBool(t) Then Exit For ' *** EXIT FOR ***
End Select
Next
If (t > tLenMin) Then
ReDim Preserve taTgt(tLB To (t - 1))
sTgt = StrConv(taTgt, vbUnicode)
End If
Debug.Print "'" & sTgt & "'"
Stop
End Sub
Note that this takes the first run of digits and keeps it only if it is longer than tLenMin. Handling sSrc = "ev_1_ery99999994_54" is left as an exercise :)

regex with XE currency

I'm trying to make a personal app with VB.Net,
and all of my code is working fine except one thing: the regex.
I want to get this value
The Highlighted Value that I need
From this URL
I tried this regex:
("([0-9]+.+[1-9]+ (SAR)+)")
and it's not working very well (it only works with some currencies, not all).
Can you help me with the right regex?
***Update:
here is the whole function code:
Private Sub doCalculate()
' Need the scraping
Dim Str As System.IO.Stream
Dim srRead As System.IO.StreamReader
Dim strAmount As String
strAmount = currencyAmount.Text
' Get values from the textboxes
Dim strFrom() As String = Split(currecnyFrom.Text, " - ")
Dim strTo() As String = Split(currecnyTo.Text, " - ")
' Web fetching variables
Dim req As System.Net.WebRequest = System.Net.WebRequest.Create("https://www.xe.com/currencyconverter/convert.cgi?template=pca-new&Amount=" + strAmount + "&From=" + strFrom(1) + "&To=" + strTo(1) + "&image.x=39&image.y=9")
Dim resp As System.Net.WebResponse = req.GetResponse
Str = resp.GetResponseStream
srRead = New System.IO.StreamReader(Str)
' Match the response
Try
Dim myMatches As MatchCollection
Dim myRegExp As New Regex("(\d+\.\d+ SAR)")
myMatches = myRegExp.Matches(srRead.ReadToEnd)
' Search for all the words in the string
Dim sucessfulMatch As Match
For Each sucessfulMatch In myMatches
mainText.Text = sucessfulMatch.Value
Next
Catch ex As Exception
mainText.Text = "Unable to connect to XE"
Finally
' Close the streams
srRead.close()
Str.Close()
End Try
convertToLabel.Text = strAmount + " " + strFrom(0) + " Converts To: "
End Sub
Thanks.
You need to get the currency value that appears first. Thus, you need to replace
myMatches = myRegExp.Matches(srRead.ReadToEnd)
' Search for all the words in the string
Dim sucessfulMatch As Match
For Each sucessfulMatch In myMatches
mainText.Text = sucessfulMatch.Value
Next
with the following lines:
Dim myMatch As Match = myRegExp.Match(srRead.ReadToEnd)
mainText.Text = myMatch.Value
I also recommend using the following regex:
\b\d+\.\d+\p{Zs}+SAR\b
Explanation:
\b - word boundary
\d+ - 1+ digits
\. - a literal dot
\d+ - 1+ digits
\p{Zs}+ - 1 or more Unicode space separator characters (this includes the regular space but not tabs or line breaks)
SAR\b - whole word SAR.
You should use this regex.
Regex: (\d+\.\d+ SAR)
Explanation:
\d+ matches one or more digits.
\.\d+ matches a literal dot followed by the decimal digits.
SAR matches the literal string SAR, which is your currency unit.
I tried this regex:
("([0-9]+.+[1-9]+ (SAR)+)") and it's not working very well (only works
with some currency but not all).
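What you are doing here is matching one or more digits ([0-9]+), then one or more of any character (.+, because the dot is not escaped), then one or more digits from 1 to 9 ([1-9]+, which excludes 0), then a space and SAR repeated one or more times.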
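To see why the original pattern works for some amounts but not others, here is a short Python sketch that runs both patterns on two made-up sample strings. The sample values are assumptions for illustration, not output from the XE site. Because [1-9]+ must sit directly before the space, any amount whose last digit is 0 is rejected.

```python
import re

# Pattern from the question: unescaped dot, and [1-9] excludes the digit 0.
original = re.compile(r"[0-9]+.+[1-9]+ (SAR)+")
# Pattern from the answer: escaped dot, \d accepts every digit.
fixed = re.compile(r"\d+\.\d+ SAR")

for sample in ["3.75 SAR", "3.7500 SAR"]:
    old = original.search(sample)
    new = fixed.search(sample)
    print(sample, "| original:", bool(old), "| fixed:", bool(new))

# search() returns the first match only, which is what the answer above recommends.
page = "1.00 USD = 3.7500 SAR, 1 SAR = 0.2667 USD"
print(fixed.search(page).group())  # 3.7500 SAR
```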

Regex VB.Net Regex.Replace

I'm trying to perform a simple regex find and replace, adding a tab into the string after some digits as outlined below.
From
a/users/12345/badges
To
a/users/12345 /badges
I'm using the following:
s = regex.replace(s, "(a\/users\/\d*)("a\/users\/\d*\t)", $1 $2")
But I'm clearly doing something wrong.
Where am I going wrong? I know it's a stupid mistake, but help would be gratefully received.
VBVirg
You can achieve that with a mere look-ahead that will find the position right before the last /:
Dim s As String = Regex.Replace("a/users/12345/badges", "(?=/[^/]*$)", vbTab)
Output:
a/users/12345 /badges
Or, you can just use LastIndexOf with Insert:
Dim str2 As String
Dim str As String = "a/users/12345/badges"
Dim idx = str.LastIndexOf("/")
If idx > 0 Then
    str2 = str.Insert(idx, vbTab)
End If
When I read, "adding a tab into the string after some digits" I think there could be more than one set of digits that can appear between forward slashes. This pattern:
"/(\d+)/"
Will capture only digits that are between forward slashes and will allow you to insert a tab like so:
Imports System.Text.RegularExpressions
Module Module1
    Sub Main()
        Dim str As String = "a/54321/us123ers/12345/badges"
        str = Regex.Replace(str, "/(\d+)/", String.Format("/$1{0}/", vbTab))
        Console.WriteLine(str)
        Console.ReadLine()
    End Sub
End Module
Results (NOTE: The tab spaces can vary in length):
a/54321 /us123ers/12345 /badges
When String is "a/54321/users/12345/badges" results are:
a/54321 /users/12345 /badges
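The two regex approaches from the answers behave differently when a path has more than one numeric segment. The following is a Python sketch of both, with hypothetical function names. The patterns themselves are taken from the answers above.

```python
import re

def tab_before_last_slash(path: str) -> str:
    """Insert a tab at the zero-width position just before the last '/'."""
    return re.sub(r"(?=/[^/]*$)", "\t", path)

def tab_after_digit_segments(path: str) -> str:
    """Insert a tab after every all-digit segment enclosed in slashes."""
    return re.sub(r"/(\d+)/", "/\\g<1>\t/", path)

# repr() makes the inserted tab visible as \t
print(repr(tab_before_last_slash("a/users/12345/badges")))
print(repr(tab_after_digit_segments("a/54321/us123ers/12345/badges")))
```

The lookahead version always touches exactly one position, while the capture-group version adds a tab after each numeric segment it finds.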