Better way to extract numbers from a string - regex

I have been trying to convert a string like {X=5, Y=9} into a string like (5, 9), to use as an on-screen coordinate.
I finally came up with this code:
Dim str As String = String.Empty
Dim regex As Regex = New Regex("\d+")
Dim m As Match = regex.Match("{X=9")
If m.Success Then str = m.Value
Dim s As Match = regex.Match("Y=5}")
If s.Success Then str = "(" & str & ", " & s.Value & ")"
MsgBox(str)
which does work, but surely there must be a better way to do this (I am not familiar with regex).
I have many of these to convert in my program, and doing it like above would be torturous.

You may use
Dim result As String = Regex.Replace(input, ".*?=(\d+).*?=(\d+).*", "($1, $2)")
The regex means
.*? - any 0+ chars other than newline chars as few as possible
= - an equals sign
(\d+) - Group 1: one or more digits
.*?= - any 0+ chars other than newline chars as few as possible and then a = char
(\d+) - Group 2: one or more digits
.* - any 0+ chars other than newline chars as many as possible
The $1 and $2 in the replacement pattern are replacement backreferences that refer to the values captured by Group 1 and Group 2.
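Below is a minimal sketch of the same replacement in Scala. Here Regex.replaceAllIn does the job of .NET's Regex.Replace. For ASCII input like this, Java's regex engine should treat the pattern and the $1/$2 references the same way. The object and method names are hypothetical. The second helper is an alternative that is not from the answer: it just collects every run of digits.

```scala
import scala.util.matching.Regex

object CoordFormat {
  // Same pattern as above: Group 1 and Group 2 capture the digits after each '='
  private val Coord: Regex = """.*?=(\d+).*?=(\d+).*""".r

  // The pattern matches the whole input, so the result is just "($1, $2)"
  def toTuple(input: String): String = Coord.replaceAllIn(input, "($1, $2)")

  // Alternative: collect every run of digits and join them as "(a, b, ...)"
  def digitsJoined(input: String): String =
    """\d+""".r.findAllIn(input).mkString("(", ", ", ")")
}
```

Converting many strings then becomes a single map, e.g. List("{X=5, Y=9}", "{X=120, Y=44}").map(CoordFormat.toTuple). replaceAllIn returns strings without a match unchanged.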


RegEx array / list / collection of all matches in VBA

I'm trying to use RegEx to get all instances of varying strings that exist between a particular pair of strings. E.g. in the following string:
"The Start. Hello. Jamie. Bye. The Middle. Hello. Sarah. Bye. The End"
I want to get a collection / array consisting of "Jamie" and "Sarah" by checking in between "Hello. " and ". Bye. "
My RegEx object is working fine and I feel I'm nearly successful:
Sub Reggie()
Dim x As String: x = "The Start. Hello. Jamie. Bye. The Middle. Hello. Sarah. Bye. The End"
Dim regEx As RegExp
Set regEx = New RegExp
Dim rPat1 As String: rPat1 = "Hello. "
Dim rPat2 As String: rPat2 = " Bye."
Dim rPat3 As String: rPat3 = ".*"
With regEx
.Global = True
.ignorecase = True
.Pattern = "(^.*" & rPat1 & ")(" & rPat3 & ")(" & rPat2 & ".*)"
.MultiLine = True
' COMMAND HERE
End With
End Sub
For the COMMAND HERE part, I tried .Replace(x, "$2"), which gives me a string with only the last match, i.e. Sarah.
I've also tried .Execute(x), which gives me a MatchCollection object, but when browsing it in the Immediate window I see that it only contains the last match.
Is what I'm requiring possible and how?
That is because .* matches as many chars as possible. With .* at both ends of your regular expression, a single match consumes the whole string. The greedy .* before Hello. backtracks only as far as the last Hello. in the string, which is why you only get Sarah. You should not match the whole string like that.
Besides, you need to escape special chars in the regex pattern: here, . is special as it matches any char other than a line break char.
You need to fix your regex declaration like this:
rPat1 = "Hello\. "
rPat2 = "\. Bye\."
rPat3 = ".*?"
.Pattern = rPat1 & "(" & rPat3 & ")" & rPat2
Or, to further enhance the regex, you may:
Replace literal spaces with \s* (zero or more whitespaces) or \s+ (one or more whitespaces) to support any whitespace
Match any non-word chars after the captured string with \W+ or \W*.
rPat1 = "Hello\.\s*"
rPat2 = "\W+Bye\."
rPat3 = ".*?"
.Pattern = rPat1 & "(" & rPat3 & ")" & rPat2
See the regex demo. Details:
Hello\. - Hello. string
\s* - zero or more whitespaces
(.*?) - Group 1: any zero or more chars other than line break chars as few as possible
\W+ - one or more chars other than ASCII letters/digits/_
Bye\. - Bye. string.
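Below is a minimal Scala sketch that shows the difference on the sample string. findAllMatchIn plays the role of Execute with Global = True. Java's regex engine should behave like VBScript's RegExp for these patterns. The sketch runs two patterns:

- the enhanced pattern from the answer;
- the question's original shape, with its dots escaped.

The object and method names are hypothetical.

```scala
import scala.util.matching.Regex

object BetweenMarkers {
  // Enhanced pattern from the answer: lazy (.*?) stops at the first "<non-word chars>Bye."
  val lazyPat: Regex = """Hello\.\s*(.*?)\W+Bye\.""".r

  // The question's shape with the dots escaped: greedy .* on both ends
  // consumes the whole string, so there can be at most one match
  val greedyPat: Regex = """^.*Hello\. (.*) Bye\..*""".r

  // Group 1 of every match, in order
  def between(s: String, re: Regex): List[String] =
    re.findAllMatchIn(s).map(_.group(1)).toList
}
```

The lazy pattern yields Jamie and Sarah. The greedy one yields a single item, "Sarah.", because the leading .* backtracks only to the last Hello. in the string.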

How do I pick file names with a specified pattern in Scala

OTC_omega_20210302.csv
CH_delta_20210302.csv
MD_omega_20210310.csv
CD_delta_20210310.csv
import org.apache.hadoop.fs.{FileSystem, Path}
val hdfsPath = "/development/staging/abcd-efgh"
val fs = org.apache.hadoop.fs.FileSystem.get(spark.sparkContext.hadoopConfiguration)
val files = fs.listStatus(new Path(s"${hdfsPath}")).filterNot(_.isDirectory).map(_.getPath)
val regX = "OTC_*[0-9].csv|CH_*[0-9].csv".stripMargin.r
val filteredFiles = files.filter(fName => regX.findFirstMatchIn(fName.getName).isDefined)
What regex do I need if I want any file name that starts with either OTC_ or CH_ and ends with YYYYMMDD.csv?
For the files above, I need these two outputs:
OTC_omega_20210302.csv
CH_delta_20210302.csv
Please help
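You can use either of these (they are equivalent; inside the triple-quoted string, the backslash does not need to be doubled):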
val regX = "^(?:OTC|CH)_.*[0-9]{8}\\.csv$".r
val regX = """^(?:OTC|CH)_.*[0-9]{8}\.csv$""".r
See the regex demo.
Details:
^ - start of string
(?:OTC|CH) - a non-capturing group matching either OTC or CH char sequences
_ - a _ char
.* - any zero or more chars other than line break chars, as many as possible
[0-9]{8} - eight digits
\. - a literal dot (note . matches any char other than a line break char, you must escape . to make it match a dot)
csv - a csv string
$ - end of string.
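The filter logic can be tried without HDFS. In this minimal sketch, the four names from the question, hard-coded in a list, stand in for the fs.listStatus call. That substitution is an assumption made for illustration only. The regex is the triple-quoted one above. Because it is anchored with ^ and $, findFirstMatchIn only succeeds when the whole name fits.

```scala
import scala.util.matching.Regex

object PickFiles {
  // Starts with OTC_ or CH_, ends with eight digits followed by .csv
  val regX: Regex = """^(?:OTC|CH)_.*[0-9]{8}\.csv$""".r

  // Same test as in the question's filter, applied to plain names
  def pick(names: Seq[String]): Seq[String] =
    names.filter(name => regX.findFirstMatchIn(name).isDefined)
}
```

In the question's Hadoop code, the same test is already applied to fName.getName, so only the regX line needs to change.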

Remove optional whitespace when splitting with math operators while keeping them in the result

How can I remove whitespace characters from the input string? I am using the following code:
Dim input As String = txtInput.Text
Dim symbol As String = "([-+*/])"
Dim substrings() As String = Regex.Split(input, symbol)
Dim cleaned As String = Regex.Replace(input, "\s", " ")
For Each match As String In substrings
lstOutput.Items.Add(match)
Next
Input: z + x
Output: z, + and x.
I want to get rid of the whitespace in the last item.
You may remove the redundant whitespace while splitting with
\s*([-+*/])\s*
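See the regex demo. Also, it is a good idea to trim the input with .Trim() before passing it to Regex.Split. Whitespace at the very start or end of the input is not next to an operator, so the pattern would leave it in the first or last item.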
Pattern details:
\s* - matches 0+ whitespaces (these will be discarded from the result as they are not captured)
([-+*/]) - Group 1 (captured texts will be output to the resulting array) capturing 1 char: -, +, * or /
\s* - matches 0+ whitespaces (these will be discarded from the result as they are not captured)
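.NET's Regex.Split adds captured text to the result array. Java's split, and therefore Scala's, drops the captured delimiters. The minimal Scala sketch below emulates the .NET behavior with the same pattern: it collects the text between matches plus Group 1 of each match. The object name is hypothetical.

```scala
import scala.collection.mutable.ListBuffer
import scala.util.matching.Regex

object OperatorSplit {
  // Whitespace around an operator is matched, but only the operator is captured
  private val Sep: Regex = """\s*([-+*/])\s*""".r

  // Emulates .NET Regex.Split with a capturing group:
  // text before each match, then the captured operator, then the tail
  def split(input: String): List[String] = {
    val s = input.trim // trim first, as advised above
    val out = ListBuffer.empty[String]
    var last = 0
    for (m <- Sep.findAllMatchIn(s)) {
      out += s.substring(last, m.start)
      out += m.group(1)
      last = m.end
    }
    out += s.substring(last)
    out.toList
  }
}
```

For "z + x", this gives z, + and x with no surrounding whitespace.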

Extracting Parenthetical Data Using Regex

I have a small sub that extracts parenthetical data (including parentheses) from a string and stores it in cells adjacent to the string:
Sub parens()
Dim s As String, i As Long
Dim c As Collection
Set c = New Collection
s = ActiveCell.Value
ary = Split(s, ")")
For i = LBound(ary) To UBound(ary) - 1
bry = Split(ary(i), "(")
c.Add "(" & bry(1) & ")"
Next i
For i = 1 To c.Count
ActiveCell.Offset(0, i).NumberFormat = "#"
ActiveCell.Offset(0, i).Value = c.Item(i)
Next i
End Sub
I am now trying to replace this with some Regex code. I am NOT a regex expert. I want to create a pattern that looks for an open parenthesis followed by zero or more characters of any type followed by a close parenthesis.
I came up with:
\((.+?)\)
My current new code is:
Sub qwerty2()
Dim inpt As String, outpt As String
Dim MColl As MatchCollection, temp2 As String
Dim regex As RegExp, L As Long
inpt = ActiveCell.Value
MsgBox inpt
Set regex = New RegExp
regex.Pattern = "\((.+?)\)"
Set MColl = regex.Execute(inpt)
MsgBox MColl.Count
temp2 = MColl(0).Value
MsgBox temp2
End Sub
The code has at least two problems:
It only gets the first match in the string (MColl.Count is always 1).
It does not match empty parentheses, (), since I think .+? requires at least one character.
Does anyone have any suggestions?
By default, the RegExp Global property is False, so only the first match is found. You need to set it to True.
As for the regex, to match zero or more chars as few as possible, you need *?, not +?. Note that both are lazy (match as few as necessary to find a valid match), but + requires at least one char, while * allows matching zero chars (an empty string).
Thus, use
Set regex = New RegExp
regex.Global = True
regex.Pattern = "\((.*?)\)"
Alternatively, you can use
regex.Pattern = "\(([^()]*)\)"
where [^()] is a negated character class matching any char but ( and ), zero or more times (due to * quantifier), matching as many such chars as possible (* is a greedy quantifier).
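Below is a minimal Scala sketch that compares the three patterns on a made-up sample string (hypothetical input). findAllMatchIn returns every match, which is what Global = True gives you in VBA. matched keeps the parentheses, as the original sub does. The object and method names are hypothetical.

```scala
import scala.util.matching.Regex

object Parens {
  val lazyPlus: Regex = """\((.+?)\)""".r   // question's pattern: needs at least one char inside
  val lazyStar: Regex = """\((.*?)\)""".r   // also allows an empty ()
  val negated: Regex  = """\(([^()]*)\)""".r // no parentheses allowed inside

  // Whole matches, parentheses included
  def all(s: String, re: Regex): List[String] =
    re.findAllMatchIn(s).map(_.matched).toList
}
```

On "a (b) c () d (e f)", the +? version does not stop at the empty (). Instead it runs on and swallows the next group, returning "() d (e f)". That is the second problem listed in the question. The other two patterns return (b), () and (e f).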

regex with XE currency

I'm trying to make a personal app with VB.NET, and all of my code works fine except one thing: the regex.
I want to get the highlighted converted value from the XE currency converter page.
I tried this regex:
("([0-9]+.+[1-9]+ (SAR)+)")
and it's not working very well (it only works with some currencies, not all).
Can anyone help me with the right regex?
Update:
Here is the whole function:
Private Sub doCalculate()
' Need the scraping
Dim Str As System.IO.Stream
Dim srRead As System.IO.StreamReader
Dim strAmount As String
strAmount = currencyAmount.Text
' Get values from the textboxes
Dim strFrom() As String = Split(currecnyFrom.Text, " - ")
Dim strTo() As String = Split(currecnyTo.Text, " - ")
' Web fetching variables
Dim req As System.Net.WebRequest = System.Net.WebRequest.Create("https://www.xe.com/currencyconverter/convert.cgi?template=pca-new&Amount=" + strAmount + "&From=" + strFrom(1) + "&To=" + strTo(1) + "&image.x=39&image.y=9")
Dim resp As System.Net.WebResponse = req.GetResponse
Str = resp.GetResponseStream
srRead = New System.IO.StreamReader(Str)
' Match the response
Try
Dim myMatches As MatchCollection
Dim myRegExp As New Regex("(\d+\.\d+ SAR)")
myMatches = myRegExp.Matches(srRead.ReadToEnd)
' Search for all the words in the string
Dim sucessfulMatch As Match
For Each sucessfulMatch In myMatches
mainText.Text = sucessfulMatch.Value
Next
Catch ex As Exception
mainText.Text = "Unable to connect to XE"
Finally
' Close the streams
srRead.close()
Str.Close()
End Try
convertToLabel.Text = strAmount + " " + strFrom(0) + " Converts To: "
End Sub
Thanks.
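Your For Each loop assigns every match to mainText.Text in turn, so the text box ends up with the last match. You need the currency value that appears first. Thus, you need to replace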
myMatches = myRegExp.Matches(srRead.ReadToEnd)
' Search for all the words in the string
Dim sucessfulMatch As Match
For Each sucessfulMatch In myMatches
mainText.Text = sucessfulMatch.Value
Next
with the following lines:
Dim myMatch As Match = myRegExp.Match(srRead.ReadToEnd)
mainText.Text = myMatch.Value
I also recommend using the following regex:
\b\d+\.\d+\p{Zs}+SAR\b
Explanation:
\b - word boundary
\d+ - 1+ digits
\. - a literal dot
\d+ - 1+ digits
\p{Zs}+ - 1 or more space separator chars (Unicode category Zs, e.g. a regular or non-breaking space; tabs are not included)
SAR\b - whole word SAR.
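Below is a minimal Scala sketch of the suggested fix, run on a made-up HTML fragment. The fragment is hypothetical; the real XE page is not reproduced here. findFirstIn corresponds to Regex.Match. Taking the last element of findAllIn mirrors what the original For Each loop left in mainText.Text. The object and method names are hypothetical.

```scala
import scala.util.matching.Regex

object XeAmount {
  // \p{Zs} also accepts a non-breaking space between the number and SAR
  val amount: Regex = """\b\d+\.\d+\p{Zs}+SAR\b""".r

  // Like Regex.Match: the first occurrence only
  def first(html: String): Option[String] = amount.findFirstIn(html)

  // What the For Each loop ends up with: the last occurrence
  def lastOfAll(html: String): Option[String] = amount.findAllIn(html).toList.lastOption
}
```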
You should use this regex.
Regex: (\d+\.\d+ SAR)
Explanation:
\d+ looks for one or more digits.
\.\d+ looks for a literal dot followed by one or more decimal digits.
SAR (preceded by a space) matches the literal string SAR, which is your currency unit.
I tried this regex: ("([0-9]+.+[1-9]+ (SAR)+)") and it's not working very well (only works with some currency but not all).
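Your pattern matches the following, in order:
- one or more digits;
- one or more of any char (greedily);
- one or more digits from 1 to 9;
- a space and SAR, one or more times.

Because [1-9] excludes 0, the pattern fails whenever the digit right before " SAR" is 0. And because .+ can cross any text, a match may start at the first digit anywhere before the amount.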
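Both failure modes can be checked with a minimal Scala sketch on made-up strings (hypothetical inputs). It runs the question's pattern next to the one from the first answer. The object and method names are hypothetical.

```scala
import scala.util.matching.Regex

object LooseVsStrict {
  // The question's pattern
  val loose: Regex  = """([0-9]+.+[1-9]+ (SAR)+)""".r
  // The pattern from the first answer
  val strict: Regex = """\b\d+\.\d+\p{Zs}+SAR\b""".r

  // First match of each pattern, side by side
  def compare(s: String): (Option[String], Option[String]) =
    (loose.findFirstIn(s), strict.findFirstIn(s))
}
```

For "1.00 USD = 3.70 SAR", the loose pattern finds nothing while the strict one finds "3.70 SAR". For "1.00 USD = 3.75 SAR", the loose pattern matches the whole string, starting at the "1" of 1.00.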