Exclude some words from regular expression - regex

I have a function that inserts a space after characters like : / -
Private Function formatColon(oldString As String) As String
Dim reg As New RegExp: reg.Global = True: reg.Pattern = "(\D:|\D/|\D-)"
Dim newString As String: newString = reg.Replace(oldString, "$1 ")
formatColon = Replace(Replace(Replace(newString, ":  ", ": "), "/  ", "/ "), "-  ", "- ")
End Function
The code excludes dates easily. I also want to exclude particular strings such as 'w/d'.
Is there any way?
Before: abc/abc/15/06/2017 ref:123243-11 ref-111 w/d
After: abc/ abc/ 15/06/2017 ref: 123243-11 ref- 111 w/ d
I want to exclude the last w/d.

You may use a (?!w/d) lookahead to avoid matching w/d with your pattern:
Dim oldString As String, newString As String
Dim reg As New RegExp
With reg
.Global = True
.Pattern = "(?!w/d)\D[:/-]"
End With
oldString = "abc/abc/15/06/2017 ref:123243-11 ref-111 w/d"
newString = reg.Replace(oldString, "$& ")
Debug.Print newString
See the regex demo.
Pattern details
(?!w/d) - a negative lookahead that fails the match if the current position is immediately followed by w/d
\D - any non-digit char
[:/-] - a :, / or - char.
The $& in the replacement pattern refers to the whole match, so there is no need to wrap the whole pattern in capturing parentheses.
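To check the lookahead outside the VBA editor, here is a minimal Python sketch of the same pattern. Python's re module is only a stand-in for the VBScript RegExp object here, and \g<0> is Python's spelling of the whole-match reference.

```python
import re

# Same pattern as above: (?!w/d) fails the match wherever "w/d" starts.
pattern = re.compile(r"(?!w/d)\D[:/-]")

old_string = "abc/abc/15/06/2017 ref:123243-11 ref-111 w/d"

# \g<0> inserts the whole match, then a space is appended after it.
new_string = pattern.sub(r"\g<0> ", old_string)
print(new_string)
```

The dates and the trailing w/d are left alone because \D refuses a digit before the separator and the lookahead refuses a match starting at w/d.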

Here is another solution.
^/(?!ignoreme$)(?!ignoreme2$)[a-z0-9]+$
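That pattern comes without an explanation, so here is a small Python sketch of how the stacked lookaheads behave. The names ignoreme and ignoreme2 are the placeholders from the pattern itself, and is_allowed is a hypothetical helper added for illustration.

```python
import re

# Each (?!word$) rejects the string only when the word runs to the very end.
pattern = re.compile(r"^/(?!ignoreme$)(?!ignoreme2$)[a-z0-9]+$")

def is_allowed(path):
    """Return True when the path matches and is not one of the excluded words."""
    return pattern.search(path) is not None

for candidate in ["/abc123", "/ignoreme", "/ignoreme2", "/ignoreme3"]:
    print(candidate, is_allowed(candidate))
```

Because of the $ inside each lookahead, only the exact words are excluded; a longer string such as /ignoreme3 still passes.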

Related

RegEx array / list / collection of all matches in VBA

I'm trying to use RegEx to get all instances of varying strings that exist in between a particular pair set of strings. E.g. in the following string:
"The Start. Hello. Jamie. Bye. The Middle. Hello. Sarah. Bye. The End"
I want to get a collection / array consisting of "Jamie" and "Sarah" by checking in between "Hello. " and ". Bye. "
My RegEx object is working fine and I feel I'm nearly successful:
Sub Reggie()
Dim x As String: x = "The Start. Hello. Jamie. Bye. The Middle. Hello. Sarah. Bye. The End"
Dim regEx As RegExp
Set regEx = New RegExp
Dim rPat1 As String: rPat1 = "Hello. "
Dim rPat2 As String: rPat2 = " Bye."
Dim rPat3 As String: rPat3 = ".*"
With regEx
.Global = True
.ignorecase = True
.Pattern = "(^.*" & rPat1 & ")(" & rPat3 & ")(" & rPat2 & ".*)"
.MultiLine = True
' COMMAND HERE
End With
End Sub
For the last bit (COMMAND HERE), I tried .Replace(x, "$2"), which gives me a string containing only the last match, i.e. Sarah.
I've also tried .Execute(x), which gives me a MatchCollection object, and when browsing it in the Immediate window I see that it only contains the last match.
Is what I need possible, and if so, how?
That is because .* matches as many characters as possible, so you should not match the whole string by adding .* at both ends of your regular expression.
Besides, you need to escape special characters in the regex pattern: here, . is special, as it matches any character other than a line break.
You need to fix your regex declaration like this:
rPat1 = "Hello\. "
rPat2 = " Bye\."
rPat3 = ".*?"
.Pattern = rPat1 & "(" & rPat3 & ")" & rPat2
Or, to further enhance the regex, you may:
Replace literal spaces with \s* (zero or more whitespace characters) or \s+ (one or more whitespace characters) to support any whitespace.
Match any non-word chars after the captured string with \W+ or \W*.
rPat1 = "Hello\.\s*"
rPat2 = "\W+Bye\."
rPat3 = ".*?"
.Pattern = rPat1 & "(" & rPat3 & ")" & rPat2
See the regex demo. Details:
Hello\. - Hello. string
\s* - zero or more whitespaces
(.*?) - Group 1: any zero or more chars other than line break chars as few as possible
\W+ - one or more chars other than ASCII letters/digits/_
Bye\. - Bye. string.
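To see the difference between the two shapes, here is a rough Python sketch that runs both the original pattern and the fixed one over the sample string. Python's re module stands in for the VBA RegExp object; finditer and findall play the role of iterating the MatchCollection returned by .Execute.

```python
import re

text = "The Start. Hello. Jamie. Bye. The Middle. Hello. Sarah. Bye. The End"

# Original shape: greedy .* on both ends swallows the whole string in one match.
greedy = re.compile(r"(^.*Hello. )(.*)( Bye..*)", re.IGNORECASE | re.MULTILINE)
greedy_matches = [m.group(2) for m in greedy.finditer(text)]

# Fixed shape: escaped dots, a lazy group, and no .* tied to the string ends.
lazy = re.compile(r"Hello\.\s*(.*?)\W+Bye\.", re.IGNORECASE)
names = lazy.findall(text)

print(len(greedy_matches), greedy_matches)
print(names)
```

The greedy pattern can only ever produce one match per line, which is why the MatchCollection looked like it held just the last name.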

Match and replace first and last character and substring in string using regex VB NET

Sorry for the simple question, but I don't know how to do this.
I have this string:
Dim SourceString = "{capital?} has a bridge for {people?}"
Now I want ResultString like this:
ResultString = "capital_Den has a bridge for people_Den"
I used
Dim str As String = "{capital?} has a bridge for {people?}"
Dim str1 As String
str1 = Regex.Replace(str, "\{?\?\}", "_DEN}")
Result: {capital_DEN} has a bridge for {people_DEN}
But I want this result: capital_DEN has a bridge for people_DEN
The \{?\?\} pattern matches an optional {, a ? and then a } char.
You may use
str1 = Regex.Replace(str, "\{(\w+)\?\}", "$1_DEN")
Or, if there can be more than just word chars inside:
str1 = Regex.Replace(str, "\{([^{}]+)\?\}", "$1_DEN")
See the VB.NET demo online and the regex demo. The pattern matches:
\{ - a { char
(\w+) - Group 1: one or more word chars
[^{}]+ - 1+ chars other than { and }
\?\} - a ?} substring.
Full VB.NET code snippet:
Dim str As String = "{capital?} has a bridge for {people?}"
Dim str1 As String
str1 = Regex.Replace(str, "\{(\w+)\?\}", "$1_DEN")
Console.WriteLine(str1)
' -> capital_DEN has a bridge for people_DEN
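For comparison, the two patterns can be tried side by side in a short Python sketch. Python is used only as a convenient test bed; \g<1> is its spelling of the $1 group reference, and the second input string is a made-up example with a space inside the braces.

```python
import re

source = "{capital?} has a bridge for {people?}"

# \w+ variant: only word characters are allowed between { and ?}.
word_only = re.sub(r"\{(\w+)\?\}", r"\g<1>_DEN", source)

# [^{}]+ variant: anything except braces is allowed, e.g. a space.
loose = re.sub(r"\{([^{}]+)\?\}", r"\g<1>_DEN", "{capital city?} has a bridge")

print(word_only)
print(loose)
```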
First, make a console application:
Module Module1
Sub Main()
Console.Title = "Combine"
Dim a As String = "capital_Den"
Dim b As String = "people_Mar"
Dim ResultString As String = a & " has a bridge for " & b
Console.ForegroundColor = ConsoleColor.Green
Console.WriteLine(ResultString)
Console.ReadKey()
End Sub
End Module

regex with XE currency

I'm trying to make a personal app with VB.Net,
and all of my code is working fine except for one thing: the regex.
I want to get this value:
The Highlighted Value that I need
From this URL
I tried this regex:
("([0-9]+.+[1-9]+ (SAR)+)")
and it's not working very well (it only works with some currencies, not all).
Can you help with the right regex?
Update:
Here is the whole function code:
Private Sub doCalculate()
' Need the scraping
Dim Str As System.IO.Stream
Dim srRead As System.IO.StreamReader
Dim strAmount As String
strAmount = currencyAmount.Text
' Get values from the textboxes
Dim strFrom() As String = Split(currecnyFrom.Text, " - ")
Dim strTo() As String = Split(currecnyTo.Text, " - ")
' Web fetching variables
Dim req As System.Net.WebRequest = System.Net.WebRequest.Create("https://www.xe.com/currencyconverter/convert.cgi?template=pca-new&Amount=" + strAmount + "&From=" + strFrom(1) + "&To=" + strTo(1) + "&image.x=39&image.y=9")
Dim resp As System.Net.WebResponse = req.GetResponse
Str = resp.GetResponseStream
srRead = New System.IO.StreamReader(Str)
' Match the response
Try
Dim myMatches As MatchCollection
Dim myRegExp As New Regex("(\d+\.\d+ SAR)")
myMatches = myRegExp.Matches(srRead.ReadToEnd)
' Search for all the words in the string
Dim sucessfulMatch As Match
For Each sucessfulMatch In myMatches
mainText.Text = sucessfulMatch.Value
Next
Catch ex As Exception
mainText.Text = "Unable to connect to XE"
Finally
' Close the streams
srRead.close()
Str.Close()
End Try
convertToLabel.Text = strAmount + " " + strFrom(0) + " Converts To: "
End Sub
Thanks.
You need to get the currency value that appears first. Thus, you need to replace
myMatches = myRegExp.Matches(srRead.ReadToEnd)
' Search for all the words in the string
Dim sucessfulMatch As Match
For Each sucessfulMatch In myMatches
mainText.Text = sucessfulMatch.Value
Next
with the following lines:
Dim myMatch As Match = myRegExp.Match(srRead.ReadToEnd)
mainText.Text = myMatch.Value
I also recommend using the following regex:
\b\d+\.\d+\p{Zs}+SAR\b
Explanation:
\b - word boundary
\d+ - 1+ digits
\. - a literal dot
\d+ - 1+ digits
\p{Zs}+ - one or more space separator characters (Unicode category Zs, which does not include tabs)
SAR\b - whole word SAR.
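The first-match versus last-match point can be illustrated with a small Python sketch. The page fragment below is hypothetical (the real XE markup is not shown in the question), and since Python's re module does not support \p{Zs}, the class [ \u00a0] (space or no-break space) is swapped in as an approximation.

```python
import re

# Hypothetical page fragment, assumed for illustration only.
page = "<td>1.00 USD</td><td>3.75 SAR</td><p>1 USD = 3.75030 SAR</p>"

# [ \u00a0]+ approximates \p{Zs}+, which Python's re does not support.
rate = re.compile(r"\b\d+\.\d+[ \u00a0]+SAR\b")

# Counterpart of Regex.Match(...).Value: the first hit on the page.
first = rate.search(page).group(0)

# Counterpart of the For Each loop, which kept overwriting the text box
# and therefore ended up showing the last hit.
last = rate.findall(page)[-1]

print(first)
print(last)
```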
You should use this regex.
Regex: (\d+\.\d+ SAR)
Explanation:
\d+ matches one or more digits.
\.\d+ matches a literal dot followed by one or more digits (the decimal part).
SAR matches literal string SAR which is your currency unit.
Regex101 Demo
I tried this regex:
("([0-9]+.+[1-9]+ (SAR)+)") and it's not working very well (only works
with some currency but not all).
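What you are doing here is matching one or more digits ([0-9]+), then one or more of any character (.+), then one or more digits from 1 to 9 ([1-9]+, which excludes 0), a space, and SAR repeated one or more times ((SAR)+).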
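A quick Python sketch makes the two failure modes of the original pattern visible. Both sample strings are invented for illustration; Python's re is a stand-in for the .NET engine, which treats these particular constructs the same way.

```python
import re

original = re.compile(r"[0-9]+.+[1-9]+ (SAR)+")
fixed = re.compile(r"\d+\.\d+ SAR")

# Failure 1: [1-9]+ cannot match a trailing 0, so this amount is missed.
ends_in_zero = "3.70 SAR"
original_hit = original.search(ends_in_zero)
fixed_hit = fixed.search(ends_in_zero).group(0)

# Failure 2: the unescaped .+ matches any text, so the match starts too early.
line = "1 USD = 3.75 SAR"
overreach = original.search(line).group(0)
exact = fixed.search(line).group(0)

print(original_hit, fixed_hit)
print(overreach, "|", exact)
```

This would explain why the original only worked for some currencies: it depends on the last digit of the amount and on what other digits appear earlier in the line.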

VBS script to report AD groups - Regex pattern not working with multiple matches

Having an issue with getting a regex statement to accept two expressions.
The "re.pattern" code here works:
If UserChoice = "" Then WScript.Quit 'Detect Cancel
re.Pattern = "[^(a-z)^(0,4,5,6,7,8,9)]"
re.Global = True
re.IgnoreCase = True
if re.test( UserChoice ) then
Exit Do
End if
MsgBox "Please choose either 1, 2 or 3 ", 48, "Invalid Entry"
The "regEx.Pattern" code below, however, does not. I want to use it to format the results of a DSQUERY command that collects groups, but I don't want any of the info after the ",", nor do I want the leading CN= that is normally returned when the following dsquery is run:
"dsquery.exe user forestroot -samid "& strInput &" | dsget user -memberof")
The string I want to format would look something like this before formatting:
CN=APP_GROUP_123,OU=Global Groups,OU=Accounts,DC=corp,DC=contoso,DC=biz
This is the result I want:
APP_GROUP_123
Set regEx = New RegExp
regEx.Pattern = "[,.*]["CN=]"
Result = regEx.Replace(StrLine, "")
I'm only able to get the regex to work when used individually, either
regEx.Pattern = ",."
or
regEx.Pattern = "CN="
code is nested here:
Set InputFile = FSO.OpenTextFile("Temp.txt", 1)
set OutPutFile = FSO.OpenTextFile(StrInput & "-Results.txt", 8, True)
do While InputFile.AtEndOfStream = False
StrLine = InputFile.ReadLine
If inStr(strLine, TaskChoice) then
Set regEx = New RegExp
regEx.Pattern = "[A-Za-z]{2}=(.+?),.*"
Result = regEx.Replace(StrLine, "")
OutputFile.write(Replace(Result,"""","")) & vbCrLf
End if
This should get you started:
str = "CN=APP_GROUP_123,OU=Global Groups,OU=Accounts,DC=corp,DC=contoso,DC=biz"
Set re = New RegExp
re.pattern = "[A-Za-z]{2}=(.+?),.*"
if re.Test(str) then
set matches = re.Execute(str)
matched_str = "Matched: " & matches(0).SubMatches(0)
Wscript.echo matched_str
else
Wscript.echo "Not a match"
end if
Output: Matched: APP_GROUP_123
The regex you need is [A-Za-z]{2}=(.+?),.*
If the match is successful, it captures everything in the parentheses. .+? matches any characters non-greedily, up to the first comma; the ? is what makes the expression non-greedy. If you were to omit it, you would capture everything up to the final comma, the one in ,DC=biz
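The greedy versus non-greedy difference is easy to check with a short Python sketch on the same distinguished name. Python's re is used as a stand-in for the VBScript RegExp object; the quantifiers behave the same way in both.

```python
import re

dn = "CN=APP_GROUP_123,OU=Global Groups,OU=Accounts,DC=corp,DC=contoso,DC=biz"

# Lazy: the group stops at the first comma.
lazy = re.search(r"[A-Za-z]{2}=(.+?),.*", dn).group(1)

# Greedy: the group runs up to the last comma in the line.
greedy = re.search(r"[A-Za-z]{2}=(.+),.*", dn).group(1)

print(lazy)
print(greedy)
```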
Your regular expression "[,.*]["CN=]" doesn't work for two reasons:
It contains an unescaped double quote. Double quotes inside VBScript strings must be escaped by doubling them; otherwise the interpreter reads your expression as a string "[,.*][", followed by an (invalid) variable name CN=] (without an operator, too) and the beginning of the next string (the third double quote).
You misunderstand regular expression syntax. Square brackets indicate a character class. The expression [,.*] matches any single comma, period or asterisk, not a comma followed by any number of characters.
What you meant to use was an alternation, which is expressed by a pipe symbol (|), with the beginning of the string matched by a caret (^). Since this pattern has to match twice per line, it also needs regEx.Global = True:
regEx.Pattern = ",.*|^CN="
With that said, in your case a better approach would be using a group and replacing the whole string with just the group match:
regEx.Pattern = "^cn=(.*?),.*"
regEx.IgnoreCase = True
Result = regEx.Replace(strLine, "$1")
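The three ideas above (character class, alternation, group replacement) can be compared in a small Python sketch. Python's re stands in for the VBScript RegExp object; note that re.sub replaces every match, which corresponds to Global = True on the VBScript side.

```python
import re

line = "CN=APP_GROUP_123,OU=Global Groups,OU=Accounts,DC=corp,DC=contoso,DC=biz"

# A character class matches ONE character from the set: here only the commas.
class_hits = re.findall(r"[,.*]", line)

# Alternation: remove a leading CN= and everything from the first comma on.
stripped = re.sub(r",.*|^CN=", "", line)

# Group approach: replace the whole line with just the captured name.
grouped = re.sub(r"^cn=(.*?),.*", r"\1", line, flags=re.IGNORECASE)

print(class_hits)
print(stripped)
print(grouped)
```

The group approach does the job in a single match, so it does not depend on replacing more than once per line.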

VB.Net Regular Expressions - Extracting Wildcard Value

I need help extracting the value of a wildcard from a Regular Expressions match. For example:
Regex: "I like *"
Input: "I like chocolate"
I would like to be able to extract the string "chocolate" from the Regex match (or whatever else is there). If possible, I also want to be able to retrieve several wildcard values from a single wildcard match. For example:
Regex: "I play the * and the *"
Input: "I play the guitar and the bass"
I want to be able to extract both "guitar" and "bass". Is there a way to do it?
In general, regexes use the concept of groups. Groups are indicated by parentheses.
So I like * would be I like (.*), where . matches any character and * means zero or more of the preceding character.
Sub Main()
Dim s As String = "I Like hats"
Dim rxstr As String = "I Like(.*)"
Dim m As Match = Regex.Match(s, rxstr)
Console.WriteLine(m.Groups(1))
End Sub
The above code will work for any string that contains I Like and will print all the characters after it, including the leading ' ', as . matches even whitespace.
Your second case is more interesting: because the first regex would match everything to the end of the string, you need something more restrictive.
I Like (\w+) and (\w+): this matches I Like, then a space and one or more word characters, then a space, the word and, another space and one or more word characters.
Sub Main()
Dim s2 As String = "I Like hats and dogs"
Dim rxstr2 As String = "I Like (\w+) and (\w+)"
Dim m As Match = Regex.Match(s2, rxstr2)
Console.WriteLine("{0} : {1}", m.Groups(1), m.Groups(2))
End Sub
For a more complete treatment of regex take a look at this site which has a great tutorial.
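If the patterns really arrive in the wildcard form from the question ("I like *"), one possible approach is to convert them to a regex with one group per asterisk. The following is only a Python sketch of that idea, not something from the answer above; wildcard_to_regex and extract_wildcards are hypothetical helper names.

```python
import re

def wildcard_to_regex(wildcard):
    """Escape the literal pieces and turn each * into a lazy capturing group."""
    parts = [re.escape(piece) for piece in wildcard.split("*")]
    return re.compile("(.+?)".join(parts))

def extract_wildcards(wildcard, text):
    """Return the text matched by each *, or an empty list if there is no match."""
    match = wildcard_to_regex(wildcard).fullmatch(text)
    return list(match.groups()) if match else []

print(extract_wildcards("I like *", "I like chocolate"))
print(extract_wildcards("I play the * and the *", "I play the guitar and the bass"))
```

Using fullmatch forces the last lazy group to extend to the end of the input, so a trailing * still captures the whole remaining text.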
Here is my RegexExtract function in VBA. It will return just the submatch you specify (only the stuff in parentheses). So in your case, you'd write:
=RegexExtract(A1, "I like (.*)")
Here is the code.
Function RegexExtract(ByVal text As String, _
ByVal extract_what As String) As String
Application.ScreenUpdating = False
Dim allMatches As Object
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
RE.Pattern = extract_what
RE.Global = True
Set allMatches = RE.Execute(text)
RegexExtract = allMatches.Item(0).submatches.Item(0)
Application.ScreenUpdating = True
End Function
Here is a version that will allow you to use multiple groups to extract multiple parts at once:
Function RegexExtract(ByVal text As String, _
ByVal extract_what As String) As String
Application.ScreenUpdating = False
Dim allMatches As Object
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
Dim i As Long
Dim result As String
RE.Pattern = extract_what
RE.Global = True
Set allMatches = RE.Execute(text)
For i = 0 To allMatches.Item(0).submatches.count - 1
result = result & allMatches.Item(0).submatches.Item(i)
Next
RegexExtract = result
Application.ScreenUpdating = True
End Function
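Note that the multi-group version concatenates the groups with nothing in between, so two captures such as guitar and bass come back as one run-together string. Below is a rough Python sketch of the same idea with an optional separator. The separator parameter and the empty-string result for a non-match are choices made for this sketch and are not part of the VBA function.

```python
import re

def regex_extract(text, pattern, separator=""):
    """Join all captured groups of the first match; return "" if nothing matches."""
    match = re.search(pattern, text)
    if match is None:
        return ""
    # A group that did not participate in the match is treated as empty.
    return separator.join(group or "" for group in match.groups())

sentence = "I play the guitar and the bass"
print(regex_extract(sentence, r"I play the (.*) and the (.*)"))
print(regex_extract(sentence, r"I play the (.*) and the (.*)", ", "))
```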