As a newbie to regex, I'm trying to skip the first set of brackets, [word1], and match any remaining text that starts with an open bracket and ends with a closing brace: [...}
Text: [word1] This is a [word2]bk{not2} sentence [word3]bk{not3}
Pattern: [^\]]\[.*?\}
So what I want is to match [word2]bk{not2} and [word3]bk{not3}, and it works, kind of, but I'm ending up with a leading space on each of the matches. Been playing with this for a couple of days (and doing a lot of reading), but I'm obviously still missing something.
\[[^} ]*}
Try this. See the demo:
https://regex101.com/r/qJ8qW5/1
The [^\]] at the start of your pattern is what matches the leading space. It matches any single character other than ].
For example, when text is [word1] This is a X[word2]bk{not2},
pattern [^\]]\[.*?\} matches X[word2]bk{not2}.
If no open bracket appears between [wordN] and {notN}, you can use:
\[[^\[}]*}
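A minimal sketch of how this negated-class pattern could be checked in VBA, assuming late binding to VBScript.RegExp as in the other code in this thread. The procedure name ListBracketBraceMatches is made up for illustration, and the closing brace is escaped here, which should be equivalent to the pattern above.

```vba
Sub ListBracketBraceMatches()
    ' Hypothetical helper name; the sample text is the one from the question.
    Dim re As Object
    Dim m As Object
    Const SAMPLE As String = "[word1] This is a [word2]bk{not2} sentence [word3]bk{not3}"

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    ' "[" followed by any run of characters that are neither "[" nor "}",
    ' followed by "}". [word1] is skipped because the next "[" is reached
    ' before any "}".
    re.Pattern = "\[[^\[}]*\}"

    For Each m In re.Execute(SAMPLE)
        Debug.Print m.Value
    Next m
End Sub
```

Run against the sample text, this should print [word2]bk{not2} and [word3]bk{not3}, each without a leading space, because the pattern no longer consumes a character before the open bracket.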
Or, you can also use Submatches with capturing groups.
Sub test()
Dim objRE As Object
Dim objMatch As Variant
Dim objMatches As Object
Dim strTest As String
strTest = "[word1] This is a [word2]bk{not2} sentence [word3]bk{not3}"
Set objRE = CreateObject("VBScript.RegExp")
With objRE
.Pattern = "[^\]](\[.*?\})"
.Global = True
End With
Set objMatches = objRE.Execute(strTest)
For Each objMatch In objMatches
Debug.Print objMatch.Submatches(0)
Next
Set objMatch = Nothing
Set objMatches = Nothing
Set objRE = Nothing
End Sub
In this sample code, the pattern uses parentheses to capture the bracketed part as a group, so SubMatches(0) returns it without the leading character.
Related
I use the code below to check whether a string matches a pattern.
Sub chkPattern(str As String, pattern As String)
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
objRegex.pattern = pattern
MsgBox objRegex.test(str)
End Sub
Specifically, I want to check whether the string matches the whole string "abc" or "cde" or "xy".
For example, if the inputs are "abccde" or "abcxy" or "abccdexyz", I expect it to return False.
Some patterns that I have already tried, such as "abc|cde|xyz" and "\b(abc|cde|xyz)\b)", are not working.
Can this be done in VBA by using Regex?
Yes, it is possible. As I read your question, you want to apply an OR using the pipe character.
Sub Test()
Dim arr As Variant: arr = Array("abc", "cde", "xy")
With CreateObject("VBScript.RegExp")
.Pattern = "^(" & Join(arr, "|") & ")$"
Debug.Print .Test("abcd") 'Will return False
Debug.Print .Test("abc") 'Will return True
End With
End Sub
The key to matching the whole string is the pair of anchors: ^ for the start of the string and $ for the end. If you meant that you wanted to test for a partial match, you have simply reversed the slashes. Use backslashes instead of forward slashes, as in \b(abc|cde|xyz)\b.
Remember, when you want to ignore case comparison, use .IgnoreCase = True.
Alternatively, use the built-in Like operator.
To match a whole word, use
(\w+)
https://regex101.com/r/sve6Tp/1
(\w+) Capturing Group
\w matches any word character (equivalent to [a-zA-Z0-9_])
+ Quantifier: matches between one and unlimited times, as many times as possible
\babc\b|\bcde\b|\bxy\b should work for "abc" or "cde" or "xy" but not other variants.
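The difference between the anchored pattern and the word-boundary pattern can be sketched as follows. This is an illustration only: the procedure name is hypothetical, and the alternatives use "xy" as in the question's text.

```vba
Sub CompareWholeStringAndWholeWord()
    ' Hypothetical demo procedure, late-bound like the code in the question.
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")

    ' Anchored: the entire input must be exactly one of the alternatives.
    re.Pattern = "^(abc|cde|xy)$"
    Debug.Print re.Test("abc")       ' True
    Debug.Print re.Test("abccde")    ' False
    Debug.Print re.Test("abc xy")    ' False

    ' Word boundaries: one alternative must appear as a separate word
    ' somewhere in the input.
    re.Pattern = "\b(abc|cde|xy)\b"
    Debug.Print re.Test("abc xy")    ' True
    Debug.Print re.Test("abccde")    ' False
End Sub
```

Which of the two fits depends on whether "abc xy" should count as a match: the anchored form rejects it, the word-boundary form accepts it.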
I have a multi-line string variable that holds a large data string. Some of that data is enclosed between square brackets.
Example data variable:
[text 123]
text [text] 234 [blah] blah
some more [text 123]
I need to extract all the data between the square brackets into a query or table, so it would be something like this:
text 123
text
blah
text 123
Here is my VBA code below:
Dim dataString As String
dataString = "test [field 1] mroe text [field 2] etc"
Dim searchStr As String
Dim regExp As Object
Dim colregmatch As MatchCollection
Dim match As Variant
searchStr = dataString
Set regExp = CreateObject("vbscript.regexp")
With regExp
.pattern = "(?<=\[)(.*?)(?=\])"
.IgnoreCase = True
.Global = True
.Multiline = True
End With
Set colregmatch = regExp.Execute(searchStr)
If colregmatch.Count <> 0 Then
For Each match In colregmatch
MsgBox match.Value
Debug.Print match.Value
Next
End If
Set colregmatch = Nothing
Set regExp = Nothing
UPDATE: I get a 5017 run time error when using this pattern. If I use "[([^]]+)]" as the pattern, it works but the brackets are not removed...
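The following regex should work in engines that support lookbehind (VBScript.RegExp does not, which is why the question's pattern raises error 5017 in VBA):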
/(?<=\[).*?(?=\])/gm
See Regex Demo of the regex in action.
Regex Breakdown:
(?<=\[): Positive lookbehind
\[: matches the character [ literally (case sensitive)
.*?: matches any character (except for line terminators) lazily (as few as possible)
(?=\]): Positive lookahead
\]: matches the character ] literally (case sensitive)
gm: global and multi-line modifiers
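Because VBScript.RegExp has no lookbehind, one way to get the text without the brackets in VBA is to capture it in a group and read SubMatches(0) instead of the whole match. The following is a sketch under that assumption, not the answerer's method; the procedure name and the pattern \[([^\]]+)\] are chosen here for illustration.

```vba
Sub ListBracketedText()
    ' Hypothetical demo procedure; the sample data mirrors the question.
    Dim re As Object
    Dim m As Object
    Dim sourceText As String

    sourceText = "[text 123]" & vbCrLf & _
                 "text [text] 234 [blah] blah" & vbCrLf & _
                 "some more [text 123]"

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    ' Literal "[", then a captured run of one or more non-"]" characters,
    ' then literal "]". Only the captured run lands in SubMatches(0).
    re.Pattern = "\[([^\]]+)\]"

    For Each m In re.Execute(sourceText)
        Debug.Print m.SubMatches(0)
    Next m
End Sub
```

With the sample data this should print text 123, text, blah and text 123 on separate lines. Here m.Value would still include the brackets, which is why the loop reads the submatch instead.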
In this topic, the idea is to extract ("strip out") the numbers separated by an x using a RegEx. -> How to extract ad sizes from a string with excel regex
Thus from:
uni3uios3_300x250_ASDF.html
I want to achieve through RegEx:
300x250
I have managed to achieve the exact opposite, and I have been struggling for some time to get what I need.
This is what I have until now:
Public Function regExSampler(s As String) As String
Dim regEx As Object
Dim inputMatches As Object
Dim regExString As String
Set regEx = CreateObject("VBScript.RegExp")
With regEx
.Pattern = "(([0-9]+)x([0-9]+))"
.IgnoreCase = True
.Global = True
Set inputMatches = .Execute(s)
If regEx.test(s) Then
regExSampler = .Replace(s, vbNullString)
Else
regExSampler = s
End If
End With
End Function
Public Sub TestMe()
Debug.Print regExSampler("uni3uios3_300x250_ASDF.html")
Debug.Print regExSampler("uni3uios3_34300x25_ASDF.html")
Debug.Print regExSampler("uni3uios3_8x4_ASDF.html")
End Sub
If you run TestMe, you would get:
uni3uios3__ASDF.html
uni3uios3__ASDF.html
uni3uios3__ASDF.html
The part that gets removed is exactly what I want to extract through RegEx.
Change the IF block to
If regEx.test(s) Then
regExSampler = InputMatches(0)
Else
regExSampler = s
End If
And your results will return
300x250
34300x25
8x4
This is because InputMatches holds the results of the RegEx execution, that is, the text that matched your pattern.
As requested by the OP, I'm posting this as an answer:
Solution:
^.*\D(?=\d+x\d+)|\D+$
Demonstration: regex101.com
Explanation:
^.*\D - Here we're matching every character from the start of the string until it reaches a non-digit (\D) character.
(?=\d+x\d+) - This is a positive lookahead. It means that the previous pattern (^.*\D) should only match if followed by the pattern described inside it (\d+x\d+). The lookahead itself doesn't capture any character, so the pattern \d+x\d+ isn't captured by the regex.
\d+x\d+ - This one should be easy to understand because it's equivalent to [0-9]+x[0-9]+. As you see, \d is a token that represents any digit character.
\D+$ - This pattern matches one or more non-digit characters until it reaches the end of the string.
Finally, both patterns are linked by an OR condition (|) so that the whole regex matches one pattern or another.
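As a sketch of how this pattern might be plugged into VBA: it is meant to be used with Replace, removing everything around the size rather than matching the size itself. The function and procedure names below are hypothetical.

```vba
Function ExtractAdSize(ByVal s As String) As String
    ' Hypothetical helper: deletes the text before and after the NxM token.
    With CreateObject("VBScript.RegExp")
        ' Global must be True so that both alternatives (the leading part
        ' and the trailing part) are replaced, not just the first match.
        .Global = True
        .Pattern = "^.*\D(?=\d+x\d+)|\D+$"
        ExtractAdSize = .Replace(s, vbNullString)
    End With
End Function

Sub TestExtractAdSize()
    Debug.Print ExtractAdSize("uni3uios3_300x250_ASDF.html")   ' 300x250
    Debug.Print ExtractAdSize("uni3uios3_34300x25_ASDF.html")  ' 34300x25
    Debug.Print ExtractAdSize("uni3uios3_8x4_ASDF.html")       ' 8x4
End Sub
```

Note that an input with no NxM token is not returned unchanged by this approach, since the \D+$ alternative still strips any trailing non-digits.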
I need to extract a word from incoming mail's body.
I wrote a Regex after consulting a few sites, but it gives no result and throws no error either.
Example: Description: sample text
I want only the first word after the colon.
Dim reg1 As RegExp
Dim M1 As MatchCollection
Dim M As Match
Dim EAI As String
Set reg1 = New RegExp
With reg1
.Pattern = "Description\s*[:]+\s*(\w*)\s*"
.Global = False
End With
If reg1.Test(Item.Body) Then
Set M1 = reg1.Execute(Item.Body)
For Each M In M1
EAI = M.SubMatches(1)
Next
End If
Note that your pattern works well though it is better written as:
Description\s*:+\s*(\w+)
It will match Description, then zero or more whitespace characters, one or more : symbols, again zero or more whitespace characters, and then capture into Group 1 one or more word characters (letters, digits or _ symbols).
Now, the Capture Group 1 value is stored in M.SubMatches(0), because SubMatches is zero-based; SubMatches(1) would refer to a second group, which this pattern does not have. Besides, you need not run .Test(), because if there are no matches there is nothing to iterate over. You actually want to get a single match.
Thus, just use
Set M1 = reg1.Execute(Item.body)
If M1.Count > 0 Then
EAI = M1(0).SubMatches(0)
End If
Where M1(0) is the first match and .SubMatches(0) is the text residing in the first group.
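For trying this outside Outlook, the same logic could be wrapped in a small late-bound function that takes the body text as a plain string. This is only a sketch; the function name and the bodyText parameter are made up here.

```vba
Function FirstWordAfterDescription(ByVal bodyText As String) As String
    ' Hypothetical helper: returns the first word after "Description:",
    ' or an empty string when the label is not found.
    Dim matches As Object

    With CreateObject("VBScript.RegExp")
        .Global = False                          ' a single match is enough
        .Pattern = "Description\s*:+\s*(\w+)"
        Set matches = .Execute(bodyText)
    End With

    If matches.Count > 0 Then
        ' SubMatches is zero-based: index 0 is capture group 1.
        FirstWordAfterDescription = matches(0).SubMatches(0)
    End If
End Function
```

Calling FirstWordAfterDescription("Description: sample text") returns "sample".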
I am busy with a regular expression for VB and I can't seem to find where I am going wrong here.
Example:
Pattern:(?<=\d{10,11})(.|[\r\n])*(?=Mobile)
Input: 6578543567 Text I want to retain Mobile Operation
Output: #Name?
The number consists of 10 and 11 digit telephone numbers.
The text I want to retain varies in length.
The text always precedes the word Mobile.
Function regex(strInput As String, matchPattern As String, Optional ByVal outputPattern As String = "$0") As Variant
Dim inputRegexObj As New VBScript_RegExp_55.RegExp, outputRegexObj As New VBScript_RegExp_55.RegExp, outReplaceRegexObj As New VBScript_RegExp_55.RegExp
Dim inputMatches As Object, replaceMatches As Object, replaceMatch As Object
Dim replaceNumber As Integer
With inputRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = matchPattern
End With
With outputRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = "\$(\d+)"
End With
With outReplaceRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
End With
Set inputMatches = inputRegexObj.Execute(strInput)
If inputMatches.count = 0 Then
regex = False
Else
Set replaceMatches = outputRegexObj.Execute(outputPattern)
For Each replaceMatch In replaceMatches
replaceNumber = replaceMatch.SubMatches(0)
outReplaceRegexObj.Pattern = "\$" & replaceNumber
If replaceNumber = 0 Then
outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).Value)
Else
If replaceNumber > inputMatches(0).SubMatches.count Then
'regex = "A to high $ tag found. Largest allowed is $" & inputMatches(0).SubMatches.Count & "."
regex = CVErr(xlErrValue)
Exit Function
Else
outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).SubMatches(replaceNumber - 1))
End If
End If
Next
regex = outputPattern
End If
End Function
IIRC, VBA doesn't support lookbehind in its Regular Expression implementation (lookahead is supported).
But this appears to be a relatively easy string to match. You have a group of consecutive digits, followed by a space, and then you want to match an unknown number of words up to the word "Mobile".
You could use the following pattern to accomplish this:
\d+\s(.*?)\sMobile
Details (See it in action here):
\d any digit
+ (Quantifier) One to unlimited times - greedy
\s a single whitespace character
(...) capturing group to grab the text you want to return
. any character
*? (Quantifier) Zero to unlimited times - lazy
\s a single whitespace character
Mobile literally matches the word Mobile
What's with the greedy vs lazy quantifiers?
The first quantifier + is greedy. What makes it greedy? The lack of a ? immediately following the quantifier. What this essentially does is consume as much as it possibly can of the \d.
Since we added a \s after it, this won't really change the outcome, because the pattern has to match all the digits anyway to get to that space \s. However, if you removed the \s (say, because you wanted to capture the space in (...)), then this would be important: if the \d quantifier were lazy, your .*? would consume all but one of your digits.
So then why are we using a lazy quantifier with .*?? Well, if your input string contained the word Mobile twice, a greedy quantifier would consume the first one and match up to the second. If you only want to match up to the first occurrence of Mobile, then you want to make it lazy.
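The greedy versus lazy behaviour described above can be seen side by side in a short sketch. The procedure name and the sample string with two occurrences of Mobile are invented for illustration.

```vba
Sub CompareGreedyAndLazy()
    ' Hypothetical demo; SAMPLE contains the word "Mobile" twice.
    Dim re As Object
    Const SAMPLE As String = "6578543567 Text I want Mobile Operation Mobile phone"

    Set re = CreateObject("VBScript.RegExp")

    ' Greedy .* runs to the last " Mobile" in the input.
    re.Pattern = "\d+\s(.*)\sMobile"
    Debug.Print re.Execute(SAMPLE)(0).SubMatches(0)   ' Text I want Mobile Operation

    ' Lazy .*? stops at the first " Mobile".
    re.Pattern = "\d+\s(.*?)\sMobile"
    Debug.Print re.Execute(SAMPLE)(0).SubMatches(0)   ' Text I want
End Sub
```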
So Finally - Now how do I retrieve the text in my capturing group (...)?
With VBA, you would use the Matches object. First I would recommend testing to ensure that there is a match - this can be done in a simple If...Then statement. Once this test passes, you can then safely obtain your return value.
With New RegExp
    .Pattern = "\d+\s(.*?)\sMobile"
    .IgnoreCase = True 'Matches 'Mobile' in any case; switch to False for a case-sensitive match
    If .Test(inputString) Then
        retVal = .Execute(inputString)(0).SubMatches(0)
    End If
End With
inputString would be the string that contains the test values.
retVal would be what is returned from your capturing group.