MS Access Extract all text between brackets to new query - regex

I have a multi-line string variable that holds a large data string. Some of that data is enclosed between square brackets.
Example data variable:
[text 123]
text [text] 234 [blah] blah
some more [text 123]
I need to extract all the data between the square brackets into a query or table, so it would be something like this:
text 123
text
blah
text 123
Here is my VBA code below:
Dim dataString As String
dataString = "test [field 1] more text [field 2] etc"
Dim searchStr As String
Dim regExp As Object
Dim colregmatch As MatchCollection
Dim match As Variant
searchStr = dataString
Set regExp = CreateObject("vbscript.regexp")
With regExp
.pattern = "(?<=\[)(.*?)(?=\])"
.IgnoreCase = True
.Global = True
.Multiline = True
End With
Set colregmatch = regExp.Execute(searchStr)
If colregmatch.Count <> 0 Then
For Each match In colregmatch
MsgBox match.Value
Debug.Print match.Value
Next
End If
Set colregmatch = Nothing
Set regExp = Nothing
UPDATE: I get run-time error 5017 when using this pattern. If I use "[([^]]+)]" as the pattern instead, it works, but the brackets are not removed.

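The following regex works in engines that support lookbehind (PCRE, for example). The VBScript.RegExp engine used from VBA does not support lookbehind, which is the likely cause of the 5017 error, so in VBA a capturing group has to be used instead:
/(?<=\[).*?(?=\])/gm
See Regex Demo of the regex in action.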
Regex Breakdown:
(?<=\[): Positive lookbehind
\[: matches the character [ literally (case sensitive)
.*?: matches any character (except for line terminators) lazily (as few as possible)
(?=\]): Positive lookahead
\]: matches the character ] literally (case sensitive)
gm: global and multi-line modifiers
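Because VBScript.RegExp has no lookbehind, one possible workaround in VBA is to match the brackets as well and read only the capturing group. The following is a minimal sketch under that assumption; ExtractBracketed and DemoExtractBracketed are hypothetical names, not part of the original post.

```vba
' Sketch: collect the text inside every [...] pair using a capturing group.
' ExtractBracketed is a hypothetical helper name used for illustration.
Function ExtractBracketed(ByVal source As String) As Collection
    Dim re As Object
    Dim m As Object
    Dim results As New Collection

    ' Late binding, so no library reference is required.
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "\[([^\]]*)\]"   ' group 1 = everything between [ and ]
    re.Global = True

    For Each m In re.Execute(source)
        results.Add m.SubMatches(0)   ' text without the brackets
    Next m

    Set ExtractBracketed = results
End Function

Sub DemoExtractBracketed()
    Dim item As Variant
    ' Prints "field 1" and then "field 2" to the Immediate window.
    For Each item In ExtractBracketed("test [field 1] more text [field 2] etc")
        Debug.Print item
    Next item
End Sub
```

Each collected item could then be written to a table, for example by looping over the collection and adding one record per item.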

Related

Regex pattern to replace date and time node in xml of word document

I need to replace the date and time in an XML file using a regex pattern.
The XML text would contain:
w:date="2022-12-01T01:17:00Z"
w:date="2022-12-01T02:17:00Z"
w:date="2022-12-02T03:17:00Z"
possible regex pattern for the above would be:
w:date="[\d\W]\w[\d\W]\w"
but it does not replace anything, and the resulting string remains intact in the following VBA code:
Sub ChangeDateTime()
Dim sWOOXML As String
Set objRegEx = CreateObject("vbscript.regexp")
objRegEx.Global = True
objRegEx.IgnoreCase = True
objRegEx.MultiLine = True
objRegEx.Pattern = "w:date=" & Chr(34) & "[\d\W]\w[\d\W]\w" & Chr(34)
sWOOXML = ActiveDocument.Content.WordOpenXML
sWOOXML = objRegEx.Replace(sWOOXML, "")
ActiveDocument.Content.InsertXML sWOOXML
Beep
End Sub
Your [\d\W]\w[\d\W]\w regex cannot match because it allows only two repetitions of a digit or non-word char followed by a word char between the two double quotes, while the actual value contains many more characters.
You can use
objRegEx.Pattern = "w:date=""\d{4}-\d{1,2}-\d{1,2}T\d{1,2}:\d{1,2}:\d{1,2}Z"""
See the regex demo. Note you may add a double quote to the string using a doubled ", no need to use Chr(34).
This is a verbose pattern where \d{1,2} matches one or two digits and \d{4} matches four digits, the rest is self-explanatory.
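To check the pattern outside of a Word document, it can be run against a literal string first. This is only a sketch: the sample markup and the name DemoStripWDate are made up for illustration.

```vba
' Sketch: remove w:date="..." attributes from a sample string.
Sub DemoStripWDate()
    Dim re As Object
    Dim sample As String

    ' Hypothetical sample markup, not taken from a real document.
    sample = "<w:ins w:id=""1"" w:date=""2022-12-01T01:17:00Z""/>"

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    ' Doubled quotes inside the VBA string stand for literal double quotes.
    re.Pattern = "w:date=""\d{4}-\d{1,2}-\d{1,2}T\d{1,2}:\d{1,2}:\d{1,2}Z"""

    ' The attribute is removed; the rest of the string is left unchanged.
    Debug.Print re.Replace(sample, "")
End Sub
```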

VBA check String is matched exactly word

I use the code below to check whether a string matches a pattern.
Sub chkPattern(str As String, pattern As String)
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
objRegex.pattern = pattern
MsgBox objRegex.test(str)
End Sub
Specifically, I want to check whether the whole string is exactly "abc" or "cde" or "xy".
For example, if the inputs are "abccde" or "abcxy" or "abccdexyz", I expect it to return False.
Some patterns that I have already tried, like "abc|cde|xyz" and "\b(abc|cde|xyz)\b)", are not working.
Can this be done in VBA using regex?
Yes, it is possible. As I read your question, you want to apply OR with the pipe character.
It is possible yes. As I read your question you want to apply the OR with the pipe character.
Sub Test()
Dim arr As Variant: arr = Array("abc", "cde", "xy")
With CreateObject("VBScript.RegExp")
.Pattern = "^(" & Join(arr, "|") & ")$"
Debug.Print .Test("abcd") 'Will return False
Debug.Print .Test("abc") 'Will return True
End With
End Sub
The key to matching the whole string here is the start-of-string anchor ^ and the end-of-string anchor $. If you meant to test for a partial match, you have simply reversed the slashes: use backslashes instead of forward slashes, as in \b(abc|cde|xyz)\b.
Remember, when you want a case-insensitive comparison, use .IgnoreCase = True.
Alternatively, use the built-in Like operator.
To match a whole word, use
(\w+)
https://regex101.com/r/sve6Tp/1
(\w+): capturing group
\w+: matches one or more word characters (a word character is equal to [a-zA-Z0-9_])
+: quantifier, matches between one and unlimited times, as many times as possible
\babc\b|\bcde\b|\bxy\b should work for "abc" or "cde" or "xy" but not other variants.
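The difference between the anchored pattern and the word-boundary pattern can be seen side by side. This is a small sketch, with CompareAnchors as a hypothetical name; the alternatives abc, cde and xy are taken from the question.

```vba
' Sketch: whole-string match (^...$) versus whole-word match (\b...\b).
Sub CompareAnchors()
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")

    ' Anchored: the entire input must be exactly one alternative.
    re.Pattern = "^(abc|cde|xy)$"
    Debug.Print re.Test("abc")       ' True
    Debug.Print re.Test("abc xy")    ' False: the whole string is not one alternative

    ' Word boundaries: one alternative must appear as a standalone word.
    re.Pattern = "\b(abc|cde|xy)\b"
    Debug.Print re.Test("abc xy")    ' True: abc stands alone as a word
    Debug.Print re.Test("abccde")    ' False: no word boundary inside abccde
End Sub
```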

RegEx to extract a word from mail's body

I need to extract a word from incoming mail's body.
I used a regex after referring to several sites, but it neither gives a result nor throws an error.
Example: Description: sample text
I want only the first word after the colon.
Dim reg1 As RegExp
Dim M1 As MatchCollection
Dim M As Match
Dim EAI As String
Set reg1 = New RegExp
With reg1
.Pattern = "Description\s*[:]+\s*(\w*)\s*"
.Global = False
End With
If reg1.Test(Item.Body) Then
Set M1 = reg1.Execute(Item.Body)
For Each M In M1
EAI = M.SubMatches(1)
Next
End If
Note that your pattern works well though it is better written as:
Description\s*:+\s*(\w+)
And it will match Description, then 0+ whitespaces, 1+ : symbols, again 0 or more whitespaces and then will capture into Group 1 one or more word characters (as letters, digits or _ symbols).
Now, the Capture Group 1 value is stored in M.SubMatches(0). Besides, you need not run .Test() because if there are no matches, you do not need to iterate over them. You actually want to get a single match.
Thus, just use
Set M1 = reg1.Execute(Item.body)
If M1.Count > 0 Then
EAI = M1(0).SubMatches(0)
End If
Where M1(0) is the first match and .SubMatches(0) is the text residing in the first group.
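Put together, the advice above could look like the following sketch. It uses late binding through CreateObject, so no library reference is needed. FirstWordAfterDescription and DemoFirstWord are hypothetical names, and the body text is passed in as a plain string rather than read from a mail item.

```vba
' Sketch: return the first word after "Description:", or "" if there is none.
Function FirstWordAfterDescription(ByVal body As String) As String
    Dim re As Object
    Dim matches As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "Description\s*:+\s*(\w+)"
    re.Global = False   ' only the first match is needed

    Set matches = re.Execute(body)
    If matches.Count > 0 Then
        ' SubMatches is zero-based: index 0 is capture group 1.
        FirstWordAfterDescription = matches(0).SubMatches(0)
    End If
End Function

Sub DemoFirstWord()
    Debug.Print FirstWordAfterDescription("Description: sample text")   ' prints sample
End Sub
```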

UDF Regex - yyyy only

I am just learning some regex, and I need help spitting out matches generated by my regex code. I found some very useful resources here to output anything not matched, but I want to output only the parts of a cell that do match. I am looking for dates in cells, that may be a single yyyy date or yyyy-yy, or the like (as shown from the sample data below).
Sample data:
1951/52
1909-13
2005-2014
7 . (1989)-
1 (1933/34)-2 (1935/36)
1979-2012/2013
Current Function Code: (A snippet found from an existing post here, but returns the replacement value instead of what was matched)
Function simpleCellRegex(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim strOutput As String
strPattern = "([12][0-9]{3}[/][0-9]{2,4})|([12][0-9]{3}[-][0-9]{2,4})|([12][0-9]{3})"
You may use
\b[12][0-9]{3}(?:[,/-][0-9]{2,4})*\b
See the regex demo
Note that \b might be removed if you are not interested in a whole word search.
Pattern details:
\b - leading word boundary (the preceding char must be either a non-word char or the start of string)
[12][0-9]{3} - 1 or 2 followed with any 3 digits
(?:[,/-][0-9]{2,4})* - zero or more sequences ((?:...)*) of:
[,/-] - a comma, / or - character
[0-9]{2,4} - any 2 to 4 digits
\b - trailing word boundary (there must be a non-word char or the end of string after).
Sample VBA code to grab all those values using RegExp#Execute:
Sub FetchDateLikeStrs()
Dim cellContents As String
Dim reg As regexp
Dim mc As MatchCollection
Dim m As match
Set reg = New regexp
reg.pattern = "\b[12][0-9]{3}(?:[,/-][0-9]{2,4})*\b"
reg.Global = True
cellContents = "1951/52 1909-13 2005-2014 7 . (1989)- 1 (1933/34)-2 (1935/36) 1979-2012/2013 1951,52"
Set mc = reg.Execute(cellContents)
For Each m In mc
Debug.Print m.Value
Next
End Sub
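Since the question asks for a worksheet function, the same pattern could also be wrapped in a UDF that returns the matches instead of printing them. This is a sketch only: the name ExtractYears and the "; " separator are assumptions, not something the original post specifies.

```vba
' Sketch: UDF that returns every date-like match in a cell, joined with "; ".
Function ExtractYears(ByVal cellText As String) As String
    Dim re As Object
    Dim m As Object
    Dim result As String

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "\b[12][0-9]{3}(?:[,/-][0-9]{2,4})*\b"
    re.Global = True   ' collect all matches, not only the first

    For Each m In re.Execute(cellText)
        If Len(result) > 0 Then result = result & "; "   ' assumed separator
        result = result & m.Value
    Next m

    ExtractYears = result   ' empty string when nothing matched
End Function
```

Used in a cell as =ExtractYears(A1), an input of 1 (1933/34)-2 (1935/36) would give 1933/34; 1935/36.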

leading space with regex match

A newbie to regex, I'm trying to skip the first set of brackets [word1], and match any remaining text bracketed with the open bracket and closing brace [...}
Text: [word1] This is a [word2]bk{not2} sentence [word3]bk{not3}
Pattern: [^\]]\[.*?\}
So what I want is to match [word2]bk{not2} and [word3]bk{not3}, and it works, kind of, but I'm ending up with a leading space on each of the matches. Been playing with this for a couple of days (and doing a lot of reading), but I'm obviously still missing something.
\[[^} ]*}
Try this. See the demo:
https://regex101.com/r/qJ8qW5/1
The [^\]] in your pattern is what matches the leading space: it matches any single character other than ].
For example, when the text is [word1] This is a X[word2]bk{not2},
the pattern [^\]]\[.*?\} matches X[word2]bk{not2}.
If no open bracket appears between [wordN] and {notN}, you can use:
\[[^\[}]*}
Or, you can also use Submatches with capturing groups.
Sub test()
Dim objRE As Object
Dim objMatch As Variant
Dim objMatches As Object
Dim strTest As String
strTest = "[word1] This is a [word2]bk{not2} sentence [word3]bk{not3}"
Set objRE = CreateObject("VBScript.RegExp")
With objRE
.Pattern = "[^\]](\[.*?\})"
.Global = True
End With
Set objMatches = objRE.Execute(strTest)
For Each objMatch In objMatches
Debug.Print objMatch.Submatches(0)
Next
Set objMatch = Nothing
Set objMatches = Nothing
Set objRE = Nothing
End Sub
In this sample code, the pattern uses parentheses to capture only the bracketed part, so Submatches(0) returns it without the leading character.
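For comparison, the pattern without the leading character class can be tried on the same text. This is a minimal sketch, assuming no open bracket occurs between [wordN] and {notN}; DemoNoLeadingChar is a hypothetical name.

```vba
' Sketch: match [wordN]...{notN} directly, without consuming a preceding character.
Sub DemoNoLeadingChar()
    Dim re As Object
    Dim m As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "\[[^\[}]*}"   ' [ then anything except [ or }, then }
    re.Global = True

    ' [word1] is skipped because the next [ is reached before any }.
    For Each m In re.Execute("[word1] This is a [word2]bk{not2} sentence [word3]bk{not3}")
        Debug.Print m.Value   ' [word2]bk{not2}, then [word3]bk{not3}
    Next m
End Sub
```

Here the whole match is already free of the leading space, so no capturing group is needed.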