I inherited a regex pattern that seems to work in online testers (demo here: https://regex101.com/r/KPpoLS/1).
However, it uses negative lookbehinds to avoid grabbing substrings of longer serial numbers. I need to use this pattern in a VBA macro to identify all the serial number patterns (listed below), but as far as I can tell VBA does not support negative lookbehinds.
Unfortunately, I'm not well versed enough in regex to re-engineer this pattern. Can anyone offer any suggestions?
\b(?<!-)(\d\/[A-Z]{2}\/\d{6}-\d{2})|([A-Z]-\d\/[A-Z]{2}\/\d{6}-\d{2})|(?<![A-Z])([A-Z]\/[A-Z]{2}\/\d{6}-\d{2})|([A-Z]{2}\/[A-Z]{2}\/\d{6}-\d{2})|([A-Z][0-9]\/[A-Z]{2}\/\d{6}-\d{2})|(?<![A-Z])([A-Z]-[A-Z]{3}\/\d{6}-\d{2})|([0-9]\/[A-Z]{2}\/[A-Z]{3}\/\d{6}-\d{2})|([A-Z]\/[A-Z]{2}\/[A-Z]{3}\/\d{6}-\d{2})|([0-9]\/[A-Z]{2}\/[A-Z]{3}\/\d{4}-\d{2})|([0-9]\/[A-Z]{2}\/[A-Z]{3}\/\d{6}-\d{2})
[list arranged by sub-strings]
1/AA/111111-11
A-1/AA/111111-11
A/AA/111111-11
AA/AA/111111-11
A1/AA/111111-11
A-AAA/111111-11
1/AA/AAA/111111-11
A/AA/AAA/111111-11
1/AA/AAA/1111-11
[original list]
1/AA/111111-11
A/AA/111111-11
A1/AA/111111-11
AA/AA/111111-11
A-AAA/111111-11
1/AA/AAA/1111-11
A-1/AA/111111-11
1/AA/AAA/111111-11
A/AA/AAA/111111-11
Sub regex_test_by_word_story_ranges()
Dim stringone As String
Dim regexone As Object
Dim doc As Word.Document
Dim rng As Word.Range
Dim para As Word.Paragraph
Dim i as Long
Dim x As Long
Dim regcount As Long
Dim rngstory As Range
Dim serialArray() as String
Set regexone = New RegExp
Set doc = ActiveDocument
'=========================================
'Loop #1 to find Category 1 serial numbers
'=========================================
regexone.Pattern = ""
regexone.Global = True
regexone.IgnoreCase = False 'serials are upper case; set to True to also match lower case
For Each rngstory in doc.StoryRanges
On Error Resume Next
rngstory.Select
stringone = stringone & rngstory.Text
Next rngstory
Set theMatches = regexone.Execute(stringone)
regcount = theMatches.Count
debug.print regcount
With theMatches
If .Count > 0 Then
ReDim Preserve serialArray(.Count, 5)
x = 1
For Each Match In theMatches
debug.print Match.value
serialArray(x, 1) = Match.value 'this will become a search term for another macro
serialArray(x, 2) = Replace(Match.value, "/", "!") 'this will become part of a URL
serialArray(x, 3) = "www.baseURL.com/" & Replace(Match.value, "/", "!") 'This is the base URL with (x, 2) appended. The search term from the next macro will find the text and insert this hyperlink
serialArray(x, 4) = "Placeholder3" 'extra, will delete
serialArray(x, 5) = "Placeholder4" 'extra, will delete
x = x + 1
Next Match
End If
End With
'checking output of array:
For x = LBound(serialArray) To UBound(serialArray)
debug.print serialArray(x, 1) & ", " & serialArray(x, 2) & ", " & serialArray(x, 3) & ", " & serialArray(x, 4) & ", " & serialArray(x, 5)
Next x
'=========================================
'Loop #2 to find Category 2 serial numbers
'=========================================
'Same loop as above code, but I have not developed the REGEX for Category 2 serial numbers yet
'Loop #2 needs to add the Matches from the Loop #2 REGEX to the serialArray()
'This portion is beyond my original question, but would welcome any help on adding subsequent loop matches to the serialArray()
'https://stackoverflow.com/questions/60831517/alternative-to-negative-lookbehinds-in-vba-macro?noredirect=1#comment107630601_60831517
End Sub
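One common workaround, sketched here under assumptions rather than as a drop-in replacement for the full pattern: since VBScript.RegExp has no lookbehind, let a first capture group consume either the start of the text or one character that is not allowed in front of the serial, and read the serial itself from the second group. The function name FindSerialsNoLookbehind is hypothetical, and only the A/AA/111111-11 alternative (the one guarded by (?<![A-Z]) in the original pattern) is shown.

```vba
' Hypothetical helper: emulates (?<![A-Z]) for the A/AA/111111-11 form only.
Function FindSerialsNoLookbehind(ByVal sourceText As String) As Collection
    Dim re As Object
    Dim m As Object
    Dim result As New Collection

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.IgnoreCase = False
    ' Group 1: start of the text, or one character that is not A-Z.
    ' Group 2: the serial number itself.
    re.Pattern = "(^|[^A-Z])([A-Z]/[A-Z]{2}/\d{6}-\d{2})"

    For Each m In re.Execute(sourceText)
        ' SubMatches is zero-based, so index 1 is the second group.
        result.Add m.SubMatches(1)
    Next m

    Set FindSerialsNoLookbehind = result
End Function
```

Because the guard character is part of the overall match, read SubMatches(1) rather than Match.Value. If positions are needed, add Len(SubMatches(0)) to FirstIndex.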
Related
I'm trying to look up a string which contains wildcards. I need to find where in a specific row the string occurs. The strings all take the form "IP##W## XX", where XX are the 2 letters by which I look up the value and each # is a wildcard for any digit. Hence this is what my lookup string looks like:
FullLookUpString = "IP##W## " & LookUpString
I tried using the Find command to find the column where this first occurs, but I keep getting errors. Here's what I have so far, but it doesn't work. I'd appreciate it if anyone has an easy way of doing this. I'm quite new to VBA.
Dim GatewayColumn As Variant
Dim GatewayDateColumn As Variant
Dim FirstLookUpRange As Range
Dim SecondLookUpRange As Range
FullLookUpString = "IP##W## " & LookUpString
Set FirstLookUpRange = wsMPNT.Range(wsMPNT.Cells(3, 26), wsMPNT.Cells(3, lcolumnMPNT))
Debug.Print FullLookUpString
GatewayColumn = FirstLookUpRange.Find(What:=FullLookUpString, After:=Range("O3")).Column
Debug.Print GatewayColumn
Per the comment by @SJR, you can do this two ways. Using LIKE the pattern is:
IP##W## [A-Z][A-Z]
Using regular expressions, the pattern is:
IP\d{2}W\d{2} [A-Z]{2}
Example code:
Option Explicit
Sub FindString()
Dim ws As Worksheet
Dim rngData As Range
Dim rngCell As Range
Set ws = ThisWorkbook.Worksheets("Sheet1") '<-- set your sheet
Set rngData = ws.Range("A1:A4")
' with LIKE operator
For Each rngCell In rngData
If rngCell.Value Like "IP##W## [A-Z][A-Z]" Then
Debug.Print rngCell.Address
End If
Next rngCell
' with regular expression
Dim objRegex As Object
Dim objMatch As Object
Set objRegex = CreateObject("VBScript.RegExp")
objRegex.Pattern = "IP\d{2}W\d{2} [A-Z]{2}"
For Each rngCell In rngData
If objRegex.Test(rngCell.Value) Then
Debug.Print rngCell.Address
End If
Next rngCell
End Sub
If we can assume that ALL the strings in the row match the given pattern, then we can examine only the last three characters:
Sub FindAA()
Dim rng As Range, r As Range, Gold As String
Set rng = Range(Range("A1"), Cells(1, Columns.Count))
Gold = " AA"
For Each r In rng
If Right(r.Value, 3) = Gold Then
MsgBox r.Address(0, 0)
Exit Sub
End If
Next r
End Sub
Try this:
If FullLookUpString Like "*IP##W## [a-zA-Z][a-zA-Z]*" Then
MsgBox "Match is found"
End If
It will find your pattern (pattern can be surrounded by any characters - that's allowed by *).
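As a minimal sketch of how the Like test could be tied to the two lookup letters from the question: the function name MatchesGateway and its parameters are hypothetical, and the comparison is case-sensitive under the default Option Compare Binary.

```vba
' Hypothetical helper: True when cellText has the form "IP##W## XX",
' where XX equals the two lookup letters passed in.
Function MatchesGateway(ByVal cellText As String, ByVal lookUp As String) As Boolean
    ' In a Like pattern, # matches exactly one digit.
    MatchesGateway = (cellText Like ("IP##W## " & lookUp))
End Function
```

Looping over the cells of the row and calling this function on each value gives the first matching column without relying on Range.Find.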
I am trying to highlight each word found by a RegEx and, if the user agrees, replace it with its corresponding substitute.
The code works correctly only when nothing is replaced.
Presumably the match positions need to be recalculated after every replacement?
Sub Replace()
Dim regExp As Object
Set regExp = CreateObject("vbscript.regexp")
Dim arr As Variant
Dim arrzam As Variant
Dim i As Long
Dim choice As Integer
Dim Document As Word.Range
Set Document = ActiveDocument.Content
On Error Resume Next
'EGN
'IBAN
arr = VBA.Array("((EGN(:{0,1})){0,1})[0-9]{10}", _
"[a-zA-Z]{2}[0-9]{2}[a-zA-Z0-9]{4}[0-9]{7}([a-zA-Z0-9]?){0,16}")
arrzam = VBA.Array("[****]", _
"[IBAN]")
With regExp
For i = 0 To UBound(arr)
.Pattern = arr(i)
.Global = True
For Each Match In regExp.Execute(Document)
ActiveDocument.Range(Match.FirstIndex, Match.FirstIndex + Match.Length).Duplicate.Select
choice = MsgBox("Replace " & Chr(34) & Match.Value & Chr(34) & " with " & Chr(34) & arrzam(i) & Chr(34) & "?", _
vbYesNoCancel + vbDefaultButton1, "Replace")
If choice = vbYes Then
Document = .Replace(Document, arrzam(i))
ElseIf choice = vbCancel Then
Next
End If
Next
Next
End With
End Sub
Actually, there are several things wrong with this.
First, the collection that For Each Match iterates over is static, determined at the moment the loop starts. You're changing the document in the meantime, so each successive Match points at an old position.
Second, you're replacing all the occurrences at one time, so there is no need to loop through them. It seems a one line, one time Replace could do the same thing.
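One way around the stale positions, shown as a sketch on a plain string rather than on the Word document: walk the match collection from the last match to the first, so that a replacement never shifts the FirstIndex of a match that is still to be processed. The function name ReplaceMatchesBackward is hypothetical, and the keepValue parameter merely stands in for the user's Yes/No answer.

```vba
' Hypothetical helper: replaces every match except those equal to keepValue.
Function ReplaceMatchesBackward(ByVal source As String, ByVal pattern As String, _
                                ByVal replacement As String, ByVal keepValue As String) As String
    Dim re As Object
    Dim matches As Object
    Dim i As Long

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = pattern
    re.Global = True
    Set matches = re.Execute(source)

    ' Last match first: text in front of the current match is still untouched,
    ' so its zero-based FirstIndex is still valid.
    For i = matches.Count - 1 To 0 Step -1
        If matches.Item(i).Value <> keepValue Then
            source = Left$(source, matches.Item(i).FirstIndex) & replacement & _
                     Mid$(source, matches.Item(i).FirstIndex + matches.Item(i).Length + 1)
        End If
    Next i

    ReplaceMatchesBackward = source
End Function
```

The same backward order should carry over to Word ranges, although document positions and string positions can differ when the document contains fields or tables.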
I have a cell like this :
1 parent
1 child
I am getting the value in vba :
Dim nbChild As String
Dim nbParent As String
Sheets("Feuil1").Cells(C.Row - 1, C.Column).Value
However, I would like to put the number of parent and child in 2 separates variables nbParent and nbChild, so I was thinking to use regex to capture the groups (digit number before parent and digit number before child).
But I don't know how to do it. Thanks in advance for your help
Dim arr, parent, child
arr = Split(ActiveCell.Value, Chr(10))'split on hard return
parent=arr(0)
child=arr(1)
'then split each line on space....
debug.print Split(parent," ")(0) 'number
debug.print Split(parent," ")(1) 'text
debug.print Split(child," ")(0) 'number
debug.print Split(child," ")(1) 'text
With data like:
click on a cell and run:
Sub Family()
ary = Split(ActiveCell.Value, " ")
nParent = CLng(ary(0))
nChild = CLng(ary(2))
MsgBox nParent & vbCrLf & nChild
End Sub
(regex is not necessary)
I agree with @Gary's Student that you don't seem to need Regex, but if you insist on it, I think the code below works. You'll need to add a reference to Microsoft VBScript Regular Expressions 5.5 from Tools -> References.
Sub main()
Dim value As String
Dim re As VBScript_RegExp_55.RegExp
Dim matches As VBScript_RegExp_55.MatchCollection
Dim match As VBScript_RegExp_55.match
value = Range("A1").value
Set re = New VBScript_RegExp_55.RegExp
re.Pattern = "\d+"
re.Global = True
Set matches = re.Execute(value)
For Each match In matches
Debug.Print match.value
Next
End Sub
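If the two numbers should land in separate variables through capture groups, as the question asks, a sketch along these lines may help. The function name ParseFamily is hypothetical, it uses late binding so no reference is required, and it assumes the parent count always comes before the child count.

```vba
' Hypothetical helper: fills nbParent and nbChild, returns False if no match.
Function ParseFamily(ByVal cellText As String, ByRef nbParent As Long, _
                     ByRef nbChild As Long) As Boolean
    Dim re As Object
    Dim matches As Object

    Set re = CreateObject("VBScript.RegExp")
    re.IgnoreCase = True
    ' Group 1: digits before "parent"; group 2: digits before "child".
    ' [\s\S]*? skips anything in between, including the line break.
    re.Pattern = "(\d+)\s*parent[\s\S]*?(\d+)\s*child"

    Set matches = re.Execute(cellText)
    If matches.Count = 0 Then Exit Function

    nbParent = CLng(matches.Item(0).SubMatches(0))
    nbChild = CLng(matches.Item(0).SubMatches(1))
    ParseFamily = True
End Function
```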
Edit:
Since my string has become more and more complicated, it looks like regexp is the only way.
I do not have a lot of experience with it, so your help is much appreciated.
Based on what I read on the web, I constructed the following expression to try to match the dates in my sample string:
"My very long long string 12Mar2012 is right here 23Apr2015"
[0-9][0-9] + [a-zA-Z] + [0-9][0-9][0-9][0-9]
and I am trying the code below, but I do not get any match. A link to a good regexp tutorial would also be much appreciated.
Dim re, match, RegExDate
Set re = CreateObject("vbscript.regexp")
re.Pattern = "(^[0-9][0-9] + [a-zA-Z] + [0-9][0-9][0-9][0-9]$)"
re.Global = True
For Each match In re.Execute(str)
MsgBox match.Value
RegExDate = match.Value
Exit For
Next
Thank you
This code validates the actual date from the Regexp using DateValue for robustness
Sub Robust()
Dim Regex As Object
Dim RegexMC As Object
Dim RegexM As Object
Dim strIn As String
Dim BDate As Boolean
strIn = "My very long long string 12Mar2012 is right here 23Apr2015 and 30Feb2002"
Set Regex = CreateObject("vbscript.regexp")
With Regex
.Pattern = "(([0-9])|([0-2][0-9])|([3][0-1]))(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(\d{4})"
.Global = True
If .test(strIn) Then
Set RegexMC = .Execute(strIn)
On Error Resume Next
For Each RegexM In RegexMC
BDate = False
BDate = IsDate(DateValue(RegexM.submatches(0) & " " & RegexM.submatches(4) & " " & RegexM.submatches(5)))
If BDate Then Debug.Print RegexM
Next
On Error GoTo 0
End If
End With
End Sub
thanks for all your help !!!
I managed to solve my problem using this simple code.
Dim rex As New RegExp
Dim dateCol As New Collection
rex.Pattern = "(\d|\d\d)(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(\d{4})?"
rex.Global = True
For Each match In rex.Execute(sStream)
dateCol.Add match.Value
Next
Just note that on my side I'm sure the string only contains valid dates, so the regular expression can be simple.
thnx
Ilya
The following is a quick attempt I made. It's far from perfect.
Basically, it splits the string into words. While looping through the words it cuts off any punctuation (period and comma, you might need to add more).
When processing an item, we try to remove each month name from it. If the string gets shorter we might have a date.
It checks to see if the length of the final string is about right (5 or 6 characters, 1 or 2 + 4 for day and year)
You could instead (or also) check that the remaining characters are all digits.
Private Const MonthList = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC"
Public Function getDates(ByVal Target As String) As String
Dim Data() As String
Dim Item As String
Dim Index As Integer
Dim List() As String
Dim Index2 As Integer
Dim Test As String
Dim Result As String
List = Split(MonthList, ",")
Data = Split(Target, " ")
Result = ""
For Index = LBound(Data) To UBound(Data)
Item = UCase(Replace(Replace(Data(Index), ".", ""), ",", ""))
For Index2 = LBound(List) To UBound(List) 'loop over the month names, not the words
Test = Replace(Item, List(Index2), "")
If Not Test = Item Then
If Len(Test) = 5 Or Len(Test) = 6 Then
If Result = "" Then
Result = Item
Else
Result = Result & ", " & Item
End If
End If
End If
Next Index2
Next
getDates = Result
End Function
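The extra check mentioned above, that what is left after removing the month name consists only of digits, could be sketched with the Like operator. The function name IsAllDigits is hypothetical.

```vba
' Hypothetical helper: True when s is non-empty and contains only 0-9.
Function IsAllDigits(ByVal s As String) As Boolean
    ' "*[!0-9]*" matches any string containing at least one non-digit.
    IsAllDigits = (Len(s) > 0) And Not (s Like "*[!0-9]*")
End Function
```

Calling IsAllDigits(Test) next to the length check in getDates would reject items such as "MARKET" that only happen to contain a month abbreviation.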
I am using `VBScript.RegExp` to find and replace using a regular expression. I'm trying to do something like this:
Dim regEx
Set regEx = CreateObject("VBScript.RegExp")
regEx.Pattern = "ID_(\d{3})"
regEx.IgnoreCase = False
regEx.Global = True
regEx.Replace(a_cell.Value, "=HYPERLINK(A" & CStr(CInt("$1") + 2) )
I.e. I have cells which contain things like ID_006 and I want to replace the contents of such a cell with a hyperlink to cell A8. So I match the three digits, and then want to add 2 to those digits to get the correct row to hyperlink to.
But the CStr(CInt("$1") + 2) part doesn't work. Any suggestions on how I can make it work?
I've posted the code below with these points in mind:
you should test for a valid match before trying a replace
in your current code Global is redundant, as you can only add one hyperlink (one match) to a cell
your current code will accept a partial string match; if you want to avoid matching ID_9999, match the entire string using ^ and $. The version below does this; you can revert to your current pattern with .Pattern = "ID_(\d{3})"
Normally when adding a hyperlink a visible address is needed. The code below provides one (with the row manipulation in one shot)
The code below runs on A1:A10 (the sample shown dumps to B1:B10 to show before and after the code)
Sub ParseIt()
Dim rng1 As Range
Dim rng2 As Range
Dim regEx
Set rng1 = Range([a1], [a10])
Set regEx = CreateObject("VBScript.RegExp")
With regEx
'match entire string
.Pattern = "^ID_(\d{3})$"
'match anywhere
' .Pattern = "ID_(\d{3})"
.IgnoreCase = False
For Each rng2 In rng1
If .test(rng2.Value) Then
'use Anchor:=rng2.Offset(0, 1) to dump one column to the right)
ActiveSheet.Hyperlinks.Add Anchor:=rng2, Address:="", SubAddress:= _
Cells(.Replace(rng2.Value, "$1") + 2, rng2.Column).Address, TextToDisplay:=Cells(.Replace(rng2.Value, "$1") + 2, rng2.Column).Address
End If
Next
End With
End Sub
This is because: "=HYPERLINK(A" & CStr(CInt("$1") + 2) is evaluated once, when the code is executed, not once for every match.
You need to capture & process the match like this;
a_cell_Value = "*ID_006*"
Set matches = regEx.Execute(a_cell_Value)
Debug.Print "=HYPERLINK(A" & CLng(matches(0).SubMatches(0)) + 2 & ")"
>> =HYPERLINK(A8)
Or if they are all in ??_NUM format;
a_cell_Value = "ID_11"
?"=HYPERLINK(A" & (2 + val(mid$(a_cell_Value, instr(a_cell_Value,"_") +1))) & ")"
=HYPERLINK(A13)
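The Execute and SubMatches approach above can be wrapped in a small function, sketched here under the assumption that the formula text should stay in the same "=HYPERLINK(A8)" shape used in the question. The function name BuildHyperlinkFormula is hypothetical.

```vba
' Hypothetical helper: "ID_006" -> "=HYPERLINK(A8)"; "" when nothing matches.
Function BuildHyperlinkFormula(ByVal cellText As String) As String
    Dim re As Object
    Dim matches As Object
    Dim targetRow As Long

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "ID_(\d{3})"
    re.IgnoreCase = False

    Set matches = re.Execute(cellText)
    If matches.Count = 0 Then Exit Function

    ' The arithmetic runs on the captured digits, not on the literal "$1".
    targetRow = CLng(matches.Item(0).SubMatches(0)) + 2
    BuildHyperlinkFormula = "=HYPERLINK(A" & targetRow & ")"
End Function
```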
The line -
regEx.Replace(a_cell.Value, "=HYPERLINK(A" & CStr(CInt("$1") + 2) )
won't work as VBA will try to do a CInt on the literal string "$1" rather than on the match from your RegEx.
It would work if you did your replace in 2 steps, something like this -
Dim a_cell
a_cell = Sheets(1).Cells(1, 1)
Dim regEx
Set regEx = CreateObject("VBScript.RegExp")
regEx.Pattern = "ID_(\d{3})"
regEx.IgnoreCase = False
regEx.Global = True
a_cell = regEx.Replace(a_cell, "$1")
Sheets(1).Cells(1, 1) = "=HYPERLINK(A" & CStr(CInt(a_cell) + 2) & ")"