In the following examples I need to get the words on either side of the period.
I am using this regex:
Dim myRegex As New Regex("[^\w]+")
Dim mymatch As String() = myRegex.Split(currentField)
where currentField is one of the following three samples:
Contacts.Address2 as `Contact Address2`
Contacts.ContactID
CONCAT(Contacts.FirstName;;' ';;Contacts.LastName) as `Contact`
The results are as follows:
1-- Contacts, Address2, as, Contact and Address2. I do not want the word "as".
2-- Contacts and ContactID. This is OK.
3-- CONCAT, Contacts, FirstName, Contacts, LastName, as and Contact.
The 3rd one returns too much: I do not want CONCAT, as or Contact. I only want the four words before and after the periods to be returned: Contacts, FirstName, Contacts and LastName.
How can I write the regex to get only the words before and after the period?
I would consider matching the input instead of splitting it:
For Each m As Match In Regex.Matches(input, "(\w+)\.(\w+)")
Console.WriteLine(
String.Join(", ",
m.Groups(1).Value,
m.Groups(2).Value
))
Next
This is only an example; it's not clear what you intend to do with the returned results.
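For comparison, here is the same match-instead-of-split idea as a small Python sketch (not VB.NET). It uses the answer's pattern `(\w+)\.(\w+)` unchanged; the function name `words_around_periods` is hypothetical. On the third sample it yields exactly the four words the question asks for, because `CONCAT`, `as` and `Contact` are never followed or preceded by a period.

```python
import re

# The answer's pattern: a word, a literal period, another word.
DOTTED_PAIR = re.compile(r"(\w+)\.(\w+)")

def words_around_periods(field):
    """Return the words on either side of every period, in order of appearance."""
    words = []
    for m in DOTTED_PAIR.finditer(field):
        words.extend([m.group(1), m.group(2)])
    return words

samples = [
    "Contacts.Address2 as `Contact Address2`",
    "Contacts.ContactID",
    "CONCAT(Contacts.FirstName;;' ';;Contacts.LastName) as `Contact`",
]
for s in samples:
    print(", ".join(words_around_periods(s)))
```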
I think you only want to split inside round brackets, and you are not interested in the word "as". So I suggest a two-step approach:
Get the substring(s) in round brackets (regex \([^()]+\)).
If there are such substrings, split them with the split regex; if not, split the original string with the split regex (\W+|\s*\bas\b\s*).
Sample code:
'Dim currentField As String = "Contacts.Address2 as `Contact Address2`"
Dim currentField As String = "CONCAT(Contacts.FirstName;;' ';;Contacts.LastName) as `Contact`"
'Dim currentField As String = "Contacts.ContactID"
Dim myRegex As New Regex("\([^()]+\)")
Dim splitRegex As New Regex("\W+|\s*\bas\b\s*")
Dim mymatch As MatchCollection = myRegex.Matches(currentField)
If mymatch.Count > 0 Then
For Each match As Match In mymatch
Dim mysubstrs As String() = splitRegex.Split(match.Value)
For Each substr As String In mysubstrs
If String.IsNullOrEmpty(substr) = False Then
Console.WriteLine(substr)
End If
Next
Next
Else
Dim mysubstrs As String() = splitRegex.Split(currentField)
For Each substr As String In mysubstrs
If String.IsNullOrEmpty(substr) = False Then
Console.WriteLine(substr)
End If
Next
End If
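The two-step logic above (find bracketed text first, fall back to the whole string, then split and drop empty pieces) can be sketched in Python with the same two patterns. This is an illustration, not the answer's code; `two_step_split` is a hypothetical name. Note that for the first sample it still returns `Contact` and `Address2` from the backtick alias, exactly as the VB.NET version does.

```python
import re

PARENS = re.compile(r"\([^()]+\)")           # step 1: text inside round brackets
SPLITTER = re.compile(r"\W+|\s*\bas\b\s*")   # step 2: runs of non-word chars, or the word "as"

def two_step_split(field):
    chunks = PARENS.findall(field) or [field]  # no brackets: split the whole string
    words = []
    for chunk in chunks:
        # re.split leaves empty strings between adjacent separators; skip them
        words.extend(w for w in SPLITTER.split(chunk) if w)
    return words

print(two_step_split("CONCAT(Contacts.FirstName;;' ';;Contacts.LastName) as `Contact`"))
```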
Here is the final working routine, based on the accepted answer above:
Public Sub Load_Field_List(FieldSTR As String, FieldType As String)
Dim t As New FileIO.TextFieldParser(New System.IO.StringReader(FieldSTR))
t.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited
t.Delimiters = New String() {","}
Dim currentRow As String()
Dim dr As DataRow
Dim ColListSTR As String = loadeddataview.Tables(0).Rows(0).Item("ColumnList")
Dim ColListSTRArr As String() = ColListSTR.Split(",")
While Not t.EndOfData
Try
currentRow = t.ReadFields()
Dim currentField As String 'field string
For Each currentField In currentRow
Dim startName As Integer
Dim endName As Integer
Dim name As String
dr = fieldDT.NewRow
Dim isValid As Boolean = False
If currentField = "" Then 'make sure current field has data
isValid = False
ElseIf (Regex.IsMatch(currentField, "(\w+)\.(\w+)")) = True Then 'make sure current field has xxxx.yyyy pattern
Dim m As Match = Regex.Match(currentField, "(\w+)\.(\w+)") 'sets m to the first xxxx.yyyy pattern
dr("Table") = m.Groups(1).Value 'sets table column to table name xxxx
dr("Column Name") = "`" & m.Groups(2).Value & "`" 'sets column name to column yyyy enclosed in ` `
If ColListSTRArr.Contains(m.Groups(2).Value) Then 'checks columnlist str to see if column visible
dr("Show") = "True"
Else
dr("Show") = "False"
End If
' this section overrides column name if it was set using AS `zzzzz` statement
startName = currentField.IndexOf("`")
endName = currentField.IndexOf("`", If(startName > 0, startName + 1, 0))
If (endName > startName) Then
name = currentField.Substring(startName, endName - startName + 1)
dr("Column Name") = name 'set override column name
dr("Field") = currentField.Substring(0, startName - 4) 'sets field minus the " as `ZZZZZ`" above
If ColListSTRArr.Contains(currentField.Substring(startName + 1, endName - startName - 1)) Then 'dup may be able to remove
dr("Show") = "True"
Else
dr("Show") = "False"
End If
Else
dr("Field") = currentField 'sets field if there was no " as `ZZZZZZ`" in string
End If
If FieldType = "Field" Then 'sets the column linking field
dr("Linking") = "No Linking"
Else
dr("Linking") = FieldType
End If
End If
' commit changes
fieldDT.Rows.Add(dr)
fieldDT.AcceptChanges()
DataGridView3.DataSource = fieldDT
DataGridView3.ClearSelection()
Next
Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
MsgBox("Line " & ex.Message & " is not valid and will be skipped.")
End Try
End While
End Sub
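The per-field logic of Load_Field_List (table and column from the first `xxxx.yyyy` match, then an optional backtick alias that overrides the column name and trims the ` as ` from the field) can be sketched in Python as below. This is an illustration under assumptions: `describe_field` and `visible_columns` are hypothetical names, with `visible_columns` standing in for `ColListSTRArr`, and the function returns `None` for empty or non-matching fields, whereas the VB routine still adds an empty row for them.

```python
import re

FIELD_PATTERN = re.compile(r"(\w+)\.(\w+)")

def describe_field(current_field, visible_columns):
    """Build one grid row the way Load_Field_List does (sketch)."""
    m = FIELD_PATTERN.search(current_field) if current_field else None
    if m is None:
        return None  # empty field or no table.column pattern
    row = {
        "Table": m.group(1),
        "Column Name": "`" + m.group(2) + "`",
        "Field": current_field,
        "Show": str(m.group(2) in visible_columns),
    }
    start = current_field.find("`")
    end = current_field.find("`", start + 1) if start > 0 else -1
    if end > start:
        alias = current_field[start + 1:end]
        row["Column Name"] = "`" + alias + "`"
        row["Field"] = current_field[:start - 4]  # drop the " as " before the alias
        row["Show"] = str(alias in visible_columns)
    return row

print(describe_field("Contacts.Address2 as `Contact Address2`", ["Contact Address2"]))
```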
First of all, sorry for the long title. I just don't know how to put it succinctly. I am trying to do this in VBA as normal Excel will not cut it.
Basically, I have a column. Each cell may contain data in a format like
flat 10-14;Flat 18-19;unit 7-9;flat A-D;ABC;DEF;
What I need is to find the strings that contain "-" and replace each one with every value in the range, so the above would become
Flat 10, Flat 11, Flat 12, Flat 13, Flat 14;Flat 18, Flat 19;Unit 7, Unit 8, Unit 9;Flat A, Flat B, Flat C, Flat D;ABC;DEF;
With the help of this article on regular expressions, I have managed to work out how to expand the numeric ranges. However, I don't know a good way to expand the ranges with letters, i.e. from Flat A-C to Flat A, Flat B, Flat C.
My code is below; please feel free to give any pointers if you think it can be more efficient. I am very much an amateur at this. Thank you in advance.
Sub CallRegEx()
Dim r As Match
Dim mcolResults As MatchCollection
Dim strInput As String, strPattern As String
Dim test As String, StrOutput As String, prefix As String
Dim startno As Long, endno As Long
Dim myrange As Range
strPattern = "(Flat|Unit) ([0-9]+)-+([0-9]+)" 'capture the prefix and both ends of the range
With Worksheets("Sheet1")
lrow = .Cells(.Rows.Count, 1).End(xlUp).Row 'last used row in column A, the column being processed
For Each x In .Range("A2:A" & lrow)
strInput = x.Value
StrOutput = strInput 'reset for every row so cells without a range are copied unchanged
Set mcolResults = RegEx(strInput, strPattern, True, , True)
If Not mcolResults Is Nothing Then
For Each r In mcolResults
prefix = r.SubMatches(0) 'Flat or Unit, as written in the cell
startno = CLng(r.SubMatches(1)) 'works for any number of digits
endno = CLng(r.SubMatches(2))
test = ""
For i = startno To endno - 1
test = test & prefix & " " & i & ","
Next i
test = test & prefix & " " & endno
'this is because I don't want the comma at the end of the last value
StrOutput = Replace(StrOutput, r, test)
Debug.Print r ' remove in production
Next r
End If
.Range("D" & x.Row).Value = StrOutput
Next x
End With
End Sub
The function below supports the Sub above:
Function RegEx(strInput As String, strPattern As String, _
Optional GlobalSearch As Boolean, Optional MultiLine As Boolean, _
Optional IgnoreCase As Boolean) As MatchCollection
Dim mcolResults As MatchCollection
Dim objRegEx As New RegExp
If strPattern <> vbNullString Then
With objRegEx
.Global = GlobalSearch
.MultiLine = MultiLine
.IgnoreCase = IgnoreCase
.Pattern = strPattern
End With
If objRegEx.test(strInput) Then
Set mcolResults = objRegEx.Execute(strInput)
Set RegEx = mcolResults
End If
End If
End Function
Letters have ordinal character codes (A < B < C ...), which you can access with Asc()/Chr$(). Here is one way to do it:
inputStr = "flat 10-14;Flat 18-19;unit 7-9;flat A-D;ABC;DEF;flat 6;flat T"
Dim re As RegExp: Set re = New RegExp
re.Pattern = "(flat|unit)\s+((\d+)-(\d+)|([A-Z])-([A-Z]))"
re.Global = True
re.IgnoreCase = True
Dim m As MatchCollection
Dim start As Variant, fin As Variant
Dim tokens() As String
Dim i As Long, j As Long
Dim isDigit As Boolean
tokens = Split(inputStr, ";")
For i = 0 To UBound(tokens) '// loop over tokens
Set m = re.Execute(tokens(i))
If (m.Count) Then
With m.Item(0)
start = .SubMatches(2) '// first match number/letter
isDigit = Not IsEmpty(start) '// is letter or number?
If (isDigit) Then '// number
fin = .SubMatches(3)
Else '// letter captured as char code
start = Asc(.SubMatches(4))
fin = Asc(.SubMatches(5))
End If
tokens(i) = ""
'// loop over items
For j = start To fin
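'// a plain If is used because IIf would evaluate Chr$(j) even for numbers, which fails above 255
If isDigit Then
tokens(i) = tokens(i) & .SubMatches(0) & " " & j & ";"
Else
tokens(i) = tokens(i) & .SubMatches(0) & " " & Chr$(j) & ";"
End If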
Next
End With
ElseIf i <> UBound(tokens) Then tokens(i) = tokens(i) & ";"
End If
Next
Debug.Print Join(tokens, "")
flat 10;flat 11;flat 12;flat 13;flat 14;Flat 18;Flat 19;unit 7;unit 8;unit 9;flat A;flat B;flat C;flat D;ABC;DEF;flat 6;flat T
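The same approach (one pattern with a numeric branch and a single-letter branch, letters walked through their character codes) can be sketched in Python, where `ord`/`chr` play the role of `Asc`/`Chr$`. This is an illustration only; `expand_ranges` is a hypothetical name, and it joins items with ";" like the answer. Change the join string to ", " for the comma-separated form the question shows.

```python
import re

# A prefix, then either a numeric range or a single-letter range.
RANGE = re.compile(r"(flat|unit)\s+(?:(\d+)-(\d+)|([A-Z])-([A-Z]))", re.IGNORECASE)

def expand_ranges(text):
    def expand(m):
        prefix = m.group(1)
        if m.group(2) is not None:  # numeric range such as 10-14
            items = [str(n) for n in range(int(m.group(2)), int(m.group(3)) + 1)]
        else:  # letter range such as A-D, walked via character codes
            items = [chr(c) for c in range(ord(m.group(4)), ord(m.group(5)) + 1)]
        return ";".join(f"{prefix} {item}" for item in items)
    return RANGE.sub(expand, text)

print(expand_ranges("flat 10-14;Flat 18-19;unit 7-9;flat A-D;ABC;DEF;flat 6;flat T"))
```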
I have a macro that sends an XMLHTTP request to a server and gets a plain-text string as the response, not a JSON string or any other standard format (as far as I know).
I would like to parse the output string so that I can access the data in a structured way, in the same fashion as the parseJson subroutine in this link.
My problem is that I am not good with regular expressions, and I am not able to modify the routine for my needs.
The string that I need to parse has the following structure:
The string is a single line.
Each parameter is defined by its name, the equals symbol and its value, and ends with ;, e.g. "NID=3;" or "SID=Test;".
Parameters can be collected in "structures", which start and end with the symbol | and are identified by their name followed by ;, such as |STEST;NID=3;SID=Test;|
A structure can also contain other structures.
An example of an output string is the following:
|KC;|AD;PE=5;PF=3;|CD;PE=5;HP=test;|CD;PE=3;HP=abc;|
In this case there is a top-level structure KC which contains a structure AD. The structure AD is composed of the parameters PE and PF and two structures CD. Finally, each structure CD has the parameters PE and HP.
So I would like to parse the string to obtain an Object/Dictionary that reflects this structure. Can you help me?
Added after the first answers:
Hi all, thank you for your help, but I think I should make the output I would like to get clearer.
For the example string, I would like to have an object with the following structure:
<KC>
<AD>
<PE>5</PE>
<PF>3</PF>
<CD>
<PE>5</PE>
<HP>test</HP>
</CD>
<CD>
<PE>3</PE>
<HP>abc</HP>
</CD>
</AD>
</KC>
So I started writing some possibly working code, based on hints from Nvj's answer and the answer at this link:
Option Explicit
Option Base 1
Sub Test()
Dim strContent As String
Dim strState As String
Dim varOutput As Variant
strContent = "|KC;|AD;PE=5;PF=3;|CD;PE=5;HP=test;|CD;PE=3;HP=abc;|"
Call ParseString(strContent, varOutput, strState)
End Sub
Sub ParseString(ByVal strContent As String, varOutput As Variant, strState As String)
' strContent - source string
' varOutput - created object or array to be returned as result
' strState - Object|Array|Error depending on processing to be returned as state
Dim objTokens As Object
Dim lngTokenId As Long
Dim objRegEx As Object
Dim bMatched As Boolean
Set objTokens = CreateObject("Scripting.Dictionary")
lngTokenId = 0
Set objRegEx = CreateObject("VBScript.RegExp")
With objRegEx
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = "\|[A-Z]{2};" 'Pattern for the name of structures
Tokenize objTokens, objRegEx, strContent, lngTokenId, bMatched, "str"
.Pattern = "[A-Z]{2}=[^\|=;]+;" 'Pattern for parameters name and values
Tokenize objTokens, objRegEx, strContent, lngTokenId, bMatched, "par"
End With
End Sub
Sub Tokenize(objTokens, objRegEx, strContent, lngTokenId, bMatched, strType)
Dim strKey As String
Dim strKeyPar As String
Dim strKeyVal As String
Dim strWork As String
Dim strPar As String
Dim strVal As String
Dim strLevel As String
Dim strRes As String
Dim lngCopyIndex As Long
Dim objMatch As Object
strRes = ""
lngCopyIndex = 1
With objRegEx
For Each objMatch In .Execute(strContent)
If strType = "str" Then
bMatched = True
With objMatch
strWork = Replace(.Value, "|", "")
strWork = Replace(strWork, ";", "")
strLevel = get_Level(strWork)
strKey = "<" & lngTokenId & strLevel & strType & ">"
objTokens(strKey) = strWork
strRes = strRes & Mid(strContent, lngCopyIndex, .FirstIndex - lngCopyIndex + 1) & strKey
lngCopyIndex = .FirstIndex + .Length + 1
End With
lngTokenId = lngTokenId + 1
ElseIf strType = "par" Then
strKeyPar = "<" & lngTokenId & "par>"
strKeyVal = "<" & lngTokenId & "val>"
strKey = strKeyPar & strKeyVal
bMatched = True
With objMatch
strWork = Replace(.Value, ";", "")
strPar = Split(strWork, "=")(0)
strVal = Split(strWork, "=")(1)
objTokens(strKeyPar) = strPar
objTokens(strKeyVal) = strVal
strRes = strRes & Mid(strContent, lngCopyIndex, .FirstIndex - lngCopyIndex + 1) & strKey
lngCopyIndex = .FirstIndex + .Length + 1
End With
lngTokenId = lngTokenId + 2
End If
Next
strContent = strRes & Mid(strContent, lngCopyIndex, Len(strContent) - lngCopyIndex + 1)
End With
End Sub
Function get_Level(strInput As String) As String
Select Case strInput
Case "KC"
get_Level = "L1"
Case "AD"
get_Level = "L2"
Case "CD"
get_Level = "L3"
Case Else
MsgBox ("Error")
End
End Select
End Function
This code creates a dictionary with an item for each structure name, parameter name and parameter value, as shown in the figure.
Thanks to the function get_Level, the items associated with structures have a level that should help preserve the original hierarchy of the data.
So what I am missing is a function that creates an object with the original structure of the input string. This is what the Retrieve function does in the answer at this link, but I do not know how to adapt it to my case.
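The missing step can be sketched in Python, under one stated assumption: the flat string does not itself say where a structure ends, so nesting depth has to come from a name-to-level table, just as get_Level provides. `LEVELS` and `parse_nested` are hypothetical names, and an unknown structure name raises `KeyError`. A stack holds the open structures; a new structure first closes every open structure at the same or a deeper level. ElementTree (Python's standard library) stands in for the desired XML-like object.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical name-to-depth table, mirroring the asker's get_Level function.
LEVELS = {"KC": 1, "AD": 2, "CD": 3}

# "|NAME;" opens a structure; "KEY=value;" is a parameter.
TOKEN = re.compile(r"\|([A-Z]+);|([A-Z]+)=([^|=;]*);")

def parse_nested(text, levels=LEVELS):
    root = ET.Element("root")
    stack = [(0, root)]  # (level, element) pairs; innermost open structure last
    for m in TOKEN.finditer(text):
        name, key, value = m.groups()
        if name is not None:
            level = levels[name]
            while stack[-1][0] >= level:  # close same-level or deeper structures
                stack.pop()
            element = ET.SubElement(stack[-1][1], name)
            stack.append((level, element))
        else:
            ET.SubElement(stack[-1][1], key).text = value
    return root

tree = parse_nested("|KC;|AD;PE=5;PF=3;|CD;PE=5;HP=test;|CD;PE=3;HP=abc;|")
print(ET.tostring(tree, encoding="unicode"))
```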
This looks like a simple nested delimited string. A couple of Split() functions will do the trick:
Option Explicit
Function parseString(str As String) As Collection
Dim a1() As String, i1 As Long, c1 As Collection
Dim a2() As String, i2 As Long, c2 As Collection
Dim a3() As String
a1 = Split(str, "|")
Set c1 = New Collection
For i1 = LBound(a1) To UBound(a1)
If a1(i1) <> "" Then
Set c2 = New Collection
a2 = Split(a1(i1), ";")
For i2 = LBound(a2) To UBound(a2)
If a2(i2) <> "" Then
a3 = Split(a2(i2), "=")
If UBound(a3) > 0 Then
c2.Add a3(1), a3(0)
ElseIf UBound(a3) = 0 Then
c2.Add a3(0)
End If
End If
Next i2
c1.Add c2
End If
Next i1
Set parseString = c1
End Function
Sub testParseString()
Dim c As Collection
Set c = parseString("|KC;|AD;PE=5;PF=3;|CD;PE=5;HP=test;|CD;PE=3;HP=abc;|")
Debug.Assert c(1)(1) = "KC"
Debug.Assert c(2)("PE") = "5"
Debug.Assert c(3)(1) = "CD"
Debug.Assert c(4)("HP") = "abc"
Debug.Assert c(4)(3) = "abc"
End Sub
Note that you can address values by both index and key (if a key existed in the input). If no key was provided, you can only access the value by its index. You can also iterate the collection recursively to get all the values in a tree structure.
Food for thought: since your structures may have repeated names (in your case the "CD" structure occurs twice), Collections/Dictionaries cannot store this elegantly, because the keys would collide. Another good approach is to build an XML structure with DOMDocument and use XPath to access its elements. See Program with DOM in Visual Basic.
UPDATE: I've added an XML example below as well. Have a look.
Here is another take on your string-parsing issue using the DOMDocument XML parser. You need to add a reference to Microsoft XML, v6.0 in your VBA project.
Function parseStringToDom(str As String) As DOMDocument60
Dim a1() As String, i1 As Long
Dim a2() As String, i2 As Long
Dim a3() As String
Dim dom As DOMDocument60
Dim rt As IXMLDOMNode
Dim nd As IXMLDOMNode
Set dom = New DOMDocument60
dom.async = False
dom.validateOnParse = False
dom.resolveExternals = False
dom.preserveWhiteSpace = True
Set rt = dom.createElement("root")
dom.appendChild rt
a1 = Split(str, "|")
For i1 = LBound(a1) To UBound(a1)
If a1(i1) <> "" Then
a2 = Split(a1(i1), ";")
Set nd = dom.createElement(a2(0))
For i2 = LBound(a2) To UBound(a2)
If a2(i2) <> "" Then
a3 = Split(a2(i2), "=")
If UBound(a3) > 0 Then
nd.appendChild dom.createElement(a3(0))
nd.LastChild.Text = a3(1)
End If
End If
Next i2
rt.appendChild nd
End If
Next i1
Set parseStringToDom = dom
End Function
Sub testParseStringToDom()
Dim dom As DOMDocument60
Set dom = parseStringToDom("|KC;|AD;PE=5;PF=3;|CD;PE=5;HP=test;|CD;PE=3;HP=abc;|")
Debug.Assert Not dom.SelectSingleNode("/root/KC") Is Nothing
Debug.Assert dom.SelectSingleNode("/root/AD/PE").Text = "5"
Debug.Assert dom.SelectSingleNode("/root/CD[1]/HP").Text = "test"
Debug.Assert dom.SelectSingleNode("/root/CD[2]/HP").Text = "abc"
Debug.Print dom.XML
End Sub
As you can see, this converts your text into an XML DOM document, preserving all the structures and allowing duplicate names. You can then use XPath to access any node or value. This can also be extended to support more nesting levels and further structures.
This is the XML document it creates behind the scenes:
<root>
<KC/>
<AD>
<PE>5</PE>
<PF>3</PF>
</AD>
<CD>
<PE>5</PE>
<HP>test</HP>
</CD>
<CD>
<PE>3</PE>
<HP>abc</HP>
</CD>
</root>
I've started writing a parser in VBA for the string structure you specified. It's not complete, but I'll post it anyway; maybe you can pick up some ideas from it.
Sub ParseString()
Dim str As String
str = "|KC;|AD;PE=5;PF=3;|CD;PE=5;HP=test;|CD;PE=3;HP=abc;|"
' Declare an object dictionary
' Make a reference to Microsoft Scripting Runtime in order for this to work
Dim dict As New Dictionary
' If bars are present as the first and last characters of the string, remove them
str = Replace(str, "|", "", 1, 1)
If (Mid(str, Len(str), 1) = "|") Then
str = Mid(str, 1, Len(str) - 1)
End If
' Split the string by bars
Dim substring_array() As String
substring_array = Split(str, "|")
' Declare a regex object
' Check the reference to Microsoft VBScript Regular Expressions 5.5 in order for this to work
Dim regex As New RegExp
With regex
.Global = True
.IgnoreCase = True
.MultiLine = True
End With
' Object to store the regex matches
Dim matches As MatchCollection
Dim parameter_name_matches As MatchCollection
Dim parameter_value_matches As MatchCollection
' Define some regex patterns
pattern_for_structure_name = "^[^=;]+;"
pattern_for_parameters = "[^=;]+=[^=;]+;"
pattern_for_parameter_name = "[^=;]+="
pattern_for_parameter_value = "[^=;]+;"
' Loop through the elements of the array
Dim i As Integer
For i = 0 To UBound(substring_array) - LBound(substring_array)
' Get the array element in a string
str1 = substring_array(i)
' Check if it contains a structure name
regex.Pattern = pattern_for_structure_name
Set matches = regex.Execute(str1)
If matches.Count = 0 Then
' This substring does not contain a structure name
' Check if it contains parameters
regex.Pattern = pattern_for_parameters
Set matches = regex.Execute(str1)
If matches.Count = 0 Then
' There are no parameters as well as no structure name
' This means the string had || - invalid string
MsgBox ("Invalid string")
Else
' The string contains parameter names
' Add each parameter name to the dictionary
Dim my_match As match
For Each my_match In matches
' Get the name of the parameter
regex.Pattern = pattern_for_parameter_name
Set parameter_name_matches = regex.Execute(my_match.Value)
' Check if the above returned any matches
If parameter_name_matches.Count = 1 Then
' Remove = sign from the parameter name
parameter_name = Replace(parameter_name_matches(0).Value, "=", "")
' Get the value of the parameter
regex.Pattern = pattern_for_parameter_value
Set parameter_value_matches = regex.Execute(my_match.Value)
' Check if the above returned any matches
If parameter_value_matches.Count = 1 Then
' Get the value
parameter_value = Replace(parameter_value_matches(0).Value, ";", "")
' Add the parameter name and value as a key pair to the Dictionary object
dict.Item(parameter_name) = parameter_value
Else
' Number of matches is either 0 or greater than 1 - in both cases the string is invalid
MsgBox ("Invalid string")
End If
Else
' Parameter name did not match - invalid string
MsgBox ("Invalid string")
End If
Next
End If
ElseIf matches.Count = 1 Then
' This substring contains a single structure name
' Check if it has parameter names
Else
' This substring contains more than one structure name - the original string is invalid
MsgBox ("Invalid string")
End If
Next i
End Sub
I need to parse and summarize batches of several thousand text lines every week. Excel wildcards weren't flexible enough, and I wanted to remove the extra step of either pasting the data into Notepad++ for processing or feeding it to a script.
Here are the tools I came up with. They're still a bit slow (perhaps 3000 lines per second on a company laptop), but they are handy.
RXMatch -- return first match, option to return a subgroup.
=RXMatch("Apple","A(..)",1) -> "pp"
RXCount -- count number of matches
=RXCount("Apple","p") -> 2
RXPrint -- embed first match and/or subgroups into a template string
=RXPrint("Apple","(\S)\S+","\1 is for \0") -> "A is for Apple"
RXPrintAll -- embed each match into a template string, join the results
=RXPrintAll("Apple Banana","(\S)\S+","\1 is for \0") -> "A is for Apple, B is for Banana"
RXMatches -- return a vertical array of matches, option to return a subgroup
=RXMatches("Apple Banana","\S+") -> {"Apple";"Banana"}
RXMatch
Public Function RXMatch(Text As String, Pattern As String, Optional Group As Integer = 0, Optional IgnoreCase As Boolean = True) As String
Dim retval As String
' Takes a string and returns the matching text
' Text is the string to be searched
' Pattern is the regex pattern
' Group (optional) selects a parenthesized group (count the number of left parentheses preceding it to get the group number)
' IgnoreCase (optional) set to False for a case-sensitive search
Dim RE As Object
Dim Matches As Object
Set RE = CreateObject("vbscript.regexp")
RE.IgnoreCase = IgnoreCase
RE.Pattern = Pattern
Set Matches = RE.Execute(Text)
If (Matches.Count > 0) Then
If (Group > 0) Then
retval = Matches(0).submatches(Group - 1)
Else
retval = Matches(0)
End If
Else
retval = ""
End If
RXMatch = retval
End Function
RXCount
Public Function RXCount(Text As String, Pattern As String, Optional IgnoreCase As Boolean = True) As Integer
Dim retval As Integer
' Counts the number of matches
' Text is the string to be searched
' Pattern is the regex pattern
' IgnoreCase (optional) set to False for a case-sensitive search
Dim RE As Object
Dim Matches As Object
Set RE = CreateObject("vbscript.regexp")
RE.IgnoreCase = IgnoreCase
RE.Global = True
RE.Pattern = Pattern
Set Matches = RE.Execute(Text)
retval = Matches.Count
RXCount = retval
End Function
RXPrint
Public Function RXPrint(Text As String, Pattern As String, Optional Template As String = "\0", Optional IgnoreCase As Boolean = True) As String
Dim retval As String
' Takes a string and returns a new string formatted according to the given template, using the first match found
' Text is the string to be searched
' Pattern is the regex pattern
' Template (optional) is a string which should contain group identifiers (\0 - \9) to be substituted with groups in the match
' IgnoreCase (optional) set to False for a case-sensitive search
Dim REText, RETemplate As Object
Dim MatchesText, MatchesTemplate As Object
Set REText = CreateObject("vbscript.regexp")
REText.IgnoreCase = IgnoreCase
REText.Pattern = Pattern
Set MatchesText = REText.Execute(Text)
Set RETemplate = CreateObject("vbscript.regexp")
RETemplate.Global = True
RETemplate.Pattern = "(?:\\(.))|([^\\]+)"
Set MatchesTemplate = RETemplate.Execute(Template)
If (MatchesText.Count > 0) Then
ReDim retArray(0 To MatchesTemplate.Count - 1) As String
Dim escaped As String
Dim plaintext As String
For i = 0 To MatchesTemplate.Count - 1
escaped = MatchesTemplate(i).submatches(0)
plaintext = MatchesTemplate(i).submatches(1)
If (Len(escaped) > 0) Then
If (IsNumeric(escaped)) Then
Dim groupnum As Integer
groupnum = CInt(escaped)
If groupnum = 0 Then
retArray(i) = MatchesText(0)
ElseIf (groupnum > MatchesText(0).submatches.Count) Then
retArray(i) = "?"
Else
retArray(i) = MatchesText(0).submatches(groupnum - 1)
End If
Else
retArray(i) = escaped
End If
Else
retArray(i) = plaintext
End If
Next i
retval = Join(retArray, "")
Else
retval = ""
End If
RXPrint = retval
End Function
RXPrintAll
Public Function RXPrintAll(Text As String, Pattern As String, Optional Template As String = "\0", Optional Delimiter As String = ", ", Optional IgnoreCase As Boolean = True) As String
Dim retval As String
' Takes a string and returns a new string formatted according to the given template, repeated for each match
' Text is the string to be searched
' Pattern is the regex pattern
' Template (optional) is a string which should contain group identifiers (\0 - \9) to be substituted with groups in the match
' Delimiter (optional) specifies how the results will be joined
' IgnoreCase (optional) set to False for a case-sensitive search
Dim REText, RETemplate As Object
Dim MatchesText, MatchesTemplate As Object
Set REText = CreateObject("vbscript.regexp")
REText.IgnoreCase = IgnoreCase
REText.Global = True
REText.Pattern = Pattern
Set MatchesText = REText.Execute(Text)
Set RETemplate = CreateObject("vbscript.regexp")
RETemplate.Global = True
RETemplate.Pattern = "(?:\\(.))|([^\\]+)"
Set MatchesTemplate = RETemplate.Execute(Template)
If (MatchesText.Count > 0) Then
ReDim retArrays(0 To MatchesText.Count - 1)
For j = 0 To MatchesText.Count - 1
ReDim retArray(0 To MatchesTemplate.Count - 1) As String
Dim escaped As String
Dim plaintext As String
For i = 0 To MatchesTemplate.Count - 1
escaped = MatchesTemplate(i).submatches(0)
plaintext = MatchesTemplate(i).submatches(1)
If (Len(escaped) > 0) Then
If (IsNumeric(escaped)) Then
Dim groupnum As Integer
groupnum = CInt(escaped)
If groupnum = 0 Then
retArray(i) = MatchesText(j)
ElseIf (groupnum > MatchesText(j).submatches.Count) Then
retArray(i) = "?"
Else
retArray(i) = MatchesText(j).submatches(groupnum - 1)
End If
Else
retArray(i) = escaped
End If
Else
retArray(i) = plaintext
End If
Next i
retArrays(j) = Join(retArray, "")
Next j
retval = Join(retArrays, Delimiter)
Else
retval = ""
End If
RXPrintAll = retval
End Function
RXMatches
Public Function RXMatches(Text As String, Pattern As String, Optional Group As Integer = 0, Optional IgnoreCase As Boolean = True) As Variant
Dim retval() As String
' Takes a string and returns all matches in a vertical array
' Text is the string to be searched
' Pattern is the regex pattern
' Group (optional) selects a parenthesized group (count the number of left parentheses preceding it to get the group number)
' IgnoreCase (optional) set to False for a case-sensitive search
Dim RE As Object
Dim Matches As Object
Set RE = CreateObject("vbscript.regexp")
RE.IgnoreCase = IgnoreCase
RE.Global = True
RE.Pattern = Pattern
Set Matches = RE.Execute(Text)
If (Matches.Count > 0) Then
ReDim retval(0 To Matches.Count - 1)
For i = 0 To Matches.Count - 1
If (Group > 0) Then
retval(i) = Matches(i).submatches(Group - 1)
Else
retval(i) = Matches(i)
End If
Next i
Else
ReDim retval(0 To 0)
retval(0) = ""
End If
RXMatches = Application.Transpose(retval)
End Function
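The template expansion that RXPrint and RXPrintAll perform (split the template into `\x` escapes and literal runs, then substitute `\0` with the whole match, `\1`..`\9` with groups, and `?` for a group the pattern lacks) can be sketched in Python as follows. This mirrors the VBA logic but is not the author's code; `rx_print_all`, `rx_print` and the `max_matches` parameter are hypothetical names.

```python
import re
from itertools import islice

# Breaks a template into escapes (\x) and literal runs, like RETemplate in the VBA code.
TEMPLATE_TOKEN = re.compile(r"\\(.)|([^\\]+)")

def rx_print_all(text, pattern, template=r"\0", delimiter=", ",
                 ignore_case=True, max_matches=None):
    flags = re.IGNORECASE if ignore_case else 0
    pieces = TEMPLATE_TOKEN.findall(template)  # [(escaped_char, literal_text), ...]
    matches = re.finditer(pattern, text, flags)
    if max_matches is not None:
        matches = islice(matches, max_matches)
    results = []
    for m in matches:
        out = []
        for escaped, literal in pieces:
            if not escaped:
                out.append(literal)
            elif "0" <= escaped <= "9":
                n = int(escaped)
                # \0 is the whole match; a group number the pattern lacks becomes "?"
                out.append("?" if n > m.re.groups else (m.group(n) or ""))
            else:
                out.append(escaped)  # any other escaped character is kept literally
        results.append("".join(out))
    return delimiter.join(results)

def rx_print(text, pattern, template=r"\0", ignore_case=True):
    # RXPrint is the same expansion applied to the first match only.
    return rx_print_all(text, pattern, template, ignore_case=ignore_case, max_matches=1)
```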
When writing UDFs that may be called from many cells, it's vital that you cache the RegExp objects you create instead of calling CreateObject on every call.
For example:
Public Function RegexTest(ByVal vHaystack As Variant, ByVal sPattern As String, Optional ByVal sFlags As String = "") As Boolean
'If haystack is an error then return false
If IsError(vHaystack) Then Exit Function
'Stringify haystack
Dim sHaystack As String: sHaystack = vHaystack
'Cache regular expressions, especially important for formulae
Static lookup As Object
If lookup Is Nothing Then Set lookup = CreateObject("Scripting.Dictionary")
'If cached object doesn't exist, create it
Dim sKey As String: sKey = sPattern & "-" & sFlags
If Not lookup.exists(sKey) Then
'Create regex object
Set lookup(sKey) = CreateObject("VBScript.Regexp")
'Bind flags
Dim i As Long
For i = 1 To Len(sFlags)
Select Case Mid(sFlags, i, 1)
Case "i"
lookup(sKey).IgnoreCase = True
Case "g"
lookup(sKey).Global = True
End Select
Next
'Set pattern
lookup(sKey).Pattern = sPattern
End If
'Use test function of regex object
RegexTest = lookup(sKey).test(sHaystack)
End Function
If you apply this caching to your own functions, you should see a large speed-up when they are evaluated over a large number of cells.
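The caching idea can be sketched in Python with `functools.lru_cache`, keyed on the pattern and flag string just as RegexTest keys its dictionary on `sPattern & "-" & sFlags`. This is an analogue, not a port: `cached_regex` and `regex_test` are hypothetical names, the "m" (multiline) flag is an addition the VBA version does not handle, and "g" is ignored because a yes/no test does not need it. Python's `re` module already keeps a small internal cache of recently used patterns, so the payoff there is smaller than with `CreateObject` in VBA.

```python
import re
from functools import lru_cache

@lru_cache(maxsize=256)
def cached_regex(pattern, flags=""):
    """Compile each (pattern, flags) pair once and return the same object afterwards."""
    re_flags = 0
    if "i" in flags:
        re_flags |= re.IGNORECASE
    if "m" in flags:
        re_flags |= re.MULTILINE
    return re.compile(pattern, re_flags)

def regex_test(haystack, pattern, flags=""):
    """True if the pattern occurs anywhere in haystack (converted to a string)."""
    if haystack is None:
        return False
    return cached_regex(pattern, flags).search(str(haystack)) is not None
```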
Edit:
Since my string has become more and more complicated, it looks like a regexp is the only way.
I do not have a lot of experience with them, and your help is much appreciated.
Based on what I read on the web, I constructed the following expression to try to match the occurrences in my sample string:
"My very long long string 12Mar2012 is right here 23Apr2015"
[0-9][0-9] + [a-zA-Z] + [0-9][0-9][0-9][0-9]
and tried this code, but I do not get any match. Any good link to a regexp tutorial would be much appreciated.
Dim re, match, RegExDate
Set re = CreateObject("vbscript.regexp")
re.Pattern = "(^[0-9][0-9] + [a-zA-Z] + [0-9][0-9][0-9][0-9]$)"
re.Global = True
For Each match In re.Execute(str)
MsgBox match.Value
RegExDate = match.Value
Exit For
Next
Thank you
This code validates each date found by the RegExp with DateValue, for robustness:
Sub Robust()
Dim Regex As Object
Dim RegexMC As Object
Dim RegexM As Object
Dim strIn As String
Dim BDate As Boolean
strIn = "My very long long string 12Mar2012 is right here 23Apr2015 and 30Feb2002"
Set Regex = CreateObject("vbscript.regexp")
With Regex
.Pattern = "(([0-9])|([0-2][0-9])|([3][0-1]))(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(\d{4})"
.Global = True
If .test(strIn) Then
Set RegexMC = .Execute(strIn)
On Error Resume Next
For Each RegexM In RegexMC
BDate = False
BDate = IsDate(DateValue(RegexM.submatches(0) & " " & RegexM.submatches(4) & " " & RegexM.submatches(5)))
If BDate Then Debug.Print RegexM
Next
On Error GoTo 0
End If
End With
End Sub
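The match-then-validate idea can be sketched in Python. Instead of `DateValue`, this sketch (an assumption, not the answer's method) maps the month abbreviation to a number and lets `datetime.date` reject impossible dates, which avoids any locale dependence. `find_valid_dates` is a hypothetical name; the pattern is a simplified `\d{1,2}` + month + four-digit year.

```python
import re
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
DATE_RE = re.compile(r"(\d{1,2})(" + "|".join(MONTHS) + r")(\d{4})")

def find_valid_dates(text):
    found = []
    for m in DATE_RE.finditer(text):
        day, month, year = m.groups()
        try:
            date(int(year), MONTHS.index(month) + 1, int(day))
        except ValueError:
            continue  # e.g. 30Feb2002 matches the pattern but is not a real date
        found.append(m.group(0))
    return found

print(find_valid_dates("My very long long string 12Mar2012 is right here 23Apr2015 and 30Feb2002"))
```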
Thanks for all your help!
I managed to solve my problem using this simple code.
Dim rex As New RegExp
Dim dateCol As New Collection
rex.Pattern = "(\d|\d\d)(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(\d{4})?"
rex.Global = True
For Each match In rex.Execute(sStream)
dateCol.Add match.Value
Next
Just note that in my case I'm sure the string contains only valid dates, so the regular expression can stay simple.
Thanks,
Ilya
The following is a quick attempt I made. It's far from perfect.
Basically, it splits the string into words. While looping through the words, it strips punctuation (periods and commas; you might need to add more).
When processing an item, it tries to remove each month name from it. If the string gets shorter, we might have a date.
It then checks whether the length of the remaining string is about right (5 or 6 characters: 1 or 2 for the day plus 4 for the year).
You could instead (or also) check that the remaining characters are all digits.
Private Const MonthList = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC"
Public Function getDates(ByVal Target As String) As String
Dim Data() As String
Dim Item As String
Dim Index As Integer
Dim List() As String
Dim Index2 As Integer
Dim Test As String
Dim Result As String
List = Split(MonthList, ",")
Data = Split(Target, " ")
Result = ""
For Index = LBound(Data) To UBound(Data)
Item = UCase(Replace(Replace(Data(Index), ".", ""), ",", ""))
For Index2 = LBound(List) To UBound(List)
Test = Replace(Item, List(Index2), "")
If Not Test = Item Then
If Len(Test) = 5 Or Len(Test) = 6 Then
If Result = "" Then
Result = Item
Else
Result = Result & ", " & Item
End If
End If
End If
Next Index2
Next
getDates = Result
End Function
I'm trying to get the code below to send the results of the regexp search to an array of strings. How can I do that?
When I change name to an array of strings, i.e. Dim name() As String, VBA throws a type-mismatch error. Any idea what I can do to fix that?
Many thanks.
Do While Not EOF(1)
Line Input #1, sText
If sText <> "" Then
Dim Regex As Object, myMatches As Object
' instantiates regexp object
Set Regex = CreateObject("VBScript.RegExp")
With Regex
.MultiLine = False
.Global = True
.IgnoreCase = False
.Pattern = "^Personal\sname\s*[:]\s*"
End With
' get name, seperated from Personal Name
If Regex.test(sText) Then
Set myMatches = Regex.Execute(sText)
Dim temp As String
temp = Regex.Replace(sText, vbNullString)
Regex.Pattern = "^[^*]*[*]+"
Set myMatches = Regex.Execute(temp)
Dim temp2 As String
temp2 = myMatches.Item(0)
name = Trim(Left(temp2, Len(temp2) - 3))
End If
End If
Loop
You should not use "name" as a variable name, as it conflicts with an Excel property. Try sName or sNames instead, where s stands for String.
With an array, you need to give it a size before you can assign a value to each element.
Dim sNames(4) As String '// Or Dim sNames(1 To 4) As String
sNames(1) = "John"
...
sNames(4) = "Sam"
Or, if you don't know the total number of elements (names) to begin with:
Dim sNames() As String
Dim iTotalNames As Integer
iTotalNames = '// Some code here to determine how many names you will have
ReDim sNames(iTotalNames) '// You can also use ReDim Preserve if you have existing elements
sNames(1) = "John"
...
sNames(4) = "Sam"
So I suspect you will need something like:
Dim sNames() As String
Dim iTotalNames As Integer
'// Your code ....
iTotalNames = iTotalNames + 1
ReDim Preserve sNames(iTotalNames)
sNames(iTotalNames) = Trim(Left(temp2, Len(temp2) - 3))
'// Rest of your code ...
Also, in VBA it's good practice to put all variable declarations (Dim statements) at the top of the procedure.
Change
'call this "A"
Dim temp2 As String
temp2 = myMatches.Item(0)
to
'stick this at the top
ReDim temp2(0 To 0)
'replace "A" with this
new_top = UBound(temp2) + 1
ReDim Preserve temp2(0 To new_top)
temp2(new_top) = myMatches.Item(0)
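Both answers come down to appending each result to a growing collection instead of overwriting a single variable. As a Python sketch of the question's loop (not VBA; `collect_names` is a hypothetical name), a list's `append` plays the role of `ReDim Preserve` plus assignment. The two patterns are the question's own, and, like `Trim(Left(temp2, Len(temp2) - 3))`, the last three characters are dropped on the assumption that the name ends with three asterisks.

```python
import re

PREFIX = re.compile(r"^Personal\sname\s*:\s*")  # the question's prefix pattern
UP_TO_STARS = re.compile(r"^[^*]*\*+")           # text up to and including the asterisks

def collect_names(lines):
    names = []  # grows by one element per match, like ReDim Preserve
    for line in lines:
        if not line or not PREFIX.search(line):
            continue
        rest = PREFIX.sub("", line)
        m = UP_TO_STARS.search(rest)
        if m:
            # like Trim(Left(temp2, Len(temp2) - 3)): drop the last three characters
            names.append(m.group(0)[:-3].strip())
    return names

print(collect_names(["Personal name: John Smith ***", "Other line", "", "Personal name:Sam***"]))
```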