Extract/convert date from string in MS Access - regex

I'm trying to extract date/times from strings with the following patterns and convert them to date types in Access.
"08-Apr-2012 21:26:49"
"...Confirmed by SMITH, MD, JOHN (123) on 4/2/2012 11:11:01 AM;"
Can anyone help?

Try this
Dim d As Date
d = CDate("08-Apr-2012 21:26:49")
Debug.Print Format(d, "dd-MMM-yyyy")
Debug.Print Format(d, "h:m:s")
Will give
08-Apr-2012
21:26:49
use this regex to get date-time between " on " (ie, space on space) and the ";" (first semi-colon after that).
(?<=\ on )(.*?)(?=\;)

As already mentioned by Romeo in his answer, you need to use CDate() to convert a string with a valid date value to a Date variable.
You can get the date value out of the string like this:
(given that the strings always look like the one in the example, " on " (with blanks) before the date and ";" after it):
Public Function Test()
Dim Source As String
Dim Tmp As String
Dim DateStart As Integer
Dim DateEnd As Integer
Dim DateValue As Date
Source = "...Confirmed by SMITH, MD, JOHN (123) on 4/2/2012 11:11:01 AM;"
'find the place in the source string where " on " ends
DateStart = InStr(1, Source, " on ") + 4
'find first semicolon after the date)
DateEnd = InStr(DateStart, Source, ";")
'get the part with the date
Tmp = Mid(Source, DateStart, DateEnd - DateStart)
'convert to date
DateValue = CDate(Tmp)
End Function

Add this function to a VBA module:
' ----------------------------------------------------------------------'
' Return a Date object or Null if no date could be extracted '
' ----------------------------------------------------------------------'
Public Function ExtractDate(value As Variant) As Variant
If IsNull(value) Then
ExtractDate = Null
Exit Function
End If
' Using a static, we avoid re-creating the same regex object for every call '
Static regex As Object
' Initialise the Regex object '
If regex Is Nothing Then
Set regex = CreateObject("vbscript.regexp")
With regex
.Global = True
.IgnoreCase = True
.MultiLine = True
.pattern = "(\d+\/\d+/\d+\s+\d+:\d+:\d+\s+\w+|\d+-\w+-\d+\s+\d+:\d+:\d+)"
End With
End If
' Test the value against the pattern '
Dim matches As Object
Set matches = regex.Execute(value)
If matches.count > 0 Then
' Convert the match to a Date if we can '
ExtractDate = CDate(matches(0).value)
Else
' No match found, jsut return Null '
ExtractDate = Null
End If
End Function
And then use it like this, for instance in a query:
SELECT ID, LogData, ExtractDate(LogData) as LogDate
FROM MyLog
Make sure you check that hte dates returned are in the proper format and make sense to you.
CDate() interprets the date string in different ways depending on your locale.
If you're not getting the desired result, you will need to modify the code to separate the individual components of the date and rebuild them using DateSerial() for instance.

Related

Excel VBA RegEx that extracts numbers from price values in range (has commas, $ and -)

I have a field data extracted from a database which represents a range of values, but it's coming in Excel as a String format $86,000 - $162,000.
I need to extract the minimum value and the maximum value from each cell, so I need to extract the numeric portion of it, and ignore the $, - and the ,.
I've attached an image of the data I have, and the values I want to extract from it.
This is the closest pattern I got with RegEx, but I'ts not what I'm looking for.
Pattern = (\d+)(?:\.(\d{1,2}))?
Can anyone assist ?
Just wondering why Regex?
Function GetParts(priceRange As String) As Double()
Dim arr() As String
Dim parts() As Double
If InStr(1, priceRange, "-") > 0 Then
arr = Split(priceRange, "-")
ReDim parts(0 To UBound(arr))
Dim i As Long
For i = 0 To UBound(arr)
parts(i) = CDbl(Replace$(Replace$(Trim$(arr(i)), "$", ""), ",", ""))
Next i
End If
GetParts = parts
End Function
Sub test()
MsgBox GetParts("$14,000 - $1,234,567")(0) 'Minimum
End Sub
EDIT
Yet you could do this with regex to match the data string into the parts:
Function GetPartsRegEx(priceRange As String) As Variant
Dim arr() As Double
Dim pricePattern As String
pricePattern = "(\$?\d+[\,\.\d]*)"
'START EDIT
Static re As RegExp
If re Is Nothing Then
Set re = New RegExp
re.IgnoreCase = True
re.Global = True
re.Pattern = pricePattern & "\s*[\-]\s*" & pricePattern 'look for the pattern first
End If
Static nums As RegExp
If nums Is Nothing Then
Set nums = New RegExp
'to remove all non digits, except decimal point in case you have pennies
nums.Pattern = "[^0-9.]"
nums.Global = True
End If
'END EDIT
If re.test(priceRange) Then
ReDim arr(0 To 1) ' fill return array
arr(0) = CDbl(nums.Replace(re.Replace(priceRange, "$1"), ""))
arr(1) = CDbl(nums.Replace(re.Replace(priceRange, "$2"), ""))
Else
'do some error handling here
Exit Function
End If 'maybe throw error if no +ve test or
GetPartsRegEx = arr
End Function
Sub test()
MsgBox GetPartsRegEx("$1,005.45 - $1,234,567.88")(1)
End Sub
Here is quick Example Demo https://regex101.com/r/RTNlVF/1
Pattern "^\$(\d+\,\d+)\s\-\s\$(\d+\,\d+)"
Option Explicit
Private Sub Example()
Dim RegExp As New RegExp
Dim Pattern As String
Dim CelValue As String
Dim rng As Range
Dim Cel As Range
Set rng = ActiveWorkbook.Sheets("Sheet1" _
).Range("A2", Range("A9999" _
).End(xlUp))
For Each Cel In rng
DoEvents
Pattern = "^\$(\d+\,\d+)\s\-\s\$(\d+\,\d+)"
If Pattern <> "" Then
With RegExp
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = Pattern
End With
If RegExp.Test(Cel.Value) Then
' Debug.Print Cel.Value
Debug.Print RegExp.Replace(CStr(Cel), "$1")
Debug.Print RegExp.Replace(CStr(Cel), "$2")
End If
End If
Next
End Sub
Without a loop (but still no regex):
Sub Split()
With Columns("B:B")
.Replace What:="$", Replacement:=""
Application.CutCopyMode = False
.TextToColumns Destination:=Range("B1"), DataType:=xlDelimited, Other:=True, OtherChar:="-", FieldInfo:=Array(Array(1, 1), Array(2, 1))
End With
Columns("B:C").Insert Shift:=xlToRight
Columns("D:E").NumberFormat = "0"
Range("D1").FormulaR1C1 = "Min Value"
Range("E1").FormulaR1C1 = "Max Value"
With Range("D1:E1").Interior
.Pattern = xlSolid
.PatternColorIndex = xlAutomatic
.Color = 12611584
End With
With Range("D1:E1").Font
.ThemeColor = xlThemeColorDark1
.TintAndShade = 0
End With
End Sub
I made this function:
Hope it helps.
Code:
Function ExtractNumber(ByVal TextInput As String, _
Optional ByVal Position As Byte = 0, _
Optional ByVal Delimiter As String = "-") As Variant
' You can use this function in a subprocess that
' writes the values in the cells you want, or
' you can use it directly in the ouput cells
' Variables
Dim RemoveItems(2) As String
Dim Aux As Variant
' The variable RemoveItems is an array
' containing the characters you want to remove
RemoveItems(0) = "."
RemoveItems(1) = ","
RemoveItems(2) = " "
' STEP 1 - The variable Aux will store the text
' given as input
Aux = TextInput
' STEP 2 - Characters stored in the variable
' RemoveItems will be removed from Aux
For i = 0 To UBound(RemoveItems)
Aux = Replace(Aux, RemoveItems(i), "")
Next i
' STEP 3 - Once Aux is "clean", it will be
' transformed into an array containing the
' values separated by the delimiter
' As you can see at the function's header,
' Delimiter default value is "-". You can change
' it depending on the situation
Aux = Split(Aux, Delimiter)
' STEP 4 - The result of this function will be
' a numeric value. So, if the value of the
' selected position in Aux is not numeric it will
' remove the first character assuming it is a
' currency symbol.
' If something fails in the process the function
' will return "ERROR", so you can know you may
' verify the inputs or adjust this code for
' your needs.
On Error GoTo ErrHndl
If Not IsNumeric(Aux(Position)) Then
ExtractNumber = CLng(Mid(Aux(Position), 2))
Else
ExtractNumber = CLng(Aux(Position))
End If
Exit Function
ErrHndl:
ExtractNumber = "ERROR"
End Function
You can even do this with just worksheet formulas. Under certain circumstances, Excel will ignore the $ and ,. The double unary converts the returned string to a numeric value.
First Value: =--LEFT(A1,FIND("-",A1)-1)
Second Value: =--MID(A1,FIND("-",A1)+1,99)

Named groups for Regex in VBA

Is there any way to use named groups with regular expressions in VBA?
I would like to write a an Excel VBA Sub that matches the dates in file names and decrements these dates by a specified amount. I need to be able to distinguish between dd/mm and mm/dd formats -- among other irregularities -- and using named groups something like this would solve the problem:
(?:<month>\d\d)(?:<day>\d\d)
Advice is appreciated
Nope, no named groups in VBScript regular expressions.
VBScript uses the same regexp engine that JScript uses, so it's compatible with JavaScript regex, which also doesn't have named groups.
You have to use unnamed groups and just go by the order they appear on the expression to retrieve them by index after running it.
In general, dd/mm and mm/dd can't be automatically distinguished since there are valid dates that could be either. (e.g. 01/04 could be January 4th or April 1st). I don't think you'd be able to solve this with a regular expression.
Here is an implementation of named groups using VBA I made today. Hopefully this will be useful to someone else!:
'Description:
' An implementation of Regex which includes Named Groups
' and caching implemented in VBA
'Example:
' Dim match as Object
' set match = RegexMatch("01/01/2019","(?<month>\d\d)\/(?<day>\d\d)\/(?<year>\d\d\d\d)")
' debug.print match("day") & "/" & match("month") & "/" & match("year")
'Options:
' "i" = IgnoreCase
'Return value:
' A dictionary object with the following keys:
' 0 = Whole match
' 1,2,3,... = Submatch 1,2,3,...
' "Count" stores the count of matches
' "<<NAME>>" stores the match of a specified name
Function RegexMatch(ByVal haystack As String, ByVal pattern As String, Optional ByVal options As String) As Object
'Cache regexes for optimisation
Static CachedRegex As Object
Static CachedNames As Object
If CachedRegex Is Nothing Then Set CachedRegex = CreateObject("Scripting.Dictionary")
If CachedNames Is Nothing Then Set CachedNames = CreateObject("Scripting.Dictionary")
'Named regexp used to detect capturing groups and named capturing groups
Static NamedRegexp As Object
If NamedRegexp Is Nothing Then
Set NamedRegexp = CreateObject("VBScript.RegExp")
NamedRegexp.pattern = "\((?:\?\<(.*?)\>)?"
NamedRegexp.Global = True
End If
'If cached pattern doesn't exist, create it
If Not CachedRegex(pattern) Then
'Create names/capture group object
Dim testPattern As String, oNames As Object
testPattern = pattern
testPattern = Replace(testPattern, "\\", "asdasd")
testPattern = Replace(testPattern, "\(", "asdasd")
'Store names for optimisation
Set CachedNames(options & ")" & pattern) = NamedRegexp.Execute(testPattern)
'Create new VBA valid pattern
Dim newPattern As String
newPattern = NamedRegexp.Replace(pattern, "(")
'Create regexp from new pattern
Dim oRegexp As Object
Set oRegexp = CreateObject("VBScript.RegExp")
oRegexp.pattern = newPattern
'Set regex options
Dim i As Integer
For i = 1 To Len(flags)
Select Case Mid(flags, i, 1)
Case "i"
oRegexp.ignoreCase = True
Case "g"
oRegexp.Global = True
End Select
Next
'Store regex for optimisation
Set CachedRegex(options & ")" & pattern) = oRegexp
End If
'Get matches object
Dim oMatches As Object
Set oMatches = CachedRegex(options & ")" & pattern).Execute(haystack)
'Get names object
Dim CName As Object
Set CName = CachedNames(options & ")" & pattern)
'Create dictionary to return
Dim oRet As Object
Set oRet = CreateObject("Scripting.Dictionary")
'Fill dictionary with names and indexes
'0 = Whole match
'1,2,3,... = Submatch 1,2,3,...
'"Count" stores the count of matches
'"<<NAME>>" stores the match of a specified name
For i = 1 To CName.Count
oRet(i) = oMatches(0).Submatches(i - 1)
If Not IsEmpty(CName(i - 1).Submatches(0)) Then oRet(CName(i - 1).Submatches(0)) = oMatches(0).Submatches(i - 1)
Next i
oRet(0) = oMatches(0)
oRet("Count") = CName.Count
Set RegexMatch = oRet
End Function
P.S. for a Regex library (built by myself) which has this additional functionality, check out stdRegex. The equivalent can be done with:
set match = stdRegex.Create("(?:<month>\d\d)(?:<day>\d\d)").Match(sSomeString)
Debug.print match("month")
There are also more features of stdRegex, than VBScript's standard object. See the test suite for more info.
Thanks #Sancarn for his code!
For a few reasons I've revised it. The changes I've made are documented inside the code:
' Procedure for testing 'RegexMatch'.
' - It shows how to convert a date from 'mm/dd/yyyy' to 'dd.mm.yyyy' format.
' - It shows how to retrieve named groups by real name: 'Match.Item("group name")'
' as well as by number: 'Match.Items(group number)'.
' - It shows how to retrieve unnamed groups by number-generated name as well as by number.
' - It shows how to retrieve group count and the whole match by number-generated name as well as by number.
' - It shows that non-capturing groups like '(?:y)?' won't be listed.
' - It shows that left parenthesis inside a character class like '([x(])?' won't disturbe.
' Take notice of:
' - the small difference between 'Item' and 'Items'
' - the quotes in 'Match.Item("number of an unnamed group")'
Sub TestRegexMatch()
Dim Match As Scripting.Dictionary
Set Match = RegexMatch("01/23/2019z", "(?<month>\d\d)\/([x(])?(?<day>\d\d)\/(?:y)?(?<year>\d\d\d\d)(z)?")
Debug.Print Match.Item("day") & "." & Match.Item("month") & "." & Match.Item("year") & " vs. " & Match.Items(2) & "." & Match.Items(0) & "." & Match.Items(3)
Debug.Print "'" & Match.Item("1") & "'" & ", '" & Match.Item("4") & "' vs. '" & Match.Items(1) & "', '" & Match.Items(4) & "'"
Debug.Print Match.Item("98") & " vs. " & Match.Items(Match.Count - 2)
Debug.Print Match.Item("99") & " vs. " & Match.Items(Match.Count - 1)
End Sub
' An implementation of regex which includes named groups and caching implemented in VBA.
' The 'Microsoft VBScript Regular Expressions 5.5' library must be referenced (in VBA-editor: Tools -> References).
' Parameters:
' - haystack: the string the regex is applied on.
' - originalPattern: the regex pattern with or without named groups.
' The group naming has to follow .net regex syntax: '(?<group name>group content)'.
' Group names may contain the following characters: a-z, A-Z, _ (underscore).
' Group names must not be an empty string.
' - options: a string that may contain:
' - 'i' (the regex will work case-insensitive)
' - 'g' (the regex will work globally)
' - 'm' (the regex will work in multi-line mode)
' or any combination of these.
' Returned value: a Scripting.Dictionary object with the following entries:
' - Item 0 or "0", 1 or "1" ... for the groups content/submatches,
' following the convention of VBScript_RegExp_55.SubMatches collection, which is 0-based.
' - Item Match.Count - 2 or "98" for the whole match, assuming that the number of groups is below.
' - Item Match.Count - 1 or "99" for number of groups/submatches.
' Changes compared to the original version:
' - Handles non-capturing and positive and negative lookahead groups.
' - Handles left parenthesis inside a character class.
' - Named groups do not count twice.
' E.g. in the original version the second named group occupies items 3 and 4 of the returned
' dictionary, in this revised version only item 1 (item 0 is the first named group).
' - Additional 'm' option.
' - Fixed fetching cached regexes.
' - Early binding.
' - Some code cleaning.
' For an example take a look at the 'TestRegexMatch' procedure above.
Function RegexMatch(ByVal haystack As String, ByVal originalPattern As String, Optional ByVal options As String) As Scripting.Dictionary
Dim GroupsPattern As String
Dim RealPattern As String
Dim RealRegExp As VBScript_RegExp_55.RegExp
Dim RealMatches As VBScript_RegExp_55.MatchCollection
Dim ReturnData As Scripting.Dictionary
Dim GroupNames As VBScript_RegExp_55.MatchCollection
Dim Ctr As Integer
' Cache regexes and group names for optimisation.
Static CachedRegExps As Scripting.Dictionary
Static CachedGroupNames As Scripting.Dictionary
' Group 'meta'-regex used to detect named and unnamed capturing groups.
Static GroupsRegExp As VBScript_RegExp_55.RegExp
If CachedRegExps Is Nothing Then Set CachedRegExps = New Scripting.Dictionary
If CachedGroupNames Is Nothing Then Set CachedGroupNames = New Scripting.Dictionary
If GroupsRegExp Is Nothing Then
Set GroupsRegExp = New VBScript_RegExp_55.RegExp
' Original version: GroupsRegExp.Pattern = "\((?:\?\<(.*?)\>)?"
GroupsRegExp.Pattern = "\((?!(?:\?:|\?=|\?!|[^\]\[]*?\]))(?:\?<([a-zA-Z0-9_]+?)>)?"
GroupsRegExp.Global = True
End If
' If the pattern isn't cached, create it.
If Not CachedRegExps.Exists("(" & options & ")" & originalPattern) Then
' Prepare the pattern for retrieving named and unnamed groups.
GroupsPattern = Replace(Replace(Replace(Replace(originalPattern, "\\", "X"), "\(", "X"), "\[", "X"), "\]", "X")
' Store group names for optimisation.
CachedGroupNames.Add "(" & options & ")" & originalPattern, GroupsRegExp.Execute(GroupsPattern)
' Create new VBScript regex valid pattern and set regex for this pattern.
RealPattern = GroupsRegExp.Replace(originalPattern, "(")
Set RealRegExp = New VBScript_RegExp_55.RegExp
RealRegExp.Pattern = RealPattern
' Set regex options.
For Ctr = 1 To Len(options)
Select Case Mid(options, Ctr, 1)
Case "i"
RealRegExp.IgnoreCase = True
Case "g"
RealRegExp.Global = True
Case "m"
RealRegExp.MultiLine = True
End Select
Next
' Store this regex for optimisation.
CachedRegExps.Add "(" & options & ")" & originalPattern, RealRegExp
End If
' Get matches.
Set RealMatches = CachedRegExps.Item("(" & options & ")" & originalPattern).Execute(haystack)
' Get group names.
Set GroupNames = CachedGroupNames.Item("(" & options & ")" & originalPattern)
' Create dictionary to return.
Set ReturnData = New Scripting.Dictionary
' Fill dictionary with names and indexes as descibed in the remarks introducing this procedure.
For Ctr = 1 To GroupNames.Count
If IsEmpty(GroupNames(Ctr - 1).SubMatches(0)) Then
ReturnData.Add CStr(Ctr - 1), RealMatches(0).SubMatches(Ctr - 1)
Else
ReturnData.Add GroupNames(Ctr - 1).SubMatches(0), RealMatches(0).SubMatches(Ctr - 1)
End If
Next
ReturnData.Add "98", RealMatches.Item(0)
ReturnData.Add "99", GroupNames.Count
' Return the result.
Set RegexMatch = ReturnData
End Function
For further improvement this code could be the base of a class module for replacement of the VBScript regex.

Use Regular Expressions to find a date in a String

I am trying to use regular Expressions to extract the dates from a string using VBA in Excel.
The string is:
Previous Month: 9/1/2015 - 9/30/2015
Or it can be :
Custom: 9/1/2015 - 9/30/2015
Do you have any idea how can I achieve that? I have never used Regular Expressions before.
RegEx is a poor choice for dates. You could look for : and examine the remaining tokens:
Sub Foo()
Dim result() As Variant
result = GetDates("Previous Month: 9/1/2015 - 9/30/2015")
If UBound(result) Then
Debug.Print result(0)
Debug.Print result(1)
End If
End Sub
Function GetDates(str As String) As Variant()
Dim tokens() As String
tokens = Split(Mid$(str, InStr(str & ": ", ":")), " ")
If (UBound(tokens) = 3) Then
If IsDate(tokens(1)) And IsDate(tokens(3)) Then
GetDates = Array(CDate(tokens(1)), CDate(tokens(3)))
Exit Function
End If
End If
ReDim GetDates(0)
End Function
Try this:
([1-9]|1[012])[/]([1-9]|[1-2][0-9]|3[01])[/](19|20)[0-9]{2}
search a commandtext string for dates and then replace them with a new date
dim regex as object
Set regex = CreateObject("VBScript.RegExp")
regex.Pattern = "\d{1,2}[-/]\d{1,2}[-/]\d{2,4}"
regex.Global = True
Set regexMatches = regex.Execute(CommandText)
for i=0 to regexMatches.Count()
date1 = regexMatches(i)
next

VBA: REGEX LOOKBEHIND MS ACCESS 2010

I have a function that was written so that VBA can be used in MS Access
I wish to do the following
I have set up my code below. Everything before the product works perfectly but trying to get the information behind just returns "" which is strange as when i execute it within Notepad++ it works perfectly fine
So it looks for the letters MIP and one of the 3 letter codes (any of them)
StringToCheck = "MADHUBESOMIPTDTLTRCOYORGLEJ"
' PART 1
' If MIP appears in the string, then delete any of the following codes if they exist - DOM, DOX, DDI, ECX, LOW, WPX, SDX, DD6, DES, BDX, CMX,
' WMX, TDX, TDT, BSA, EPA, EPP, ACP, ACA, ACE, ACS, GMB, MAL, USP, NWP.
' EXAMPLE 1. Flagged as: MADHUBESOMIPTDTLTRCOYORGLEJ, should be MADHUBESOMIPLTRCOYORGLEJ
Do While regexp(StringToCheck, "MIP(DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)", False) <> ""
' SELECT EVERYTHING BEFORE THE THREE LETTER CODES
strPart1 = regexp(StringToCheck, ".*^[^_]+(?=DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)", False)
' SELECT EVERYTHING AFTER THE THREE LETTER CODES
strPart2 = regexp(StringToCheck, "(?<=(DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX).*", False)
StringToCheck = strPart1 & strPart2
Loop
The function i am using which i have taken from the internet is below
Function regexp(StringToCheck As Variant, PatternToUse As String, Optional CaseSensitive As Boolean = True) As String
On Error GoTo RefErr:
Dim re As New regexp
re.Pattern = PatternToUse
re.Global = False
re.IgnoreCase = Not CaseSensitive
Dim m
For Each m In re.Execute(StringToCheck)
regexp = UCase(m.Value)
Next
RefErr:
On Error Resume Next
End Function
Just do it in two steps:
Check if MIP is in the string
If it is, remove the other codes.
Like this:
Sub Test()
Dim StringToCheck As String
StringToCheck = "MADHUBESOMIPTDTLTRCOYORGLEJ"
Debug.Print StringToCheck
Debug.Print CleanupString(StringToCheck)
End Sub
Function CleanupString(str As String) As String
Dim reCheck As New RegExp
Dim reCodes As New RegExp
reCheck.Pattern = "^(?:...)*?MIP"
reCodes.Pattern = "^((?:...)*?)(?:DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)"
reCodes.Global = True
If reCheck.Test(str) Then
While reCodes.Test(str)
str = reCodes.Replace(str, "$1")
Wend
End If
CleanupString = str
End Function
Note that the purpose of (?:...)*? is to group the letters in threes.
Since the VBScript regular expression engine does support look-aheads, you can of course also do it in a single regex:
Function CleanupString(str As String) As String
Dim reClean As New RegExp
reClean.Pattern = "^(?=(?:...)*?MIP)((?:...)*?)(?:DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)"
While reClean.Test(str)
str = reClean.Replace(str, "$1")
Wend
CleanupString = str
End Function
Personally, I like the two-step check/remove pattern better because it is a lot more obvious and therefore more maintainable.
Non RE option:
Function DeMIPString(StringToCheck As String) As String
If Not InStr(StringToCheck, "MIP") Then
DeMIPString = StringToCheck
Else
Dim i As Long
For i = 1 To Len(StringToCheck) Step 3
Select Case Mid$(StringToCheck, i, 3)
Case "MIP", "DOM", "DOX", "DDI", "ECX", "LOW", "WPX", "SDX", "DD6", "DES", "BDX", "CMX", "WMX", "TDX", "TDT", "BSA", "EPA", "EPP", "ACP", "ACA", "ACE", "ACS", "GMB", "MAL", "USP", "NWP":
Case Else
DeMIPString = DeMIPString & Mid$(StringToCheck, i, 3)
End Select
Next
End If
End Function

match date pattern in the string vba excel

Edit:
Since my string became more and more complicated looks like regexp is the only way.
I do not have a lot experience in that and your help is much appreciated.
Basically from what I read on the web I construct the following exp to try matching occurrence in my sample string:
"My very long long string 12Mar2012 is right here 23Apr2015"
[0-9][0-9] + [a-zA-Z] + [0-9][0-9][0-9][0-9]
and trying this code. I do not have any match. Any good link on regexp tutorial much appreciated.
Dim re, match, RegExDate
Set re = CreateObject("vbscript.regexp")
re.Pattern = "(^[0-9][0-9] + [a-zA-Z] + [0-9][0-9][0-9][0-9]$)"
re.Global = True
For Each match In re.Execute(str)
MsgBox match.Value
RegExDate = match.Value
Exit For
Next
Thank you
This code validates the actual date from the Regexp using DateValuefor robustness
Sub Robust()
Dim Regex As Object
Dim RegexMC As Object
Dim RegexM As Object
Dim strIn As String
Dim BDate As Boolean
strIn = "My very long long string 12Mar2012 is right here 23Apr2015 and 30Feb2002"
Set Regex = CreateObject("vbscript.regexp")
With Regex
.Pattern = "(([0-9])|([0-2][0-9])|([3][0-1]))(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(\d{4})"
.Global = True
If .test(strIn) Then
Set RegexMC = .Execute(strIn)
On Error Resume Next
For Each RegexM In RegexMC
BDate = False
BDate = IsDate(DateValue(RegexM.submatches(0) & " " & RegexM.submatches(4) & " " & RegexM.submatches(5)))
If BDate Then Debug.Print RegexM
Next
On Error GoTo 0
End If
End With
End Sub
thanks for all your help !!!
I managed to solve my problem using this simple code.
Dim rex As New RegExp
Dim dateCol As New Collection
rex.Pattern = "(\d|\d\d)(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(\d{4})?"
rex.Global = True
For Each match In rex.Execute(sStream)
dateCol.Add match.Value
Next
Just note that on my side I'm sure that I got valid date in the string so the reg expression is easy.
thnx
Ilya
The following is a quick attempt I made. It's far from perfect.
Basically, it splits the string into words. While looping through the words it cuts off any punctuation (period and comma, you might need to add more).
When processing an item, we try to remove each month name from it. If the string gets shorter we might have a date.
It checks to see if the length of the final string is about right (5 or 6 characters, 1 or 2 + 4 for day and year)
You could instead (or also) check to see that there all numbers.
Private Const MonthList = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC"
Public Function getDates(ByVal Target As String) As String
Dim Data() As String
Dim Item As String
Dim Index As Integer
Dim List() As String
Dim Index2 As Integer
Dim Test As String
Dim Result As String
List = Split(MonthList, ",")
Data = Split(Target, " ")
Result = ""
For Index = LBound(Data) To UBound(Data)
Item = UCase(Replace(Replace(Data(Index), ".", ""), ",", ""))
For Index2 = LBound(Data) To UBound(Data)
Test = Replace(Item, List(Index2), "")
If Not Test = Item Then
If Len(Test) = 5 Or Len(Test) = 6 Then
If Result = "" Then
Result = Item
Else
Result = Result & ", " & Item
End If
End If
End If
Next Index2
Next
getDates = Result
End Function