Getting only digits in a string using regex in vba - regex

I have a string as such:
tempString = "65.00000000;ACCUMPOINTS;Double:0.0593000000;D"
And my output should be "65.000000,0.0593000000", or at least two separate values.
I am using regex to find the values in the string.
My code:
tempString = "65.00000000;ACCUMPOINTS;Double:0.0593000000;D"
temp = NumericOnly(tempString)
Public Function NumericOnly(s As String) As String
Dim s2 As String
Dim replace_hyphen As String
replace_hyphen = " "
Static re As VBScript_RegExp_55.RegExp
If re Is Nothing Then Set re = New RegExp
re.IgnoreCase = True
re.Global = True
re.Pattern = "[^\d+]" 'includes space, if you want to exclude space "[^0-9]"("-?\\d+");
s2 = re.Replace(s, vbNullString)
re.Pattern = "[^\d+]"
NumericOnly = re.Replace(s2, replace_hyphen)
End Function
My output is like this:
"650000000000593000000"
How to go about doing this? Need some help.

I made only a minor change to your regex. Instead of [^\d+], the pattern is now [^\d.:+]. This is a negated character class, so it matches every character that is not a digit, dot, colon, or plus sign, and Replace strips those characters out. The remaining colon is then replaced with a comma to get the desired result.
Sub Test()
Dim tempString As String
tempString = "65.00000000;ACCUMPOINTS;Double:0.0593000000;D"
temp = NumericOnly(tempString)
MsgBox temp
End Sub
Public Function NumericOnly(s As String) As String
Dim s2 As String
Dim replace_hyphen As String
replace_hyphen = " "
Static re As VBScript_RegExp_55.RegExp
If re Is Nothing Then Set re = New RegExp
re.IgnoreCase = True
re.Global = True
re.Pattern = "[^\d.:+]"
s2 = re.Replace(s, vbNullString)
re.Pattern = "[^\d.:+]"
NumericOnly = re.Replace(s2, replace_hyphen)
NumericOnly = Replace(NumericOnly, ":", ",")
End Function
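An alternative to stripping unwanted characters is to match the numbers directly and join the matches. The following is only a sketch under a few assumptions: the function name ExtractNumbers is made up for illustration, the values are unsigned decimals as in the sample string, and late binding via CreateObject is used so that no library reference is needed.

```vba
' Hypothetical helper (not from the original answer).
' Collects every unsigned integer or decimal number in s
' and returns them as a comma-separated string.
Public Function ExtractNumbers(ByVal s As String) As String
    Dim re As Object
    Dim m As Object
    Dim result As String

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    ' One or more digits, optionally followed by a dot and more digits.
    re.Pattern = "\d+(?:\.\d+)?"

    For Each m In re.Execute(s)
        If Len(result) > 0 Then result = result & ","
        result = result & m.Value
    Next m

    ExtractNumbers = result
End Function
```

With the sample string from the question, ExtractNumbers(tempString) should return "65.00000000,0.0593000000". Because each match is a separate item, the two values can also be read individually from the match collection.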

Related

Splitting a string and capitalizing letters based on cases

I have some column names with starting coding convention that I would like to transform, see example:
Original        Target
-------------   --------------
partID          Part ID
completedBy     Completed By
I have a function in VBA that splits the original string by capital letters:
Function SplitCaps(strIn As String) As String
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Global = True
.Pattern = "([a-z])([A-Z])"
SplitCaps = .Replace(strIn, "$1 $2")
End With
End Function
I wrap this function within PROPER, for example, PROPER(SplitCaps(A3)) produces the desired result for the third row but leaves the "D" in ID uncapitalized.
Original        Actual
-------------   --------------
partID          Part Id
completedBy     Completed By
Can anyone think of a solution to add cases to this function?
Split the word, loop over the results, and test whether each part is all caps before applying Proper. Then join them back together:
Sub kjl()
Dim str As String
str = "partID"
Dim strArr() As String
strArr = Split(SplitCaps(str), " ")
Dim i As Long
For i = 0 To UBound(strArr)
If UCase(strArr(i)) <> strArr(i) Then
strArr(i) = Application.Proper(strArr(i))
End If
Next i
str = Join(strArr, " ")
Debug.Print str
End Sub
If you want a formula to do what you are asking then:
=TEXTJOIN(" ",TRUE,IF(EXACT(UPPER(TRIM(MID(SUBSTITUTE(SplitCaps(A1)," ",REPT(" ",999)),{1,999},999))),TRIM(MID(SUBSTITUTE(SplitCaps(A1)," ",REPT(" ",999)),{1,999},999))),TRIM(MID(SUBSTITUTE(SplitCaps(A1)," ",REPT(" ",999)),{1,999},999)),PROPER(TRIM(MID(SUBSTITUTE(SplitCaps(A1)," ",REPT(" ",999)),{1,999},999)))))
Entered as an array formula by confirming with Ctrl-Shift-Enter instead of Enter when exiting edit mode.
Or use the code above as a Function:
Function propSplitCaps(str As String)
Dim strArr() As String
strArr = Split(SplitCaps(str), " ")
Dim i As Long
For i = 0 To UBound(strArr)
If UCase(strArr(i)) <> strArr(i) Then
strArr(i) = Application.Proper(strArr(i))
End If
Next i
propSplitCaps = Join(strArr, " ")
End Function
and call it =propSplitCaps(A1)
Instead of using the Proper function, just capitalize the first letter of each word after you have split the string on the transition.
Option Explicit
Function Cap(s As String) As String
Dim RE As RegExp, MC As MatchCollection, M As Match
Const sPatSplit = "([a-z])([A-Z])"
Const sPatFirstLtr As String = "\b(\w)"
Const sSplit As String = "$1 $2"
Set RE = New RegExp
With RE
.Global = True
.Pattern = sPatSplit
.IgnoreCase = False
If .Test(s) = True Then
s = .Replace(s, sSplit)
.Pattern = sPatFirstLtr
Set MC = .Execute(s)
For Each M In MC
s = WorksheetFunction.Replace(s, M.FirstIndex + 1, 1, UCase(M))
Next M
End If
End With
Cap = s
End Function
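Since the split pattern only inserts a space in front of an uppercase letter, every word after the first already starts with a capital, so only the very first character of the string needs to be upper-cased. The sketch below relies on that observation. The name SplitAndCap is hypothetical, and it assumes the input is a single camelCase identifier without existing spaces.

```vba
' Hypothetical helper (not from the original answers).
' Splits on lower-to-upper transitions, then capitalizes
' only the first character of the whole string.
Function SplitAndCap(ByVal s As String) As String
    Dim re As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.IgnoreCase = False
    re.Pattern = "([a-z])([A-Z])"

    ' "partID" becomes "part ID"
    s = re.Replace(s, "$1 $2")

    ' "part ID" becomes "Part ID"; the rest of the string is left untouched
    If Len(s) > 0 Then s = UCase$(Left$(s, 1)) & Mid$(s, 2)

    SplitAndCap = s
End Function
```

Called as =SplitAndCap(A1), this should give "Part ID" for partID and "Completed By" for completedBy, without touching runs of capitals such as ID.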

VBA Find a string that has range of value in it with Regular Expression and replace with each value in that range

First of all, sorry for the long title. I just don't know how to put it succinctly. I am trying to do this in VBA as normal Excel will not cut it.
Basically, I have a column. Each cell may contain data in a format like
flat 10-14;Flat 18-19;unit 7-9;flat A-D;ABC;DEF;
What I need is to find each item that has "-" in it and expand it into every value in that range, so the above data will become
Flat 10, Flat 11, Flat 12, Flat 13, Flat 14;Flat 18, Flat 19;Unit 7, Unit 8, Unit 9;Flat A, Flat B, Flat C, Flat D;ABC;DEF;
With the help of this article on RegExpression, I have managed to work out how to expand the items that contain numbers, and I post that code below. However, I don't know a good way to expand the items that contain letters, i.e. from Flat A-C to Flat A, Flat B, Flat C.
My code is below, please feel free to give any pointers if you think it can be more efficient. I am very much an amateur at this. Thank you in advance.
Sub CallRegEx()
Dim r As Match
Dim mcolResults As MatchCollection
Dim strInput As String, strPattern As String
Dim test As String, StrOutput As String, prefix As String
Dim startno As Long, endno As Long
Dim myrange As Range
strPattern = "(Flat|Unit) [0-9]+-+[0-9]+"
With Worksheets("Sheet1")
lrow = .Cells(Rows.Count, 9).End(xlUp).Row
For Each x In .Range("A2:A" & lrow)
strInput = Range("A" & x.Row).Value
Set mcolResults = RegEx(strInput, strPattern, True, , True)
If Not mcolResults Is Nothing Then
StrOutput = strInput
For Each r In mcolResults
startno = Mid(r, (InStr(r, "-") - 2), 2)
endno = Mid(r, (InStr(r, "-") + 1))
prefix = Mid(r, 1, 4)
test = ""
For i = startno To endno - 1
test = test & prefix & " " & i & ","
Next i
test = test & prefix & " " & endno
'this is because I don't want the comma at the end of the last value
StrOutput = Replace(StrOutput, r, test)
Debug.Print r ' remove in production
Next r
End If
.Range("D" & x.Row).Value = StrOutput
Next x
End With
End Sub
This function below is to support the Sub above
Function RegEx(strInput As String, strPattern As String, _
Optional GlobalSearch As Boolean, Optional MultiLine As Boolean, _
Optional IgnoreCase As Boolean) As MatchCollection
Dim mcolResults As MatchCollection
Dim objRegEx As New RegExp
If strPattern <> vbNullString Then
With objRegEx
.Global = GlobalSearch
.MultiLine = MultiLine
.IgnoreCase = IgnoreCase
.Pattern = strPattern
End With
If objRegEx.test(strInput) Then
Set mcolResults = objRegEx.Execute(strInput)
Set RegEx = mcolResults
End If
End If
End Function
Letters have character codes that are ordinal (A < B < C ...) & these can be accessed via asc()/chr$() - here is one way to do it:
inputStr = "flat 10-14;Flat 18-19;unit 7-9;flat A-D;ABC;DEF;flat 6;flat T"
Dim re As RegExp: Set re = New RegExp
re.Pattern = "(flat|unit)\s+((\d+)-(\d+)|([A-Z])-([A-Z]))"
re.Global = True
re.IgnoreCase = True
Dim m As MatchCollection
Dim start As Variant, fin As Variant
Dim tokens() As String
Dim i As Long, j As Long
Dim isDigit As Boolean
tokens = Split(inputStr, ";")
For i = 0 To UBound(tokens) '// loop over tokens
Set m = re.Execute(tokens(i))
If (m.Count) Then
With m.Item(0)
start = .SubMatches(2) '// first match number/letter
isDigit = Not IsEmpty(start) '// is letter or number?
If (isDigit) Then '// number
fin = .SubMatches(3)
Else '// letter captured as char code
start = Asc(.SubMatches(4))
fin = Asc(.SubMatches(5))
End If
tokens(i) = ""
'// loop over items
For j = start To fin
tokens(i) = tokens(i) & .SubMatches(0) & " " & IIf(isDigit, j, Chr$(j)) & ";"
Next
End With
ElseIf i <> UBound(tokens) Then tokens(i) = tokens(i) & ";"
End If
Next
Debug.Print Join(tokens, "")
flat 10;flat 11;flat 12;flat 13;flat 14;Flat 18;Flat 19;unit 7;unit 8;unit 9;flat A;flat B;flat C;flat D;ABC;DEF;flat 6;flat T

VBA RegEx identifying multiple patterns - Excel

I'm rather new to VBA RegEx, but thanks to this stackoverflow thread,
I am getting the hang of it. I have a problem and hope that somebody can help. In column A in Excel I have multiple strings with different city/country attributions. Example:
A1: "/flights/munich/newyork"
A2: "flights/munich/usa"
A3: "flights/usa/germany"
...
What I want now is VBA code that goes through those strings with RegEx and returns a categorisation value if the RegEx matches. Example:
A1: "/flights/munich/new-york" categorises as "city to city"
A2: "flights/munich/usa" categorises as "city to country"
A3: "flights/usa/germany" categorises as "country to country"
Right now, I have code that returns the "city to city" category, but I can't figure out how to write code that handles multiple patterns and returns the corresponding output string.
In short, a logic like this is needed:
If A1 contains RegEx ".*/munich/new-york" then return output string "city to city", if A1 contains RegEx ".*/munich/usa" then return output string "city to country" and so on.
I guess this has something to do with how to handle multiple If statements with multiple patterns in VBA, but I can't figure it out.
This is how my code looks right now - hope you can help!
Function simpleCellRegex(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim strOutput As String
strPattern = "(munich|berlin|new-york|porto|copenhagen|moscow)/(munich|berlin|new-york|porto|copenhagen|moscow)"
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = "CITY TO CITY"
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
simpleCellRegex = regEx.Replace(strInput, strReplace)
Else
simpleCellRegex = "NO MATCH FOUND"
End If
End If
End Function
As @dbmitch mentions in the comments, you can't do this with a single Regex - you'll need to use 3 of them. I'd personally put the cities and countries into Consts and build the patterns as needed. You can then pass them (along with the strReplace) as parameters to the simpleCellRegex function:
Const CITIES As String = "(munich|berlin|new-york|porto|copenhagen|moscow)"
Const COUNTRIES As String = "(germany|france|usa|russia|etc)"
Function simpleCellRegex(Myrange As Range, strReplace As String, strPattern As String) As String
'strReplace and strPattern are parameters now, so they must not be declared again with Dim
Dim regEx As New RegExp
Dim strInput As String
Dim strOutput As String
If strPattern <> "" Then
strInput = Myrange.Value
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.test(strInput) Then
simpleCellRegex = regEx.Replace(strInput, strReplace)
Else
simpleCellRegex = "NO MATCH FOUND"
End If
End If
End Function
Called like this:
foo = simpleCellRegex(someRange, "CITY TO CITY", CITIES & "/" & CITIES)
foo = simpleCellRegex(someRange, "CITY TO COUNTRY", CITIES & "/" & COUNTRIES)
foo = simpleCellRegex(someRange, "COUNTRY TO COUNTRY", COUNTRIES & "/" & COUNTRIES)
Note: If you're doing this in a loop, it would be wildly more efficient to only build each RegExp once, and then pass that as a parameter instead of the pattern.
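The three calls can also be folded into one function that tries each pattern in turn and returns the first matching label. This is only a sketch: the name CategorizeRoute, the shortened country list, and the anchoring to the last two path segments are assumptions made for illustration, not part of the answer above.

```vba
' Hypothetical helper (not from the original answer).
' Tries each pattern in order and returns the label of the first match.
Function CategorizeRoute(ByVal route As String) As String
    Const CITIES As String = "(munich|berlin|new-york|porto|copenhagen|moscow)"
    Const COUNTRIES As String = "(germany|france|usa|russia)"   ' assumed list
    Dim patterns As Variant
    Dim labels As Variant
    Dim re As Object
    Dim i As Long

    ' Each pattern is paired with the label at the same index.
    patterns = Array(CITIES & "/" & CITIES, _
                     CITIES & "/" & COUNTRIES, _
                     COUNTRIES & "/" & COUNTRIES)
    labels = Array("city to city", "city to country", "country to country")

    Set re = CreateObject("VBScript.RegExp")
    re.IgnoreCase = True

    CategorizeRoute = "NO MATCH FOUND"
    For i = LBound(patterns) To UBound(patterns)
        ' Require a path boundary on the left and the end of the string on the right.
        re.Pattern = "(^|/)" & patterns(i) & "/?$"
        If re.Test(route) Then
            CategorizeRoute = labels(i)
            Exit Function
        End If
    Next i
End Function
```

Used as a worksheet function, =CategorizeRoute(A1) should return "city to city" for "/flights/munich/new-york" and "city to country" for "flights/munich/usa". A "country to city" case could be added as a fourth pattern and label in the same way.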
A little (maybe) "out of the box" solution:
Option Explicit
Sub main()
Const CITIES As String = "MUNICH|BERLIN|NEWYORK|PORTO|COPENHAGEN|MOSCOW"
Const COUNTRIES As String = "USA|GERMANY"
With Worksheets("FLIGHTS")
With .Range("A1", .Cells(.Rows.Count, 1).End(xlUp))
With .Offset(, 1)
.value = .Offset(, -1).value
.Replace What:="*flights/", replacement:="", LookAt:=xlPart, MatchCase:=False
.Replace What:="/", replacement:=" to ", LookAt:=xlPart, MatchCase:=False
ReplaceElement .Cells, CITIES, "city"
ReplaceElement .Cells, COUNTRIES, "country"
End With
End With
End With
End Sub
Sub ReplaceElement(rng As Range, whats As String, replacement As String)
Dim elem As Variant
With rng
For Each elem In Split(whats, "|")
.Replace What:=elem, replacement:=replacement, LookAt:=xlPart, MatchCase:=False
Next elem
End With
End Sub
Note: the Range.Replace() method can be told to ignore case (MatchCase:=False), but the names must be spelled consistently: "newyork" will never match "new-york".
I would do this a bit differently.
I would make the regex pattern the start or end point, and match it against a comma delimited string of cities or countries.
Given what you have presented, the start and end points will always be the last two / separated units.
So something like:
Option Explicit
Sub CategorizeFlights()
Dim rData As Range, vData As Variant
Dim arrCity() As Variant
Dim arrCountry() As Variant
Dim I As Long, J As Long
Dim sCategoryStart As String, sCategoryEnd As String
Dim V As Variant
Dim RE As RegExp
arrCity = Array("munich", "newyork")
arrCountry = Array("usa", "germany")
Set RE = New RegExp
With RE
.Global = False
.ignorecase = True
End With
Set rData = Range("A2", Cells(Rows.Count, "A").End(xlUp)).Resize(columnsize:=2)
vData = rData
For I = 1 To UBound(vData, 1)
V = Split(vData(I, 1), "/")
RE.Pattern = "\b" & V(UBound(V) - 1) & "\b"
If RE.test(Join(arrCity, ",")) = True Then
sCategoryStart = "City to "
ElseIf RE.test(Join(arrCountry, ",")) = True Then
sCategoryStart = "Country to "
Else
sCategoryStart = "Unknown to "
End If
RE.Pattern = "\b" & V(UBound(V)) & "\b"
If RE.test(Join(arrCity, ",")) = True Then
sCategoryEnd = "City"
ElseIf RE.test(Join(arrCountry, ",")) = True Then
sCategoryEnd = "Country"
Else
sCategoryEnd = "Unknown"
End If
vData(I, 2) = sCategoryStart & sCategoryEnd
Next I
With rData
.Value = vData
.EntireColumn.AutoFit
End With
End Sub
As is sometimes the case, a similar algorithm can be used without regular expressions, but I assume this is an exercise in its use.

Excel Regular Expression: Add Quote (") to Values in Two Columns

So I have a CSV file with two columns that have items listed out like below:
The goal is to write Excel VBA code that goes through columns H and I and adds a quote (") to the beginning and end of each 6-character group (e.g., H67100 becomes "H67100"). The commas should be left alone.
I know the code is not complete as of yet, but this is what I have thus far. I think I am fine with the beginning part but after the match is found, I think my logic/syntax is incorrect. A little guidance and feedback is much appreciated:
Private Sub splitUpRegexPattern2()
Dim strPattern As String: strPattern = "(^[a-zA-Z0-9]{6}(?=[,])"
Dim regEx As New RegExp
Dim strInput As String
Dim Myrange As Range
Set Myrange = ActiveSheet.Range("H:I")
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = """" & strInput & """"
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = strPattern
End With
End Sub
UPDATED CODE:
Function splitUpRegexPattern2 (Myrange As Range) as String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim Myrange As Range
Dim strReplace As String
Dim strOutput As String
strPattern = "(^[a-zA-Z0-9]{6}(?=[,])"
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = """" & strInput & """"
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = strPattern
End With
If regEx.test(strInput) Then
simpleCellRegex = regEx.Replace(strInput, strReplace)
Else
simpleCellRegex = "Not matched"
End If
End If
End FUNCTION
Adding example CSV file. Download Sample CSV File
This answer assumes you can get the values of each cell you are interested in.
There's no need to use RegEx in this case as your values appear to be simple comma-delimited data.
Public Const DOUBLE_QUOTE As String = Chr(34)
'''
'''<summary>This function splits a string into an array on commas, adds quotes around each element in the array, then joins the array back into a string, placing a comma between each element.</summary>
'''
Public Function QuotedValues(ByVal input As String) As String
Dim words As String() = input.Split(New Char() {","})
Dim result As String = String.Empty
words = (From w In words Select DOUBLE_QUOTE & w.Trim & DOUBLE_QUOTE).ToArray
result = String.Join(", ", words)
Return result
End Function
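Note that the snippet above uses VB.NET syntax (initialised declarations, LINQ, Return) and will not compile in the Excel VBA editor. A rough VBA equivalent of the same split, quote and join idea might look like the sketch below. The name QuotedValuesVba and the parameter name csvText are made up for illustration.

```vba
' Hypothetical VBA counterpart of the VB.NET function above.
' Splits on commas, trims each item, wraps it in double quotes,
' and joins the items back together with ", ".
Public Function QuotedValuesVba(ByVal csvText As String) As String
    Dim parts() As String
    Dim i As Long

    ' Nothing to quote in an empty cell; returns an empty string.
    If Len(csvText) = 0 Then Exit Function

    parts = Split(csvText, ",")
    For i = LBound(parts) To UBound(parts)
        parts(i) = Chr$(34) & Trim$(parts(i)) & Chr$(34)
    Next i

    QuotedValuesVba = Join(parts, ", ")
End Function
```

It can be called per cell, for example Range("H2").Value = QuotedValuesVba(Range("H2").Value), inside a loop over the used rows of columns H and I.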

match date pattern in the string vba excel

Edit:
Since my string has become more and more complicated, it looks like regexp is the only way.
I do not have a lot of experience with it, so your help is much appreciated.
Based on what I read on the web, I constructed the following expression to try to match the dates in my sample string:
"My very long long string 12Mar2012 is right here 23Apr2015"
[0-9][0-9] + [a-zA-Z] + [0-9][0-9][0-9][0-9]
and trying this code. I do not have any match. Any good link on regexp tutorial much appreciated.
Dim re, match, RegExDate
Set re = CreateObject("vbscript.regexp")
re.Pattern = "(^[0-9][0-9] + [a-zA-Z] + [0-9][0-9][0-9][0-9]$)"
re.Global = True
For Each match In re.Execute(str)
MsgBox match.Value
RegExDate = match.Value
Exit For
Next
Thank you
This code uses DateValue to validate each date matched by the regexp, for robustness:
Sub Robust()
Dim Regex As Object
Dim RegexMC As Object
Dim RegexM As Object
Dim strIn As String
Dim BDate As Boolean
strIn = "My very long long string 12Mar2012 is right here 23Apr2015 and 30Feb2002"
Set Regex = CreateObject("vbscript.regexp")
With Regex
.Pattern = "(([0-9])|([0-2][0-9])|([3][0-1]))(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(\d{4})"
.Global = True
If .test(strIn) Then
Set RegexMC = .Execute(strIn)
On Error Resume Next
For Each RegexM In RegexMC
BDate = False
BDate = IsDate(DateValue(RegexM.submatches(0) & " " & RegexM.submatches(4) & " " & RegexM.submatches(5)))
If BDate Then Debug.Print RegexM
Next
On Error GoTo 0
End If
End With
End Sub
thanks for all your help !!!
I managed to solve my problem using this simple code.
Dim rex As New RegExp
Dim dateCol As New Collection
rex.Pattern = "(\d|\d\d)(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(\d{4})?"
rex.Global = True
For Each match In rex.Execute(sStream)
dateCol.Add match.Value
Next
Note that on my side I am sure the string contains only valid dates, so the regular expression can be simple.
thnx
Ilya
The following is a quick attempt I made. It's far from perfect.
Basically, it splits the string into words. While looping through the words it cuts off any punctuation (period and comma, you might need to add more).
When processing an item, we try to remove each month name from it. If the string gets shorter we might have a date.
It checks to see if the length of the final string is about right (5 or 6 characters, 1 or 2 + 4 for day and year)
You could instead (or also) check that they are all numbers.
Private Const MonthList = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC"
Public Function getDates(ByVal Target As String) As String
Dim Data() As String
Dim Item As String
Dim Index As Integer
Dim List() As String
Dim Index2 As Integer
Dim Test As String
Dim Result As String
List = Split(MonthList, ",")
Data = Split(Target, " ")
Result = ""
For Index = LBound(Data) To UBound(Data)
Item = UCase(Replace(Replace(Data(Index), ".", ""), ",", ""))
For Index2 = LBound(List) To UBound(List)
Test = Replace(Item, List(Index2), "")
If Not Test = Item Then
If Len(Test) = 5 Or Len(Test) = 6 Then
If Result = "" Then
Result = Item
Else
Result = Result & ", " & Item
End If
End If
End If
Next Index2
Next
getDates = Result
End Function