Replace all characters in a String, unless they are within double quotes - regex

I am sadly unfamiliar with regular expressions since I'm not a programmer, but I would guess this problem is easily solvable using regex (I am definitely open to other suggestions, though)
I want to use the split function to split the value of a cell and spread it out over multiple cells. The delimiter is a comma. The problem though is that some users use commas in comments for example, which the Split function uses to split the string mid-comment.
for example a cell containing the value:
0001,"name","address","likes apples, oranges
and plums"
needs to be split into multiple cells saying 0001 "name" "address" and "likes apples, oranges and plums".
my code splits the comment as well, and I want it to ignore the comment or everything else withing double quotes. here is a sample:
Sub SplittingStrings()
Dim wb As Workbook
Dim ws As Worksheet
Dim strInput As String
Dim counter As Integer
Dim cell As Variant
Dim splitCount As Integer
Dim splitString() As String
Dim category As Variant
Set wb = ActiveWorkbook
Set ws = wb.ActiveSheet
counter = 1
For Each cell In Range("A1", "A2000")
If cell.Value <> "" Then
strInput = cell.Value
splitCount = 2
splitString = Split(strInput, ",")
For Each category In splitString
Cells(counter, splitCount).Value = category
splitCount = splitCount + 1
Next category
End If
counter = counter + 1
Next cell
End Sub
how do I exclude stuff withing the double quotes from being considered by the split function?

Please give this a try and see if you get the desired output.
Tweak the variables if required.
Sub SplittingStringsUsingRegEx()
Dim lr As Long, c As Long
Dim Rng As Range, cell As Range
Dim RE, Match, Matches
Application.ScreenUpdating = False
lr = Cells(Rows.Count, 1).End(xlUp).Row
Set Rng = Range("A1:A" & lr)
Set RE = CreateObject("VBScript.RegExp")
With RE
.Global = True
.Pattern = "\d+|"".+?"""
End With
c = 2
For Each cell In Rng
If RE.test(cell.Value) Then
Set Matches = RE.Execute(cell.Value)
For Each Match In Matches
Cells(cell.Row, c) = Replace(Match, """", "")
c = c + 1
Next Match
End If
c = 2
Next cell
Application.ScreenUpdating = True
End Sub

Without Regex:
We need to "protect" commas that are encapsulated with double quotes:
Sub ProtectStuff()
Dim i As Long, N As Long, v As String, v2 As String
Dim ProtectMode As Boolean, DQ As String, rep As String
Dim CH As String, arr
DQ = """"
rep = Chr(1)
N = Cells(Rows.Count, "A").End(xlUp).Row
For i = 1 To N
v = Cells(i, "A").Value
If v <> "" Then
ProtectMode = False
v2 = ""
For j = 1 To Len(v)
CH = Mid(v, j, 1)
If CH = DQ Then ProtectMode = Not ProtectMode
If CH = "," And ProtectMode Then CH = rep
v2 = v2 & CH
Next j
End If
arr = Split(v2, ",")
j = 2
For Each a In arr
Cells(i, j) = Replace(a, rep, ",")
j = j + 1
Next a
Next i
End Sub

Text to Columns will do what you want, differently than the split function.

Related

VBA Find a string that has range of value in it with Regular Expression and replace with each value in that range

First of all, sorry for the long title. I just don't know how to put it succinctly. I am trying to do this in VBA as normal Excel will not cut it.
Basically, I have a column. Each cells may contain data in the format of something like
flat 10-14;Flat 18-19;unit 7-9;flat A-D;ABC;DEF;
What I need is to find the string that has "-" in it, and attempt to replace it with anything in between. so the above code will become
Flat 10, Flat 11; Flat 12, Flat 14;Flat 18, Flat 19;Unit 7, Unit 8, Unit 9;Flat A, Flat B, Flat C; ABC;DEF;
With the help of this article on RegExpression, I have managed to work out how to expand the bits of data with number, which I will post the code below. However, I don't know a good way to expand the data with the letter. i.e from Flat A-C to Flat A, Flat B, Flat C
My code is below, please feel free to give any pointers if you think it can be more efficient. I am very much an amateur at this. Thank you in advance.
Sub CallRegEx()
Dim r As Match
Dim mcolResults As MatchCollection
Dim strInput As String, strPattern As String
Dim test As String, StrOutput As String, prefix As String
Dim startno As Long, endno As Long
Dim myrange As Range
strPattern = "(Flat|Unit) [0-9]+-+[0-9]+"
With Worksheets("Sheet1")
lrow = .Cells(Rows.Count, 9).End(xlUp).Row
For Each x In .Range("A2:A" & lrow)
strInput = Range("A" & x.Row).Value
Set mcolResults = RegEx(strInput, strPattern, True, , True)
If Not mcolResults Is Nothing Then
StrOutput = strInput
For Each r In mcolResults
startno = Mid(r, (InStr(r, "-") - 2), 2)
endno = Mid(r, (InStr(r, "-") + 1))
prefix = Mid(r, 1, 4)
test = ""
For i = startno To endno - 1
test = test & prefix & " " & i & ","
Next i
test = test & prefix & " " & endno
'this is because I don't want the comma at the end of the last value
StrOutput = Replace(StrOutput, r, test)
Debug.Print r ' remove in production
Next r
End If
.Range("D" & x.Row).Value = StrOutput
Next x
End With
End Sub
This function below is to support the Sub above
Function RegEx(strInput As String, strPattern As String, _
Optional GlobalSearch As Boolean, Optional MultiLine As Boolean, _
Optional IgnoreCase As Boolean) As MatchCollection
Dim mcolResults As MatchCollection
Dim objRegEx As New RegExp
If strPattern <> vbNullString Then
With objRegEx
.Global = GlobalSearch
.MultiLine = MultiLine
.IgnoreCase = IgnoreCase
.Pattern = strPattern
End With
If objRegEx.test(strInput) Then
Set mcolResults = objRegEx.Execute(strInput)
Set RegEx = mcolResults
End If
End If
End Function
Letters have character codes that are ordinal (A < B < C ...) & these can be accessed via asc()/chr$() - here is one way to do it:
inputStr = "flat 10-14;Flat 18-19;unit 7-9;flat A-D;ABC;DEF;flat 6;flat T"
Dim re As RegExp: Set re = New RegExp
re.Pattern = "(flat|unit)\s+((\d+)-(\d+)|([A-Z])-([A-Z]))"
re.Global = True
re.IgnoreCase = True
Dim m As MatchCollection
Dim start As Variant, fin As Variant
Dim tokens() As String
Dim i As Long, j As Long
Dim isDigit As Boolean
tokens = Split(inputStr, ";")
For i = 0 To UBound(tokens) '// loop over tokens
Set m = re.Execute(tokens(i))
If (m.Count) Then
With m.Item(0)
start = .SubMatches(2) '// first match number/letter
isDigit = Not IsEmpty(start) '// is letter or number?
If (isDigit) Then '// number
fin = .SubMatches(3)
Else '// letter captured as char code
start = Asc(.SubMatches(4))
fin = Asc(.SubMatches(5))
End If
tokens(i) = ""
'// loop over items
For j = start To fin
tokens(i) = tokens(i) & .SubMatches(0) & " " & IIf(isDigit, j, Chr$(j)) & ";"
Next
End With
ElseIf i <> UBound(tokens) Then tokens(i) = tokens(i) & ";"
End If
Next
Debug.Print Join(tokens, "")
flat 10;flat 11;flat 12;flat 13;flat 14;Flat 18;Flat 19;unit 7;unit 8;unit 9;flat A;flat B;flat C;flat D;ABC;DEF;flat 6;flat T

How to split multiple UPPERCASE/delimiter/text using regex? (VBA)

I've got 2k+ records with string followyng rule (LOCATION I UPPERCASE - text) x several times, like this:
I- TRZON - Fragmenty błony śluzowej trzonu żołądka w stanie przewleklego
powierzchownego (++) aktywnego (++) zapalenia. W barwieniu Warthin-Starry
nie stwierdza się bakterii odpowiadających Helicobacter pylori. II-ANTRUM +
KĄT - Fragmenty błony śluzowej części odźwiernikowej żołądka w stanie
przewlekłego głębokiego zapalenia (+++). W barwieniu Warthin-Starry nie
stwierdza się bakterii odpowiadajacych Helicobacter pylori.
Which I'm trying to split as follows using regex:
Location - I- TRZON
Text Fragmenty błony śluzowej trzonu żołądka w stanie przewleklego powierzchownego (++) aktywnego (++) zapalenia. W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadających Helicobacter pylori.
Location II- ANTRUM + KĄT
Text Fragmenty błony śluzowej części odźwiernikowej żołądka w stanie przewlekłego głębokiego zapalenia (+++). W barwieniu Warthin-Starry nie stwierdza się bakterii odpowiadajacych Helicobacter pylori.
So far I managed to do this by creating something like this
([A-ZŻŹĆĄŚĘŁÓŃ\s,+\-0-9]*)[\s]?-+?(.*[^A-ZŻŹĆĄŚĘŁÓŃ\s,+\-0-9]) ([A-ZŻŹĆĄŚĘŁÓŃ\s,+\-0-9]+)*[\s]?-+?(.*)
But obviously it cannot manage those strings, where one or three pairs of location and text are possible. The main problems I encountered are hyphens used in text (see - Warthin-Starry).
If I try something more elegant, like
([A-ZŻŹĆŃĄŚŁĘÓ]+[\s-\+,]*?)-(.*)
It obviously matches only the word before the first hyphen into the first group, and everything else into next.
To sum up: how to translate into regex something like: match, splitting into two groups: 1) UPPERCASE text with any other signs (no lowercase), followed by 2) text, that is as long as you encounter another UPPERCASE text.
I must admit that I'm fairly new to regex, but I searched for a few days and nothing seems to work universally (and it's only the beginning of extracting data from this string...)
I`m not sure how you can do this with RegEx, i have a hard time myself to get my head around that syntax.
However, I would probably just use the DATA/Text To Columns, split with hyphen, and just concatenate back together the extra splits caused by hyphens in text.
If is not just an one off processing, you can always use VBA as well, something like:
Sub TextToColumns()
Dim ws As Worksheet: Set ws = ActiveSheet
Dim lRow As Long, sndHyphen As Long, R As Long
lRow = ws.Cells(1, 1).End(xlDown).Row
For R = 1 To lRow 'Iterate through all rows containing this data
sndHyphen = InStr(InStr(ws.Cells(R, 1), "-") + 1, ws.Cells(R, 1), "-") 'Get the hyphens positions
ws.Cells(R, 2) = Left(ws.Cells(R, 1), sndHyphen - 2) 'Get the data before the second hyphen
ws.Cells(R, 3) = Mid(ws.Cells(R, 1), sndHyphen + 2) 'Get the data after the second hyphen
Next R
End Sub
Thank you for your input. I finally managed to do this using two subs:
Sub locfinder()
Dim myregexp As RegExp
Set myregexp = New RegExp
Dim myMatches As Variant
Dim myMatch As Variant
Dim str As String
Dim i, j As Integer
Dim endrow As Integer
Sheets("dane").Activate
endrow = LastRow
Dim rozp1, rozp2 As String
For i = 1 To endrow
str = Sheets("Dane").Cells(i, 10).Value
myregexp.Global = True
myregexp.Pattern = "([A-ZŻŹĆĄŚĘŁÓŃ]+[\s,+\-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[\s,+\-0-9]*[A-ZŻŹĆĄŚĘŁÓŃ]*[\s,+\-0-9]*|Trzon|Antrum)\s?-"
If Not str = "" Then
Set myMatches = myregexp.Execute(str)
j = 1
For Each myMatch In myMatches
If myMatch.Value <> "" Then
Sheets("Dane").Cells(i, j + 10).Value = Trim(myMatch.SubMatches(0))
j = j + 1
End If
Next
End If
Next i
End Sub
Then extracted diagnoses using
Sub rozpfinder()
Dim myregexp As RegExp
Set myregexp = New RegExp
Dim myMatches As Variant
Dim myMatch As Variant
Dim str As String
Dim i, j As Integer
Dim endrow As Integer
Sheets("dane").Activate
endrow = LastRow
Dim rozp, loc As Collection
Dim splitted() As String
Dim rozpoznanie, lokalizacja
Dim wb As Workbook
Dim ws As Worksheet
Set wb = ThisWorkbook
Set ws = wb.Worksheets("Dane")
For i = 1 To endrow
str = ws.Cells(i, 10).Value
Set loc = New Collection
Set rozp = New Collection
For j = 1 To 2
If ws.Cells(i, 10 + j) <> "" Then
loc.Add ws.Cells(i, 10 + j).Value
End If
Next j
For Each lokalizacja In loc
If lokalizacja <> "I" Then
str = Replace(str, lokalizacja, "xxx")
Else
lokalizacja = "I-"
str = Replace(str, lokalizacja, "xxx-")
End If
Next lokalizacja
splitted = split(str, "xxx")
For j = 0 To UBound(splitted)
If splitted(j) <> "" Then
myregexp.Pattern = "-[^\w]"
myMatch = myregexp.Replace(splitted(j), "")
rozp.Add (Trim(myMatch))
End If
Next j
j = 1
For Each rozpoznanie In rozp
ws.Cells(i, 12 + j).Value = rozpoznanie
j = j + 1
Next rozpoznanie
Next i
End Sub
While it wasn't 100% accurate, the number of records I need to correct is about 1%, so I guess it works :)

Extract text from 2 strings from selected Outlook email

I have code to import email body data from Outlook to Excel. I only need Name, ID, code from the email.
I have done everything except to extract the ID from a fixed sentence:
cn=SVCLMCH,OU=Users,OU=CX,DC=dm001,DC=corp,DC=dcsa,DC=com
The id is SVCLMCH in this case, that means I need to extract the text between "cn=" and ",OU=Users".
Sub import_code()
Dim O As Outlook.Application
Set O = New Outlook.Application
Dim ONS As Outlook.Namespace
Set ONS = O.GetNamespace("MAPI")
Dim OMAIL As Outlook.MailItem
Set OMAIL = Nothing
Dim ws As Object
Set ws = ThisWorkbook.Worksheets("Import code from Outlook")
Dim rcount As Long
Dim vText As Variant
Dim sText As String
Dim i As Long
If O.ActiveExplorer.Selection.Count = 0 Then
msgbox "No Items selected!", vbCritical, "Error"
End If
On Error Resume Next
'Process each selected record
rcount = ws.UsedRange.Rows.Count
For Each OMAIL In O.ActiveExplorer.Selection
sText = OMAIL.Body
vText = Split(sText, Chr(13))
'Find the next empty line of the worksheet
rcount = rcount + 1
'Check each line of text in the message body
For i = UBound(vText) To 0 Step -1
If InStr(1, vText(i), "Password Generated and set for:") > 0 Then
vItem = Split(vText(i), Chr(58))
ws.Range("A" & rcount) = Trim(vItem(1))
End If
If InStr(1, vText(i), "cn=") > 0 Then
vItem = Split(vText(i), Chr(58))
ws.Range("b" & rcount) = Trim(vItem(1))
End If
If InStr(1, vText(i), "Password:") > 0 Then
vItem = Split(vText(i), Chr(58))
ws.Range("c" & rcount) = Trim(vItem(1))
End If
Next i
Next OMAIL
End Sub
The trick here is to use the Split() function
Dim Arr() As String
Dim j As Integer
Dim k As Integer
Dim strvar As String
Dim strval As String
Dim strID As String
If InStr(1, vtext(i), "cn=") > 0 Then
' split the whole line in an array - "," beeing the value separator
Arr = Split(vtext(i), ",")
' loop through all array elements
For j = 0 To UBound(r) - 1
' find the position of =
k = InStr(Arr(j), "=")
strvar = Left(Arr(j), k - 1) ' this is the variable name, ie "cn"
strval = Mid(Arr(j), k + 1) ' this is the value, ie "SVCLMCH"
' now do what you want with a specific variable
Select Case strvar
Case "cn"
strID = strval
Case Else
' do nothing
End Select
Next j
End If
you can use a helper function like this:
Function GetID(strng As String)
Dim el As Variant
For Each el In Split(strng, ",")
If InStr(1, el, "cn=") > 0 Then
GetID = Mid(el, InStr(1, el, "cn=") + 3)
Exit Function
End If
Next
End Function
and your main code would exploit it as:
If InStr(1, vText(i), "cn=") > 0 Then ws.Range("b" & rcount) = GetID(CStr(vText(i)))
Use Regular Expression extract the ID from the sentence
Example Pattern = "(?<=cn=)([^.]+)(?=\,OU=Users)"
https://regex101.com/r/67u84s/2
Code Example
Option Explicit
Private Sub Examplea()
Dim Matches As Variant
Dim RegEx As Object
Set RegEx = CreateObject("VbScript.RegExp")
Dim olApp As Object
Set olApp = CreateObject("Outlook.Application")
Dim Item As Outlook.MailItem
Set Item = olApp.ActiveExplorer.Selection.Item(1)
Dim Pattern As String
Pattern = "(?<=cn=)([^.]+)(?=\,OU=Users)"
With RegEx
.Global = False
.Pattern = Pattern
.IgnoreCase = True
Set Matches = .Execute(Item.Body)
End With
If Matches.Count > 0 Then
Debug.Print Matches(0).SubMatches(0)
With ThisWorkbook.Sheets("Sheet1")
.Range("A1").Value = Trim(Matches(0).SubMatches(0))
End With
End If
End Sub

Extract pattern from column

I am struggling with a huge Excel sheet (with 200K rows), where I need to extract from a certain column (B) list of all email addresses present in the string.
What I want to achieve:
Extract the email from string
convert (at) to # and (dot) to .
Save name and email in separate columns
Example of column B:
Shubhomoy Biswas <biswas_shubhomoy777(at)yahoo(dot)com>
Puneet Arora <ar.puneetarora(at)gmail(dot)com>
Anand Upadhyay <001.anand(at)gmail(dot)com>
Rajat Gupta <rajatgupta0889(at)gmail(dot)com>
Sarvesh Sonawane <sarvesh.s(at)suruninfocoresystems.
Although I want to be able to do it on Excel any other Windows-based utility suggestion would be helpful.
this can be done assuming they are all in the same format and only 1 email add per cell
=SUBSTITUTE(SUBSTITUTE(MID(B1,FIND("<",B1)+1,LEN(B1)-FIND("<",B1)-1),"(at)","#"),"(dot)",".")
Give this a try:
Sub splitter()
Dim r As Range, v As String
For Each r In Intersect(Range("B:B"), ActiveSheet.UsedRange)
v = r.Text
If v <> "" Then
ary = Split(v, " <")
r.Offset(0, 1).Value = ary(0)
r.Offset(0, 2).Value = Replace(Replace(Replace(ary(1), ">", ""), "(at)", "#"), "(dot)", ".")
End If
Next r
End Sub
This sub uses columns C and D for the output. Modify the code to suite your needs.
To extract the name, try =TRIM(LEFT(B1,FIND("<",B1)-1)). user3005775's answer works for the email.
You can also do this easily a regular expression (you'll need to add a reference to Microsoft VBScript Regular Expressions):
Private Sub ExtractEmailInfo(value As String)
Dim expr As New RegExp
Dim result As Object
Dim user As String
Dim addr As String
expr.Pattern = "(.+)(<.+>)"
Set result = expr.Execute(value)
If result.Count > 0 Then
user = result(0).SubMatches(0)
addr = result(0).SubMatches(1)
'Strip the < and >
addr = Mid$(addr, 2, Len(addr) - 2)
addr = Replace$(addr, "(at)", "#")
addr = Replace$(addr, "(dot)", ".")
End If
Debug.Print user
Debug.Print addr
End Sub
Replace the Debug.Print calls with whatever you need to do to place them in cells.
This does it for 200 K rows in less than 15 seconds:
Option Explicit
Sub extractPattern()
Dim ws As Worksheet, ur As Range, rng As Range, t As Double
Dim fr As Long, fc As Long, lr As Long, lc As Long
Set ws = Application.ThisWorkbook.Worksheets("Sheet1")
Set ur = ws.UsedRange
fr = 1
fc = 1
lr = ws.Cells(ur.Row + ur.Rows.Count + 1, fc).End(xlUp).Row
lc = ws.Cells(fr, ur.Column + ur.Columns.Count + 1).End(xlToLeft).Column
Set rng = ws.Range(ws.Cells(fr, fc), ws.Cells(lr, fc))
enableXL False
t = Timer
rng.TextToColumns Destination:=ws.Cells(fr, lc + 1), DataType:=xlDelimited, _
TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=True, _
Space:=True
With ws.Columns(lc + 3)
.Replace What:="(at)", Replacement:="#", LookAt:=xlPart
.Replace What:="(dot)", Replacement:=".", LookAt:=xlPart
.Replace What:="<", Replacement:=vbNullString, LookAt:=xlPart
.Replace What:=">", Replacement:=vbNullString, LookAt:=xlPart
End With
ws.Range(ws.Cells(fr, lc + 1), ws.Cells(fr, lc + 3)).EntireColumn.AutoFit
Debug.Print "Total rows: " & lr & ", Duration: " & Timer - t & " seconds"
enableXL 'Total rows: 200,000, Duration: 14.4296875 seconds
End Sub
Private Sub enableXL(Optional ByVal opt As Boolean = True)
Application.ScreenUpdating = opt
Application.EnableEvents = opt
Application.Calculation = IIf(opt, xlCalculationAutomatic, xlCalculationManual)
End Sub
It places the new data in the first unused column at the end (splits the names as well)

Separating strings from numbers with Excel VBA

I need to
a) separate strings from numbers for a selection of cells
and
b) place the separated strings and numbers into different columns.
For example , Excel sheet is as follows:
A1 B1
100CASH etc.etc.
The result should be:
A1 B1 C1
100 CASH etc.etc.
Utilization of regular expressions will be useful, as there may be different cell formats,such as 100-CASH, 100/CASH, 100%CASH. Once the procedure is set up it won't be hard to use regular expressions for different variations.
I came across a UDF for extracting numbers from a cell. This can easily be modified to extract string or other types of data from cells simply changing the regular expression.
But what I need is not just a UDF but a sub procedure to split cells using regular expressions and place the separated data into separate columns.
I've also found a similar question in SU, however it isn't VBA.
See if this will work for you:
UPDATED 11/30:
Sub test()
Dim RegEx As Object
Dim strTest As String
Dim ThisCell As Range
Dim Matches As Object
Dim strNumber As String
Dim strText As String
Dim i As Integer
Dim CurrCol As Integer
Set RegEx = CreateObject("VBScript.RegExp")
' may need to be tweaked
RegEx.Pattern = "-?\d+"
' Get the current column
CurrCol = ActiveCell.Column
Dim lngLastRow As Long
lngLastRow = Cells(1, CurrCol).End(xlDown).Row
' add a new column & shift column 2 to the right
Columns(CurrCol + 1).Insert Shift:=xlToRight
For i = 1 To lngLastRow ' change to number of rows to search
Set ThisCell = ActiveSheet.Cells(i, CurrCol)
strTest = ThisCell.Value
If RegEx.test(strTest) Then
Set Matches = RegEx.Execute(strTest)
strNumber = CStr(Matches(0))
strText = Mid(strTest, Len(strNumber) + 1)
' replace original cell with number only portion
ThisCell.Value = strNumber
' replace cell to the right with string portion
ThisCell.Offset(0, 1).Value = strText
End If
Next
Set RegEx = Nothing
End Sub
How about:
Sub UpdateCells()
Dim rng As Range
Dim c As Range
Dim l As Long
Dim s As String, a As String, b As String
''Working with sheet1 and column C
With Sheet1
l = .Range("C" & .Rows.Count).End(xlUp).Row
Set rng = .Range("C1:C" & l)
End With
''Working with selected range from above
For Each c In rng.Cells
If c <> vbNullString Then
s = FirstNonNumeric(c.Value)
''Split the string into numeric and non-numeric, based
''on the position of first non-numeric, obtained above.
a = Mid(c.Value, 1, InStr(c.Value, s) - 1)
b = Mid(c.Value, InStr(c.Value, s))
''Put the two values on the sheet in positions one and two
''columns further along than the test column. The offset
''can be any suitable value.
c.Offset(0, 1) = a
c.Offset(0, 2) = b
End If
Next
End Sub
Function FirstNonNumeric(txt As String) As String
With CreateObject("VBScript.RegExp")
.Pattern = "[^0-9]"
FirstNonNumeric = .Execute(txt)(0)
End With
End Function