How to extract phone number and split to column in Google sheet? - regex

I am trying to extract phone numbers in this format (123) 456-7890 and put each number in a separate column with format 1234567890 with no space, dash or parenthesis.
I am able to achieve this in Excel using VBA code below from another StackOverflow question, but not able to get it working on Google sheet
Sub ewqre()
Dim str As String, n As Long, rw As Long
Dim rgx As Object, cmat As Object, ws As Worksheet
Set rgx = CreateObject("VBScript.RegExp")
Set ws = Worksheets("Sheet4")
With rgx
.Global = True
.MultiLine = True
'phone number pattern is: ###-###-####
.Pattern = "/^(\d{3})(\d{3})(\d{4})$/"
For rw = 2 To ws.Cells(Rows.Count, "A").End(xlUp).Row
str = ws.Cells(rw, "A").Value2
If .Test(str) Then
Set cmat = .Execute(str)
'populate the worksheet with the matches
For n = 0 To cmat.Count - 1
ws.Cells(rw, Columns.Count).End(xlToLeft).Offset(0, 1) = cmat.Item(n)
Next n
End If
Next rw
End With
Set rgx = Nothing: Set ws = Nothing
End Sub

=ARRAYFORMULA(IFERROR(REGEXEXTRACT(TO_TEXT(SPLIT(
REGEXREPLACE(A2:A, "\(|\)|\-| ", ""), CHAR(10))), "\d+")))
or:
=ARRAYFORMULA(REGEXREPLACE(REGEXEXTRACT(SPLIT(A2, CHAR(10)),
"\((.*)\(.T"),"\)|\s|\-", ""))
true array formula would be:
=ARRAYFORMULA(IFERROR(REGEXREPLACE(REGEXEXTRACT(SPLIT(A2:A, CHAR(10)),
"\((.*)\(.T"),"\)|\s|\-", "")))

Related

Remove " and =+ from cells in Google Sheets

I have cells like
"Apple"
=+Organe +is +good
"Mango"
I want to remove all the characters i.e. = and " and +
Tried =SUBSTITUTE(C3,"+" ,"",1) but didnt work
I am using Google Sheets and can't use Excel (in MAC)
If you know which and how many characters should be removed N nested substitutes would work. Just make sure that the input cell is not an error:
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"="," "),"+"," "),"""","")
If you need to exclude anything but the alphabet characters A to Z and the numbers from 0 to 9, VBA + RegEx come into power:
Public Function RemoveNonAlphabetChars(inputString As String) As String
Dim regEx As Object
Dim inputMatches As Object
Dim regExString As String
Set regEx = CreateObject("VBScript.RegExp")
With regEx
.Global = True
.Pattern = "[^a-zA-Z0-9]"
.ignoreCase = True
Set inputMatches = .Execute(inputString)
If regEx.test(inputString) Then
RemoveNonAlphabetChars = .Replace(inputString, vbNullString)
Else
RemoveNonAlphabetChars = inputString
End If
End With
End Function
try:
=ARRAYFORMULA(REGEXREPLACE(A1:A, "[=\+""]", ))
or if you want to use SUBSTITUTE then:
=ARRAYFORMULA(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1:A, "=", ), "+", ), """", ))

Excel VBA Regex for fetching the values which has specific text

I am new to the vba and trying to solve my situation wherein we recieve multiple mail like below:
we would like to create a database in excel for all the mails which are in my specific folder
Package Summary:
Client: XYZ
Price (USD): 3,000
Time: 1 Week
Project Id: 21312
and some more text......
here we would like to capture the information for Client, Price (USD), Time, Project Id.
Have tried below code which capture the information and stores in excel file.
Sub GetFromOutlook()
Dim OutlookApp As Outlook.Application
Dim OutlookNamespace As Namespace
Dim Folder As MAPIFolder
Dim OutlookMail As Variant
'Dim i As Integer
Set OutlookApp = New Outlook.Application
Set OutlookNamespace = OutlookApp.GetNamespace("MAPI")
Set Folder = OutlookNamespace.GetDefaultFolder(olFolderInbox).Folders("Dummy").Folders("New Dummy")
'i = 1
For Each OutlookMail In Folder.Items
Dim sText As String
sText = OutlookMail.Body
Dim Reg1 As RegExp
Dim M1 As MatchCollection
Dim M As Match
Dim vText, vText2, vText3, vText4 As Variant
Dim i As Integer
Set Reg1 = New RegExp
' \s* = invisible spaces
' \d* = match digits
' \w* = match alphanumeric
For i = 1 To 9
With Reg1
Select Case i
Case 1
.Pattern = "(Client[:]([\w-\s]*)\s*)\n"
.Global = False
Case 2
.Pattern = "(([\d]*\,[\d]*))\s*\n"
.Global = False
Case 3
.Pattern = "(Time[:]([\w-\s]*)\s*)\n"
.Global = False
Case 4
.Pattern = "(Project Id[:]([\w-\s]*)\s*)\n"
.Global = False
End Select
End With
If Reg1.Test(sText) Then
Set M1 = Reg1.Execute(sText)
Select Case i
Case 1
For Each M In M1
vText = Trim(M.SubMatches(1))
Next
Case 2
For Each M In M1
vText2 = Trim(M.SubMatches(1))
Next
Case 3
For Each M In M1
vText3 = Trim(M.SubMatches(1))
Next
Case 4
For Each M In M1
vText4 = Trim(M.SubMatches(1))
Next
End Select
End If
Next i
Range("a1000").End(xlUp).Offset(1, 0).Value = vText
Range("b1000").End(xlUp).Offset(1, 0).Value = vText2
Range("c1000").End(xlUp).Offset(1, 0).Value = vText3
Range("d1000").End(xlUp).Offset(1, 0).Value = vText4
Next OutlookMail
Set Folder = Nothing
Set OutlookNamespace = Nothing
Set OutlookApp = Nothing
End Sub
Challenges:
Challenge 1: if the heading Price (USD) changes to Price (GBP) still its storing the value, which should not be. it should only store the value only if the matching text found.
i tried "(Price (USD) [:] ([\d]\,[\d]))\s*\n" however its not working.
Challenge 2: for Project id, value is coming with underscore as well which i am unable to exclude.
Would really appreciate if one can help me solving the above 2 challenge from my code.
or else suggest any better approach for the same.
You may use
Client:\s*(.*)[\r\n][\s\S]*?^Price \(USD\):\s*(.*)[\r\n][\s\S]*?^Time:\s*(.*)[\r\n][\s\S]*?^Project Id:\s*(\w+)
Make sure you set Reg1.Multiline = True.
See the regex demo
The Client details will be in M.SubMatches(0) (Group 1), price info will be in M.SubMatches(1) (Group 2), time details in M.SubMatches(2) (Group 3), and the project ID will be in M.SubMatches(3) (Group 4).
If you need to remove underscores from Group 4, the project ID, just use a post-processing step:
vText4 = Replace(M.SubMatches(3), "_", "")

How to replace Numbers in Parentheses with some calculations in MS Word

I have a problem to replace some serial number such as [30] [31] [32]... to [31] [32] [33]... in MS word when I insert a new references in the middle of article. I have not found a solution way in GUI so I try to use VBA to do that replacement. I find a similar problem in stack overflow:
MS Word Macro to increment all numbers in word document
However, this way is a bit inconvenient because it have to generate some replacement array in other place. Can I make that replacement with regex and some function in MS Word VBA like code below?
Sub replaceWithregExp()
Dim regExp As Object
Dim regx, S$, Strnew$
Set regExp = CreateObject("vbscript.regexp")
With regExp
.Pattern = "\[([0-9]{2})\]"
.Global = True
End With
'How to do some calculations with $1?
Selection.Text = regExp.Replace(Selection.Text, "[$1]")
End Sub
But I don't know how to do some calculations with $1 in regExp? I have try use "[$1+1]" but it return [31+1] [32+1] [33+1]. Can anyone help? Thanks!
It is impossible to pass a callback function to the RegExp.Replace, so you have the only option: use RegExp.execute and process matches in a loop.
Here is an example code for your case (I took a shortcut since you only have the value to modify inside known delimiters, [ and ].)
Sub replaceWithregExp()
Dim regExp As Object
Dim regx, S$, Strnew$
Set regExp = CreateObject("vbscript.regexp")
With regExp
.Pattern = "\[([0-9]{2})]"
.Global = True
End With
'How to do some calculations with $1?
' Removing regExp.Replace(Selection.Text, "[$1]")
For Each m In regExp.Execute(Selection.Text)
Selection.Text = Left(Selection.Text, m.FirstIndex+1) _
& Replace(m.Value, m.Value, CStr(CInt(m.Submatches(0)) + 10)) _
& Mid(Selection.Text, m.FirstIndex + Len(m.Value))
Next m
End Sub
Here,
Selection.Text = Left(Selection.Text, m.FirstIndex+1) - Get what is before
& Replace(m.Value, m.Value, CStr(CInt(m.Submatches(0)) + 10)) - Add 10 to the captured number
& Mid(Selection.Text, m.FirstIndex + Len(m.Value)) - Append what is after the capture
That should do the trick :
Sub IncrementWithRegex()
Dim Para As Paragraph
Set Para = ThisDocument.Paragraphs.First
Dim ParaNext As Paragraph
Dim oRange As Range
Set oRange = Para.Range
Dim regEx As New RegExp
Dim regMatch As Variant
Dim ACrO As String
With regEx
.Global = True
.MultiLine = False
.IgnoreCase = False
.Pattern = "[\[]([0-9]{2})[\]]"
End With
Do While Not Para Is Nothing
Set ParaNext = Para.Next
Set oRange = Para.Range
'Debug.Print oRange.Text
If regEx.test(oRange.Text) Then
For Each regMatch In regEx.Execute(oRange.Text)
oRange.Text = _
Left(oRange.Text, _
InStr(1, oRange.Text, CStr(regMatch))) & _
CDbl(regMatch) + 1 & _
Right(oRange.Text, _
Len(CStr(regMatch)) + InStr(1, oRange.Text, CStr(regMatch)))
Next regMatch
Else
End If
Set Para = ParaNext
Loop
End Sub
To use this, remember to add the reference :
Description: Microsoft VBScript Regular Expressions 5.5
FullPath: C:\windows\SysWOW64\vbscript.dll\3
Major.Minor: 5.5
Name: VBScript_RegExp_55
GUID: {3F4DACA7-160D-11D2-A8E9-00104B365C9F}
Here is a simple VBA macro you can use to achieve this :
Sub IncrementNumbers()
Dim regExp As Object
Dim i As Integer
Dim fullMatch As String
Dim subMatch As Integer
Dim replacement As String
Const TMP_PREFIX As String = "$$$"
Set regExp = CreateObject("vbscript.regexp")
With regExp
.Pattern = "\[([0-9]{2})\]"
.Global = True
.MultiLine = True
End With
'Ensure selected text match our regex
If regExp.test(Selection.Text) Then
'Find all matches
Set matches = regExp.Execute(Selection.Text)
' Start from last match
For i = 0 To (matches.Count - 1)
fullMatch = matches(i).Value
subMatch = CInt(matches(i).SubMatches(0))
'Increment by 1
subMatch = subMatch + 1
'Create replacement. Add a temporary prefix so we ensure [30] replaced with [31]
'will not be replaced with [32] when [31] will be replaced
replacement = "[" & TMP_PREFIX & subMatch & "]"
'Replace full match with [subMatch]
Selection.Text = Replace(Selection.Text, fullMatch, replacement)
Next
End If
'Now replacements are complete, we have to remove replacement prefix
Selection.Text = Replace(Selection.Text, TMP_PREFIX, "")
End Sub

Regular Expressions in Excel VBA

I'm using the Microsoft regular expression engine in Excel VBA. I'm very new to regex but I have a pattern working right now. I need to expand it and I'm having trouble. Here is my code so far:
Sub ImportFromDTD()
Dim sDTDFile As Variant
Dim ffile As Long
Dim sLines() As String
Dim i As Long
Dim Reg1 As RegExp
Dim M1 As MatchCollection
Dim M As Match
Dim myRange As Range
Set Reg1 = New RegExp
ffile = FreeFile
sDTDFile = Application.GetOpenFilename("DTD Files,*.XML", , _
"Browse for file to be imported")
If sDTDFile = False Then Exit Sub '(user cancelled import file browser)
Open sDTDFile For Input Access Read As #ffile
Lines = Split(Input$(LOF(ffile), #ffile), vbNewLine)
Close #ffile
Cells(1, 2) = "From DTD"
J = 2
For i = 0 To UBound(Lines)
'Debug.Print "Line"; i; "="; Lines(i)
With Reg1
'.Pattern = "(\<\!ELEMENT\s)(\w*)(\s*\(\#\w*\)\s*\>)"
.Pattern = "(\<\!ELEMENT\s)(\w*)(\s*\(\#\w*\)\s*\>)"
.Global = True
.MultiLine = True
.IgnoreCase = False
End With
If Reg1.Test(Lines(i)) Then
Set M1 = Reg1.Execute(Lines(i))
For Each M In M1
sExtract = M.SubMatches(1)
sExtract = Replace(sExtract, Chr(13), "")
Cells(J, 2) = sExtract
J = J + 1
'Debug.Print sExtract
Next M
End If
Next i
Set Reg1 = Nothing
End Sub
Currently, I'm matching on a set of data like this:
<!ELEMENT DealNumber (#PCDATA) >
and extract Dealnumber but now, I need to add another match on data like this:
<!ELEMENT DealParties (DealParty+) >
and extract just Dealparty without the Parens and the +
I've been using this as a reference and it's awesome but I'm still a bit confused. How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops
EDIT
I have come across a few new scenarios that have to be matched on.
Extract Deal
<!ELEMENT Deal (DealNumber,DealType,DealParties) >
Extract DealParty the ?,CR are throwing me off
<!ELEMENT DealParty (PartyType,CustomerID,CustomerName,CentralCustomerID?,
LiabilityPercent,AgentInd,FacilityNo?,PartyReferenceNo?,
PartyAddlReferenceNo?,PartyEffectiveDate?,FeeRate?,ChargeType?) >
Extract Deals
<!ELEMENT Deals (Deal*) >
Looking at your pattern, you have too many capture groups. You only want to capture the PCDATA and DealParty. Try changing you pattern to this:
With Reg1
.Pattern = "\<!ELEMENT\s+\w+\s+\(\W*(\w+)\W*\)"
.Global = True
.MultiLine = True
.IgnoreCase = False
End With
Here's the stub: Regex101.
You could use this Regex pattern;
.Pattern = "\<\!ELEMENT\s+(\w+)\s+\((#\w+|(\w+)\+)\)\s+\>"
This portion
(#\w+|(\w+)\+)
says match either
#a-z0-9
a-z0-9+
inside the parentheses.
ie match either
(#PCDATA)
(DealParty+)
to validate the entire string
Then the submatches are used to extract DealNumber for the first valid match, DealParty for the other valid match
edited code below - note submatch is now M.submatches(0)
Sub ImportFromDTD()
Dim sDTDFile As Variant
Dim ffile As Long
Dim sLines() As String
Dim i As Long
Dim Reg1 As RegExp
Dim M1 As MatchCollection
Dim M As Match
Dim myRange As Range
Set Reg1 = New RegExp
J = 1
strIn = "<!ELEMENT Deal12Number (#PCDATA) > <!ELEMENT DealParties (DealParty+) >"
With Reg1
.Pattern = "\<\!ELEMENT\s+(\w+)\s+\((#\w+|(\w+)\+)\)\s+\>"
.Global = True
.MultiLine = True
.IgnoreCase = False
End With
If Reg1.Test(strIn) Then
Set M1 = Reg1.Execute(strIn)
For Each M In M1
sExtract = M.SubMatches(2)
If Len(sExtract) = 0 Then sExtract = M.SubMatches(0)
sExtract = Replace(sExtract, Chr(13), "")
Cells(J, 2) = sExtract
J = J + 1
Next M
End If
Set Reg1 = Nothing
End Sub

Macro to skip Column A and strips out lower case letters

In column A, I have a list of sentences
In columns B-Z, I have strings contain numbers followed by letters both uppercase and lower case.
such as
45ABc
The following macro strips all lowercase letters in the entire work sheet - do not want it to strip any letters in column A. Please help.
Sub RegExReplace()
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.Global = True
RegEx.Pattern = "[^A-Z0-9_-]"
For Each objCell In ActiveSheet.UsedRange.Cells
objCell.Value = RegEx.Replace(objCell.Value, "")
Next
End Sub
Try this one:
Sub RegExReplace()
Dim objCell As Range
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.Global = True
RegEx.Pattern = "[^A-Z0-9_-]"
For Each objCell In ActiveSheet.UsedRange.Cells
If objCell.Column<>1 Then objCell.Value = RegEx.Replace(objCell.Value, "")
Next
End Sub
or if you know that values that should be replaced only in columns B:Z, you can use next code as well:
Sub RegExReplace()
Dim rng As Range, objCell As Range
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.Global = True
RegEx.Pattern = "[^A-Z0-9_-]"
With ActiveSheet
Set rng = Application.Intersect(.UsedRange, .Range("B:Z"))
End With
If Not rng Is Nothing Then
For Each objCell In rng
objCell.Value = RegEx.Replace(objCell.Value, "")
Next
End If
End Sub
I've added code that:
Fixes your pattern to remove what you want to remove directly - ie a-z - rather than what you want to preserve (currently A-Z-_ but could be much larger).
To use quicker arrays rather than range loops.
Sub objRegexReplace()
Dim rng1 As Range
Dim objRegex As Object
Dim X
Dim lngRow As Long
Dim lngCol As Long
Set rng1 = Application.Intersect(ActiveSheet.UsedRange, ActiveSheet.Range("B:Z"))
X = rng1.Value2
If rng1.Cells.Count > 1 Then
Set objRegex = CreateObject("VBScript.Regexp")
With objRegex
.Global = True
.Pattern = "[a-z]+"
.ignorecase = False
For lngRow = 1 To UBound(X, 1)
For lngCol = 1 To UBound(X, 2)
X(lngRow, lngCol) = .Replace(X(lngRow, lngCol), vbNullString)
Next
Next
rng1.Value2 = X
End With
Else
MsgBox "No range to work on", vbCritical
End If
End Sub