I have a column of strings from which I want to extract the uppercase words then past in another column. The words are always in the beginning of the string. For example:
APPLE ORANGE_20 lbs_15 ------> APPLE ORANGE
BANANA_10 lbs_30 -----------------> BANANA
GRAPE MANGO 30lbs_o ----------> GRAPE MANGO
This is what I have so far but I am having a hard time setting the Pattern to get the output needed:
Sub ExtractUPPERCASE()
Dim re As Object, mc As Object
Dim r As Range, c As Range
Dim s As String
Dim wbdata As Workbook
Dim wsData As Worksheet
Set wbdata = Workbooks("trial1")
Set wsData = wbdata.Worksheets("Final Data")
wsData.Activate
Set r = wsData.Range("D1", Cells(Rows.Count, "D").End(xlUp))
Set re = CreateObject("vbscript.regexp")
With re
.Global = True
.ignorecase = False
.MultiLine = True
.Pattern = "^\s*([A-Z\W]+\b)\W+([\w\s]+)"
'.Pattern = "([^a-z]+|[^0-9]+|(?=.*[^\w_]))" I tried this pattern but it didn't get what i want
End With
For Each c In r
s = c.Text
If re.test(s) = True Then
Set mc = re.Execute(s)
c(1, 13) = mc(0).submatches(0)
End If
Next c
Range(r(1, 13), r(1, 13)).EntireColumn.AutoFit
End Sub
Thank you for your time :)
Try this:
Sub ExtractUPPERCASE()
Dim re As Object, mc As Object
Dim r As Range, c As Range
Dim s As String
Dim wbdata As Workbook
Dim wsData As Worksheet
Set wbdata = Workbooks("trial1")
Set wsData = wbdata.Worksheets("Final Data")
wsData.Activate
Set r = ActiveSheet.Range("A1", Cells(Rows.Count, "A").End(xlUp))
Set re = CreateObject("vbscript.regexp")
With re
.Global = True
.ignorecase = False
.MultiLine = True
.Pattern = "[A-Z]+\s?[A-Z]+"
End With
For Each c In r
s = c.Text
If re.test(s) = True Then
Set mc = re.Execute(s)
c(1, 13) = mc.Item(0).Value
End If
Next c
Range(r(1, 13), r(1, 13)).EntireColumn.AutoFit
End Sub
Related
I'm trying to filter items with an email body that contains a less than symbol <.
Here is a sample email body that contains less than symbol.
Our drive E: is now < 10%.
Sub CodeSubjectForward(Item As Outlook.MailItem)
Dim M1 As MatchCollection
Dim M As Match
Set Reg1 = New RegExp
With Reg1
.Pattern = "([<]\s*(\w*)\s*)"
.Global = True
End With
If Reg1.Test(Item.Body) Then
Set M1 = Reg1.Execute(Item.Body)
For Each M In M1
Next
End If
Item.Save
Set myForward = Item.Forward
myForward.Recipients.Add "alias#domain.com"
myForward.Send
End Sub
Should be something like this
Public Sub FWItem(Item As Outlook.mailitem)
Dim Email As Outlook.mailitem
Dim Matches As Variant
Dim RegExp As Object
Dim Pattern As String
Set RegExp = CreateObject("VbScript.RegExp")
If TypeOf Item Is Outlook.mailitem Then
Pattern = "(\s[<]\s)"
With RegExp
.Global = False
.Pattern = Pattern
.IgnoreCase = True
Set Matches = .Execute(Item.Body)
End With
If Matches.Count > 0 Then
Debug.Print Item.subject ' Print on Immediate Window
Set Email = Item.Forward
Email.subject = Item.subject
Email.Recipients.Add "0m3r#Email.com"
Email.Save
Email.Send
End If
End If
Set RegExp = Nothing
Set Matches = Nothing
Set Email = Nothing
Set Item = Nothing
End Sub
https://regex101.com/r/KOFM8E/1/
I'm using the Microsoft regular expression engine in Excel VBA. I'm very new to regex but I have a pattern working right now. I need to expand it and I'm having trouble. Here is my code so far:
Sub ImportFromDTD()
Dim sDTDFile As Variant
Dim ffile As Long
Dim sLines() As String
Dim i As Long
Dim Reg1 As RegExp
Dim M1 As MatchCollection
Dim M As Match
Dim myRange As Range
Set Reg1 = New RegExp
ffile = FreeFile
sDTDFile = Application.GetOpenFilename("DTD Files,*.XML", , _
"Browse for file to be imported")
If sDTDFile = False Then Exit Sub '(user cancelled import file browser)
Open sDTDFile For Input Access Read As #ffile
Lines = Split(Input$(LOF(ffile), #ffile), vbNewLine)
Close #ffile
Cells(1, 2) = "From DTD"
J = 2
For i = 0 To UBound(Lines)
'Debug.Print "Line"; i; "="; Lines(i)
With Reg1
'.Pattern = "(\<\!ELEMENT\s)(\w*)(\s*\(\#\w*\)\s*\>)"
.Pattern = "(\<\!ELEMENT\s)(\w*)(\s*\(\#\w*\)\s*\>)"
.Global = True
.MultiLine = True
.IgnoreCase = False
End With
If Reg1.Test(Lines(i)) Then
Set M1 = Reg1.Execute(Lines(i))
For Each M In M1
sExtract = M.SubMatches(1)
sExtract = Replace(sExtract, Chr(13), "")
Cells(J, 2) = sExtract
J = J + 1
'Debug.Print sExtract
Next M
End If
Next i
Set Reg1 = Nothing
End Sub
Currently, I'm matching on a set of data like this:
<!ELEMENT DealNumber (#PCDATA) >
and extract Dealnumber but now, I need to add another match on data like this:
<!ELEMENT DealParties (DealParty+) >
and extract just Dealparty without the Parens and the +
I've been using this as a reference and it's awesome but I'm still a bit confused. How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops
EDIT
I have come across a few new scenarios that have to be matched on.
Extract Deal
<!ELEMENT Deal (DealNumber,DealType,DealParties) >
Extract DealParty the ?,CR are throwing me off
<!ELEMENT DealParty (PartyType,CustomerID,CustomerName,CentralCustomerID?,
LiabilityPercent,AgentInd,FacilityNo?,PartyReferenceNo?,
PartyAddlReferenceNo?,PartyEffectiveDate?,FeeRate?,ChargeType?) >
Extract Deals
<!ELEMENT Deals (Deal*) >
Looking at your pattern, you have too many capture groups. You only want to capture the PCDATA and DealParty. Try changing you pattern to this:
With Reg1
.Pattern = "\<!ELEMENT\s+\w+\s+\(\W*(\w+)\W*\)"
.Global = True
.MultiLine = True
.IgnoreCase = False
End With
Here's the stub: Regex101.
You could use this Regex pattern;
.Pattern = "\<\!ELEMENT\s+(\w+)\s+\((#\w+|(\w+)\+)\)\s+\>"
This portion
(#\w+|(\w+)\+)
says match either
#a-z0-9
a-z0-9+
inside the parentheses.
ie match either
(#PCDATA)
(DealParty+)
to validate the entire string
Then the submatches are used to extract DealNumber for the first valid match, DealParty for the other valid match
edited code below - note submatch is now M.submatches(0)
Sub ImportFromDTD()
Dim sDTDFile As Variant
Dim ffile As Long
Dim sLines() As String
Dim i As Long
Dim Reg1 As RegExp
Dim M1 As MatchCollection
Dim M As Match
Dim myRange As Range
Set Reg1 = New RegExp
J = 1
strIn = "<!ELEMENT Deal12Number (#PCDATA) > <!ELEMENT DealParties (DealParty+) >"
With Reg1
.Pattern = "\<\!ELEMENT\s+(\w+)\s+\((#\w+|(\w+)\+)\)\s+\>"
.Global = True
.MultiLine = True
.IgnoreCase = False
End With
If Reg1.Test(strIn) Then
Set M1 = Reg1.Execute(strIn)
For Each M In M1
sExtract = M.SubMatches(2)
If Len(sExtract) = 0 Then sExtract = M.SubMatches(0)
sExtract = Replace(sExtract, Chr(13), "")
Cells(J, 2) = sExtract
J = J + 1
Next M
End If
Set Reg1 = Nothing
End Sub
I have a cell in Excel that holds a long string in cell A1:
"ABC12+BED58,YZ001"
I have the following regex to match some specific variables in my string
strPattern = "[A-Z]{1,3}[0-9]{2,4}"
Basically, I need to write a macro or a function (I would prefer a function actually) that will fill cell A2, A3, A4 like that:
ABC12
BED58
YZ001
The thing is, there is an undeterminate number of parameters in the string (so for example, it could go all the way through A200).
I'm thinking of a function get_n_variables(str, n) that would return the Nth unique match
Here is my progress so far but the function returns #VALUE!
Function simpleCellRegex(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim matches As Object
strPattern = "[A-Z]{1,3}[0-9]{2,4}"
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = ""
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
Set matches = regEx.Execute(strInput)
simpleCellRegex = matches(0).SubMatches(0)
Else
simpleCellRegex = "Not matched"
End If
End If
End Function
From MrExcel Forum:
You can not put a function in a cell to change other cells. Functions do not work this way.
Thus, it should be a sub, like this, e.g. (outputs the matches under the selected cell with our input string):
Sub simpleCellRegex()
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim matches As MatchCollection
Dim i As Long, cnt As Long
strPattern = "[A-Z]{1,3}[0-9]{2,4}"
cnt = 1
If strPattern <> "" Then
strInput = ActiveCell.Value
strReplace = ""
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.test(strInput) Then
Set objMatches = regEx.Execute(strInput)
For i = 0 To objMatches.Count - 1
ActiveCell.Offset(cnt).Value = objMatches.Item(i)
cnt = cnt + 1
Next
End If
End If
End Sub
Output:
You can actually still use an function if you use an array
select B1:D1
enter this formula =simpleCellRegex(A1) and press CTRL+SHIFT+ENTER
if you dont know how many matches enter in more cells than there may be matches
code
Function simpleCellRegex(StrIn As String) As Variant
Dim regEx As Object
Dim regMC As Object
Dim X
Dim strPattern As String
Dim lngCnt As Long
strPattern = "[A-Z]{1,3}[0-9]{2,4}"
Set regEx = CreateObject("vbscript.regexp")
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
If .Test(StrIn) Then
Set regMC = .Execute(StrIn)
ReDim X(0 To regMC.Count - 1) As String
For lngCnt = 0 To UBound(X)
X(lngCnt) = regMC(lngCnt)
Next
simpleCellRegex = X
Else
simpleCellRegex = "Not matched"
End If
End With
End Function
In column A, I have a list of sentences
In columns B-Z, I have strings contain numbers followed by letters both uppercase and lower case.
such as
45ABc
The following macro strips all lowercase letters in the entire work sheet - do not want it to strip any letters in column A. Please help.
Sub RegExReplace()
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.Global = True
RegEx.Pattern = "[^A-Z0-9_-]"
For Each objCell In ActiveSheet.UsedRange.Cells
objCell.Value = RegEx.Replace(objCell.Value, "")
Next
End Sub
Try this one:
Sub RegExReplace()
Dim objCell As Range
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.Global = True
RegEx.Pattern = "[^A-Z0-9_-]"
For Each objCell In ActiveSheet.UsedRange.Cells
If objCell.Column<>1 Then objCell.Value = RegEx.Replace(objCell.Value, "")
Next
End Sub
or if you know that values that should be replaced only in columns B:Z, you can use next code as well:
Sub RegExReplace()
Dim rng As Range, objCell As Range
Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.Global = True
RegEx.Pattern = "[^A-Z0-9_-]"
With ActiveSheet
Set rng = Application.Intersect(.UsedRange, .Range("B:Z"))
End With
If Not rng Is Nothing Then
For Each objCell In rng
objCell.Value = RegEx.Replace(objCell.Value, "")
Next
End If
End Sub
I've added code that:
Fixes your pattern to remove what you want to remove directly - ie a-z - rather than what you want to preserve (currently A-Z-_ but could be much larger).
To use quicker arrays rather than range loops.
Sub objRegexReplace()
Dim rng1 As Range
Dim objRegex As Object
Dim X
Dim lngRow As Long
Dim lngCol As Long
Set rng1 = Application.Intersect(ActiveSheet.UsedRange, ActiveSheet.Range("B:Z"))
X = rng1.Value2
If rng1.Cells.Count > 1 Then
Set objRegex = CreateObject("VBScript.Regexp")
With objRegex
.Global = True
.Pattern = "[a-z]+"
.ignorecase = False
For lngRow = 1 To UBound(X, 1)
For lngCol = 1 To UBound(X, 2)
X(lngRow, lngCol) = .Replace(X(lngRow, lngCol), vbNullString)
Next
Next
rng1.Value2 = X
End With
Else
MsgBox "No range to work on", vbCritical
End If
End Sub
Sub Test()
Dim strTest As String
Dim strTemp As String
strTest = Sheet1.Cells(1, 1).Value
MsgBox RE6(strTest)
Sheet1.Cells(2, 1).Value = RE6(strTest)
End Sub
Function RE6(strData As String)
Dim RE As Object 'REMatches As Object
Dim P As String, A As String
Dim Q As String, B As String
Dim R As String, C As String
Dim S As String, D As String
Dim T As String, E As String
Dim U As String, F As String
Dim V As String, G As String
Dim W As String, H As String
Dim N As Integer
Set RE = CreateObject("vbscript.regexp")
P = "(?:^|\b)He"
A = "She"
Q = "(?:^|\b)he"
B = "she"
R = "(?:^|\b)Him"
C = "Her"
S = "(?:^|\b)him"
D = "her"
T = "(?:^|\b)Himself"
E = "Herself"
U = "(?:^|\b)himself"
F = "herself"
V = "(?:^|\b)His"
G = "Her"
W = "(?:^|\b)his"
H = "her"
'This section replaces "He" with"She"
With RE
.MultiLine = True
.Global = True
.IgnoreCase = False
.Pattern = P
End With
RE6 = RE.Replace(strData, A)
'This section replaces "he" with "she"
With RE
.MultiLine = True
.Global = True
.IgnoreCase = False
.Pattern = Q
End With
RE6 = RE.Replace(strData, B)
'
'This section replaces "Him" with "Her"
With RE
.MultiLine = True
.Global = True
.IgnoreCase = False
.Pattern = R
End With
RE6 = RE.Replace(strData, C)
'This section replaces "him" with "her"
With RE
.MultiLine = True
.Global = True
.IgnoreCase = False
.Pattern = S
End With
RE6 = RE.Replace(strData, D)
'This section replaces "Himself" with "Herself"
With RE
.MultiLine = True
.Global = True
.IgnoreCase = False
.Pattern = T
End With
RE6 = RE.Replace(strData, E)
'This section replaces "himself" with "herself"
With RE
.MultiLine = True
.Global = True
.IgnoreCase = False
.Pattern = U
End With
RE6 = RE.Replace(strData, F)
'This section replaces "His" with "Her"
With RE
.MultiLine = True
.Global = True
.IgnoreCase = False
.Pattern = V
End With
RE6 = RE.Replace(strData, G)
'This section replaces "his" with "her"
With RE
.MultiLine = True
.Global = True
.IgnoreCase = False
.Pattern = W '
RE6 = RE.Replace(strData, H)
End With
End Function
When I run this code on this piece of text:
James has settled effortlessly in his new class. He has shown seriousness and demonstrated traits of a serious student in the first half of the term. I am very optimistic that his positive attitude towards work, if he does not relent, will yield positive dividends. However, James needs to respond positively to prompts on getting himself better organised in school. I wish Him, him the best in the second half of the term.
I only get "his" replaced with "her". If I comment out the last bit then I get only "Him" replaced with "Her". Any help will be very welcome.
The issue is you repeatedly do your replacement on strData, as opposed to the result of each replacement; that is, you take your original string, replace "He" with "She", and then store it in RE6. Then you take your original string again, replace "he" with "she", and then store it in RE6, overwriting the first replacement, and so on and so on.. This is why you only see the results of the last replacement.
To fix it, leave your first replacement as
RE6 = RE.Replace(strData, A)
but change all of your other replacements to be
RE6 = RE.Replace(RE6, B) <-- do this for B-H
This will give you your desired output.