Match n amount of words separated by commas after base text - regex

I would like to match an arbitrary number of words separated by commas and whitespace.
Is there a better solution than just repeating the search parameter?
Sample:
"2_i Art des Problems:\s*(.[^,\s]+)[,]\s*(.[^,\s]+)[,]\s*(.[^,\s]+)"
2_i Art des Problems: Elektrisch, Schweißausrüstung, Burgenland
View on regex101: https://regex101.com/r/yP7PPO/1
Full code for this operation:
With Reg1
.Pattern = "2_i Art des Problems:+\s*([^\r\n]*\S)"
.Global = False
End With
If Reg1.Test(olMail.Body) Then
Set M1 = Reg1.Execute(olMail.Body)
End If
For Each M In M1
With xExcelApp
Select Case M.SubMatches(0)
Case "Software"
.Range("D6").Value = 1
Case "Mechanisch"
.Range("E6").Value = 1
Case "Elektrisch"
.Range("F6").Value = 1
Case "Roboter"
.Range("G6").Value = 1
Case "Schweißausrüstung"
.Range("H6").Value = 1
Case "Anwendung"
.Range("I6").Value = 1
Case "Ersatzteil"
.Range("J6").Value = 1
Case Else
.Range("K6").Value = 1
End Select
End With
Next M

Does it really need to be a RegEx?
I think that overcomplicates things, as this can easily be solved with Split():
Option Explicit
Public Sub Example()
Const TestString As String = "2_i Art des Problems: Elektrisch, Schweißausrüstung, Burgenland"
Const ConstantPart As String = "2_i Art des Problems: "
If Left$(TestString, Len(ConstantPart)) = ConstantPart Then
Dim Parts() As String
Parts = Split(Mid$(TestString, Len(ConstantPart) + 1), ", ")
Dim Part As Variant
For Each Part In Parts
Debug.Print Part
Next Part
End If
End Sub
Output is:
Elektrisch
Schweißausrüstung
Burgenland
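If the spacing around the commas is not guaranteed to be exactly one space, splitting on "," alone and trimming each part is a little more forgiving. A minimal sketch, assuming the same sample line; the names SplitTrimmed and TrimmedExample are made up for illustration:

```vba
Option Explicit

' Hypothetical helper: returns the comma-separated words that follow
' ConstantPart, each stripped of surrounding spaces.
' Returns a zero-length array if Text does not start with ConstantPart.
Public Function SplitTrimmed(ByVal Text As String, ByVal ConstantPart As String) As String()
    Dim Parts() As String
    Dim i As Long

    If Left$(Text, Len(ConstantPart)) = ConstantPart Then
        ' Split on the comma only, then trim whatever spaces are left
        Parts = Split(Mid$(Text, Len(ConstantPart) + 1), ",")
        For i = LBound(Parts) To UBound(Parts)
            Parts(i) = Trim$(Parts(i))   ' Trim$ removes spaces only, not tabs
        Next i
    Else
        Parts = Split(vbNullString)      ' zero-length array
    End If

    SplitTrimmed = Parts
End Function

Public Sub TrimmedExample()
    Dim Part As Variant
    For Each Part In SplitTrimmed("2_i Art des Problems: Elektrisch,Schweißausrüstung ,  Burgenland", "2_i Art des Problems:")
        Debug.Print Part
    Next Part
End Sub
```

Because the constant part is passed without its trailing space, the first word is trimmed in the same way as the others.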

If you really need to use a regular expression, then set the Global flag and use, for example, this pattern:
(.[^,\s]+)(,|$)
With regEx
.Global = True
.Pattern = "(.[^,\s]+)(,|$)"
End With
Use .SubMatches on each match to get the values of the capturing groups.
EDIT:
According to one of the comments (by Pᴇʜ), "Then you still need to Trim the matches because they will include the spaces."
You can still use a regular expression for that; move the leading dot outside the capturing group:
.([^,\s]+)(,|$)
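To make the Global flag and .SubMatches concrete, here is a minimal sketch. It assumes the text passed in is the single line that contains the base text, and it uses a slightly different pattern from the one above (no leading dot, optional whitespace before the comma). The function name WordsAfter is hypothetical:

```vba
Option Explicit

' Hypothetical helper: collects every word that follows BaseText and is
' terminated by a comma or by the end of the text.
Public Function WordsAfter(ByVal Text As String, ByVal BaseText As String) As Collection
    Dim Result As New Collection
    Dim RegEx As Object
    Dim M As Object
    Dim StartPos As Long

    StartPos = InStr(1, Text, BaseText)
    If StartPos > 0 Then
        Set RegEx = CreateObject("VBScript.RegExp") ' late binding, no reference needed
        RegEx.Global = True                         ' find all matches, not just the first
        RegEx.Pattern = "([^,\s]+)\s*(?:,|$)"       ' group 1 = the word without spaces
        ' Only search the part of the text that follows the base text
        For Each M In RegEx.Execute(Mid$(Text, StartPos + Len(BaseText)))
            Result.Add M.SubMatches(0)
        Next M
    End If

    Set WordsAfter = Result
End Function

Public Sub WordsAfterExample()
    Dim W As Variant
    For Each W In WordsAfter("2_i Art des Problems: Elektrisch, Schweißausrüstung, Burgenland", "2_i Art des Problems:")
        Debug.Print W
    Next W
End Sub
```

Each item of the returned collection can then be fed into the Select Case block from the question.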

Related

Can I use regex to not only identify a pattern but extract the value found?

I am using Excel VBA.
I need to extract the dimensions (width x height) of a creative from a string and the dimensions will always be in the format:
000x000 or 000X000 or 000x00 or 000X00 where 0 can be any number between 1-9 and x can be upper or lower case.
I read this guide:
How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops
And I think what I want is something similar to:
[0-9]{2, 3}[xX][0-9]{2, 3}
So if my string is:
creativeStr = ab234-cdc-234-300x250-777aabb
I want to extract "300x250" and assign it to a variable like this:
dimensions = 300x250
Is my Regex above correct? Also, how would I pull the resulting match into a variable?
Here is part of my code:
creativeStr = "Sample-abc-300x250-cba-123"
regex_pattern = "[0-9]{2,3}[xX][0-9]{2,4}"
If regex_pattern <> "" Then
With regEx
.Global = True
.Pattern = regex_pattern
End With
If regEx.Test(creativeStr) Then
dimensions = regEx.Replace(creativeStr, "$1")
Else
dimensions = "Couldn't extract dimensions from creative name."
End If
End If
But it still returns the condition in my else clause...
Thanks!
Your examples do not match your regex. They show that the first set of digits is always three digits long, and the last set either two or three.
Also, in your description you write that 0 "can be any number between 1-9", but your examples include 0's.
If you are going to work with regex, that type of imprecision will lead to undesired results.
Assuming that 0's should be included, and that the desired pattern is 3x2 or 3x3, then perhaps this example will provide some clarity:
Option Explicit
Function dimension(S As String) As String
Dim RE As Object, MC As Object
Const sPat As String = "[0-9]{3}[Xx][0-9]{2,3}"
' or, with .ignorecase = true, could use: "\d{3}x\d{2,3}"
Set RE = CreateObject("vbscript.regexp")
With RE
.Pattern = sPat
If .Test(S) = True Then
Set MC = .Execute(S)
dimension = MC(0)
Else
dimension = "Couldn't extract dimensions from creative name."
End If
End With
End Function
Sub getDimension()
Dim creativeStr As String
Dim Dimensions As String
creativeStr = "Sample-abc-300x250-cba-123"
Dimensions = dimension(creativeStr)
Debug.Print Dimensions
End Sub
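If the width and height are needed as separate numbers rather than as one string, capturing groups can pull them apart. A minimal sketch under the same 3x2 / 3x3 assumption; the function name TryGetDimensions and its parameters are made up for illustration:

```vba
Option Explicit

' Hypothetical helper: extracts width (W) and height (H) from the first
' "NNNxNN" or "NNNxNNN" token in S. Returns False when nothing matches.
Public Function TryGetDimensions(ByVal S As String, ByRef W As Long, ByRef H As Long) As Boolean
    Dim RE As Object
    Dim MC As Object

    Set RE = CreateObject("VBScript.RegExp")
    RE.IgnoreCase = True              ' lets "x" match "X" as well
    RE.Pattern = "(\d{3})x(\d{2,3})"  ' group 1 = width, group 2 = height

    Set MC = RE.Execute(S)
    If MC.Count > 0 Then
        W = CLng(MC(0).SubMatches(0))
        H = CLng(MC(0).SubMatches(1))
        TryGetDimensions = True
    End If
End Function

Public Sub DimensionsExample()
    Dim W As Long, H As Long
    If TryGetDimensions("Sample-abc-300x250-cba-123", W, H) Then
        Debug.Print "Width: " & W & ", height: " & H
    End If
End Sub
```

Returning a Boolean keeps the "no match" case separate from the extracted values, instead of mixing an error message into the result string.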

Get last 3 characters of matched pattern RUTA

I am trying to get last 3 characters of a pattern. But I am stuck on how to do it.
Please share your thoughts on this.
PACKAGE uima.ruta.example;
Document{->RETAINTYPE(SPACE)};
DECLARE VarA;
((W|NUM)* (W|NUM)*){REGEXP(".{12}")-> MARK(VarA),MARK(EntityType,1), UNMARK(VarA)};
Input: AB1234567CAB
Output: CAB
You can use $ to indicate where the end of the source string should be in the pattern. For your example, you want the last 3 characters, so you could use a pattern like:
.{3}$
This matches any three characters (apart from \n), but you could be more specific. For example, if you just want uppercase letters, you could use:
[A-Z]{3}$
or if you could accept uppercase, lowercase or numbers, you could use
\w{3}$
Experiment on regex101.com to see what works for you.
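Outside of RUTA, the same anchored patterns can be tried from VBA with the VBScript.RegExp object. A minimal sketch; the function name LastThree is hypothetical, and the default pattern is simply the first one suggested above:

```vba
Option Explicit

' Hypothetical helper: returns the last three characters of S when they
' match Patt (for example ".{3}$" or "[A-Z]{3}$"), otherwise "".
Public Function LastThree(ByVal S As String, Optional ByVal Patt As String = ".{3}$") As String
    Dim RE As Object
    Dim MC As Object

    Set RE = CreateObject("VBScript.RegExp")
    RE.Pattern = Patt   ' MultiLine stays False, so $ anchors at the end of the string

    Set MC = RE.Execute(S)
    If MC.Count > 0 Then LastThree = MC(0).Value
End Function

Public Sub LastThreeExample()
    Debug.Print LastThree("AB1234567CAB")               ' any three trailing characters
    Debug.Print LastThree("AB1234567CAB", "[A-Z]{3}$")  ' only if they are uppercase letters
End Sub
```

When no validation of the trailing characters is needed, Right$(S, 3) gives the same result without a regular expression.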
Suppose your data is in cell A1.
You can run the second of these two macros:
Option Explicit
Sub Extract_Laste_3Carachters(st As Range, Patt$, n)
Dim Obj As Object
Set Obj = CreateObject("Vbscript.RegExp")
With Obj
.Pattern = Patt
.Global = True
End With
If Len(st) <= 3 Then st.Offset(, 1) = st: Exit Sub
If Obj.test(st) Then
If n > Obj.Execute(st).Count Then n = Obj.Execute(st).Count
st.Offset(, 1) = _
Obj.Execute(st)(n - 3) _
& Obj.Execute(st)(n - 2) _
& Obj.Execute(st)(n - 1)
End If
End Sub
'+++++++++++++++++++++++++++++++
Sub Test_Me()
Call Extract_Laste_3Carachters(Range("a1"), ("\w"), Len(Range("a1")))
End Sub
I tried the code below and it's working now!
PACKAGE uima.ruta.example;
Document{->RETAINTYPE(SPACE)};
"(?i)\\b(?=.*\\d)[1]{0,1}[A-Z0-9]{2}[\\s |-]{0,2}[A-Z0-9]{7}[\\s |-]{0,2}([A-Z]{3})\\b" ->1 = EntityType;

Extract largest numeric sequence from string (regex, or?)

I have strings similar to the following:
4123499-TESCO45-123
every99999994_54
And I want to extract the largest numeric sequence in each string, respectively:
4123499
99999994
I have previously tried regex (I am using VB6)
Set rx = New RegExp
rx.Pattern = "[^\d]"
rx.Global = True
StringText = rx.Replace(StringText, "")
Which gets me partway there, but it only removes the non-numeric values, and I end up with the first string looking like:
412349945123
Can I find a regex that will give me what I require, or will I have to try another method? Essentially, my pattern would have to be anything that isn't the longest numeric sequence. But I'm not actually sure if that is even a reasonable pattern. Could anyone with a better handle of regex tell me if I am going down a rabbit hole? I appreciate any help!
You cannot get the result by just a regex. You will have to extract all numeric chunks and get the longest one using other programming means.
Here is an example:
Dim strPattern As String: strPattern = "\d+"
Dim str As String: str = "4123499-TESCO45-123"
Dim regEx As New RegExp
Dim matches As MatchCollection
Dim m As Match
Dim result As String
With regEx
.Global = True
.MultiLine = False
.IgnoreCase = False
.Pattern = strPattern
End With
Set matches = regEx.Execute(str)
For Each m In matches
If Len(result) < Len(m.Value) Then result = m.Value
Next
Debug.Print result
The \d+ with RegExp.Global=True will find all digit chunks and then only the longest will be printed after all matches are processed in a loop.
That's not solvable with an RE on its own.
Instead you can simply walk along the string tracking the longest consecutive digit group:
For i = 1 To Len(StringText)
If IsNumeric(Mid$(StringText, i, 1)) Then
a = a & Mid$(StringText, i, 1)
Else
a = ""
End If
If Len(a) > Len(longest) Then longest = a
Next
MsgBox longest
(first result wins a tie)
If the two examples you gave follow a standard, so that the strings always come in one of these two formats:
<long_number>-<some_other_data>-<short_number>
<text><long_number>_<short_number>
then there are some solutions.
However, if you are searching for the longest number in strings of any format, these will not work.
Solution 1
([0-9]+)[_-].*
In the first capture group, you should have the longest number for those 2 formats.
Note: This assumes that the longest number will be the first number it encounters with an underscore or a hyphen next to it, matching those two examples given.
Solution 2
\d{6,}
Note: This assumes that the shortest number will never exceed 5 characters in length, and the longest number will never be shorter than 6 characters in length
Please try this.
Pure VB, with no external libraries or objects.
No brain-breaking regexp patterns.
No string manipulation, so it is fast: roughly 30 times faster than regexp :)
It is easy to adapt to various needs,
for example to concatenate all digits from the source string into a single string.
Moreover, if the target string is only an intermediate step,
it is possible to work with the numbers only.
Public Sub sb_BigNmb()
Dim sSrc$, sTgt$
Dim taSrc() As Byte, taTgt() As Byte, tLB As Byte, tUB As Byte
Dim s As Byte, t As Byte, tLenMin As Byte
tLenMin = 4
sSrc = "every99999994_54"
sTgt = vbNullString
taSrc = StrConv(sSrc, vbFromUnicode)
tLB = LBound(taSrc)
tUB = UBound(taSrc)
ReDim taTgt(tLB To tUB)
t = 0
For s = tLB To tUB
Select Case taSrc(s)
Case 48 To 57
taTgt(t) = taSrc(s)
t = t + 1
Case Else
If CBool(t) Then Exit For ' *** EXIT FOR ***
End Select
Next
If (t > tLenMin) Then
ReDim Preserve taTgt(tLB To (t - 1))
sTgt = StrConv(taTgt, vbUnicode)
End If
Debug.Print "'" & sTgt & "'"
Stop
End Sub
Handling sSrc = "ev_1_ery99999994_54" is left as an exercise :)
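One possible way to handle inputs such as "ev_1_ery99999994_54" is to keep scanning after each digit run instead of leaving the loop, and to remember the longest run seen so far. A minimal sketch of that idea, assuming single-byte (ANSI) text so that byte positions line up with character positions; the function name LongestDigitRun is hypothetical, and Long indices are used so that strings longer than 255 characters also work:

```vba
Option Explicit

' Hypothetical variant of the byte-array approach: scans every digit run
' and returns the longest one (the first run wins a tie).
Public Function LongestDigitRun(ByVal sSrc As String) As String
    Dim taSrc() As Byte
    Dim i As Long
    Dim runStart As Long, runLen As Long
    Dim bestStart As Long, bestLen As Long

    If Len(sSrc) = 0 Then Exit Function

    taSrc = StrConv(sSrc, vbFromUnicode)    ' one byte per character for ANSI text

    For i = LBound(taSrc) To UBound(taSrc)
        Select Case taSrc(i)
            Case 48 To 57                   ' "0" to "9"
                If runLen = 0 Then runStart = i
                runLen = runLen + 1
                If runLen > bestLen Then
                    bestLen = runLen
                    bestStart = runStart
                End If
            Case Else
                runLen = 0                  ' run ended, start over
        End Select
    Next i

    ' Assumption: byte index i corresponds to character i + 1 (single-byte text)
    If bestLen > 0 Then LongestDigitRun = Mid$(sSrc, bestStart + 1, bestLen)
End Function

Public Sub LongestDigitRunExample()
    Debug.Print LongestDigitRun("ev_1_ery99999994_54")
    Debug.Print LongestDigitRun("4123499-TESCO45-123")
End Sub
```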

VBA: REGEX LOOKBEHIND MS ACCESS 2010

I have a function that was written so that VBA can be used in MS Access
I wish to do the following
I have set up my code below. Everything before the product works perfectly, but trying to get the information after it just returns "", which is strange because when I run the same pattern in Notepad++ it works perfectly fine.
The code looks for the letters MIP and any one of the three-letter codes.
StringToCheck = "MADHUBESOMIPTDTLTRCOYORGLEJ"
' PART 1
' If MIP appears in the string, then delete any of the following codes if they exist - DOM, DOX, DDI, ECX, LOW, WPX, SDX, DD6, DES, BDX, CMX,
' WMX, TDX, TDT, BSA, EPA, EPP, ACP, ACA, ACE, ACS, GMB, MAL, USP, NWP.
' EXAMPLE 1. Flagged as: MADHUBESOMIPTDTLTRCOYORGLEJ, should be MADHUBESOMIPLTRCOYORGLEJ
Do While regexp(StringToCheck, "MIP(DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)", False) <> ""
' SELECT EVERYTHING BEFORE THE THREE LETTER CODES
strPart1 = regexp(StringToCheck, ".*^[^_]+(?=DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)", False)
' SELECT EVERYTHING AFTER THE THREE LETTER CODES
strPart2 = regexp(StringToCheck, "(?<=(DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX).*", False)
StringToCheck = strPart1 & strPart2
Loop
The function I am using, which I have taken from the internet, is below:
Function regexp(StringToCheck As Variant, PatternToUse As String, Optional CaseSensitive As Boolean = True) As String
On Error GoTo RefErr:
Dim re As New regexp
re.Pattern = PatternToUse
re.Global = False
re.IgnoreCase = Not CaseSensitive
Dim m
For Each m In re.Execute(StringToCheck)
regexp = UCase(m.Value)
Next
RefErr:
On Error Resume Next
End Function
Just do it in two steps:
1. Check if MIP is in the string.
2. If it is, remove the other codes.
Like this:
Sub Test()
Dim StringToCheck As String
StringToCheck = "MADHUBESOMIPTDTLTRCOYORGLEJ"
Debug.Print StringToCheck
Debug.Print CleanupString(StringToCheck)
End Sub
Function CleanupString(str As String) As String
Dim reCheck As New RegExp
Dim reCodes As New RegExp
reCheck.Pattern = "^(?:...)*?MIP"
reCodes.Pattern = "^((?:...)*?)(?:DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)"
reCodes.Global = True
If reCheck.Test(str) Then
While reCodes.Test(str)
str = reCodes.Replace(str, "$1")
Wend
End If
CleanupString = str
End Function
Note that the purpose of (?:...)*? is to group the letters in threes.
Since the VBScript regular expression engine does support look-aheads (unlike look-behinds such as (?<=...), which it rejects), you can of course also do it in a single regex:
Function CleanupString(str As String) As String
Dim reClean As New RegExp
reClean.Pattern = "^(?=(?:...)*?MIP)((?:...)*?)(?:DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)"
While reClean.Test(str)
str = reClean.Replace(str, "$1")
Wend
CleanupString = str
End Function
Personally, I like the two-step check/remove pattern better because it is a lot more obvious and therefore more maintainable.
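On the maintainability point, the long alternation can also be kept in one place and spliced into the pattern. A minimal sketch of the two-step version written that way, using late binding; the names CODE_LIST and CleanupCodes are hypothetical, and the codes are the ones from the question's pattern:

```vba
Option Explicit

' The code list lives in one place (taken from the pattern in the question).
Private Const CODE_LIST As String = _
    "DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|" & _
    "EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX"

' Hypothetical variant of CleanupString that builds its pattern from CODE_LIST.
' str is passed ByVal, so the caller's variable is left untouched.
Public Function CleanupCodes(ByVal str As String) As String
    Dim reCheck As Object
    Dim reCodes As Object

    Set reCheck = CreateObject("VBScript.RegExp")
    Set reCodes = CreateObject("VBScript.RegExp")

    reCheck.Pattern = "^(?:...)*?MIP"
    reCodes.Pattern = "^((?:...)*?)(?:" & CODE_LIST & ")"

    If reCheck.Test(str) Then
        ' Each pass removes the first listed code that sits on a 3-letter boundary
        Do While reCodes.Test(str)
            str = reCodes.Replace(str, "$1")
        Loop
    End If

    CleanupCodes = str
End Function

Public Sub CleanupCodesExample()
    Debug.Print CleanupCodes("MADHUBESOMIPTDTLTRCOYORGLEJ")
End Sub
```

Adding or removing a code then only touches the constant, not the patterns.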
Non RE option:
Function DeMIPString(StringToCheck As String) As String
If InStr(StringToCheck, "MIP") = 0 Then
DeMIPString = StringToCheck
Else
Dim i As Long
For i = 1 To Len(StringToCheck) Step 3
Select Case Mid$(StringToCheck, i, 3)
Case "DOM", "DOX", "DDI", "ECX", "LOW", "WPX", "SDX", "DD6", "DES", "BDX", "CMX", "WMX", "TDX", "TDT", "BSA", "EPA", "EPP", "ACP", "ACA", "ACE", "ACS", "GMB", "MAL", "USP", "NWP"
' Listed code: skip it (MIP itself is kept)
Case Else
DeMIPString = DeMIPString & Mid$(StringToCheck, i, 3)
End Select
Next
End If
End Function

Excel VBA: search a string to find the first non-text character

Cells contain a mixture of characters within a string, such as:
Abcdef_8765
QWERTY3_JJHH
Xyz9mnop
I need to find the first non A-Za-z character so that I can strip out the subsequent remainder of the string.
So the results would be:
Abcdef
QWERTY
Xyz
I know how to do this if I know exactly what character I'm looking for, but I'm not intuitively grasping how to find ANY character other than A-Za-z.
Btw, this is intended to be used within a vba solution.
====================
EDIT:
I've had success with the following...
a = "abc123"
b = Len(a)
For x = 1 To b
c = (Mid(a, x, 1) Like "[a-zA-Z]")
If c = False Then
d = Left(a, x - 1)
Exit Sub
End If
Next x
Have I stumbled upon a suitable solution, or is this destined to break?
I ask only because I look at Doug Glancy's solution and it seems much more substantial.
(btw, I have not yet tested Doug's solution)
Here is a simple way which doesn't use RegEx. I am deliberately not using RegEx, as the other two answers are based on it. RegEx is definitely faster, but this is almost equally fast; the difference in speed is almost negligible.
Function GetWord(Rng As Range)
Dim i As Long, pos As Long
For i = 1 To Len(Rng.Value)
Select Case Asc(Mid(Rng.Value, i, 1))
Case 65 To 90, 97 To 122
Case Else: pos = i: Exit For
End Select
Next i
If pos = 0 Then pos = Len(Rng.Value) + 1 ' no non-letter found: keep the whole string
GetWord = Left(Rng.Value, pos - 1)
End Function
Usage:
=GetWord(A1)
EDIT:
Follow-up from comments: fine-tuned the code (courtesy of brettdj).
Function GetWord(Rng As Range)
Dim i As Long, pos As Long
Dim sString As String
sString = UCase$(Rng.Value)
For i = 1 To Len(sString)
Select Case Asc(Mid$(sString, i, 1))
Case 65 To 90
Case Else: pos = i: Exit For
End Select
Next i
If pos = 0 Then pos = Len(sString) + 1 ' no non-letter found: keep the whole string
GetWord = Left(Rng.Value, pos - 1)
End Function
More follow-up.
Here is something I had never tried before: I ran an actual test of my code against RegExp, and I was surprised to see that my code was faster, which I had not anticipated.
I tested it on 10k cells, each containing a string of length 2256.
The string that I put in cells A1:A10000 is
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccdddddddddddddddddddddddddddddddddddddddddddddddddddddddddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeSiddharth5RoutaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccdddddddddddddddddddddddddddddddddddddddddddddddddddddddddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeSiddharth5RoutaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccdddddddddddddddddddddddddddddddddddddddddddddddddddddddddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeSiddharth5RoutaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccdddddddddddddddddddddddddddddddddddddddddddddddddddddddddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeSiddharth5RoutaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccdddddddddddddddddddddddddddddddddddddddddddddddddddddddddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeSiddharth5RoutaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccdddddddddddddddddddddddddddddddddddddddddddddddddddddddddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeSiddharth5RoutaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccdddddddddddddddddddddddddddddddddddddddddddddddddddddddddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeSiddharth5Routaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccdddddddddddddddddddddddddddddddddddddddddddddddddddddddddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeSiddharth5Rout
Next I ran this test
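A comparison of this kind can be repeated with a rough timing harness such as the sketch below. It works on an in-memory string rather than on worksheet cells, and every name in it (LettersLoop, LettersRegex, CompareTimings), the test string and the run count are assumptions made for illustration, not the original test:

```vba
Option Explicit

' Loop-based version working on a plain string (hypothetical name)
Public Function LettersLoop(ByVal s As String) As String
    Dim i As Long
    For i = 1 To Len(s)
        Select Case Asc(Mid$(s, i, 1))
            Case 65 To 90, 97 To 122   ' A-Z, a-z
            Case Else
                LettersLoop = Left$(s, i - 1)
                Exit Function
        End Select
    Next i
    LettersLoop = s                    ' no non-letter found
End Function

' RegExp-based version; the object is created once by the caller
Public Function LettersRegex(ByVal s As String, ByVal re As Object) As String
    LettersRegex = re.Replace(s, "")
End Function

Public Sub CompareTimings()
    Const RUNS As Long = 10000
    Dim re As Object, s As String, i As Long, t As Single, dummy As String

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "[^A-Za-z][\s\S]*"    ' first non-letter and everything after it
    s = String$(1000, "a") & "5" & String$(1000, "b")

    t = Timer                          ' seconds since midnight
    For i = 1 To RUNS: dummy = LettersLoop(s): Next i
    Debug.Print "Loop:   "; Format$(Timer - t, "0.000"); " s"

    t = Timer
    For i = 1 To RUNS: dummy = LettersRegex(s, re): Next i
    Debug.Print "RegExp: "; Format$(Timer - t, "0.000"); " s"
End Sub
```

The timings depend heavily on where the first non-letter sits in the string and on whether the RegExp object is reused, so the numbers are only meaningful for data that resembles the real input.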
The regexp below removes everything from the first non A-Z character onward.
Function StrChange(strIn As String) As String
Dim objRegEx As Object
Set objRegEx = CreateObject("vbscript.regexp")
With objRegEx
.ignorecase = True
.Pattern = "^([a-z]+)([^a-z].*)"
.Global = True
StrChange = .Replace(strIn, "$1")
End With
End Function
You can use a simple regular expression to specify a numeral followed by anything and use this function to replace anything that matches that pattern:
Function Regex_Replace(strOriginal As String, strPattern As String, strReplacement, varIgnoreCase As Boolean) As String
Dim objRegExp As Object
Set objRegExp = CreateObject("vbscript.regexp")
With objRegExp
.Pattern = strPattern
.IgnoreCase = varIgnoreCase
.Global = True
End With
Regex_Replace = objRegExp.Replace(strOriginal, strReplacement)
Set objRegExp = Nothing
End Function
You'd call it like this:
Sub DeleteAfterNums()
Dim cell As Excel.Range
'Change "Selection" to your range
For Each cell In Selection
'"\d.+" is a numeral and whatever follows it
cell.Value = Regex_Replace(cell.Value, "\d.+", "", True)
Next cell
End Sub
Here is a lightweight and fast method that avoids regex/reference additions, thus helping with overhead and transportability should that be an advantage.
Public Function GetText(xValue As String) As Variant
For GetText = 1 To Len(xValue)
If UCase(Mid(xValue, GetText, 1)) Like "[!A-Z]" Then GetText = Left(xValue, GetText - 1): Exit Function
Next
GetText = xValue
End Function
It is then called with GetText("Submission String") from VBA, or with a leading "=" from within a cell formula.