Slight adaptation of a User Defined Function - regex

I would like to extract a combination of text and numbers from a larger string located within a column within excel.
The constants I have to work with is that each Text string will
•either start with a A, C or S, and
•will always be 7 Characters long
•the position of he string I would like to extract varies
The code I have been using which has been working efficiently is;
Public Function Xtractor(r As Range) As String
Dim a, ary
ary = Split(r.Text, " ")
For Each a In ary
If Len(a) = 7 And a Like "[SAC]*" Then
Xtractor = a
Exit Function
End If
Next a
Xtractor = ""
End Function
However today I have learnt that sometimes my data may include scenarios like this;
What I would like is to adapt my code so If the 8th character is "Underscore" and the 1st character of the 7 characters is either S, A or C please extract up until the "Underscore"
Secondly I would like to exclude commons words like "Support" & "Collect" from being extracted.
Finally the 7th letter should be a number
Any ideas around this would be much appreciated.
Thanks

try this
ary = Split(Replace(r.Text, "_", " "))
or
ary = Split(Replace(r.Text, "_", " ")," ")
result will be same for both variants
test
update
Do you know how I could leave the result blank if the 7th character returned a letter?
Public Function Xtractor(r As Range) As String
Dim a, ary
ary = Split(Replace(r.Text, "_", " "))
For Each a In ary
If Len(a) = 7 And a Like "[SAC]*" And IsNumeric(Mid(a, 7, 1)) Then
Xtractor = a
Exit Function
End If
Next a
Xtractor = ""
End Function
test

Add Microsoft VBScript Regular Expressions 5.5 to project references. Use the following code to test matching and extracting with Xtractor:
Public Function Xtractor(ByVal p_val As String) As String
Xtractor = ""
Dim ary As String, v_re As New VBScript_RegExp_55.RegExp, Matches
v_re.Pattern = "^([SAC][^_]{1,6})_?"
Set Matches = v_re.Execute(p_val)
If Matches.Count > 0 Then Xtractor = Matches(0).SubMatches(0) Else Xtractor = ""
End Function
Sub test_Xtractor(p_cur As Range, p_val As String, p_expected As String)
Dim v_cur As Range, v_res As Range
p_cur.Value = p_val
Set v_cur = p_cur.Offset(columnOffset:=1)
v_cur.FormulaR1C1 = "='" & ThisWorkbook.Name & "'!Xtractor(RC[-1])"
Set v_res = v_cur.Offset(columnOffset:=1)
v_res.FormulaR1C1 = "=RC[-1]=""" & p_expected & """"
Debug.Print p_val; "->"; v_cur.Value; ":"; v_res.Value
End Sub
Sub test()
test_Xtractor ActiveCell, "A612002_MDC_308", "A612002"
test_Xtractor ActiveCell.Offset(1), "B612002_MDC_308", ""
test_Xtractor ActiveCell.Offset(2), "SUTP038_MDC_3", "SUTP038"
test_Xtractor ActiveCell.Offset(3), "KUTP038_MDC_3", ""
End Sub
Choose the workbook and cell for writing test fixture, then run test from the VBA Editor.
Output in the Immediate window (Ctrl+G):
A612002_MDC_308->A612002:True
B612002_MDC_308->:True
SUTP038_MDC_3->SUTP038:True
KUTP038_MDC_3->:True
UPD
Isit possible to ammend this code so if the 7th character is a letter to return blank?
Replace line with assign to v_re by the following:
v_re.Pattern = "^([SAC](?![^_]{5}[A-Z]_?)[^_]{1,6})_?"
v_re.IgnoreCase = True
And add to the test suite:
test_Xtractor ActiveCell.Offset(4), "SUTP03A_MDC_3", ""
Output:
A612002_MDC_308->A612002:True
B612002_MDC_308->:True
SUTP038_MDC_3->SUTP038:True
KUTP038_MDC_3->:True
SUTP03A_MDC_3->:True
I inserted negative lookahead subrule (?![^_]{5}[A-Z]_?) to reject SUTP03A_MDC_3. But pay attention: the rejecting rule is applied exactly to the 7th character. Now v_re.IgnoreCase set to True, but if only capitalized characters are allowed, set it to False. See also Regular Expression Syntax on MSDN.

Related

UDF (Regular expression) to match a string variants with some exclusions

I need to use (Regular expression) on the string Mod* followed by a specific one character e.g. "A" , like:
Mod A , Mod_A , Module xx A , Modules (A & B) and so on.
But, with the following conditions:
(1)- if the cell contains any of (Modif* or Moder* or Modr*) and Mod* Plus my specific character then the result is True
(2)- if the cell contains any of (Modif* or Moder* or Modr*) and not Mod* Plus my specific character then the result is False
Please this example and the expected result:
Item Description
Expected Result of RegexMatch
new modified of module A 1
TRUE
new modification of mod A
TRUE
new moderate of mod_A
TRUE
to modules (A & B)
TRUE
new modified and moderate A 1
FALSE
new modification of  A
FALSE
new moderate of modify
FALSE
to modules (D & E)
FALSE
Public Function RegexMatch(str) As Boolean
Dim tbx2 As String: tbx2 = "A" 'ActiveSheet.TextBox2.Value
Static re As New RegExp
re.Pattern = "\b[M]od(?!erate).*\b[" & tbx2 & "]\b"
re.IgnoreCase = True
RegexMatch = re.Test(str)
End Function
In advance, great thanks for your kindly help.
Not sure if I understand your requirements correctly: You want rows that contain a word that starts with "mod", but words starting with "Modif" or "Moder" or "Modr" doesn't count. Additionally, a module character (eg "A") needs to be present.
I usually get dizzy when I see longer regex terms, so I try to program some lines of code instead. The following function replaces special characters like "(" or "_" with blanks, splits the string into words and check the content word by word. Easy to understand, easy to adapt:
Function CheckModul(s As String, modulChar As String) As Boolean
Dim words() As String
words = Split(replaceSpecialChars(s), " ")
Dim i As Long, hasModul As Boolean, hasModulChar As Boolean
For i = 0 To UBound(words)
Dim word As String
word = UCase(words(i))
If word Like "MOD*" _
And Not word Like "MODIF*" _
And Not word Like "MODER*" _
And Not word Like "MODR*" Then
hasModul = True
End If
If word = modulChar Then
hasModulChar = True
End If
Next
CheckModul = hasModul And hasModulChar
End Function
Function replaceSpecialChars(ByVal s As String) As String
Dim i As Long
replaceSpecialChars = s
For i = 1 To Len(replaceSpecialChars)
If Mid(replaceSpecialChars, i, 1) Like "[!0-9A-Za-z]" Then Mid(replaceSpecialChars, i) = " "
Next
End Function
Tested as UDF with your data:

Vba: Regular expression to count the number of words in a string delimited by special characters

Need some help writing a regular expression to count the number of words in a string (Please note the data is a html string, which needs to be placed into a spreadsheet) when separated either by any special characters like . , - , +, /, Tab etc. Count should exclude special characters.
**Original String** **End Result**
Ex : One -> 1
One. -> 1
One Two -> 2
One.Two -> 2
One Two. -> 2
One.Two. -> 2
One.Tw.o -> 3
Updated
I think you asked a valuable question and this downvoting is not fair!
Function WCount(ByVal strWrd As String) As Long
'Variable declaration
Dim Delimiters() As Variant
Dim Delimiter As Variant
'Initialization
Delimiters = Array("+", "-", ".", "/", Chr(13), Chr(9)) 'Define your delimiter characters here.
'Core
For Each Delimiter In Delimiters
strWrd = Replace(strWrd, Delimiter, " ")
Next Delimiter
strWrd = Trim(strWrd)
Do While InStr(1, strWrd, " ") > 0
strWrd = Replace(strWrd, " ", " ")
Loop
WCount = UBound(Split(strWrd, " ")) + 1
End Function
________________
You can use this function as a UDF in excel formulas or can use in another VBA codes.
Using in formula
=WCOUNT("One.Two.Three.") or =WCOUNT($A$1") assuming your string is in A1 cell.
Using in VBA
(With assume passing your string with Str argument.)
Sub test()
Debug.Print WCount(Str)
End Sub
Regards.
Update
I have test your text as shown below.
copy your text in a Cell of Excel as shown.
The code updated for Line break and Tab characters and count your string words correctly now.
Try this code, all necessary comments are in code:
Sub SpecialSplit()
Dim i As Long
Dim str As String
Dim arr() As String
Dim delimeters() As String
'here you define all special delimeters you want to use
delimetres = Array(".", "+", "-", "/")
For i = 1 To 9
str = Cells(i, 1).Value
'this will protect us from situation where last character is delimeter and we have additional empty string
str = Left(str, Len(str) - 1)
'here we replace all special delimeters with space to simplify
For Each delimeter In delimetres
str = Replace(str, delimeter, " ")
Next
arr = Split(str)
Cells(i, 2).Value = UBound(arr) - LBound(arr) + 1
Next
End Sub
With your posted data following RegExp is working correctly. Put this in General Module in Visual Basic Editor.
Public Function CountWords(strInput As String) As Long
Dim objMatches
With CreateObject("VBScript.RegExp")
.Global = True
.MultiLine = True
.Pattern = "\w+"
Set objMatches = .Execute(strInput)
CountWords = objMatches.Count
End With
End Function
You have to use it like a normal formula. e.g. assuming data is in cell A1 function would be:
=CountWords(A1)
For your information, it can be also achieved through formula if number of characters are specific like so:
=LEN(TRIM(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(TRIM(A1),"."," "),","," "),"-"," "),"+"," "),"/"," "),"\"," ")))-LEN(SUBSTITUTE(TRIM(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(TRIM(A1),"."," "),","," "),"-"," "),"+"," "),"/"," "),"\"," "))," ",""))+1

Getting numbers out of cells with text and producing a list with all these numbers [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 6 years ago.
Improve this question
I have a column given to me in a spreadsheet which looks like that:
What I need is to get all the references out, the ones in square brackets, to provide with the full list to a user:
... and then get a full list of all references, as follows:
Does anyone have an idea of how I can do this using any Excel formulas/filtering or maybe VBA?
assuming:
worksheet to process named after "pressure"
column "A" with cells to get references out of
column "B" to write corresponding extracted references in
column "C"to write full list of all references in
you could try this
Option Explicit
Sub main()
Dim cell As Range
Dim references As String
Dim referencesArr As Variant
With Worksheets("pressure") '<-- change "pressure" to your actual worksheet name
For Each cell In .Range("A1", .Cells(.Rows.Count, 1).End(xlUp))
references = references & GetReferences(cell) & "; "
Next cell
If references <> "" Then
referencesArr = Split(Left(references, Len(references) - 2), ";")
.Range("C1").Resize(UBound(referencesArr)).Value = Application.Transpose(referencesArr)
End If
End With
End Sub
Function GetReferences(rng As Range) As String
Dim arr As Variant, iElem As Long
Dim strng As String
With rng
arr = Split(Replace(Replace(.Value, "[", "|["), "]", "]|"), "|")
For iElem = 1 To UBound(arr) - 1 Step 2
strng = strng & Mid(CStr(arr(iElem)), 2, Len(CStr(arr(iElem))) - 2) & "; "
Next iElem
End With
If strng <> "" Then
GetReferences = Left(strng, Len(strng) - 2)
rng.Offset(, 1) = GetReferences
End If
End Function
There are many examples of regex number parsing¹ from text on this site. Pulling numbers from narrative text is one of the easier regular expression 'patterns'² to construct; especially so with a fixed number of digits regardless of delimiter or grouping character(s).
Put the following into a standard module code sheet.
Option Explicit
Option Base 0 '<~~this is the default but I've included it because it has to be 0
Function numberParse(str As String, _
Optional ndx As Integer = 0, _
Optional delim As String = "; ") As Variant
Dim n As Long, nums() As Variant
Static rgx As Object, cmat As Object
'with rgx as static, it only has to be created once; beneficial when filling a long column with this UDF
If rgx Is Nothing Then
Set rgx = CreateObject("VBScript.RegExp")
Else
Set cmat = Nothing
End If
numberParse = vbNullString
With rgx
.Global = True
.MultiLine = True
.Pattern = "[0-9]{4}"
If .Test(str) Then
Set cmat = .Execute(str)
If CBool(ndx) Then
'pull the index of the array of matches
numberParse = cmat.Item(ndx - 1)
Else
'resize the nums array to accept the matches
ReDim nums(cmat.Count - 1)
'populate the nums array with the matches
For n = LBound(nums) To UBound(nums)
nums(n) = cmat.Item(n)
Next n
'convert the nums array to a delimited string
numberParse = Join(nums, delim)
End If
End If
End With
End Function
With your blurb in A2, put the following into B2,
=numberParse(A2)
With your blurb in A2, put the following into A4 and fill down,
=numberParse(A$2, ROW(1:1))
Your results should resemble the following,
¹ The above was modified from my response in Excel UDF for capturing numbers within characters which wasn't that hard to find.
² See How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops for more information.
For a quick start, you can use =MID(A1,SEARCH("[",A1)+1,SEARCH("]",A1)-SEARCH("[",A1)-1) to extract the text between the brackets. Then you're left with a string, separated by semicolons.
Then, you can run this sub (with tweaking most likely, to narrow down the ranges):
Sub splitSemiColons()
Dim myArray() As String
Dim colToUse As Long
colToUse = 3
myArray = Split(Range("B1"), ";")
Dim i As Long
For i = LBound(myArray) To UBound(myArray)
Cells(i + 1, colToUse).Value = myArray(i)
Next i
End Sub
Or, you can avoid this macro, and just use Data --> Text to Columns --> Use ; delimiter, then copy and paste transposed.

Excel UDF for capturing numbers within characters

I have a variable text field sitting in cell A1 which contains the following:
Text;#Number;#Text;#Number
This format can keep repeating, but the pattern is always Text;#Number.
The numbers can vary from 1 digit to n digits (limit 7)
Example:
Original Value
MyName;#123;#YourName;#3456;#HisName;#78
Required value:
123, 3456, 78
The field is too variable for excel formulas from my understanding.
I tried using regexp but I am a beginner when it comes to coding. if you can break down the code with some explanation text, it would be much appreciated.
I have tried some of the suggestions below and they work perfectly. One more question.
Now that I can split the numbers from the text, is there any way to utilize the code below and add another layer, where we split the numbers into x cells.
For example: once we run the function, if we get 1234, 567 in the same cell, the function would put 1234 in cell B2, and 567 in cell C2. This would keep updating all cells in the same row until the string has exhausted all of the numbers that are retrieved from the function.
Thanks
This is the John Coleman's suggested method:
Public Function GetTheNumbers(st As String) As String
ary = Split(st, ";#")
GetTheNumbers = ""
For Each a In ary
If IsNumeric(a) Then
If GetTheNumbers = "" Then
GetTheNumbers = a
Else
GetTheNumbers = GetTheNumbers & ", " & a
End If
End If
Next a
End Function
If the pattern is fixed, and the location of the numbers never changes, you can assume the numbers will be located in the even places in the string. This means that in the array result of a split on the source string, you can use the odd indexes of the resulting array. For example in this string "Text;#Number;#Text;#Number" array indexes 1, 3 would be the numbers ("Text(0);#Number(1);#Text(2);#Number(3)"). I think this method is easier and safer to use if the pattern is indeed fixed, as it avoids the need to verify data types.
Public Function GetNums(src As String) As String
Dim arr
Dim i As Integer
Dim result As String
arr = Split(src, ";#") ' Split the string to an array.
result = ""
For i = 1 To UBound(arr) Step 2 ' Loop through the array, starting with the second item, and skipping one item (using Step 2).
result = result & arr(i) & ", "
Next
If Len(result) > 2 Then
GetNums = Left(result, Len(result) - 2) ' Remove the extra ", " at the end of the the result string.
Else
GetNums = ""
End If
End Function
The numbers can vary from 1 digit to n digits (limit 7)
None of the other responses seems to take the provided parameters into consideration so I kludged together a true regex solution.
Option Explicit
Option Base 0 '<~~this is the default but I've included it because it has to be 0
Function numsOnly(str As String, _
Optional delim As String = ", ")
Dim n As Long, nums() As Variant
Static rgx As Object, cmat As Object
'with rgx as static, it only has to be created once; beneficial when filling a long column with this UDF
If rgx Is Nothing Then
Set rgx = CreateObject("VBScript.RegExp")
End If
numsOnly = vbNullString
With rgx
.Global = True
.MultiLine = False
.Pattern = "[0-9]{1,7}"
If .Test(str) Then
Set cmat = .Execute(str)
'resize the nums array to accept the matches
ReDim nums(cmat.Count - 1)
'populate the nums array with the matches
For n = LBound(nums) To UBound(nums)
nums(n) = cmat.Item(n)
Next n
'convert the nums array to a delimited string
numsOnly = Join(nums, delim)
End If
End With
End Function
      
Regexp option that uses Replace
Sub Test()
Debug.Print StrOut("MyName;#123;#YourName;#3456;#HisName;#78")
End Sub
function
Option Explicit
Function StrOut(strIn As String) As String
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Pattern = "(^|.+?)(\d{1,7})"
.Global = True
If .Test(strIn) Then
StrOut = .Replace(strIn, "$2, ")
StrOut = Left$(StrOut, Len(StrOut) - 2)
Else
StrOut = "Nothing"
End If
End With
End Function

VBA: REGEX LOOKBEHIND MS ACCESS 2010

I have a function that was written so that VBA can be used in MS Access
I wish to do the following
I have set up my code below. Everything before the product works perfectly but trying to get the information behind just returns "" which is strange as when i execute it within Notepad++ it works perfectly fine
So it looks for the letters MIP and one of the 3 letter codes (any of them)
StringToCheck = "MADHUBESOMIPTDTLTRCOYORGLEJ"
' PART 1
' If MIP appears in the string, then delete any of the following codes if they exist - DOM, DOX, DDI, ECX, LOW, WPX, SDX, DD6, DES, BDX, CMX,
' WMX, TDX, TDT, BSA, EPA, EPP, ACP, ACA, ACE, ACS, GMB, MAL, USP, NWP.
' EXAMPLE 1. Flagged as: MADHUBESOMIPTDTLTRCOYORGLEJ, should be MADHUBESOMIPLTRCOYORGLEJ
Do While regexp(StringToCheck, "MIP(DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)", False) <> ""
' SELECT EVERYTHING BEFORE THE THREE LETTER CODES
strPart1 = regexp(StringToCheck, ".*^[^_]+(?=DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)", False)
' SELECT EVERYTHING AFTER THE THREE LETTER CODES
strPart2 = regexp(StringToCheck, "(?<=(DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX).*", False)
StringToCheck = strPart1 & strPart2
Loop
The function i am using which i have taken from the internet is below
Function regexp(StringToCheck As Variant, PatternToUse As String, Optional CaseSensitive As Boolean = True) As String
On Error GoTo RefErr:
Dim re As New regexp
re.Pattern = PatternToUse
re.Global = False
re.IgnoreCase = Not CaseSensitive
Dim m
For Each m In re.Execute(StringToCheck)
regexp = UCase(m.Value)
Next
RefErr:
On Error Resume Next
End Function
Just do it in two steps:
Check if MIP is in the string
If it is, remove the other codes.
Like this:
Sub Test()
Dim StringToCheck As String
StringToCheck = "MADHUBESOMIPTDTLTRCOYORGLEJ"
Debug.Print StringToCheck
Debug.Print CleanupString(StringToCheck)
End Sub
Function CleanupString(str As String) As String
Dim reCheck As New RegExp
Dim reCodes As New RegExp
reCheck.Pattern = "^(?:...)*?MIP"
reCodes.Pattern = "^((?:...)*?)(?:DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)"
reCodes.Global = True
If reCheck.Test(str) Then
While reCodes.Test(str)
str = reCodes.Replace(str, "$1")
Wend
End If
CleanupString = str
End Function
Note that the purpose of (?:...)*? is to group the letters in threes.
Since the VBScript regular expression engine does support look-aheads, you can of course also do it in a single regex:
Function CleanupString(str As String) As String
Dim reClean As New RegExp
reClean.Pattern = "^(?=(?:...)*?MIP)((?:...)*?)(?:DOM|DOX|DDI|ECX|LOW|WPX|SDX|DD6|DES|BDX|CMX|WMX|TDX|TDT|BSA|EPA|EPP|ACP|ACA|ACE|ACS|GMB|MAL|USP|NWP|BBX)"
While reClean.Test(str)
str = reClean.Replace(str, "$1")
Wend
CleanupString = str
End Function
Personally, I like the two-step check/remove pattern better because it is a lot more obvious and therefore more maintainable.
Non RE option:
Function DeMIPString(StringToCheck As String) As String
If Not InStr(StringToCheck, "MIP") Then
DeMIPString = StringToCheck
Else
Dim i As Long
For i = 1 To Len(StringToCheck) Step 3
Select Case Mid$(StringToCheck, i, 3)
Case "MIP", "DOM", "DOX", "DDI", "ECX", "LOW", "WPX", "SDX", "DD6", "DES", "BDX", "CMX", "WMX", "TDX", "TDT", "BSA", "EPA", "EPP", "ACP", "ACA", "ACE", "ACS", "GMB", "MAL", "USP", "NWP":
Case Else
DeMIPString = DeMIPString & Mid$(StringToCheck, i, 3)
End Select
Next
End If
End Function