VBA code for extracting 3 specific number patterns - regex

I am working in excel and need VBA code to extract 3 specific number patterns. In column A I have several rows of strings which include alphabetical characters, numbers, and punctuation. I need to remove all characters except those found in a 13-digit number (containing only numbers), a ten-digit number (containing only numbers), or a 9-digit number immediately followed by an "x" character. These are isbn numbers.
The remaining characters should be separated by one, and only one, space. So, for the following string found in A1: "There are several books here, including 0192145789 and 9781245687456. Also, the book with isbn 045789541x is included. This book is one of 100000000 copies."
The output should be: 0192145789 9781245687456 045789541x
Note that the number 100000000 should not be included in the output because it does not match any of the three patterns mentioned above.
I'm not opposed to a excel formula solution as opposed to VBA, but I assumed that VBA would be cleaner. Thanks in advance.

Here's a VBA function that will do specifically what you've specified
Function ExtractNumbers(inputStr As String) As String
Dim outputStr As String
Dim bNumDetected As Boolean
Dim numCount As Integer
Dim numStart As Integer
numCount = 0
bNumDetected = False
For i = 1 To Len(inputStr)
If IsNumeric(Mid(inputStr, i, 1)) Then
numCount = numCount + 1
If Not bNumDetected Then
bNumDetected = True
bNumStart = i
End If
If (numCount = 9 And Mid(inputStr, i + 1, 1) = "x") Or _
numCount = 13 And Not IsNumeric(Mid(inputStr, i + 1, 1)) Or _
numCount = 10 And Not IsNumeric(Mid(inputStr, i + 1, 1)) Then
If numCount = 9 Then
outputStr = outputStr & Mid(inputStr, bNumStart, numCount) & "x "
Else
outputStr = outputStr & Mid(inputStr, bNumStart, numCount) & " "
End If
End If
Else
numCount = 0
bNumDetected = False
End If
Next i
ExtractNumbers = Trim(outputStr)
End Function
It's nothing fancy, just uses string functions to goes through your string one character at a time looking for sections of 9 digit numbers ending with x, 10 digit numbers and 13 digit numbers and extracts them into a new string.
It's a UDF so you can use it as a formula in your workbook

Related

Extract first floating point number from right in excel string

I have an excel column full of strings, from which I am trying to extract one number.
Here is an example of a particular row (all rows follow this format):
5) something here 93 4. something else- here too(24+Mths) Y Y 249 5) 24+ Months 1) lots more rubbish text Y N some more rubbish text 24/04/2012 25/04/1999 0.263 10 L rubbish text 3521.37233 4130 rubbish text1041023.
I just need to extract the first decimal number from the right, in this case 3521.37233.
UPDATE: I tried using Text to Columns with space as a delimiter, but there are varying number of spaces between characters. Is there a way to delimit by any number of spaces?
This is a question that can be done swiftly by Regex. Unfortunately, Excel does not support Regex using Excel formula.
You can use the following UDF (add this to your workbook).
Usage:
if you want the last decimal number(i.e. 1st from the right): =StrRegex([cell reference],"[0-9]{1,}\.[0-9]{1,}",-1)
if you want all decimal numbers: =StrRegex([cell reference],"[0-9]{1,}\.[0-9]{1,}",0)
Function StrRegex(findIn As String, pattern As String, Optional matchID As Long = 1, Optional separator As String = ",", Optional ignoreCase As Boolean = False) As String ' matchID - 1-based, matchID=0 => return all
Application.Volatile (True)
Dim result As String
Dim allMatches As Object
Dim re As Object
Set re = CreateObject("vbscript.regexp")
Dim mc As Long
Dim i As Long
Dim j As Long
re.pattern = pattern
re.Global = True
re.ignoreCase = ignoreCase
Set allMatches = re.Execute(findIn)
mc = allMatches.count
If mc > 0 Then
If matchID > mc Then
result = CVErr(xlErrNA)
Else
If matchID > 0 Then
result = allMatches.Item(matchID - 1).Value
ElseIf matchID < 0 Then
result = allMatches.Item(mc + matchID).Value
Else
result = ""
For i = 0 To allMatches.count - 1
result = result & separator & allMatches.Item(i).Value
For j = 0 To allMatches.Item(i).submatches.count - 1
result = result & separator & allMatches.Item(i).submatches.Item(j)
Next
Next
If Len(result) <> 0 Then
result = Right(result, Len(result) - Len(separator))
End If
End If
End If
Else
result = ""
End If
StrRegex = result
End Function
For any interested in a native Excel function solution, if you have the FILTERXML function, you can use:
=FILTERXML("<t><s>" & SUBSTITUTE(A1," ","</s><s>")& "</s></t>","//s[number(.) = number(.) and contains(.,'.')][last()]")
The xPath looks for all nodes that are numeric, and also contain a dot, and then returns the last node that meets those specifications.
Note: If your Windows regional settings are using the dot as a thousands separator, this will not work as written. You would have to replace the . with your system decimal separator.

Using Regex To find total occurrence of &T [duplicate]

I have a string (for example: "Hello there. My name is John. I work very hard. Hello there!") and I am trying to find the number of occurrences of the string "hello there". So far, this is the code I have:
Dim input as String = "Hello there. My name is John. I work very hard. Hello there!"
Dim phrase as String = "hello there"
Dim Occurrences As Integer = 0
If input.toLower.Contains(phrase) = True Then
Occurrences = input.Split(phrase).Length
'REM: Do stuff
End If
Unfortunately, what this line of code seems to do is split the string every time it sees the first letter of phrase, in this case, h. So instead of the result Occurrences = 2 that I would hope for, I actually get a much larger number. I know that counting the number of splits in a string is a horrible way to go about doing this, even if I did get the correct answer, so could someone please help me out and provide some assistance?
Yet another idea:
Dim input As String = "Hello there. My name is John. I work very hard. Hello there!"
Dim phrase As String = "Hello there"
Dim Occurrences As Integer = (input.Length - input.Replace(phrase, String.Empty).Length) / phrase.Length
You just need to make sure that phrase.Length > 0.
the best way to do it is this:
Public Function countString(ByVal inputString As String, ByVal stringToBeSearchedInsideTheInputString as String) As Integer
Return System.Text.RegularExpressions.Regex.Split(inputString, stringToBeSearchedInsideTheInputString).Length -1
End Function
str="Thisissumlivinginsumgjhvgsum in the sum bcoz sum ot ih sum"
b= LCase(str)
array1=Split(b,"sum")
l=Ubound(array1)
msgbox l
the output gives u the no. of occurences of a string within another one.
You can create a Do Until loop that stops once an integer variable equals the length of the string you're checking. If the phrase exists, increment your occurences and add the length of the phrase plus the position in which it is found to the cursor variable. If the phrase can not be found, you are done searching (no more results), so set it to the length of the target string. To not count the same occurance more than once, check only from the cursor to the length of the target string in the Loop (strCheckThisString).
Dim input As String = "hello there. this is a test. hello there hello there!"
Dim phrase As String = "hello there"
Dim Occurrences As Integer = 0
Dim intCursor As Integer = 0
Do Until intCursor >= input.Length
Dim strCheckThisString As String = Mid(LCase(input), intCursor + 1, (Len(input) - intCursor))
Dim intPlaceOfPhrase As Integer = InStr(strCheckThisString, phrase)
If intPlaceOfPhrase > 0 Then
Occurrences += 1
intCursor += (intPlaceOfPhrase + Len(phrase) - 1)
Else
intCursor = input.Length
End If
Loop
You just have to change the input of the split function into a string array and then delare the StringSplitOptions.
Try out this line of code:
Occurrences = input.Split({phrase}, StringSplitOptions.None).Length
I haven't checked this, but I'm thinking you'll also have to account for the fact that occurrences would be too high due to the fact that you're splitting using your string and not actually counting how many times it is in the string, so I think Occurrences = Occurrences - 1
Hope this helps
You could create a recursive function using IndexOf. Passing the string to be searched and the string to locate, each recursion increments a Counter and sets the StartIndex to +1 the last found index, until the search string is no longer found. Function will require optional parameters Starting Position and Counter passed by reference:
Function InStrCount(ByVal SourceString As String, _
ByVal SearchString As String, _
Optional ByRef StartPos As Integer = 0, _
Optional ByRef Count As Integer = 0) As Integer
If SourceString.IndexOf(SearchString, StartPos) > -1 Then
Count += 1
InStrCount(SourceString, _
SearchString, _
SourceString.IndexOf(SearchString, StartPos) + 1, _
Count)
End If
Return Count
End Function
Call function by passing string to search and string to locate and, optionally, start position:
Dim input As String = "Hello there. My name is John. I work very hard. Hello there!"
Dim phrase As String = "hello there"
Dim Occurrences As Integer
Occurrances = InStrCount(input.ToLower, phrase.ToLower)
Note the use of .ToLower, which is used to ignore case in your comparison. Do not include this directive if you do wish comparison to be case specific.
One more solution based on InStr(i, str, substr) function (searching substr in str starting from i position, more info about InStr()):
Function findOccurancesCount(baseString, subString)
occurancesCount = 0
i = 1
Do
foundPosition = InStr(i, baseString, subString) 'searching from i position
If foundPosition > 0 Then 'substring is found at foundPosition index
occurancesCount = occurancesCount + 1 'count this occurance
i = foundPosition + 1 'searching from i+1 on the next cycle
End If
Loop While foundPosition <> 0
findOccurancesCount = occurancesCount
End Function
As soon as there is no substring found (InStr returns 0, instead of found substring position in base string), searching is over and occurances count is returned.
Looking at your original attempt, I have found that this should do the trick as "Split" creates an array.
Occurrences = input.split(phrase).ubound
This is CaSe sensitive, so in your case the phrase should equal "Hello there", as there is no "hello there" in the input
Expanding on Sumit Kumar's simple solution, here it is as a one-line working function:
Public Function fnStrCnt(ByVal str As String, ByVal substr As String) As Integer
fnStrCnt = UBound(Split(LCase(str), substr))
End Function
Demo:
Sub testit()
Dim thePhrase
thePhrase = "Once upon a midnight dreary while a man was in a house in the usa."
If fnStrCnt(thePhrase, " a ") > 1 Then
MsgBox "Found " & fnStrCnt(thePhrase, " a ") & " occurrences."
End If
End Sub 'testit()
I don't know if this is more obvious?
Starting from the beginning of longString check the next characters up to the number characters in phrase, if phrase is not found start looking from the second character etc. If it is found start agin from the current position plus the number of characters in phrase and increment the value of occurences
Module Module1
Sub Main()
Dim longString As String = "Hello there. My name is John. I work very hard. Hello there! Hello therehello there"
Dim phrase As String = "hello There"
Dim occurences As Integer = 0
Dim n As Integer = 0
Do Until n >= longString.Length - (phrase.Length - 1)
If longString.ToLower.Substring(n, phrase.Length).Contains(phrase.ToLower) Then
occurences += 1
n = n + (phrase.Length - 1)
End If
n += 1
Loop
Console.WriteLine(occurences)
End Sub
End Module
I used this in Vbscript, You can convert the same to VB.net as well
Dim str, strToFind
str = "sdfsdf:sdsdgs::"
strToFind = ":"
MsgBox GetNoOfOccurranceOf( strToFind, str)
Function GetNoOfOccurranceOf(ByVal subStringToFind As String, ByVal strReference As String)
Dim iTotalLength, newString, iTotalOccCount
iTotalLength = Len(strReference)
newString = Replace(strReference, subStringToFind, "")
iTotalOccCount = iTotalLength - Len(newString)
GetNoOfOccurranceOf = iTotalOccCount
End Function
I know this thread is really old, but I got another solution too:
Function countOccurencesOf(needle As String, s As String)
Dim count As Integer = 0
For i As Integer = 0 to s.Length - 1
If s.Substring(i).Startswith(needle) Then
count = count + 1
End If
Next
Return count
End Function

Need to separate out Letters from numbers in a string, via vb.net

I would like my code to find anywhere in the string that there is a number next to a letter and insert a space between the two.
I have several addresses that have no spaces and I need to insert spaces by separating out the numbers from the letters. For example:
123MainSt.Box123
Should be 123 Main St. Box 123
or
123Parkway134
should be: 123 Parkway 134
Here is where I started with my code but it was combing both numbers in the beginning....
Dim Digits() As String = Regex.Split(Address, "[^0-9]+")
'MsgBox(Digits.Count)
If Digits.Length > 2 Then
' For Each item As String In Digits
Dim Letters As String = Regex.Replace(Address, "(?:[0-9]+\.?[0-9]*|\.[0-9]+)", "")
rCell.Value = Digits(0) & Letters & Digits(1)
End If
If Digits.Length < 3 Then
If Address.Contains("th") Then
Else
Dim Part1 As String = Regex.Replace(Address, "[^0-9]+", "")
Dim Part2 As String = Regex.Replace(Address, "(?:[0-9]+\.?[0-9]*|\.[0-9]+)", "")
'MsgBox(Part1 & " " & Part2)
rCell.Value = Part1 & " " & Part2
End If
End If
I would like my code to find anywhere in the string that there is a number next to a letter and insert a space between the two.
The regex you may use is
Regex.Replace(input, "(?<=\d)(?=\p{L})|(?<=\p{L})(?=\d)", " ")
The first alternative - (?<=\d)(?=\p{L}) - matches the location between a digit and a letter, and the second alternative - (?<=\p{L})(?=\d) - matches a location between a letter and a digit.
Note that (?<=\p{L}) is a positive lookbehind that requires a letter before the current position and (?=\d) is a positive lookahead that requires a digit after the current position. These are lookarounds that do not consume text, thus, you replace empty spaces with (= insert) a space.
Here's a quick function:
Private Function AddSpaces(ByVal input As String) As String
If input.Length < 2 Then Return input
Dim ReturnValue As String = String.Empty
Dim CurrentChar, NextChar As String
For x As Integer = 1 To input.Length
CurrentChar = Mid(input, x, 1)
NextChar = Mid(input, x + 1, 1)
ReturnValue &= CurrentChar
If (IsNumeric(CurrentChar) AndAlso (Not IsNumeric(NextChar))) OrElse
((Not IsNumeric(CurrentChar)) AndAlso IsNumeric(NextChar)) Then
ReturnValue &= " "
End If
Next
Return ReturnValue
End Function

Excel UDF for capturing numbers within characters

I have a variable text field sitting in cell A1 which contains the following:
Text;#Number;#Text;#Number
This format can keep repeating, but the pattern is always Text;#Number.
The numbers can vary from 1 digit to n digits (limit 7)
Example:
Original Value
MyName;#123;#YourName;#3456;#HisName;#78
Required value:
123, 3456, 78
The field is too variable for excel formulas from my understanding.
I tried using regexp but I am a beginner when it comes to coding. if you can break down the code with some explanation text, it would be much appreciated.
I have tried some of the suggestions below and they work perfectly. One more question.
Now that I can split the numbers from the text, is there any way to utilize the code below and add another layer, where we split the numbers into x cells.
For example: once we run the function, if we get 1234, 567 in the same cell, the function would put 1234 in cell B2, and 567 in cell C2. This would keep updating all cells in the same row until the string has exhausted all of the numbers that are retrieved from the function.
Thanks
This is the John Coleman's suggested method:
Public Function GetTheNumbers(st As String) As String
ary = Split(st, ";#")
GetTheNumbers = ""
For Each a In ary
If IsNumeric(a) Then
If GetTheNumbers = "" Then
GetTheNumbers = a
Else
GetTheNumbers = GetTheNumbers & ", " & a
End If
End If
Next a
End Function
If the pattern is fixed, and the location of the numbers never changes, you can assume the numbers will be located in the even places in the string. This means that in the array result of a split on the source string, you can use the odd indexes of the resulting array. For example in this string "Text;#Number;#Text;#Number" array indexes 1, 3 would be the numbers ("Text(0);#Number(1);#Text(2);#Number(3)"). I think this method is easier and safer to use if the pattern is indeed fixed, as it avoids the need to verify data types.
Public Function GetNums(src As String) As String
Dim arr
Dim i As Integer
Dim result As String
arr = Split(src, ";#") ' Split the string to an array.
result = ""
For i = 1 To UBound(arr) Step 2 ' Loop through the array, starting with the second item, and skipping one item (using Step 2).
result = result & arr(i) & ", "
Next
If Len(result) > 2 Then
GetNums = Left(result, Len(result) - 2) ' Remove the extra ", " at the end of the the result string.
Else
GetNums = ""
End If
End Function
The numbers can vary from 1 digit to n digits (limit 7)
None of the other responses seems to take the provided parameters into consideration so I kludged together a true regex solution.
Option Explicit
Option Base 0 '<~~this is the default but I've included it because it has to be 0
Function numsOnly(str As String, _
Optional delim As String = ", ")
Dim n As Long, nums() As Variant
Static rgx As Object, cmat As Object
'with rgx as static, it only has to be created once; beneficial when filling a long column with this UDF
If rgx Is Nothing Then
Set rgx = CreateObject("VBScript.RegExp")
End If
numsOnly = vbNullString
With rgx
.Global = True
.MultiLine = False
.Pattern = "[0-9]{1,7}"
If .Test(str) Then
Set cmat = .Execute(str)
'resize the nums array to accept the matches
ReDim nums(cmat.Count - 1)
'populate the nums array with the matches
For n = LBound(nums) To UBound(nums)
nums(n) = cmat.Item(n)
Next n
'convert the nums array to a delimited string
numsOnly = Join(nums, delim)
End If
End With
End Function
      
Regexp option that uses Replace
Sub Test()
Debug.Print StrOut("MyName;#123;#YourName;#3456;#HisName;#78")
End Sub
function
Option Explicit
Function StrOut(strIn As String) As String
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Pattern = "(^|.+?)(\d{1,7})"
.Global = True
If .Test(strIn) Then
StrOut = .Replace(strIn, "$2, ")
StrOut = Left$(StrOut, Len(StrOut) - 2)
Else
StrOut = "Nothing"
End If
End With
End Function

Keep trailing zeroes in regex matching formula

I have a function I've written to handle calculations of percent reductions with significant digits, and I'm having a problem with keeping trailing zeroes.
The function:
Function RegexReduction(IValue As Double, EValue As Double) As String
Dim TempPercent As Double
Dim TempString As String
Dim NumFormat As String
Dim DecPlaces As Long
Dim regex As Object
Dim rxMatches As Object
TempPercent = (1 - EValue / IValue)
NumFormat = "0"
Set regex = CreateObject("VBScript.RegExp")
With regex
.Pattern = "([^1-8])*[0-8]{1}[0-9]?"
.Global = False
End With
Set rxMatches = regex.Execute(CStr(TempPercent))
If rxMatches.Count <> 0 Then
TempString = rxMatches.Item(0)
DecPlaces = Len(Split(TempString, ".")(1)) - 2
If DecPlaces > 0 Then NumFormat = NumFormat & "." & String(DecPlaces, "0")
End If
RegexReduction = Format(TempPercent, NumFormat & "%")
End Function
This trims percentages to two digits after any leading zeroes or nines:
99.999954165% -> 99.99954%
34.564968% -> 35%
0.000516% -> 0.00052%
The one problem I've found isn't related to the regex, but to Excel's rounding:
99.50% -> 99.5%
Is there a solution that will save trailing zeroes that could be implemented here?
I suggest a version of your function that uses LTrim in combination with Replace instead of the (costly) regular expression to calculate the value of DecPlaces. The calculation of DecPlaces has become a "one-liner".
The rest of the code is the same except for the additional call to CDec to avoid CStr from returning a scientific notation (like 1.123642E-12) when the value is tiny.
Function Reduction(IValue As Double, EValue As Double) As String
Dim TempPercent As Double
Dim TempString As String
Dim NumFormat As String
Dim DecPlaces As Long
TempPercent = (1 - EValue / IValue)
' Apply CDec so tiny numbers do not get scientific notation
TempString = CStr(CDec(TempPercent))
' Count number of significant digits present by trimming away all other chars,
' and subtract from total length to get number of decimals to display
DecPlaces = Len(TempString) - 2 - _
Len(LTrim(Replace(Replace(Replace(TempString, "0"," "), "9"," "), "."," ")))
' Prepare format of decimals, if any
If DecPlaces > 0 Then NumFormat = "." & String(DecPlaces, "0")
' Apply format
Reduction = Format(TempPercent, "0" & NumFormat & "%")
End Function
It is assumed that TempPercent evaluates to a value between 0 and 1.
Comments on your code
You wrote:
The one problem I've found isn't related to the regex, but to Excel's rounding:
99.50% -> 99.5%
This is actually not related to Excel's rounding. In your code the following
DecPlaces = Len(Split(TempString, ".")(1)) - 2
will evaluate to Len(Split("0.995", ".")(1)) - 2, which is 1, and so the format you apply is 0.0%, explaining the output you get.
Also realise that although you have a capturing group in your regular expression, you do not actually use it. rxMatches.Item(0) will give you the complete matched string, not only the match with the capture group.
You apply a number format of 0% for the case the regular expression does not yield a match. Any number that has no other digits than 0 and 9 will not match. For instance 0.099 should be displayed with format 0.000% to give 9.900, but the format used is 0% as you have no Else block treating this case.
Finally, CStr can turn numbers into scientific notation, which will give wrong results as well. It seems with CDec this can be avoided.
Here is a UDF that attempts to 'read' the incoming raw (non-percentage) value in order to determine the number of decimal places to include.
Function udf_Specific_Scope(rng As Range)
Dim i As Long, str As String
str = rng.Value2 'raw value is 0.999506 for 99.9506%
For i = 1 To Len(str) - 1
If Asc(Mid(str, i, 1)) <> 48 And _
Asc(Mid(str, i, 1)) <> 57 And _
Asc(Mid(str, i, 1)) <> 46 Then _
Exit For
Next i
If InStr(1, str, Chr(46)) < i - 1 Then
udf_Specific_Scope = Val(Format(rng.Value2 * 100, "0." & String(i - 3, Chr(48)))) & Chr(37)
Else
udf_Specific_Scope = Format(rng.Value2, "0%")
End If
End Function
    
The disadvantage here is removing the numerical value from the cell entry but that mirrors your original RegEx method. Ideally, something like the above could be written as a sub based on the Application.Selection property. Just highlight (aka Select) some cells, run the sub and it assigns a cell number format with the correct number of decimals to each in the selection.