Extracting specific words from a single cell containing text string - regex

Basically I have a very long text containing multiple spaces, special characters, etc. in one cell in an excel file and I need to extract only specific words from it, each one to a seperate cell in another column.
What I'm looing for:
symbols that are always 9 characters in lenght, and always contain at least one number (up to 9).
So for an example in A1 I have:
euhe: djj33 dkdakofja. kaowdk ---------- jffjbrjjjj j jrjj 08/01/2222 999ABC123
fjfjfj 321XXX888 .... ........ 123456789AA
And in the end I want to have:
999ABC123 in B1
and
321XXX888 in B2.
Right now I'm doing this by using Text to columns feature and then just looking for specific words manually but sometimes the volume is so big it takes too much time and would be cool to automate this.
Can anyone help with this? Thank you!
EDIT:
More examples:
INPUT: '10/01/2016 1,060X 8.999%!!! 1.33 0.666 928888XE0'
OUTPUT: '928888XE0'
INPUT: 'ABCDEBATX ..... ,,00,001% 20///^^ addcA7 7777a 123456789 djaoij8888888 0.000001 12#'
OUTPUT: '123456789'
INPUT: 'FAR687465 B22222222 __ djj^66 20/20/20/20 1:'
OUTPUT: 'FAR687465' in B1 'B22222222' in B2
INPUT: 'fil476 .00 20/.. BUT AAAAAAAAA k98776 000.0001'
OUTPUT: 'blank'
To clarify: the 9 character string can be anywhere, there is no rule what is before or after them, they can be next to each other, or just at the beginning and end of this wall of text, no rules here, the text is random, taken out of some system, can contain dates, etc anything... The symbols are always 9 characters long and they are not the only 9 character symbols in the text. I call them symbols but they should only consist of numbers and letters. Can be only numbers, but never only letters. A1 cell can contain multiple spaces/tabs between words/symbols.
Also if possible to do this not only for A1, but the whole column A until it finds the first blank cell.

Try this code
Sub Test()
Dim r As Range
Dim i As Long
Dim m As Long
With CreateObject("VBScript.RegExp")
.Global = True
.Pattern = "\b[a-zA-Z\d]{9}\b"
For Each r In Range("A1", Range("A" & Rows.Count).End(xlUp))
If .Test(r.Value) Then
For i = 0 To .Execute(r.Value).Count - 1
If CBool(.Execute(r.Value)(i) Like "*[0-9]*") Then
m = IIf(Cells(1, 2).Value = "", 1, Cells(Rows.Count, 2).End(xlUp).Row + 1)
Cells(m, 2).Value = .Execute(r.Value)(i)
End If
Next i
End If
Next r
End With
End Sub

This bit of code is almost it... just need to check the strings... but excel crashes on the Str line of code
Sub Test()
Dim Outputs, i As Integer, LastRow As Long, Prueba, Prueba2
Outputs = Split(Range("A1"), " ")
For i = 0 To UBound(Outputs)
If Len(Outputs(i)) = 9 Then
Prueba = 0
Prueba2 = 0
On Error Resume Next
Prueba = Val(Outputs(i))
Prueba2 = Str(Outputs(i))
On Error GoTo 0
If Prueba <> 0 And Prueba2 <> 0 Then
LastRow = Range("B10000").End(xlUp).Row + 1
Cells(LastRow, 2) = Outputs(i)
End If
End If
Next i
End Sub
If someone could help to set the string check.. that would do the thing I guess.

Related

Regex to move cells based on leading spaces

Ill start by saying that I'm not a coder, only someone who very rarely dabbles to make spreadsheets slightly more bearable.
I currently have some data that I need to break out into columns based on the number of leading spaces in the cell. Basically, if the cell begins with 2 spaces move it 1 column to the right, If there are 3 spaces, move it 2 columns to the right and so on.
I realised that I would need to use regex for this as FIND and LEFT would match all of the 3 space cells when searching for 2 space cells.
So I searched around and cobbled together this mess
Sub MoveStuff()
Dim RE as Object
Dim LSearchRow As Long
Dim LCopyToColumn As Long
Set RE = CreateObject("vbscript.regexp")
RE.Pattern = " (a-zA-Z)"
LSearchRow = 2
While Len(Cells(LSearchRow, "B").Value) > 0
If RE.Test(Cells(LSearchRow, "B").Value) Then
Up to here, it will match correctly, but I don't know how to get it to shift the cell over. Then I'll obviously need to have multiple RE.Patterns and If statements to match 3 and 4 space cells
A general solution is the following. You count the leading spaces (let's call this value N), then remove them from your cell value and copy the cell N column on the right.
Public Sub movestuff()
Dim curr_row, curr_column, s
curr_column = 2 'COLUMN "B"
curr_row = 1
While (ActiveSheet.Cells(curr_row, curr_column) <> "")
s = ActiveSheet.Cells(curr_row, curr_column)
For x = 1 To Len(s) Step 1
If Mid(s, x, 1) <> " " Then
Exit For
End If
Next
s = Mid(s, x)
ActiveSheet.Cells(curr_row, curr_column + (x - 1)) = s
curr_row = curr_row + 1
Wend
End Sub

Getting numbers out of cells with text and producing a list with all these numbers [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 6 years ago.
Improve this question
I have a column given to me in a spreadsheet which looks like that:
What I need is to get all the references out, the ones in square brackets, to provide with the full list to a user:
... and then get a full list of all references, as follows:
Does anyone have an idea of how I can do this using any Excel formulas/filtering or maybe VBA?
assuming:
worksheet to process named after "pressure"
column "A" with cells to get references out of
column "B" to write corresponding extracted references in
column "C"to write full list of all references in
you could try this
Option Explicit
Sub main()
Dim cell As Range
Dim references As String
Dim referencesArr As Variant
With Worksheets("pressure") '<-- change "pressure" to your actual worksheet name
For Each cell In .Range("A1", .Cells(.Rows.Count, 1).End(xlUp))
references = references & GetReferences(cell) & "; "
Next cell
If references <> "" Then
referencesArr = Split(Left(references, Len(references) - 2), ";")
.Range("C1").Resize(UBound(referencesArr)).Value = Application.Transpose(referencesArr)
End If
End With
End Sub
Function GetReferences(rng As Range) As String
Dim arr As Variant, iElem As Long
Dim strng As String
With rng
arr = Split(Replace(Replace(.Value, "[", "|["), "]", "]|"), "|")
For iElem = 1 To UBound(arr) - 1 Step 2
strng = strng & Mid(CStr(arr(iElem)), 2, Len(CStr(arr(iElem))) - 2) & "; "
Next iElem
End With
If strng <> "" Then
GetReferences = Left(strng, Len(strng) - 2)
rng.Offset(, 1) = GetReferences
End If
End Function
There are many examples of regex number parsing¹ from text on this site. Pulling numbers from narrative text is one of the easier regular expression 'patterns'² to construct; especially so with a fixed number of digits regardless of delimiter or grouping character(s).
Put the following into a standard module code sheet.
Option Explicit
Option Base 0 '<~~this is the default but I've included it because it has to be 0
Function numberParse(str As String, _
Optional ndx As Integer = 0, _
Optional delim As String = "; ") As Variant
Dim n As Long, nums() As Variant
Static rgx As Object, cmat As Object
'with rgx as static, it only has to be created once; beneficial when filling a long column with this UDF
If rgx Is Nothing Then
Set rgx = CreateObject("VBScript.RegExp")
Else
Set cmat = Nothing
End If
numberParse = vbNullString
With rgx
.Global = True
.MultiLine = True
.Pattern = "[0-9]{4}"
If .Test(str) Then
Set cmat = .Execute(str)
If CBool(ndx) Then
'pull the index of the array of matches
numberParse = cmat.Item(ndx - 1)
Else
'resize the nums array to accept the matches
ReDim nums(cmat.Count - 1)
'populate the nums array with the matches
For n = LBound(nums) To UBound(nums)
nums(n) = cmat.Item(n)
Next n
'convert the nums array to a delimited string
numberParse = Join(nums, delim)
End If
End If
End With
End Function
With your blurb in A2, put the following into B2,
=numberParse(A2)
With your blurb in A2, put the following into A4 and fill down,
=numberParse(A$2, ROW(1:1))
Your results should resemble the following,
¹ The above was modified from my response in Excel UDF for capturing numbers within characters which wasn't that hard to find.
² See How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops for more information.
For a quick start, you can use =MID(A1,SEARCH("[",A1)+1,SEARCH("]",A1)-SEARCH("[",A1)-1) to extract the text between the brackets. Then you're left with a string, separated by semicolons.
Then, you can run this sub (with tweaking most likely, to narrow down the ranges):
Sub splitSemiColons()
Dim myArray() As String
Dim colToUse As Long
colToUse = 3
myArray = Split(Range("B1"), ";")
Dim i As Long
For i = LBound(myArray) To UBound(myArray)
Cells(i + 1, colToUse).Value = myArray(i)
Next i
End Sub
Or, you can avoid this macro, and just use Data --> Text to Columns --> Use ; delimiter, then copy and paste transposed.

Regex pattern in Word 2013

I have a word document which contains 6 series of numbers (plain text, not numbered style) as following:
1) blah blah blah
2) again blah blah blah
.
.
.
20) something
And this pattern has been repeated six times. How can I used Regex and serialise all numbers before parentheses so that they start with 1 and end up with 120?
You can use VBA - add this to the ThisDocument module:
Public Sub FixNumbers()
Dim p As Paragraph
Dim i As Long
Dim realCount As Long
realCount = 1
Set p = Application.ActiveDocument.Paragraphs.First
'Iterate through paragraphs with Paragraph.Next - using For Each doesn't work and I wouldn't trust indexing since we're making changes
Do While Not p Is Nothing
digitCount = 0
For i = 1 To Len(p.Range.Text)
'Keep track of how many characters are in the number
If IsNumeric(Mid(p.Range.Text, i, 1)) Then
digitCount = digitCount + 1
Else
'We check the first non-number character we find to see if it is the list delimiter ")" and we make sure that there were some digits before it
If Mid(p.Range.Text, i, 1) = ")" And digitCount > 0 Then
'If so, we get rid of the original number and put the correct one
p.Range.Text = realCount & Right(p.Range.Text, Len(p.Range.Text) - digitCount) 'It's important to note that a side effect of assigning the text is that p is set to p.Next
'realCount holds the current "real" line number - everytime we assign a line, we increment it
realCount = realCount + 1
Exit For
Else
'If not, we skip the line assuming it's not part of the list numbering
Set p = p.Next
Exit For
End If
End If
Next
Loop
End Sub
You can run it by clicking anywhere inside of the code and clicking the "play" button in the VBA IDE.

Add two different numbers in a single text field space separated in Access VBA

I am using Access and VBA to tidy up a database before a migration. One field is going from text to an INT. So I need to convert and possibly add some numbers which exist in a singular field.
Examples:
F/C 3 other 8 should become 11
Calender-7 should become 7
21 F/C and 1 other should become 22
29 (natural ways) should become 29
The second and fourth line are simple enough, just use the following regex in VBA
Dim rgx As New RegExp
Dim inputText As String
Dim outputText As String
rgx.Pattern = "[^0-9]*"
rgx.Global = True
inputText = "29 (natural ways)"
outputText = rgx.Replace(inputText, "")
The downside is if I use it on option 1 or 3:
F/C 3 other 8 will become 38
Calender-7 will become 7
21 F/C and 1 other will become 211
29 (natural ways) will become 29
This is simple enough in bash, I can just keep the spaces by adding one to [^0-9 ]* and then piping it into awk which will add every field using a space as a delimiter like so:
sed 's/[^0-9 ]*//g' | awk -F' ' 's=0; {for (i=1; i<=NF; i++) s=s+$i; print s}'
F/C 3 other 8 will become 11
21 F/C and 1 other will become 22
The problem is I cannot use bash, and there are far too many values to do it by hand. Is there any way to use VBA to accomplish this?
Instead of using the replace method, just capture and then add up all the numbers. For example:
Option Explicit
Function outputText(inputText)
Dim rgx As RegExp
Dim mc As MatchCollection, m As Match
Dim I As Integer
Set rgx = New RegExp
rgx.Pattern = "[0-9]+"
rgx.Global = True
Set mc = rgx.Execute(inputText)
For Each m In mc
I = I + CInt(m) 'may Need to be cast as an int in Access VBA; not required in Excel VBA
Next m
outputText = I
End Function
I'm not sure if there are any easier way for your question. Here I've wrote small function for you.
Requirement: add all numbers in a string, identify "consecutive" digits as one number.
pseudo:
Loop through given text
find the first number and check/loop if following chars are numbers
if following chars are numbers treat as one number else pass the
result
continue searching from last point and add the result to the total
in code:
Public Function ADD_NUMB(iText As String) As Long
Dim I, J As Integer
Dim T As Long
Dim TM As String
For I = 1 To Len(iText)
If (InStr(1, "12346567890", Mid$(iText, I, 1)) >= 1) Then
TM = Mid(iText, I, 1)
For J = I + 1 To Len(iText)
If (InStr(1, "12346567890", Mid$(iText, J, 1)) >= 1) Then
TM = TM & Mid$(iText, J, 1)
Else
Exit For
End If
Next J
T = T + Val(Nz(TM, 0))
I = J
End If
Next I
ADD_NUMB = T
End Function
usage:
dim total as integer
total = ADD_NUMB("21 F/C and 1 other")
not sure about performance but it will get you what you need :)

VBA code for extracting 3 specific number patterns

I am working in excel and need VBA code to extract 3 specific number patterns. In column A I have several rows of strings which include alphabetical characters, numbers, and punctuation. I need to remove all characters except those found in a 13-digit number (containing only numbers), a ten-digit number (containing only numbers), or a 9-digit number immediately followed by an "x" character. These are isbn numbers.
The remaining characters should be separated by one, and only one, space. So, for the following string found in A1: "There are several books here, including 0192145789 and 9781245687456. Also, the book with isbn 045789541x is included. This book is one of 100000000 copies."
The output should be: 0192145789 9781245687456 045789541x
Note that the number 100000000 should not be included in the output because it does not match any of the three patterns mentioned above.
I'm not opposed to a excel formula solution as opposed to VBA, but I assumed that VBA would be cleaner. Thanks in advance.
Here's a VBA function that will do specifically what you've specified
Function ExtractNumbers(inputStr As String) As String
Dim outputStr As String
Dim bNumDetected As Boolean
Dim numCount As Integer
Dim numStart As Integer
numCount = 0
bNumDetected = False
For i = 1 To Len(inputStr)
If IsNumeric(Mid(inputStr, i, 1)) Then
numCount = numCount + 1
If Not bNumDetected Then
bNumDetected = True
bNumStart = i
End If
If (numCount = 9 And Mid(inputStr, i + 1, 1) = "x") Or _
numCount = 13 And Not IsNumeric(Mid(inputStr, i + 1, 1)) Or _
numCount = 10 And Not IsNumeric(Mid(inputStr, i + 1, 1)) Then
If numCount = 9 Then
outputStr = outputStr & Mid(inputStr, bNumStart, numCount) & "x "
Else
outputStr = outputStr & Mid(inputStr, bNumStart, numCount) & " "
End If
End If
Else
numCount = 0
bNumDetected = False
End If
Next i
ExtractNumbers = Trim(outputStr)
End Function
It's nothing fancy, just uses string functions to goes through your string one character at a time looking for sections of 9 digit numbers ending with x, 10 digit numbers and 13 digit numbers and extracts them into a new string.
It's a UDF so you can use it as a formula in your workbook