Regex pattern in Word 2013 - regex

I have a word document which contains 6 series of numbers (plain text, not numbered style) as following:
1) blah blah blah
2) again blah blah blah
.
.
.
20) something
And this pattern has been repeated six times. How can I used Regex and serialise all numbers before parentheses so that they start with 1 and end up with 120?

You can use VBA - add this to the ThisDocument module:
Public Sub FixNumbers()
Dim p As Paragraph
Dim i As Long
Dim realCount As Long
realCount = 1
Set p = Application.ActiveDocument.Paragraphs.First
'Iterate through paragraphs with Paragraph.Next - using For Each doesn't work and I wouldn't trust indexing since we're making changes
Do While Not p Is Nothing
digitCount = 0
For i = 1 To Len(p.Range.Text)
'Keep track of how many characters are in the number
If IsNumeric(Mid(p.Range.Text, i, 1)) Then
digitCount = digitCount + 1
Else
'We check the first non-number character we find to see if it is the list delimiter ")" and we make sure that there were some digits before it
If Mid(p.Range.Text, i, 1) = ")" And digitCount > 0 Then
'If so, we get rid of the original number and put the correct one
p.Range.Text = realCount & Right(p.Range.Text, Len(p.Range.Text) - digitCount) 'It's important to note that a side effect of assigning the text is that p is set to p.Next
'realCount holds the current "real" line number - everytime we assign a line, we increment it
realCount = realCount + 1
Exit For
Else
'If not, we skip the line assuming it's not part of the list numbering
Set p = p.Next
Exit For
End If
End If
Next
Loop
End Sub
You can run it by clicking anywhere inside of the code and clicking the "play" button in the VBA IDE.

Related

Regex to move cells based on leading spaces

Ill start by saying that I'm not a coder, only someone who very rarely dabbles to make spreadsheets slightly more bearable.
I currently have some data that I need to break out into columns based on the number of leading spaces in the cell. Basically, if the cell begins with 2 spaces move it 1 column to the right, If there are 3 spaces, move it 2 columns to the right and so on.
I realised that I would need to use regex for this as FIND and LEFT would match all of the 3 space cells when searching for 2 space cells.
So I searched around and cobbled together this mess
Sub MoveStuff()
Dim RE as Object
Dim LSearchRow As Long
Dim LCopyToColumn As Long
Set RE = CreateObject("vbscript.regexp")
RE.Pattern = " (a-zA-Z)"
LSearchRow = 2
While Len(Cells(LSearchRow, "B").Value) > 0
If RE.Test(Cells(LSearchRow, "B").Value) Then
Up to here, it will match correctly, but I don't know how to get it to shift the cell over. Then I'll obviously need to have multiple RE.Patterns and If statements to match 3 and 4 space cells
A general solution is the following. You count the leading spaces (let's call this value N), then remove them from your cell value and copy the cell N column on the right.
Public Sub movestuff()
Dim curr_row, curr_column, s
curr_column = 2 'COLUMN "B"
curr_row = 1
While (ActiveSheet.Cells(curr_row, curr_column) <> "")
s = ActiveSheet.Cells(curr_row, curr_column)
For x = 1 To Len(s) Step 1
If Mid(s, x, 1) <> " " Then
Exit For
End If
Next
s = Mid(s, x)
ActiveSheet.Cells(curr_row, curr_column + (x - 1)) = s
curr_row = curr_row + 1
Wend
End Sub

Extracting specific words from a single cell containing text string

Basically I have a very long text containing multiple spaces, special characters, etc. in one cell in an excel file and I need to extract only specific words from it, each one to a seperate cell in another column.
What I'm looing for:
symbols that are always 9 characters in lenght, and always contain at least one number (up to 9).
So for an example in A1 I have:
euhe: djj33 dkdakofja. kaowdk ---------- jffjbrjjjj j jrjj 08/01/2222 999ABC123
fjfjfj 321XXX888 .... ........ 123456789AA
And in the end I want to have:
999ABC123 in B1
and
321XXX888 in B2.
Right now I'm doing this by using Text to columns feature and then just looking for specific words manually but sometimes the volume is so big it takes too much time and would be cool to automate this.
Can anyone help with this? Thank you!
EDIT:
More examples:
INPUT: '10/01/2016 1,060X 8.999%!!! 1.33 0.666 928888XE0'
OUTPUT: '928888XE0'
INPUT: 'ABCDEBATX ..... ,,00,001% 20///^^ addcA7 7777a 123456789 djaoij8888888 0.000001 12#'
OUTPUT: '123456789'
INPUT: 'FAR687465 B22222222 __ djj^66 20/20/20/20 1:'
OUTPUT: 'FAR687465' in B1 'B22222222' in B2
INPUT: 'fil476 .00 20/.. BUT AAAAAAAAA k98776 000.0001'
OUTPUT: 'blank'
To clarify: the 9 character string can be anywhere, there is no rule what is before or after them, they can be next to each other, or just at the beginning and end of this wall of text, no rules here, the text is random, taken out of some system, can contain dates, etc anything... The symbols are always 9 characters long and they are not the only 9 character symbols in the text. I call them symbols but they should only consist of numbers and letters. Can be only numbers, but never only letters. A1 cell can contain multiple spaces/tabs between words/symbols.
Also if possible to do this not only for A1, but the whole column A until it finds the first blank cell.
Try this code
Sub Test()
Dim r As Range
Dim i As Long
Dim m As Long
With CreateObject("VBScript.RegExp")
.Global = True
.Pattern = "\b[a-zA-Z\d]{9}\b"
For Each r In Range("A1", Range("A" & Rows.Count).End(xlUp))
If .Test(r.Value) Then
For i = 0 To .Execute(r.Value).Count - 1
If CBool(.Execute(r.Value)(i) Like "*[0-9]*") Then
m = IIf(Cells(1, 2).Value = "", 1, Cells(Rows.Count, 2).End(xlUp).Row + 1)
Cells(m, 2).Value = .Execute(r.Value)(i)
End If
Next i
End If
Next r
End With
End Sub
This bit of code is almost it... just need to check the strings... but excel crashes on the Str line of code
Sub Test()
Dim Outputs, i As Integer, LastRow As Long, Prueba, Prueba2
Outputs = Split(Range("A1"), " ")
For i = 0 To UBound(Outputs)
If Len(Outputs(i)) = 9 Then
Prueba = 0
Prueba2 = 0
On Error Resume Next
Prueba = Val(Outputs(i))
Prueba2 = Str(Outputs(i))
On Error GoTo 0
If Prueba <> 0 And Prueba2 <> 0 Then
LastRow = Range("B10000").End(xlUp).Row + 1
Cells(LastRow, 2) = Outputs(i)
End If
End If
Next i
End Sub
If someone could help to set the string check.. that would do the thing I guess.

Remove everything but numbers from a cell

I have an excel sheet where i use the follwoing command to get numbers from a cell that contains a form text:
=MID(D2;SEARCH("number";D2)+6;13)
It searches for the string "number" and gets the next 13 characters that comes after it. But some times the results get more than the number due to the fact these texts within the cells do not have a pattern, like the example below:
62999999990
21999999990
11999999990
6299999993) (
17999999999)
21914714753)
58741236714 P
18888888820
How do i avoid taking anything but numbers OR how do i remove everything but numbers from what i get?
You can user this User Defined Function (UDF) that will get only the numbers inside a specific cell.
Code:
Function only_numbers(strSearch As String) As String
Dim i As Integer, tempVal As String
For i = 1 To Len(strSearch)
If IsNumeric(Mid(strSearch, i, 1)) Then
tempVal = tempVal + Mid(strSearch, i, 1)
End If
Next
only_numbers = tempVal
End Function
To use it, you must:
Press ALT + F11
Insert new Module
Paste code inside Module window
Now you can use the formula =only_numbers(A1) at your spreadsheet, by changing A1 to your data location.
Example Images:
Inserting code at module window:
Executing the function
Ps.: if you want to delimit the number of digits to 13, you can change the last line of code from:
only_numbers = tempVal
to
only_numbers = Left(tempVal, 13)
Alternatively you can take a look a this topic to understand how to achieve this using formulas.
If you are going to go to a User Defined Function (aka UDF) then perform all of the actions; don't rely on the preliminary worksheet formula to pass a stripped number and possible suffix text to the UDF.
In a standard code module as,
Function udfJustNumber(str As String, _
Optional delim As String = "number", _
Optional startat As Long = 1, _
Optional digits As Long = 13, _
Optional bCaseSensitive As Boolean = False, _
Optional bNumericReturn As Boolean = True)
Dim c As Long
udfJustNumber = vbNullString
str = Trim(Mid(str, InStr(startat, str, delim, IIf(bCaseSensitive, vbBinaryCompare, vbTextCompare)) + Len(delim), digits))
For c = 1 To Len(str)
Select Case Asc(Mid(str, c, 1))
Case 32
'do nothing- skip over
Case 48 To 57
If bNumericReturn Then
udfJustNumber = Val(udfJustNumber & Mid(str, c, 1))
Else
udfJustNumber = udfJustNumber & Mid(str, c, 1)
End If
Case Else
Exit For
End Select
Next c
End Function
I've used your narrative to add several optional parameters. You can change these if your circumstances change. Most notable is whether to return a true number or text-that-looks-like-a-number with the bNumericReturn option. Note that the returned values are right-aligned as true numbers should be in the following supplied image.
By supplying FALSE to the sixth parameter, the returned content is text-that-looks-like-a-number and is now left-aligned in the worksheet cell.
If you don't want VBA and would like to use Excel Formulas only, try this one:
=SUMPRODUCT(MID(0&MID(D2,SEARCH("number",D2)+6,13),LARGE(INDEX(ISNUMBER(--MID(MID(D2,SEARCH("number",D2)+6,13),ROW($1:$13),1))* ROW($1:$13),0),ROW($1:$13))+1,1)*10^ROW($1:$13)/10)

VBA code for extracting 3 specific number patterns

I am working in excel and need VBA code to extract 3 specific number patterns. In column A I have several rows of strings which include alphabetical characters, numbers, and punctuation. I need to remove all characters except those found in a 13-digit number (containing only numbers), a ten-digit number (containing only numbers), or a 9-digit number immediately followed by an "x" character. These are isbn numbers.
The remaining characters should be separated by one, and only one, space. So, for the following string found in A1: "There are several books here, including 0192145789 and 9781245687456. Also, the book with isbn 045789541x is included. This book is one of 100000000 copies."
The output should be: 0192145789 9781245687456 045789541x
Note that the number 100000000 should not be included in the output because it does not match any of the three patterns mentioned above.
I'm not opposed to a excel formula solution as opposed to VBA, but I assumed that VBA would be cleaner. Thanks in advance.
Here's a VBA function that will do specifically what you've specified
Function ExtractNumbers(inputStr As String) As String
Dim outputStr As String
Dim bNumDetected As Boolean
Dim numCount As Integer
Dim numStart As Integer
numCount = 0
bNumDetected = False
For i = 1 To Len(inputStr)
If IsNumeric(Mid(inputStr, i, 1)) Then
numCount = numCount + 1
If Not bNumDetected Then
bNumDetected = True
bNumStart = i
End If
If (numCount = 9 And Mid(inputStr, i + 1, 1) = "x") Or _
numCount = 13 And Not IsNumeric(Mid(inputStr, i + 1, 1)) Or _
numCount = 10 And Not IsNumeric(Mid(inputStr, i + 1, 1)) Then
If numCount = 9 Then
outputStr = outputStr & Mid(inputStr, bNumStart, numCount) & "x "
Else
outputStr = outputStr & Mid(inputStr, bNumStart, numCount) & " "
End If
End If
Else
numCount = 0
bNumDetected = False
End If
Next i
ExtractNumbers = Trim(outputStr)
End Function
It's nothing fancy, just uses string functions to goes through your string one character at a time looking for sections of 9 digit numbers ending with x, 10 digit numbers and 13 digit numbers and extracts them into a new string.
It's a UDF so you can use it as a formula in your workbook

Add Semicolon to each value (each line) in a cell

I have the following values in a single cell let be A1
1234
567
454
Likewise all the A(N) are filled with values. N various from 1000 to 1500
I want this to get converted as
1234;567;454
Any shortcut available?
Edit: Sorry, had not read your questions properly...
You could write a vba-script like that:
Sub test()
Dim result As String
result = Replace(ActiveCell.value, Chr(10), ";")
ActiveCell.Offset(1, 0).Select
ActiveCell.value = result
End Sub
It will take the active cell, replace all newlines by semicolons and put the result in the next line.
Edit: Another version doing this for multiple cells:
Sub test()
Dim value As String
Do
value = ActiveCell.value
If (value = "") Then Exit Do
ActiveCell.Offset(0, 1).value = Replace(ActiveCell.value, Chr(10), ";")
ActiveCell.Offset(1, 0).Select
Loop While (True)
End Sub
This version will start at the active cell, and loop through all cell below until it finds an empty cell.
The replaced value is written into the cell next to the original one. If you want to replace the original value, remove .Offset(0, 1).
The second parameter is the value to be replaced, it's Chr(10), the Newline character in our case.