Regex to move cells based on leading spaces - regex

I'll start by saying that I'm not a coder, only someone who very rarely dabbles to make spreadsheets slightly more bearable.
I currently have some data that I need to break out into columns based on the number of leading spaces in the cell. Basically, if the cell begins with 2 spaces, move it 1 column to the right; if there are 3 spaces, move it 2 columns to the right, and so on.
I realised that I would need to use regex for this, as FIND and LEFT would match all of the 3-space cells when searching for 2-space cells.
So I searched around and cobbled together this mess:
Sub MoveStuff()
Dim RE as Object
Dim LSearchRow As Long
Dim LCopyToColumn As Long
Set RE = CreateObject("vbscript.regexp")
RE.Pattern = " (a-zA-Z)"
LSearchRow = 2
While Len(Cells(LSearchRow, "B").Value) > 0
If RE.Test(Cells(LSearchRow, "B").Value) Then
Up to here, it will match correctly, but I don't know how to get it to shift the cell over. Then I'll obviously need to have multiple RE.Patterns and If statements to match 3 and 4 space cells

A general solution is the following. You count the leading spaces (let's call this value N), then remove them from the cell value and copy the result N columns to the right.
Public Sub movestuff()
    Dim curr_row As Long, curr_column As Long, x As Long
    Dim s As String
    curr_column = 2 'COLUMN "B"
    curr_row = 1
    'Walk down column B until the first empty cell
    While (ActiveSheet.Cells(curr_row, curr_column) <> "")
        s = ActiveSheet.Cells(curr_row, curr_column)
        'Find the position x of the first non-space character
        For x = 1 To Len(s) Step 1
            If Mid(s, x, 1) <> " " Then
                Exit For
            End If
        Next
        'x - 1 is the number of leading spaces (N)
        s = Mid(s, x)
        ActiveSheet.Cells(curr_row, curr_column + (x - 1)) = s
        curr_row = curr_row + 1
    Wend
End Sub
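The counting step can also be written without a character loop. The following is only a sketch: LeadingSpaces is a hypothetical helper name, and it relies on the fact that LTrim$ strips leading space characters only (not tabs).

```vba
'Hypothetical helper: returns the number of leading space characters in s
Function LeadingSpaces(ByVal s As String) As Long
    'LTrim$ removes only leading spaces, so the length difference is the count
    LeadingSpaces = Len(s) - Len(LTrim$(s))
End Function
```

Inside the loop above, the target column would then be curr_column + LeadingSpaces(s) and the value to write would be LTrim$(s).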

Related

Removing leading whitespace using VBA

I am trying to remove leading whitespace from a word " 00000000000000231647300000000002KK".
Below is my VBA code
Option Explicit
Sub myfunction()
Dim getarray, getarray1 As Variant
Dim Text As String
Dim RegularText
getarray = Sheets("Sheet1").Range("A1:A4").Value
getarray1 = getarray
Set RegularText = New regexp
RegularText.Global = True
RegularText.MultiLine = True
RegularText.Pattern = "(^\\s+)"
Text = CStr(getarray(1, 1))
getarray1(1, 1) = RegularText.Replace(getarray(1, 1), "")
Sheets("Sheet1").Range("B1:B4").Value = getarray1
End Sub
However, the above code fails to remove the leading whitespace from my word.
Below is the Excel workbook with the result and the above code:
https://easyupload.io/jv6n2p
If you could help me understand why my code is failing to remove the leading whitespace, it would be very helpful.
Thanks for your time
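There are a few things wrong with the original code.
RegularText.Pattern = "(^\\s+)"
Explanations from regex101.com.
(^\\s+) pattern:
A VBA string does not treat the backslash as an escape character, so the regex engine receives \\ exactly as typed. The first backslash then escapes the second, which tells the RegEx to treat the second \ as a normal character. (^\\s+) therefore matches a literal backslash followed by one or more "s" characters at the start of the string, not whitespace.
(^\s+) pattern:
With a single backslash, \s is the whitespace class, so the pattern matches one or more whitespace characters at the start of the string, which is what was intended.
RegularText.MultiLine = True
The MultiLine property makes ^ match at the start of every line inside a single string value; it has nothing to do with the rows of an array. This doesn't seem to be the intended result, so set it to False.
`RegularText.MultiLine = False`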
Range("A1:A4").Value is a two-dimensional array of 4 rows by 1 column, and the original code only processes its first element, getarray(1, 1). In my examples I loop over every element and write the result to Range("B1:B4"), as in the question.
Sub RegExRemoveLeadingSpace()
    'Early binding: requires a reference to "Microsoft VBScript Regular Expressions 5.5"
    Dim Data As Variant
    Data = Sheets("Sheet1").Range("A1:A4").Value
    Dim RegularText As New RegExp
    RegularText.Global = False
    RegularText.Pattern = "(^\s+)"
    Dim r As Long, c As Long
    'Strip the leading whitespace from every element of the array
    For r = 1 To UBound(Data)
        For c = 1 To UBound(Data, 2)
            Data(r, c) = RegularText.Replace(Data(r, c), "")
        Next
    Next
    Sheets("Sheet1").Range("B1:B4").Value = Data
End Sub
We could also just use LTrim() to remove the leading spaces from the string. Note that LTrim() only strips space characters, not tabs or line breaks.
Sub LTrimLeadingSpace()
    Dim Data As Variant
    Data = Sheets("Sheet1").Range("A1:A4").Value
    Dim r As Long, c As Long
    For r = 1 To UBound(Data)
        For c = 1 To UBound(Data, 2)
            Data(r, c) = LTrim(Data(r, c))
        Next
    Next
    Sheets("Sheet1").Range("B1:B4").Value = Data
End Sub
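If no reference to the regular expressions library is set, the same replacement can be done with late binding. This is a minimal sketch rather than a definitive version: StripLeadingWhitespace is a hypothetical function name, and it assumes the "VBScript.RegExp" object is available on the machine.

```vba
'Hypothetical helper: removes leading whitespace (spaces, tabs, line breaks) from s
Function StripLeadingWhitespace(ByVal s As String) As String
    Dim re As Object
    'Late binding, so no library reference is needed
    Set re = CreateObject("VBScript.RegExp")
    re.Global = False      'only the run at the very start is wanted
    re.MultiLine = False   '^ matches only at the start of the whole string
    re.Pattern = "^\s+"    'single backslash: \s is the whitespace class
    StripLeadingWhitespace = re.Replace(s, "")
End Function
```

Unlike LTrim(), this also removes leading tabs, which matters when the data was pasted from another system.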

Check string has a date in it and extract part of the string

I have thousands of lines of text that I need to work through, and the lines I am interested in look like the following:
01/04/2019 09:35:41 - Test user (Additional Comments)
I am currently using this code to filter out all the other rows:
If InStr(FullCell(i), " - ") <> 0 And InStr(FullCell(i), ":") <> 0 And InStr(FullCell(i), "(") <> 0 Then
FullCell is the array that I am working through.
I know this is not the best way to do it. Is there a way to check that there is a date at the beginning of the string in the format dd/mm/yyyy, and then extract the user name in between the '-' and the '(' symbols?
I had a play with regex to see if that could help, but I'm too limited in skills to pull off both VBA and regex in the same code.
What's the best way to do this?
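Assuming Fullcell(i) contains the string,
If Left(Fullcell(i), 10) Like "##/##/####" Then
will be True if the string starts with a date (note that it will not differentiate between dd/mm/yyyy and mm/dd/yyyy, and it only checks the digit pattern, not whether the date is valid).
And
Mid(Fullcell(i), InStr(Fullcell(i), " - ") + 3, InStr(Fullcell(i), " (") - InStr(Fullcell(i), " - ") - 3)
will return the username. The offset of 3 skips the three characters of " - ", so the result has no leading space.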
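The two expressions can be combined into one function that returns an empty string when the line does not fit. This is only a sketch under the assumptions stated in the question (date first, then " - ", the username, then " ("); UserFromLine is a hypothetical name.

```vba
'Hypothetical helper: returns the username from a line such as
'"01/04/2019 09:35:41 - Test user (Additional Comments)", or "" if the line does not fit
Function UserFromLine(ByVal s As String) As String
    Dim p As Long, q As Long
    'The line must start with something shaped like ##/##/####
    If Not (Left$(s, 10) Like "##/##/####") Then Exit Function
    p = InStr(s, " - ")
    If p = 0 Then Exit Function
    'Look for " (" only after the separator, so the length below is never negative
    q = InStr(p + 3, s, " (")
    If q = 0 Then Exit Function
    UserFromLine = Mid$(s, p + 3, q - p - 3)
End Function
```

In the filtering loop, a non-empty result from UserFromLine(FullCell(i)) would replace the three InStr tests.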
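I'm sure there is a more efficient way to do this, but I've used the following solution quite a few times (here FullCell is a single line of text that is known to contain both "-" and "("):
This will select the date, which ends at the first space:
x = 1
Do While x <= Len(FullCell) And Mid(FullCell, x, 1) <> " "
x = x + 1
Loop
strDate = Left(FullCell, x - 1)
This will find the character number of the hyphen; the username starts 2 characters after it.
x = 1
Do While x <= Len(FullCell) And Mid(FullCell, x, 1) <> "-"
x = x + 1
Loop
Then we find the end of the username by looking for the opening parenthesis (the username itself may contain spaces):
y = x + 2
Do While y <= Len(FullCell) And Mid(FullCell, y, 1) <> "("
y = y + 1
Loop
The username should now be characters x + 2 to y - 2 (the character at y - 1 is the space before the parenthesis):
strUsername = Mid(FullCell, x + 2, y - (x + 2) - 1)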
Here's how I would do it
Dim your variables
Dim rng As Range
Dim dat As Variant
Dim FullCell() As String
Dim User As String
Dim i As Long
Dim pos As Long
Set your range
Set rng = ' any way you choose
dat = rng.Value2
Loop dat
For i = 1 To UBound(dat, 1)
Split the data into at most two parts around the " - " separator
FullCell = Split(CStr(dat(i, 1)), " - ", 2)
Test if it split
If UBound(FullCell) > 0 Then
Test if it matches
If IsDate(Trim$(FullCell(0))) Then
pos = InStr(FullCell(1), "(") - 1
If pos > 0 Then
User = Trim$(Left$(FullCell(1), pos))
' Found a user
End If
End If
End If
Next
Abstraction is your friend, it's always helpful to break these into their own private functions whenever you can. You could put your code in a function and call it something like ExtractUsername.
Below I did an example of this, and I decided to go with the RegExp approach (late binding), but you could use string functions like the examples above as well.
This function returns the username if it finds the pattern you mentioned above, otherwise, it returns an empty string.
Private Function ExtractUsername(ByVal SourceString As String) As String
Dim RegEx As Object
Set RegEx = CreateObject("vbscript.regexp")
'(FIRST GROUP FINDS THE DATE FORMATTED AS DD/MM/YYYY AND EVERYTHING UP TO AND INCLUDING THE HYPHEN)
'(SECOND GROUP FINDS THE USERNAME) THIS WILL BE SUBMATCH 1
With RegEx
.Pattern = "(^\d{2}\/\d{2}\/\d{4}.*-)(.+)(\()"
.Global = True
End With
Dim Match As Object
Set Match = RegEx.Execute(SourceString)
'ONLY RETURN IF A MATCH WAS FOUND
If Match.Count > 0 Then
ExtractUsername = Trim(Match(0).SubMatches(1))
End If
Set RegEx = Nothing
End Function
The regex pattern is grouped into three parts: the date (through the hyphen), the username, and the opening parenthesis. What you are interested in is the username, which is SubMatches(1) because submatches are numbered from 0.
Regexr is a helpful site for practicing regular expressions and can show you a bit more of what the pattern I went with is doing.
Please note that using regular expressions might give you performance issues and you should test it against regular string functions to see what works best for your situation.

Extracting specific words from a single cell containing text string

Basically I have a very long text containing multiple spaces, special characters, etc. in one cell in an Excel file, and I need to extract only specific words from it, each one to a separate cell in another column.
What I'm looking for:
symbols that are always 9 characters in length and always contain at least one digit (up to 9).
So for an example in A1 I have:
euhe: djj33 dkdakofja. kaowdk ---------- jffjbrjjjj j jrjj 08/01/2222 999ABC123
fjfjfj 321XXX888 .... ........ 123456789AA
And in the end I want to have:
999ABC123 in B1
and
321XXX888 in B2.
Right now I'm doing this by using Text to columns feature and then just looking for specific words manually but sometimes the volume is so big it takes too much time and would be cool to automate this.
Can anyone help with this? Thank you!
EDIT:
More examples:
INPUT: '10/01/2016 1,060X 8.999%!!! 1.33 0.666 928888XE0'
OUTPUT: '928888XE0'
INPUT: 'ABCDEBATX ..... ,,00,001% 20///^^ addcA7 7777a 123456789 djaoij8888888 0.000001 12#'
OUTPUT: '123456789'
INPUT: 'FAR687465 B22222222 __ djj^66 20/20/20/20 1:'
OUTPUT: 'FAR687465' in B1 'B22222222' in B2
INPUT: 'fil476 .00 20/.. BUT AAAAAAAAA k98776 000.0001'
OUTPUT: 'blank'
To clarify: the 9 character string can be anywhere, there is no rule what is before or after them, they can be next to each other, or just at the beginning and end of this wall of text, no rules here, the text is random, taken out of some system, can contain dates, etc anything... The symbols are always 9 characters long and they are not the only 9 character symbols in the text. I call them symbols but they should only consist of numbers and letters. Can be only numbers, but never only letters. A1 cell can contain multiple spaces/tabs between words/symbols.
Also if possible to do this not only for A1, but the whole column A until it finds the first blank cell.
Try this code
Sub Test()
Dim r As Range
Dim i As Long
Dim m As Long
With CreateObject("VBScript.RegExp")
.Global = True
.Pattern = "\b[a-zA-Z\d]{9}\b"
For Each r In Range("A1", Range("A" & Rows.Count).End(xlUp))
If .Test(r.Value) Then
For i = 0 To .Execute(r.Value).Count - 1
If CBool(.Execute(r.Value)(i) Like "*[0-9]*") Then
m = IIf(Cells(1, 2).Value = "", 1, Cells(Rows.Count, 2).End(xlUp).Row + 1)
Cells(m, 2).Value = .Execute(r.Value)(i)
End If
Next i
End If
Next r
End With
End Sub
This bit of code is almost it... just need to check the strings... but excel crashes on the Str line of code
Sub Test()
Dim Outputs, i As Integer, LastRow As Long, Prueba, Prueba2
Outputs = Split(Range("A1"), " ")
For i = 0 To UBound(Outputs)
If Len(Outputs(i)) = 9 Then
Prueba = 0
Prueba2 = 0
On Error Resume Next
Prueba = Val(Outputs(i))
Prueba2 = Str(Outputs(i))
On Error GoTo 0
If Prueba <> 0 And Prueba2 <> 0 Then
LastRow = Range("B10000").End(xlUp).Row + 1
Cells(LastRow, 2) = Outputs(i)
End If
End If
Next i
End Sub
If someone could help to set the string check.. that would do the thing I guess.
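One possible way to write the missing string check, offered as a sketch rather than a tested fix for the code above: the Like operator can verify that a token has only letters and digits and contains at least one digit. IsSymbol is a hypothetical helper name, and the sketch assumes the module uses the default Option Compare Binary.

```vba
'Hypothetical helper: True when token is exactly 9 letters/digits with at least one digit
Function IsSymbol(ByVal token As String) As Boolean
    IsSymbol = (Len(token) = 9) _
        And Not (token Like "*[!0-9A-Za-z]*") _
        And (token Like "*[0-9]*")
End Function
```

The Val/Str test in the loop would then become If IsSymbol(Outputs(i)) Then. Splitting on a single space still leaves tabs and adjacent punctuation attached to tokens, which the regex answer above handles through \b.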

Regex pattern in Word 2013

I have a word document which contains 6 series of numbers (plain text, not numbered style) as following:
1) blah blah blah
2) again blah blah blah
.
.
.
20) something
And this pattern is repeated six times. How can I use regex to renumber all the numbers before the parentheses so that they start with 1 and end with 120?
You can use VBA - add this to the ThisDocument module:
Public Sub FixNumbers()
Dim p As Paragraph
Dim i As Long
Dim realCount As Long
Dim digitCount As Long
realCount = 1
Set p = Application.ActiveDocument.Paragraphs.First
'Iterate through paragraphs with Paragraph.Next - using For Each doesn't work and I wouldn't trust indexing since we're making changes
Do While Not p Is Nothing
digitCount = 0
For i = 1 To Len(p.Range.Text)
'Keep track of how many characters are in the number
If IsNumeric(Mid(p.Range.Text, i, 1)) Then
digitCount = digitCount + 1
Else
'We check the first non-number character we find to see if it is the list delimiter ")" and we make sure that there were some digits before it
If Mid(p.Range.Text, i, 1) = ")" And digitCount > 0 Then
'If so, we get rid of the original number and put the correct one
p.Range.Text = realCount & Right(p.Range.Text, Len(p.Range.Text) - digitCount) 'It's important to note that a side effect of assigning the text is that p is set to p.Next
'realCount holds the current "real" line number - everytime we assign a line, we increment it
realCount = realCount + 1
Exit For
Else
'If not, we skip the line assuming it's not part of the list numbering
Set p = p.Next
Exit For
End If
End If
Next
Loop
End Sub
You can run it by clicking anywhere inside of the code and clicking the "play" button in the VBA IDE.

VBA code for extracting 3 specific number patterns

I am working in excel and need VBA code to extract 3 specific number patterns. In column A I have several rows of strings which include alphabetical characters, numbers, and punctuation. I need to remove all characters except those found in a 13-digit number (containing only numbers), a ten-digit number (containing only numbers), or a 9-digit number immediately followed by an "x" character. These are isbn numbers.
The remaining characters should be separated by one, and only one, space. So, for the following string found in A1: "There are several books here, including 0192145789 and 9781245687456. Also, the book with isbn 045789541x is included. This book is one of 100000000 copies."
The output should be: 0192145789 9781245687456 045789541x
Note that the number 100000000 should not be included in the output because it does not match any of the three patterns mentioned above.
I'm not opposed to an Excel formula solution instead of VBA, but I assumed that VBA would be cleaner. Thanks in advance.
Here's a VBA function that will do specifically what you've specified
Function ExtractNumbers(inputStr As String) As String
    Dim outputStr As String
    Dim bNumDetected As Boolean
    Dim numCount As Long
    Dim numStart As Long
    Dim i As Long
    numCount = 0
    bNumDetected = False
    For i = 1 To Len(inputStr)
        If IsNumeric(Mid(inputStr, i, 1)) Then
            numCount = numCount + 1
            'Remember where the current run of digits starts
            If Not bNumDetected Then
                bNumDetected = True
                numStart = i
            End If
            'A run qualifies if it is 9 digits followed by "x",
            'or exactly 10 or 13 digits followed by a non-digit
            If (numCount = 9 And Mid(inputStr, i + 1, 1) = "x") Or _
               numCount = 13 And Not IsNumeric(Mid(inputStr, i + 1, 1)) Or _
               numCount = 10 And Not IsNumeric(Mid(inputStr, i + 1, 1)) Then
                If numCount = 9 Then
                    outputStr = outputStr & Mid(inputStr, numStart, numCount) & "x "
                Else
                    outputStr = outputStr & Mid(inputStr, numStart, numCount) & " "
                End If
            End If
        Else
            'Any non-digit ends the current run
            numCount = 0
            bNumDetected = False
        End If
    Next i
    ExtractNumbers = Trim(outputStr)
End Function
It's nothing fancy; it just uses string functions to go through your string one character at a time, looking for 9-digit numbers ending with x, 10-digit numbers, and 13-digit numbers, and extracts them into a new string.
It's a UDF, so you can use it as a formula in your workbook, for example =ExtractNumbers(A1) in cell B1.