Split string on single forward slashes with RegExp - regex

Edit: Wow, thanks for so many suggestions, but I specifically wanted a regexp solution for future, more complex use.
I need help splitting a text string in Excel VBA. I looked around, but the solutions are either for other languages or I can't make them work in VBA.
I want to split words on single slashes only:
text1/text2 - split
text1//text2 - no split
text1/text2//text3 - split after text1
I tried using a regexp.split function, but I don't think it exists in VBA. For the pattern, I was thinking of something like this:
(?i)(?:(?<!\/)\/(?!\/))
but I get an error when executing the search in my macro, even though it works on sites like https://www.myregextester.com/index.php#sourcetab

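You can use a RegExp match approach rather than a split one. VBScript's RegExp object has no Split method, and it supports neither lookbehind (?<!...) nor inline modifiers such as (?i), which is why your pattern raises an error in VBA. Instead, match any run of characters other than / (or a double //) to grab the values you need.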
Here is a "wrapped" (i.e. with alternation) version of the regex:
(?:[^/]|//)+
And here is a more efficient, but less readable, version:
[^/]+(?://[^/]*)*
Here is a working VBA code:
Sub GetMatches(ByRef str As String, ByRef coll As Collection)
' Adds every run of text that contains no single "/" to coll
Dim rExp As Object, rMatch As Object, r_item As Object
Set rExp = CreateObject("vbscript.regexp")
With rExp
.Global = True ' find all matches, not just the first one
.Pattern = "(?:[^/]|//)+" ' a non-slash character or a "//" pair, repeated
End With
Set rMatch = rExp.Execute(str)
If rMatch.Count > 0 Then
For Each r_item In rMatch
coll.Add r_item.Value
Debug.Print r_item.Value
Next r_item
End If
Debug.Print ""
End Sub
Call the sub as follows:
Dim matches As Collection
Set matches = New Collection
GetMatches str:="text1/text2", coll:=matches
Here are the results for the 3 strings above:
1. text1/text2
text1
text2
2. text1/text2//text3
text1
text2//text3
3. text1//text2
text1//text2
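If you want something that behaves more like a split function, the match approach can be wrapped so that it returns an array. The following is a minimal sketch that assumes late binding to VBScript.RegExp, so no library reference is needed. The name SplitOnSingleSlash is hypothetical. It uses the alternation pattern, because the more efficient [^/]+(?://[^/]*)* drops a leading "//" (its [^/]+ cannot start on a slash). On the three sample strings, both patterns give the same results.

```vba
' Hypothetical helper: split on single slashes only, returning a zero-based array.
' Late binding: no reference to "Microsoft VBScript Regular Expressions 5.5" is needed.
Public Function SplitOnSingleSlash(ByVal source As String) As Variant
    Dim re As Object, matches As Object
    Dim parts() As String
    Dim i As Long

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True                  ' return every match, not just the first
    re.Pattern = "(?:[^/]|//)+"       ' runs of non-slash characters or "//" pairs

    Set matches = re.Execute(source)
    If matches.Count = 0 Then
        SplitOnSingleSlash = Array()  ' nothing matched: return an empty array
        Exit Function
    End If

    ReDim parts(0 To matches.Count - 1)
    For i = 0 To matches.Count - 1
        parts(i) = matches.Item(i).Value
    Next i
    SplitOnSingleSlash = parts
End Function
```

For example, SplitOnSingleSlash("text1/text2//text3") should return an array holding text1 and text2//text3, matching the results listed above. Unlike VBA's Split, it never produces empty elements.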

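Public Sub customSplit()
Dim v As Variant
' Split on every "/": "text1", "text2", "", "text3" (the empty item comes from "//")
v = Split("text1/text2//text3", "/")
' Rejoin with commas; the empty item shows up as ",,", which is turned back into "//"
v = Replace(Join(v, ","), ",,", "//")
Debug.Print v '-> "text1,text2//text3"
End Sub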
or
Replace(Replace("text1/text2//text3", "/", ","), ",,", "//") '-> "text1,text2//text3"

Go to the Data tab and choose Text to Columns. Then choose the "Delimited" option, select "Other", and enter the delimiter you want.

Text to Columns will work. Another option, if you want to keep the original value, is to use formulas:
In B1:
=left(a1,find(":",a1)-1)
In C1:
=mid(a1,find(":",a1)+1,len(a1))


VBA Word Wildcards - finding shortest possible set of characters

I have been trying to find a working solution for a couple of hours now. I hope you can help me.
My problem:
I need to find and select a whole sentence in Word, given the starting and ending strings of that sentence.
For example, when my starting string is "People" and my ending string is "apples.", I expect Word to select the whole sentence "People like red apples." in my document (if such a sentence exists).
For this purpose I wrote a macro that works almost the way I want. The only problem is that it doesn't select the smallest possible set of characters, which is what I want it to do. To make it clear, let's assume I have this text in my document: People like smoking. People like red apples.
Now, when I give the macro "People" and "apples." as the starting and ending strings, it selects all the text, i.e. both sentences above. That is my problem: I want it to select only the second sentence (People like red apples.), not both of them, even though they start with the same word. So, basically, I always want to select the shortest possible set of characters (which in this case is only the last sentence).
Here is a part of my macro in VBA:
text_str = startStr & "*" & endStr
With Application.Selection.Find
.ClearFormatting
.Forward = True
.Wrap = wdFindContinue
.Text = text_str
.MatchWildcards = True
.MatchCase = True
.Execute
End With
I know the problem lies with the wildcards (a very limited form of regular expressions), so I also tried this as the search string:
text_str = "(" & startStr & "*){1}" & endStr
It also didn't help. I'm stuck here. :/
Thanks for any suggestions!
Selection.Find has something similar to regular expressions, but in this case you must use real regular expressions.
The pattern (in this particular case) should be:
People[^.]+apples\.
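The negated class is what makes the difference. The sketch below compares it with a greedy .* and a lazy .*? on the sample text. It uses a hypothetical helper, FirstMatch, with late-bound VBScript.RegExp. A regex engine reports the leftmost match, so both .* variants start at the first "People" and run on to "apples.". Only [^.]+ refuses to cross the period that ends the first sentence.

```vba
' Hypothetical helper: returns the first (leftmost) match of pat in source, or "" if none.
Public Function FirstMatch(ByVal source As String, ByVal pat As String) As String
    Dim re As Object, matches As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = pat
    re.Global = False                 ' only the first match is needed
    Set matches = re.Execute(source)
    If matches.Count > 0 Then FirstMatch = matches.Item(0).Value
End Function

Public Sub ComparePatterns()
    Const SAMPLE As String = "People like smoking. People like red apples."
    Debug.Print FirstMatch(SAMPLE, "People.*apples\.")    ' greedy: both sentences
    Debug.Print FirstMatch(SAMPLE, "People.*?apples\.")   ' lazy: still both sentences
    Debug.Print FirstMatch(SAMPLE, "People[^.]+apples\.") ' People like red apples.
End Sub
```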
I wrote an example macro, which:
Selects the whole text in the document and assigns it to the src variable (the text searched by the regex).
Sets the cursor at the beginning of the document.
Checks whether the pattern can be matched (regEx.Test).
Executes the regex.
Assigns the matched string to ret variable.
Displays it in a message box.
Below is the complete macro. You will probably want to change it to select (find) the matched text instead of showing a message box.
Sub Re()
Dim startStr As String: startStr = "People"
Dim endStr As String: endStr = "apples"
Dim pattern As String: pattern = startStr & "[^.]+" & endStr & "\."
' Requires a reference to "Microsoft VBScript Regular Expressions 5.5" (Tools > References)
Dim regEx As New RegExp
Dim src As String
Dim ret As String
Dim colMatches As MatchCollection
ActiveDocument.Range.Select
src = ActiveDocument.Range.Text
Selection.StartOf
regEx.pattern = pattern
If (regEx.Test(src)) Then
Set colMatches = regEx.Execute(src)
ret = "Match: " & colMatches(0).Value
Else
ret = "Matching Failed"
End If
MsgBox ret, vbOKOnly, "Result"
End Sub
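As suggested, you can select the matched text instead of showing a message box. The sketch below maps the match position back to a Word range. SelectFirstMatch is a hypothetical name. The sketch assumes that character offsets in ActiveDocument.Content.Text line up with Word range positions. That generally holds for plain text, but fields, hidden text or inline objects can shift the offsets.

```vba
' Hypothetical helper: selects the first regex match in the active document's main text.
' Assumes offsets in Content.Text equal Word range positions (fields/hidden text can break this).
Public Function SelectFirstMatch(ByVal pat As String) As Boolean
    Dim re As Object, matches As Object, m As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = pat
    re.Global = False

    Set matches = re.Execute(ActiveDocument.Content.Text)
    If matches.Count = 0 Then Exit Function   ' no match: returns False

    Set m = matches.Item(0)
    ' FirstIndex is zero-based, like Word's Range.Start
    ActiveDocument.Range(m.FirstIndex, m.FirstIndex + m.Length).Select
    SelectFirstMatch = True
End Function
```

A possible call, reusing the pattern above: If Not SelectFirstMatch("People[^.]+apples\.") Then MsgBox "Matching Failed"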

Extract text using word VBA regex then save to a variable as string

I am trying to create code in Word VBA that will automatically save (as PDF) and name a document based on its content, which is in text and not fields. Luckily the formatting is standardized and I already know how to save it. I tested my regex elsewhere to make sure it pulls what I am looking for. The trouble is that I need to extract the matched statement, convert it to a string, and save it to an object (so I have something to pass on to the code that names the document).
The part of the document I need to match starts at "Program" and runs through the end of the line. It looks like this:
Program: Program Name (abr)
and the regex I worked out for this is "Program:[^\n]"
The code I have so far is below, but I don't know how to execute the regex in the active document, convert the output to a string and save to an object:
Sub RegExProgram()
Dim regEx
Dim pattern As String
Set regEx = CreateObject("VBScript.RegExp")
regEx.IgnoreCase = True
regEx.Global = False
regEx.pattern = "Program\:[^\n]"
(missing code here)
End Sub
Any ideas are welcome, and I am sorry if this is simple and I am just overlooking something obvious. This is my first VBA project, and most of the resources I can find show how to replace text using regex, not how to save the extracted text as a string. Thank you!
Try this. (Documentation for the RegExp class can be found in Microsoft's VBScript language reference.)
Dim regEx As Object
Dim matchCollection As Object
Dim extractedString As String
Set regEx = CreateObject("VBScript.RegExp")
With regEx
.IgnoreCase = True
.Global = False ' Only look for 1 match; False is actually the default.
.Pattern = "Program: ([^\r]+)" ' Word separates paragraphs with CR (\r)
End With
' Pass the text of your document as the text to search through to regEx.Execute().
' For a quick test of this statement, pass "Program: Program Name (abr)"
Set matchCollection = regEx.Execute(ActiveDocument.Content.Text)
' Extract the first submatch's (capture group's) value -
' e.g., "Program Name (abr)" - and assign it to variable extractedString.
' Check the count first: indexing an empty match collection raises a runtime error.
If matchCollection.Count > 0 Then
extractedString = matchCollection(0).SubMatches(0)
End If
I've modified your regex based on the assumption that you want to capture everything after Program: through the end of the line; your original regex would only have captured Program:<space>.
Enclosing [^\r]+ (all characters through the end of the line) in (...) defines a so-called subexpression (a.k.a. capture group), which lets you extract only the substring of interest from what the overall pattern matches.
The .Execute() method, to which you pass the string to search in, always returns a collection of matches (Match objects).
Since the .Global property is set to False in your code, the output collection has (at most) 1 entry (at index 0) in this case.
If the regular expression has subexpressions (1 in our case), then each entry of the match collection has a nonempty .SubMatches collection, with one entry for each subexpression, but note that the .SubMatches entries are strings, not Match objects.
Match objects have the properties .FirstIndex, .Length, and .Value (the matched string). Since .Value is the default property, it is sufficient to reference the object itself. For example, instead of the more verbose matchCollection(0).Value to access the full matched string, you can use the shortcut matchCollection(0). (By contrast, .SubMatches entries are plain strings.)
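The sketch below makes the Match/SubMatches distinction concrete. It wraps the same pattern in a function and prints the Match properties. It also guards against the no-match case, where indexing the collection at 0 would raise a runtime error. ExtractProgramName is a hypothetical name, and the function uses late binding.

```vba
' Hypothetical helper: returns the text after "Program: " up to the end of the paragraph, or "".
Public Function ExtractProgramName(ByVal docText As String) As String
    Dim re As Object, matches As Object, m As Object
    Set re = CreateObject("VBScript.RegExp")
    re.IgnoreCase = True
    re.Pattern = "Program: ([^\r]+)"   ' capture group 1 = rest of the line

    Set matches = re.Execute(docText)
    If matches.Count = 0 Then Exit Function   ' no match: result stays ""

    Set m = matches.Item(0)
    Debug.Print m.FirstIndex, m.Length, m.Value   ' zero-based position, length, whole match
    ExtractProgramName = m.SubMatches(0)          ' a String, not a Match object
End Function
```

In Word you would call it as extractedString = ExtractProgramName(ActiveDocument.Content.Text).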
If you're just looking for a string that starts with "Program:" and want to go to the end of the line from there, you don't need a regular expression:
Public Sub ReadDocument()
Dim aLine As Paragraph
Dim aLineText As String
Dim start As Long
Dim my_str As String
For Each aLine In ActiveDocument.Paragraphs
aLineText = aLine.Range.Text
start = InStr(aLineText, "Program:")
If start > 0 Then
' Take everything from "Program:" on, minus the trailing paragraph mark (vbCr)
my_str = Replace(Mid(aLineText, start), vbCr, "")
End If
Next aLine
End Sub
This reads through the document paragraph by paragraph and stores your match in the variable my_str when it encounters a paragraph that contains "Program:".
Lazier version:
a = Split(ActiveDocument.Range.Text, "Program:")
If UBound(a) > 0 Then
extractedString = Trim(Split(a(1), vbCr)(0))
End If
Paragraphs in Word end with vbCr (\r, not \n).

Find specific instance of a match in string using RegEx

I am very new to RegEx and I can't seem to find what I'm looking for. I have a string such as:
[cmdSubmitToDatacenter_Click] in module [Form_frm_bk_UnsubmittedWires]
and I want to get everything within the first set of brackets as well as within the second set. If there is a way to do this with one pattern so that I can just loop through the matches, that would be great. If not, that's fine. I just need to be able to get the different sections of text separately. So far, the following is all I have come up with, but it just returns the whole string minus the first opening bracket and the last closing bracket:
[\[-\]]
(Note: I'm using the replace function, so this might be the reverse of what you are expecting.)
In my research, I have discovered that there are different RegEx engines. I'm not sure of the name of the one I'm using, but I'm using it in MS Access.
If you're using Access, you can use the VBScript Regular Expressions Library to do this. For example:
Const SOME_TEXT = "[cmdSubmitToDatacenter_Click] in module [Form_frm_bk_UnsubmittedWires]"
Dim re
Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.Pattern = "\[([^\]]+)\]"
Dim m As Object
For Each m In re.Execute(SOME_TEXT)
Debug.Print m.Submatches(0)
Next
Output:
cmdSubmitToDatacenter_Click
Form_frm_bk_UnsubmittedWires
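The [^\]]+ part keeps each match inside a single pair of brackets. For comparison, here is a sketch using a hypothetical helper, BracketContents, with late binding. It contrasts that pattern with a greedy \[(.+)\], which runs from the first [ to the last ] and so yields a single item.

```vba
' Hypothetical helper: joins capture group 1 of every match with "|".
Public Function BracketContents(ByVal source As String, ByVal pat As String) As String
    Dim re As Object, m As Object
    Dim result As String
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True                  ' collect all matches
    re.Pattern = pat
    For Each m In re.Execute(source)
        If Len(result) > 0 Then result = result & "|"
        result = result & m.SubMatches(0)
    Next m
    BracketContents = result
End Function
```

With "\[([^\]]+)\]" this returns the two names separately. With "\[(.+)\]" it returns one string that still contains "] in module [".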
Here is what I ended up using, as it made it easier to get the individual values returned. I set a reference to Microsoft VBScript Regular Expressions 5.5 so that I could get IntelliSense help.
Public Sub GetText(strInput As String)
Dim regex As RegExp
Dim colMatches As MatchCollection
Dim strModule As String
Dim strProcedure As String
Set regex = New RegExp
With regex
.Global = True
.Pattern = "\[([^\]]+)\]"
End With
Set colMatches = regex.Execute(strInput)
With colMatches
strProcedure = .Item(0).submatches.Item(0)
strModule = .Item(1).submatches.Item(0)
End With
Debug.Print "Module: " & strModule
Debug.Print "Procedure: " & strProcedure
Set regex = Nothing
End Sub

Remove tweet regular expressions from string of text

I have an Excel sheet filled with tweets. Several entries contain #blah-type strings, among other things. I need to keep the rest of the text and remove the #blah part. For example, "#villos hey dude" needs to be transformed into "hey dude". This is what I've done so far:
Sub Macro1()
'
' Macro1 Macro
'
Dim counter As Integer
Dim strIN As String
Dim newstring As String
For counter = 1 To 46
Cells(counter, "E").Select
ActiveCell.FormulaR1C1 = strIN
StripChars (strIN)
newstring = StripChars(strIN)
ActiveCell.FormulaR1C1 = StripChars(strIN)
Next counter
End Sub
Function StripChars(strIN As String) As String
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Pattern = "^#?(\w){1,15}$"
.ignorecase = True
StripChars = .Replace(strIN, vbNullString)
End With
End Function
Moreover there are also entries like this one: Ÿ³é‡ï¼Ÿã€€åˆã‚ã¦çŸ¥ã‚Šã¾ã—ãŸã€‚ shiftã—ãªãŒã‚‰ã‚¨ã‚¯ã‚¹ãƒ
I need them gone too! Ideas?
For every line in the spreadsheet, run the following regex on it: ^(#.+?)\s+?(.*)$
If the line matches the regex, the text you are interested in will be in the second capturing group. (Most engines number groups from 1, with group 0 holding the entire match; in VBScript's RegExp, the second group is SubMatches(1).) The first capturing group will contain the Twitter handle if you need that too.
However, this will not match tweets that are not replies (i.e. that do not start with #). In that situation, the only way to distinguish between regular tweets and the junk you are not interested in is to restrict the tweet to alphanumerics. That may mean some tweets are missed if they contain any non-alphanumeric characters. The following regex will work if that is not an issue for you:
^(?:(#.+?)\s+?)?([\w\t ]+)$
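From VBA, the first pattern can be applied with RegExp.Replace, keeping only the second group through the "$2" replacement. The second pattern can be applied with RegExp.Test to flag junk rows. Below is a sketch that assumes late binding. StripReplyPrefix, IsPlainTweet and CleanTweets are hypothetical names. The loop reuses the question's column E, rows 1 to 46, and it overwrites those cells, so try it on a copy. As noted above, tweets with punctuation fail IsPlainTweet and would be cleared.

```vba
' Hypothetical helper: removes a leading "#tag " prefix; other text is returned unchanged.
Public Function StripReplyPrefix(ByVal tweet As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "^(#.+?)\s+?(.*)$"
    StripReplyPrefix = re.Replace(tweet, "$2")   ' keep capture group 2 only
End Function

' Hypothetical helper: True when the tweet (after an optional "#tag ") has only \w, tabs and spaces.
Public Function IsPlainTweet(ByVal tweet As String) As Boolean
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "^(?:(#.+?)\s+?)?([\w\t ]+)$"
    IsPlainTweet = re.Test(tweet)
End Function

' Hypothetical driver: cleans E1:E46 of the active sheet in place.
Public Sub CleanTweets()
    Dim r As Long
    Dim cellText As String
    For r = 1 To 46
        cellText = CStr(Cells(r, "E").Value)
        If IsPlainTweet(cellText) Then
            Cells(r, "E").Value = StripReplyPrefix(cellText)
        Else
            Cells(r, "E").Value = vbNullString    ' junk row: clear it
        End If
    Next r
End Sub
```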

excel regex end of line

I am looking for a regex for Excel 2007 that can replace all instances of -3 ONLY at the end of the string, replacing them with nothing (i.e. removing them). There are instances of -3 throughout the strings, but I need to remove only the ones at the end. This is being integrated into a macro, so a find-and-replace using a single regex is preferred.
You can do this without Regex by using VBA's InStr function. Here is the code:
Sub ReplaceIt()
Dim myRange As Range
Dim s As String
Set myRange = Range("A1") ' change as needed
s = CStr(myRange.Value)
' Start the search at the second-to-last character, so only a trailing "-3" can be found
If Len(s) >= 2 Then
If InStr(Len(s) - 1, s, "-3") > 0 Then myRange.Value = Left(s, Len(s) - 2)
End If
End Sub
Update
Based on a comment from Juri, changing the If statement to this will also work, and it's a bit cleaner:
If Right(myRange, 2) = "-3" Then myRange = Left(myRange, Len(myRange) - 2)
Please try the following:
Edit as per OP's comments:
Sub mymacro()
Dim myString As String
'--do stuff
'-- you could just do this or save the returning
'-- string to another string for further processing :)
MsgBox replaceAllNeg3s(myString)
End Sub
Function replaceAllNeg3s(ByRef urstring As String) As String
Dim regex As Object
Dim strtxt As String
strtxt = urstring
Set regex = CreateObject("VBScript.RegExp")
With regex
'-- replace all -3s at the end of the String
'-- (?:-3)+ repeats the two-character sequence "-3"; a class like [(-3)] would match single characters
.Pattern = "(?:-3)+$"
.Global = True
If .test(strtxt) Then
'-- ContainsAMatch = Left(strtxt,Len(strtxt)-2)
'-- in fact you can use replace
replaceAllNeg3s = Trim(.Replace(strtxt, ""))
Else
replaceAllNeg3s = strtxt
End If
End With
End Function
'-- tested for
'-- e.g. thistr25ing is -3-3-3-3
'-- e.g. 25this-3stringis25someting-3-3
'-- e.g. this-3-3-3stringis25something-5
'-- e.g. -3this-3-3-3stringis25something-3
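Whether "all instances of -3 at the end" means a single trailing -3 or a whole run such as -3-3-3 is a design choice, and the two anchored patterns differ exactly there. Below is a minimal sketch of both options with late binding. RemoveTrailingNeg3 is a hypothetical name. A character class such as [(-3)] would match single characters, including the range from ( to 3, which covers the digits 0 to 3. That is why the sketch groups the sequence with (?:-3) instead.

```vba
' Hypothetical helper: removes "-3" from the end of s.
' allRepeats = True removes a whole trailing run such as "-3-3-3"; False removes only the last "-3".
Public Function RemoveTrailingNeg3(ByVal s As String, Optional ByVal allRepeats As Boolean = True) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    If allRepeats Then
        re.Pattern = "(?:-3)+$"   ' one or more "-3" groups anchored at the end
    Else
        re.Pattern = "-3$"        ' exactly one "-3" at the end
    End If
    RemoveTrailingNeg3 = re.Replace(s, "")
End Function
```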
Unless it's part of a bigger macro, there's no need for VBA here! Simply use this formula and you'll get the result:
=IF(RIGHT(A1,2)="-3",LEFT(A1,LEN(A1)-2),A1)
(assuming that your text is in cell A1)