Vbscript Regular expression, find all between 2 known strings - regex

I would like to select all texts between two know strings. Say for example the following text
*starthere
*General Settings
* some text1
* some text2
*endhere
I would like to select all texts between "*starthere" and "*endhere" using vbscript. so that the final output looks like the following
*General Settings
* some text1
* some text2
I know this would be simpler using a regex since there are multiple instances of such pattern in the file i read.
I tried something like the following
/(.*starthere\s+)(.*)(\s+*endhere.*)/
/(*starthere)(?:[^])*?(*endhere)/
But they dont seem to work and it selects even the start and end strings together. Lookforward and backword dont seem to work either and iam not sure if they have support for vbscript.
This is the code I am using:
'Create a regular expression object
Dim objRegExp
Set objRegExp = New RegExp 'Set our pattern
objRegExp.Pattern = "/\*starthere\s+([\s\S]*?)\s+\*endhere\b/" objRegExp.IgnoreCase = True
objRegExp.Global = True
Do Until objFile.AtEndOfStream
strSearchString = objFile.ReadLine
Dim objMatches
Set objMatches = objRegExp.Execute(strSearchString)
If objMatches.Count > 0 Then
out = out & objMatches(0) &vbCrLf
WScript.Echo "found"
End If
Loop
WScript.Echo out
objFile.Close

You can use:
/\bstarthere\s+([\s\S]*?)\s+endhere\b/
and grab the captured group #1
([\s\S]*?) will match any text between these 2 tags including newlines.

Related

Using Regex pattern to copy and rename Excel sheet using VBA

I having trouble creating VBA to copy an existing sheet and rename the copy with a specific suffix.
The existing sheet is named with a variable prefix (a digit code) followed by a fix suffix.
The copied sheet should be renamed with the same prefix, followed by another fix suffix.
I would like to use regex to do so, but I cannot figure out how to specify the sheet names with regex. The pattern would simply be something like [0-9]+ for the prefix.
The suffix are always the same.
Example:
Existing sheet: 123_raw
New copied sheet: 123_analyzed
This is what I have so far and don't know how to go on:
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Global = True
.Pattern = "[0-9]+"
It should look something similar to this I guess:
Sheets("regex pattern + [suffix]").Select
Sheets("regex pattern + [suffix]").Copy After:=Sheets(3)
Sheets("regex pattern + [suffix] (2)").Select
Sheets("regex pattern + [suffix] (2)").Name = "regex pattern + [new suffix]"
But I have no idea on how to actually code it.
Any help much appreciated!
Assuming your Sheet names are 123_raw 456_raw Or something [3 digits_words] so your Pattern will be Pattern = "([0-9]{3}\_)" https://regex101.com/r/Iu6nxn/1
([0-9]{3}\_) Match a single character present in the list below
0-9 a single character in the range between 0 and 9
{3} Quantifier — Matches exactly 3 times
\_ matches the character _ literally (case sensitive)
VBA Code Example as simple as it can be - Here we are searching for the sheet name 123_ or [3 digits_words] Copy and rename to 3 digits_analyzed
Option Explicit
Public Sub Example()
Dim RegExp As Object
Dim Matches As Variant
Dim Pattern As String
Dim NewName As String
Dim Sht As Worksheet
Set RegExp = CreateObject("VbScript.RegExp")
For Each Sht In ThisWorkbook.Worksheets
Pattern = "([0-9]{3}\_)" ' Sheet name 123_
With RegExp
.Global = False
.Pattern = Pattern
.IgnoreCase = True
Set Matches = .Execute(Sht.Name)
End With
If Matches.Count > 0 Then
Debug.Print Matches(0) ' Print on Immediate Win
NewName = Matches(0) & "analyzed" ' New sheet name
Sht.Copy After:=Sheets(3) ' Copy Sheet
ActiveSheet.Name = NewName ' Rename sheet with new name
End If
Next
Set RegExp = Nothing
Set Matches = Nothing
Set Sht = Nothing
End Sub
Something like this (where _new replaces the prior suffix)
Sub B()
Dim ws As Worksheet
Dim objRegex As Object
Set ws = Sheets(1)
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Pattern = "([0-9]+)_[a-z]+"
If .test(ws.Name) Then
ws.Copy After:=Sheets(Sheets.Count)
ActiveSheet.Name = .Replace(ws.Name, "$1_new")
End If
End With
End Sub

Find specific instance of a match in string using RegEx

I am very new to RegEx and I can't seem to find what I looking for. I have a string such as:
[cmdSubmitToDatacenter_Click] in module [Form_frm_bk_UnsubmittedWires]
and I want to get everything within the first set of brackets as well as the second set of brackets. If there is a way that I can do this with one pattern so that I can just loop through the matches, that would be great. If not, thats fine. I just need to be able to get the different sections of text separately. So far, the following is all I have come up with, but it just returns the whole string minus the first opening bracket and the last closing bracket:
[\[-\]]
(Note: I'm using the replace function, so this might be the reverse of what you are expecting.)
In my research, I have discovered that there are different RegEx engines. I'm not sure the name of the one that I'm using, but I'm using it in MS Access.
If you're using Access, you can use the VBScript Regular Expressions Library to do this. For example:
Const SOME_TEXT = "[cmdSubmitToDatacenter_Click] in module [Form_frm_bk_UnsubmittedWires]"
Dim re
Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.Pattern = "\[([^\]]+)\]"
Dim m As Object
For Each m In re.Execute(SOME_TEXT)
Debug.Print m.Submatches(0)
Next
Output:
cmdSubmitToDatacenter_Click
Form_frm_bk_UnsubmittedWires
Here is what I ended up using as it made it easier to get the individual values returned. I set a reference to the Microsoft VBScript Regular Expression 5.5 so that I could get Intellisense help.
Public Sub GetText(strInput As String)
Dim regex As RegExp
Dim colMatches As MatchCollection
Dim strModule As String
Dim strProcedure As String
Set regex = New RegExp
With regex
.Global = True
.Pattern = "\[([^\]]+)\]"
End With
Set colMatches = regex.Execute(strInput)
With colMatches
strProcedure = .Item(0).submatches.Item(0)
strModule = .Item(1).submatches.Item(0)
End With
Debug.Print "Module: " & strModule
Debug.Print "Procedure: " & strProcedure
Set regex = Nothing
End Sub

Extract a string on a text file using VBS

Okay so I have this file sample.txt
("checkAssdMobileNo1".equals(ACTION)
("checkAssdMobileNo2".equals(ACTION)
("checkAssdMobileNo3".equals(ACTION)
("checkAssdMobileNo4".equals(ACTION)
("checkAssdMobileNo5".equals(ACTION)
("checkAssdMobileNo6".equals(ACTION)
How can I output only these:
checkAssdMobileNo1
checkAssdMobileNo2
checkAssdMobileNo3
checkAssdMobileNo4
checkAssdMobileNo5
checkAssdMobileNo6
I tried using the following code but it would not output anything and I couldn't figure out what I did wrong:
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set file = objFSO.OpenTextFile("sample.txt" , ForReading)
Const ForReading = 1
Dim re
Set re = new regexp
re.Pattern = """\w+?""[.]equals(ACTION)"
re.IgnoreCase = True
re.Global = True
Dim line
Do Until file.AtEndOfStream
line = file.ReadLine
For Each m In re.Execute(line)
Wscript.Echo m.Submatches(0)
Next
Loop
Your regular expression is close, but missing 2 things:
You need to escape the parentheses surrounding ACTION
You need to use unescaped parentheses to extract the group between the quotes
Something like this should work:
re.Pattern = """(\w+?)""[.]equals\(ACTION\)"
Regex you need is
\("(\w+)"
Demo on regex101
It uses the concept of Group Capture

Replace matched pattern with different font

I am using Outlook 2010, and I am trying to write a macro to replace the font of text with a different one, if it matches a pattern.
The logic I am trying to apply is simple - in the user selected text, check for a pattern, and on match, change the font for the matched text.
So far I have been able to split the text and apply/check regex, but the replacement is something that I am not clear on how to do.
Dim objOL As Application
Dim objDoc As Object
Dim objSel As Object
Dim regEx As RegExp
Dim matches As MatchCollection
Dim m As Match
Dim lines As Variant
Dim ms As String
Set objOL = Application
Set objDoc = objOL.ActiveInspector.WordEditor
Set objSel = objDoc.Windows(1).Selection
lines = Split(objSel, Chr(13))
For i = 0 To UBound(lines) Step 1
Set regEx = New RegExp
With regEx
.Pattern = "\[(ok|edit|error)\](\[.*\])?" ' <-- this is just one regex, I want to be able to check more regexes
.Global = True
End With
If regEx.Test(lines(i)) Then
Set matches = regEx.Execute(lines(i))
For Each m In matches
ms = m.SubMatches(1)
' ms.Font.Italic = True
' <-- here is where I am not sure how to replace! :( -->
Next
End If
Next i
P.S there seems to be text-search (objSel.Find.Text)and replace (objSel.Find.Replacement.Text) methods in Selection object, but not pattern-search ! (or I am missing it)
--EDIT--
Adding a sample text
user#host> show some data
..<few lines of data>.. <-- these lines as-is (but monospaced)
[ok][2014-11-26 11:05:02]
user#host> edit some other data
[edit data]
user#host(data)% some other command
I want to convert the whole block to a monospaced font (like Courier New, or Consolas)
And change the part that begins with something#somewhere.. and till > or % to dimmer color,
(i.e in this example user#host> and user#host(data)% dimmer/grey)
The rest in that line to bold (show some data et al)
And, all the bracketed text followed by time-stamps (or without timestamps) similar to 2. (i.e, dim/grey)
This is getting closer to being done. The framework is here to make all sorts of changes now. Just need to get some of the regex patterns down to make the changes.
Sub FormatSelection()
Dim objMailItem As Outlook.MailItem
Dim objInspector As Outlook.Inspector: Set objInspector = Application.ActiveInspector
Dim objHtmlEditor As Object
Dim objWord As Object
Dim Range As Word.Selection
Dim objSavedSelection As Word.Selection
Dim objFoundText As Object
' Verify a mail object is in focus.
If objInspector.CurrentItem.Class = olMail Then
' Get the mail object.
Set objMailItem = objInspector.CurrentItem
If objInspector.EditorType = olEditorWord Then
' We are using a Word editor. Get the selected text.
Set objHtmlEditor = objMailItem.GetInspector.WordEditor
Set objWord = objHtmlEditor.Application
Set Range = objWord.Selection
Debug.Print Range.Range
' Set defaults for the selection
With Range.Font
.Name = "Courier"
.ColorIndex = wdAuto
End With
' Stylize the bracketed text
Call FormatTextWithRegex(Range, 2, "\[(.+?)\]")
' Prompt style text.
Call FormatTextWithRegex(Range, 2, "(\w+?#.+?)(?=[\>\%])")
' Text following the prompt.
Call FormatTextWithRegex(Range, 3, "(\w+?#.+?[\>\%])(.+)")
End If
End If
Set objInspector = Nothing
Set Range = Nothing
Set objHtmlEditor = Nothing
Set objMailItem = Nothing
End Sub
Private Sub FormatTextWithRegex(ByRef pRange As Word.Selection, pActionIndex As Integer, pPattern As String)
' This routine will perform a regex replacement on the text in pRange using pPattern
' on text based on the pactionindex passed.
Const intLightColourIndex = 15
Dim objRegex As RegExp: Set objRegex = New RegExp
Dim objSingleMatch As Object
Dim objMatches As Object
' Configure Regex object.
With objRegex
.IgnoreCase = True
.MultiLine = False
.Pattern = pPattern ' Example "\[(ok|edit|error)\](\[.+?\])?"
.Global = True
End With
' Locate all matches if any.
Set objMatches = objRegex.Execute(pRange.Text)
' Find
If (objMatches.Count > 0) Then
Debug.Print objMatches.Count & " Match(es) Found"
For Each objSingleMatch In objMatches
' Locate the text associated to this match in the selection so we can replace it.
Debug.Print "Match Found: '" & objSingleMatch & "'"
With pRange.Find
'.ClearFormatting
.Text = objSingleMatch.Value
.ClearFormatting
Select Case pActionIndex
Case 1 ' Italisize text
.Replacement.Text = objSingleMatch.Value
.Replacement.Font.Bold = False
.Replacement.Font.Italic = True
.Replacement.Font.ColorIndex = wdAuto
.Execute Replace:=wdReplaceAll
Case 2 ' Dim the colour
.Replacement.Text = objSingleMatch.Value
.Replacement.Font.Bold = False
.Replacement.Font.Italic = False
.Replacement.Font.ColorIndex = intLightColourIndex
.Execute Replace:=wdReplaceAll
Case 3 ' Bold that text!
.Replacement.Text = objSingleMatch.Value
.Replacement.Font.Bold = True
.Replacement.Font.Italic = False
.Replacement.Font.ColorIndex = wdAuto
.Execute Replace:=wdReplaceAll
End Select
End With
Next
Else
Debug.Print "No matches found for pattern: " & pPattern
End If
Set objRegex = Nothing
Set objSingleMatch = Nothing
Set objMatches = Nothing
End Sub
So we take what the user has selected and execute the macro. I have my Outlook configured with Word for the editor so that is tested for. Take the selected text and run the regex query against the text saving the matches.
The issue you had is what to do with the match once you found it. In my case since we have the actual text that matched we can run that through a find and replace using the selection once again. Replacing the text with itself instead styled as directed.
Caveats
My testing text was the following:
asdfadsfadsf [ok][Test]dsfadsfasdf asdfadsfadsfasdfasdfadsfadsf [ok][Test]dsfadsfasdf asdfadsfadsfasdf
I had to change your regex in your sample to be less greedy since it was matching both [ok][Test] sections. I don't know what kind of text you are working with so my logic might not apply to your situation. Test with caution.
You also had a comment that you needed to test multiple regexes... regexies.... I don't know what the plural is. Wouldn't be hard to create another function that calls this one for several patterns. Assuming this logic works repeating it should not be a big deal. I would like to make this work for you so if something is wrong let me know.
Code Update
I have changed the code so that the regex replacement is in a sub. So what the code does right now is change the selected text to courier and italisize text based on a regex. Now with how it is set up you can use the sub routine FormatTextWithRegex to make changes. Just need to update the pattern and action index which will perform the different styles. Will be updating this again soon with more information. Right now all that exists is the structure that I think you need.
Having issues with the bolding still but you can see the grey part is working correctly. Also the since this relies on highlighting the multiple calls to the function are causing an issue. Just not sure what it is.

VBA Regex to match Two sets of Four Digits with String

Im trying to make a Excel Regex pattern to find a certain string. This what Im trying:
I'm trying to make it match 0 and 0000 to 9999
StringToMatch = "a75z6878"
Dim objRegExp As New RegExp
Set objRegExp = CreateObject("vbscript.regexp")
objRegExp.IgnoreCase = True
objRegExp.Global = True
objRegExp.Pattern = "[a-z]([0-9][0-9][0-9][0-9])[a-z]([0-9][0-9][0-9][0-9])"
objRegExp.Pattern = "[a-z]([0-9]{1-4})[a-z]([0-9]{1-4})"
If objRegExp.Test(StringToMatch) Then MsgBox(Found!)
I have tried different patterns but none work.
What am I doing wrong???
What is wrong in objRegExp.Pattern = "[a-z]([0-9]{1-4})[a-z]([0-9]{1-4})"
The quantifier must be specified as {m,n} and not {m-n}
change the regex to
[a-z][0-9]{1,4}[a-z][0-9]{1,4}
For example see the link http://regex101.com/r/wA2qM3/1
OR a shorter version like
[a-z]\d{1,4}[a-z]\d{1,4}