regex not matching correctly - regex

First of all, I would like an opinion if using regex is even the best solution here, I'm fairly new to this area and regex is the first thing I found and it seemed somewhat easy to use, until I need to grab a long section of text out of a line lol. I'm using a vb.net environment for regex.
Basically, I'm taking this line here:
21:24:55 "READ/WRITE: ['PASS',false,'27880739',[40,[459.313,2434.11,0.00221252]],[["ItemFlashlight","ItemWatch","ItemMap","ItemKnife","ItemEtool","ItemGPS","ItemHatchet","ItemCompass","ItemMatchbox","M9SD","ItemFlashlightRed","NVGoggles","Binocular_Vector","ItemToolbox","M4A1_AIM_SD_camo"],["ItemPainkiller","ItemMorphine","ItemSodaPepsi","FoodSteakCooked",["30Rnd_556x45_StanagSD",29],"30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD",["15Rnd_9x19_M9SD",12],["15Rnd_9x19_M9SD",10],"15Rnd_9x19_M9SD","15Rnd_9x19_M9SD","ItemBandage"]],["DZ_Backpack_EP1",[["BAF_AS50_TWS"],[1]],[["FoodSteakCooked","ItemPainkiller","ItemMorphine","ItemSodaCoke","5Rnd_127x99_as50","ItemBloodbag"],[2,1,1,2,4,1]]],[316,517,517],Sniper1_DZ,0.94]"
Using the following regex:
\[\[([\w|_|\""|,]*)\],\[([\w|_|\""|,|\[|\]]*)\]\],
To try and get the following:
[["ItemFlashlight","ItemWatch","ItemMap","ItemKnife","ItemEtool","ItemGPS","ItemHatchet","ItemCompass","ItemMatchbox","M9SD","ItemFlashlightRed","NVGoggles","Binocular_Vector","ItemToolbox","M4A1_AIM_SD_camo"],["ItemPainkiller","ItemMorphine","ItemSodaPepsi","FoodSteakCooked",["30Rnd_556x45_StanagSD",29],"30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD",["15Rnd_9x19_M9SD",12],["15Rnd_9x19_M9SD",10],"15Rnd_9x19_M9SD","15Rnd_9x19_M9SD","ItemBandage"]]
However either my regex is flawed, or my vb.net code is. It only displays the following data:
[["ItemFlashlight","ItemWatch","ItemMap","ItemKnife","ItemEtool","ItemGPS","ItemHatchet","ItemCompass","ItemMatchbox","M9SD","ItemFlashlightRed","NVGoggles","Binocular_Vector","ItemToolbox","M4A1_AIM_SD_camo"],["ItemPainkiller","ItemMorphine","ItemSodaPepsi",
My vb.net code in case you need to peek at it is:
ListView1.Clear()
Call initList(Me.ListView1)
My.Computer.FileSystem.CurrentDirectory = My.Settings.cfgPath
My.Computer.FileSystem.CopyFile("arma2oaserver.RPT", "tempRPT.txt")
Dim ScriptLine As String = ""
Dim path As String = My.Computer.FileSystem.CurrentDirectory & "\tempRPT.txt"
Dim lines As String() = IO.File.ReadAllLines(path, System.Text.Encoding.Default)
Dim que = New Queue(Of String)(lines)
ProgressBar1.Maximum = lines.Count + 1
ProgressBar1.Value = 0
Do While que.Count > 0
ScriptLine = que.Dequeue()
ScriptLine = LCase(ScriptLine)
If InStr(ScriptLine, "login attempt:") Then
Dim rtime As Match = Regex.Match(ScriptLine, ("(\d{1,2}:\d{2}:\d{2})"))
Dim nam As Match = Regex.Match(ScriptLine, "\""([^)]*)\""")
Dim name As String = nam.ToString.Replace("""", "")
Dim next_line As String = que.Peek 'Read next line temporarily 'This is where it would move to next line temporarily to read from it
next_line = LCase(next_line)
If InStr(next_line, "read/write:") > 0 Then 'Or InStr(next_line, "update: [b") > 0 Then 'And InStr(next_line, "setmarkerposlocal.sqf") < 1 Then
Dim coords As Match = Regex.Match(next_line, "\[(\d+)\,\[(-?\d+)\.\d+\,(-?\d+)\.\d+,([\d|.|-]+)\]\]")
Dim inv As Match = Regex.Match(next_line, "\[\[([\w|_|\""|,]*)\],\[([\w|_|\""|,|\[|\]]*)\]\],") '\[\[([\w|_|\""|,]*)\],\[([\w|_|\""|,|\[|\]]*)\]\],
'\[\[([\w|_|\""|,]*)\],\[([\w|_|\""|,|\[|\]]*)\]\]:\[([\w|_|\""|,|\[|\]]*)\]\:
Dim back As Match = Regex.Match(next_line, "\""([\w|_]+)\"",\[\[([\w|_|\""|,]*)\],\[([\d|,]*)\]\],\[\[([\w|_|\""|,]*)\],\[([\d|,]*)\]\]")
Dim held As Match = Regex.Match(next_line, "\[\""([\w|_|\""|,]+)\""\,\d+\]")
With Me.ListView1
.Items.Add(name.ToString)
With .Items(.Items.Count - 1).SubItems
.Add(rtime.ToString)
.Add(coords.ToString)
.Add(inv.ToString)
.Add(back.ToString)
.Add(held.ToString)
End With
End With
End If
End If
ProgressBar1.Value += 1
Loop
My.Computer.FileSystem.DeleteFile("tempRPT.txt")
ProgressBar1.Value = 0
The odd thing is, when I test my regex in Expresso it gets the full, correct match. So I don't know what I'm doing wrong.

I'm not sure what's wrong with the regex you have, but the first match off of this one seems to work fine:
\[\[.*?\]\]
Hope this helps.
-EDIT-
The problem isn't the regex, it's that ListView is truncating the display of the string. See here

Try this regular expression instead: \Q[[\E(?:(?!\Q[[\E).)+]]
http://regex101.com/r/zP1aC5
If you need a backref, use \Q[[\E((?:(?!\Q[[\E).)+)]]

Perhaps you should specify whether you are working with single line or multi line input text. Depending on your input text format, try with:
Dim variableName as Match = Regex.Match("input", "pattern", RegexOptions.SingleLine);
or
Dim variableName as Match = Regex.Match("input", "pattern", RegexOptions.Multiline);

Related

Excel VBA - Looking up a string with wildcards

Im trying to look up a string which contains wildcards. I need to find where in a specific row the string occurs. The string all take form of "IP##W## XX" where XX are the 2 letters by which I look up the value and the ## are the number wildcards that can be any random number. Hence this is what my look up string looks like :
FullLookUpString = "IP##W## " & LookUpString
I tried using the Find Command to find the column where this first occurs but I keep on getting with errors. Here's what I had so far but it doesn't work :L if anyone has an easy way of doing. Quite new to VBA -.-
Dim GatewayColumn As Variant
Dim GatewayDateColumn As Variant
Dim FirstLookUpRange As Range
Dim SecondLookUpRange As Range
FullLookUpString = "IP##W## " & LookUpString
Set FirstLookUpRange = wsMPNT.Range(wsMPNT.Cells(3, 26), wsMPNT.Cells(3, lcolumnMPNT))
Debug.Print FullLookUpString
GatewayColumn = FirstLookUpRange.Find(What:=FullLookUpString, After:=Range("O3")).Column
Debug.Print GatewayColumn
Per the comment by #SJR you can do this two ways. Using LIKE the pattern is:
IP##W## [A-Z][A-Z]
Using regular expressions, the pattern is:
IP\d{2}W\d{2} [A-Z]{2}
Example code:
Option Explicit
Sub FindString()
Dim ws As Worksheet
Dim rngData As Range
Dim rngCell As Range
Set ws = ThisWorkbook.Worksheets("Sheet1") '<-- set your sheet
Set rngData = ws.Range("A1:A4")
' with LIKE operator
For Each rngCell In rngData
If rngCell.Value Like "IP##W## [A-Z][A-Z]" Then
Debug.Print rngCell.Address
End If
Next rngCell
' with regular expression
Dim objRegex As Object
Dim objMatch As Object
Set objRegex = CreateObject("VBScript.RegExp")
objRegex.Pattern = "IP\d{2}W\d{2} [A-Z]{2}"
For Each rngCell In rngData
If objRegex.Test(rngCell.Value) Then
Debug.Print rngCell.Address
End If
Next rngCell
End Sub
If we can assume that ALL the strings in the row match the given pattern, then we can examine only the last three characters:
Sub FindAA()
Dim rng As Range, r As Range, Gold As String
Set rng = Range(Range("A1"), Cells(1, Columns.Count))
Gold = " AA"
For Each r In rng
If Right(r.Value, 3) = Gold Then
MsgBox r.Address(0, 0)
Exit Sub
End If
Next r
End Sub
Try this:
If FullLookUpString Like "*IP##W##[a-zA-Z][a-zA-Z]*" Then
MsgBox "Match is found"
End If
It will find your pattern (pattern can be surrounded by any characters - that's allowed by *).

What is the RegExp Pattern to Extract Bullet Points Between Two Group Words using VBA in Word?

I can't seem to figure out the RegExp to extract the bullet points between two group of words in a word document.
For example:
Risk Assessment:
Test 1
Test 2
Test 3
Internal Audit
In this case I want to extract the bullet points between "Risk Assessment" and "Internal Audit", one bullet at a time and assign that bullet to an Excel cell. As shown in the code below I have pretty much everything done, except I cant figure out the correct Regex pattern. Any help would be great. Thanks in advance!
Sub PopulateExcelTable()
Dim fd As Office.FileDialog
Set fd = Application.FileDialog(msoFileDialogFilePicker)
With fd
.AllowMultiSelect = False
.Title = "Please select the file."
.Filters.Clear
.Filters.Add "Word 2007-2013", "*.docx"
If .Show = True Then
txtFileName = .SelectedItems(1)
End If
End With
Dim WordApp As Word.Application
Set WordApp = CreateObject("Word.Application")
Dim WordDoc As Word.Document
Set WordDoc = WordApp.Documents.Open(txtFileName)
Dim str As String: str = WordDoc.Content.Text ' Assign entire document content to string
Dim rex As New RegExp
rex.Pattern = "\b[^Risk Assessment\s].*[^Internal Audit\s]"
Dim i As long : i = 1
rex.Global = True
For Each mtch In rex.Execute(str)
Debug.Print mtch
Range("A" & i).Value = mtch
i = i + 1
Next mtch
WordDoc.Close
WordApp.Quit
End Sub
This is probably a long way around the problem but it works.
Steps I'm taking:
Find bullet list items using keywords before and after list in regexp.
(Group) regexp pattern so that you can extract everything in-between words.
Store listed items group into a string.
Split string by new line character into a new array.
Output each array item to excel.
Loop again since there may be more than one list in document.
Note: I don't see your code for a link to Excel workbook. I'll assume this part is working.
Dim rex As New RegExp
rex.Pattern = "(\bRisk Assessment\s)(.*)(Internal\sAudit\s)"
rex.Global = True
rex.MultiLine = True
rex.IgnoreCase = True
Dim lineArray() As String
Dim myMatches As Object
Set myMatches = rex.Execute(str)
For Each mtch In rex.Execute(str)
'Debug.Print mtch.SubMatches(1)
lineArray = Split(mtch.SubMatches(1), vbLf)
For x = LBound(lineArray) To UBound(lineArray)
'Debug.Print lineArray(x)
Range("A" & i).Value = lineArray(x)
i = i + 1
Next
Next mtch
My test page looks like this:
Results from inner Debug.Print line return this:
Item 1
Item 2
Item 3

How to extract substring in parentheses using Regex pattern

This is probably a simple problem, but unfortunately I wasn't able to get the results I wanted...
Say, I have the following line:
"Wouldn't It Be Nice" (B. Wilson/Asher/Love)
I would have to look for this pattern:
" (<any string>)
In order to retrieve:
B. Wilson/Asher/Love
I tried something like "" (([^))]*)) but it doesn't seem to work. Also, I'd like to use Match.Submatches(0) so that might complicate things a bit because it relies on brackets...
Edit: After examining your document, the problem is that there are non-breaking spaces before the parentheses, not regular spaces. So this regex should work: ""[ \xA0]*\(([^)]+)\)
"" 'quote (twice to escape)
[ \xA0]* 'zero or more non-breaking (\xA0) or a regular spaces
\( 'left parenthesis
( 'open capturing group
[^)]+ 'anything not a right parenthesis
) 'close capturing group
\) 'right parenthesis
In a function:
Public Function GetStringInParens(search_str As String)
Dim regEx As New VBScript_RegExp_55.RegExp
Dim matches
GetStringInParens = ""
regEx.Pattern = """[ \xA0]*\(([^)]+)\)"
regEx.Global = True
If regEx.test(search_str) Then
Set matches = regEx.Execute(search_str)
GetStringInParens = matches(0).SubMatches(0)
End If
End Function
Not strictly an answer to your question, but sometimes, for things this simple, good ol' string functions are less confusing and more concise than Regex.
Function BetweenParentheses(s As String) As String
BetweenParentheses = Mid(s, InStr(s, "(") + 1, _
InStr(s, ")") - InStr(s, "(") - 1)
End Function
Usage:
Debug.Print BetweenParentheses("""Wouldn't It Be Nice"" (B. Wilson/Asher/Love)")
'B. Wilson/Asher/Love
EDIT #alan points our that this will falsely match the contents of parentheses in the song title. This is easily circumvented with a little modification:
Function BetweenParentheses(s As String) As String
Dim iEndQuote As Long
Dim iLeftParenthesis As Long
Dim iRightParenthesis As Long
iEndQuote = InStrRev(s, """")
iLeftParenthesis = InStr(iEndQuote, s, "(")
iRightParenthesis = InStr(iEndQuote, s, ")")
If iLeftParenthesis <> 0 And iRightParenthesis <> 0 Then
BetweenParentheses = Mid(s, iLeftParenthesis + 1, _
iRightParenthesis - iLeftParenthesis - 1)
End If
End Function
Usage:
Debug.Print BetweenParentheses("""Wouldn't It Be Nice"" (B. Wilson/Asher/Love)")
'B. Wilson/Asher/Love
Debug.Print BetweenParentheses("""Don't talk (yell)""")
' returns empty string
Of course this is less concise than before!
This a nice regex
".*\(([^)]*)
In VBA/VBScript:
Dim myRegExp, ResultString, myMatches, myMatch As Match
Dim myRegExp As RegExp
Set myRegExp = New RegExp
myRegExp.Pattern = """.*\(([^)]*)"
Set myMatches = myRegExp.Execute(SubjectString)
If myMatches.Count >= 1 Then
Set myMatch = myMatches(0)
If myMatch.SubMatches.Count >= 3 Then
ResultString = myMatch.SubMatches(3-1)
Else
ResultString = ""
End If
Else
ResultString = ""
End If
This matches
Put Your Head on My Shoulder
in
"Don't Talk (Put Your Head on My Shoulder)"
Update 1
I let the regex loose on your doc file and it matches as requested. Quite sure the regex is fine. I'm not fluent in VBA/VBScript but my guess is that's where it goes wrong
If you want to discuss the regex some further that's fine with me. I'm not eager to start digging into this VBscript API which looks arcane.
Given the new input the regex is tweaked to
".*".*\(([^)]*)
So that it doesn't falsely match (Put Your Head on My Shoulder) which appears inside the quotes.
This function worked on your example string:
Function GetArtist(songMeta As String) As String
Dim artist As String
' split string by ")" and take last portion
artist = Split(songMeta, "(")(UBound(Split(songMeta, "(")))
' remove closing parenthesis
artist = Replace(artist, ")", "")
End Function
Ex:
Sub Test()
Dim songMeta As String
songMeta = """Wouldn't It Be Nice"" (B. Wilson/Asher/Love)"
Debug.Print GetArtist(songMeta)
End Sub
prints "B. Wilson/Asher/Love" to the Immediate Window.
It also solves the problem alan mentioned. Ex:
Sub Test()
Dim songMeta As String
songMeta = """Wouldn't (It Be) Nice"" (B. Wilson/Asher/Love)"
Debug.Print GetArtist(songMeta)
End Sub
also prints "B. Wilson/Asher/Love" to the Immediate Window. Unless of course, the artist names also include parentheses.
This another Regex tested with a vbscript (?:\()(.*)(?:\)) Demo Here
Data = """Wouldn't It Be Nice"" (B. Wilson/Asher/Love)"
wscript.echo Extract(Data)
'---------------------------------------------------------------
Function Extract(Data)
Dim strPattern,oRegExp,Matches
strPattern = "(?:\()(.*)(?:\))"
Set oRegExp = New RegExp
oRegExp.IgnoreCase = True
oRegExp.Pattern = strPattern
set Matches = oRegExp.Execute(Data)
If Matches.Count > 0 Then Extract = Matches(0).SubMatches(0)
End Function
'---------------------------------------------------------------
I think you need a better data file ;) You might want to consider pre-processing the file to a temp file for modification, so that outliers that don't fit your pattern are modified to where they'll meet your pattern. It's a bit time consuming to do, but it is always difficult when a data file lacks consistency.

Making RegEx Match Bold - VB.NET

This is my current RegEx: \[b\](.*?)\[/b\]
That works perfectly fine, it replaces exactly what I want it to. But, I'm trying to figure out how to make it replace the string between [b][/b] with a bold string, but the actual text stays the same.
Example string: [b]This is an example![/b]
Desired output: This is an example!
I'm using VB.NET and this is what I currently have:
Dim reg As New Regex("\[b\](.*?)\[/b\]")
Dim str As String = String.Empty
For Each m As Match In reg.Matches(MainBox.Text)
str = reg.Replace(MainBox.Text, "test")
Next
Preview.Show()
Preview.RichTextBox1.Text = str
Preview.Size = New Size(Preview.MaximumSize.Width, Preview.MaximumSize.Height)
You need to set the start of the selection, and set the attributes of the text before inserting it.
Preview.RichTextBox1.SelectionStart = Preview.RichTextBox1.Text.Length
Preview.RichTextBox1.SelectionFont = New Font("Tahoma", 12, FontStyle.Bold)
Preview.RichTextBox1.SelectedText = str

Split a string according to a regexp in VBScript

I would like to split a string into an array according to a regular expression similar to what can be done with preg_split in PHP or VBScript Split function but with a regex in place of delimiter.
Using VBScript Regexp object, I can execute a regex but it returns the matches (so I get a collection of my splitters... that's not what I want)
Is there a way to do so ?
Thank you
If you can reserve a special delimiter string, i.e. a string that you can choose that will never be a part of the real input string (perhaps something like "###"), then you can use regex replacement to replace all matches of your pattern to "###", and then split on "###".
Another possibility is to use a capturing group. If your delimiter regex is, say, \d+, then you search for (.*?)\d+, and then extract what the group captured in each match (see before and after on rubular.com).
You can alway use the returned array of matches as input to the split function. You split the original string using the first match - the first part of the string is the first split, then split the remainder of the string (minus the first part and the first match)... continue until done.
I wrote this for my use. Might be what you're looking for.
Function RegSplit(szPattern, szStr)
Dim oAl, oRe, oMatches
Set oRe = New RegExp
oRe.Pattern = "^(.*)(" & szPattern & ")(.*)$"
oRe.IgnoreCase = True
oRe.Global = True
Set oAl = CreateObject("System.Collections.ArrayList")
Do
Set oMatches = oRe.Execute(szStr)
If oMatches.Count > 0 Then
oAl.Add oMatches(0).SubMatches(2)
szStr = oMatches(0).SubMatches(0)
Else
oAl.Add szStr
Exit Do
End If
Loop
oAl.Reverse
RegSplit = oAl.ToArray
End Function
'**************************************************************
Dim A
A = RegSplit("[,|;|#]", "bob,;joe;tony#bill")
WScript.Echo Join(A, vbCrLf)
Returns:
bob
joe
tony
bill
I think you can achieve this by using Execute to match on the required splitter string, but capturing all the preceding characters (after the previous match) as a group. Here is some code that could do what you want.
'// Function splits a string on matches
'// against a given string
Function SplitText(strInput,sFind)
Dim ArrOut()
'// Don't do anything if no string to be found
If len(sFind) = 0 then
redim ArrOut(0)
ArrOut(0) = strInput
SplitText = ArrOut
Exit Function
end If
'// Define regexp
Dim re
Set re = New RegExp
'// Pattern to be found - i.e. the given
'// match or the end of the string, preceded
'// by any number of characters
re.Pattern="(.*?)(?:" & sFind & "|$)"
re.IgnoreCase = True
re.Global = True
'// find all the matches >> match collection
Dim oMatches: Set oMatches = re.Execute( strInput )
'// Prepare to process
Dim oMatch
Dim ix
Dim iMax
'// Initialize the output array
iMax = oMatches.Count - 1
redim arrOut( iMax)
'// Process each match
For ix = 0 to iMax
'// get the match
Set oMatch = oMatches(ix)
'// Get the captured string that precedes the match
arrOut( ix ) = oMatch.SubMatches(0)
Next
Set re = nothing
'// Check if the last entry was empty - this
'// removes one entry if the string ended on a match
if arrOut(iMax) = "" then Redim Preserve ArrOut(iMax-1)
'// Return the processed output
SplitText = arrOut
End Function