Regex not matching correctly
First of all, I would like an opinion on whether regex is even the best solution here. I'm fairly new to this area; regex was the first thing I found, and it seemed fairly easy to use until I needed to grab a long section of text out of a line. I'm using regex in a VB.NET environment.
Basically, I'm taking this line here:
21:24:55 "READ/WRITE: ['PASS',false,'27880739',[40,[459.313,2434.11,0.00221252]],[["ItemFlashlight","ItemWatch","ItemMap","ItemKnife","ItemEtool","ItemGPS","ItemHatchet","ItemCompass","ItemMatchbox","M9SD","ItemFlashlightRed","NVGoggles","Binocular_Vector","ItemToolbox","M4A1_AIM_SD_camo"],["ItemPainkiller","ItemMorphine","ItemSodaPepsi","FoodSteakCooked",["30Rnd_556x45_StanagSD",29],"30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD",["15Rnd_9x19_M9SD",12],["15Rnd_9x19_M9SD",10],"15Rnd_9x19_M9SD","15Rnd_9x19_M9SD","ItemBandage"]],["DZ_Backpack_EP1",[["BAF_AS50_TWS"],[1]],[["FoodSteakCooked","ItemPainkiller","ItemMorphine","ItemSodaCoke","5Rnd_127x99_as50","ItemBloodbag"],[2,1,1,2,4,1]]],[316,517,517],Sniper1_DZ,0.94]"
Using the following regex:
\[\[([\w|_|\""|,]*)\],\[([\w|_|\""|,|\[|\]]*)\]\],
To try and get the following:
[["ItemFlashlight","ItemWatch","ItemMap","ItemKnife","ItemEtool","ItemGPS","ItemHatchet","ItemCompass","ItemMatchbox","M9SD","ItemFlashlightRed","NVGoggles","Binocular_Vector","ItemToolbox","M4A1_AIM_SD_camo"],["ItemPainkiller","ItemMorphine","ItemSodaPepsi","FoodSteakCooked",["30Rnd_556x45_StanagSD",29],"30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD",["15Rnd_9x19_M9SD",12],["15Rnd_9x19_M9SD",10],"15Rnd_9x19_M9SD","15Rnd_9x19_M9SD","ItemBandage"]]
However, either my regex is flawed or my VB.NET code is. It only displays the following data:
[["ItemFlashlight","ItemWatch","ItemMap","ItemKnife","ItemEtool","ItemGPS","ItemHatchet","ItemCompass","ItemMatchbox","M9SD","ItemFlashlightRed","NVGoggles","Binocular_Vector","ItemToolbox","M4A1_AIM_SD_camo"],["ItemPainkiller","ItemMorphine","ItemSodaPepsi",
My VB.NET code, in case you need to look at it:
' Assumes: Imports System.Text.RegularExpressions
ListView1.Clear()
Call initList(Me.ListView1)
My.Computer.FileSystem.CurrentDirectory = My.Settings.cfgPath
My.Computer.FileSystem.CopyFile("arma2oaserver.RPT", "tempRPT.txt")
Dim ScriptLine As String = ""
Dim path As String = My.Computer.FileSystem.CurrentDirectory & "\tempRPT.txt"
Dim lines As String() = IO.File.ReadAllLines(path, System.Text.Encoding.Default)
Dim que = New Queue(Of String)(lines)
ProgressBar1.Maximum = lines.Count + 1
ProgressBar1.Value = 0
Do While que.Count > 0
    ScriptLine = que.Dequeue()
    ScriptLine = LCase(ScriptLine)
    ' que.Peek throws on an empty queue, so only look ahead when a next line exists
    If InStr(ScriptLine, "login attempt:") > 0 AndAlso que.Count > 0 Then
        Dim rtime As Match = Regex.Match(ScriptLine, ("(\d{1,2}:\d{2}:\d{2})"))
        Dim nam As Match = Regex.Match(ScriptLine, "\""([^)]*)\""")
        Dim name As String = nam.ToString.Replace("""", "")
        ' Read the next line temporarily, without removing it from the queue
        Dim next_line As String = que.Peek
        next_line = LCase(next_line)
        If InStr(next_line, "read/write:") > 0 Then 'Or InStr(next_line, "update: [b") > 0 Then 'And InStr(next_line, "setmarkerposlocal.sqf") < 1 Then
            Dim coords As Match = Regex.Match(next_line, "\[(\d+)\,\[(-?\d+)\.\d+\,(-?\d+)\.\d+,([\d|.|-]+)\]\]")
            Dim inv As Match = Regex.Match(next_line, "\[\[([\w|_|\""|,]*)\],\[([\w|_|\""|,|\[|\]]*)\]\],") '\[\[([\w|_|\""|,]*)\],\[([\w|_|\""|,|\[|\]]*)\]\],
            '\[\[([\w|_|\""|,]*)\],\[([\w|_|\""|,|\[|\]]*)\]\]:\[([\w|_|\""|,|\[|\]]*)\]\:
            Dim back As Match = Regex.Match(next_line, "\""([\w|_]+)\"",\[\[([\w|_|\""|,]*)\],\[([\d|,]*)\]\],\[\[([\w|_|\""|,]*)\],\[([\d|,]*)\]\]")
            Dim held As Match = Regex.Match(next_line, "\[\""([\w|_|\""|,]+)\""\,\d+\]")
            With Me.ListView1
                .Items.Add(name.ToString)
                With .Items(.Items.Count - 1).SubItems
                    .Add(rtime.ToString)
                    .Add(coords.ToString)
                    .Add(inv.ToString)
                    .Add(back.ToString)
                    .Add(held.ToString)
                End With
            End With
        End If
    End If
    ProgressBar1.Value += 1
Loop
My.Computer.FileSystem.DeleteFile("tempRPT.txt")
ProgressBar1.Value = 0
The odd thing is, when I test my regex in Expresso it gets the full, correct match. So I don't know what I'm doing wrong.
I'm not sure what's wrong with the regex you have, but the first match off of this one seems to work fine:
\[\[.*?\]\]
Hope this helps.
-EDIT-
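The problem isn't the regex; it's that the ListView truncates the display of the string (the underlying Windows list-view control only displays roughly the first 260 characters of an item's text). The Match object itself still holds the full text.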
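As a rough cross-check outside VB.NET, the lazy pattern from this answer can be tried in Python, whose `re` module handles this particular pattern the same way. The log line below is a shortened, made-up stand-in for the one in the question, not real server output.

```python
import re

# Shortened, hypothetical log line in the same shape as the one in the question.
line = ('21:24:55 "READ/WRITE: [\'PASS\',false,\'1\',[40,[459.313,2434.11,0.002]],'
        '[["ItemMap","M9SD"],["ItemBandage",["15Rnd_9x19_M9SD",12],"ItemMorphine"]],'
        '["DZ_Backpack_EP1",[["BAF_AS50_TWS"],[1]],[["ItemBloodbag"],[1]]],'
        '[316,517,517],Sniper1_DZ,0.94]"')

# Lazy match: start at the first "[[" and stop at the first "]]" that follows it.
inventory = re.search(r"\[\[.*?\]\]", line)

print(inventory.group(0))       # the inventory block only
print(len(inventory.group(0)))  # real lines can easily exceed what a ListView cell shows
```

One caveat with the lazy form: if the second list happened to end with a nested pair such as `["15Rnd_9x19_M9SD",12]]]`, the match would stop one bracket early, so it relies on the data ending the way the sample does.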
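Try this regular expression instead: \[\[(?:(?!\[\[).)+\]\]
http://regex101.com/r/zP1aC5
If you need a backreference, use \[\[((?:(?!\[\[).)+)\]\]
Note that \Q...\E quoting, as in \Q[[\E, works on regex101 (PCRE) but is not supported by .NET, so the brackets are escaped individually here.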
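A small Python sketch of the same idea, with the brackets escaped individually: match "[[", then any run of characters that never starts another "[[", then "]]". The sample line is a shortened, hypothetical version of the one in the question.

```python
import re

# Shortened, hypothetical log line in the same shape as the one in the question.
line = ('21:24:55 "READ/WRITE: [\'PASS\',false,\'1\',[40,[459.313,2434.11,0.002]],'
        '[["ItemMap","M9SD"],["ItemBandage",["15Rnd_9x19_M9SD",12],"ItemMorphine"]],'
        '["DZ_Backpack_EP1",[["BAF_AS50_TWS"],[1]],[["ItemBloodbag"],[1]]],'
        '[316,517,517],Sniper1_DZ,0.94]"')

# (?!\[\[). consumes one character at a time, refusing to step onto a new "[[".
# The run is greedy, so the match ends at the LAST "]]" before the next "[[".
tempered = re.compile(r"\[\[((?:(?!\[\[).)+)\]\]")

match = tempered.search(line)
print(match.group(0))  # whole block including the outer brackets
print(match.group(1))  # contents between the outer "[[" and "]]"
```

Because the run is greedy, this variant keeps going past an inner "]]" as long as no new "[[" has started, which makes it a little more forgiving than the lazy pattern.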
Perhaps you should specify whether you are working with single-line or multi-line input text. Depending on your input text format, try:
Dim variableName As Match = Regex.Match("input", "pattern", RegexOptions.Singleline)
or
Dim variableName As Match = Regex.Match("input", "pattern", RegexOptions.Multiline)
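To make the difference between the two options concrete, here is a hedged Python illustration. Python's `re.DOTALL` plays the role of .NET's `RegexOptions.Singleline` (the dot also matches newlines) and `re.MULTILINE` the role of `RegexOptions.Multiline` (`^` and `$` match at every line). The sample text is invented.

```python
import re

text = "first [[a,\nb]] line\nsecond line"

# Without any option, "." stops at the newline, so the block is not found.
no_flag = re.search(r"\[\[.*\]\]", text)

# DOTALL (Singleline in .NET): "." also matches "\n", so the block is found.
dotall = re.search(r"\[\[.*\]\]", text, re.DOTALL)

# MULTILINE (Multiline in .NET): "^" matches at the start of every line.
starts = re.findall(r"^\w+", text, re.MULTILINE)

print(no_flag)          # None
print(dotall.group(0))  # the block, spanning two lines
print(starts)           # ['first', 'b', 'second']
```

Neither option changes anything for the log line in the question, since each entry sits on a single line.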
Related
Excel VBA - Looking up a string with wildcards
I'm trying to look up a string that contains wildcards. I need to find where in a specific row the string occurs. The strings all take the form "IP##W## XX", where XX are the two letters by which I look up the value and each # is a wildcard for any digit. Hence this is what my lookup string looks like:
FullLookUpString = "IP##W## " & LookUpString
I tried using the Find method to find the column where this first occurs, but I keep getting errors. Here's what I have so far, but it doesn't work. Does anyone have an easy way of doing this? I'm quite new to VBA.
Dim GatewayColumn As Variant
Dim GatewayDateColumn As Variant
Dim FirstLookUpRange As Range
Dim SecondLookUpRange As Range
FullLookUpString = "IP##W## " & LookUpString
Set FirstLookUpRange = wsMPNT.Range(wsMPNT.Cells(3, 26), wsMPNT.Cells(3, lcolumnMPNT))
Debug.Print FullLookUpString
GatewayColumn = FirstLookUpRange.Find(What:=FullLookUpString, After:=Range("O3")).Column
Debug.Print GatewayColumn
Per the comment by @SJR, you can do this two ways.
Using LIKE, the pattern is: IP##W## [A-Z][A-Z]
Using regular expressions, the pattern is: IP\d{2}W\d{2} [A-Z]{2}
Example code:
Option Explicit
Sub FindString()
    Dim ws As Worksheet
    Dim rngData As Range
    Dim rngCell As Range
    Set ws = ThisWorkbook.Worksheets("Sheet1") '<-- set your sheet
    Set rngData = ws.Range("A1:A4")
    ' with LIKE operator
    For Each rngCell In rngData
        If rngCell.Value Like "IP##W## [A-Z][A-Z]" Then
            Debug.Print rngCell.Address
        End If
    Next rngCell
    ' with regular expression
    Dim objRegex As Object
    Dim objMatch As Object
    Set objRegex = CreateObject("VBScript.RegExp")
    objRegex.Pattern = "IP\d{2}W\d{2} [A-Z]{2}"
    For Each rngCell In rngData
        If objRegex.Test(rngCell.Value) Then
            Debug.Print rngCell.Address
        End If
    Next rngCell
End Sub
If we can assume that ALL the strings in the row match the given pattern, then we can examine only the last three characters:
Sub FindAA()
    Dim rng As Range, r As Range, Gold As String
    Set rng = Range(Range("A1"), Cells(1, Columns.Count))
    Gold = " AA"
    For Each r In rng
        If Right(r.Value, 3) = Gold Then
            MsgBox r.Address(0, 0)
            Exit Sub
        End If
    Next r
End Sub
Try this:
If FullLookUpString Like "*IP##W##[a-zA-Z][a-zA-Z]*" Then
    MsgBox "Match is found"
End If
It will find your pattern (the pattern can be surrounded by any characters; that's what the * allows).
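For comparison, the same lookup can be sketched in Python. This is only an illustration of the pattern logic, not a replacement for the VBA above; the `row` list and the `find_column` helper are hypothetical stand-ins for the worksheet row and the Find call.

```python
import re

# Hypothetical row of cell values; column numbers are 1-based, as in Excel.
row = ["Date", "IP12W34 AB", "IP56W78 CD", "IP9W1 AB"]

def find_column(cells, suffix):
    """Return the 1-based column of the first cell shaped like 'IP##W## <suffix>'."""
    # \d{2} corresponds to the "##" in the VBA Like pattern.
    pattern = re.compile(r"IP\d{2}W\d{2} " + re.escape(suffix))
    for index, cell in enumerate(cells, start=1):
        if pattern.fullmatch(cell):  # whole cell must match, like Like without "*"
            return index
    return None

print(find_column(row, "CD"))  # 3
print(find_column(row, "ZZ"))  # None
```

Note that "IP9W1 AB" is skipped because each "##" requires exactly two digits.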
What is the RegExp Pattern to Extract Bullet Points Between Two Group Words using VBA in Word?
I can't seem to figure out the RegExp to extract the bullet points between two groups of words in a Word document. For example:
Risk Assessment:
Test 1
Test 2
Test 3
Internal Audit
In this case I want to extract the bullet points between "Risk Assessment" and "Internal Audit", one bullet at a time, and assign each bullet to an Excel cell. As shown in the code below, I have pretty much everything done, except I can't figure out the correct regex pattern. Any help would be great. Thanks in advance!
Sub PopulateExcelTable()
    Dim fd As Office.FileDialog
    Set fd = Application.FileDialog(msoFileDialogFilePicker)
    With fd
        .AllowMultiSelect = False
        .Title = "Please select the file."
        .Filters.Clear
        .Filters.Add "Word 2007-2013", "*.docx"
        If .Show = True Then
            txtFileName = .SelectedItems(1)
        End If
    End With
    Dim WordApp As Word.Application
    Set WordApp = CreateObject("Word.Application")
    Dim WordDoc As Word.Document
    Set WordDoc = WordApp.Documents.Open(txtFileName)
    Dim str As String: str = WordDoc.Content.Text ' Assign entire document content to string
    Dim rex As New RegExp
    rex.Pattern = "\b[^Risk Assessment\s].*[^Internal Audit\s]"
    Dim i As Long: i = 1
    rex.Global = True
    For Each mtch In rex.Execute(str)
        Debug.Print mtch
        Range("A" & i).Value = mtch
        i = i + 1
    Next mtch
    WordDoc.Close
    WordApp.Quit
End Sub
This is probably a long way around the problem, but it works. Steps I'm taking:
1. Find the bullet list items using the keywords before and after the list in the regexp.
2. Group the regexp pattern so that you can extract everything in between the words.
3. Store the listed items group in a string.
4. Split the string by the newline character into a new array.
5. Output each array item to Excel.
6. Loop again, since there may be more than one list in the document.
Note: I don't see your code for a link to the Excel workbook. I'll assume this part is working.
Dim rex As New RegExp
rex.Pattern = "(\bRisk Assessment\s)(.*)(Internal\sAudit\s)"
rex.Global = True
rex.MultiLine = True
rex.IgnoreCase = True
Dim lineArray() As String
Dim myMatches As Object
Set myMatches = rex.Execute(str)
For Each mtch In rex.Execute(str)
    'Debug.Print mtch.SubMatches(1)
    lineArray = Split(mtch.SubMatches(1), vbLf)
    For x = LBound(lineArray) To UBound(lineArray)
        'Debug.Print lineArray(x)
        Range("A" & i).Value = lineArray(x)
        i = i + 1
    Next
Next mtch
My test page looks like this:
Results from the inner Debug.Print line return this:
Item 1
Item 2
Item 3
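The steps above can be sketched in Python as follows. This is a minimal illustration under assumptions: `document_text` is an invented stand-in for the Word document content, and the pattern is a simplified variant of the one in the answer (a lazy capture between the two headings).

```python
import re

# Hypothetical document text; a real Word document may use "\r" as its paragraph mark.
document_text = ("Intro\nRisk Assessment:\nTest 1\nTest 2\nTest 3\n"
                 "Internal Audit\nMore text")

# Capture everything between the two headings; DOTALL lets "." cross line breaks.
between = re.compile(r"Risk Assessment:?\s*(.*?)\s*Internal Audit",
                     re.DOTALL | re.IGNORECASE)

bullets = []
for match in between.finditer(document_text):  # there may be more than one list
    # splitlines() handles "\n", "\r" and "\r\n" alike
    bullets.extend(item.strip() for item in match.group(1).splitlines() if item.strip())

print(bullets)  # ['Test 1', 'Test 2', 'Test 3']
```

Each entry in `bullets` corresponds to one value that the VBA loop would write to a cell in column A.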
How to extract substring in parentheses using Regex pattern
This is probably a simple problem, but unfortunately I wasn't able to get the results I wanted. Say I have the following line:
"Wouldn't It Be Nice" (B. Wilson/Asher/Love)
I would have to look for this pattern:
" (<any string>)
in order to retrieve:
B. Wilson/Asher/Love
I tried something like "" (([^))]*)) but it doesn't seem to work. Also, I'd like to use Match.Submatches(0), so that might complicate things a bit because it relies on brackets.
Edit: After examining your document, the problem is that there are non-breaking spaces before the parentheses, not regular spaces. So this regex should work:
""[ \xA0]*\(([^)]+)\)
Piece by piece:
""          'quote (doubled to escape it inside a VBA string)
[ \xA0]*    'zero or more non-breaking (\xA0) or regular spaces
\(          'left parenthesis
(           'open capturing group
[^)]+       'anything that is not a right parenthesis
)           'close capturing group
\)          'right parenthesis
In a function:
Public Function GetStringInParens(search_str As String)
    Dim regEx As New VBScript_RegExp_55.RegExp
    Dim matches
    GetStringInParens = ""
    regEx.Pattern = """[ \xA0]*\(([^)]+)\)"
    regEx.Global = True
    If regEx.test(search_str) Then
        Set matches = regEx.Execute(search_str)
        GetStringInParens = matches(0).SubMatches(0)
    End If
End Function
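The same pattern can be exercised in Python to see how it behaves on titles that contain their own parentheses. The function name `credits_after_title` and the sample strings are illustrative assumptions, not part of the original answer.

```python
import re

# A quote, optional regular or non-breaking spaces, then a parenthesised group.
CREDITS = re.compile(r'"[ \xa0]*\(([^)]+)\)')

def credits_after_title(line):
    """Return the text in parentheses that directly follows a closing quote."""
    match = CREDITS.search(line)
    return match.group(1) if match else ""

print(credits_after_title('"Wouldn\'t It Be Nice" (B. Wilson/Asher/Love)'))
# B. Wilson/Asher/Love
print(credits_after_title('"Don\'t Talk (Put Your Head on My Shoulder)"\xa0(B. Wilson/Asher)'))
# B. Wilson/Asher
print(repr(credits_after_title('"Don\'t talk (yell)"')))
# ''
```

Because the pattern requires a quote immediately before the optional spaces and the opening parenthesis, parentheses inside the title are not picked up.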
Not strictly an answer to your question, but sometimes, for things this simple, good ol' string functions are less confusing and more concise than regex.
Function BetweenParentheses(s As String) As String
    BetweenParentheses = Mid(s, InStr(s, "(") + 1, _
        InStr(s, ")") - InStr(s, "(") - 1)
End Function
Usage:
Debug.Print BetweenParentheses("""Wouldn't It Be Nice"" (B. Wilson/Asher/Love)")
'B. Wilson/Asher/Love
EDIT: @alan points out that this will falsely match the contents of parentheses in the song title. This is easily circumvented with a little modification:
Function BetweenParentheses(s As String) As String
    Dim iEndQuote As Long
    Dim iLeftParenthesis As Long
    Dim iRightParenthesis As Long
    iEndQuote = InStrRev(s, """")
    iLeftParenthesis = InStr(iEndQuote, s, "(")
    iRightParenthesis = InStr(iEndQuote, s, ")")
    If iLeftParenthesis <> 0 And iRightParenthesis <> 0 Then
        BetweenParentheses = Mid(s, iLeftParenthesis + 1, _
            iRightParenthesis - iLeftParenthesis - 1)
    End If
End Function
Usage:
Debug.Print BetweenParentheses("""Wouldn't It Be Nice"" (B. Wilson/Asher/Love)")
'B. Wilson/Asher/Love
Debug.Print BetweenParentheses("""Don't talk (yell)""")
' returns empty string
Of course, this is less concise than before!
This is a nice regex: ".*\(([^)]*)
In VBA/VBScript:
Dim myRegExp As RegExp
Dim myMatches As MatchCollection
Dim myMatch As Match
Dim ResultString As String
Set myRegExp = New RegExp
myRegExp.Pattern = """.*\(([^)]*)"
Set myMatches = myRegExp.Execute(SubjectString)
If myMatches.Count >= 1 Then
    Set myMatch = myMatches(0)
    ' The pattern has a single capturing group, so read SubMatches(0)
    If myMatch.SubMatches.Count >= 1 Then
        ResultString = myMatch.SubMatches(0)
    Else
        ResultString = ""
    End If
Else
    ResultString = ""
End If
This matches Put Your Head on My Shoulder in "Don't Talk (Put Your Head on My Shoulder)".
Update 1: I let the regex loose on your doc file and it matches as requested. I'm quite sure the regex is fine. I'm not fluent in VBA/VBScript, but my guess is that's where it goes wrong. If you want to discuss the regex further, that's fine with me. I'm not eager to start digging into this VBScript API, which looks arcane.
Given the new input, the regex is tweaked to ".*".*\(([^)]*) so that it doesn't falsely match (Put Your Head on My Shoulder), which appears inside the quotes.
This function worked on your example string:
Function GetArtist(songMeta As String) As String
    Dim artist As String
    ' split string by "(" and take last portion
    artist = Split(songMeta, "(")(UBound(Split(songMeta, "(")))
    ' remove closing parenthesis
    artist = Replace(artist, ")", "")
    ' return the result
    GetArtist = artist
End Function
Ex:
Sub Test()
    Dim songMeta As String
    songMeta = """Wouldn't It Be Nice"" (B. Wilson/Asher/Love)"
    Debug.Print GetArtist(songMeta)
End Sub
prints "B. Wilson/Asher/Love" to the Immediate Window. It also solves the problem alan mentioned. Ex:
Sub Test()
    Dim songMeta As String
    songMeta = """Wouldn't (It Be) Nice"" (B. Wilson/Asher/Love)"
    Debug.Print GetArtist(songMeta)
End Sub
also prints "B. Wilson/Asher/Love" to the Immediate Window. Unless, of course, the artist names also include parentheses.
Here is another regex, tested with VBScript: (?:\()(.*)(?:\))
Data = """Wouldn't It Be Nice"" (B. Wilson/Asher/Love)"
wscript.echo Extract(Data)
'---------------------------------------------------------------
Function Extract(Data)
    Dim strPattern, oRegExp, Matches
    strPattern = "(?:\()(.*)(?:\))"
    Set oRegExp = New RegExp
    oRegExp.IgnoreCase = True
    oRegExp.Pattern = strPattern
    Set Matches = oRegExp.Execute(Data)
    If Matches.Count > 0 Then Extract = Matches(0).SubMatches(0)
End Function
'---------------------------------------------------------------
I think you need a better data file ;) You might want to consider pre-processing the file to a temp file for modification, so that outliers that don't fit your pattern are modified to where they'll meet your pattern. It's a bit time consuming to do, but it is always difficult when a data file lacks consistency.
Making RegEx Match Bold - VB.NET
This is my current RegEx: \[b\](.*?)\[/b\]
That works perfectly fine; it replaces exactly what I want it to. But I'm trying to figure out how to make it replace the string between [b][/b] with a bold string while the actual text stays the same.
Example string: [b]This is an example![/b]
Desired output: This is an example! (rendered in bold)
I'm using VB.NET and this is what I currently have:
Dim reg As New Regex("\[b\](.*?)\[/b\]")
Dim str As String = String.Empty
For Each m As Match In reg.Matches(MainBox.Text)
    str = reg.Replace(MainBox.Text, "test")
Next
Preview.Show()
Preview.RichTextBox1.Text = str
Preview.Size = New Size(Preview.MaximumSize.Width, Preview.MaximumSize.Height)
You need to set the start of the selection and set the attributes of the text before inserting it.
Preview.RichTextBox1.SelectionStart = Preview.RichTextBox1.Text.Length
Preview.RichTextBox1.SelectionFont = New Font("Tahoma", 12, FontStyle.Bold)
Preview.RichTextBox1.SelectedText = str
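One way to think about the full task is to strip the tags once and remember where each bold run lands in the resulting plain text; those positions are what a rich-text control would then need for its selection calls. The Python sketch below only shows that bookkeeping; `strip_bold_tags` is a hypothetical helper, not part of either post.

```python
import re

BOLD = re.compile(r"\[b\](.*?)\[/b\]")

def strip_bold_tags(text):
    """Return (plain_text, ranges); ranges are (start, length) of the bold runs."""
    pieces = []
    ranges = []
    read_pos = 0   # position in the tagged input
    out_len = 0    # length of the plain output built so far
    for match in BOLD.finditer(text):
        before = text[read_pos:match.start()]
        pieces.append(before)
        out_len += len(before)
        inner = match.group(1)
        ranges.append((out_len, len(inner)))  # where the bold run sits in the output
        pieces.append(inner)
        out_len += len(inner)
        read_pos = match.end()
    pieces.append(text[read_pos:])
    return "".join(pieces), ranges

plain, bold_ranges = strip_bold_tags("Say [b]This is an example![/b] now")
print(plain)        # Say This is an example! now
print(bold_ranges)  # [(4, 19)]
```

In VB.NET, each (start, length) pair could then be passed to the RichTextBox's Select method before setting SelectionFont to a bold font, along the lines of the answer above.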
Split a string according to a regexp in VBScript
I would like to split a string into an array according to a regular expression, similar to what can be done with preg_split in PHP, or like the VBScript Split function but with a regex in place of the delimiter. Using the VBScript RegExp object, I can execute a regex, but it returns the matches (so I get a collection of my splitters, which is not what I want). Is there a way to do so? Thank you.
If you can reserve a special delimiter string, i.e. a string that you can choose that will never be a part of the real input string (perhaps something like "###"), then you can use regex replacement to replace all matches of your pattern to "###", and then split on "###". Another possibility is to use a capturing group. If your delimiter regex is, say, \d+, then you search for (.*?)\d+, and then extract what the group captured in each match (see before and after on rubular.com).
You can always use the returned array of matches as input to the Split function. Split the original string using the first match: the first part of the string is the first split. Then split the remainder of the string (minus the first part and the first match), and continue until done.
I wrote this for my own use. It might be what you're looking for.
Function RegSplit(szPattern, szStr)
    Dim oAl, oRe, oMatches
    Set oRe = New RegExp
    oRe.Pattern = "^(.*)(" & szPattern & ")(.*)$"
    oRe.IgnoreCase = True
    oRe.Global = True
    Set oAl = CreateObject("System.Collections.ArrayList")
    Do
        Set oMatches = oRe.Execute(szStr)
        If oMatches.Count > 0 Then
            oAl.Add oMatches(0).SubMatches(2)
            szStr = oMatches(0).SubMatches(0)
        Else
            oAl.Add szStr
            Exit Do
        End If
    Loop
    oAl.Reverse
    RegSplit = oAl.ToArray
End Function
'**************************************************************
Dim A
A = RegSplit("[,|;|#]", "bob,;joe;tony#bill")
WScript.Echo Join(A, vbCrLf)
Returns: bob joe tony bill
I think you can achieve this by using Execute to match on the required splitter string, but capturing all the preceding characters (after the previous match) as a group. Here is some code that could do what you want.
'// Function splits a string on matches
'// against a given string
Function SplitText(strInput, sFind)
    Dim ArrOut()
    '// Don't do anything if no string to be found
    If Len(sFind) = 0 Then
        ReDim ArrOut(0)
        ArrOut(0) = strInput
        SplitText = ArrOut
        Exit Function
    End If
    '// Define regexp
    Dim re
    Set re = New RegExp
    '// Pattern to be found - i.e. the given
    '// match or the end of the string, preceded
    '// by any number of characters
    re.Pattern = "(.*?)(?:" & sFind & "|$)"
    re.IgnoreCase = True
    re.Global = True
    '// find all the matches >> match collection
    Dim oMatches: Set oMatches = re.Execute(strInput)
    '// Prepare to process
    Dim oMatch
    Dim ix
    Dim iMax
    '// Initialize the output array
    iMax = oMatches.Count - 1
    ReDim arrOut(iMax)
    '// Process each match
    For ix = 0 To iMax
        '// get the match
        Set oMatch = oMatches(ix)
        '// Get the captured string that precedes the match
        arrOut(ix) = oMatch.SubMatches(0)
    Next
    Set re = Nothing
    '// Check if the last entry was empty - this
    '// removes one entry if the string ended on a match
    If arrOut(iMax) = "" Then ReDim Preserve ArrOut(iMax - 1)
    '// Return the processed output
    SplitText = arrOut
End Function
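The underlying idea in both answers, walking the delimiter matches and collecting the text between them, can be sketched compactly in Python. This is an illustration of the technique rather than a port of either function; `regex_split` is a hypothetical name, and it assumes the delimiter pattern never matches an empty string.

```python
import re

def regex_split(pattern, text):
    """Split text on every match of pattern, keeping empty pieces."""
    parts = []
    last_end = 0
    for match in re.finditer(pattern, text):
        parts.append(text[last_end:match.start()])  # text before this delimiter
        last_end = match.end()                      # resume after the delimiter
    parts.append(text[last_end:])                   # whatever follows the last one
    return parts

print(regex_split(r"[,;#]", "bob,;joe;tony#bill"))
# ['bob', '', 'joe', 'tony', 'bill']
```

The empty string in the result comes from the adjacent "," and ";" in the input; Python's built-in `re.split` returns the same list for this input. Filter out empty pieces afterwards if they are not wanted.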