VB.Net help selecting first index of string with regex - regex

I was wondering if there was a way I could start a selection from the Regex string i have in the below example
The below example works exactly how I want it too however if there is text that matches before it on another line it is choosing the wrong text and highlighting it.
What im wondering is if there is a way to get the start index of the regex string?
If Regex.IsMatch(Me.TextBox1.Text, "\b" + Regex.Escape("is") + "\b") Then
Me.TextBox1.SelectionStart = Me.TextBox1.Text.IndexOf("is")
Dim linenumber As Integer = Me.TextBox1.GetLineFromCharIndex(Me.TextBox1.Text.IndexOf("is"))
Me.TextBox1.SelectionLength = Me.TextBox1.Lines(linenumber).Length
Me.TextBox1.Focus()
Me.TextBox1.SelectedText = "is " & Me.TextBox2.Text

The System.Text.RegularExpression.Match object has a property which should help you here: Match.Index. Match.Index will tell you where the capture starts, and Match.Length tells you how long it is. Using those you could change your code to look like this:
If Regex.IsMatch(Me.TextBox1.Text, "\b" + Regex.Escape("is") + "\b") Then
Dim m as Match
m = Regex.Match(Me.TextBox1.Text, "\b" + Regex.Escape("is") + "\b")
Me.TextBox1.SelectionStart = m.Index
Dim linenumber As Integer = Me.TextBox1.GetLineFromCharIndex(m.Index)
Me.TextBox1.SelectionLength = Me.TextBox1.Lines(linenumber).Length
Me.TextBox1.Focus()
Me.TextBox1.SelectedText = "is " & Me.TextBox2.Text

Related

Check string has a date in it and extract part of the string

I have thousands of lines of text that I need to work through and the lines I am interested with lines that look like the following:
01/04/2019 09:35:41 - Test user (Additional Comments)
I am currently using this code to filter out all the other rows:
If InStr(FullCell(i), " - ") <> 0 And InStr(FullCell(i), ":") <> 0 And InStr(FullCell(i), "(") <> 0 Then
FullCell is the array that I am working through.
which I know is not the best way to do it. Is there a way to check that there is a date at the beginning of the string in the format dd/mm/yyyy and then extract the user name inbetween the '-' and the '(' symbol.
I had a play with regex to see if that could help but i'm limited in skills to be able to pull off both VBA and regex in the same code.
Whats the best way to do this.
Assuming Fullcell(i) contains the string,
If Left(Fullcell(i), 10) Like "##/##/####"
Will return True if you have a date (note that it will not differentiate between dd/mm/yyyy and mm/dd/yyyy.
And
Mid(Fullcell(i), InStr(Fullcell(i), " - ") + 2, InStr(Fullcell(i), " (") - InStr(Fullcell(i), " - ") - 2)
Will return the username
I'm sure there is a more efficient way to do this, but I've used the following solution quite a few times:
This will select the date:
x = 1
Do While Mid(FullCell,1,x) <> " "
x = x + 1
Loop
strDate = Left(FullCell,x)
This will find the character number of the hyphen, the username starts 2 characters after.
x = 1
Do While Mid(FullCell,x,1) <> "-"
x = x + 1
Loop
Then we will find the end of the username
y = x + 2
Do While Mid(FullCell,y,1) <> " "
y = y + 1
Loop
The username should now be characters (x+2 to y-1)
strUsername = Mid(FullCell, x + 2, y - (x + 2) - 1)
Here's how I would do it
Dim your variables
Dim ring as Range
Dim dat as variant
Dim FullCell() as string
Dim User as string
Dim I as long
Set your range
Set rng = ` any way you choose
Dat = rng.value2
Loop dat
For i = 1 to UBound(dat, 1)
Split the data
FullCell = Trim(Split(FullCell, "-"))
Test if it split
If UBound(FullCell) > 0 Then
Test if it matches
If IsDate(FullCell(0)) Then
i = Instr(FullCell(1), "(")-1)
If i then
User = left$(FullCell(1), i)
' Found a user
End If
End If
End If
Next
Abstraction is your friend, it's always helpful to break these into their own private functions whenever you can. You could put your code in a function and call it something like ExtractUsername.
Below I did an example of this, and I decided to go with the RegExp approach (late binding), but you could use string functions like the examples above as well.
This function returns the username if it finds the pattern you mentioned above, otherwise, it returns an empty string.
Private Function ExtractUsername(ByVal SourceString As String) As String
Dim RegEx As Object
Set RegEx = CreateObject("vbscript.regexp")
'(FIRST GROUP FINDS THE DATE FORMATTED AS DD/MM/YYY, AS WELL AS THE FORWARD SLASH)
'(SECOND GROUP FINDS THE USERNAME) THIS WILL BE SUBMATCH 1
With RegEx
.Pattern = "(^\d{2}\/\d{2}\/\d{4}.*-)(.+)(\()"
.Global = True
End With
Dim Match As Object
Set Match = RegEx.Execute(SourceString)
'ONLY RETURN IF A MATCH WAS FOUND
If Match.Count > 0 Then
ExtractUsername = Trim(Match(0).SubMatches(1))
End If
Set RegEx = Nothing
End Function
The regex pattern is grouped into three parts, the date (and slash), username, and opening parentheses. What you are interested in is the username, which in the SubMatch would be number 1.
Regexr is a helpful site for practicing regular expressions and can show you a bit more of what the pattern I went with is doing.
Please note that using regular expressions might give you performance issues and you should test it against regular string functions to see what works best for your situation.

Regex - Quantifier {x,y} following nothing

I'm creating a basic text editor and I'm using regex to achieve a find and replace function. To do this I've gotten this code:
Private Function GetRegExpression() As Regex
Dim result As Regex
Dim regExString As [String]
' Get what the user entered
If TabControl1.SelectedIndex = 0 Then
regExString = txtbx_Find2.Text
ElseIf TabControl1.SelectedIndex = 1 Then
regExString = txtbx_Find.Text
End If
If chkMatchCase.Checked Then
result = New Regex(regExString)
Else
result = New Regex(regExString, RegexOptions.IgnoreCase)
End If
Return result
End Function
And this is the Find method
Private Sub FindText()
''
Dim WpfTest1 As New Spellpad.Tb
Dim ElementHost1 As System.Windows.Forms.Integration.ElementHost = frm_Menu.Controls("ElementHost1")
Dim TheTextBox As System.Windows.Controls.TextBox = CType(ElementHost1.Child, Tb).ctrl_TextBox
''
' Is this the first time find is called?
' Then make instances of RegEx and Match
If isFirstFind Then
regex = GetRegExpression()
match = regex.Match(TheTextBox.Text)
isFirstFind = False
Else
' match.NextMatch() is also ok, except in Replace
' In replace as text is changing, it is necessary to
' find again
'match = match.NextMatch();
match = regex.Match(TheTextBox.Text, match.Index + 1)
End If
' found a match?
If match.Success Then
' then select it
Dim row As Integer = TheTextBox.GetLineIndexFromCharacterIndex(TheTextBox.CaretIndex)
MoveCaretToLine(TheTextBox, row + 1)
TheTextBox.SelectionStart = match.Index
TheTextBox.SelectionLength = match.Length
Else
If TabControl1.SelectedIndex = 0 Then
MessageBox.Show([String].Format("Cannot find ""{0}"" ", txtbx_Find2.Text), Application.ProductName, MessageBoxButtons.OK, MessageBoxIcon.Information)
ElseIf TabControl1.SelectedIndex = 1 Then
MessageBox.Show([String].Format("Cannot find ""{0}"" ", txtbx_Find.Text), Application.ProductName, MessageBoxButtons.OK, MessageBoxIcon.Information)
End If
isFirstFind = True
End If
End Sub
When I run the program I get errors:
For ?, parsing "?" - Quantifier {x,y} following nothing.; and
For *, parsing "*" - Quantifier {x,y} following nothing.
It's as if I can't use these but I really need to. How can I solve this problem?
? and * are quantifiers in regular expressions:
? is used to specify that something is optional, for instance b?au can match both bau and au.
* means the group with which it binds can be repeated zero, one or multiple times: for instance ba*u can bath bu, bau, baau, baaaaaaaau,...
Now most regular expressions use {l,u} as a third pattern with l the lower bound on the number of times something is repeated, and u the upper bound on the number of occurences. So ? is replaced by {0,1} and * by {0,}.
Now if you provide them without any character before them, evidently, the regex parser doesn't know what you mean. In other words if you do (used csharp, but the ideas are generally applicable):
$ csharp
Mono C# Shell, type "help;" for help
Enter statements below.
csharp> Regex r = new Regex("fo*bar");
csharp> r.Replace("Fooobar fooobar fbar fobar","<MATCH>");
"Fooobar <MATCH> <MATCH> <MATCH>"
csharp> r.Replace("fooobar far qux fooobar quux fbar echo fobar","<MATCH>");
"<MATCH> far qux <MATCH> quux <MATCH> echo <MATCH>"
If you wish to do a "raw text find and replace", you should use string.Replace.
EDIT:
Another way to process them is by escaping special regex characters. Ironically enough, you can do this by replacing them by a regex ;).
Private Function GetRegExpression() As Regex
Dim result As Regex
Dim regExString As [String]
' Get what the user entered
If TabControl1.SelectedIndex = 0 Then
regExString = txtbx_Find2.Text
ElseIf TabControl1.SelectedIndex = 1 Then
regExString = txtbx_Find.Text
End If
'Added code
Dim baseRegex As Regex = new Regex("[\\.$^{\[(|)*+?]")
regExString = baseRegex.Replace(regExString,"\$0")
'End added code
If chkMatchCase.Checked Then
result = New Regex(regExString)
Else
result = New Regex(regExString, RegexOptions.IgnoreCase)
End If
Return result
End Function

regex not matching correctly

First of all, I would like an opinion if using regex is even the best solution here, I'm fairly new to this area and regex is the first thing I found and it seemed somewhat easy to use, until I need to grab a long section of text out of a line lol. I'm using a vb.net environment for regex.
Basically, I'm taking this line here:
21:24:55 "READ/WRITE: ['PASS',false,'27880739',[40,[459.313,2434.11,0.00221252]],[["ItemFlashlight","ItemWatch","ItemMap","ItemKnife","ItemEtool","ItemGPS","ItemHatchet","ItemCompass","ItemMatchbox","M9SD","ItemFlashlightRed","NVGoggles","Binocular_Vector","ItemToolbox","M4A1_AIM_SD_camo"],["ItemPainkiller","ItemMorphine","ItemSodaPepsi","FoodSteakCooked",["30Rnd_556x45_StanagSD",29],"30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD",["15Rnd_9x19_M9SD",12],["15Rnd_9x19_M9SD",10],"15Rnd_9x19_M9SD","15Rnd_9x19_M9SD","ItemBandage"]],["DZ_Backpack_EP1",[["BAF_AS50_TWS"],[1]],[["FoodSteakCooked","ItemPainkiller","ItemMorphine","ItemSodaCoke","5Rnd_127x99_as50","ItemBloodbag"],[2,1,1,2,4,1]]],[316,517,517],Sniper1_DZ,0.94]"
Using the following regex:
\[\[([\w|_|\""|,]*)\],\[([\w|_|\""|,|\[|\]]*)\]\],
To try and get the following:
[["ItemFlashlight","ItemWatch","ItemMap","ItemKnife","ItemEtool","ItemGPS","ItemHatchet","ItemCompass","ItemMatchbox","M9SD","ItemFlashlightRed","NVGoggles","Binocular_Vector","ItemToolbox","M4A1_AIM_SD_camo"],["ItemPainkiller","ItemMorphine","ItemSodaPepsi","FoodSteakCooked",["30Rnd_556x45_StanagSD",29],"30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD",["15Rnd_9x19_M9SD",12],["15Rnd_9x19_M9SD",10],"15Rnd_9x19_M9SD","15Rnd_9x19_M9SD","ItemBandage"]]
However either my regex is flawed, or my vb.net code is. It only displays the following data:
[["ItemFlashlight","ItemWatch","ItemMap","ItemKnife","ItemEtool","ItemGPS","ItemHatchet","ItemCompass","ItemMatchbox","M9SD","ItemFlashlightRed","NVGoggles","Binocular_Vector","ItemToolbox","M4A1_AIM_SD_camo"],["ItemPainkiller","ItemMorphine","ItemSodaPepsi",
My vb.net code in case you need to peek at it is:
ListView1.Clear()
Call initList(Me.ListView1)
My.Computer.FileSystem.CurrentDirectory = My.Settings.cfgPath
My.Computer.FileSystem.CopyFile("arma2oaserver.RPT", "tempRPT.txt")
Dim ScriptLine As String = ""
Dim path As String = My.Computer.FileSystem.CurrentDirectory & "\tempRPT.txt"
Dim lines As String() = IO.File.ReadAllLines(path, System.Text.Encoding.Default)
Dim que = New Queue(Of String)(lines)
ProgressBar1.Maximum = lines.Count + 1
ProgressBar1.Value = 0
Do While que.Count > 0
ScriptLine = que.Dequeue()
ScriptLine = LCase(ScriptLine)
If InStr(ScriptLine, "login attempt:") Then
Dim rtime As Match = Regex.Match(ScriptLine, ("(\d{1,2}:\d{2}:\d{2})"))
Dim nam As Match = Regex.Match(ScriptLine, "\""([^)]*)\""")
Dim name As String = nam.ToString.Replace("""", "")
Dim next_line As String = que.Peek 'Read next line temporarily 'This is where it would move to next line temporarily to read from it
next_line = LCase(next_line)
If InStr(next_line, "read/write:") > 0 Then 'Or InStr(next_line, "update: [b") > 0 Then 'And InStr(next_line, "setmarkerposlocal.sqf") < 1 Then
Dim coords As Match = Regex.Match(next_line, "\[(\d+)\,\[(-?\d+)\.\d+\,(-?\d+)\.\d+,([\d|.|-]+)\]\]")
Dim inv As Match = Regex.Match(next_line, "\[\[([\w|_|\""|,]*)\],\[([\w|_|\""|,|\[|\]]*)\]\],") '\[\[([\w|_|\""|,]*)\],\[([\w|_|\""|,|\[|\]]*)\]\],
'\[\[([\w|_|\""|,]*)\],\[([\w|_|\""|,|\[|\]]*)\]\]:\[([\w|_|\""|,|\[|\]]*)\]\:
Dim back As Match = Regex.Match(next_line, "\""([\w|_]+)\"",\[\[([\w|_|\""|,]*)\],\[([\d|,]*)\]\],\[\[([\w|_|\""|,]*)\],\[([\d|,]*)\]\]")
Dim held As Match = Regex.Match(next_line, "\[\""([\w|_|\""|,]+)\""\,\d+\]")
With Me.ListView1
.Items.Add(name.ToString)
With .Items(.Items.Count - 1).SubItems
.Add(rtime.ToString)
.Add(coords.ToString)
.Add(inv.ToString)
.Add(back.ToString)
.Add(held.ToString)
End With
End With
End If
End If
ProgressBar1.Value += 1
Loop
My.Computer.FileSystem.DeleteFile("tempRPT.txt")
ProgressBar1.Value = 0
The odd thing is, when I test my regex in Expresso it gets the full, correct match. So I don't know what I'm doing wrong.
I'm not sure what's wrong with the regex you have, but the first match off of this one seems to work fine:
\[\[.*?\]\]
Hope this helps.
-EDIT-
The problem isn't the regex, it's that ListView is truncating the display of the string. See here
Try this regular expression instead: \Q[[\E(?:(?!\Q[[\E).)+]]
http://regex101.com/r/zP1aC5
If you need a backref, use \Q[[\E((?:(?!\Q[[\E).)+)]]
Perhaps you should specify whether you are working with single line or multi line input text. Depending on your input text format, try with:
Dim variableName as Match = Regex.Match("input", "pattern", RegexOptions.SingleLine);
or
Dim variableName as Match = Regex.Match("input", "pattern", RegexOptions.Multiline);

Increment Regex match using Regex.Replace

I'm creating a program in VB.NET to output multiple images. Some images will have the same file name. If there is multiple files with the same name I want to add "_1_" to the end of the file name. If the "_1_" file already exists I want to increment the 1 to be "_2_". If this file already exists I want to continue incrementing the number ultil it doesn't exist. So for example "filename", filename_1_", "filename_2_", etc. Here is the code that I have tried
Dim usedFiles As New List(Of String)
While usedFiles.Contains(returnValue)
If Regex.IsMatch(returnValue, "[_]([0-9]{1,})[_]$") Then
returnValue = Regex.Replace(returnValue, "[_]([0-9]{1,})[_]$", "_" + (CType("$1", Integer) + 1).ToString() + "_")
Else
returnValue += "_1_"
End If
End While
usedFiles.Add(returnValue)
The line that isn't working is:
returnValue = Regex.Replace(returnValue, "[_]([0-9]{1,})[_]$", "_" + (CType("$1", Integer) + 1).ToString() + "_")
which outputs "filename_2_" every time. I have also tried:
returnValue = Regex.Replace(returnValue, "[_]([0-9]{1,})[_]$", "_($1+1)_")
however this returns "filename_($1+1)_". I know I could just remove the "_" then add 1 to the number then put the "_" back on both sides, but I also know this can be done in other languages (like php) using the Regex.
Any ideas?
Thanks!
Ryan
I haven't taken the time to figure out what's wrong with your RegEx expression because it just seems silly to me. You're over thinking it. All you need to do is something simple like this:
Dim fileName As String = returnValue
Dim i As Integer = 0
While usedFiles.Contains(returnValue)
i = i + 1
returnValue = fileName + "_" + i.ToString() + "_"
End While

Get/split text inside brackets/parentheses

Just have a list of words, such as:
gram (g)
kilogram (kg)
pound (lb)
just wondering how I would get the words within the brackets for example get the "g" in "gram (g)" and dim it as a new string.
Possibly using regex?
Thanks.
Use split function ..
strArr = str.Split("(") ' splitting 'gram (g)' returns an array ["gram " , "g)"] index 0 and 1
strArr2 = strArr[1].Split(")") ' splitting 'g)' returns an array ["g " ..]
the string is in
strArr2[0]
Edit
you want getAbbrev and getAbbrev2 to be arrays
try
Dim getAbbrev As String() = Str.Split("(")
Dim getAbbrev2 as String() = getAbbrev[1].Split(")")
To do it without declaring arrays you can do
"gram (g)".Split("(")[1].Split(")")[0]
but that's unreadable
Edit
You have some very trivial errors. I would suggest you strengthen your understanding on objects and declarations first. Then you can look into invoking methods. I rather have you understand it than give it to you. Re-read the book you have or look for a basic tutorial.
Dim unit As String = 'make sure this is the actual string you are getting, not sure where you are supposed to get the string value from => ie grams (g)
Dim getAbbrev As String() = unit.Split("(") 'use unit not Str - Str does not exist
Dim getAbbrev2 As String() = getAbbrev[1].Split(")") 'As no as - case sensitive
for the last line reference getAbbrev2 instead of the unknown abbrev2
Fun with Regular Expressions (I'm really not an expert here, but tested and works)
Imports System.Text.RegularExpressions
.....
Dim charsToTrim() As Char = { "("c, ")"c }
Dim test as String = "gram (g)" + Environment.NewLine +
"kilogram (kg)" + Environment.NewLine +
"pound (lb)"
Dim pattern as String = "\([a-zA-Z0-9]*\)"
Dim r As Regex = new Regex(pattern, RegexOptions.IgnoreCase)
Dim m As Match = r.Match(test)
While(m.Success)
System.Diagnostics.Debug.WriteLine("Match" + "=" + m.Value.ToString())
Dim tempText as String = m.Value.ToString().Trim(charsToTrim)
System.Diagnostics.Debug.WriteLine("String Trimmed" + "=" + tempText)
m = m.NextMatch()
End While
You can split at the space and remove the parens from the second token (by replacing them with an empty string).
A regex is also an option, and is very simple, its pattern is
\w+\s+\((\w+)\)
Which means, a word, then at least one space, then opening parens, then in real regex parens you search for a word, and, eventually a closing paren. The inner parentheses are capturing parentheses, which make it possible to refer to the unit g, kg, lb.