Index out of bounds error using Regex Split - regex

Posting another question here since last time I did the people who answered were extremely helpful. Bear in mind, I'm relatively new to VB.net.
So I'm working on a program that pulls the first and third columns out of a text file using Regex.Split to eliminate the multiple spaces between the alphanumeric characters in the file.
A high level example of what the text file looks like is here:
VARIABLE1 MEAS1 STORAGE1
VARIABLE2 MEAS2 STORAGE2
VARIABLE3 MEAS3 STORAGE3
VARIABLE4 MEAS4 STORAGE4
VARIABLE5 MEAS5 STORAGE5
VARIABLE6 MEAS6 STORAGE6
#VARIABLE7 MEAS7 STORAGE7
VARIABLE8 MEAS8 STORAGE8
VARIABLE9 MEAS9 STORAGE9
VARIABLE10 MEAS10 STORAGE10
VARIABLE11 MEAS11 STORAGE11
VARIABLE12 MEAS12 STORAGE12
VARIABLE13 MEAS13 STORAGE13
VARIABLE14 MEAS14 STORAGE14
The file uses the "#" to denote comments in the file, so in my code I tell the System.IO to ignore that character.
However, when I run a test function to try this, I keep getting an index out of bounds error, but only on some files; others in the same format work fine for some reason.
When looking through the execution output, I am receiving the error after it writes the "STORAGE6" line, so there has to be an error traversing from STORAGE6 to VARIABLE7, and I can't quite figure it out. Any insight on this would be extremely appreciated!
The test function I have written is below:
Public Function Testing()
OpenFileDialog1.ShowDialog()
Dim file = System.IO.File.ReadAllLines(OpenFileDialog1.FileName)
For Each line In file
Dim arrWords() As String = System.Text.RegularExpressions.Regex.Split(line, "\s+")
Dim upBound = arrWords.GetUpperBound(0)
If upBound <> 0 Then
If line.Contains("#") Or line.Length = 0 Then
Else
Console.WriteLine(arrWords(0) + " " + arrWords(2))
End If
End If
Next
End Function
I get the out of bounds error when calling "arrWords(2)," which I'm sure was pretty obvious, but just trying to make the question as detailed as possible.

The simple fix is changing these two lines:
If upBound <> 0 Then
If line.Contains("#") Or line.Length = 0 Then
like this:
If upBound >= 2 Then ' arrWords(2) needs at least three fields
If line.TrimStart().StartsWith("#") OrElse String.IsNullOrWhiteSpace(line) Then
But I'd really do something more like this:
Public Class DataItem
Public Property Variable As String
Public Property Measure As String
Public Property Storage As String
End Class
Public Function ReadDataFile(fileName As String) As IEnumerable(Of DataItem)
Return System.IO.File.ReadLines(fileName).
Where(Function(line) Not line.TrimStart().StartsWith("#") AndAlso Not String.IsNullOrWhiteSpace(line)).
Select(Function(line) System.Text.RegularExpressions.Regex.Split(line, "\s+")).
Where(Function(fields) fields.Length = 3).
Select(Function(fields)
Return New DataItem With {
.Variable = fields(0),
.Measure = fields(1),
.Storage = fields(2)}
End Function)
End Function
Public Sub Testing()
If OpenFileDialog1.ShowDialog() = DialogResult.OK Then
Dim records = ReadDataFile(OpenFileDialog1.FileName)
For Each record In records
Console.WriteLine($"{record.Variable} {record.Storage}")
Next
End If
End Sub
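The filtering logic of that answer can also be sketched outside VB. The snippet below is a minimal Python illustration, not the answerer's code: the function name `read_data_lines` and the sample lines are made up for the example, and unlike the VB version it trims each line before splitting, so a line with leading spaces is kept rather than dropped.

```python
import re

def read_data_lines(lines):
    """Yield (variable, storage) pairs from whitespace-separated lines."""
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and comment lines that start with '#'.
        if not stripped or stripped.startswith("#"):
            continue
        fields = re.split(r"\s+", stripped)
        # Only read fields[2] when the line really has three columns.
        if len(fields) != 3:
            continue
        yield fields[0], fields[2]

# Hypothetical sample, including the kinds of lines that trigger the error.
sample = [
    "VARIABLE6   MEAS6   STORAGE6",
    "#VARIABLE7  MEAS7   STORAGE7",
    "",
    "VARIABLE8   MEAS8",              # short line: indexing column 3 would fail
    "  VARIABLE9 MEAS9   STORAGE9",   # leading spaces
]
result = list(read_data_lines(sample))
print(result)
```

The point is the same as in the VB code: check the number of fields before indexing into the third one, and treat a line as a comment only when it starts with "#".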

Related

If contains and if does not Contain a colon, select the number

I have an IF statement that returns the number if there is a colon symbol in the string. Sometimes the string does not contain a colon. I'm looking for an else branch that would select the only number, "45061", if there is no colon in the string. Case A works when the string has a colon, but I need some assistance with case B, where the string does not have one.
A.
String/Text = OM_Account_Master_Slave~Account CP~3712011:Shared-001
B.
String/Text = OM_Account_Master_Slave~Account CP~45061Shared-001
A.
if(contains,":",Substring(Abbrev(),1,Subtract(Length(Abbrev()),11)))
Result = 3712011:Shared-001
B.
if(contains,":",Substring(Abbrev(),1,Subtract(Length(Abbrev()),11)))
else
Consider the following User Defined Function:
Public Function GetNumber(r As Range) As Variant
Dim v As String, capture As Boolean
Dim i As Long, t As String, m As String
v = r.Value
GetNumber = ""
If v = "" Then Exit Function
t = ""
capture = False
For i = 1 To Len(v)
m = Mid(v, i, 1)
If IsNumeric(m) Then
t = t & m
capture = True
Else
If capture Then Exit For
End If
Next i
If Len(t) > 0 Then
GetNumber = CLng(t)
End If
End Function
User Defined Functions (UDFs) are very easy to install and use:
ALT-F11 brings up the VBE window
ALT-I
ALT-M opens a fresh module
paste the stuff in and close the VBE window
If you save the workbook, the UDF will be saved with it.
If you are using a version of Excel later than 2003, you must save the file as .xlsm rather than .xlsx
To remove the UDF:
bring up the VBE window as above
clear the code out
close the VBE window
To use the UDF from Excel:
=GetNumber(A1)
To learn more about macros in general, see:
http://www.mvps.org/dmcritchie/excel/getstarted.htm
and
http://msdn.microsoft.com/en-us/library/ee814735(v=office.14).aspx
and for specifics on UDFs, see:
http://www.cpearson.com/excel/WritingFunctionsInVBA.aspx
Macros must be enabled for this to work!
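For comparison, the same "first run of digits" idea can be written with a regular expression. This is a hedged Python sketch rather than something that runs in Excel; `get_number` is a name chosen here to mirror the UDF, and the two strings are the A and B examples from the question.

```python
import re

def get_number(text):
    """Return the first run of digits in text as an int, or None if there is none."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None

with_colon = "OM_Account_Master_Slave~Account CP~3712011:Shared-001"
without_colon = "OM_Account_Master_Slave~Account CP~45061Shared-001"

print(get_number(with_colon))     # 3712011
print(get_number(without_colon))  # 45061
```

Because the search stops at the first non-digit, it does not matter whether the number is followed by a colon or by text.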

Use Regex to update VBA code

I have a VBA source code containing many hard coded references to cells. The code is part of the Worksheet_Change sub, so I guess hard coding the range references was necessary and you will see many assignment statements like the following:
Set cell = Range("B7")
If Not Application.Intersect(cell, Range(Target.Address)) Is Nothing Then
I would like to insert 2 additional rows at the top of the worksheet, so basically all the row references will shift by 2 rows. So, for example, the above assignment statement will be changed to Set cell = Range("B9").
Given the large number of hard coded row references in the code, I thought of using Regex to increment all the row references by 2. So I have developed the following code.
Sub UpdateVBACode()
'*********************Read Text File Containing VBA code and assign content to string variable*************************
Dim str As String
Dim strFile As String: strFile = "F:\Preprocessed_code.txt"
Open strFile For Input As #1
str = Input$(LOF(1), 1)
Close #1
'*********************Split string variables to lines******************************************************************
Dim vStr As Variant: vStr = Split(str, vbCrLf)
'*********************Regex work***************************************************************************************
Dim rex As New RegExp
rex.Global = True
Dim i As Long
Dim mtch As Object
rex.Pattern = "(""\w)([0-9][0-9])("")" ' 3 capturing groups to reconstruct the replacement string
For i = 0 To UBound(vStr, 1)
If rex.Test(vStr(i)) Then
For Each mtch In rex.Execute(vStr(i))
vStr(i) = rex.Replace(vStr(i), mtch.SubMatches(0) & IncrementString(mtch.SubMatches(1)) & mtch.SubMatches(2))
Next
End If
Next i
'********************Reconstruct String*********************************************************************************
str = ""
For i = 0 To UBound(vStr, 1)
str = str & vbCrLf & vStr(i)
Next i
'********************Write string to text file******************************************************************************
Dim myFile As String
myFile = "F:\Processed_code.txt"
Open myFile For Output As #2
Print #2, str
Close #2
'
End Sub
Function IncrementString(rowNum As String) As String '
Dim num As Integer
num = CInt(rowNum) + 2
IncrementString = CStr(num)
End Function
The above VBA code works, except it fails if there are two row references in the same line. For instance, if we have If Range("B15").Value <> Range("B12").Value Then, after the line gets processed I get If Range("B14").Value <> Range("B14").Value Then instead of If Range("B17").Value <> Range("B14").Value Then. The problem is in the vStr(i) = rex.Replace(vStr(i), mtch.SubMatches(0) & IncrementString(mtch.SubMatches(1)) & mtch.SubMatches(2)) statement, because it gets called more than once if a line has more than one Regex match, and each call replaces every match in the line.
Any ideas? Thanks in advance
I think what you are trying to do is a bad idea, for two reasons:
Hard-coded cell references are almost always poor practice. A better solution may be to replace hard-coded cell references with named ranges. You can refer to them in the code by name, and the associated references will update automatically if you insert/delete rows or columns. You have some painful upfront work to do but the result will be a much more maintainable spreadsheet.
You are effectively trying to write a VBA parser using regexes. This is pretty much guaranteed not to work in all cases. Your current regex will match lots of things that aren't cell references (e.g. "123", "_12", and "A00") and will also miss lots of hard-coded cell references (e.g. "A1" and Cells(3,7)). That may not matter for your particular code, but the only way to be sure it's worked is to check each reference by hand, which is IMHO not much less effort than refactoring (e.g. replacing with named ranges). In my experience you don't fix a regex, you just make the problems more subtle.
That said, since you asked...
<cthulu>
There are only two choices when using RegExp.Replace() - either replace the first match or replace all matches (corresponding to setting RegExp.Global to False or True respectively). You don't have any finer control than that, so your logic has to change. Instead of using Replace() you could write your own code for the replacements, using the FirstIndex property of the Match object, and VBA's string functions to isolate the relevant parts of the string:
Dim rex As Object
Set rex = CreateObject("VBScript.RegExp")
rex.Global = True
Dim i As Long
Dim mtch As Object
Dim newLineText As String
Dim currMatchIndex As Long, prevPosition As Long
rex.Pattern = "(""\w)([0-9][0-9])("")" ' 3 capturing groups to reconstruct the replacement string
For i = 0 To UBound(vStr, 1)
If rex.Test(vStr(i)) Then
currMatchIndex = 0: prevPosition = 1
newLineText = ""
For Each mtch In rex.Execute(vStr(i))
'Note that VBA string functions are indexed from 1 but Match.FirstIndex starts from 0
currMatchIndex = mtch.FirstIndex
newLineText = newLineText & Mid(vStr(i), prevPosition, currMatchIndex - prevPosition + 1) & _
mtch.SubMatches(0) & IncrementString(mtch.SubMatches(1)) & mtch.SubMatches(2)
prevPosition = currMatchIndex + Len(mtch.Value) + 1
Next
vStr(i) = newLineText & Right(vStr(i), Len(vStr(i)) - prevPosition + 1)
End If
Next i
Note that I still haven't fixed the problems with the regex pattern in the first place. I recommend that you just go and use named ranges instead...
Oops, nearly forgot - </cth
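The underlying technique is "compute a separate replacement for every match". VBScript's RegExp has no callback form of Replace, which is why the answer rebuilds the string by hand; in languages that do support a replacement callback the same thing is short. A minimal Python sketch, assuming the question's pattern unchanged (so it inherits all the pattern's weaknesses listed above); `shift_rows` is a hypothetical helper name:

```python
import re

# Same three groups as the VBA pattern: opening quote + column letter, two-digit row, closing quote.
CELL_REF = re.compile(r'("\w)([0-9][0-9])(")')

def shift_rows(line, offset=2):
    """Increase every two-digit row reference in a line by offset."""
    def bump(match):
        prefix, row, suffix = match.groups()
        return f"{prefix}{int(row) + offset}{suffix}"
    # The callback runs once per match, so each reference gets its own new value.
    return CELL_REF.sub(bump, line)

line = 'If Range("B15").Value <> Range("B12").Value Then'
shifted = shift_rows(line)
print(shifted)
```

Each match is replaced by a value derived from that match alone, so two references on one line no longer overwrite each other.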

Excel VBA Macro: Using regex to return adjacent row data

Alright, so here's the whole thing I'm supposed to do.
Input a number that corresponds with a number in Data Worksheet Column A and return the adjacent row data.
I want it to return the adjacent cells. For example, if it finds 052035 in cell A5378, I want it to return the data or cell addresses of B5378 and C5378.
EDIT: I've deleted my code; since it didn't really follow with a good way to do it.
Worksheet Structure for Data:
A 1-7800ish[6 Digit number 1-9]
B 1-7800ish Area Codes
C 1-7800ish City/States
The data by the way; is a relatively large set that I got from a query on a SQL-Server. The string number that I'm looking for should have no duplicates based on my original query. [I grouped by before copying it over]
If y'all have resources for a quick introduction to VB from a programming perspective, that would be helpful. I can program in C/C++ but the syntax in VB is a little weird to me.
If your end goal is to simply find the exact match in column A, and return the values in corresponding row, columns B & C, Regular Expressions is the wrong tool for the job. Use built in functions like Match.
I still don't understand the point of this exercise: as the data is already arranged in columns A, B and C, you could simply use AutoFilter. This subroutine simply tells you whether the value is found (and returns the corresponding data) or not found.
I have tested this (made a small change in dimensioning vals variable)
Sub Foo()
Dim valToLookFor As String
Dim rngToLookAt As Range
Dim foundRow As Long
Dim vals() As Variant
valToLookFor = "052035"
Set rngToLookAt = Range("A:A")
If Not IsError(Application.Match(valToLookFor, rngToLookAt, False)) Then
foundRow = Application.Match(valToLookFor, rngToLookAt, False)
ReDim vals(1)
vals(0) = rngToLookAt.Cells(foundRow).Offset(0, 1).Value
vals(1) = rngToLookAt.Cells(foundRow).Offset(0, 2).Value
'Alternatively, to return the cell address:
'vals(0) = rngToLookAt.Cells(foundRow).Offset(0,1).Address
'vals(1) = rngToLookAt.Cells(foundRow).Offset(0,2).Address
MsgBox Join(vals, ",")
Else:
Erase vals
MsgBox valToLookFor & " not found!", vbInformation
End If
End Sub
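The same exact-match idea, expressed as a keyed lookup instead of a regex search, looks like this in a short Python sketch. The rows are invented sample data standing in for columns A, B and C, and `find_adjacent` is a hypothetical name; the one detail worth copying is that the codes are kept as text so the leading zero survives.

```python
# Hypothetical sample rows standing in for columns A, B and C of the Data sheet.
rows = [
    ("052034", "212", "New York, NY"),
    ("052035", "502", "Louisville, KY"),
    ("052036", "312", "Chicago, IL"),
]

# Keys stay strings so the leading zero in "052035" is preserved.
lookup = {code: (area_code, city) for code, area_code, city in rows}

def find_adjacent(code):
    """Return (area code, city/state) for an exact match, or None if absent."""
    return lookup.get(code)

found = find_adjacent("052035")
missing = find_adjacent("999999")
print(found, missing)
```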

Pretty String Manipulation

I have the following string which I wish to extract parts from:
<FONT COLOR="GREEN">201 KAR 2:340.</FONT>
In this particular case, I wish to extract the numbers 201, 2, and 340, which I will later concatenate to form another string:
http://www.lrc.state.ky.us/kar/201/002/340reg.htm
I have a solution, but it is not easily readable, and it seems rather clunky. It involves using the mid function. Here it is:
intTitle = CInt(Mid(strFontTag,
InStr(strFontTag, ">") + 1,
(InStr(strFontTag, "KAR") - InStr(strFontTag, ">"))
- 3))
I would like to know if perhaps there is a better way to approach this task. I realize I could make some descriptive variable names, like intPosOfEndOfOpeningFontTag to describe what the first InStr function does, but it still feels clunky to me.
Should I be using some sort of split function, or regex, or some more elegant way that I have not come across yet? I have been manipulating strings in this fashion for years, and I just feel there must be a better way. Thanks.
<FONT[^>]*>[^\d]*(\d+)[^\d]*(\d+):(\d+)[^\d]*</FONT>
The class
Imports System
Imports System.IO
Imports System.Text
Imports System.Text.RegularExpressions
Imports System.Xml
Imports System.Xml.Linq
Imports System.Linq
Public Class clsTester
'methods
Public Sub New()
End Sub
Public Function GetTitleUsingRegEx(ByVal fpath$) As XElement
'use this function if your input string is not well-formed XML
Dim result As New XElement(<result/>)
Try
Dim q = Regex.Matches(File.ReadAllText(fpath), Me.titPattern1, RegexOptions.None)
For Each mt As Match In q
Dim t As New XElement(<title/>)
t.Add(New XAttribute("name", mt.Groups("name").Value))
t.Add(New XAttribute("num1", mt.Groups("id_1").Value))
t.Add(New XAttribute("num2", mt.Groups("id_2").Value))
t.Add(New XAttribute("num3", mt.Groups("id_3").Value))
t.Add(mt.Value)
result.Add(t)
Next mt
Return result
Catch ex As Exception
result.Add(<error><%= ex.ToString %></error>)
Return result
End Try
End Function
Public Function GetTitleUsingXDocument(ByVal fpath$) As XElement
'use this function if your input string is well-formed
Dim result As New XElement(<result/>)
Try
Dim q = XElement.Load(fpath).Descendants().Where(Function(c) Regex.IsMatch(c.Name.LocalName, "(?is)^font$")).Where(Function(c) Regex.IsMatch(c.Value, Me.titPattern2, RegexOptions.None))
For Each nd As XElement In q
Dim s = Regex.Match(nd.Value, Me.titPattern2, RegexOptions.None)
Dim t As New XElement(<title/>)
t.Add(New XAttribute("name", s.Groups("name").Value))
t.Add(New XAttribute("num1", s.Groups("id_1").Value))
t.Add(New XAttribute("num2", s.Groups("id_2").Value))
t.Add(New XAttribute("num3", s.Groups("id_3").Value))
t.Add(nd.Value)
result.Add(t)
Next nd
Return result
Catch ex As Exception
result.Add(<error><%= ex.ToString %></error>)
Return result
End Try
End Function
'fields
Private titPattern1$ = "(?is)(?<=<font[^<>]*>)(?<id_1>\d+)\s+(?<name>[a-z]+)\s+(?<id_2>\d+):(?<id_3>\d+)(?=\.?</font>)"
Private titPattern2$ = "(?is)^(?<id_1>\d+)\s+(?<name>[a-z]+)\s+(?<id_2>\d+):(?<id_3>\d+)\.?$"
End Class
The usage
Sub Main()
Dim y = New clsTester().GetTitleUsingRegEx("C:\test.htm")
If y.<error>.Count = 0 Then
Console.WriteLine(String.Format("Result from GetTitleUsingRegEx:{0}{1}", vbCrLf, y.ToString))
Else
Console.WriteLine(y...<error>.First().Value)
End If
Console.WriteLine("")
Dim z = New clsTester().GetTitleUsingXDocument("C:\test.htm")
If z.<error>.Count = 0 Then
Console.WriteLine(String.Format("Result from GetTitleUsingXDocument:{0}{1}", vbCrLf, z.ToString))
Else
Console.WriteLine(z...<error>.First().Value)
End If
Console.ReadLine()
End Sub
Hope this helps.
regex pattern: <FONT[^>]*>.*?(\d+).*?(\d+).*?(\d+).*?<\/FONT>
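To show how three captured groups get from the tag to the target URL, here is a small Python sketch. It uses a slightly stricter variant of the patterns above, and it assumes (from the single example in the question) that only the middle number is zero-padded to three digits; `kar_url` is a hypothetical helper name.

```python
import re

# Three numeric groups inside a FONT tag: title, chapter, regulation.
FONT_TAG = re.compile(r"<FONT[^>]*>\D*(\d+)\D+(\d+):(\d+)\D*</FONT>", re.IGNORECASE)

def kar_url(font_tag):
    """Build the regulation URL from a FONT tag, or return None if it does not match."""
    match = FONT_TAG.search(font_tag)
    if match is None:
        return None
    title, chapter, regulation = (int(group) for group in match.groups())
    # Assumption: the chapter is padded to three digits, as in .../201/002/340reg.htm
    return f"http://www.lrc.state.ky.us/kar/{title}/{chapter:03d}/{regulation}reg.htm"

url = kar_url('<FONT COLOR="GREEN">201 KAR 2:340.</FONT>')
print(url)
```

The VB.NET equivalent would use Regex.Match and the Groups collection in the same way.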
I think @Jean-François Corbett has it right.
Hide it away in a function and never look back
Change your code to this:
intTitle = GetCodesFromColorTag("<FONT COLOR=""GREEN"">201 KAR 2:340.</FONT>")
Create a new function:
Public Function GetCodesFromColorTag(FontTag as String) as Integer
Return CInt(Mid(FontTag, InStr(FontTag, ">") + 1,
InStr(FontTag, "KAR") - InStr(FontTag, ">") - 3))
End Function

VB.Net Matching and replacing the contents of multiple overlapping sets of brackets in a string

I am using vb.net to parse my own basic scripting language, sample below. I am a bit stuck trying to deal with the 2 separate types of nested brackets.
Assuming name = Sam
Assuming timeFormat = hh:mm:ss
Assuming time() is a function that takes a format string but
has a default value and returns a string.
Hello [[name]], the time is [[time(hh:mm:ss)]].
Result: Hello Sam, the time is 19:54:32.
The full time is [[time()]].
Result: The full time is 05/06/2011 19:54:32.
The time in the format of your choice is [[time([[timeFormat]])]].
Result: The time in the format of your choice is 19:54:32.
I could in theory change the syntax of the script completely but I would rather not. It is designed like this to enable strings without quotes because it will be included in an XML file and quotes in that context were getting messy and very prone to errors and readability issues. If this fails I could redesign using something other than quotes to mark out strings but I would rather use this method.
Preferably, unless there is some other way I am not aware of, I would like to do this using regex. I am aware that the standard regex is not really capable of this but I believe this is possible using MatchEvaluators in vb.net and some form of recursion based replacing. However I have not been able to get my head around it for the last day or so, possibly because it is hugely difficult, possibly because I am ill, or possibly because I am plain thick.
I do have the following regex for parts of it.
Detecting the parentheses: (\w*?)\((.*?)\)(?=[^\(+\)]*(\(|$))
Detecting the square brackets: \[\[(.*?)\]\](?=[^\[+\]]*(\[\[|$))
I would really appreciate some help with this as it is holding the rest of my project back at the moment. And sorry if I have babbled on too much or not put enough detail, this is my first question on here.
Here's a little sample which might help you iterate through several matches/groups/captures. I realize that I am posting C# code, but it would be easy for you to convert that into VB.Net
//these two may be passed in as parameters:
string tosearch;//the string you are searching through
string regex;//your pattern to match
//...
Match m;
CaptureCollection cc;
GroupCollection gc;
Regex r = new Regex(regex, RegexOptions.IgnoreCase);
m = r.Match(tosearch);
gc = m.Groups;
Debug.WriteLine("Number of groups found = " + gc.Count.ToString());
// Loop through each group.
for (int i = 0; i < gc.Count; i++)
{
cc = gc[i].Captures;
int counter = cc.Count;
int grpnum = i + 1;
Debug.WriteLine("Scanning group: " + grpnum.ToString() );
// Print number of captures in this group.
Debug.WriteLine(" Captures count = " + counter.ToString());
if (cc.Count >= 1)
{
foreach (Capture cap in cc)
{
Debug.WriteLine(string.Format(" Capture found: {0}", cap.ToString()));
}
}
}
Here is a slightly simplified version of the code I wrote for this. Thanks for the help everyone and sorry I forgot to post this before. If you have any questions or anything feel free to ask.
Function processString(ByVal scriptString As String)
' Functions
Dim pattern As String = "\[\[((\w+?)\((.*?)\))(?=[^\(+\)]*(\(|$))\]\]"
scriptString = Regex.Replace(scriptString, pattern, New MatchEvaluator(Function(match) processFunction(match)))
' Variables
pattern = "\[\[([A-Za-z0-9+_]+)\]\]"
scriptString = Regex.Replace(scriptString, pattern, New MatchEvaluator(Function(match) processVariable(match)))
Return scriptString
End Function
Function processFunction(ByVal match As Match)
Dim nameString As String = match.Groups(2).Value
Dim paramString As String = match.Groups(3).Value
paramString = processString(paramString)
Select Case nameString
Case "time"
Return getLocalValueTime(paramString)
Case "math"
Return getLocalValueMath(paramString)
End Select
Return ""
End Function
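Function processVariable(ByVal match As Match) As String
Try
Return moduleDictionary("properties")("vars")(match.Groups(1).Value)
Catch ex As Exception
' Unknown variable: substitute an empty string instead of returning Nothing
Return String.Empty
End Try
End Function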
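A different way to handle the nesting, shown here only as a hedged alternative and not as the approach in the code above, is to resolve the innermost [[...]] tokens first and repeat until nothing changes. The Python sketch below assumes a dictionary of variables and a dictionary of one-argument functions; `expand`, `fake_time` and the fixed timestamps are stand-ins invented for the example.

```python
import re

INNERMOST = re.compile(r"\[\[([^\[\]]*)\]\]")   # a [[...]] token with no brackets inside
CALL = re.compile(r"(\w+)\((.*)\)$")            # name(argument)

def expand(script, variables, functions):
    """Replace [[name]] and [[func(arg)]] tokens, innermost first."""
    def resolve(match):
        token = match.group(1)
        call = CALL.match(token)
        if call:
            name, argument = call.groups()
            return functions[name](argument)
        return str(variables[token])

    previous = None
    while previous != script:      # repeat until a pass changes nothing
        previous = script
        script = INNERMOST.sub(resolve, script)
    return script

def fake_time(fmt):
    # Stand-in for a real clock so the output is predictable.
    return "19:54:32" if fmt == "hh:mm:ss" else "05/06/2011 19:54:32"

variables = {"name": "Sam", "timeFormat": "hh:mm:ss"}
functions = {"time": fake_time}

greeting = expand("Hello [[name]], the time is [[time(hh:mm:ss)]].", variables, functions)
full = expand("The full time is [[time()]].", variables, functions)
nested = expand("The time in the format of your choice is [[time([[timeFormat]])]].", variables, functions)
print(greeting, full, nested, sep="\n")
```

In this sketch an unknown variable or function name raises a KeyError rather than being silently replaced, which may or may not be what a script engine wants.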