Increment Regex match using Regex.Replace

I'm creating a program in VB.NET to output multiple images. Some images will have the same file name. If there are multiple files with the same name, I want to add "_1_" to the end of the file name. If the "_1_" file already exists, I want to increment the 1 to "_2_". If that file also exists, I want to keep incrementing the number until the name doesn't exist. So for example "filename", "filename_1_", "filename_2_", etc. Here is the code that I have tried:
Dim usedFiles As New List(Of String)
While usedFiles.Contains(returnValue)
If Regex.IsMatch(returnValue, "[_]([0-9]{1,})[_]$") Then
returnValue = Regex.Replace(returnValue, "[_]([0-9]{1,})[_]$", "_" + (CType("$1", Integer) + 1).ToString() + "_")
Else
returnValue += "_1_"
End If
End While
usedFiles.Add(returnValue)
The line that isn't working is:
returnValue = Regex.Replace(returnValue, "[_]([0-9]{1,})[_]$", "_" + (CType("$1", Integer) + 1).ToString() + "_")
which outputs "filename_2_" every time. I have also tried:
returnValue = Regex.Replace(returnValue, "[_]([0-9]{1,})[_]$", "_($1+1)_")
however this returns "filename_($1+1)_". I know I could just remove the "_", add 1 to the number, then put the "_" back on both sides, but I also know this can be done with a regex in other languages (like PHP).
Any ideas?
Thanks!
Ryan

I haven't taken the time to figure out what's wrong with your regex because it just seems silly to me. You're overthinking it. All you need is something simple like this:
Dim fileName As String = returnValue
Dim i As Integer = 0
While usedFiles.Contains(returnValue)
i = i + 1
returnValue = fileName + "_" + i.ToString() + "_"
End While
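Why the original attempt always produced "filename_2_": the expression (CType("$1", Integer) + 1).ToString() is evaluated once, before Regex.Replace is even called, on the literal text "$1" (evidently converted to 1, given the constant result), so Replace receives the fixed string "_2_". "$1" is only substituted inside the replacement pattern itself, which cannot do arithmetic, hence "filename_($1+1)_". The closest .NET counterpart to PHP's preg_replace_callback is the Regex.Replace overload that accepts a MatchEvaluator, which is called once per match with the real captured digits. A minimal sketch of that approach follows; the names UniqueNames, IncrementSuffix and GetUniqueName are hypothetical, and it assumes, like the question, that a name already ending in "_N_" should have N incremented.

```vbnet
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module UniqueNames
    ' Matches a trailing "_<digits>_" suffix and captures the digits.
    Private ReadOnly SuffixPattern As New Regex("_(\d+)_$")

    ' Hypothetical helper: increments an existing "_N_" suffix,
    ' or appends "_1_" when the name has no suffix yet.
    Function IncrementSuffix(name As String) As String
        If Not SuffixPattern.IsMatch(name) Then
            Return name & "_1_"
        End If
        ' The lambda is a MatchEvaluator: it runs once per match and reads the
        ' captured digits at that moment, unlike an expression built before the call.
        Return SuffixPattern.Replace(name,
            Function(m As Match) "_" & (Integer.Parse(m.Groups(1).Value) + 1).ToString() & "_")
    End Function

    ' Hypothetical helper: keeps incrementing until the name is unused,
    ' then records it (same loop shape as the question).
    Function GetUniqueName(baseName As String, usedFiles As List(Of String)) As String
        Dim candidate As String = baseName
        While usedFiles.Contains(candidate)
            candidate = IncrementSuffix(candidate)
        End While
        usedFiles.Add(candidate)
        Return candidate
    End Function
End Module
```

The counter loop in the answer above is simpler and avoids regex entirely; the evaluator version mainly helps when the incoming name may already carry a suffix.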

Related

Check string has a date in it and extract part of the string

I have thousands of lines of text that I need to work through, and the lines I am interested in look like the following:
01/04/2019 09:35:41 - Test user (Additional Comments)
I am currently using this code to filter out all the other rows:
If InStr(FullCell(i), " - ") <> 0 And InStr(FullCell(i), ":") <> 0 And InStr(FullCell(i), "(") <> 0 Then
FullCell is the array that I am working through.
I know this is not the best way to do it. Is there a way to check that there is a date at the beginning of the string in the format dd/mm/yyyy, and then extract the user name between the '-' and the '(' symbols?
I had a play with regex to see if that could help, but my skills are too limited to combine VBA and regex in the same code.
What's the best way to do this?
Assuming Fullcell(i) contains the string,
If Left(Fullcell(i), 10) Like "##/##/####" Then
Will return True if you have a date (note that it will not differentiate between dd/mm/yyyy and mm/dd/yyyy, and since # matches any digit it will also accept an impossible date such as 45/13/2019).
And
Mid(Fullcell(i), InStr(Fullcell(i), " - ") + 3, InStr(Fullcell(i), " (") - InStr(Fullcell(i), " - ") - 3)
Will return the username (InStr finds the space before the hyphen, so the name starts 3 characters later).
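The Like test only checks the shape of the text, so a stricter check may be worth it. A minimal sketch, written in VB.NET rather than the question's VBA and assuming the dd/mm/yyyy order stated in the question: DateTime.TryParseExact with the "dd/MM/yyyy" format rejects impossible dates, and IndexOf/Substring take the name between " - " and " (". The module and function names (LogLineParser, TryParseLogLine) are hypothetical.

```vbnet
Imports System.Globalization

Module LogLineParser
    ' Hypothetical helper: succeeds only if the line starts with a real
    ' dd/MM/yyyy date and contains " - " followed later by " (".
    Function TryParseLogLine(line As String, ByRef entryDate As DateTime,
                             ByRef userName As String) As Boolean
        userName = Nothing
        If line Is Nothing OrElse line.Length < 10 Then Return False

        ' Unlike Like "##/##/####", TryParseExact rejects dates such as
        ' 31/02/2019 and fixes the day/month order to dd/MM.
        If Not DateTime.TryParseExact(line.Substring(0, 10), "dd/MM/yyyy",
                                      CultureInfo.InvariantCulture,
                                      DateTimeStyles.None, entryDate) Then
            Return False
        End If

        Dim dashPos As Integer = line.IndexOf(" - ")
        If dashPos < 0 Then Return False
        ' The name starts right after the three characters " - ".
        Dim nameStart As Integer = dashPos + 3
        Dim parenPos As Integer = line.IndexOf(" (", nameStart)
        If parenPos < 0 Then Return False

        userName = line.Substring(nameStart, parenPos - nameStart)
        Return True
    End Function
End Module
```

For the sample line "01/04/2019 09:35:41 - Test user (Additional Comments)" this yields 1 April 2019 and "Test user"; in VBA the same idea could be approximated with IsDate plus InStr/Mid, though IsDate is locale-dependent.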
I'm sure there is a more efficient way to do this, but I've used the following solution quite a few times:
This will select the date:
x = 1
Do While x <= Len(FullCell) And Mid(FullCell, x, 1) <> " "
x = x + 1
Loop
strDate = Left(FullCell, x - 1)
This will find the character number of the hyphen; the username starts 2 characters after it.
x = 1
Do While x <= Len(FullCell) And Mid(FullCell, x, 1) <> "-"
x = x + 1
Loop
Then we will find the end of the username by looking for the opening parenthesis (searching for a space would stop early on a name like "Test user"):
y = x + 2
Do While y <= Len(FullCell) And Mid(FullCell, y, 1) <> "("
y = y + 1
Loop
The username should now be characters x+2 to y-2 (y-1 is the space before the parenthesis):
strUsername = Mid(FullCell, x + 2, y - x - 3)
Here's how I would do it.
Dim your variables:
Dim rng As Range
Dim dat As Variant
Dim FullCell() As String
Dim User As String
Dim i As Long
Dim p As Long
Set your range:
Set rng = ' any way you choose
dat = rng.Value2
Loop through dat:
For i = 1 To UBound(dat, 1)
Split the data:
FullCell = Split(CStr(dat(i, 1)), " - ")
Test if it split:
If UBound(FullCell) > 0 Then
Test if it matches:
If IsDate(Trim$(FullCell(0))) Then
p = InStr(FullCell(1), "(") - 1
If p > 0 Then
User = Trim$(Left$(FullCell(1), p))
' Found a user
End If
End If
End If
Next
Abstraction is your friend: it's always helpful to break these into their own private functions whenever you can. You could put your code in a function called something like ExtractUsername.
Below is an example of this. I decided to go with the RegExp approach (late binding), but you could use string functions like the examples above as well.
This function returns the username if it finds the pattern you mentioned above, otherwise, it returns an empty string.
Private Function ExtractUsername(ByVal SourceString As String) As String
Dim RegEx As Object
Set RegEx = CreateObject("vbscript.regexp")
'(FIRST GROUP FINDS THE DATE FORMATTED AS DD/MM/YYYY AND EVERYTHING UP TO THE HYPHEN)
'(SECOND GROUP FINDS THE USERNAME) THIS WILL BE SUBMATCH 1
With RegEx
.Pattern = "(^\d{2}\/\d{2}\/\d{4}.*-)(.+)(\()"
.Global = True
End With
Dim Match As Object
Set Match = RegEx.Execute(SourceString)
'ONLY RETURN IF A MATCH WAS FOUND
If Match.Count > 0 Then
ExtractUsername = Trim(Match(0).SubMatches(1))
End If
Set RegEx = Nothing
End Function
The regex pattern is grouped into three parts: the date (up to and including the hyphen), the username, and the opening parenthesis. What you are interested in is the username, which is SubMatches(1) because SubMatches is zero-based.
Regexr is a helpful site for practicing regular expressions and can show you a bit more of what the pattern I went with is doing.
Please note that using regular expressions might give you performance issues and you should test it against regular string functions to see what works best for your situation.
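One caveat with the pattern above: .*- is greedy, so it runs to the last hyphen before the parenthesis. For a hyphenated name such as "Anne-Marie Smith" (a made-up example), SubMatches(1) would be "Marie Smith ". The sketch below is written in VB.NET, not VBA; the [^-]* idea ("anything up to the first hyphen") should carry over to the VBA pattern, but the named groups are .NET syntax and may not be supported by VBScript.RegExp. The module name UsernameExtractor is hypothetical, and the pattern assumes the date comes first and contains no hyphens.

```vbnet
Imports System.Text.RegularExpressions

Module UsernameExtractor
    ' date : dd/mm/yyyy at the very start of the line (shape only).
    ' [^-]*- : everything up to the FIRST hyphen, so a hyphenated name is
    '          not swallowed the way a greedy ".*-" would swallow it.
    ' user : lazily everything up to optional spaces and the first "(".
    Private ReadOnly LinePattern As New Regex(
        "^(?<date>\d{2}/\d{2}/\d{4})[^-]*- (?<user>[^(]+?)\s*\(")

    ' Mirrors ExtractUsername above: returns "" when the line does not
    ' have the expected shape.
    Function ExtractUsername(source As String) As String
        Dim m As Match = LinePattern.Match(source)
        If Not m.Success Then Return ""
        Return m.Groups("user").Value
    End Function
End Module
```

Because the lazy group stops before the whitespace preceding "(", no Trim is needed afterwards.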

Obfuscation VBA Excel - Break Line, Var Names

I'm trying to build my own obfuscation add-in for my VBA projects.
I started with the easier tasks:
Remove Blank Lines
Remove Indents
Remove Comments
I figured out how to do these things, maybe not in the best way, but I'm stuck on:
Insert Random Break Lines (" _")
I would like to have this working for different types of delimiters; for now I'm only working with the "=" sign. By the way, I have problems when there are multiple delimiters in the line (e.g. If bla = "abc" or ble = "acd"). The code causes incorrect splits in my line.
Sub VBE_Break_The_Lines()
Dim VBC As VBComponent
Dim a As Long, i As Long, j As Long, lCount As Long
Dim str As String
Dim temp As Variant
lCount = 0
i = 1
Dim blnStringMode As Boolean, blnLineContinue As Boolean
For Each VBC In VBProjToClean.VBComponents
blnStringMode = False
i = 1
With VBC.CodeModule
Do Until i > .CountOfLines
If Not .ProcOfLine(i, vbext_pk_Proc) = "VBE_Remove_Comments" Then
str = .Lines(i, 1)
End If
If InStr(1, str, " = ", vbTextCompare) > 0 Then
temp = Split(str, " = ")
.InsertLines i, ""
.ReplaceLine i, temp(0) & " _"
.InsertLines i + 1, "= " & temp(1)
.DeleteLines i + 2
lCount = lCount + 1
'a = InStr(1, str, "=", vbTextCompare)
i = i + 1
End If
i = i + 1
Loop
End With
Next
MsgBox lCount & " LINES BREAKED ( = )", , strFileToClean
End Sub
My next step will be to change procedure/variable names. I've read a lot, but I'm not sure yet whether REGEX is the best way to do it.
Hope you guys can point me in a direction to follow.
Why not start with this Open Source code and make a new build from that instead of reinventing the wheel?
To make this work, you need to change all the file extensions in the code from .xls to .xlsm, save the IB_test.xls workbook as a macro-enabled workbook, and save the add-in as .xlam rather than .xla. Even though the code is 9 years old, it still works in Excel 2013.
If you are into VBA obfuscation, you may want to try out VBASH (www.ayedeal.com/vbash). It is pretty straightforward and powerful.
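On the splitting problem itself: the question's code splits on every " = " but only writes temp(0) and temp(1) back, so anything after a second " = " is lost, and a " = " inside a string literal would be split too. A minimal sketch of an alternative, in VB.NET rather than the add-in's VBA: scan the line, track whether the position is inside a double-quoted literal, break only at the first delimiter outside one, and keep the whole remainder. The names LineBreaker, FindDelimiterOutsideStrings and BreakAtFirstDelimiter are hypothetical, and it assumes comments were already removed (a quote inside a ' comment would confuse the scan).

```vbnet
Module LineBreaker
    ' Hypothetical helper: index of the first delimiter that is NOT inside a
    ' double-quoted string literal, or -1 if there is none.
    ' Assumption: the line has no trailing ' comment.
    Function FindDelimiterOutsideStrings(line As String, delimiter As String) As Integer
        Dim inString As Boolean = False
        For pos As Integer = 0 To line.Length - 1
            If line(pos) = """"c Then
                ' An escaped quote ("") toggles twice, so it cancels out.
                inString = Not inString
            ElseIf Not inString AndAlso pos + delimiter.Length <= line.Length AndAlso
                   line.Substring(pos, delimiter.Length) = delimiter Then
                Return pos
            End If
        Next
        Return -1
    End Function

    ' Hypothetical helper: splits one line into two physical lines joined by
    ' a " _" continuation, keeping EVERYTHING after the delimiter.
    Function BreakAtFirstDelimiter(line As String, delimiter As String) As String()
        Dim pos As Integer = FindDelimiterOutsideStrings(line, delimiter)
        If pos < 0 Then Return New String() {line}
        ' First line ends with the continuation marker; second line starts at
        ' the delimiter with its leading whitespace removed.
        Return New String() {line.Substring(0, pos) & " _", line.Substring(pos).TrimStart()}
    End Function
End Module
```

In the add-in, the two returned strings would take the place of temp(0) & " _" and "= " & temp(1) in the existing .ReplaceLine/.InsertLines calls.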

Use Regex to update VBA code

I have a VBA source code containing many hard coded references to cells. The code is part of the Worksheet_Change sub, so I guess hard coding the range references was necessary and you will see many assignment statements like the following:
Set cell = Range("B7")
If Not Application.Intersect(cell, Range(Target.Address)) Is Nothing Then
I would like to insert 2 additional rows at the top of the worksheet, so all the row references will shift by 2 rows. For example, the above assignment statement will become Set cell = Range("B9").
Given the large number of hard coded row references in the code, I thought of using Regex to increment all the row references by 2. So I have developed the following code.
Sub UpdateVBACode()
'*********************Read Text File Containing VBA code and assign content to string variable*************************
Dim str As String
Dim strFile As String: strFile = "F:\Preprocessed_code.txt"
Open strFile For Input As #1
str = Input$(LOF(1), 1)
Close #1
'*********************Split string variables to lines******************************************************************
Dim vStr As Variant: vStr = Split(str, vbCrLf)
'*********************Regex work***************************************************************************************
Dim rex As New RegExp
rex.Global = True
Dim i As Long
Dim mtch As Object
rex.Pattern = "(""\w)([0-9][0-9])("")" ' 3 capturing groups to reconstruct the replacement string
For i = 0 To UBound(vStr, 1)
If rex.Test(vStr(i)) Then
For Each mtch In rex.Execute(vStr(i))
vStr(i) = rex.Replace(vStr(i), mtch.SubMatches(0) & IncrementString(mtch.SubMatches(1)) & mtch.SubMatches(2))
Next
End If
Next i
'********************Reconstruct String*********************************************************************************
str = Join(vStr, vbCrLf)
'********************Write string to text file******************************************************************************
Dim myFile As String
myFile = "F:\Processed_code.txt"
Open myFile For Output As #2
Print #2, str
Close #2
'
End Sub
Function IncrementString(rowNum As String) As String '
Dim num As Integer
num = CInt(rowNum) + 2
IncrementString = CStr(num)
End Function
The above VBA code works, except that it fails when there are two row references on the same line. For instance, if we have If Range("B15").Value <> Range("B12").Value Then, after the line is processed I get If Range("B14").Value <> Range("B14").Value Then instead of If Range("B17").Value <> Range("B14").Value Then. The problem is in the vStr(i) = rex.Replace(vStr(i), mtch.SubMatches(0) & IncrementString(mtch.SubMatches(1)) & mtch.SubMatches(2)) statement: it is called once per match when a line has more than one Regex match, and because rex.Global is True, each call rewrites every match on the line with the same string.
Any ideas? Thanks in advance
I think what you are trying to do is a bad idea, for two reasons:
Hard-coded cell references are almost always poor practice. A better solution may be to replace hard-coded cell references with named ranges. You can refer to them in the code by name, and the associated references will update automatically if you insert/delete rows or columns. You have some painful upfront work to do but the result will be a much more maintainable spreadsheet.
You are effectively trying to write a VBA parser using regexes. This is pretty much guaranteed not to work in all cases. Your current regex will match lots of things that aren't cell references (e.g. "123", "_12", and "A00") and will also miss lots of hard-coded cell references (e.g. "A1" and Cells(3,7)). That may not matter for your particular code, but the only way to be sure it has worked is to check each reference by hand, which is IMHO not much less effort than refactoring (e.g. replacing them with named ranges). In my experience you don't fix a regex, you just make the problems more subtle.
That said, since you asked...
<cthulu>
There are only two choices when using RegExp.Replace() - either replace the first match or replace all matches (corresponding to setting RegExp.Global to False or True respectively). You don't have any finer control than that, so your logic has to change. Instead of using Replace() you could write your own code for the replacements, using the FirstIndex property of the Match object, and VBA's string functions to isolate the relevant parts of the string:
Dim rex As Object
Set rex = CreateObject("VBScript.RegExp")
rex.Global = True
Dim i As Long
Dim mtch As Object
Dim newLineText As String
Dim currMatchIndex As Long, prevPosition As Long
rex.Pattern = "(""\w)([0-9][0-9])("")" ' 3 capturing groups to reconstruct the replacement string
For i = 0 To UBound(vStr, 1)
If rex.Test(vStr(i)) Then
currMatchIndex = 0: prevPosition = 1
newLineText = ""
For Each mtch In rex.Execute(vStr(i))
'Note that VBA string functions are indexed from 1 but Match.FirstIndex starts from 0
currMatchIndex = mtch.FirstIndex
newLineText = newLineText & Mid(vStr(i), prevPosition, currMatchIndex - prevPosition + 1) & _
mtch.SubMatches(0) & IncrementString(mtch.SubMatches(1)) & mtch.SubMatches(2)
prevPosition = currMatchIndex + Len(mtch.Value) + 1
Next
vStr(i) = newLineText & Right(vStr(i), Len(vStr(i)) - prevPosition + 1)
End If
Next i
Note that I still haven't fixed the problems with the regex pattern in the first place. I recommend that you just go and use named ranges instead...
Oops, nearly forgot - </cth
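For comparison, .NET's Regex.Replace is not limited to the two choices described above for RegExp.Replace(): it has an overload taking a MatchEvaluator, which is called once per match, so each reference on a line is shifted from its own captured row number. A minimal VB.NET sketch, assuming you could run a small .NET tool over the exported code file instead of VBA; the module name RowShifter, the helper ShiftRows and the tighter pattern (only Range("<letters><digits>") literals) are my assumptions. It still has the blind spots listed above (Cells(r, c), "A1:B2" and so on), so named ranges remain the better fix.

```vbnet
Imports System.Text.RegularExpressions

Module RowShifter
    ' Assumption: only literal single-cell references written as Range("B15")
    ' are shifted; Cells(r, c), "B7:C9", named ranges etc. are left alone.
    Private ReadOnly CellRef As New Regex("Range\(""(?<col>[A-Z]{1,3})(?<row>\d+)""\)")

    ' Hypothetical helper: adds rowOffset to every matched row number.
    ' The evaluator runs once per match, so two references on one line
    ' are each shifted from their own value.
    Function ShiftRows(codeLine As String, rowOffset As Integer) As String
        Return CellRef.Replace(codeLine,
            Function(m As Match) String.Format("Range(""{0}{1}"")",
                m.Groups("col").Value,
                Integer.Parse(m.Groups("row").Value) + rowOffset))
    End Function
End Module
```

Applied to each element of vStr, this would replace the whole For Each mtch loop from the question.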

VB.Net help selecting first index of string with regex

I was wondering if there is a way to start a selection from the regex match in the example below.
The example below works exactly how I want it to; however, if there is matching text earlier in the box, on another line, it chooses the wrong text and highlights it.
What I'm wondering is: is there a way to get the start index of the regex match?
If Regex.IsMatch(Me.TextBox1.Text, "\b" + Regex.Escape("is") + "\b") Then
Me.TextBox1.SelectionStart = Me.TextBox1.Text.IndexOf("is")
Dim linenumber As Integer = Me.TextBox1.GetLineFromCharIndex(Me.TextBox1.Text.IndexOf("is"))
Me.TextBox1.SelectionLength = Me.TextBox1.Lines(linenumber).Length
Me.TextBox1.Focus()
Me.TextBox1.SelectedText = "is " & Me.TextBox2.Text
End If
The System.Text.RegularExpressions.Match object has properties that should help you here: Match.Index tells you where the match starts, and Match.Length tells you how long it is. Using those, you could change your code to look like this (a single Regex.Match call also replaces the separate IsMatch test):
Dim m As Match = Regex.Match(Me.TextBox1.Text, "\b" + Regex.Escape("is") + "\b")
If m.Success Then
Me.TextBox1.SelectionStart = m.Index
Dim linenumber As Integer = Me.TextBox1.GetLineFromCharIndex(m.Index)
Me.TextBox1.SelectionLength = Me.TextBox1.Lines(linenumber).Length
Me.TextBox1.Focus()
Me.TextBox1.SelectedText = "is " & Me.TextBox2.Text
End If
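The reason the question's code highlights the wrong text is that String.IndexOf("is") ignores the \b word boundaries used in the IsMatch test, so it also finds "is" inside a word such as "This"; Match.Index reports where the whole-word match actually starts. The small helper below (hypothetical names WordFinder and FindWholeWord) isolates that difference so it can be checked without a form.

```vbnet
Imports System.Text.RegularExpressions

Module WordFinder
    ' Hypothetical helper: 0-based index of the first whole-word occurrence
    ' of word (same "\b...\b" pattern as the question), or -1 if none.
    Function FindWholeWord(text As String, word As String) As Integer
        Dim m As Match = Regex.Match(text, "\b" & Regex.Escape(word) & "\b")
        Return If(m.Success, m.Index, -1)
    End Function
End Module
```

For "This line is here", IndexOf("is") returns 2 (inside "This") while FindWholeWord returns 10; in the form, the returned value would go into SelectionStart when it is not -1.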

Get/split text inside brackets/parentheses

I just have a list of words, such as:
gram (g)
kilogram (kg)
pound (lb)
I'm just wondering how I would get the text within the brackets, for example the "g" in "gram (g)", and Dim it as a new String.
Possibly using regex?
Thanks.
Use the Split function:
strArr = str.Split("("c) ' splitting "gram (g)" returns the array {"gram ", "g)"}, indexes 0 and 1
strArr2 = strArr(1).Split(")"c) ' splitting "g)" returns the array {"g", ""}
The string is in
strArr2(0)
Edit
You want getAbbrev and getAbbrev2 to be arrays. Try
Dim getAbbrev As String() = Str.Split("("c)
Dim getAbbrev2 As String() = getAbbrev(1).Split(")"c)
To do it without declaring arrays, you can do
"gram (g)".Split("("c)(1).Split(")"c)(0)
but that's unreadable.
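All of the Split versions throw an IndexOutOfRangeException when a line has no "(" at all, because the array then has only one element. A minimal sketch that avoids the arrays with IndexOf and Substring and returns Nothing when the brackets are missing; the names UnitParser and GetAbbreviation are hypothetical.

```vbnet
Module UnitParser
    ' Hypothetical helper: text between the first "(" and the next ")",
    ' or Nothing when either bracket is missing.
    Function GetAbbreviation(unit As String) As String
        Dim openPos As Integer = unit.IndexOf("("c)
        If openPos < 0 Then Return Nothing
        Dim closePos As Integer = unit.IndexOf(")"c, openPos + 1)
        If closePos < 0 Then Return Nothing
        ' Take only the characters strictly between the brackets.
        Return unit.Substring(openPos + 1, closePos - openPos - 1)
    End Function
End Module
```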
Edit
You have some very trivial errors. I would suggest you strengthen your understanding of objects and declarations first; then you can look into invoking methods. I'd rather you understand it than just give it to you. Re-read the book you have or look for a basic tutorial.
Dim unit As String = 'make sure this is the actual string you are getting, e.g. grams (g); I'm not sure where you are supposed to get the string value from
Dim getAbbrev As String() = unit.Split("("c) 'use unit, not Str - there is no Str variable
Dim getAbbrev2 As String() = getAbbrev(1).Split(")"c) 'write As, not as (VB ignores keyword case, but As is the convention)
For the last line, reference getAbbrev2 instead of the unknown abbrev2.
Fun with Regular Expressions (I'm really not an expert here, but tested and works)
Imports System.Text.RegularExpressions
.....
Dim charsToTrim() As Char = { "("c, ")"c }
Dim test as String = "gram (g)" + Environment.NewLine +
"kilogram (kg)" + Environment.NewLine +
"pound (lb)"
Dim pattern as String = "\([a-zA-Z0-9]*\)"
Dim r As Regex = new Regex(pattern, RegexOptions.IgnoreCase)
Dim m As Match = r.Match(test)
While(m.Success)
System.Diagnostics.Debug.WriteLine("Match" + "=" + m.Value.ToString())
Dim tempText as String = m.Value.ToString().Trim(charsToTrim)
System.Diagnostics.Debug.WriteLine("String Trimmed" + "=" + tempText)
m = m.NextMatch()
End While
You can split at the space and remove the parens from the second token (by replacing them with an empty string).
A regex is also an option, and a very simple one; its pattern is
\w+\s+\((\w+)\)
This means: a word, then at least one whitespace character, then an opening parenthesis, then a word inside real (unescaped) regex parentheses, and finally a closing parenthesis. Those inner parentheses form a capturing group, which makes it possible to refer to the unit (g, kg, lb).
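A minimal VB.NET sketch of that pattern in use: Regex.Matches finds every line item, and Groups(1) holds the text captured by the inner parentheses, so the brackets never need trimming as in the earlier Trim(charsToTrim) example. The names UnitRegex and GetAbbreviations are hypothetical.

```vbnet
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module UnitRegex
    ' The pattern from the answer: a word, whitespace, "(", a captured word, ")".
    Private ReadOnly UnitPattern As New Regex("\w+\s+\((\w+)\)")

    ' Hypothetical helper: collects group 1 (the text inside the
    ' parentheses) from every match in the input.
    Function GetAbbreviations(text As String) As List(Of String)
        Dim result As New List(Of String)
        For Each m As Match In UnitPattern.Matches(text)
            result.Add(m.Groups(1).Value)
        Next
        Return result
    End Function
End Module
```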