.txt filename iteration vb.net - regex

I've got a little problem iterating over the filenames of my .txt files. The filename format goes like this: <date>-<year>_filename-<number>.txt. The problem is that once <number> reaches 9, the numbering stops advancing.
The filenames go like this:
31-2014_filename-1
31-2014_filename-2
31-2014_filename-3
31-2014_filename-4
31-2014_filename-5
31-2014_filename-6
31-2014_filename-7
31-2014_filename-8
31-2014_filename-9
31-2014_filename-10
The function only detects up to 9. Anything beyond that number is ignored.
Below is the code
Dim lastreport As Integer = 1
Public Sub GetLastNo(ByVal filePath As String)
    Dim lastFile As String = 1
    Dim files() As String = Directory.GetFiles(filePath, "*.txt")
    For Each File As String In files
        File = Path.GetFileNameWithoutExtension(File)
        Dim numbers As MatchCollection = Regex.Matches(File, "(?<num>[\d]+)")
        For Each number In numbers
            number = CInt(number.ToString())
            If number > 0 And number < 1000 And number > lastFile Then
                lastFile = number
            End If
            lastreport = number
        Next
    Next
End Sub

Here it is:
(?<num>\d+(?=$))
The lookahead (?=$) makes sure that the digits are immediately followed by $ (the end of the string), so the match is always the last run of digits in the filename.

It would really help to see some real filenames, including some that fail to match (your description is not completely unambiguous: for example what is <date> if it does not include the year?).
But assuming files like:
30May-2014_Stuff-1.txt
30May-2014_Stuff-3.txt
30May-2014_Stuff-5.txt
30May-2014_Stuff-7.txt
30May-2014_Stuff-9.txt
30May-2014_Stuff-11.txt
then using the .NET regex engine (from PowerShell (PSH) here as quicker to test with):
(?<num>\d+)$
should match the final digits ($ matches the end of the string) of the filename without its extension (BaseName in PSH):
dir | foreach { if ($_.BaseName -match '(?<num>\d+)$') { $matches['num'] } }
gives:
1
11
3
5
7
9
So all filenames are matched, and the final number of their basenames is matched by group "num" of the regex.
I think there is something else going on in your approach: I would suggest changing to only get a single match per filename (and use Regex.Match rather than Matches to be consistent).
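A minimal VB.NET sketch of that suggestion, assuming the report number is always the final run of digits in the base name. GetLastNumber and LastReportSketch are hypothetical names, not part of the original code. It takes one match per file and keeps the largest value instead of whatever value the last enumerated file happens to carry.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports System.Text.RegularExpressions

Module LastReportSketch
    ' Hypothetical helper: returns the highest trailing number among the
    ' given file names, or 0 if none of them ends in digits.
    Function GetLastNumber(fileNames As IEnumerable(Of String)) As Integer
        Dim trailingDigits As New Regex("(?<num>\d+)$")
        Dim highest As Integer = 0
        For Each fileName As String In fileNames
            Dim baseName As String = Path.GetFileNameWithoutExtension(fileName)
            ' One match per file: only the digits at the very end count.
            Dim m As Match = trailingDigits.Match(baseName)
            If m.Success Then
                Dim value As Integer
                ' TryParse guards against digit runs that do not fit an Integer.
                If Integer.TryParse(m.Groups("num").Value, value) AndAlso value > highest Then
                    highest = value
                End If
            End If
        Next
        Return highest
    End Function
End Module
```

It could be called as GetLastNumber(Directory.GetFiles(filePath, "*.txt")). One likely reason the original code appeared to stop at 9: lastreport is overwritten for every file, and when the files are enumerated in name order, filename-10 sorts before filename-2, so filename-9 is the last one processed. Taking the maximum avoids depending on the enumeration order.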

Related

Optional parts of regex pattern in vba

I am trying to build a regex pattern for text like this:
numb contra: 1.29151306 number mafo: 66662308
numb contra 1.30789668number mafo 60.046483
numb contra/ 1.29154056 number mafo: 666692638
numb contra 137459625
mafo: 666692638
mafo: 666692638 numb contra/ 1.29154056
Here's the pattern I could build
contra?.\s+?(\d+\.?\d+)(.+mafo.?\s+(\d+\.?\d+))?
It works fine for all the lines except the last one. How can I implement all the possibilities to include the last line too?
Please have a look at this link
https://regex101.com/r/pSThAU/1
contra is matched fine, but mafo is not.
I think the key here is to make your regexp do less and your vba do more. What I think I see here is either the word 'mafo' or 'contra' and a number following. Don't know what order or whether each is present or how many times. So you can scan each of your strings for ALL occurrences with a regexp like this:
(?:^|[^A-Z])(?:(mafo)|(contra))[^A-Z]\s*(\d*\.?\d+)
Then process it with some VBA code like this that I created in Excel:
Sub BreakItUp()
    Dim rg As RegExp, scanned As MatchCollection, eachMatch As Match, i As Long, col As Long
    Set rg = New RegExp
    rg.Pattern = "(?:^|[^A-Z])(?:(mafo)|(contra))[^A-Z]\s*(\d*\.?\d+)"
    rg.IgnoreCase = True
    rg.Global = True
    i = 1
    Do While (Not IsEmpty(ActiveSheet.Cells(i, 1).Value))
        Set scanned = rg.Execute(ActiveSheet.Cells(i, 1).Value)
        col = 2
        For Each eachMatch In scanned
            ActiveSheet.Cells(i, col).Value = eachMatch.SubMatches(0) & eachMatch.SubMatches(1)
            ActiveSheet.Cells(i, col + 1).Value2 = "'" & eachMatch.SubMatches(2)
            col = col + 2
        Next eachMatch
        i = i + 1
    Loop
End Sub
That MatchCollection object gets one item for each match that occurs, and the SubMatches array contains each capturing group. You should be able to write your own logic within this processing loop to interpret what was extracted. When I ran it on your data it created all the fields in blue:
Notice I added a line to your data that had two contra entries and one mafo and it found all the occurrences. You should be able to modify this to interpret the meanings.
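For comparison, the same "scan for every occurrence" idea can be sketched in VB.NET with Regex.Matches. This is an illustration only: ScanLine and ScanSketch are hypothetical names, the two label groups are merged into a single (mafo|contra) group, and [^A-Za-z] is written out explicitly.

```vbnet
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module ScanSketch
    ' Hypothetical helper: returns every (label, number) pair found in one line,
    ' in the order the pairs appear.
    Function ScanLine(line As String) As List(Of KeyValuePair(Of String, String))
        ' [^A-Za-z] is spelled out so the class does not depend on how
        ' IgnoreCase treats a negated range.
        Dim rx As New Regex("(?:^|[^A-Za-z])(mafo|contra)[^A-Za-z]\s*(\d*\.?\d+)", RegexOptions.IgnoreCase)
        Dim found As New List(Of KeyValuePair(Of String, String))
        For Each m As Match In rx.Matches(line)
            ' Group 1 is the label, group 2 is the number that follows it.
            found.Add(New KeyValuePair(Of String, String)(m.Groups(1).Value.ToLowerInvariant(), m.Groups(2).Value))
        Next
        Return found
    End Function
End Module
```

Because each pair carries its own label, the caller no longer cares whether mafo or contra comes first, or whether one of them is missing.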

Parsing a time string using regex and converting to integer

I want to allow users to input a time in a masked text box and then validate that time and convert it if necessary for later saving.
I've tried a method of validating the time using only regex but honestly could not find very detailed answers. I decided to simply separate the string the user inputs into its base components and then convert the chunks of time into integers for easy comparison.
Public Function CreateTimeString(TheTime As String, TheSuffix As String) As String 'wip
    Dim Hour As String = "00"
    Dim Minute As String = "00"
    Dim inthour As Integer
    Dim intminute As Integer
    Dim pattern As String = "(?<hour>\d*?):(?<minute>\d*?)"
    For Each m As Match In Regex.Matches(TheTime, pattern)
        Hour = m.Groups("hour").Value
        Minute = m.Groups("minute").Value
    Next
    inthour = Convert.ToInt32(Hour)
    intminute = Convert.ToInt32(Minute)
    TxtMeals.Text = Hour & ":" & Minute
End Function
An error occurs when attempting to convert Minute string into an integer. Commenting this out and testing shows the Hour has been successfully converted. It appears that Minute cannot be found.
Example strings:
12:12
1:23
4:55
10:45
Also, if I change pattern by adding a space just before the last quotation mark neither are found and I would like to know why.
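m.Groups("minute").Value comes back empty because the minutes part uses the non-greedy quantifier \d*? and nothing follows it in the pattern (there is no end boundary such as $), so it matches as few digits as possible, which is zero. Converting that empty string with Convert.ToInt32 then throws. The hour part works because the : after it forces \d*? to expand until the colon is reached. As for the trailing space: with a space at the end of the pattern, the regex requires a literal space after the minutes. The example strings contain no such space, so nothing matches at all and neither group is filled.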
You could use:
(?<hour>\d+):(?<minute>\d+)
You might use a more precise match, for example for 12 hour time use:
(?<hour>1[0-2]|0?[1-9]):(?<minute>[0-5][0-9])
Or 24h time:
(?<hour>[01]?[0-9]|2[0-3]):(?<minute>[0-5][0-9])
You might opt to use anchors ^ and $ to assert the start and the end of the string.
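Putting the anchored 24-hour pattern to work might look like the following sketch. TryParseTime and TimeParseSketch are hypothetical names introduced for illustration; the function only parses and validates, and leaves the suffix handling and the TxtMeals update from the question to the caller.

```vbnet
Imports System.Text.RegularExpressions

Module TimeParseSketch
    ' Hypothetical helper: parses "h:mm" or "hh:mm" (24-hour) into integers.
    ' Returns False, leaving hour and minute untouched, when the text is not a valid time.
    Function TryParseTime(text As String, ByRef hour As Integer, ByRef minute As Integer) As Boolean
        ' ^ and $ make sure the whole input is a time, not just part of it.
        Dim m As Match = Regex.Match(text, "^(?<hour>[01]?[0-9]|2[0-3]):(?<minute>[0-5][0-9])$")
        If Not m.Success Then Return False
        ' Both groups are guaranteed to hold one or two ASCII digits here.
        hour = Integer.Parse(m.Groups("hour").Value)
        minute = Integer.Parse(m.Groups("minute").Value)
        Return True
    End Function
End Module
```

Checking m.Success before converting avoids the exception from the question, since the conversion only runs on text the pattern has already accepted.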

Regex Matching and Deleting/Replacing a string

So I am trying to parse through a file which has multiple "footers" (the file is an output that was designed for printing, which my company wants to keep stored electronically; each footer marks a new page, and the page breaks are no longer needed).
I am trying to look for and remove lines that look like:
1 of 2122 PRINTED 07/01/2013 04:46 Page : 1 of 11
2 of 2122 PRINTED 07/01/2013 04:46 Page: 2 of 11
3 of 2122 PRINTED 07/01/2013 04:46 Page: 3 of 11
and so on
I then want to replace the final line (which would read something like "2122 of 2122") with a "custom" footer.
I am using RegEx, but am very new to using it so how should my RegEx look in order to accomplish this? I plan on using the RegEx "count" function to find out when I've found the last line and then do a .replace on it.
I am using VB .NET, but can translate C# if required. How can I accomplish what I'm looking to do? Specifically I only care about matching/removing of a match so long as the # of matches > 1.
Here's one I created with RegExr:
/^(\d+\s+of\s+\d+)(?=\s+printed)/gim
It matches (number)(space)('of')(space)(number) at the beginning of a line, and only if it is followed by (space)('printed'), case insensitive. The /m flag turns ^ and $ into line-aware boundaries.
This is how I ended up doing it...
Private Function FixFooters(ByVal fileInput As String, Optional ByVal numberToLeaveAlone As Integer = 1) As String
    Dim matchpattern As String = "^\d+\W+of\W+\d+\W+PRINTED.*$"
    Dim myRegEx As New Regex(matchpattern, RegexOptions.IgnoreCase Or RegexOptions.Multiline)
    Dim replacementstring As String = String.Empty
    Dim matchCounter As Integer = myRegEx.Matches(fileInput).Count
    If numberToLeaveAlone > matchCounter Then numberToLeaveAlone = matchCounter
    Return myRegEx.Replace(fileInput, replacementstring, matchCounter - numberToLeaveAlone, 0)
End Function
I used myregextester.com to get the initial matchpattern. Since I wanted to leave the last footer alone (to manipulate it further later on), I created the numberToLeaveAlone variable to ensure we don't remove ALL of the footers. For the purposes of this program I made the default value 1, but that could be changed to zero (I only did it for readability in the calling code, as I know I will ALWAYS want to leave one, but I do like to reuse code). It's fairly fast. I'm sure there are better ways out there, but this one made the most sense to me.
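If the last footer should be swapped for a custom one in the same pass, one possible variation is to use a MatchEvaluator that counts matches. This is a sketch under assumptions: ReplaceFooters, FooterSketch and the customFooter parameter are hypothetical names, and the pattern is the one from the function above.

```vbnet
Imports System.Text.RegularExpressions

Module FooterSketch
    ' Hypothetical helper: blanks every footer line except the last one,
    ' which is replaced by customFooter.
    Function ReplaceFooters(input As String, customFooter As String) As String
        Dim rx As New Regex("^\d+\W+of\W+\d+\W+PRINTED.*$", RegexOptions.IgnoreCase Or RegexOptions.Multiline)
        Dim total As Integer = rx.Matches(input).Count
        Dim seen As Integer = 0
        ' The evaluator runs once per match, in order, so a counter is enough
        ' to recognise the final footer.
        Return rx.Replace(input, Function(m As Match)
                                     seen += 1
                                     Return If(seen = total, customFooter, String.Empty)
                                 End Function)
    End Function
End Module
```

The text returned by the evaluator is inserted literally, so a customFooter containing $ is not interpreted as a substitution. As with the original function, only the footer text is removed, so each removed footer leaves an empty line behind.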

Check if line matches regex

I have a file that has been generated by a server - I have no control over how this file is generated or formatted. I need to check each line begins with a string of set length (in this case 21 numerical chars). If a line doesn't match that condition, I need to join it to the previous line and, after reading and correcting the whole file, save it. I am doing this for a lot of files in a directory.
So far I have:
Dim rgx As New Regex("^[0-9]{21}$")
Dim linesList As New List(Of String)(File.ReadAllLines(finfo.FullName))
If linesList(0).Contains("BlackBerry Messenger") Then
    linesList.RemoveAt(0)
    For i As Integer = 0 To linesList.Count
        If Not rgx.IsMatch(i.ToString) Then
            linesList.Concat(linesList(i-1))
        End If
    Next
End If
File.WriteAllLines(finfo.FullName, linesList.ToArray())
There's a for statement before and after that code block to loop over all files in the source directory, which works fine.
Hope this isn't too bad to read :/
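Your version fails at the concatenation step: rgx.IsMatch(i.ToString) tests the loop counter rather than the line itself, linesList.Concat does not modify the list, and the $ in the pattern means only lines consisting of exactly 21 digits would match. Here's a different approach: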
' No $ here: the line only has to START with 21 digits
Dim rgx As New Regex("^[0-9]{21}")
Dim linesList As New List(Of String)(File.ReadAllLines(finfo.FullName))
' We will create a new list to store the new lines data
Dim newLinesList As New List(Of String)()
If linesList(0).Contains("BlackBerry Messenger") Then
    ' Start at 1 to skip the header line
    Dim i As Integer = 1
    Dim newLine As String
    While i < linesList.Count
        newLine = linesList(i)
        i += 1
        ' Keep going until the "real" line is over
        While i < linesList.Count AndAlso Not rgx.IsMatch(linesList(i))
            newLine += linesList(i)
            i += 1
        End While
        newLinesList.Add(newLine)
    End While
    ' Write only when the file was processed, so a file without the
    ' header is not overwritten with an empty list
    File.WriteAllLines(finfo.FullName, newLinesList.ToArray())
End If

Keypress ISSUE VB.NET

I have spent many hours trying to solve this problem, without success.
All I need is to validate a textbox:
Valid strings:
10%
0%
1111111.12%
15.2%
10
2.3
Invalid strings:
.%
12.%
.02%
%
123456789123.123
I need to validate the textbox against these valid strings, in the KeyPress event.
I tried:
Private Sub prices_KeyPress(ByVal sender As Object, ByVal e As System.Windows.Forms.KeyPressEventArgs) Handles wholeprice_input_new_item.KeyPress, dozenprice_input_new_item.KeyPress, _
        detailprice_input_new_item.KeyPress, costprice_input_new_item.KeyPress
    Dim TxtB As TextBox = CType(sender, TextBox)
    Dim fullText As String = TxtB.Text & e.KeyChar
    Dim rex As Regex = New Regex("^[0-9]{1,9}([\.][0-9]{1,2})?[\%]?$ ")
    If (Char.IsDigit(e.KeyChar) Or e.KeyChar.ToString() = "." Or e.KeyChar = CChar(ChrW(Keys.Back))) Then
        If (fullText.Trim() <> "") Then
            If (rex.IsMatch(fullText) = False And e.KeyChar <> CChar(ChrW(Keys.Back))) Then
                e.Handled = True
                MessageBox.Show("You are Not Allowed To Enter More then 2 Decimal!!")
            End If
        End If
    Else
        e.Handled = True
    End If
End Sub
NOTE: The regex has to validate (Maximum 2 decimal places, and 9 integers) with an optional percent symbol.
Please help, I feel so frustrated trying to solve the problem without success
I think that you almost had the right answer. When I run your regex against the samples you supplied, they all fail. But if I remove the extra space at the end of the regex I get the expected successes and failures.
So currently your regex looks like this:
Dim rex As Regex = New Regex("^[0-9]{1,9}([\.][0-9]{1,2})?[\%]?$ ")
and it should look like
Dim rex As Regex = New Regex("^[0-9]{1,9}([\.][0-9]{1,2})?[\%]?$")
EDIT:
Ok I understand the issue more. The problem with the regex is that it will only allow a period if it is followed by one or two numbers. That works fine if you are evaluating the textbox value after someone has finished typing. But in your code, you are evaluating for each keypress, so you don't have a chance to type a number after the "."
I can see two possible solutions:
1. Change the regex to allow "1." as a valid entry.
2. Change when you evaluate the regex, perhaps trying to figure out a way to only evaluate the regex when the person has paused typing.
If you went with option 1, then we need to tweak the regex to something like this
"^[0-9]{1,9}((\.)|(\.[0-9]{1,2}(%)?)|(%))?$"
I changed the regex so that it accepts three optional endings: (\.) allows the string to end in a period, (\.[0-9]{1,2}(%)?) allows it to end in a period followed by one or two digits and an optional percent sign, and (%) allows it to end in a percent sign. I split the ending into three options because I didn't want something like 12.% to be valid. For this to work you also need to add the percent sign to your first If statement:
If (Char.IsDigit(e.KeyChar) Or e.KeyChar.ToString() = "." Or e.KeyChar.ToString() = "%" Or e.KeyChar = CChar(ChrW(Keys.Back))) Then
so that the regex runs when someone types the percent sign.
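One caveat with option 1 is that the relaxed pattern also accepts an unfinished value such as "12." as the final text. A possible way around that, sketched below under assumptions, is to keep two patterns: the relaxed one for each keypress and the strict one for a final check (for example when the textbox loses focus). The module and function names are hypothetical.

```vbnet
Imports System.Text.RegularExpressions

Module PriceInputSketch
    ' Accepts text that is valid so far, including a trailing "." mid-typing.
    Private ReadOnly WhileTyping As New Regex("^[0-9]{1,9}((\.)|(\.[0-9]{1,2}(%)?)|(%))?$")
    ' Accepts only finished values: up to 9 digits, optional 1-2 decimals, optional %.
    Private ReadOnly Finished As New Regex("^[0-9]{1,9}(\.[0-9]{1,2})?%?$")

    ' Intended for the KeyPress handler: is the text acceptable as a prefix of a valid value?
    Function IsAcceptableWhileTyping(text As String) As Boolean
        Return WhileTyping.IsMatch(text)
    End Function

    ' Intended for a final check once the user has finished typing.
    Function IsValidWhenFinished(text As String) As Boolean
        Return Finished.IsMatch(text)
    End Function
End Module
```

With this split, "12." passes IsAcceptableWhileTyping but fails IsValidWhenFinished, while "12.%" and ".02%" fail both.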