RegEx optional group matching - regex

I'm trying to figure out how to create a regex that would encompass both of the following lines:
02-09-16 08:57PM 24768 Invoice - Copy.docx
05-14-16 08:49PM <DIR> Bin
Both come from a directory listing. The first is a file, so it includes the file size. The second is a directory, which has no size but shows the type <DIR>.
This captures all of the data into named groups, but the first line's size is captured into the Type group:
(?<Date>\S+)\s+(?<Time>\S+)\s+(?<Type>\S+)\s+(?<Name>.+)
If possible, I'd like to end up with both a Type and a Size group. I'm not sure how to look for both at the same time while ignoring whichever one is absent.
Update: Based on Wiktor's response I've updated the regex and gotten closer:
(?<Date>\S+)\s+(?<Time>\S+)\s+(?:(?<Type>\S+)|\d+)\s+(?<Name>.+)
Using this I can easily parse both lines. However, the first line's 24768 still ends up in the Type group. Is it possible to have both a Type and a separate Size group? The logic would be something like: if you run into characters ('<DIR>', for example), that is the Type; if you run into digits (24768), that is the Size.

Just group the Size and Type captures into a non-capturing alternation group, with the Size alternative first so that digits are tried before the catch-all \S+:
^(?<Date>\S+)\s+(?<Time>\S+)\s+(?:(?<Size>\d+)|(?<Type>\S+))\s+(?<Name>.+)$
The Size group picks up the digits when they are present; otherwise the Type group matches and Size is left unmatched.
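The alternation above can be tried out with a minimal sketch. It is written in Python rather than .NET, so the named groups use Python's (?P<Name>...) spelling; the function name parse_listing_line is made up for illustration, and the two sample lines are the ones from the question.

```python
import re

# Same pattern as above, with Python's (?P<name>...) syntax for named groups.
# The Size alternative comes first so that a run of digits is preferred
# over the catch-all \S+ used for Type.
LISTING = re.compile(
    r"^(?P<Date>\S+)\s+(?P<Time>\S+)\s+"
    r"(?:(?P<Size>\d+)|(?P<Type>\S+))\s+"
    r"(?P<Name>.+)$"
)

def parse_listing_line(line):
    """Return the named groups as a dict, or None if the line does not match."""
    match = LISTING.match(line)
    return match.groupdict() if match else None

file_row = parse_listing_line("02-09-16 08:57PM 24768 Invoice - Copy.docx")
dir_row = parse_listing_line("05-14-16 08:49PM <DIR> Bin")

print(file_row)  # Size is set, Type is None
print(dir_row)   # Type is set, Size is None
```

Whichever alternative does not take part in the match comes back as None in Python, so a caller can test for the presence of Size to tell files from directories.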

Related

Can regex be used to find this pattern?

I need to parse a large amount of data in a log file. Ideally I can do this by splitting the file into a list where each element is an individual entry in the log.
Every time a log entry is made it is prefixed with a string following this pattern:
"4404: 21:42:07.433 - After this point there could be anything (including new line characters and such). However, as soon as the prefix repeats that indicates a new log entry."
4404 Can be any number, but is always then followed by a :.
21:42:07.433 is 21 hours, 42 minutes, 7 seconds, 433 milliseconds.
I don't know much about regex, but is it possible to identify this pattern using it?
I figured something like this would work...
"*: [0-24]:[0:60]:[0:60].[0-1000] - *"
However, it just throws an exception and I fear I'm not on the right track at all.
List<string> split_content = Regex.Matches(file_content, @"*: [0-24]:[0:60]:[0:60].[0-1000] - *").Cast<Match>().Select(m => m.Value).ToList();
The following expression would split a string according to your pattern:
\d+: \d{2}:\d{2}:\d{2}\.\d{3}
Add a ^ at the beginning if the delimiting prefix always starts a line (and use the m flag, RegexOptions.Multiline in .NET, so that ^ matches at the start of every line). Capturing the log chunks with a regex would be more elaborate; I'd suggest just splitting (with Regex.Split) if you have the whole log content in memory at once.
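A minimal sketch of the splitting idea, in Python rather than C#. It is a variation on the expression above, not the same thing: the prefix is wrapped in a lookahead so that each chunk keeps its own prefix instead of having it consumed by the split. Splitting on a zero-width match needs Python 3.7 or later. The name split_entries and the sample text are made up for illustration.

```python
import re

# Zero-width split point: a line start that is followed by "<number>: HH:MM:SS.mmm - ".
# re.MULTILINE makes ^ match at the start of every line, not just the first.
PREFIX = re.compile(r"^(?=\d+: \d{2}:\d{2}:\d{2}\.\d{3} - )", re.MULTILINE)

def split_entries(log_text):
    """Split the log into entries, each one still starting with its prefix."""
    # The split yields an empty first chunk when the text begins with a prefix.
    return [chunk for chunk in PREFIX.split(log_text) if chunk]

sample = (
    "4404: 21:42:07.433 - first entry\n"
    "continues on a second line\n"
    "17: 21:42:08.001 - second entry\n"
)

entries = split_entries(sample)
for entry in entries:
    print(repr(entry))
```

The continuation line stays attached to the first entry because it does not start with the prefix.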

Getting Beyond Compare to Match Similar Lines Properly

I am using Beyond Compare 4.1.6 to diff text configuration files. There is one configuration parameter per line, and each line is formatted as follows:
:=
I would like to configure Beyond Compare such that it will align only lines when the : portion of the line is exactly the same in both files. Put differently, everything from the beginning of the line up to and including the colon must match exactly for the two lines to be aligned. Note that a colon cannot occur in , so the colon I want Beyond Compare to base its alignment decision on will always be the first colon in the line.
An example is:
# FILE 1
abcdefgh:string=5
# FILE 2
abcdefkh:string=5
Beyond Compare aligns these two lines even though I don't want it to.
I've been unable to coerce Beyond Compare to compare lines as desired by editing its grammar rules or by tweaking other features.
How may I get Beyond Compare to match lines as described above?
Thank you!
You can compare the files with a Table Compare.
To do so, set = as the field separator.
Once you have done this, you have two columns, and the first is the key column (if it is not, you can define it as the key).
After this you get the result you want (if I understood your question correctly).
If you need this often, you can store the settings in a file format.

How to format a WinMerge filter to ignore part of the line

I would like WinMerge to compare the full text but exclude a variable substring.
Orientation="West" PhysicalAddress="2395226" DefFieldFrmt="Uf4d0" UnitCustomText="sec"
Orientation="West" PhysicalAddress="2395230" DefFieldFrmt="Uf4d1" UnitCustomText="sec"
In the lines above I want to ignore the PhysicalAddress="xxx" attribute and locate the changed DefFieldFrmt="Uf4d1".
I have tried adding the filter:
PhysicalAddress=".*"
However this filters the complete line.
The actual text before and after the PhysicalAddress="xxx" will vary so I need a filter that says: match prefix and match suffix but ignore target variable substring.
Help please.
According to the documentation, it is not possible to use line filters for this:
When a rule matches any part of the line, the entire difference is ignored. Therefore, you cannot filter just part of a line.
However, since WinMerge's source code is on GitHub, it is possible to add a feature request for this to its list of issues.
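One possible workaround outside WinMerge, sketched here under the assumption that preprocessing the text before comparing is acceptable: blank out the variable attribute value in both inputs, then compare what is left. This is not a WinMerge feature; the helper name mask_address is made up, and the sample lines are the two from the question.

```python
import re

# Matches only the PhysicalAddress attribute; [^"]* stops at the closing quote
# so the rest of the line is left untouched.
ADDRESS = re.compile(r'PhysicalAddress="[^"]*"')

def mask_address(line):
    """Replace the attribute value with an empty one so it no longer differs."""
    return ADDRESS.sub('PhysicalAddress=""', line)

left = 'Orientation="West" PhysicalAddress="2395226" DefFieldFrmt="Uf4d0" UnitCustomText="sec"'
right = 'Orientation="West" PhysicalAddress="2395230" DefFieldFrmt="Uf4d1" UnitCustomText="sec"'

masked_left = mask_address(left)
masked_right = mask_address(right)

# The address no longer contributes to the comparison; DefFieldFrmt still does.
lines_differ = masked_left != masked_right
print(masked_left)
print(masked_right)
print(lines_differ)
```

Writing the masked lines to temporary files would let any diff tool show only the remaining differences.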

Finding matches after a specific line in Perl/Notepad++

My problem is that I have a document that is split into sections. Each section is marked by a single-line header ([Header1], [Header2], etc.) and contains various types of data sets on individual lines, where each line begins with a label indicating what type of data follows, like this:
[Header1]
data_label_type1 = 1,2,3
data_label_type2 = 1,2,3,4
data_label_type1 = 1,2,3,4,5
data_label_type3 = 1,2
Note the headers/sections are out of order, so Header1 doesn't always start a document and Header2 won't always follow.
A bit off topic, but the data sets are results from an experiment I'm maintaining for a thesis.
I want to be able to capture type 1 data found only in the first section (under Header1) using a single regex function. After capturing it I was going to use replace and another function to convert the captured data to a different form.
Initially I was using the regex type1\h*=\h*([[:graph:]]*) but this only goes line by line, and I've got hundreds of documents, with potentially tens of thousands of individual lines to catch.
I can use regex to convert my data well enough, but my problem is that I have no idea how to capture type 1 data from Header1 exclusively. Any help, tips or pointers to start some experimenting would be really appreciated!
Regex is apparently not capable of providing a solution here; I will use an alternative such as a parser instead.
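The parser route mentioned above could look roughly like this in Python. It is only a sketch under assumptions taken from the example in the question: headers are lines of the form [Name], data lines are label = comma-separated values, and the label for type 1 data ends in type1. The function name collect and the sample text are made up for illustration.

```python
import re

HEADER = re.compile(r"^\[(?P<name>[^\]]+)\]\s*$")
DATA = re.compile(r"^(?P<label>\S+)\s*=\s*(?P<values>\S*)\s*$")

def collect(text, section, label_suffix):
    """Collect value lists for matching labels, but only inside one section."""
    current = None  # name of the section the loop is currently inside
    found = []
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group("name")
            continue
        data = DATA.match(line)
        if data and current == section and data.group("label").endswith(label_suffix):
            found.append(data.group("values").split(","))
    return found

# Sections deliberately out of order, as described in the question.
sample = """[Header2]
data_label_type1 = 9,9
[Header1]
data_label_type1 = 1,2,3
data_label_type2 = 1,2,3,4
data_label_type1 = 1,2,3,4,5
data_label_type3 = 1,2
"""

type1_rows = collect(sample, "Header1", "type1")
print(type1_rows)
```

Tracking the current section in a variable is what a single line-by-line regex cannot do, and it works regardless of the order in which the headers appear.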

TFileListBox Mask Issue

The definition for TFileListBox.Mask is:
Set Mask to a regular expression to limit the list box to files that
match the mask. The value of the mask is a file name that may include
wildcards. The asterisk (*) is a wildcard which matches any number of
arbitrary characters. The question mark (?) is a wildcard which
matches a single arbitrary character. The file mask *.* displays all
files, which is the default value.
The * wildcard works fine. However, the ? wildcard does not seem to work. I am trying to filter data files that have 14 digits. Examples would be:
012345678909090.dat
012345678900123.dat
012345678901234.dat
012345678901235.dat
012345678901236.dat
If you were to set the mask to *.23?.dat the last four data files are returned. However, the second data file (012345678900123.dat) should not be returned if the ? wildcard is doing its job.
By the way, this "problem" occurs in Raize Components TRzFileListBox and I imagine all others that derive from TFileListBox, too.
Any help with this?
Thanks in advance.
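For comparison, here is what the strict reading of the quoted definition (? matches exactly one arbitrary character) gives for the five file names. The sketch uses Python's fnmatch module, which is not the matcher TFileListBox uses, and the mask *123?.dat is a hypothetical one chosen for illustration, not necessarily the mask from the question.

```python
from fnmatch import fnmatchcase

names = [
    "012345678909090.dat",
    "012345678900123.dat",
    "012345678901234.dat",
    "012345678901235.dat",
    "012345678901236.dat",
]

# Hypothetical mask: "123", then exactly one more character, then ".dat".
mask = "*123?.dat"

# fnmatchcase treats ? as exactly one character and compares case-sensitively.
matches = [name for name in names if fnmatchcase(name, mask)]
for name in matches:
    print(name)
```

Under this reading only the last three names match; 012345678900123.dat is excluded because nothing sits between 123 and the extension.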