TCL: What should the regex for every line look like?

In Tcl, my output looks something like this:
ABBAA 1 BAABA 1 DNS3 0 0 200 300 400 500 0 0
ABBAA 1 BAABA 1 DNS1 0 0 200 300 400 500 0 0
ABBAA 1 BAABA 1 DNS7 0 0 200 300 400 500 0 0
ABBAB 1 BAABB 1 DNS5 0 0 200 300 400 500 0 0
ABBAB 1 BAABB 1 DNS3 0 0 200 300 400 500 0 0
I would like to sort this table-like dataset by the DNS column in ascending order (so the first row will be the one with DNS1UP1, then DNS2UP2, etc.). I figured that regexp would be the easiest method, by looking for strings with "DNS.." in them. But my method doesn't work exactly as I expected, because it matches only one line or no line at all.
My method:
regexp "ABB.*DNS1.*?\n"
ABB - matches the beginning of a new line
.* - any characters between ABB and DNS..
DNS1 - the word I am actually looking for
.* - any characters between DNS.. and the newline symbol
?\n - non-greedy occurrence of the newline
Where am I wrong?

If you have a list of lines in such a regular format, you can just lsort them… with the right options. In particular, -dictionary is good for mixed text/numbers and -index 4 lets you choose the column to sort by (the index is zero-based, so 4 is the DNS field).
set sortedLines [lsort -index 4 -dictionary $unsortedLines]
The only possible reasonable use of regexp in this would have been in preparing the data for the sort, but that string which you provided is already sortable (assuming you've done a split $data "\n" on it to actually convert it into a list of lines and are not just using a big ol' string).
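For readers who want to see the same idea outside Tcl, here is a rough Python sketch of "split into lines, then sort on the field at index 4". It is only an analogue of the lsort call above, not a replacement for it; the natural_key helper is a hypothetical name and only approximates what -dictionary does for values like DNS3.

```python
import re

data = """ABBAA 1 BAABA 1 DNS3 0 0 200 300 400 500 0 0
ABBAA 1 BAABA 1 DNS1 0 0 200 300 400 500 0 0
ABBAA 1 BAABA 1 DNS7 0 0 200 300 400 500 0 0
ABBAB 1 BAABB 1 DNS5 0 0 200 300 400 500 0 0
ABBAB 1 BAABB 1 DNS3 0 0 200 300 400 500 0 0"""


def natural_key(value):
    # Hypothetical helper: split "DNS10" into ["dns", 10, ""] so that
    # embedded numbers compare numerically rather than as text.
    parts = re.split(r"(\d+)", value)
    return [int(p) if p.isdigit() else p.lower() for p in parts]


# Equivalent of `split $data "\n"`: one list element per line.
lines = data.split("\n")

# Sort on the whitespace-separated field at zero-based index 4.
sorted_lines = sorted(lines, key=lambda line: natural_key(line.split()[4]))

for line in sorted_lines:
    print(line)
```

Python's sort is stable, so the two DNS3 rows keep their original relative order.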

Related

format text using regex

I have the following text
1
0
0
0
0
0
ASET LANCAR
Neraca
1
1
0
0
0
0
KAS DAN SETARA KAS
Neraca
1
1
1
0
0
0
Kas
 
Buku Besar
using regex how can I turn that text into like:
100000,ASET LANCAR,Neraca
110000,KAS DAN SETARA KAS,Neraca
111000,Kas,Buku Besar
In other words, I want to turn the original string into comma-separated values (CSV). Honestly, I have no idea what the regex should look like.
You will need at least two steps to achieve this.
First, replace (?<=\d)\R(?=\d)|(\s){2,} with \1 and you will get the following text:
100000
ASET LANCAR
Neraca
110000
KAS DAN SETARA KAS
Neraca
111000
Kas
Buku Besar
Once you have this text, you can use the regex (?<=\w)\R(?=[a-zA-Z]) and replace each match with a comma, which gives you the desired text:
100000,ASET LANCAR,Neraca
110000,KAS DAN SETARA KAS,Neraca
111000,Kas,Buku Besar
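The two replacements can be tried out with a short Python sketch. One assumption to note: Python's re module has no \R, so \r?\n stands in for it here; everything else follows the two patterns given in this answer.

```python
import re

raw = (
    "1\n0\n0\n0\n0\n0\nASET LANCAR\nNeraca\n"
    "1\n1\n0\n0\n0\n0\nKAS DAN SETARA KAS\nNeraca\n"
    "1\n1\n1\n0\n0\n0\nKas\n \nBuku Besar"
)

# Step 1: drop line breaks between digits, and collapse runs of two or
# more whitespace characters into the last one captured by group 1.
# When the first alternative matches, group 1 is unset and \1 expands
# to an empty string (Python 3.5+).
step1 = re.sub(r"(?<=\d)\r?\n(?=\d)|(\s){2,}", r"\1", raw)

# Step 2: a line break followed by a letter becomes a comma.
step2 = re.sub(r"(?<=\w)\r?\n(?=[a-zA-Z])", ",", step1)

print(step2)
```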

Match repeated groups after keyword using regex

I am using VB2010 and can't seem to get this seemingly easy regex to work. I first look for a line with the keyword TRIPS that holds my data, and then from that line I want to extract repeated groups of data, each made up of an alpha code followed by a number.
MODES 1 0 0
OVERH X 28 H 0 Z 198
TRIPS X 23 D 1 Z 198
ITEMSQ 1 0 0
COSTU P 16 E 180
CALLS 0 0
I have
^TRIPS (?<grp>[A-Z]\s{1,4}\d{1,3})
Which gives me one match and the first group "X 23". So I extend it by allowing it to match up to 4 groups.
^TRIPS (?<grp>[A-Z]\s{1,4}\d{1,3}){0,4}
but I get one match with still only one group.
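You aren't allowing for whitespace between the groups. You need to do something like this:
^TRIPS ((?<grp>[A-Z]\s{1,4}\d{1,3})\s+){0,4}
In .NET you still get a single match, but every repetition of grp is available through the group's Captures collection; the group's Value holds only the last one.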
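As a comparison in another engine: Python's re module keeps only the last repetition of a repeated group, so a different technique is needed there. The sketch below swaps in a two-step approach (find the TRIPS line first, then collect the code/number pairs from it); it is not the single-regex .NET solution above.

```python
import re

text = """MODES 1 0 0
OVERH X 28 H 0 Z 198
TRIPS X 23 D 1 Z 198
ITEMSQ 1 0 0
COSTU P 16 E 180
CALLS 0 0"""

# Step 1: isolate the remainder of the line that starts with TRIPS.
line = re.search(r"^TRIPS (.*)$", text, re.MULTILINE)

# Step 2: pull every "letter, 1-4 whitespace, 1-3 digits" pair from it.
pairs = re.findall(r"([A-Z])\s{1,4}(\d{1,3})", line.group(1)) if line else []

print(pairs)
```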

Notepad++ search combination in lines

I am looking for a specific combination in a txt file that contains multiple lines (Notepad ++). The structure of a line I am looking for is as follows:
xxxxxx N N -1 -1 -1 N (end line)
So I first have an identifier of 6 or more characters, followed by 6 numbers (N) spaced by a tab. N can be values 1, 0 or -1.
I am looking for those lines that contain '-1' in position 3, 4 and 5. The other positions can take any of the 3 values.
I have searched online and applied searches such as:
\t-?\t-?\t-1\t-1\t-1\t-?
\t?.\t?.\t-1\t-1\t-1\t?.
t?.\t?.\t-1\t-1\t-1\t?.\n
\t-1\t-1\t-1\t?.\n
Yet, the last N in the line is not taken into account, so that if its value is 0 for example, that line will not be selected.
What is the way to write this search? I understand Notepad ++ is written in C++.
Can you try this pattern?
^([a-zA-Z0-9]{6,})\s*(-1|0|1)\s*(-1|0|1)\s*((-1\s*?){3})\s*(-1|0|1)\s?
https://regex101.com/r/yM5xD3/2
Explanation:
^: Start of the line.
([a-zA-Z0-9]{6,}): Any alphanumeric character, six or more times.
\s*: Space/tab/newline, zero or more times.
(-1|0|1): One of those numbers.
\s*: Same as above.
(-1|0|1): One of those numbers.
((-1\s*?){3}): -1 followed by space/tab/newline zero or more times, with the whole group repeated three times. (The '?' makes \s* lazy, so it matches as little whitespace as possible.)
\s*: Same as above.
(-1|0|1): One of those numbers.
And the last \s?: Looks for zero or one space/tab/newline character.
You can try the following regex:
^[a-zA-Z0-9]+\t(-1|0|1)\t(-1|0|1)\t[\-][1]\t[\-][1]\t[\-][1]\t(-1|0|1)$
I tried on the following sample and it worked for me.
xxxxxx 1 1 -1 -1 -1 1
xxxxxx 0 1 -1 -1 -1 0
test12 -1 1 -1 1 -1 0
xxxxxx 1 1 -1 -1 -1 0
test13 0 1 -1 -1 1 -1
Hope it helps.
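The tab-separated pattern can also be checked outside Notepad++. This is a small Python sketch, assuming the sample columns are separated by real tab characters (they show up as spaces above); re.MULTILINE makes ^ and $ work per line, which is close to how Notepad++ treats them.

```python
import re

pattern = re.compile(
    r"^[a-zA-Z0-9]+\t(-1|0|1)\t(-1|0|1)\t[\-][1]\t[\-][1]\t[\-][1]\t(-1|0|1)$",
    re.MULTILINE,
)

rows = [
    "xxxxxx\t1\t1\t-1\t-1\t-1\t1",
    "xxxxxx\t0\t1\t-1\t-1\t-1\t0",
    "test12\t-1\t1\t-1\t1\t-1\t0",
    "xxxxxx\t1\t1\t-1\t-1\t-1\t0",
    "test13\t0\t1\t-1\t-1\t1\t-1",
]
sample = "\n".join(rows)

# Keep only the lines that have -1 in positions 3, 4 and 5.
matched = [m.group(0) for m in pattern.finditer(sample)]

for line in matched:
    print(line)
```

Only the three xxxxxx rows should be selected, including the ones whose last value is 0.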

REGEX and IP Addresses

The Exscript any_match() function uses regex to match patterns in strings and returns the results in a tuple.
I am attempting to match IP addresses in a traceroute output. It works for the most part, but for some reason returns some extra values (in addition to the targeted addresses). I would like some assistance in the correct regex to use that will return only the IP addresses without the extra values.
Note: I have googled and searched Stack Overflow for regex patterns, as well as studied the regex help page. This is the closest regex that has worked so far.
def ios_commands(job, host, conn):
    conn.execute('terminal length 0')
    conn.execute('tr {}'.format(DesAddr))
    print('The results of the traceroute', repr(conn.response))
    for hops in any_match(conn, r'(([0-9]{1,3}\.){3}[0-9]{1,3})'):
        hop_addresses = list(hops)
OUTPUT
the string being searched
hostname>('The results of the traceroute', "'tr 192.33.12.4\\r\\nType escape sequence to abort.\\r\\nTracing the route to hostname (192.33.12.4)\\r\\nVRF info: (vrf in name/id, vrf out name/id)\\r\\n 1 hostname (192.32.0.174) 0 msec 0 msec 0 msec\\r\\n 2 hostname (192.32.0.190) 0 msec 0 msec 0 msec\\r\\n 3 192.33.226.225 [MPLS: Label 55 Exp 0] 0 msec 4 msec 0 msec\\r\\n 4 192.33.226.237 0 msec 0 msec 0 msec\\r\\n 5 hostname (192.33.12.4) 4 msec * 0 msec\\r\\nhostname>'")
['192.33.12.4', '12.'] #note the extra '12.' value
['192.33.12.4', '12.']
['192.32.0.174', '0.']
['192.32.0.190', '0.']
['192.33.226.225', '226.']
['192.33.226.237', '226.']
['192.33.12.4', '12.']
You have two capturing groups in your pattern. The first (outer) one captures the whole IP address; the second (inner) one is repeated three times, so it ends up holding only the last repetition, which is the extra value you see:
([0-9]{1,3}\.){3}
Use non-capturing groups:
((?:[0-9]{1,3}\.){3}[0-9]{1,3})
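The difference is easy to see with the standard re module on one hop line from the traceroute output. This sketch uses plain re.findall rather than Exscript's any_match, on the assumption that both report one value per capturing group.

```python
import re

line = " 3 192.33.226.225 [MPLS: Label 55 Exp 0] 0 msec 4 msec 0 msec"

# Two capturing groups: each result is a tuple (whole IP, last "NNN." part).
with_inner_group = re.findall(r"(([0-9]{1,3}\.){3}[0-9]{1,3})", line)

# Inner group made non-capturing: only the whole IP is returned.
without_inner_group = re.findall(r"((?:[0-9]{1,3}\.){3}[0-9]{1,3})", line)

print(with_inner_group)
print(without_inner_group)
```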

RegEx: Reject sub-portion of complicated expression

In the sample text below, I want to match groups of text (newlines and all) starting with a line defined by \nI.*' and including the subsequent lines starting with \nA, only if none of the intermediate lines contains "BOM=". I.e. in the example, I would want to match the first "device" and its following attributes, but not the second device, as shown in my comments (after #s).
I 657 device:THAT 2 1290 400 0 1 ' # Start matching here because no lines have "BOM="
A 1335 425 12 0 5 0 some text
A 1335 455 12 0 5 0 some text
A 1300 440 12 0 9 3 some text
A 1370 375 12 0 3 0 some text # Finish matching here
C 655 1 3 0
A 1370 450 12 0 3 3 #=2
C 740 2 4 0
A 1305 450 12 0 9 3 #=1
C 740 2 4 0
A 1305 450 12 0 9 3 #=1
I 318 device:THIS 2 300 1840 0 1 ' # Do not match again here because there's a line with "BOM="
A 320 1880 12 0 7 3 some text
A 320 1880 12 0 9 3 some text
A 380 1880 12 0 1 1 BOM=1,2
A 345 1865 12 0 5 0 some text
A 380 1830 12 0 3 0 some text
C 666 1 3 0
In the sample text, "some text" is various descriptors for electrical devices, e.g. "RATING=63MW", "REFDES=R123". It may contain whitespace but not newlines.
The furthest I've gotten yet is the expression
((\n|^)I((?!misc).)*?'\n)((A.*\n)*(A.*BOM=.*\n)(A.*\n)*)
which matches the opposite of what I want, i.e it finds the text blocks that DO contain BOM=. I thought I could switch this by changing (A.*BOM=.*\n) to (?!(A.*BOM=.*\n)) but this did not work.
I'm hoping to use this in Notepad++ when I'm done.
You can perhaps try this regex:
^I(?:(?!misc).)*'\n(?!(?:A.*\n)*?A.*BOM=)(?:A.*\n)*
regex101 demo
I added a third block where the BOM= is instead on a line starting with C. That device is still matched, because the BOM= is not on one of the consecutive lines beginning with A.
In Notepad++, ^ matches at the start of every line by default, so it's usually not necessary to have (\n|^), but you can put it back if you need it.
I also kept (?:(?!misc).)* in because you had it in your expression, although it has no effect on your sample data.
(?!(?:A.*\n)*?A.*BOM=) is what's making the match fail when there's a BOM= in the lines. It's a negative lookahead which will prevent a match only if A.*BOM= matches after any number of lines of (?:A.*\n)*? (i.e. lines beginning with A).
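To sanity-check the lookahead outside Notepad++, here is a minimal Python sketch run on a shortened version of the sample (the trailing # comments are left out, since they were only annotations). re.MULTILINE is used so that ^ matches at every line start, which is an assumption meant to mirror Notepad++'s default.

```python
import re

pattern = re.compile(
    r"^I(?:(?!misc).)*'\n(?!(?:A.*\n)*?A.*BOM=)(?:A.*\n)*",
    re.MULTILINE,
)

sample = (
    "I 657 device:THAT 2 1290 400 0 1 '\n"
    "A 1335 425 12 0 5 0 some text\n"
    "A 1370 375 12 0 3 0 some text\n"
    "C 655 1 3 0\n"
    "A 1370 450 12 0 3 3 #=2\n"
    "I 318 device:THIS 2 300 1840 0 1 '\n"
    "A 320 1880 12 0 7 3 some text\n"
    "A 380 1880 12 0 1 1 BOM=1,2\n"
    "A 345 1865 12 0 5 0 some text\n"
    "C 666 1 3 0\n"
)

# Each match is an I line plus the A lines directly below it, and only
# for blocks where none of those A lines contains BOM=.
blocks = [m.group(0) for m in pattern.finditer(sample)]

for block in blocks:
    print(block, end="")
```

Only the device:THAT block should come back; the device:THIS block is rejected by the negative lookahead.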