Grouping lines with a header using regex - regex

I'm trying to write a regex query that groups lines, using a line that contains a certain type of key as the header of each group.
For example, the key is an 'A' followed by a number. The first 4 lines below are one group, the next 2 are another group, and so on:
dd A3
This line is arbitrary
This line is also arbitrary
1234 Arbitrary
A9
This line is arbitrary
ff A3 d
A5ff
Hi there
Hello

This is what I ended up with, and it worked: .*A[0-9].*\n((?!A[0-9]).|\n)*
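As an alternative, here is a minimal Python sketch of a line-based pattern, assuming a header is any line that contains an 'A' immediately followed by a digit. It is not the pattern above: it matches a header line and then keeps taking whole following lines as long as they contain no such key, so any text before the key on a header line (the ff in ff A3 d) stays with its own group. The names GROUP_RE and group_lines are hypothetical.

```python
import re

# One match per group: a header line (contains "A" followed by a digit),
# then every following line that does not contain "A" followed by a digit.
GROUP_RE = re.compile(r"^.*A[0-9].*(?:\n(?!.*A[0-9]).*)*", re.MULTILINE)


def group_lines(text):
    """Return each header line together with the non-header lines after it."""
    return GROUP_RE.findall(text)


sample = (
    "dd A3\nThis line is arbitrary\nThis line is also arbitrary\n"
    "1234 Arbitrary\nA9\nThis line is arbitrary\nff A3 d\nA5ff\nHi there\nHello"
)
for group in group_lines(sample):
    print(group.splitlines())
```

Because "." does not match a newline, the negative lookahead only inspects the line it is about to consume.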

Related

Regex pattern to match "AA BB CC DD"

I have a hexadecimal string with a space separating each byte,
e.g. A1 B2 C3 D4 E5 FF 00 11 22 33 44 ...
I would like to use a regex validator to check whether the user input is correct.
How could I write the regular expression to achieve this?
Something like this:
^[A-F0-9]{2}( [A-F0-9]{2})*$
Explanation:
^ - anchor: string start
[A-F0-9]{2} - two symbols in the 0..9 or A..F range
( [A-F0-9]{2})* - followed by a space and two 0..9 or A..F symbols, zero or more times
$ - anchor: string end
If you also allow a..f as valid hexadecimal symbols:
^[A-Fa-f0-9]{2}( [A-Fa-f0-9]{2})*$
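A minimal Python sketch of using these patterns as a validator (STRICT, ANY_CASE and is_valid are hypothetical names). It uses fullmatch() rather than match(), because in Python's re module $ also matches just before a single trailing newline, so re.match() with ^...$ would accept "A1 B2\n".

```python
import re

# The answer's pattern (upper-case hex digits only).
STRICT = re.compile(r"^[A-F0-9]{2}( [A-F0-9]{2})*$")
# The variant that also accepts a..f.
ANY_CASE = re.compile(r"^[A-Fa-f0-9]{2}( [A-Fa-f0-9]{2})*$")


def is_valid(pattern, text):
    # fullmatch() requires the entire string to match, so a trailing
    # newline is rejected even though $ alone would tolerate it.
    return pattern.fullmatch(text) is not None


print(is_valid(STRICT, "A1 B2 C3 D4 E5 FF 00 11 22 33 44"))  # True
print(is_valid(STRICT, "a1 b2"))    # False
print(is_valid(ANY_CASE, "a1 b2"))  # True
```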
I would like to propose a solution based on the DRY principle (Don't Repeat Yourself).
Instead of writing the same pattern twice (as Dmitry proposed), you can:
Write the pattern for 2 hex digits as a capturing group - ([A-F0-9]{2}).
"Call" it again using (?1).
So the whole pattern can be ^([A-F0-9]{2})( (?1))*$.
There are also other variants of "calling" a capturing group, e.g.
(?-1) - call the preceding group or
(?&name) - call a named group.
For details see https://www.regular-expressions.info/subroutine.html
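Subroutine calls such as (?1) are not available in every engine; Python's built-in re module, for example, does not support them. If you only need the DRY effect there, a minimal sketch is to define the byte sub-pattern once and build the full regex from it (HEX_BYTE, HEX_BYTES_RE and is_hex_bytes are hypothetical names).

```python
import re

# The byte sub-pattern is written once and reused, which is what (?1)
# achieves inside engines that support subroutine calls.
HEX_BYTE = r"[A-F0-9]{2}"
HEX_BYTES_RE = re.compile(rf"{HEX_BYTE}(?: {HEX_BYTE})*")


def is_hex_bytes(text):
    """True if text is upper-case hex bytes separated by single spaces."""
    return HEX_BYTES_RE.fullmatch(text) is not None


print(HEX_BYTES_RE.pattern)  # [A-F0-9]{2}(?: [A-F0-9]{2})*
```

The difference is that the reuse happens when the string is built, not inside the regex engine, so the compiled pattern still contains the sub-pattern twice.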

Add a character to a string in Notepad++

I have a big .txt file that I need to modify in Notepad++: find each line that starts with "1H" and add the number "2" at position 10 on that line. For example:
1A 3333333333333
1B 4444444444444
1H 5555555555555
1A 6666666666666
1B 7777777777777
1H 8888888888888
I want the 1H lines to be modified by adding 2 at position 10. How can I do that in Notepad++?
I don't know how to combine ^(1H) and ^(.{10}) in the search pattern.
Find what: ^(1H.{7})(.)
Replace with: \12
This pattern requires a line that starts with 1H followed by 7 more characters. The parentheses store this 9-character string as the first group. The next character, in the tenth position, is stored as the second group.
The full match is then replaced by group 1 (\1) followed by the character '2', so the original tenth character is overwritten with '2', giving the desired result:
1A 3333333333333
1B 4444444444444
1H 5555552555555
1A 6666666666666
1B 7777777777777
1H 8888882888888
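For comparison, a minimal Python sketch of the same search-and-replace (mark_1h_lines is a hypothetical name). One difference worth knowing: in Python's re.sub the replacement has to be written \g<1>2, because \12 would be read as a reference to group 12.

```python
import re


def mark_1h_lines(text):
    # Group 1 keeps "1H" plus the next 7 characters; group 2 (the 10th
    # character) is dropped and replaced by "2". MULTILINE makes ^ match
    # at the start of every line, and "." never crosses a newline.
    return re.sub(r"^(1H.{7})(.)", r"\g<1>2", text, flags=re.MULTILINE)


print(mark_1h_lines("1A 3333333333333\n1H 5555555555555"))
```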

Find repeating GPS coordinates using a regular expression

I work with text files, and I need to be able to see when the GPS (the last 3 columns of the CSV) "hangs up" for more than a few lines.
For example, part of a text file usually looks like this:
5451,1667,180007,35.7397387,97.8161897,375.8
5448,1053z,180006,35.7397407,97.8161814,375.7
5444,1667,180005,35.7397445,97.8161674,375.6
5439,1668,180004,35.7397483,97.8161526,375.5
5435,1669,180003,35.7397518,97.8161379,375.5
5431,1669,180002,35.7397554,97.8161269,375.6
5426,1054z,180001,35.7397584,97.8161115,375.6
5420,1670,175959,35.7397649,97.8160931,375.9
But sometimes there is an error with the GPS, and it looks like this:
36859,1598,202603.00,35.8867316,99.2515545,555.700
36859,1598,202608.00,35.8867316,99.2515545,555.700
36859,1142z,202610.00,35.8867316,99.2515545,555.700
36859,1597,202612.00,35.8867316,99.2515545,555.700
36859,1597,202614.00,35.8867316,99.2515545,555.700
36859,1596,202616.00,35.8867316,99.2515545,555.700
36859,1595,202618.00,35.8867316,99.2515545,555.700
I need to find a way to search for matching strings of 7 different numbers (the decimal portion of the GPS), but so far I've only been able to figure out how to search for repeating or consecutive numbers.
Any ideas?
If you want to find such repetitions in an editor (such as Notepad++), you could use the following regex to find 4 or more repeating lines:
([^,]+(?:,[^,]+){2})\v+(?:(?:[^,]+,){3}\1(?:\v+|$)){3,}
To go into a bit more detail:
([^,]+(?:,[^,]+){2})\v+ is a group of three comma-separated fields (one or more non-commas, then, twice, a comma followed by one or more non-commas), followed by one or more vertical whitespace characters (line breaks) that are not part of the group (e.g. 1,1,1\n)
(?:[^,]+,){3} matches one or more non-commas followed by a comma, three times (the columns that don't need to be considered)
\1 is a backreference to group 1; it matches only if the text is exactly the same as group 1
(?:\v+|$) matches either more vertical whitespace or the end of the text
{3,} requires 3 or more repetitions; increase it if you want more
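If you want to run the same idea from Python, note that in Python's re module \v means only the vertical tab character, not "any vertical whitespace". A minimal sketch, assuming Unix line endings (\n) and exactly six columns per line, replaces \v with \n, keeps every field on one line with [^,\n], and anchors each line with ^ in MULTILINE mode. REPEAT_RE and find_stuck_gps are hypothetical names.

```python
import re

# Adaptation of the editor regex: \v -> \n, and [^,\n] so no field
# can run across a line break.
REPEAT_RE = re.compile(
    r"^(?:[^,\n]*,){3}"                    # skip the first three columns
    r"([^,\n]*,[^,\n]*,[^,\n]*)\n"         # group 1: the last three columns
    r"(?:(?:[^,\n]*,){3}\1(?:\n|$)){3,}",  # 3 or more further identical lines
    re.MULTILINE,
)


def find_stuck_gps(text):
    """Return the GPS triple of every run of 4 or more identical lines."""
    return [m.group(1) for m in REPEAT_RE.finditer(text)]
```

The (?:\n|$) after \1 stops a longer value such as 555.7001 from counting as a repeat of 555.700.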
However, if you are checking this from a programming language, I wouldn't go the regex route, since checking for those repetitions is much easier without it. Here is one example in Python; I hope you can adapt it to your needs:
oldcoords = None
repetitions = 0
with open(r'C:\temp\gps.csv') as f:
    for line in f:
        gpscoords = line.rstrip('\n').split(',')[3:6]  # last three columns
        if gpscoords == oldcoords:
            repetitions += 1
        else:
            oldcoords = gpscoords
            repetitions = 0
        if repetitions == 4:  # or however you define "more than a few"
            print(', '.join(gpscoords) + ' is repeated')
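A variant sketch using itertools.groupby, which reports every run once together with its first and last line numbers. The function name stuck_runs and the min_run parameter are hypothetical; the default of 4 mirrors the "4 or more repeating lines" of the regex answer.

```python
from itertools import groupby


def stuck_runs(lines, min_run=4):
    """Yield (first_line, last_line, gps) for each run of at least min_run
    consecutive lines whose last three CSV columns are identical.
    Line numbers are 1-based."""
    numbered = enumerate(lines, start=1)

    def gps_of(item):
        # item is (line_number, line); the key is the last three columns
        return tuple(item[1].rstrip("\n").split(",")[-3:])

    for gps, run in groupby(numbered, key=gps_of):
        numbers = [number for number, _ in run]
        if len(numbers) >= min_run:
            yield numbers[0], numbers[-1], ",".join(gps)
```

Since it accepts any iterable of lines, you can pass an open file directly, e.g. inside `with open(path) as f:` loop over `stuck_runs(f)` and print each tuple. Unlike the counter-based loop, a run that lasts until the end of the file is still reported.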
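If you can use Perl, and if I understood you correctly, this should do the trick:
perl -ne 'm/^[^,]*,[^,]*,[^,]*,([^,]*,[^,]*,[^,]*$)/g; $current_line=$1; ++$line_number; if ($prev_line eq $current_line){$equals++} else {if ($equals>=6){ print "Last three fields in lines ".($line_number-$equals-1)." to ".($line_number-1)." are equals to:\n$prev_line" } ; $equals=0}; $prev_line=$current_line' < onlyreplacethiswithyourfilepath
It compares the last three fields as strings (eq rather than ==, since == would compare only the leading number of each string) and reports every run of 7 or more identical lines. A run that lasts until the last line of the file is not reported, because the check only happens when the value changes.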
Sample output:
Last three fields in lines 1 to 7 are equals to:
35.8867316,99.2515545,555.700
Last three fields in lines 16 to 22 are equals to:
37.8782116,99.7825545,572.810
Last three fields in lines 31 to 44 are equals to:
36.6868916,77.2594245,581.358
Last three fields in lines 57 to 63 are equals to:
35.5128764,71.2874545,575.631

Find all substrings with at least one group

I'm trying to find all substrings in a string that meet a condition.
Let's say we've got this string:
s = 'some text 1a 2a 3 xx sometext 1b yyy some text 2b.'
I need to apply the search pattern {(one (group of words), two (another group of words), three (another group of words)), word}. The first three positions are optional, but at least one of them must be present. If so, I need a word after them.
Output should be:
2a 1a 3 xx
1b yyy
2b
I wrote this expression:
find_it = re.compile(r"((?P<one>\b1a\s|\b1b\s)|" +
r"(?P<two>\b2a\s|\b2b\s)|" +
r"(?P<three>\b3\s|\b3b\s))+" +
r"(?P<word>\w+)?")
Each group contains a set of different words (not just 1a, 1b), and I can't merge them into one group. A group should be None if it is empty. Obviously, the result is wrong.
find_it.findall(s)
> 2a 1a 2a 3 xx
> 1b 1b yyy
I am grateful for your help!
You can use the following regex:
>>> reg=re.compile(r'((?:(?:[12][ab]|3b?)\s?)+(?:\w+|\.))')
>>> reg.findall(s)
['1a 2a 3 xx', '1b yyy', '2b.']
Here I just made your regex more concise by using a character class and the ? quantifier. The repeated alternation has 2 parts:
[12][ab]|3b?
[12][ab] will match 1a, 1b, 2a and 2b, and 3b? will match 3b and 3.
If you don't want the dot at the end of 2b, you can use the following regex with a positive lookahead, which is more general than the previous one (making \s optional in the first group is not a good idea):
>>> reg=re.compile(r'((?:(?:[12][ab]|3b?)\s)+\w+|(?:(?:[12][ab]|3b?))+(?=\.|$))')
>>> reg.findall(s)
['1a 2a 3 xx', '1b yyy', '2b']
Also, if your numbers and example substrings are just placeholders, you can use [0-9][a-z] as a more general regex:
>>> reg=re.compile(r'((?:[0-9][a-z]?\s)+\w+|(?:[0-9][a-z]?)+(?=\.|$))')
>>> reg.findall(s)
['1a 2a 3 xx', '1b yyy', '2b']
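To see the generalized pattern pick up digit/letter combinations that are not in the question's string, here is a small Python check against a made-up test string: s2 is hypothetical and simply inserts a "5h 9 7y example" run into the original string.

```python
import re

# Generalized pattern: a digit optionally followed by a lower-case letter
# stands in for each "group" word; the second alternative handles a run at
# the end of the sentence that has no following word.
GENERAL = re.compile(r"((?:[0-9][a-z]?\s)+\w+|(?:[0-9][a-z]?)+(?=\.|$))")

s2 = "some text 1a 2a 3 xx sometext 1b yyy 5h 9 7y example some text 2b."
print(GENERAL.findall(s2))
# ['1a 2a 3 xx', '1b yyy', '5h 9 7y example', '2b']
```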

Matching inner pattern an unlimited amount of times within outer pattern

Say I have the following pattern:
INDICATOR\s+([a-z0-9]+)
which would match for example:
INDICATOR AA or INDICATOR B3
I need to change this pattern so that it matches any string that starts with INDICATOR, followed by a space and then multiple matches of the inner pattern, e.g.:
INDICATOR AA A3 66 B8 34 CD
INDICATOR BG 4D CS
INDICATOR HG
Is it possible to do this?
Solution
Thanks to Gumbo, I came up with the following regex, which suits my requirements:
INDICATOR((\s+)?([,-])?(\s+)?([a-z0-9]+))+
Try this:
INDICATOR(\s+([a-z0-9]+))+
Here the repeating pattern is wrapped in a group and quantified with + to allow one or more repetitions of the expression inside the group. Note, however, that this won't give you every match of the inner group, only the last one (or, more precisely, it depends on the regex implementation you're using).
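A minimal Python sketch of that behaviour and of one common workaround: match the line with the outer pattern, then run findall with the inner pattern over the matched text. re.IGNORECASE is an assumption added here because the sample tokens (AA, B3) are upper case while the class is [a-z0-9]; it also makes the INDICATOR keyword case-insensitive. OUTER, INNER and indicator_tokens are hypothetical names.

```python
import re

OUTER = re.compile(r"INDICATOR(\s+([a-z0-9]+))+", re.IGNORECASE)
INNER = re.compile(r"[a-z0-9]+", re.IGNORECASE)

m = OUTER.match("INDICATOR AA A3 66 B8 34 CD")
print(m.group(2))  # CD  (Python's re keeps only the last repetition)


def indicator_tokens(line):
    """Return every inner token of an INDICATOR line, or None if no match."""
    match = OUTER.match(line)
    if match is None:
        return None
    # Re-scan the matched text after the keyword to collect all tokens.
    return INNER.findall(match.group(0)[len("INDICATOR"):])


print(indicator_tokens("INDICATOR BG 4D CS"))  # ['BG', '4D', 'CS']
```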