Regex expression matching block of lines - regex

I have this kind of file:
Analysis of its root cause:
Blablablablabla
blabablabkjhjk
kjbsqbdqbds
Details of the fix
blablabla
Analysis of its root cause:
fddsfsdfsdfdsfs
blnskdbbqbbb
xxxxggggggg
Details of the fix
blablabla
Analysis of its root cause is repeated x times in the file. I would like to get the block of text delimited by "Analysis of its root cause" and "Details of the fix".
Thanks a lot for your help.

I'm pretty sure there is some better way to do this, but that's what I could manage:
/(?(?<=Analysis of its root cause:\n)((.*\n)*)(?=Details of the fix\n))/gU
I'm using positive lookahead and lookbehind, and the following modifiers:
g - global - Don't return after first match
U - Ungreedy - Make quantifiers lazy
Try it online: https://regex101.com/r/xpz7pg/2
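For use from Perl itself, note that the U modifier is a PCRE feature; Perl has no such flag, so laziness is written as .*? instead. Below is a minimal sketch, assuming the whole file is already held in a string. The helper name extract_blocks is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every block found between the two marker lines.
sub extract_blocks {
    my ($text) = @_;
    # /m: ^ and $ match at line boundaries; /s: . also matches newlines;
    # /g in list context returns all captures; .*? stops at the first end marker.
    return $text =~ /^Analysis of its root cause:\n(.*?)^Details of the fix$/msg;
}

my $sample = <<'END';
Analysis of its root cause:
Blablablablabla
blabablabkjhjk
kjbsqbdqbds
Details of the fix
blablabla
Analysis of its root cause:
fddsfsdfsdfdsfs
blnskdbbqbbb
xxxxggggggg
Details of the fix
blablabla
END

my @blocks = extract_blocks($sample);
print "--- block ---\n$_" for @blocks;
```

Each returned block keeps its trailing newline, which is why the print statement adds none.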

Not a regex answer, but using perl
Put your lines into a single file.
perl -e '$/="Analysis of its root cause:"; # Sets the record delimiter
while(<>){ # Iterates over the file, record by record
chomp; # Removes the delimiter
if ($_ =~ /\n(.*?)\nDetails of the fix\n(.*)\n/s){ # Captures the text before and after "Details of the fix"; with /s, . also matches newlines
print "ONE:$1\nTWO:$2\n"} # $1 is the analysis, $2 is the details
}' file.txt
Output
ONE:Blablablablabla
blabablabkjhjk
kjbsqbdqbds
TWO:blablabla
ONE:fddsfsdfsdfdsfs
blnskdbbqbbb
xxxxggggggg
TWO:blablabla

Related

Regex does not match in Perl, while it does in other programs

I have the following string:
load Add 20 percent
to accommodate
I want to get to:
load Add 20 percent to accommodate
With, e.g., regex in sublime, this is easily done by:
Regex:
([a-z])\n\s([a-z])
Replace:
$1 $2
However, in Perl, if I input this command, (adapted to test if I can match the pattern in any case):
perl -pi.orig -e 's/[a-z]\n.+to/TEST/g' file
It doesn't match anything.
Does anyone know why Perl would be different in this case, and what the correct formulation of the Perl command should be?
By default, Perl's -p flag reads the input one line at a time, so you cannot expect your regex to match anything after \n.
Instead, you want to read the whole input at once. You can do this by using the flag -0777 (this is documented in perlrun):
perl -0777 -pi.orig -e 's/([a-z])\n\s(to)/$1 $2/' file
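Inside a script, the counterpart of -0777 is undefining the input record separator $/. The sketch below contrasts the two reading modes on the sample text, using in-memory filehandles so that it runs without a file. The sub names are illustrative only.

```perl
use strict;
use warnings;

# Line-by-line reading, as -p does by default: every record ends at its
# newline, so a pattern that needs text after \n can never match.
sub join_by_line {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    my $out = '';
    while (my $line = <$fh>) {
        $line =~ s/([a-z])\n\s(to)/$1 $2/;
        $out .= $line;
    }
    return $out;
}

# Slurp mode, the script equivalent of -0777: with $/ undefined,
# a single read returns the whole input, newlines included.
sub join_slurped {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    local $/ = undef;
    my $all = <$fh>;
    $all =~ s/([a-z])\n\s(to)/$1 $2/g;
    return $all;
}

my $data = "load Add 20 percent\n to accommodate\n";
print "line by line:\n", join_by_line($data);
print "slurped:\n",      join_slurped($data);
```

Only the slurped version joins the two lines; the line-by-line version returns the text unchanged.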
For reference, here is your initial Perl command:
perl -pi.orig -e 's/[a-z]\n.+to/TEST/g' file
Note that in a Perl regex, [a-z] matches exactly one character and never whitespace. As a start, add a repetition quantifier and allow the class to consume spaces as well. Also, because the match consumes 'to', you must put it back in the replacement string to keep it, as in the example program below:
$str = "load Add 20 percent
to accommodate";
print "before:\n$str\n";
$str =~ s/([ a-z]+)\n\s*to/$1 to/;
print "after:\n$str\n";
This program produces the output below:
before:
load Add 20 percent
to accommodate
after:
load Add 20 percent to accommodate
So, if I understood correctly what you want to do, your regex should look more like:
s/([ a-z]+)\n\s*to/$1 to/ (please note the leading space before 'a-z').

perl multiline issue: need one liner to print last match before string in file

I have a log file like this:
2018-07-10 10:03:01: random text1
2018-07-10 10:03:02: random text2
2018-07-10 10:03:03: random text3
more text
and more
THIS IS MATCHED STRING
2018-07-10 10:03:04: random text4
I want to use a perl one-liner to find the most recent timestamp before "THIS IS MATCHED STRING".
I tried this:
perl -0777 -nle 'print "$1\n" while m/(\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d).+?THIS IS MATCHED STRING/sg'
But it matched the first timestamp, "2018-07-10 10:03:01" instead of the "2018-07-10 10:03:03" that I wanted. Obviously (at least I think), I don't have a good understanding of how the greedy/lazy matching is working.
Any help would be appreciated!
For a fairly elementary approach that avoids an involved regex, process the file line by line and record each timestamp as it is matched. When you reach the THIS IS MATCHED STRING pattern, the saved value is the most recent preceding timestamp.
perl -wnE'
$ts = $1 if /(\d{4}-\d{2}-\d{2}[ ]\d{2}:\d{2}:\d{2})/;
say $ts // "no previous time stamp" if /THIS IS MATCHED STRING/;
' file.txt
If the timestamp were captured and saved with ($ts) = /.../, failed matches on lines without a timestamp would reset it to undef, so it might be gone by the time THIS is found. That is why it is copied from $1 only when a match succeeds.
The defined-or (//) on $ts covers the case where the file has no timestamps at all before THIS.
You could use (with the m and x flags):
^
(\d{4}-\d{2}-\d{2}\ \d+:\d+:\d+):
(?:(?!^\d{4})[\s\S])+?
\QTHIS IS MATCHED STRING\E
See a demo on regex101.com.
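A hedged sketch of how that pattern might be used from Perl: it needs /m so that ^ matches at every line start and /x so that the layout whitespace is ignored. The helper name and the sample log are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: timestamp of the last stamped line before the marker,
# or undef when the marker (or a preceding timestamp) is missing.
sub last_timestamp_before_marker {
    my ($log) = @_;
    my $re = qr{
        ^(\d{4}-\d{2}-\d{2}\ \d+:\d+:\d+):   # timestamp at the start of a line
        (?:(?!^\d{4})[\s\S])+?               # any text that does not begin another stamped line
        \QTHIS IS MATCHED STRING\E           # the marker itself
    }mx;
    return $log =~ $re ? $1 : undef;
}

my $log = <<'END';
2018-07-10 10:03:01: random text1
2018-07-10 10:03:02: random text2
2018-07-10 10:03:03: random text3
more text
and more
THIS IS MATCHED STRING
2018-07-10 10:03:04: random text4
END

my $ts = last_timestamp_before_marker($log);
print defined $ts ? "$ts\n" : "no previous time stamp\n";
```

The negative lookahead is what prevents the match from starting at an earlier timestamp: a match attempt that begins on the first or second line cannot cross into the next stamped line, so it fails and the engine moves on.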

perl regex - anchors and pattern matching

I wrote a Perl regex to extract the words after a certain anchor,
but it does not seem to work. What am I doing wrong?
This is my actual output; I need to extract every number after the groups keyword:
$id cuser301 uid=2301(cuser301) gid=32(rpc) groups=32(rpc),1001(cgrp1),1002(cgrp2),1003(cgrp3),1004(cgrp4),1005(cgrp5),1006(cgrp6),1007(cgrp7),1008(cgrp8),1009(cgrp9),1010(cgrp10),1011(cgrp11),1012(cgrp12),1013(cgrp13),1014(cgrp14),1015(cgrp15),1016(cgrp16),1017(cgrp17),1018(cgrp18),1019(cgrp19),1020(cgrp20),1021(cgrp21),1022(cgrp22),1023(cgrp23),1024(cgrp24),1025(cgrp25),1026(cgrp26),1027(cgrp27),1028(cgrp28),1029(cgrp29),1030(cgrp30),1031(cgrp31),1032(cgrp32)
From the above, I run the id command and then would like to capture the numbers after groups. Please help.
I am using the following.
my $check_groups = execute("\id $user"); #---> (execute is to run commands on the linux client, please ignore it)
my $new_groups = ('/^groups/',$check_groups); # ---> Now $new_groups should have all numbers after groups.
my $input = '$id cuser301 uid=2301(cuser301) gid=32(rpc) groups=32(rpc),1001(cgrp1),1002(cgrp2),1003(cgrp3),1004(cgrp4),1005(cgrp5),1006(cgrp6),1007(cgrp7),1008(cgrp8),1009(cgrp9),1010(cgrp10),1011(cgrp11),1012(cgrp12),1013(cgrp13),1014(cgrp14),1015(cgrp15),1016(cgrp16),1017(cgrp17),1018(cgrp18),1019(cgrp19),1020(cgrp20),1021(cgrp21),1022(cgrp22),1023(cgrp23),1024(cgrp24),1025(cgrp25),1026(cgrp26),1027(cgrp27),1028(cgrp28),1029(cgrp29),1030(cgrp30),1031(cgrp31),1032(cgrp32)';
print join ',', $input =~ /(?:.*groups=|\G.*?)\b([0-9]+)/g;
This is a common pattern; in more complicated cases where you want to ensure the \G branch only applies after the first non-zero-length match, you can use \G(?!\A) instead of just \G.
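If the single \G pattern is hard to follow, the same list can be built in two steps: isolate the groups= field first, then pull the numbers out of it. This is only a sketch, and it assumes the field contains no whitespace (group names with spaces would cut it short). The sub name group_ids is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: numeric group IDs listed after "groups=" in id output.
sub group_ids {
    my ($id_output) = @_;
    # Step 1: capture the groups field (everything up to the next whitespace).
    my ($groups) = $id_output =~ /\bgroups=(\S+)/ or return;
    # Step 2: every run of digits directly followed by "(" is a group ID;
    # digits inside names such as cgrp1 are followed by ")" and are skipped.
    return $groups =~ /(\d+)\(/g;
}

my $input = 'uid=2301(cuser301) gid=32(rpc) groups=32(rpc),1001(cgrp1),1002(cgrp2)';
print join(',', group_ids($input)), "\n";
```

Restricting the second match to the groups field keeps the uid and gid numbers out of the result without any lookbehind or \G bookkeeping.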
Try this (note that the pattern requires a leading comma, so the first entry after groups= is skipped):
$ echo <INPUT> | perl -ne 'print "$1," while /,(\d+)\(/g'
Check https://regex101.com/r/uZ9tO6/1

Perl MultiLine Regex Statement

I'm having some trouble getting a multiline regex statement to work for me.
Basically, I'm trying to remove empty lines following a ) ->. Matching the multiline section has been a bit tricky. Here's what I have so far:
perl -00 -p -i -e 's/\) ->(?=[^\n]*\n\n)$/\) ->\n/m' $filename
Here's my input/output:
Input:
setUp =
config: (cb) ->
randomFunction (cb)->
cb?()
nestedObject:
key: (cb) ->
cb?()
Output:
setUp =
config: (cb) ->
cb?()
nestedObject:
key: (cb) ->
cb?()
Just do line by line processing with a flip-flop range, removing all blank lines as the end condition:
perl -i -pe '/\) ->\s*$/...!s/^\s*$//' file.txt
Perhaps a little easier to read:
perl -i -pe 'm{\) ->\s*$}...!s/^\s*$//' file.txt
Move the newline characters out of the look-ahead. Try
s/\) ->(?=[^\n]*)\n\n/\) ->\n/mg;
Characters in the look-ahead are not replaced in a substitution.
(Actually, I don't see why you even need a look-ahead.
s/\) ->.*\n\n/\) ->\n/mg;
also does the job, and any non-zero length sequence that matched the look-ahead would also make the whole pattern match fail.)
You also may want to use the /g flag, since you want to do this substitution more than once in the document.
You can use this replacement:
s/\) ->\R\K\R+//g
\R is shorthand for an atomic group that matches the common newline sequences (including \r\n).
\K discards everything matched to its left from the match result, so only what follows it is replaced.
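A small sketch of that substitution applied to a string. Both \R and \K need Perl 5.10 or later. The indentation and blank lines in the sample are assumed, since the question's input is shown here without them, and the sub name is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: drop the empty lines that directly follow a ") ->" line.
sub strip_blank_lines_after_arrow {
    my ($src) = @_;
    # First \R: the newline ending the ") ->" line, kept because \K resets the match start.
    # \R+: one or more further newlines, i.e. the empty lines, which are deleted.
    $src =~ s/\) ->\R\K\R+//g;
    return $src;
}

my $source = "setUp =\n"
           . "  config: (cb) ->\n"
           . "\n"
           . "    cb?()\n"
           . "  nestedObject:\n"
           . "    key: (cb) ->\n"
           . "\n\n"
           . "      cb?()\n";

print strip_blank_lines_after_arrow($source);
```

Lines that contain only spaces are not treated as empty by this pattern, and a ") ->" followed by trailing spaces is not matched either; both would need an extra [ \t]* in the appropriate place.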

Perl - Unexpected behaviour with Regex in Array

I'm trying to match lines that have
"/foldera/folderb/folderc/folderd/file.ext##/main" + "/" + ANY_NUMBER:
so for example:
(.+)(main)(.\d)
The lines:
/foldera/folderb/folderc/folderd/file.ext##/main
/foldera/folderb/folderc/folderd/file.ext##/main/0
/foldera/folderb/folderc/folderd/file.ext##/main/1
/foldera/folderb/folderc/folderd/file.ext##/main/2
/foldera/folderb/folderc/folderd/file.ext##/main/3
/foldera/folderb/folderc/folderd/file.ext##/main/4
/foldera/folderb/folderc/folderd/file.ext##/main/5 (RLT-abcde, BLD-abcde, DEV-abcde)
/foldera/folderb/folderc/folderx/file12.ext##/main/0
/foldera/folderb/folderc/folderx/file12.ext##/main/1
/foldera/folderb/folderc/folderx/file12.ext##/main/2
/foldera/folderb/folderc/folderx/file12.ext##/main/3
/foldera/folderb/folderc/folderx/file12.ext##/main/4
/foldera/folderb/folderc/folderx/file12.ext##/main/5
/foldera/folderb/folderc/folderx/file12.ext##/main/6 (RLS-abcde-5.0, RLS-abcde-4.1)
While my regex matches the desired lines (I checked it at http://www.regexe.com/), in my Perl program it does not match
/foldera/folderb/folderc/folderd/file.ext##/main
but it does match:
/foldera/folderb/folderc/folderd/file.ext##/main/5 (RLT-abcde, BLD-abcde, DEV-abcde)
Here is the code:
use warnings;
use strict;
my @file_list = `find /folder -type f -name '*.ext'|xargs cleartool lsvtree -all`;
foreach my $file(@file_list){
if ($file=~m/(.+)(main)(.\d)/g){
print $file;
}
}
I'm pretty sure that I'm making a stupid mistake somewhere, but I just can't see it!
Thank you in advance for your advice.
P.S. I tried it under Perl 5.8 and Perl 5.18 with the same results, OS is Solaris.
Change
print $file;
to:
print "$&\n";
so you only print the part of the line that was matched by the regexp ($& is also available as $MATCH if you add use English; to the program).
You should also change \d to \d+, to allow for numbers with more than one digit.
From a quick look:
/foldera/folderb/folderc/folderd/file.ext##/main
has no number at the end, and \d requires a digit ;-)
You may also find this site useful: http://regexpal.com/
I think it is not matching your line because your regex explicitly requires a digit at the end.
Try changing your regex to this (note the curly braces at the end):
(.+)(main)(.\d){0,1}
Or personally I would write it like this:
(.*?)main(\/\d*){0,1}
Hope this helps!
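To make the optional-suffix idea concrete, here is a sketch that accepts both the bare .../main line and the .../main/N lines. The sub name and the exact anchoring are assumptions, not the asker's code.

```perl
use strict;
use warnings;

# Hypothetical helper: splits a version-tree line into its path and its
# optional version number; returns an empty list when the line does not match.
sub parse_version_line {
    my ($line) = @_;
    # (?:/(\d+))? makes the whole "/<digits>" suffix optional, and the
    # lookahead requires whitespace or end of line after it.
    return unless $line =~ m{^(.+?)/main(?:/(\d+))?(?=\s|$)};
    return ($1, $2);    # $2 is undef when no number follows "main"
}

for my $line (
    "/foldera/folderb/folderc/folderd/file.ext##/main\n",
    "/foldera/folderb/folderc/folderd/file.ext##/main/5 (RLT-abcde, BLD-abcde, DEV-abcde)\n",
) {
    my ($path, $version) = parse_version_line($line) or next;
    printf "%s => %s\n", $path, defined $version ? $version : '(no number)';
}
```

Capturing the digits with \d+ rather than \d also covers version numbers with more than one digit.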