Printing All Regex Matches

I'm using the perl/sed commands below to capture and print regex matches. Unfortunately, both print only the first match in a line rather than all matches. How can I modify either or both commands to print all matches? grep and awk alternatives are welcome.
perl -nle 'print "$1" if /.*([0|1]\.[0-9]{0,2}).*/'
sed -rne "s/.*([0|1]\.[0-9]{0,2})/\1/p"

Use while with the /g modifier on the regex instead of if. You also need to get rid of the needless .* around the regex.
perl -nle 'print $1 while /([0|1]\.[0-9]{0,2})/g'
Finally, [0|1] should probably just be reduced to [01], unless you want to match a | before the period.

perl -nle 'print for /([0|1]\.[0-9]{0,2})/g'

Related

Perl oneliner in bash: print matches from complex regexp

I have this complex regex
/"_outV":([0-9]+),"_inV":([0-9]+),"_label":"([a-z\/]+)",/
and I need to parse a file (which is all on one single line) and output only the matched groups like
print $1 $2 $3
Currently, the only one-liner that almost works is
perl -pe 'while(m/"_outV":([0-9]+)\,"_inV":([0-9]+)\,"_label":"([a-z\/]+)\"\,/g){print "$1 $2 $3\n";}'
But it ends up echoing also the entire file at the end, after the matches.
How do I fix this?
I thought that removing the -p option would do the trick, but it doesn't.

Looks good to me.
You need to replace -p with -n: -p prints every input line after your code has run, whereas -n only loops over the input and prints nothing on its own.
A few finer points:
No need to backslash the , and " characters.
You can conveniently replace [0-9] with \d.
By using a different delimiter for the regex, you won't need to escape the /.
Optimized end result:
perl -ne 'print "$1 $2 $3\n" while m{"_outV":(\d+),"_inV":(\d+),"_label":"([a-z/]+)",}g'

sed regex with alternative on Solaris doesn't work

I'm trying to use sed with a regex on Solaris, but it doesn't work.
I need to show only the lines matching my regex.
sed -n -E '/^[a-zA-Z0-9]*$|^a_[a-zA-Z0-9]*$/p'
input file:
grtad
a_pitr
_aupa
a__as
baman
12353
ai345
ki_ag
-MXx2
!!!23
+_)#*
I want to show only the lines matching the above regex:
grtad
a_pitr
baman
12353
ai345
Is there another way to use alternation? Is it possible in Perl?
Thanks for any solutions.

With Perl
perl -ne 'print if /^(a_)?[a-zA-Z0-9]*$/' input.txt
The (a_)? matches a_ zero or one time, so it is optional. It may or may not be there.
The (a_) also captures the match, which is not needed, so you can use (?:a_)? instead. The ?: makes () only group what is inside (so ? applies to the whole thing) without remembering it.

with grep
$ grep -xiE '(a_)?[a-z0-9]*' ip.txt
grtad
a_pitr
baman
12353
ai345
-x matches the whole line
-i ignores case
-E enables extended regex; if it is not available, use grep -xi '\(a_\)\?[a-z0-9]*' (\? is a GNU extension)
(a_)? matches a_ zero or one time
[a-z0-9]* matches zero or more letters or digits
With sed
sed -nE '/^(a_)?[a-zA-Z0-9]*$/p' ip.txt
or, with GNU sed
sed -nE '/^(a_)?[a-z0-9]*$/Ip' ip.txt

Perl one-liner to match all occurrences of regex

For multiple lines of text similar to this:
"views_panes","gw_hero_small_site_placement-panel_pane_1",1,"a:0:{}","a:10:{s:14:\"override_title\";i:1;s:19:\"override_title_text\";s:0:\"\";s:9:\"view_mode\";s:11:\"all_purpose\";s:11:\"image_style\";s:7:\"default\";s:13:\"style_options\";a:2:{s:10:\"show_image\";i:0;s:9:\"show_date\";i:0;}s:18:\"gw_display_options\";s:22:\"gw_all_purpose_sidebar\";s:13:\"show_readmore\";a:1:{s:18:\"show_readmore_link\";i:0;}s:14:\"readmore_title\";s:9:\"Read more\";s:13:\"readmore_link\";s:0:\"\";s:7:\"exposed\";a:1:{s:23:\"field_hero_sub_type_tid\";s:3:\"547\";}}","a:0:{}","a:1:{s:8:\"settings\";N;}","a:0:{}","a:0:{}",0,"s:0:\"\";"
I am looking to match all instances of (s:)(\d{1,}:)\"(string)\"; to get something like this:
s:14:override_title
s:18:show_readmore_link
s:3:547
This line, with or without /g, prints only the first instance:
perl -nle 'print "$1 $2 $3" if /(s:)(\d{1,}:)\\"(.*?)\\";/g' tmp.txt
s:14:override_title
I suppose I can try to put this in a perl script putting all matches into an array, but am hoping to do this using a one-liner (-: What am I missing?
Mac OS X 10.7.5, perl 5.12.3.

It seems you have only one line, so try:
perl -nle 'print "$1 $2 $3" while(/.*?(s:)(\d{1,}:)\\"(.*?)\\";/g)' tmp.txt

Perl regex: remove everything (including line breaks) until a match is found

Apologies for the simple question. I don't clean text or use regex often.
I have a large number of text files in which I want to remove every line until my regex finds a match. There's usually about 15 lines of fluff before I find a match. I was hoping for a perl one-liner that would look like this:
perl -p -i -e "s/.*By.unanimous.vote//g" *.txt
But this doesn't work.
Thanks

Solution using the flip-flop operator:
perl -pi -e '$_="" unless /By.unanimous.vote/ .. 1' input-files
Shorter solution that also uses the x=!! pseudo operator:
perl -pi -e '$_ x=!! (/By.unanimous.vote/ .. 1)' input-files

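Try the following. The -0777 switch reads each file as a single string, so the match can span lines (-00 would only read one paragraph at a time).
To remove everything up to and including the last By.unanimous.vote:
perl -0777 -pe "s/.*By.unanimous.vote//s" inputfile > outputfile
To remove everything up to and including the first By.unanimous.vote:
perl -0777 -pe "s/.*?By.unanimous.vote//s" inputfile > outputfile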

Try something like this (single quotes keep a Unix shell from expanding $a):
perl -pi -e '$a=1 if !$a && /By\.unanimous\.vote/i; s/.*//s if !$a' *.txt
This should remove the lines before the matched line. If you also want to remove the matching line, you can do something like:
perl -pi -e '$a=1 if !$a && s/.*By\.unanimous\.vote.*//is; s/.*//s if !$a' *.txt
Shorter versions:
perl -pi -e '$a++if/By\.unanimous\.vote/i;$a||s/.*//s' *.txt
perl -pi -e '$a++if s/.*By\.unanimous\.vote.*//si;$a||s/.*//s' *.txt

You haven't said whether you want to keep the By.unanimous.vote part, but it sounds to me like you want:
s/[\s\S]*?(?=By\.unanimous\.vote)//
Note the missing g flag and the lazy *? quantifier, because you want to stop matching once you hit that string. This should preserve By.unanimous.vote and everything after it. The [\s\S] matches newlines. In Perl, you can also do this with:
s/.*?(?=By\.unanimous\.vote)//s

Solution using awk
awk '/.*By.unanimous.vote/{a=1} a==1{print}' input > output

Problem with perl multiline matching

I'm trying to use a perl one-liner to update some code that spans multiple lines and am seeing some strange behavior. Here's a simple text file that shows the problem I'm seeing:
ABCD START
STOP EFGH
I expected the following to work but it doesn't end up replacing anything:
perl -pi -e 's/START\s+STOP/REPLACE/s' input.txt
After doing some experimenting I found that the \s+ in the original regex will match the newline but not any of the whitespace on the 2nd line, and adding a second \s+ doesn't work either. So for now I'm doing the following workaround, which is to add an intermediate regex that only removes the newline:
perl -pi -e 's/START\s+/START/s' input.txt
This creates the following intermediate file:
ABCD START STOP EFGH
Then I can run the original regex (although the /s is no longer needed):
perl -pi -e 's/START\s+STOP/REPLACE/s' input.txt
This creates the final, desired file:
ABCD REPLACE EFGH
It seems like the intermediate step should not be necessary. Am I missing something?

You were close. Perl must see both lines at once, so you need either -00 (paragraph mode) or -0777 (read the whole file as one string):
perl -0777 -pi -e 's/START\s+STOP/REPLACE/' input.txt

perl -p processes the file one line at a time. The regex you have is correct, but it is never matched against the multi-line string.
A simple strategy, assuming the file will fit in memory, is to read the whole thing (do this without -p):
$/ = undef;
$file = <>;
$file =~ s/START\s+STOP/REPLACE/sg;
print $file;
Note, I have added the /g modifier to specify global replacement.
As a shortcut for all that extra boilerplate, you can use your existing script with the -0777 option: perl -0777pi -e 's/START\s+STOP/REPLACE/sg'. Adding /g is still needed if you may need to make multiple replacements within the file.
A hiccup that you might run into, although not with this regex: if the regex were START.+STOP, and a file contains multiple START/STOP pairs, greedy matching of .+ will eat everything from the first START to the last STOP. You can use non-greedy matching (match as little as possible) with .+?.
If you want to use the ^ and $ anchors for line boundaries anywhere in the string, then you also need the /m regex modifier.

A relatively simple one-liner (reading the file in memory):
perl -pi -e 'BEGIN{undef $/;} s/START\s+STOP/REPLACE/sg;' input.txt
Another alternative (not so simple) that does not need the whole file in memory as long as matches turn up along the way:
perl -ni -e '$a .= $_;
if ( $a =~ s/START\s+STOP/REPLACE/s ) { print $a; $a = ""; }
if ( eof ) { print $a; $a = ""; }' input.txt

perl -MFile::Slurp -e '$content = read_file(shift); $content =~ s/START\s+STOP/REPLACE/s; print $content' input.txt

Here's a one-liner that doesn't read the entire file into memory at once:
perl -i -ne 'if (($x = $last . $_) =~ s/START\n\s*STOP/REPLACE/)
{ print $x; $last = ""; } else { print $last; $last = $_; }
print $last if eof ARGV' input.txt
