Perl script versus one-liner - differences in functionality with regex

I have a Perl program that reads STDIN (piped from another bash command). The output of the bash command is quite large, about 200 lines. I want to take the entire input (multiple lines) and feed it to a Perl one-liner, but so far nothing I've tried has worked. By contrast, if I use the following Perl script (.pl file):
#!/usr/bin/perl
use strict;
my $regex = qr/{(?:\n|.)*}(?:\n)/p;
if ( <> =~ /$regex/g ) {
print "${^MATCH}\n";
}
And execute my bash command like this:
<bash command> | perl -0777 try_m_1.pl
It works. But as a one-liner, it doesn't work with the same regex/bash command. The result of the print command is nothing. I've tried it like this:
<bash command> | perl -0777 -e '/{(?:\n|.)*}(?:\n)/pg && print "$^MATCH";'
and this:
<bash command> | perl -0777 -e '/{(?:\n|.)*}(?:\n)/g; print "$1\n";'
And a bunch of other things, too many to list them all. I'm new to perl and only want to use it to get regex output from the text. If there's something better than perl to do this (I understand from reading around that sed wouldn't work for this?) feel free to suggest.
Update: based on @zdim's answer, I tried the following, which worked:
<bash command> | perl -0777 -ne '/(\{(?:\n|.)*\}(?:\n))/s and print "$1\n"'
I guess my regex needed to be wrapped in () and the { curly braces needed to be escaped.

A one-liner needs -n (or -p) to process input: those switches open the files or attach the streams and set up the read loop. It needs that even though -0777 unsets the input record separator so that the input is read in one go; see Why use the -p|-n in slurp mode in perl one liner?
The (?:\n|.) in that regex matches either a newline or any character other than a newline, and there is a modifier for that, /s, with which . matches a newline as well. The match has to sit between literal curly braces, which you need to escape in newer Perls. The newline that follows doesn't need grouping.
So altogether you'd have
<bash command> | perl -0777 -ne'/(\{(.*)\}\n)/s and print "$1\n"'

Related

Printing filename for a multi-line pattern found in multiple files

I am using perl -pe together with grep to do a multi-line grep. I need this because "\" is used as a line-continuation character, so continued lines have to be joined first.
So my file is
record -field X \
-field Y
I am doing
perl -pe 's/\\\n/ /' a/b/c/*/records/*.rec | grep "\-field.*X.*\-field.*Y"
The problem with this is that it just gives me the grep result, without telling me which file it came from. Is there a way around this? I need to know which files contain a match.
I could write a foreach loop in a shell script, but was wondering whether a one-liner version is possible.
Once you are inside a Perl program why go to system's grep? Perl's tools are far more abundant, rounded, and usable than the shell's. One way
perl -0777 -nE'say "$ARGV: $_" for
grep { /\-field.*X.*\-field.*Y/ } split /\n/, s{\\\n}{ }gr' file-list
(broken into lines for readability)
We read the whole file into $_ ("slurp" it) using the -0777 switch, so as to be able to merge those particular lines. Each backslash-newline (\\\n) is then substituted with a space and the resulting string returned (by virtue of the /r modifier), and that string is split on \n to regenerate lines.
Then that list of lines is fed to grep with your desired pattern, and the ones that match the pattern are passed through. So then they are printed, prepended with the name of the currently processed file, available in the $ARGV variable.
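The answer is to use $ARGV, which holds the name of the file currently being read ($ARGV[0] is the next file still waiting on the command line, not the current one):
perl -pe 'print "$ARGV: ";s/\\\n/ /' a/b/c/*/records/*.rec | grep "\-field.*X.*\-field.*Y"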

Delete blank line before a pattern. What's wrong? Currently using Perl but open to sed/AWK

In a long document, I want to selectively delete the particular newlines that precede the exact string \begin{enumerate*}, ideally with a one-liner in bash or zsh.
That is, I want to convert test.tex:
Text in paragraphs.
More text
\begin{enumerate*} \item thing
to
Text in paragraphs.
More text \begin{enumerate*} \item thing
with a one-liner like
cat test.tex | perl -p -e 's/\n(?=(\\begin\{enumerate\*\}))/ /'
or
cat test.tex | perl -p -e 's/\n\\begin\{enumerate\*\}/\\begin{enumerate*}/'
but I must be missing something because it doesn't make any change.
I also clearly don't need a regular expression here. If there's a way to do this with exact string matching instead of regex, I'd rather use that way. For instance, in R I could do this with sub("\n\\begin{enumerate*}", "\\begin{enumerate*}", fixed = TRUE).
You can use the -0 (digit zero) switch with Perl to specify the input record separator. Traditionally -0777 is used to read the entire file at once.
You also need to be careful about regex metacharacters in your search string. Characters like *, {, } and \ mean something special within a regex pattern, and you should escape them, usually with a \Q ... \E construct.
Taking these points into account, this should work for you:
perl -0777 -pe' s/\n+(?=\Q\begin{enumerate*}\E)/ / ' myfile
perl -p processes a file line by line, so you can't expect this regex to match: each record ends at its newline, and the text after that newline belongs to the next record.
I would recommend something like
perl -e '$text = join "", <>; $text =~ s/your_regex_here//; print $text' test.txt
Mind that it loads the whole file to memory.
Also, if you want to modify the file in place, you can't just redirect with > test.txt; see this question.
I found a solution with sed (number 25 on this page) that doesn't read the entire file into memory:
sed -i bak -n '/^\\begin{enumerate\*}/{x;d;};1h;1!{x;p;};${x;p;}' test.tex
The downside is that this doesn't actually join the two lines; instead it produces
Text in paragraphs.
More text
\begin{enumerate*} \item thing
which is good enough for what I need (latex treats single newlines the same as regular spaces)

Copy matched regex to new file

I want to copy regex matched text to a new file.
<SHOPITEM>([\s\S]*?)<YEAR>2015<\/YEAR>([\s\S]*?)<\/SHOPITEM>
([\s\S]*?) = any text, any line
This works (I am able to find) in Sublime editor, but how this regex looks for sed/grep (or any other Unix tool)?
sed and grep normally work line by line rather than in multiline mode, although multiline matching is still possible with them under certain conditions.
I would advise using Perl, which should already be installed on your computer:
perl -0777 -ne 'print $& if /<SHOPITEM>([\s\S]*?)<YEAR>2015<\/YEAR>([\s\S]*?)<\/SHOPITEM>/i;'
Be aware that this regex won't work if you have nested <shopitem> tags or even multiple occurrences. Use an XML parser instead.
You can also write a program that parses your XML file; this time it will capture all the matches.
myparser.pl:
#!/usr/bin/env perl
undef $/;
$_ = <>;
print "$&\n" while /<(shopitem)>[\s\S]*?<(year)>2015<\/\2>[\s\S]*?<\/\1>/ig;
That you can execute:
$ chmod u+x myparser.pl
$ ./myparser.pl myfile.xml
I'm not the best scripter, but I think this should work:
grep "<SHOPITEM>" infile | grep "<YEAR>2015" | sed -e "s/<[^>]*>//g" | sed "s/2015/ /g" > outfile
Edit: I didn't match the regex, instead I got SHOPITEMs with YEAR 2015 tag and removed all the unwanted parts.
Edit: I'd do it this way, but I'm not sure it's the most elegant solution.

Perl match newline in `-0` mode

Question
Suppose I have a file like this:
I've got a loverly bunch of coconut trees.

Newlines!

Bahahaha
Newlines!
the end.
I'd like to replace an occurrence of "Newlines!" that is surrounded by blank lines with (say) NEWLINES!. So, ideal output is:
I've got a loverly bunch of coconut trees.

NEWLINES!

Bahahaha
Newlines!
the end.
Attempts
Ignoring "surrounded by newlines", I can do:
perl -p -e 's#Newlines!#NEWLINES!#g' input.txt
Which replaces all occurrences of "Newlines!" with "NEWLINES!".
Now I try to pick out only the "Newlines!" surrounded with \n:
perl -p -e 's#\nNewlines!\n#\nNEWLINES!\n#g' input.txt
No luck (note - I don't need the s switch because I'm not using . and I don't need the m switch because I'm not using ^ and $; regardless, adding them doesn't make this work). Lookaheads/behinds don't work either:
perl -p -e 's#(?<=\n)Newlines!(?=\n)#NEWLINES!#g' input.txt
After a bit of searching, I see that perl reads in the file line-by-line (makes sense; sed does too). So, I use the -0 switch:
perl -0p -e 's#(?<=\n)Newlines!(?=\n)#NEWLINES!#g' input.txt
Of course this doesn't work -- -0 replaces new line characters with the null character.
So my question is -- how can I match this pattern (I'd prefer not to write any perl beyond the regex 's#pattern#replacement#flags' construct)?
Is it possible to match this null character? I did try:
perl -0p -e 's#(?<=\0)Newlines!(?=\0)#NEWLINES!#g' input.txt
to no effect.
Can anyone tell me how to match newlines in perl? Whether in -0 mode or not? Or should I use something like awk? (I started with sed but it doesn't seem to have lookahead/behind support even with -r. I went to perl because I'm not at all familiar with awk).
cheers.
(PS: this question is not what I'm after because their problem had to do with a .+ matching newline).
Following should work for you:
perl -0pe 's#(?<=\n\n)Newlines!(?=\n\n)#NEWLINES!#g'
I think the way you went about things caused you to combine possible solutions in a way that didn't work.
If you use the in-place editing flag you can do it like this:
perl -0p -i.bk -e 's/\n\nNewlines!\n\n/\n\nNEWLINES!\n\n/g' input.txt
I have doubled the \n's to make sure you only get the ones with empty lines above and below.
If the file is small enough to be slurped into memory all at once:
perl -0777 -pe 's/\n\nNewlines!(?=\n\n)/\n\nNEWLINES!/g'
Otherwise, keep a buffer of the last three lines read:
perl -ne 'push @buffer, $_; $buffer[1] = "NEWLINES!\n" if @buffer == 3 && ' \
-e 'join("", @buffer) eq "\nNewlines!\n\n"; ' \
-e 'print shift @buffer if @buffer == 3; END { print @buffer }'

How do I replace multiple newlines with a single one with Perl's Regular Expressions?

I've got a document containing empty lines (\n\n). They can be removed with sed:
echo $'a\n\nb'|sed -e '/^$/d'
But how do I do that with an ordinary regular expression in perl? Anything like the following just shows no result at all.
echo $'a\n\nb'|perl -p -e 's/\n\n/\n/s'
You need to use s/^\n\z//. Input is read by line so you will never get more than one newline. Instead, eliminate lines that do not contain any other characters. You should invoke perl using
perl -ne 's/^\n\z//; print'
No need for the /s switch.
The narrower problem of not printing blank lines is more straightforward:
$(input) | perl -ne 'print if /\S/'
will output all lines except the ones that only contain whitespace.
The input is three separate lines, and perl with the -p option only processes one line at a time.
The workaround is to tell perl to slurp in multiple lines of input at once. One way to do it is:
echo $'a\n\nb' | perl -pe 'BEGIN{$/=undef}; s/\n\n/\n/'
Here $/ is the record separator variable, which tells perl how to parse an input stream into lines.