Printing filename for a multi-line pattern found in multiple files - regex

I am using perl -pe along with grep to do a multi-line grep. I need this because when "\" is used as a line-continuation character, the lines have to be joined first.
So my file is:
record -field X \
-field Y
I am doing
perl -pe 's/\\\n/ /' a/b/c/*/records/*.rec | grep "\-field.*X.*\-field.*Y"
The problem is that this only gives me the grep result, without telling me which file had the match. Is there a way around this? I need to know which files contain it as well.
I could write a foreach shell script, but I was wondering whether a one-liner version is possible.

Once you are inside a Perl program, why go to the system's grep? Perl's tools are far more abundant, well-rounded, and usable than the shell's. One way:
perl -0777 -nE'say "$ARGV: $_" for
grep { /\-field.*X.*\-field.*Y/ } split /\n/, s{\\\n}{ }gr' file-list
(broken into lines for readability)
We read the whole file into $_ ("slurp" it) using the -0777 switch, so that the continued lines can be merged. Each backslash-newline pair (matched by \\\n) is then replaced with a space, the resulting string is returned (thanks to the /r modifier), and it is split on \n to regenerate the lines.
That list of lines is then fed to grep with your pattern, and only the lines that match are passed through. They are printed, prefixed with the name of the file currently being processed, which is available in the $ARGV variable.
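For comparison, here is a rough Python sketch of the same slurp, join, split and filter steps. It is an illustrative translation rather than the answer's code: it assumes each file's contents are already loaded into a dict (the sample files and the grep_joined helper are hypothetical), and the filename prefix stands in for $ARGV.

```python
import re

# The question's pattern; "\-" in the original is just a literal hyphen.
PATTERN = re.compile(r"-field.*X.*-field.*Y")


def grep_joined(files):
    """files maps a filename to that file's full (slurped) contents."""
    hits = []
    for name, text in files.items():
        # Replace every backslash-newline continuation with a space,
        # like s{\\\n}{ }gr in the Perl one-liner.
        joined = re.sub(r"\\\n", " ", text)
        # Split back into logical lines and keep the matching ones,
        # prefixed with the filename (the role $ARGV plays in Perl).
        for line in joined.split("\n"):
            if PATTERN.search(line):
                hits.append(f"{name}: {line}")
    return hits


# Hypothetical contents standing in for the *.rec files.
files = {
    "a.rec": "record -field X \\\n-field Y\n",  # continued line: should match
    "b.rec": "record -field X\n-field Y\n",     # no continuation: no match
}
for hit in grep_joined(files):
    print(hit)
```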

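The answer is to use $ARGV, which holds the name of the file currently being read (note that $ARGV[0] is the first element still left in @ARGV, which by then is the next file, not the current one):
perl -pe 'print "$ARGV: "; s/\\\n/ /' a/b/c/*/records/*.rec | grep "\-field.*X.*\-field.*Y"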

Related

Perl script versus one-liner - differences in functionality with regex

I have a Perl program that reads STDIN (piped from another bash command). The output from the bash command is quite large, about 200 lines. I want to take the entire input (multiple lines) and feed it to a Perl one-liner, but so far nothing I've tried has worked. However, if I use the following Perl script (.pl file):
#!/usr/bin/perl
use strict;
my $regex = qr/{(?:\n|.)*}(?:\n)/p;
if ( <> =~ /$regex/g ) {
print "${^MATCH}\n";
}
And execute my bash command like this:
<bash command> | perl -0777 try_m_1.pl
It works. But as a one-liner, it doesn't work with the same regex and bash command; the print command outputs nothing. I've tried it like this:
<bash command> | perl -0777 -e '/{(?:\n|.)*}(?:\n)/pg && print "$^MATCH";'
and this:
<bash command> | perl -0777 -e '/{(?:\n|.)*}(?:\n)/g; print "$1\n";'
And a bunch of other things, too many to list them all. I'm new to Perl and only want to use it to get regex output from the text. If there's something better than Perl for this (I understand from reading around that sed wouldn't work here?), feel free to suggest it.
Update: based on @zdim's answer, I tried the following, which worked:
<bash command> | perl -0777 -ne '/(\{(?:\n|.)*\}(?:\n))/s and print "$1\n"'
I guess my regex needed to be wrapped in () and the { curly braces needed to be escaped.
A one-liner needs -n (or -p) to process input, so that files are opened, streams attached, and a loop set up. It still needs that even though -0777 unsets the input record separator so that the whole file is read at once; see Why use the -p|-n in slurp mode in perl one liner?
The (?:\n|.) part matches either a newline or any character other than a newline, and there is a modifier for exactly that: /s, under which . matches a newline as well. That then goes between the curly braces, which you should escape in newer Perls. The newline that follows doesn't need grouping.
So altogether you'd have
<bash command> | perl -0777 -ne'/(\{(.*)\}\n)/s and print "$1\n"'
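To see why /s lets you drop the (?:\n|.) alternation, here is a small Python sketch. Python's re.DOTALL flag corresponds to Perl's /s, and the sample text is made up for illustration. Without the flag, . stops at a newline and the pattern finds nothing.

```python
import re

text = "header\n{\n  key: value\n}\ntrailer\n"

# Alternation form from the question: a newline OR any non-newline character.
alt = re.search(r"\{(?:\n|.)*\}\n", text)

# Equivalent form with DOTALL (Perl's /s): '.' also matches '\n'.
dotall = re.search(r"\{.*\}\n", text, re.DOTALL)

# Without DOTALL, '.' cannot cross the newline after '{', so there is no match.
plain = re.search(r"\{.*\}\n", text)

print(alt.group(0) == dotall.group(0))
print(plain)
```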

Delete blank line before a pattern. What's wrong? Currently using Perl but open to sed/AWK

In a long document, I want to selectively delete the particular newlines that precede the exact string \begin{enumerate*}, ideally with a one-liner in bash or zsh.
That is, I want to convert test.tex:
Text in paragraphs.
More text

\begin{enumerate*} \item thing
to
Text in paragraphs.
More text \begin{enumerate*} \item thing
with a one-liner like
cat test.tex | perl -p -e 's/\n(?=(\\begin\{enumerate\*\}))/ /'
or
cat test.tex | perl -p -e 's/\n\\begin\{enumerate\*\}/\\begin{enumerate*}/'
but I must be missing something because it doesn't make any change.
I also clearly don't need a regular expression here. If there's a way to do this with exact string matching instead of regex, I'd rather use that way. For instance, in R I could do this with sub("\n\\begin{enumerate*}", "\\begin{enumerate*}", fixed = TRUE).
You can use the -0 (digit zero) switch to tell Perl which input record separator to use. Traditionally, -0777 is used to read the entire file at once.
You also need to be careful about regex metacharacters in your search string. Characters like *, {, } and \ mean something special within a regex pattern, and you should escape them, usually with a \Q ... \E construct.
Taking these points into account, this should work for you:
perl -0777 -pe' s/\n+(?=\Q\begin{enumerate*}\E)/ / ' myfile
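As a rough cross-check, the same substitution can be sketched in Python, where re.escape does the job of \Q ... \E. This assumes the whole file has already been read into one string (the equivalent of -0777); the sample text is adapted from the question.

```python
import re

# Whole file as one string, with a blank line before the marker.
text = "Text in paragraphs.\nMore text\n\n\\begin{enumerate*} \\item thing\n"

# re.escape plays the role of Perl's \Q...\E: it backslash-escapes
# \, {, } and * so they are matched literally.
marker = re.escape(r"\begin{enumerate*}")

# Replace one or more newlines that are followed by the marker with a
# single space; the lookahead leaves the marker itself in place.
result = re.sub(r"\n+(?=" + marker + ")", " ", text)
print(result)
```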
perl -p processes a file line by line, so you can't expect this regex to match across lines.
I would recommend something like
perl -e '$text = join "", <>; $text =~ s/your_regex_here//; print $text' test.txt
Mind that it loads the whole file into memory.
Also, if you want to modify the file in place, you can't just redirect with > test.txt, because the shell truncates the file before perl reads it; see this question.
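The point about line-by-line processing can be demonstrated with a short Python sketch (an illustration, not the answer's code): applying the pattern to each line separately never matches, while applying it to the joined text does.

```python
import re

text = "More text\n\n\\begin{enumerate*} \\item thing\n"
pattern = re.compile(r"\n+(?=\\begin\{enumerate\*\})")

# Line by line, like perl -p: each chunk ends with at most one '\n', and the
# marker lives in the *next* chunk, so the lookahead can never succeed.
per_line = "".join(
    pattern.sub(" ", line) for line in text.splitlines(keepends=True)
)

# Whole text at once, like join "", <> (or -0777): the newlines and the
# marker are in the same string, so the substitution applies.
whole = pattern.sub(" ", text)

print(per_line == text)
print(whole)
```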
I found a solution with sed (number 25 on this page) that doesn't read the entire file into memory:
sed -i.bak -n '/^\\begin{enumerate\*}/{x;d;};1h;1!{x;p;};${x;p;}' test.tex
The downside is that this doesn't actually join the two lines; instead it produces
Text in paragraphs.
More text
\begin{enumerate*} \item thing
which is good enough for what I need (LaTeX treats a single newline the same as a regular space).
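The sed script never holds more than one previous line, which is why it doesn't need to read the whole file. A Python generator can sketch the same hold-space idea; drop_line_before is a hypothetical helper, and like the sed version it removes the line before the match rather than joining the two lines.

```python
def drop_line_before(lines, prefix):
    """Yield lines, skipping the one immediately before any line that starts
    with `prefix`. Only one previous line is held, like sed's hold space."""
    held = None
    for line in lines:
        if line.startswith(prefix):
            # Like {x;d;}: discard the held line and hold the match instead.
            held = line
            continue
        if held is not None:
            # Like 1!{x;p;}: print the previous line, hold the current one.
            yield held
        held = line
    if held is not None:
        # Like ${x;p;}: flush the last held line at end of input.
        yield held


lines = [
    "Text in paragraphs.\n",
    "More text\n",
    "\n",
    "\\begin{enumerate*} \\item thing\n",
]
out = list(drop_line_before(lines, "\\begin{enumerate*}"))
print("".join(out), end="")
```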

Delete all characters/words that don't match a pattern

I have a text without line breaks, and I want to delete all the characters that don't match a pattern.
The pattern would be from the word parameter until it finds }}. For example if i have this entry:
KHJLMNNamespaceparameter:{{"Hello i am here"}}NamespaceHSKFSAFSLLLJparameter:{{H}}...
I would like to delete everything and leave this in the file: parameter:{{"Hello i am here"}} parameter:{{H}}.
All I found was how to delete a line that doesn't contain a pattern, but I can't find anything for a huge file without \n (line endings). Would it be possible to do this using sed, awk or Vi?
Thanks!
$ awk 'BEGIN{RS=ORS="}}"} sub(/.*parameter/,"parameter")' file
parameter:{{"Hello i am here"}}parameter:{{H}}
Note that this is gawk-specific due to the multi-char RS.
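For readers without gawk, the record-splitting idea can be sketched in Python by splitting on "}}" and keeping the text from the last "parameter" in each record. extract_params is a hypothetical helper that mimics, but is not, the awk program.

```python
def extract_params(text, sep="}}", key="parameter"):
    out = []
    # Treat "}}" as the record separator, like gawk's RS="}}".
    for record in text.split(sep):
        # sub(/.*parameter/, "parameter") keeps the text from the LAST
        # "parameter" (.* is greedy); records without it are not printed.
        i = record.rfind(key)
        if i != -1:
            out.append(record[i:] + sep)  # re-append the separator, like ORS
    return "".join(out)


text = ('KHJLMNNamespaceparameter:{{"Hello i am here"}}'
        'NamespaceHSKFSAFSLLLJparameter:{{H}}...')
print(extract_params(text))
```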
You can use this grep with -P (PCRE) regex:
grep -oP '.*?\Kparameter:\{\{.*?\}\}' file
parameter:{{"Hello i am here"}}
parameter:{{H}}
If perl is an option, you can do this:
perl -ne 'my @wo = ($_ =~ /parameter:\{\{.*?\}\}/g); print join(" ", @wo);' your_text_file
In Perl, *? is a non-greedy quantifier, so each match stops at the first }} encountered.
I think a Perl expert could do this in one statement, without a temporary array...
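The difference between the non-greedy and greedy quantifiers is easy to see in a quick Python sketch; Python's re uses the same *? syntax, and the input is the sample from the question.

```python
import re

text = ('KHJLMNNamespaceparameter:{{"Hello i am here"}}'
        'NamespaceHSKFSAFSLLLJparameter:{{H}}...')

# Non-greedy .*? stops at the first "}}" after each "parameter:{{".
lazy = re.findall(r"parameter:\{\{.*?\}\}", text)

# Greedy .* runs on to the LAST "}}", swallowing both items into one match.
greedy = re.findall(r"parameter:\{\{.*\}\}", text)

print(" ".join(lazy))
print(greedy)
```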
EDIT: this command only outputs the wanted text on stdout. To change the file itself, use the switch -i when calling perl:
perl -i.bak -ne 'my @wo = ($_ =~ /parameter:\{\{.*?\}\}/g); print join(" ", @wo);' your_text_file
A backup file is created with the extension .bak appended, and the result is written to a file with the same name as the input file. Note that you can skip the backup file by using the -i switch alone, but some platforms don't allow this. See perlrun for more information.
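Python offers a comparable in-place mode through the standard fileinput module. This sketch is only a parallel to perl -i.bak, not a replacement for it; the keep_params_in_place helper and the temporary demo file are hypothetical.

```python
import fileinput
import os
import re
import tempfile

PATTERN = re.compile(r"parameter:\{\{.*?\}\}")


def keep_params_in_place(path):
    # inplace=True redirects print() into the file being read;
    # backup=".bak" keeps the original as path + ".bak" (like perl -i.bak).
    with fileinput.input(files=[path], inplace=True, backup=".bak") as f:
        for line in f:
            # No trailing newline, matching print join(" ", @wo).
            print(" ".join(PATTERN.findall(line)), end="")


# Demo on a throwaway file.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "your_text_file")
with open(path, "w") as fh:
    fh.write('KHJLMNNamespaceparameter:{{"Hello i am here"}}'
             'NamespaceHSKFSAFSLLLJparameter:{{H}}...')

keep_params_in_place(path)
with open(path) as fh:
    result = fh.read()
print(result)
```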

Perl match newline in `-0` mode

Question
Suppose I have a file like this:
I've got a loverly bunch of coconut trees.
Newlines!
Bahahaha
Newlines!
the end.
I'd like to replace an occurrence of "Newlines!" that is surrounded by blank lines with (say) NEWLINES!. So, the ideal output is:
I've got a loverly bunch of coconut trees.

NEWLINES!

Bahahaha
Newlines!
the end.
Attempts
Ignoring "surrounded by newlines", I can do:
perl -p -e 's#Newlines!#NEWLINES!#g' input.txt
Which replaces all occurrences of "Newlines!" with "NEWLINES!".
Now I try to pick out only the "Newlines!" surrounded with \n:
perl -p -e 's#\nNewlines!\n#\nNEWLINES!\n#g' input.txt
No luck (note: I don't need the s switch because I'm not using ., and I don't need the m switch because I'm not using ^ and $; regardless, adding them doesn't make this work). Lookaheads/lookbehinds don't work either:
perl -p -e 's#(?<=\n)Newlines!(?=\n)#NEWLINES!#g' input.txt
After a bit of searching, I see that perl reads in the file line-by-line (makes sense; sed does too). So, I use the -0 switch:
perl -0p -e 's#(?<=\n)Newlines!(?=\n)#NEWLINES!#g' input.txt
Of course this doesn't work -- -0 replaces new line characters with the null character.
So my question is -- how can I match this pattern (I'd prefer not to write any perl beyond the regex 's#pattern#replacement#flags' construct)?
Is it possible to match this null character? I did try:
perl -0p -e 's#(?<=\0)Newlines!(?=\0)#NEWLINES!#g' input.txt
to no effect.
Can anyone tell me how to match newlines in perl? Whether in -0 mode or not? Or should I use something like awk? (I started with sed but it doesn't seem to have lookahead/behind support even with -r. I went to perl because I'm not at all familiar with awk).
cheers.
(PS: this question is not what I'm after because their problem had to do with a .+ matching newline).
The following should work for you:
perl -0pe 's#(?<=\n\n)Newlines!(?=\n\n)#NEWLINES!#g'
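A quick Python sketch (illustrative only; Python's re also supports fixed-width lookbehind) shows why doubling the newline matters once the whole text is in one string: with a single \n on each side, both occurrences match.

```python
import re

# The whole sample file as one string, as -0 / -0777 would read it.
text = ("I've got a loverly bunch of coconut trees.\n\n"
        "Newlines!\n\n"
        "Bahahaha\nNewlines!\nthe end.\n")

# Require a blank line on both sides: (?<=\n\n) is a fixed-width
# lookbehind and (?=\n\n) a lookahead, so neither consumes the newlines.
out = re.sub(r"(?<=\n\n)Newlines!(?=\n\n)", "NEWLINES!", text)

# With just one \n on each side, the second "Newlines!" qualifies too.
single = re.sub(r"(?<=\n)Newlines!(?=\n)", "NEWLINES!", text)

print(out, end="")
```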
I think the way you went about things caused you to combine possible solutions in a way that didn't work.
If you use the in-place editing flag, you can do it like this:
perl -0p -i.bk -e 's/\n\nNewlines!\n\n/\n\nNEWLINES!\n\n/g' input.txt
I have doubled the \n's to make sure you only get the ones with empty lines above and below.
If the file is small enough to be slurped into memory all at once:
perl -0777 -pe 's/\n\nNewlines!(?=\n\n)/\n\nNEWLINES!/g'
Otherwise, keep a buffer of the last three lines read:
perl -ne 'push @buffer, $_; $buffer[1] = "NEWLINES!\n" if @buffer == 3 && ' \
-e 'join("", @buffer) eq "\nNewlines!\n\n"; ' \
-e 'print shift @buffer if @buffer == 3; END { print @buffer }'
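The same three-line window can be sketched in Python with collections.deque; mark_isolated is a hypothetical helper that follows the Perl logic step for step, holding at most three lines in memory.

```python
from collections import deque


def mark_isolated(lines, word="Newlines!\n", repl="NEWLINES!\n"):
    buf = deque()
    for line in lines:
        buf.append(line)                      # push @buffer, $_
        if len(buf) == 3:
            # blank line, the word, blank line -> replace the middle one
            if "".join(buf) == "\n" + word + "\n":
                buf[1] = repl
            yield buf.popleft()               # print shift @buffer
    yield from buf                            # END { print @buffer }


text = ("I've got a loverly bunch of coconut trees.\n\n"
        "Newlines!\n\n"
        "Bahahaha\nNewlines!\nthe end.\n")
result = "".join(mark_isolated(text.splitlines(keepends=True)))
print(result, end="")
```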

How do I replace multiple newlines with a single one with Perl's Regular Expressions?

I've got a document containing empty lines (\n\n). They can be removed with sed:
echo $'a\n\nb'|sed -e '/^$/d'
But how do I do that with an ordinary regular expression in Perl? Anything like the following has no effect at all:
echo $'a\n\nb'|perl -p -e 's/\n\n/\n/s'
You need to use s/^\n\z//. Input is read line by line, so $_ will never contain more than one newline. Instead, eliminate lines that do not contain any other characters. You should invoke perl using
perl -ne 's/^\n\z//; print'
No need for the /s switch.
The narrower problem of not printing blank lines is more straightforward:
$(input) | perl -ne 'print if /\S/'
will output all lines except the ones that only contain whitespace.
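The two answers above behave differently on lines that contain only spaces. A small Python sketch with made-up input lines contrasts them:

```python
import re

lines = ["a\n", "\n", "   \n", "b\n"]

# perl -ne 's/^\n\z//; print': empties a line only if it is exactly "\n".
# Python's \Z, like Perl's \z, matches only at the very end of the string.
exact = "".join(re.sub(r"\A\n\Z", "", line) for line in lines)

# perl -ne 'print if /\S/': keeps only lines with a non-whitespace
# character, so whitespace-only lines are dropped as well.
nonblank = "".join(line for line in lines if re.search(r"\S", line))

print(repr(exact))     # 'a\n   \nb\n'
print(repr(nonblank))  # 'a\nb\n'
```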
The input is three separate lines, and perl with the -p option only processes one line at a time.
The workaround is to tell perl to slurp in multiple lines of input at once. One way to do it is:
echo $'a\n\nb' | perl -pe 'BEGIN{$/=undef}; s/\n\n/\n/'
Here $/ is the input record separator variable, which tells Perl how to split an input stream into lines; setting it to undef makes each read return the whole remaining input.
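For comparison, here is a Python sketch of the slurp-then-substitute approach; the helper name and the input are hypothetical. Note that the one-liner has no /g, so only the first \n\n is replaced; collapsing every run of blank lines needs a global substitution such as s/\n{2,}/\n/g.

```python
import io
import re


def slurp_and_collapse(stream):
    # Read the whole stream at once, like BEGIN{$/=undef} or -0777.
    text = stream.read()
    # Mirrors s/\n\n/\n/ without /g: only the first "\n\n" is replaced.
    first_only = re.sub(r"\n\n", "\n", text, count=1)
    # Every run of two or more newlines becomes one (s/\n{2,}/\n/g in Perl).
    all_runs = re.sub(r"\n{2,}", "\n", text)
    return first_only, all_runs


first_only, all_runs = slurp_and_collapse(io.StringIO("a\n\nb\n\n\nc\n"))
print(repr(first_only))
print(repr(all_runs))
```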