Perl regex: remove everything (including line breaks) until a match is found

Apologies for the simple question. I don't clean text or use regex often.
I have a large number of text files in which I want to remove every line until my regex finds a match. There's usually about 15 lines of fluff before I find a match. I was hoping for a perl one-liner that would look like this:
perl -p -i -e "s/.*By.unanimous.vote//g" *.txt
But this doesn't work.
Thanks

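Solution using the flip-flop operator:
perl -pi -e '$_="" unless /By.unanimous.vote/ .. 1' input-files
Shorter solution that also uses the x=!! pseudo operator:
perl -pi -e '$_ x=!! (/By.unanimous.vote/ .. 1)' input-files
Note that a constant right operand such as 1 is compared against the line number $., which keeps counting across files, so with several input files the range opened in the first file never closes. Replacing 1 with eof restarts the search in every file.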

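Have a try with the following. The -0777 switch makes perl read each file as a single string, so .* with the s flag can span line breaks (-00 would only split the input into paragraphs).
To remove everything up to and including the last By.unanimous.vote:
perl -0777 -pe "s/.*By.unanimous.vote//s" inputfile > outputfile
To remove everything up to and including the first By.unanimous.vote:
perl -0777 -pe "s/.*?By.unanimous.vote//s" inputfile > outputfile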

Try something like:
perl -pi -e "$a=1 if !$a && /By\.unanimous\.vote/i; s/.*//s if !$a" *.txt
This should remove the lines before the matched line. If you also want to remove the matching line, you can do something like:
perl -pi -e "$a=1 if !$a && s/.*By\.unanimous\.vote.*//is; s/.*//s if !$a" *.txt
Shorter versions:
perl -pi -e "$a++if/By\.unanimous\.vote/i;$a||s/.*//s" *.txt
perl -pi -e "$a++if s/.*By\.unanimous\.vote.*//si;$a||s/.*//s" *.txt

You haven't said whether you want to keep the By.unanimous.vote part, but it sounds to me like you want:
s/[\s\S]*?(?=By\.unanimous\.vote)//
Note the missing g flag and the lazy *? quantifier, because you want to stop matching once you hit that string. This should preserve By.unanimous.vote and everything after it. The [\s\S] matches newlines. In Perl, you can also do this with:
s/.*?(?=By\.unanimous\.vote)//s

Solution using awk
awk '/.*By.unanimous.vote/{a=1} a==1{print}' input > output

Related

Why does this regex work in grep but not sed?

I have two regular expressions:
$ grep -E '\-\- .*$' *.sql
$ sed -E '\-\- .*$' *.sql
(I am trying to grep lines in sql files that have comments and remove lines in sql files that have comments)
The grep command works using this regex; however, the sed returns the following error:
sed: -e expression #1, char 7: unterminated address regex
What am I doing incorrectly with sed?
(The space after the two hyphens is required for SQL comments of this type, in case you are unfamiliar with MySQL.)
You're trying to use:
sed -E '\-\- .*$' *.sql
This sed command is incorrect because it does not tell sed to do anything.
It should be:
sed -n '/-- /p' *.sql
and equivalent grep would be:
grep -- '-- ' *.sql
or even better with a fixed string search:
grep -F -- '-- ' *.sql
The -- marks the end of the options, so grep does not try to parse the pattern '-- ' as an option.
There is no need to escape - in a regex when it is outside a bracket expression (or character class), i.e. [...].
Based on comments below it seems OP's intent is to remove commented section in all *.sql files that start with 2 hyphens.
You may use this sed for that:
sed -i 's/-- .*//g' *.sql
The problem here is not the regex, the problem is that sed requires a command. The equivalent of your grep would be:
sed -n '/\-\- .*$/p'
Here -n suppresses the default output, the slashes wrap the regex to search for, and the p after the last slash prints the matching lines.
P.S.: As Anub pointed out, escaping the hyphens - inside the regex is unnecessary.
You are using sed's \cregexpc syntax: with \-<...> you tell sed that the delimiter character is a dash -, but you never terminate the regex with a closing delimiter, as in \-<...>-. You also need the d command to delete those lines.
sed '\-\-\-.*$-d' infile
see man sed about that:
\cregexpc
Match lines matching the regular expression regexp. The c may be any character.
With the default / delimiter none of this is needed:
sed '/--.*$/d' infile
or simply:
sed '/^--/d' infile
and more accurately:
sed '/^[[:blank:]]*--/d' infile
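To see the print and delete variants side by side, here is a short sketch on an invented four-line SQL snippet. It only uses the address-plus-command forms already shown above.

```shell
# Made-up SQL with a full-line, an indented, and a trailing comment.
sql='SELECT 1;
-- full-line comment
  -- indented comment
SELECT 2; -- trailing comment'

# Address /-- / plus command p: print only lines containing a comment marker.
matched=$(printf '%s\n' "$sql" | sed -n '/-- /p')
# Address plus command d: drop lines that are nothing but a comment.
stripped=$(printf '%s\n' "$sql" | sed '/^[[:blank:]]*--/d')

printf '%s\n---\n%s\n' "$matched" "$stripped"
```

Note that the d variant leaves the trailing comment on the last line alone, because that line does not start with the marker.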

sed regex with alternative on Solaris doesn't work

I'm trying to use sed with a regex on Solaris, but it doesn't work.
I need to show only the lines that match my regex.
sed -n -E '/^[a-zA-Z0-9]*$|^a_[a-zA-Z0-9]*$/p'
input file:
grtad
a_pitr
_aupa
a__as
baman
12353
ai345
ki_ag
-MXx2
!!!23
+_)#*
I want to show only the lines that match the above regex:
grtad
a_pitr
baman
12353
ai345
Is there another way to use alternative? Is it possible in perl?
Thanks for any solutions.
With Perl
perl -ne 'print if /^(a_)?[a-zA-Z0-9]*$/' input.txt
The (a_)? matches a_ zero or one time, so it is optional.
The (a_) also captures the match, which is not needed here, so you can use (?:a_)? instead. The ?: makes () only group what is inside (so that ? applies to the whole group) without remembering it.
with grep
$ grep -xiE '(a_)?[a-z0-9]*' ip.txt
grtad
a_pitr
baman
12353
ai345
-x match whole line
-i ignore case
-E extended regex, if not available, use grep -xi '\(a_\)\?[a-z0-9]*'
(a_)? zero or one time match a_
[a-z0-9]* zero or more letters or digits
With sed
sed -nE '/^(a_)?[a-zA-Z0-9]*$/p' ip.txt
or, with GNU sed
sed -nE '/^(a_)?[a-z0-9]*$/Ip' ip.txt

Perl - Replace pattern only in lines matching another pattern

I'm doing an in-place search & replace with Perl. I need to replace all words in all lines that contain another word. For instance, remove all const only in lines containing PMPI_. With sed I can do:
sed -i "/PMPI_/ s/const//g" file.c
However I need multi-line capabilities and sed doesn't seem to be the right tool for the job. I'm using Perl for everything else anyway. I tried
perl -pi -e "/PMPI_/ s/const//g" file.c
And other variations with no success. I could only find vim regex equivalents searching this site.
The syntax is:
perl -pi -e "s/const//g if /PMPI_/" file
Note: you say you need multiline capabilities. I don't think you are looking for the slurp mode (that loads the whole file), but you could also work by paragraphs with the -00 option:
echo 'PMPI_ const
const const' | perl -00 -p -e "s/const//g if /PMPI_/"

perl regex find and replace with variable from bash

I have this regex: 'src=\d', which matches all src attributes that start with a number in a file. I need to store the number in a variable, cut src= out, and write a new $string with \d concatenated to it: $string . $d. Is it possible to store only \d in a variable with a single command line? How can I use cut and a variable in a command line with perl? Is it possible?
perl -pi -w -e 's/src="\d+/src="http:\/\/website.com\/\d+/g' file.tsv
I'm not exactly sure what you mean, but I think you want something like this, where the () parentheses capture the number and $1 puts it back in the replacement.
perl -pi -w -e 's/src="(\d+)/src="http:\/\/website.com\/$1/g' file.tsv
And you can avoid the so-called 'leaning toothpick syndrome' by selecting a different delimiter for the s/// operation, such as s{}{}:
perl -pi -w -e 's{src="(\d+)}{src="http://website.com/$1}g' file.tsv

Perl match newline in `-0` mode

Question
Suppose I have a file like this:
I've got a loverly bunch of coconut trees.

Newlines!

Bahahaha
Newlines!
the end.
I'd like to replace an occurrence of "Newlines!" that is surrounded by blank lines with (say) NEWLINES!. So, ideal output is:
I've got a loverly bunch of coconut trees.

NEWLINES!

Bahahaha
Newlines!
the end.
Attempts
Ignoring "surrounded by newlines", I can do:
perl -p -e 's#Newlines!#NEWLINES!#g' input.txt
Which replaces all occurrences of "Newlines!" with "NEWLINES!".
Now I try to pick out only the "Newlines!" surrounded with \n:
perl -p -e 's#\nNewlines!\n#\nNEWLINES!\n#g' input.txt
No luck (note - I don't need the s switch because I'm not using . and I don't need the m switch because I'm not using ^ and $; regardless, adding them doesn't make this work). Lookaheads/behinds don't work either:
perl -p -e 's#(?<=\n)Newlines!(?=\n)#NEWLINES!#g' input.txt
After a bit of searching, I see that perl reads in the file line-by-line (makes sense; sed does too). So, I use the -0 switch:
perl -0p -e 's#(?<=\n)Newlines!(?=\n)#NEWLINES!#g' input.txt
Of course this doesn't work -- -0 replaces new line characters with the null character.
So my question is -- how can I match this pattern (I'd prefer not to write any perl beyond the regex 's#pattern#replacement#flags' construct)?
Is it possible to match this null character? I did try:
perl -0p -e 's#(?<=\0)Newlines!(?=\0)#NEWLINES!#g' input.txt
to no effect.
Can anyone tell me how to match newlines in perl? Whether in -0 mode or not? Or should I use something like awk? (I started with sed but it doesn't seem to have lookahead/behind support even with -r. I went to perl because I'm not at all familiar with awk).
cheers.
(PS: this question is not what I'm after because their problem had to do with a .+ matching newline).
Following should work for you:
perl -0pe 's#(?<=\n\n)Newlines!(?=\n\n)#NEWLINES!#g'
I think the way you went about things caused you to combine possible solutions in a way that didn't work.
If you use the inline editing flag you can do it like this:
perl -0p -i.bk -e 's/\n\nNewlines!\n\n/\n\nNEWLINES!\n\n/g' input.txt
I have doubled the \n's to make sure you only get the ones with empty lines above and below.
If the file is small enough to be slurped into memory all at once:
perl -0777 -pe 's/\n\nNewlines!(?=\n\n)/\n\nNEWLINES!/g'
Otherwise, keep a buffer of the last three lines read:
perl -ne 'push @buffer, $_; $buffer[1] = "NEWLINES!\n" if @buffer == 3 && ' \
-e 'join("", @buffer) eq "\nNewlines!\n\n"; ' \
-e 'print shift @buffer if @buffer == 3; END { print @buffer }'