Perl - Replace pattern only in lines matching another pattern - regex

I'm doing an in-place search & replace with Perl. I need to replace all words in all lines that contain another word. For instance, remove all const only in lines containing PMPI_. With sed I can do:
sed -i "/PMPI_/ s/const//g" file.c
However I need multi-line capabilities and sed doesn't seem to be the right tool for the job. I'm using Perl for everything else anyway. I tried
perl -pi -e "/PMPI_/ s/const//g" file.c
And other variations with no success. I could only find vim regex equivalents searching this site.

The syntax is:
perl -pi -e "s/const//g if /PMPI_/" file
Note: you say you need multiline capabilities. I don't think you are looking for the slurp mode (that loads the whole file), but you could also work by paragraphs with the -00 option:
echo 'PMPI_ const
const const' | perl -00 -p -e "s/const//g if /PMPI_/"

Related

Why does this regex work in grep but not sed?

I have two regular expressions:
$ grep -E '\-\- .*$' *.sql
$ sed -E '\-\- .*$' *.sql
(I am trying to grep lines in sql files that have comments and remove lines in sql files that have comments)
The grep command works using this regex; however, the sed returns the following error:
sed: -e expression #1, char 7: unterminated address regex
What am I doing incorrectly with sed?
(The space after the two hyphens is required for sql comments if you are unfamiliar with MySql comments of this type)
You're trying to use:
sed -E '\-\- .*$' *.sql
This sed command is not valid because it does not tell sed to do anything.
It should be:
sed -n '/-- /p' *.sql
and equivalent grep would be:
grep -- '-- ' *.sql
or even better with a fixed string search:
grep -F -- '-- ' *.sql
The -- marks the end of the options, so grep does not mistake the pattern '-- ' for an option.
There is no need to escape - in a regex if it is outside bracket expression (or character class) i.e. [...].
Based on comments below it seems OP's intent is to remove commented section in all *.sql files that start with 2 hyphens.
You may use this sed for that:
sed -i 's/-- .*//g' *.sql
The problem here is not the regex, the problem is that sed requires a command. The equivalent of your grep would be:
sed -n '/\-\- .*$/p'
Here -n suppresses the default output, the slashes wrap the regex you are searching for, and the p after the last slash prints the matching lines.
P.S.: As Anub pointed out, escaping the hyphens - inside the regex is unnecessary.
You are using sed's \cregexpc address syntax: by starting with \- you tell sed that the delimiter character is a dash (-), but the expression never supplies the closing delimiter, hence the "unterminated address regex" error. Terminate the address with a closing - and add the d command to delete the matching lines:
sed '\-\-\-.*$-d' infile
see man sed about that:
\cregexpc
Match lines matching the regular expression regexp. The c may be any character.
With the default / delimiter none of this is needed:
sed '/--.*$/d' infile
or simply:
sed '/^--/d' infile
and more accurately:
sed '/^[[:blank:]]*--/d' infile
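The two intents discussed in this thread (deleting comment lines versus stripping the comment text) can be compared side by side. This is a small sketch on a made-up SQL snippet; the variable names are only for illustration.

```shell
# Hypothetical SQL snippet for illustration.
sql='SELECT 1; -- trailing comment
-- full-line comment
  -- indented comment
SELECT 2;'

# Delete lines that consist only of a comment (optional leading blanks, then --).
deleted=$(printf '%s\n' "$sql" | sed '/^[[:blank:]]*--/d')

# Alternatively keep every line but strip the comment text from it.
stripped=$(printf '%s\n' "$sql" | sed 's/-- .*//')

printf '%s\n=====\n%s\n' "$deleted" "$stripped"
```

The first form leaves the trailing comment on the SELECT 1 line in place, while the second keeps all four lines and empties the ones that held only a comment.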

sed regex with alternative on Solaris doesn't work

Currently I'm trying to use sed with regex on Solaris but it doesn't work.
I need to show only the lines that match my regex.
sed -n -E '/^[a-zA-Z0-9]*$|^a_[a-zA-Z0-9]*$/p'
input file:
grtad
a_pitr
_aupa
a__as
baman
12353
ai345
ki_ag
-MXx2
!!!23
+_)#*
I want to show only the lines matching the above regex:
grtad
a_pitr
baman
12353
ai345
Is there another way to use alternative? Is it possible in perl?
Thanks for any solutions.
With Perl
perl -ne 'print if /^(a_)?[a-zA-Z0-9]*$/' input.txt
The (a_)? matches a_ one-or-zero times, so optionally. It may or may not be there.
The (a_) also captures the match, which is not needed here, so you can use (?:a_)? instead. The ?: makes the parentheses group only (so the ? still applies to the whole group) without remembering what was matched.
with grep
$ grep -xiE '(a_)?[a-z0-9]*' ip.txt
grtad
a_pitr
baman
12353
ai345
-x match whole line
-i ignore case
-E extended regex, if not available, use grep -xi '\(a_\)\?[a-z0-9]*'
(a_)? zero or one time match a_
[a-z0-9]* zero or more letters or digits
With sed
sed -nE '/^(a_)?[a-zA-Z0-9]*$/p' ip.txt
or, with GNU sed
sed -nE '/^(a_)?[a-z0-9]*$/Ip' ip.txt

Retrieve value of attribute in bash

I have a list of lines:
<some_random_text="someval" my_val_="0.4" some_random_text_1="someval_">
<some_random_text="someval" my_val_="0.8" some_random_text_1="someval_">
<some_random_text="someval" my_val_="1.2" some_random_text_1="someval_">
and so on.
From each line, I want to return the numeric value given after my_val_. How can I do this in bash?
Within this very rigid structure, what you want to do is quite easy using sed:
sed 's/.*my_val_="\([0-9.]\{1,\}\)".*/\1/' file
or using extended regular expressions:
sed -r 's/.*my_val_="([0-9.]+)".*/\1/' file
This captures the part you're interested in (the digits and dots between the quotes) and uses them to replace the contents of the line.
As mentioned in the comments (thanks), the switch to enable extended regular expressions differs between versions of sed. Out of habit, I tend to use -r but some implementations (such as BSD sed on OSX) work with -E instead. Others work with either -r or -E but neither option is defined by the standard.
This could also be done in native bash (although I wouldn't recommend it...):
re='my_val_="([0-9.]+)"'
while read -r line; do
[[ $line =~ $re ]] && echo "${BASH_REMATCH[1]}"
done < file
=~ is the regex match operator. The captured digits and dots are stored in element 1 of the special array BASH_REMATCH.
The sed and bash approaches are subtly different, as the sed version will print all lines in the file, even if they don't match the pattern. If this is a problem, you can add the -n switch and a p at the end of the command to print only the matching lines:
sed -nr 's/.*my_val_="([0-9.]+)".*/\1/p' file
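The difference between the two sed forms shows up as soon as the input contains a line without the attribute. In this sketch the first two lines come from the question and the third is a made-up non-matching line; the basic-regex form is used so that no -r or -E switch is needed.

```shell
# Two lines from the question plus one hypothetical line without my_val_.
lines='<some_random_text="someval" my_val_="0.4" some_random_text_1="someval_">
<some_random_text="someval" my_val_="0.8" some_random_text_1="someval_">
no attribute on this line'

# Without -n: non-matching lines are passed through unchanged.
all=$(printf '%s\n' "$lines" | sed 's/.*my_val_="\([0-9.]\{1,\}\)".*/\1/')

# With -n and the p flag: only lines where the substitution succeeded are printed.
values=$(printf '%s\n' "$lines" | sed -n 's/.*my_val_="\([0-9.]\{1,\}\)".*/\1/p')

printf '%s\n' "$values"
```

Here values holds only 0.4 and 0.8, while all also carries the unmatched third line.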
With grep:
grep -oP 'my_val_="\K[^"]*' filename
-o makes grep print only the matched part; -P enables Perl-compatible regexes.
The \K in the regex removes from the match everything that was matched by the part of the regex that came before it; this has the effect of a lookbehind: only non-quote characters that come directly after my_val_=" are matched.

Copy matched regex to new file

I want to copy regex matched text to a new file.
<SHOPITEM>([\s\S]*?)<YEAR>2015<\/YEAR>([\s\S]*?)<\/SHOPITEM>
([\s\S]*?) = any text, any line
This works (I am able to find) in Sublime editor, but how this regex looks for sed/grep (or any other Unix tool)?
sed and grep normally work line by line, so multi-line matching is awkward with them, although it is still possible under certain conditions.
I would advise to use Perl which should be installed on your computer:
perl -0777 -ne 'print "$&\n" if /<SHOPITEM>([\s\S]*?)<YEAR>2015<\/YEAR>([\s\S]*?)<\/SHOPITEM>/i' myfile.xml
Be aware that this regex won't work if you have nested <shopitem> tags or even multiple occurrences. Use an XML parser instead.
You can also write a program that parses your XML file and, this time, captures all the matches.
myparser.pl:
#!/usr/bin/env perl
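# Slurp mode: read the whole input as a single string.
undef $/;
$_ = <>;
# Print each matched block; the lazy *? keeps every match as short as possible.
print "$&\n" while /<(shopitem)>[\s\S]*?<(year)>2015<\/\2>[\s\S]*?<\/\1>/ig;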
That you can execute:
$ chmod u+x myparser.pl
$ ./myparser.pl myfile.xml
I'm not the best scripter, but I think this should work:
grep "<SHOPITEM>" infile | grep "<YEAR>2015" | sed -e "s/<[^>]*>//g" | sed "s/2015/ /g" > outfile
Edit: I didn't match the regex, instead I got SHOPITEMs with YEAR 2015 tag and removed all the unwanted parts.
Edit: I'd do it this way, but I'm not sure it's the most elegant solution.

Perl regex: remove everything (including line breaks) until a match is found

Apologies for the simple question. I don't clean text or use regex often.
I have a large number of text files in which I want to remove every line until my regex finds a match. There's usually about 15 lines of fluff before I find a match. I was hoping for a perl one-liner that would look like this:
perl -p -i -e "s/.*By.unanimous.vote//g" *.txt
But this doesn't work.
Thanks
Solution using the flip-flop operator:
perl -pi -e '$_="" unless /By.unanimous.vote/ .. 1' input-files
Shorter solution that also uses the x=!! pseudo operator:
perl -pi -e '$_ x=!! (/By.unanimous.vote/ .. 1)' input-files
Try one of the following (-0777 reads the whole file at once, so .* can span line breaks).
To remove everything up to the last By.unanimous.vote:
perl -0777 -pe "s/.*By.unanimous.vote//s" inputfile > outputfile
To remove everything up to the first By.unanimous.vote:
perl -0777 -pe "s/.*?By.unanimous.vote//s" inputfile > outputfile
Try something like:
perl -pi -e '$a=1 if !$a && /By\.unanimous\.vote/i; s/.*//s if !$a' *.txt
This should remove the lines before the matched line. If you also want to remove the matching line, you can do something like:
perl -pi -e '$a=1 if !$a && s/.*By\.unanimous\.vote.*//is; s/.*//s if !$a' *.txt
Shorter versions:
perl -pi -e '$a++if/By\.unanimous\.vote/i;$a||s/.*//s' *.txt
perl -pi -e '$a++if s/.*By\.unanimous\.vote.*//si;$a||s/.*//s' *.txt
You haven't said whether you want to keep the By.unanimous.vote part, but it sounds to me like you want:
s/[\s\S]*?(?=By\.unanimous\.vote)//
Note the missing g flag and the lazy *? quantifier, because you want to stop matching once you hit that string. This should preserve By.unanimous.vote and everything after it. The [\s\S] matches newlines. In Perl, you can also do this with:
s/.*?(?=By\.unanimous\.vote)//s
Solution using awk
awk '/.*By.unanimous.vote/{a=1} a==1{print}' input > output