perl -pe regex problem - regex

I use Perl to check some text input against a regex pattern, but one pattern doesn't work with perl -pe.
The following pattern doesn't work when called from the command line:
s![a-zA-Z]+ +(?:.*?)/(?:.*)Comp-(.*)/.*!$1!
I use the Linux shell. This is the call I use to test my regex:
cat test | perl -pe 's![a-zA-Z]+ +(?:.*?)/(?:.*)Comp-(.*)/.*!$1!'
File test:
A MaintanceGie?\195?\159mannFlock/System/Comp-Database.cpp
A MaintanceGie?\195?\159mannFlock/System/Comp-Cache/abc.h
Result:
A MaintanceGie?\195?\159mannFlock/System/Comp-Database.cpp
Cache
How can I remove the first result?
Thanks for any advice.

That last slash after "Comp-(.*)" may be what's doing it. The "Database" line in your file has no slash after "Comp-Database". Try replacing Comp-(.*)/.* with Comp-(.*?)[/.].* so you can match either the subdirectory or the file extension; the lazy (.*?) stops at the first slash or dot instead of the last one.

$ cat input
A MaintanceGie?\195?\159mannFlock/System/Comp-Database.cpp
A MaintanceGie?\195?\159mannFlock/System/Comp-Cache/abc.h
$ perl -ne 'print if s![a-zA-Z]+ +(?:.*?)/(?:.*)Comp-(.*)/.*!$1!' input
Cache

The problem is the last slash character in the regex. It is a literal slash, not an escaped dot, and the first input line has no slash after "Comp-Database". Try this instead, with a lazy (.*?) so the capture stops at the first dot or slash:
s![a-zA-Z]+ +(?:.*?)/(?:.*)Comp-(.*?)[./].*!$1!
Edit: Updated to match new input data and added another option:
On the other hand, your replacement regex might be replaced by something like:
perl -ne 'print "$1\n" if /Comp-(.*?)[.\/]/'
Then there is no need to parse full line with whatever it contains.

\s matches whitespace (spaces, tabs, and line breaks) and '+' means one or more of the preceding element. So '\s+' matches one or more whitespace characters.
cat test
A MaintanceGie?\195?\159mannFlock/System/Comp-Database.cpp
A MaintanceGie?\195?\159mannFlock/System/Comp-Cache/abc.h
perl -ne 'print "$1\n" if /\w+?\d+?\d+\w+\/\w+\/Comp-(\w+)[\/]/' test

Related

Adding blank line spaces before and after pattern 'string' match

I am trying to add 5 blank lines in a text file (text.txt) before and after each line that matches a string pattern. I used the following to get blank lines after the 'string' match, which worked for me:
sed '/string/{G;G;G;G;G;}' text.txt
I want to apply the same sed command to obtain 5 blank lines before the 'string' as well. I don't want spaces here, but rather blank lines before and after the match. Any suggestions?
sed -r 's/(^.*)(string)(.*$)/\1\n\n\n\n\n\2\n\n\n\n\n\3/' text.txt
Use -r or -E to enable extended regular expressions, split the line into three sections, and then replace the line with the first section, 5 newlines, the second section, 5 newlines, and finally the third section.
Use this Perl one-liner:
perl -pe 's/string/\n\n\n\n\n$&\n\n\n\n\n/' text.txt
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
s/PATTERN/REPLACEMENT/ : change PATTERN to REPLACEMENT.
$& : matched pattern.
\n : newline character.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlrequick: Perl regular expressions quick start
For a single string match:
$ sed -e '/string/{ s/^/\n\n\n\n\n/; s/$/\n\n\n\n\n/ }' text.txt
For multiple strings, assuming same requirements:
$ sed -E '/(string1|string2|string3)/{ s/^/\n\n\n\n\n/; s/$/\n\n\n\n\n/ }' text.txt
This might work for you:
sed '/string/{G;s/\(string\)\(.*\)\(.\)/\3\3\3\3\3\1\3\3\3\3\3\2/}' file
Match on string, then use G to append a newline to the pattern space (the hold space is empty). The substitution captures that trailing newline as \3 and reuses it to put five newlines on either side of the match.
And an awk version:
awk '{if(/string1|string2|.../){printf "\n\n\n\n\n%s\n\n\n\n\n",$0}else{print}}' file

Perl oneliner in bash: print matches from complex regexp

I have this complex regex
/"_outV":([0-9]+),"_inV":([0-9]+),"_label":"([a-z\/]+)",/
and I need to parse a file (which is all on one single line) and output only the matched groups like
print $1 $2 $3
Currently the only one-liner that almost works is
perl -pe 'while(m/"_outV":([0-9]+)\,"_inV":([0-9]+)\,"_label":"([a-z\/]+)\"\,/g){print "$1 $2 $3\n";}'
But it also ends up echoing the entire file at the end, after the matches.
How do I fix this?
I thought that removing the -p option would do the trick, but it doesn't.
Looks good to me.
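You need to replace the -p with -n. The -p switch prints $_ after every iteration of the implicit loop, which is why the whole file is echoed after your matches; -n runs the same loop without that automatic print.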
A few finer points:
No need to backslash those , and ".
You can conveniently replace [0-9] with \d.
By using a different delimiter for the regex you won't need to escape the /.
End result optimized
perl -ne 'print "$1 $2 $3\n" while m{"_outV":(\d+),"_inV":(\d+),"_label":"([a-z/]+)",}g'

Perl regex start of line anchor fails

I want to change the 'this' on the second row of the example, but it's not happening. Most grateful for an idea where I'm going wrong.
echo "not this but
this one" > test.txt
perl -0777 -i -pe 's/^this/rhinoceros/igs' test.txt
cat test.txt
not this but
this one
You have all the wrong modifiers on your substitution. You presumably only want to make a single change, so the /g is unnecessary; the text to be matched is exactly this, so the /i is unnecessary; and you have no dot . characters in your pattern, so /s doesn't do anything.
What you do need is the /m (multi-line) modifier so that ^ matches at the beginning of lines in the middle of the string, as well as at the start of the string.
This should work for you
perl -0777 -i -pe 's/^this/rhinoceros/m' test.txt
You are using the /s flag; you actually want the /m flag.
perl -0777 -i -pe 's/^this/rhinoceros/igm' test.txt
/s makes . match newlines as well, so the string is treated as a single line, whereas /m makes ^ and $ match at the start and end of every line within the string.
Edit: See http://perldoc.perl.org/perlre.html for a more detailed treatment of the s and m modifiers that @ThisSuitIsBlackNot comments on below. For practical purposes, s "treat[s] string as single line".

How can I use regex to exclude lines with extra characters?

I have a bunch of email addresses:
abc#google.com
bdc#yahoo.com
\\ske#google.com
I'd like to delete the last line (\\ske#google.com) because it contains extra characters other than #, ., and letters. How do I do this?
Through awk,
$ awk '/^\w+#\w+/{print}' file
abc#google.com
bdc#yahoo.com
Awk searches for lines that start with one or more word characters, followed by a # symbol, followed again by one or more word characters. If it finds any, it prints the whole line.
The line \\ske#google.com doesn't start with a word character, so it is not printed.
You can use this sed:
sed -i.bak -n '/^[[:alnum:]]*#/p' file
You can use vim to take care of it too:
vim -c 'v/^[[:alnum:]]*#/d' -c 'wq' file
You could also use a Perl module:
perl -ne 'use Email::Valid; print if Email::Valid->address($_)'

Perl replace regular expression

In file.txt I have a line like this:
^/string string_to_be_replaced
that must become:
^/string replaced_string
So I created a perl script:
perl -pi -e "s[\^/string$(.*)$] [\^/string$(.*) replaced_string]" /file.txt
But the problem is that perl doesn't find the line starting with ^/string
perl -pi -e "s[\^/string .+$][^/string replaced_string]" /file.txt
or
perl -pi -e "s[\^/string \K.+$][replaced_string]" /file.txt
The first portion of your regex, [\^/string$(.*)$], looks for "^/string" immediately followed by an end-of-line, then any number of characters (which are captured), and finally another end-of-line. In your file, "^/string" is followed by a space and more text rather than the end of the line, so the first $ can never match at that position and the substitution never fires.
You probably want:
s[\^/string string_to_be_replaced][^/string replaced_string]
assuming that you have only a single string_to_be_replaced and only want to replace it when it's preceded by string. If not, then more details would be useful.