Perl oneliner in bash: print matches from complex regexp - regex

I have this complex regex
/"_outV":([0-9]+),"_inV":([0-9]+),"_label":"([a-z\/]+)",/
and I need to parse a file (which is all on one single line) and output only the matched groups like
print $1 $2 $3
Currently the only almost working onliner is
perl -pe 'while(m/"_outV":([0-9]+)\,"_inV":([0-9]+)\,"_label":"([a-z\/]+)\"\,/g){print "$1 $2 $3\n";}'
But it ends up echoing also the entire file at the end, after the matches.
How do I fix this?
I though that removing the -p option would make the trick, but it doesn't.

Looks good to me.
You need to replace the -p with -n and here is why.
A few finer points:
No need to backslash those , and ".
You can conveniently replace[0-9] with \d.
By using a different delimiter for the regex you won't need to escape the /.
End result optimized
perl -ne 'print "$1 $2 $3\n" while m{"_outV":(\d+),"_inV":(\d+),"_label":"([a-z/]+)",}g'

Related

Delete all characters/words that doesn't match a pattern

I have a text, without lines, and i want to delete all the characters that doesn't match a pattern:
The pattern would be from the word parameter until it finds }}. For example if i have this entry:
KHJLMNNamespaceparameter:{{"Hello i am here"}}NamespaceHSKFSAFSLLLJparameter:{{H}}...
I would like to delete everything and leave this in the file: parameter:{{"Hello i am here"}} parameter:{{H}}.
All i found out there is to delete a line that doesn't contain a pattern, but I am not able to find anything related with a huge file without /n(end of lines). It would be possible to do that using either sed, awk or Vi?
Thanks!
$ awk 'BEGIN{RS=ORS="}}"} sub(/.*parameter/,"parameter")' file
parameter:{{"Hello i am here"}}parameter:{{H}}
Note that this is gawk-specific due to the multi-char RS.
You can use this grep with -P (PCRE) regex:
grep -oP '.*?\Kparameter:\{\{.*?\}\}' file
parameter:{{"Hello i am here"}}
parameter:{{H}}
If perl is an option, you can do this:
perl -ne "my #wo = ($_ =~ /parameter:\{\{.*?\}\}/g); print join(' ',#wo);" your_text_file
In perl, the modifier *? is a non-greedy quantifier, such that it stops at the first encountered }}.
I think a perl expert can do this in one instruction, without a temporary array ...
EDIT: this command only outputs the wanted text on stdout. To change the file itself, use the switch -i when calling perl:
perl -i.bak -ne "my #wo = ($_ =~ /parameter:\{\{.*?\}\}/g); print join(' ',#wo);" your_text_file
A backup file is created with the extension .bak appended at the end, and the result is written in a file with the same name as the input filename. Note that you can get no backup file with the swtich -i alone, but some platforms don't allowed this. See doc perlrun for more information.

Printing All Regex Matches

I'm using the perl/sed commands below to capture and print regex matches, unfortunately, both only print the first match in a line, rather than all matches. How can I modify either or both commands to print all matches? Grep and Awk alternative commands are welcome.
perl -nle 'print "$1" if /.*([0|1]\.[0-9]{0,2}).*/'
sed -rne "s/.*([0|1]\.[0-9]{0,2})/\1/p"
Just use while with the /g modifier to the regex instead of an if. Also need to get rid of your needless use of .* around the regex.
perl -nle 'print $1 while /([0|1]\.[0-9]{0,2})/g'
Finally, [0|1] should probably just be reduced to [01], unless you want to match a | before the period.
perl -nle 'print for /([0|1]\.[0-9]{0,2})/g'

Perl match newline in `-0` mode

Question
Suppose I have a file like this:
I've got a loverly bunch of coconut trees.
Newlines!
Bahahaha
Newlines!
the end.
I'd like to replace an occurence of "Newlines!" that is surrounded by blank lines with (say) NEWLINES!. So, ideal output is:
I've got a loverly bunch of coconut trees.
NEWLINES!
Bahahaha
Newlines!
the end.
Attempts
Ignoring "surrounded by newlines", I can do:
perl -p -e 's#Newlines!#NEWLINES!#g' input.txt
Which replaces all occurences of "Newlines!" with "NEWLINES!".
Now I try to pick out only the "Newlines!" surrounded with \n:
perl -p -e 's#\nNewlines!\n#\nNEWLINES!\n#g' input.txt
No luck (note - I don't need the s switch because I'm not using . and I don't need the m switch because I'm not using ^and $; regardless, adding them doesn't make this work). Lookaheads/behinds don't work either:
perl -p -e 's#(?<=\n)Newlines!(?=\n)#NEWLINES!#g' input.txt
After a bit of searching, I see that perl reads in the file line-by-line (makes sense; sed does too). So, I use the -0 switch:
perl -0p -e 's#(?<=\n)Newlines!(?=\n)#NEWLINES!#g' input.txt
Of course this doesn't work -- -0 replaces new line characters with the null character.
So my question is -- how can I match this pattern (I'd prefer not to write any perl beyond the regex 's#pattern#replacement#flags' construct)?
Is it possible to match this null character? I did try:
perl -0p -e 's#(?<=\0)Newlines!(?=\0)#NEWLINES!#g' input.txt
to no effect.
Can anyone tell me how to match newlines in perl? Whether in -0 mode or not? Or should I use something like awk? (I started with sed but it doesn't seem to have lookahead/behind support even with -r. I went to perl because I'm not at all familiar with awk).
cheers.
(PS: this question is not what I'm after because their problem had to do with a .+ matching newline).
Following should work for you:
perl -0pe 's#(?<=\n\n)Newlines!(?=\n\n)#NEWLINES!#g'
I think they way you went about things caused you to combine possible solutions in a way that didn't work.
if you use the inline editing flag you can do it like this:
perl -0p -i.bk -e 's/\n\nNewlines!\n\n/\n\nNEWLINES!\n\n/g' input.txt
I have doubled the \n's to make sure you only get the ones with empty lines above and below.
If the file is small enough to be slurped into memory all at once:
perl -0777 -pe 's/\n\nNewlines!(?=\n\n)/\n\nNEWLINES!/g'
Otherwise, keep a buffer of the last three lines read:
perl -ne 'push #buffer, $_; $buffer[1] = "NEWLINES!\n" if #buffer == 3 && ' \
-e 'join("", #buffer) eq "\nNewlines!\n\n"; ' \
-e 'print shift #buffer if #buffer == 3; END { print #buffer }'

perl -pe regex problem

I use perl to check some text input for a regex pattern, but one pattern doesn't work with perl -pe.
Following pattern doesn't work with the command call:
s![a-zA-Z]+ +(?:.*?)/(?:.*)Comp-(.*)/.*!$1!
I use the linux shell. Following call I use to test my regex:
cat test | perl -pe 's![a-zA-Z]+ +(?:.*?)/(?:.*)Comp-(.*)/.*!$1!'
File test:
A MaintanceGie?\195?\159mannFlock/System/Comp-Database.cpp
A MaintanceGie?\195?\159mannFlock/System/Comp-Cache/abc.h
Result:
A MaintanceGie?\195?\159mannFlock/System/Comp-Database.cpp
Cache
How can I remove the first result?
Thanks for any advice.
That last slash after "Comp-(.*)" may be what's doing it. Your file content in the "Database" doesn't have a slash. Try replacing Comp-(.*)/.* with Comp-(.*)[/.].* so you can match either the subdirectory or the file extension.
$ cat input
A MaintanceGie?\195?\159mannFlock/System/Comp-Database.cpp
A MaintanceGie?\195?\159mannFlock/System/Comp-Cache/abc.h
$ perl -ne 'print if s![a-zA-Z]+ +(?:.*?)/(?:.*)Comp-(.*)/.*!$1!' input
Cache
The problem is in last slash character in the regex. Instead of escaping the dot, it is just normal slash character, which is missing from input string. Try this:
s![a-zA-Z]+ +(?:.*?)/(?:.*)Comp-(.*)[./].*!$1!
Edit: Updated to match new input data and added another option:
On the other hand, your replacement regex might be replaced by something like:
perl -ne 'print "$1\n" if /Comp-(.*?)[.\/]/'
Then there is no need to parse full line with whatever it contains.
\s match whitespace (spaces, tabs, and line breaks) and '+' means one or more characters. In this case '\s+' would mean search for one or more whitespaces.
cat test
A MaintanceGie?\195?\159mannFlock/System/Comp-Database.cpp
A MaintanceGie?\195?\159mannFlock/System/Comp-Cache/abc.h
perl -ne 'print "$1\n" if /\w+?\d+?\d+\w+\/\w+\/Comp-(\w+)[\/]/' test

How to use grep to get anything just after `name=`?

I’m stuck in trying to grep anything just after name=, include only spaces and alphanumeric.
e.g.:
name=some value here
I get
some value here
I’m totally newb in this, the following grep match everything including the name=.
grep 'name=.*' filename
Any help is much appreciated.
As detailed here, you want a positive lookbehind clause, such as:
grep -P '(?<=name=)[ A-Za-z0-9]*' filename
The -P makes grep use the Perl dialect, otherwise you'd probably need to escape the parentheses. You can also, as noted elsewhere, append the -o parameter to print out only what is matched. The part in brackets specifies that you want alphanumerics and spaces.
The advantage of using a positive lookbehind clause is that the "name=" text is not part of the match. If grep highlights matched text, it will only highlight the alphanumeric (and spaces) part. The -o parameter will also not display the "name=" part. And, if you transition this to another program like sed that might capture the text and do something with it, you won't be capturing the "name=" part, although you can also do that by using capturing parenthess.
Try this:
sed -n 's/^name=//p' filename
It tells sed to print nothing (-n) by default, substitute your prefix with nothing, and print if the substitution occurs.
Bonus: if you really need it to only match entries with only spaces and alphanumerics, you can do that too:
sed -n 's/^name=\([ 0-9a-zA-Z]*$\)/\1/p' filename
Here we've added a pattern to match spaces and alphanumerics only until the end of the line ($), and if we match we substitute the group in parentheses and print.
gawk
echo "name=some value here" | awk -F"=" '/name=/ { print $2}'
or with bash
str="name=some value here"
IFS="="
set -- $str
echo $1
unset IFS
or
str="name=some value here"
str=${str/name=/}
grep does not extract like you expect. What you need is
grep "name=" file.txt | cut -d'=' -f1-
grep will print the entire line where it matches the pattern. To print only the pattern matched, use the grep -o option. You'll probably also need to use sed to remove the name= part of the pattern.
grep -o 'name=[0-9a-zA-Z ]' myfile | sed /^name=/d