RegEx Find and Replace Sentence - regex

I'm looking for a way to find and replace a sentence using regex. The regex should be able to match a sentence of any length. I can match the entire sentence with .* but that doesn't let me reuse it in the replacement via \1.
FIND:
"QUESTION1" = "What is the day satellite called?"
"ANSWER1" = "The sun"
REPLACE:
<key>What is the day satellite called?</key>
<key>The sun</key>

You need to use a capturing group so that you can refer to the captured text through a back-reference.
Regex:
.*(?<= \")([^"]*).*
Replacement string:
<key>\1</key>

Find using the following expression (modifiers required: g and m):
^[^=]+= "(.*?)"$
and then replace them using:
<key>$1</key>
or
<key>\1</key>

Using Perl:
> cat temp
"QUESTION1" = "What is the day satellite called?"
"ANSWER1" = "The sun"
> perl -lne 'print "<key>".$1."<\/key>" if(/\".*?\".*?\"(.*?)\"/)' temp
<key>What is the day satellite called?</key>
<key>The sun</key>
>

Perl One-Liner
A compact approach: search for (?m)"([^"]+)"$
Replace with <key>$1</key> if you want <key>What is the day satellite called?</key>
or
Replace: "<key>$1</key>" if you want "<key>What is the day satellite called?</key>"
With a perl one-liner:
perl -pe 's!(?m)"([^"]+)"$!<key>$1</key>!g' yourfile

For those arriving from a Google search: you can experiment with this tool to find the right regular expression to use: http://regexr.com/

Related

perl multiline issue: need one liner to print last match before string in file

I have a log file like this:
2018-07-10 10:03:01: random text1
2018-07-10 10:03:02: random text2
2018-07-10 10:03:03: random text3
more text
and more
THIS IS MATCHED STRING
2018-07-10 10:03:04: random text4
I want to use a perl one-liner to find the most recent timestamp before "THIS IS MATCHED STRING".
I tried this:
perl -0777 -nle 'print "$1\n" while m/(\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d).+?THIS IS MATCHED STRING/sg'
But it matched the first timestamp, "2018-07-10 10:03:01" instead of the "2018-07-10 10:03:03" that I wanted. Obviously (at least I think), I don't have a good understanding of how the greedy/lazy matching is working.
Any help would be appreciated!
For a fairly elementary approach that avoids an involved regex, process the file line by line and record each timestamp as it is matched. Then, when you reach the line containing THIS IS MATCHED STRING, the variable holds the most recent timestamp seen before it.
perl -wnE'
$ts = $1 if /(\d{4}-\d{2}-\d{2}[ ]\d{2}:\d{2}:\d{2})/;
say $ts // "no previous time stamp" if /THIS IS MATCHED STRING/;
' file.txt
If the timestamp were instead captured and saved with the list assignment ($ts) = /.../, every failed match on a line without a timestamp would reset $ts to undef, so the value might be gone by the time THIS is found. That is why it is saved from $1 only when there is a match.
The defined-or (//) on $ts covers the case where the file has no timestamps at all before THIS.
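You could use the following (written for the x and m modifiers, since it is spread over several lines and relies on ^ matching at the start of each line):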
^
(\d{4}-\d{2}-\d{2}\ \d+:\d+:\d+):
(?:(?!^\d{4})[\s\S])+?
\QTHIS IS MATCHED STRING\E
See a demo on regex101.com.

perl regex - anchors and pattern matching

I wrote a Perl regex to extract the words after a certain anchor, but it doesn't seem to work. What am I doing wrong?
This is my actual output; I need to extract every number after the groups keyword.
$id cuser301 uid=2301(cuser301) gid=32(rpc) groups=32(rpc),1001(cgrp1),1002(cgrp2),1003(cgrp3),1004(cgrp4),1005(cgrp5),1006(cgrp6),1007(cgrp7),1008(cgrp8),1009(cgrp9),1010(cgrp10),1011(cgrp11),1012(cgrp12),1013(cgrp13),1014(cgrp14),1015(cgrp15),1016(cgrp16),1017(cgrp17),1018(cgrp18),1019(cgrp19),1020(cgrp20),1021(cgrp21),1022(cgrp22),1023(cgrp23),1024(cgrp24),1025(cgrp25),1026(cgrp26),1027(cgrp27),1028(cgrp28),1029(cgrp29),1030(cgrp30),1031(cgrp31),1032(cgrp32)
From the above: I run the id command and would like to capture the numbers after groups. Please help.
I am using the following.
my $check_groups = execute("\id $user"); #---> (execute is to run commands on the linux client, please ignore it)
my $new_groups = ('/^groups/',$check_groups); # ---> Now $new_groups should have all numbers after groups.
my $input = '$id cuser301 uid=2301(cuser301) gid=32(rpc) groups=32(rpc),1001(cgrp1),1002(cgrp2),1003(cgrp3),1004(cgrp4),1005(cgrp5),1006(cgrp6),1007(cgrp7),1008(cgrp8),1009(cgrp9),1010(cgrp10),1011(cgrp11),1012(cgrp12),1013(cgrp13),1014(cgrp14),1015(cgrp15),1016(cgrp16),1017(cgrp17),1018(cgrp18),1019(cgrp19),1020(cgrp20),1021(cgrp21),1022(cgrp22),1023(cgrp23),1024(cgrp24),1025(cgrp25),1026(cgrp26),1027(cgrp27),1028(cgrp28),1029(cgrp29),1030(cgrp30),1031(cgrp31),1032(cgrp32)';
print join ',', $input =~ /(?:.*groups=|\G.*?)\b([0-9]+)/g;
This is a common pattern; in more complicated cases where you want to ensure the \G branch only applies after the first non-zero-length match, you can use \G(?!\A) instead of just \G.
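Try this (note that the pattern requires a leading comma, so the first group ID, 32, which directly follows groups=, is not printed):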
$ echo <INPUT> | perl -ne 'print "$1," while /,(\d+)\(/g'
Check https://regex101.com/r/uZ9tO6/1

Perl - Unexpected behaviour with Regex in Array

I'm trying to match lines that have
"/foldera/folderb/folderc/folderd/file.ext##/main" + "/" + ANY_NUMBER:
so for example:
(.+)(main)(.\d)
The lines:
/foldera/folderb/folderc/folderd/file.ext##/main
/foldera/folderb/folderc/folderd/file.ext##/main/0
/foldera/folderb/folderc/folderd/file.ext##/main/1
/foldera/folderb/folderc/folderd/file.ext##/main/2
/foldera/folderb/folderc/folderd/file.ext##/main/3
/foldera/folderb/folderc/folderd/file.ext##/main/4
/foldera/folderb/folderc/folderd/file.ext##/main/5 (RLT-abcde, BLD-abcde, DEV-abcde)
/foldera/folderb/folderc/folderx/file12.ext##/main/0
/foldera/folderb/folderc/folderx/file12.ext##/main/1
/foldera/folderb/folderc/folderx/file12.ext##/main/2
/foldera/folderb/folderc/folderx/file12.ext##/main/3
/foldera/folderb/folderc/folderx/file12.ext##/main/4
/foldera/folderb/folderc/folderx/file12.ext##/main/5
/foldera/folderb/folderc/folderx/file12.ext##/main/6 (RLS-abcde-5.0, RLS-abcde-4.1)
While my regex matches the desired lines (I checked it at http://www.regexe.com/), in my Perl program it does not match
/foldera/folderb/folderc/folderd/file.ext##/main
but it does match:
/foldera/folderb/folderc/folderd/file.ext##/main/5 (RLT-abcde, BLD-abcde, DEV-abcde)
Here is the code:
use warnings;
use strict;
my @file_list = `find /folder -type f -name '*.ext'|xargs cleartool lsvtree -all`;
foreach my $file (@file_list) {
    if ($file =~ m/(.+)(main)(.\d)/g) {
        print $file;
    }
}
I'm pretty sure that I'm making a stupid mistake somewhere, but I just can't see it!
Thank you in advance for your advice.
P.S. I tried it under Perl 5.8 and Perl 5.18 with the same results; the OS is Solaris.
Change
print $file;
to:
print "$MATCH\n";
so you only print the part of the line that was matched by the regexp ($MATCH requires use English; without it, the equivalent built-in variable is $&).
You should also change \d to \d+, to allow for numbers with more than one digit.
At a quick look, the line
/foldera/folderb/folderc/folderd/file.ext##/main
has no number at the end, and \d requires a digit ;-)
You may also find this site useful: http://regexpal.com/
I think it is not matching your line because your regex explicitly looks for a digit at the end.
Try changing your regex to this (note the curly brackets at the end):
(.+)(main)(.\d){0,1}
Or personally I would write it like this:
(.*?)main(\/\d*){0,1}
Hope this helps!

regex search with renumbering in replace

I have a file with anywhere between a dozen and hundreds of matches on the search
/playOrder="(\d+)"/
These are in the index file of an ePub ebook, in case anyone is wondering.
Is it possible to have a Perl regex replace that finds all of these and "magically" renumbers them in sequence, starting from 1?
Posting comment as answer, as requested by OP:
perl -pe 's/playOrder="\K\d+"/++$i . q(")/ge' infile > outfile
This one-liner uses the e modifier to evaluate the replacement as code, producing the sequence 1", 2", and so on. \K keeps everything matched before it out of the replaced text.
It can be tidied further by using a lookahead assertion instead of re-inserting the double quote:
perl -pe 's/playOrder="\K\d+(?=")/++$i/ge' infile > outfile

Bash/PHP extract URL from HTML via regex

Is there any easy way to extract this URL in bash/or PHP?
http://shop.image-site.com/images/2/format2013/fullies/kju_product.png
From this HTML code?
<a href="javascript: open_window_zoom('http://shop.image-site.com/image.php?image=http://shop.image-site.com/images/2/format2013/fullies/kju_product.png&pID=31777&download=kju.png&name=13011 KELLYS Kju: 490mm (19.5")',550,366);">
With perl you could do a match and a capture
perl -n -e 'print "$1\n" if (m/image=(.*?)\&/);'
This captures everything between image= and the next & and prints the captured text, $1.
For more on regular expressions, see perlre or http://www.regular-expressions.info/
In bash, you can try the following:
sed 's/.*image=\(http:\/\/[^&]*\).*/\1/g'
Update:
The solution above performs substitution rather than extraction. The line containing the pattern (required url) is replaced by the pattern itself. However, the substitution isn't in-place.
Whichever way you decide to dress it up, you could simply split with the delimiter equal to ?image= and then split the second token you receive (i.e. result[1]) with a simple & delimiter. The first result from that split is your answer.
However, a pure regex match would look something like: m#image=([a-z0-9:/._-]+)&#i. You can take that regex and use it wherever you like; the result is stored in $1. Despite what a lot of people think, you do not have to match the beginning and the end of a line to match a result.
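The split idea can also be sketched without any regex at all, using plain shell parameter expansion in place of a split function. This is one possible reading of the approach, not the only one; the variable names html, rest and url are made up for the example.

```shell
# Read the sample HTML line from the question into a variable.
IFS= read -r html <<'EOF'
<a href="javascript: open_window_zoom('http://shop.image-site.com/image.php?image=http://shop.image-site.com/images/2/format2013/fullies/kju_product.png&pID=31777&download=kju.png&name=13011 KELLYS Kju: 490mm (19.5")',550,366);">
EOF

# "Split" on ?image= by dropping everything up to and including it ...
rest=${html#*'?image='}
# ... then "split" on & by dropping everything from the first & onwards.
url=${rest%%'&'*}

printf '%s\n' "$url"
```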
Try this:
xmllint --html --xpath '//a/@href' file://file.html |
grep -oP 'image=\Khttp://.*?\.png'
You can use a URL instead of a local file:
http://domain.tld/path
Or if you had already extracted the line to parse in the $string variable :
grep -oP 'image=\Khttp://.*?\.png' <<< "$string"