Multiple replacement in sed - regex

Is there a way in sed to replace multiple $var$ placeholders with the values of the matching keys, where the keys are given in key=value format (delimited by =) on the same line?
Sorry, that question is confusing, so here is an example.
What I have:
aaa="src is $src$ user is $user$!" src="over there" user="jason"
What I want in the end:
aaa="src is over there user is jason!"
I don't want to hardcode the position of the $var$ because they could change.

sed ':again
s/\$\([[:alnum:]]\{1,\}\)\$\(.*\) \1="\([^"]*\)"/\3\2/g
t again
' YourFile
As you can see, sed can handle this kind of task. With a few modifications it can even work when the elements span several lines, and it does not need six quick-and-dirty, complex lines in a more powerful language.
Principle:
create a label (for a later goto)
search for a name enclosed in $...$ together with its name="value" definition later on the line; replace the placeholder with the value, keep the text in between, and drop the definition
if a substitution occurred, try again by branching back to the label; if not, print the line and move on to the next one
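As a quick check, the loop above can be run against the sample line from the question. This is only a sketch: the line is fed on stdin instead of YourFile, the function name expand_vars is made up for the illustration, and the g flag is dropped because the t loop already repeats the substitution.

```shell
# Sample line from the question.
line='aaa="src is $src$ user is $user$!" src="over there" user="jason"'

# expand_vars is a hypothetical wrapper around the sed script shown above.
expand_vars() {
  sed ':again
s/\$\([[:alnum:]]\{1,\}\)\$\(.*\) \1="\([^"]*\)"/\3\2/
t again'
}

result=$(printf '%s\n' "$line" | expand_vars)
printf '%s\n' "$result"
```

The first pass replaces $src$ and removes src="over there"; the second pass does the same for $user$; the third pass finds nothing, so the line is printed.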

This is a quick & dirty way to solve it using perl. It could fail in some ways (spaces, escaped double quotes, ...), but it would get the job done for most simple cases:
perl -ne '
## Get each key and value.
@captures = m/(\S+)=("[^"]+")/g;
## Extract first two elements as in the original string.
$output = join q|=|, splice @captures, 0, 2;
## Use a hash for a better look-up, and remove double quotes
## from values.
%replacements = @captures;
%replacements = map { $_ => substr $replacements{$_}, 1, -1 } keys %replacements;
## Use a regex to look-up into the hash for the replacements strings.
$output =~ s/\$([^\$]+)\$/$replacements{$1}/g;
printf qq|%s\n|, $output;
' infile
It yields:
aaa="src is over there user is jason!"

Related

Perl regex - print only modified line (like sed -n 's///p')

I have a command that outputs text in the following format:
misc1=poiuyt
var1=qwerty
var2=asdfgh
var3=zxcvbn
misc2=lkjhgf
etc. I need to get the values for var1, var2, and var3 into variables in a perl script.
If I were writing a shell script, I'd do this:
OUTPUT=$(command | grep '^var')
VAR1=$(echo "${OUTPUT}" | sed -ne 's/^var1=\(.*\)$/\1/p')
VAR2=$(echo "${OUTPUT}" | sed -ne 's/^var2=\(.*\)$/\1/p')
VAR3=$(echo "${OUTPUT}" | sed -ne 's/^var3=\(.*\)$/\1/p')
That populates OUTPUT with the basic content that I want (so I don't have to run the original command multiple times), and then I can pull out each value using sed: VAR1='qwerty', etc.
I've worked with perl in the past, but I'm pretty rusty. Here's the best I've been able to come up with:
my $output = `command | grep '^var'`;
(my $var1 = $output) =~ s/\bvar1=(.*)\b/$1/m;
print $var1
This correctly matches and references the value for var1, but it also returns the unmatched lines, so $var1 equals this:
qwerty
var2=asdfgh
var3=zxcvbn
With sed I'm able to tell it to print only the modified lines. Is there a way to do something similar in perl? I can't find the equivalent of sed's p modifier in perl.
Alternatively, is there a better way to extract those substrings from each line? I'm sure I could match each line and split the contents or something like that, but I was trying to stick with regex since that's how I'd typically solve this outside of perl.
Appreciate any guidance. I'm sure I'm missing something relatively simple.
One way
my @values = map { /\bvar(?:1|2|3)\s*=\s*(.*)/ ? $1 : () } qx(command);
The qx operator ("backticks") returns a list of all lines of output when used in list context, here imposed by map. (In a scalar context it returns all output in a string, possibly multiline.) Then map extracts wanted values: the ternary operator in it returns the capture, or an empty list when there is no match (so filtering out such lines). Please adjust the regex as suitable.
Or one can break this up, taking all output, then filtering needed lines, then parsing them. That allows for more nuanced, staged processing. And then there are libraries for managing external commands that make more involved work much nicer.
A comment on the Perl attempt shown in the question
Since the backticks are assigned to a scalar, they are in scalar context and thus return all output in one string, here multiline. The regex that follows replaces var1=(.*) with $1 but leaves the next two lines in place, since . does not match a newline and so .* stops at the first newline character.
So you'd need to amend that regex to match all the rest of the string as well, so that everything is replaced with the capture $1. But then the pattern would have to be different for the other variables. Or you could replace the input string with all three var-values, but then you'd have one string with those three values in it.
So altogether: using the substitution here (s///) isn't suitable -- just use matching, m//.
Since in list context the match operator also returns all matches another way is
my @values = qx(command) =~ /\bvar(?:1|2|3)\s*=\s*(.*)/g;
Now being bound to a regex, qx is in scalar context and so it returns a (here multiline) string, which is then matched by regex. With /g modifier the pattern keeps being matched through that string, capturing all wanted values (and nothing else). The fact that . doesn't match a newline so .* stops at the first newline character is now useful.
Again, please adjust the regex as suitable to your real problem.
Another need came up, to capture both the actual names of variables and their values. Then add capturing parens around names, and assign to a hash
my %val = map { /\b(var(?:1|2|3))\s*=\s*(.*)/ ? ($1, $2) : () } qx(command);
or
my %val = qx(command) =~ /\b(var(?:1|2|3))\s*=\s*(.*)/g;
Now, for each line of output from command, the map returns a pair of var-name + value, and a list of such pairs can be assigned to a hash. The same goes for the subsequent matches (under /g) in the second case.
In scalar context, s/// and s///g return a true value if a substitution was made and a false value otherwise. So you can use
print $s if $s =~ s///;

Why isn't this regex executing?

I'm attempting to convert my personal wiki from Foswiki to Markdown files and then to a JAMstack deployment. Foswiki uses flat files and stores metadata in the following format:
%META:TOPICINFO{author="TeotiNathaniel" comment="reprev" date="1571215308" format="1.1" reprev="13" version="14"}%
I want to use a git repo for versioning and will worry about linking that to article metadata later. At this point I simply want to convert these blocks to something that looks like this:
---
author: Teoti Nathaniel
revdate: 1539108277
---
After a bit of tweaking I have constructed the following regex:
author\=\['"\]\(\\w\+\)\['"\]\(\?\:\.\*\)date\=\['"\]\(\\w\+\)\['"\]
According to regex101 this works and my two capture groups contain the desired results. Attempting to actually run it:
perl -0777 -pe 's/author\=\['"\]\(\\w\+\)\['"\]\(\?\:\.\*\)date\=\['"\]\(\\w\+\)\['"\]/author: $1\nrevdate: $2/gms' somefile.txt
gets me only this:
>
My previous attempt (which breaks if the details aren't in a specific order) looked like this and executed correctly:
perl -0777 -pe 's/%META:TOPICINFO\{author="(.*)"\ date="(.*)"\ format="(.*)"\ (.*)\}\%/author:$1 \nrevdate:$2/gms' somefile.txt
I think that this is an escape character problem but can't figure it out. I even went and found this tool to make sure that they are correct.
Brute-forcing my way to understanding here is feeling both inefficient and frustrating, so I'm asking the community for help.
The first major problem is that you're trying to use a single quote (') in the program, when the program is being passed to the shell in single quotes.
Escape any instance of ' in the program by using '\''. You could also use \x27 if the quote happens to be inside a double-quoted string literal or regex literal (as is the case for every instance in your program).
perl -0777pe's/author=['\''"].../.../gs'
perl -0777pe's/author=[\x27"].../.../gs'
I would try to break it down into a clean data structure and then process it. By separating the data processing from the printing, you can modify it to add extra data later. It also makes it far more readable. Please see the example below.
#!/usr/bin/env perl
use strict;
use warnings;
## yaml to print the data, not required for operation
use YAML::XS qw(Dump);
my $yaml;
my @lines = '%META:TOPICINFO{author="TeotiNathaniel" comment="reprev" date="1571215308" format="1.1" reprev="13" version="14"}%';
for my $str (@lines )
{
### split line into component parts
my ( $type , $subject , $data ) = $str =~ /\%(.*?):(.*?)\{(.*)\}\%/;
## break data in {} into a hash
my %info = map( split(/=/), split(/\s+/, $data) );
## strip quotes if any exist
s/^"(.*)"$/$1/ for values %info;
#add to data structure
$yaml->{$type}{$subject} = \%info;
}
## yaml to print the data, not required for operation
print Dump($yaml);
## loop data and print
for my $t (keys %{ $yaml } ) {
for my $s (keys %{ $yaml->{$t} } ) {
print "-----------\n";
print "author: ".$yaml->{$t}{$s}{"author"}."\n";
print "date: ".$yaml->{$t}{$s}{"date"}."\n";
}
}
Ok, I kept fooling around with it by reducing the execution to a single term and expanding. I soon got to here:
$ perl -0777 -pe 's/author=['\"]\(\\w\+\)['"](?:.*)date=\['\"\]\(\\w\+\)\['\"\]/author\: \$1\\nrevdate\: \$2/gms' somefile.txt
Unmatched [ in regex; marked by <-- HERE in m/author=["](\w+)["](?:.*)date=\["](\w+)[ <-- HERE \"\]/ at -e line 1.
This eventually got me to here:
perl -0777 -pe 's/author=['\"]\(\\w\+\)['"](?:.*)date=['\"]\(\\w\+\)['\"]/\nauthor\ $1\nrevdate\:$2\n/gms' somefile.txt
Which produces a messy output but works. (Note: the output is a proof of concept, and this can now be used within a Python script to programmatically generate Markdown metadata.)
Thanks for being my rubber duckie, StackOverflow. Hopefully this is useful to someone, somewhere, somewhen.

Replace all commas between two quotes in a bash script

I need that all "," between two " are replaced with ";" within a bash script. I'm close, but hours on the internet and stackoverflow led me to this:
echo ',,Lung,,"Lobular, each.|lungs, right.",false,,,,"organ, left.",,,,,' | sed -r ':a;s/(".*?),(.*?")/\1;\2/;ta'
With the result:
,,Lung,,"Lobular; each.|lungs; right.";false;;;;"organ; left.",,,,,
Correct would be:
,,Lung,,"Lobular; each.|lungs; right.",false,,,,"organ; left.",,,,,
Not sure how you want to deal with lines that have an odd number of double quotes (eg, the double quoted string spans multiple lines), but perhaps:
awk '!(NR%2){gsub(",",";")} 1' RS=\" ORS=\"
This simply treats " as the record separator and does the replacement only on even-numbered records, which are the ones inside quotes. Seems to work as desired. (Or, rather, it works as you seem to desire!)
As oguz points out in a comment, this prints an extra " at the end. That can be fixed with:
awk '!(NR%2){gsub(",",";")} {printf RFS $0} {RFS="\""}' RS=\"
which is a bit uglier but more correct (or, rather, less incorrect!). If your input stream ends with a ", that quote will be truncated. If, however, your input is terminated by a newline rather than a ", this will do what you want.
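A field-based variant of the same even/odd idea avoids the trailing-quote issue, because the newline stays the record separator. This is an alternative sketch, not the command from the answer, and it assumes that quoted strings never span lines and never contain escaped quotes.

```shell
line=',,Lung,,"Lobular, each.|lungs, right.",false,,,,"organ, left.",,,,,'

# Split each line on double quotes: fields 2, 4, 6, ... are the quoted parts.
# Assigning to a field rebuilds the line with OFS, which puts the quotes back.
fixed=$(printf '%s\n' "$line" |
  awk 'BEGIN { FS = OFS = "\"" } { for (i = 2; i <= NF; i += 2) gsub(",", ";", $i) } 1')
printf '%s\n' "$fixed"
```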
OTOH, you might just want to do:
perl -wpE 'BEGIN{$/=\1}; y/,/;/ if $in; $in = ! $in if $_ eq "\""'
Which reads one character and uses a simple state machine. ($_ is the current character, so $in = ! $in changes state when a double quote is seen and the transliteration only happens when $in is non-zero.)
If you /really/ wanted to use sed, you could do a whole-line replace and include a clause like ^(([^"]*"[^"]*")*[^"]*) at the beginning of your existing expression, in order to ensure that the matched quotes are "odd" (that is, that an even number of quotes precedes the opening one).

what regex to extract all data except within <> in perl?

I have string
Message <Network=Data Center> All Verified
I need to extract the whole string except the part in angle brackets
I tried
m/(?![^<]*\\>)/s
Not giving desired result.
Removing <..> regions
It's easier to remove the <..> parts from the string and then deal with the remaining string.
Try this oneliner:
cat file | perl -pne 's/<[^>]*?>//g;'
For your sample input, this is the output:
Message All Verified
Notice that the non-greedy quantifier ? is used in the regex. Also, because this is a one-liner, the s/// search-and-replace construct is applied to the implicit variable $_ (which holds a line from standard input). So after the search and replace has run, $_ will be altered (there will be no <..> regions in it). The -p switch is used to print $_ after running the block of code. You can read more about Perl command-line switches in perlrun.
This is one solution. Below there is another one:
Capturing regions outside of <..>
On the other hand, you can (if you want) match the parts outside of the <..> regions.
In order to do that let's build a regex. First, we want a < or > free region. The following regex matches just that
$p = ([^<>]*).
Next, we want to match everything before <, and for that we can write (?:$p<) and everything after >, and that's (?:>$p).
Now if we assemble all those parts together we get (?:>$p)|(?:$p<).
Notice that (?:) is a non-capturing group.
So now there are two capturing groups (the two $p you see above) but only one will match at a time, so some of the captures will be undef. We'll have to filter those out.
Finally, we can assemble all the captures, and we're done.
cat file | perl -ne '$p="([^<>]*)";@x=grep{defined} m{(?:>$p)|(?:$p<)}g; print join(" ",@x)."\n";'
Parse::Yapp parser
You might think that using Parse::Yapp for this particular problem is a bit too much (usually you would reach for a grammar and a parser generator only when you have something complicated to parse), but .. why not.. :)
Ok, so we need a grammar, here's one right here grammar_file.yp:
#header
%%
#rules
expression:
| exterior '<' interior '>' exterior
| exterior
;
exterior:
| TOK { $_[0]->YYData->{DATA} .= $_[1]; }
| expression
;
interior: TOK;
%%
#footer
sub Error { my ($parser)=shift; }
sub Lexer {
use Data::Dumper;
my($parser)=shift;
$parser->YYData->{INPUT} or return('',undef);
#$parser->YYData->{INPUT}=~s/^\s+//;
for ($parser->YYData->{INPUT}) {
return ('TOK',$1) if(s/^([^<>]+)//);
return ( $1,$1) if(s/^([<>])//);
};
}
You will notice in the grammar above that the interior is completely ignored, and only the terminals from exterior are collected.
Here's a small program that will use the parser(MyParser.pm generated from grammar_file.yp) parse.pl:
#!/usr/bin/env perl
use strict;
use warnings;
use MyParser;
my $parser=MyParser->new;
$parser->YYData->{INPUT} = "Message <Network=Data Center> All Verified";
my $value=$parser->YYParse(
yylex => \&MyParser::Lexer,
yyerror => \&MyParser::Error,
#yydebug => 0x1F,
);
my $nberr=$parser->YYNberr();
my $data=$parser->YYData->{DATA};
print "Result=$data"
And now a Makefile and we're done:
generate_parser_module:
	yapp -m MyParser grammar_file.yp;
run:
	perl parse.pl
all: generate_parser_module
Note
Some more Parser generators can be found here
Regexp::Grammars
Parse::RecDescent
Marpa::XS or Marpa::R2
You can do it another way: just remove the part of the string in the angle brackets:
s#<.*>##
Or if > is not allowed:
s#<[^>]*>##
You can use sed for that:
cat yourfile |sed 's/<.*>//g' > newfile
If you need perl:
perl -i -pe "s/<.*?>//g" yourfile
Here is a compact approach. The following regex will capture your strings into Group 1:
<[^>]+>|([^<>]*)
What we are interested in here is not the overall match, but just the Group 1 matches.
So we need to iterate over Group 1 matches. I don't code in Perl, but following a recipe from the perlretut tutorial, this should do it:
while ($x =~ /<[^>]+>|([^<>]*)/g) {
print "$1","\n";
}
Please give it a try and let me know if it works for you.

Perl: Grabbing the nth and mth delimited words from each line in a file

Because of the more tedious way of adding hosts to be monitored in Nagios (it requires defining a host object, as opposed to the previous program which only required the IP and hostname), I figured it'd be best to automate this, and it'd be a great time to learn Perl, because all I know at the moment is C/C++ and Java.
The file I read from looks like this:
xxx.xxx.xxx.xxx hostname #comments. i.dont. care. about
All I want are the first 2 bunches of characters. These are obviously space delimited, but for the sake of generality, it might as well be anything. To make it more general, why not the first and third, or fourth and tenth? Surely there must be some regex action involved, but I'll leave that tag off for the moment, just in case.
The one-liner is great, if you're not writing more Perl to handle the result.
More generally though, in the context of a larger Perl program, you would either write a custom regular expression, for example:
if($line =~ m/(\S+)\s+(\S+)/) {
$ip = $1;
$hostname = $2;
}
... or you would use the split operator.
my @arr = split(/ /, $line);
$ip = $arr[0];
$hostname = $arr[1];
Either way, add logic to check for invalid input.
Let's turn this into code golf! Based on David's excellent answer, here's mine:
perl -ane 'print "@F[0,1]\n";'
Edit: A real golf submission would look more like this (shaving off five strokes):
perl -ape '$_="@F[0,1]
"'
but that's less readable for this question's purposes. :-P
Here's a general solution (if we step away from code-golfing a bit).
#!/usr/bin/perl -n
chop; # strip newline (in case next line doesn't strip it)
s/#.*//; # strip comments
next unless /\S/; # don't process line if it has nothing (left)
@fields = (split)[0,1]; # split line, and get wanted fields
print join(' ', @fields), "\n";
Normally split splits by whitespace. If that's not what you want (e.g., parsing /etc/passwd), you can pass a delimiter as a regex:
@fields = (split /:/)[0,2,4..6];
Of course, if you're parsing colon-delimited files, chances are also good that such files don't have comments and you don't have to strip them.
A simple one-liner is
perl -nae 'print "$F[0] $F[1]\n";'
you can change the delimiter with -F
David Nehme said:
perl -nae 'print "$F[0] $F[1]\n";'
which uses the -a switch. I had to look that one up:
-a turns on autosplit mode when used with a -n or -p. An implicit split
command to the @F array is done as the first thing inside the implicit
while loop produced by the -n or -p.
You learn something every day. -n causes each line to be passed to
LINE:
while (<>) {
... # your program goes here
}
And finally -e is a way to directly enter a single line of a program. You can have more than one -e. Most of this was lifted from the perlrun(1) manpage.
Since ray asked, I thought I'd rewrite my whole program without using Perl's implicitness (except the use of <ARGV>; that's hard to write out by hand). This will probably make Python people happier (braces notwithstanding :-P):
while (my $line = <ARGV>) {
chop $line;
$line =~ s/#.*//;
next unless $line =~ /\S/;
@fields = (split ' ', $line)[0,1];
print join(' ', @fields), "\n";
}
Is there anything I missed? Hopefully not. The ARGV filehandle is special. It causes each named file on the command line to be read, unless none are specified, in which case it reads standard input.
Edit: Oh, I forgot. split ' ' is magical too, unlike split / /. The latter just matches a space. The former matches any amount of any whitespace. This magical behaviour is used by default if no pattern is specified for split. (Some would say, but what about /\s+/? ' ' and /\s+/ are similar, except for how whitespace at the beginning of a line is treated. So ' ' really is magical.)
The moral of the story is, Perl is great if you like lots of magical behaviour. If you don't have a bar of it, use Python. :-P
To find the Nth to Mth characters of line number L (example: finding a volume label)
@echo off
REM Next line writes a command's output to a file; skip it if you already have a file to read
vol E: > %temp%\justtmp.txt
REM vol E: prints the volume label of drive E
REM The next line chooses the line to read: +0 = line no. 1
for /f "usebackq delims=" %%a in (`more +0 %temp%\justtmp.txt`) DO (set findstringline=%%a& goto :nextstep)
:nextstep
REM The next line extracts a substring: ~22,40 skips the first 22 characters and takes up to 40 characters
set result=%findstringline:~22,40%
echo %result%
pause
exit /b
Save as find label.cmd
The result will be your drive E label.
Enjoy