Matching string in perl which is parsed with awk - regex

I ran the following with awk:
awk '/secon/ {print $1}' vladtest.sh |cut -c2-6
How do I match this string properly in a Perl program?
my $var1= `awk '/secon/ {print $1}' vladtest.sh |cut -c2-6`
if ($var1 eq "secon"){
print "OK";
} else {
print "FALSE"
}

First of all, your problem is the fact that the result of `...` includes the newline. So $var1 is not "secon", but "secon\n". You could deal with that any number of ways - wrapping chomp(...) around the whole assignment to $var1, or changing the right side of the eq to include the "\n", or using a regex instead: if ($var1 =~ /^secon$/) works with or without the chomp, because $ matches at a string-final newline if present.
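To see the three options side by side, here is a minimal sketch. It assumes the backticks returned exactly "secon\n", and the variable names are made up for illustration:

```perl
use strict;
use warnings;

# Assumption: this is what the backticks handed back (note the newline).
my $raw = "secon\n";

my $eq_raw = $raw eq "secon"   ? 1 : 0;   # plain eq on the raw value fails
my $re_raw = $raw =~ /^secon$/ ? 1 : 0;   # $ tolerates one trailing newline

chomp(my $clean = $raw);                  # strip the newline, then compare
my $eq_clean = $clean eq "secon" ? 1 : 0;

print "eq on raw: $eq_raw, regex on raw: $re_raw, eq after chomp: $eq_clean\n";
```

Only the raw eq comparison comes out false; both the regex and the chomped comparison succeed.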
Second of all, you're using about two programs too many here:
my $var1= `awk '/secon/ {print $1}' vladtest.sh |cut -c2-6`
I mean, awk can do anything cut can do, and perl can do anything either can do. When I'm typing one-liners at the shell prompt, I'm a big fan of awk and cut, simply for their economy of expression; within their respective specialities, their command lines are generally shorter than the equivalent (even using perl -a etc). But since you're already in a Perl program, there's no reason to shell out to something else here.
Here's an example drop-in replacement for the above:
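my $var1;
{
    open(my $fh, '<', 'vladtest.sh') or die "Can't open vladtest.sh: $!";
    while (<$fh>) {
        # (split ' ')[0] is awk's $1; substr(..., 1, 5) is cut -c2-6
        $var1 .= substr((split ' ')[0], 1, 5) . "\n" if /secon/;
    }
}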
But you don't need to go through all that if you just want to detect if there's a match.
{
    open(my $fh, '<', 'vladtest.sh') or die "Can't open vladtest.sh: $!";
    print 'NOT ' unless grep { /^\s*\Ssecon/ } <$fh>;
    say 'OK';  # or print "OK\n"; if you don't have 5.10 extensions enabled.
}

My awk's a bit rusty, but you're grabbing $1 then looking for characters 2 through 6 to match 'secon'. If any line matches this, then print "OK", otherwise "NOT OK". Here's a perl only solution:
use 5.10.0;
while (<>) {
    if (/^\s*\Ssecon/) {
        say 'OK';
        exit 0;
    }
}
say 'NOT OK';
exit 1;
The regexp is made up of:
^ match the start of each line, followed by
\s* zero or more whitespace characters, followed by
\S one non-whitespace character, followed by
secon the literal string that you're interested in
As soon as we have a line that matches, we can print 'OK' and exit. Otherwise we'll fall through the end of the loop and print 'NOT OK'.
The use 5.10.0; line is needed so you can use say().
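If you want to convince yourself of what the pattern accepts, a quick sketch like this helps. The sample lines are invented for the purpose and are not taken from vladtest.sh:

```perl
use strict;
use warnings;

my $re = qr/^\s*\Ssecon/;

# Made-up sample lines; only the first two have "secon" right after
# the first non-whitespace character.
my @samples = ('"second" 1', '   xsecon', 'second', 'a secon');
my @hits = grep { $_ =~ $re } @samples;

print "matched: $_\n" for @hits;
```

The last two samples fail because the character consumed by \S is not followed by the literal secon.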

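There are better ways to implement this, but if you want to keep your current approach, change the first line as shown here. The awk field references are written \$1 because Perl interpolates variables inside backticks, and chomp removes the trailing newline from the result:
chomp(my $var1 = `awk '\$1~/secon/ {print substr(\$1,2,5)}' vladtest.sh`);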

Related

Regex to search, replace and repeat while there's a match?

I'm trying to come up with a way to do the following. Given a line of text like (actual Fortran77 code, these declarations begin at column 7)
CHARACTER FOO*1, BAR*2, OTHER*6
replace it by
CHARACTER*1 FOO
CHARACTER*2 BAR
CHARACTER*6 OTHER
I tried this
perl -pe '1 while s/^[^*][ ]+CHARACTER[ ]+([a-z-A-Z0-9_]+)\*([0-9]+)(,?)(.*)/ CHARACTER\*$2 $1\n CHARACTER $4/' foo.f
The [^*] bit is intentional (I ignore lines with * at column 1). The output I get is
CHARACTER*1 FOO
CHARACTER BAR*2, OTHER*6
I already see I'll have a problem when only CHARACTER OTHER*6 remains, but I haven't even got there yet. Any help with that problem too would be appreciated.
What should I do to end up with the original line replaced by the three lines? Thanks.
perl -ple 'if (/^( +CHARACTER) +(\w+\*\d+(?:, *\w+\*\d+)*) *$/) { my $p = $1; $_ = join "\n", map /(\w+)\*(\d+)/ ? "$p*$2 $1" : die("wtf: $_"), split /,/, $2; }' foo.f
Man, this looks awful in one line. Formatted:
perl -ple '
    if (/^( +CHARACTER) +(\w+\*\d+(?:, *\w+\*\d+)*) *$/) {
        my $p = $1;
        $_ =
            join "\n",
            map /(\w+)\*(\d+)/ ? "$p*$2 $1" : die("wtf: $_"),
            split /,/,
            $2;
    }
' foo.f
First thing that came to my mind:
# Open the file containing the line and loop through its lines (mine holds only one line)
open (INFH, '<', "num.txt");
while(<INFH>){
    @arr = split(',',$_);
    foreach $arrEle (@arr){
        if ($arrEle =~ /CHARACTER/){
            $arrEle =~ s/(\s+)(\w+)(\s+)(\w+)(\*)(\d+)/$1$2$5$6$3$4/;
        }else{
            $arrEle =~ s/(\s+)(\w+)(\*)(\d+)/\tCHARACTER$3$4$1$2/;
        }
        print "$arrEle\n";
    }
}
I will update with more details as we discuss in the comments section.
This should work with a line of any length, like this:
CHARACTER FOO*1, BAR*2, OTHER*6, OTHERS*7, OTHERS*8 and so on
Your original try is just missing one thing: ^ by default only matches at the beginning of the string, but you want to match at the beginning of a line in the middle of the string. Change your s/.../.../ to s/.../.../m.
Then you still have to deal with the final substitution; either make the , mandatory to match and add a second substitution to deal with the move of the final *length or use /e and make what's being substituted an expression that does different things when $3 is empty.
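Here is a rough sketch of the first option (mandatory comma plus a second substitution). The helper name split_character_decl is hypothetical, and I've simplified your character class to \w+ and captured the leading blanks so the indentation is reused; adjust both to taste:

```perl
use strict;
use warnings;

# Hypothetical helper: turn one F77 CHARACTER declaration into one line per variable.
sub split_character_decl {
    my ($line) = @_;
    # /m lets ^ match after the newlines inserted by earlier passes.
    # The comma is mandatory here, so the last variable is left for the final pass.
    1 while $line =~ s/^([^*] +)CHARACTER +(\w+)\*(\d+), *(.*)/${1}CHARACTER*$3 $2\n${1}CHARACTER $4/m;
    # Final pass: the remaining variable has no trailing comma.
    $line =~ s/^([^*] +)CHARACTER +(\w+)\*(\d+) *$/${1}CHARACTER*$3 $2/m;
    return $line;
}

print split_character_decl("      CHARACTER FOO*1, BAR*2, OTHER*6"), "\n";
```

Lines that were already rewritten to CHARACTER*n NAME no longer match either pattern, because CHARACTER is followed by * rather than a blank, so the loop terminates.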

How to extract all text of the form "<key>=<value>" from a log file

Hi, I have a requirement where I need to pull text of the form <key>=<value> from a large log file.
log file consists of data like this:
[accountNumber=0, email=tom.cruise@gmail.com, firstName=Tom, lastName= , message=Hello How are you doing today ?
The output I expect is:
accountNumber=0
email=tom.cruise@gmail.com
firstName=Tom
etc.
Can anyone please help? Please also explain the solution so that I can extend it to similar needs.
I wrote a one-liner for this:
perl -nle 's/^\[//; for (split(/,/)){s/(?:^\s+|\s+$)//g; print}' input.txt
I also made another line of input to test with:
Matt@MattPC ~/perl/testing/13
$ cat input.txt
[accountNumber=0, email=tom.cruise@gmail.com, firstName=Tom, lastName= , message=Hello How are you doing today ?
[accountNumber=2, email=john.smith@gmail.com, firstName=John, lastName= , message=What is up with you?
Here is the output:
Matt@MattPC ~/perl/testing/13
$ perl -nle 's/^\[//; for (split(/,/)){s/(?:^\s+|\s+$)//g; print}' input.txt
accountNumber=0
email=tom.cruise@gmail.com
firstName=Tom
lastName=
message=Hello How are you doing today ?
accountNumber=2
email=john.smith@gmail.com
firstName=John
lastName=
message=What is up with you?
Explanation:
Expanded code:
perl -nle '
s/^\[//;
for (split(/,/)){
s/(?:^\s+|\s+$)//g;
print
}'
input.txt
Line by line explanation:
perl -nle calls perl with the command line options -n, -l, and -e. The -n adds a while loop around the program like this:
LINE:
while (<>) {
... # your program goes here
}
The -l strips the trailing newline from each input line and adds a newline at the end of every print. And the -e specifies my code, which is in single quotes (').
s/^\[//; removes the first [ if there is one. This searches and replaces on $_ which is equal to the line.
for (split(/,/)){ begins the for loop, which iterates over the list returned by split(/,/). Called with just one argument, split operates on $_ and here splits it on commas. $_ was the whole line, but inside the for loop $_ is set to the current element of the list.
s/(?:^\s+|\s+$)//g; this line removes leading and trailing white space.
print will print $_ followed by a newline, which is our string=value.
}' close the for loop and finish the '.
input.txt provide input to the program.
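Since you asked for something you can extend, here is the same logic as a small sub, as a sketch only. The name fields_of is made up, and the extra step that drops a trailing ] is an assumption about logs that close the bracket:

```perl
use strict;
use warnings;

# Hypothetical helper: return the trimmed "key=value" fields of one log line.
sub fields_of {
    my ($line) = @_;
    $line =~ s/^\[//;        # drop the leading bracket, if any
    $line =~ s/\]\s*$//;     # assumption: drop a closing bracket if the log has one
    my @fields;
    for my $field (split /,/, $line) {
        $field =~ s/^\s+|\s+$//g;                 # trim both ends
        push @fields, $field if $field =~ /=/;    # keep only key=value pieces
    }
    return @fields;
}

my @fields = fields_of('[accountNumber=0, firstName=Tom, lastName= , message=Hello How are you doing today ?');
print "$_\n" for @fields;
```

From here it is a short step to split each field on the first = and store the pairs in a hash.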
Going off your specific data and desired output, you could try the following:
use strict;
use warnings;
open my $fh, '<', 'file.txt' or die "Can't open file $!";
my $data = do { local $/; <$fh> };
my @matches = $data =~ /(\w+=\S+),/g;
print join "\n", @matches;
Perl One-Liner
Use this:
perl -0777 -ne 'while(m/[^\s,=\[]+=[^,\n]*/g){print "$&\n";}' yourfile
Assuming that each line of the log ends with a closing square bracket, you can use this:
#!/usr/bin/perl
use strict;
use warnings;
my $line = '[accountNumber=0, email=tom.cruise@gmail.com, firstName=Tom, lastName= , message=Hello How are you doing today ?]';
while($line =~ /([^][,\s][^],]*?)\s*[],]/g) {
print $1 . "\n";
}

sed delete 1st line and remove leading/trailing white spaces

I am trying to delete the 1st line and remove leading and trailing white space from the subsequent lines using sed.
If I have something like
line1
line2
line3
It should print
line2
line3
So I tried this command on unix shell:
sed '1d;s/^ [ \t]*//;s/[ \t]*$//' file.txt
and it works as expected.
When I try the same in my perl script:
my @templates = `sed '1d;s/^ [ \t]*//;s/[ \t]*$//' $MY_FILE`;
It gives me the message "sed: -e expression #1, char 10: unterminated `s' command" and doesn't print anything. Can someone tell me where I am going wrong?
Why would you invoke Sed from Perl anyway? Replacing the sed with the equivalent Perl code is just a few well-planned keystrokes.
my @templates;
if (open(my $fh, '<', $MY_FILE)) {
    @templates = map { s/(?:^\s*|\s*$)//g; $_ } <$fh>;
    shift @templates;
    close $fh;
} else {
    die "Can't open $MY_FILE: $!";
}
The backticks work like double-quotes. Perl interpolates variables inside them, as you already know due to your use of $MY_FILE. What you may not know is that $/ is actually a variable, the input record separator (by default a newline character). The same is true for the backslashes before the tab character. Here Perl will interpret \t for you and replace it with the tab character. You'll need a second backslash so that sed sees \t instead of an actual tab character. The latter might work as well, though.
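You can watch the interpolation happen without running sed at all. This sketch uses a double-quoted string as a stand-in for the backticks, since both interpolate the same way; the variable names are just for illustration:

```perl
use strict;
use warnings;

# Stand-in for the backticks: a double-quoted string interpolates the same way.
my $broken  = "s/[ \t]*$//";      # $/ turns into "\n", \t into a real tab
my $escaped = "s/[ \\t]*\$//";    # what sed needs to receive: s/[ \t]*$//

print "broken:  [$broken]\n";
print "escaped: [$escaped]\n";
```

The first string ends in a newline followed by a single slash, which is why sed complains about an unterminated s command.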
Consider using a safe pipe open instead of backticks to avoid escaping problems. For example:
my @templates = do {
    open my $fh, "-|", 'sed', '1d;s/^ [ \t]*//;s/[ \t]*$//', $MY_FILE   # -| reads from the command
        or die $!;
    <$fh>;   # list context: one element per output line
};
You have a typo in your expression. You need a semicolon between the 2 substitution statements. You should use the following instead:
my @templates = `sed '1d;s/^ [ \\t]*//;s/[ \\t]*\$//' $MY_FILE`;
escaping $ and \ as suggested in the other answer. I should note that it also worked for me without escaping \ as it was replaced by a literal tab.
As others have mentioned, I would recommend you do this only in Perl, or only in Sed, because there's really no reason to use both for this task. Using Sed in Perl will mean you have to worry about escaping, quoting and capturing the output (unless reading from a pipe) somehow. Obviously, all that complicates things and it also makes the code very ugly.
Here is a Perl one-liner that will handle your reformatting:
perl -le 'my $line = <>; while (<>) { chomp; s/^\s+|\s+$//g; print $_; }' file.txt
Basically, you just take the first line and store in a variable that won't be used, then process the rest of the lines. Below is a small script version that you can add to your existing script.
#!/usr/bin/env perl
use strict;
use warnings;
my $usage = "$0 infile";
my $infile = shift or die $usage;
open my $in, '<', $infile or die "Could not open file: $infile";
my $first = <$in>;
while (<$in>) {
    chomp;
    s/^\s+|\s+$//g;
    # process your data here, or just print...
    print $_, "\n";
}
close $in;
This can also be done with awk:
awk 'NR>1 {$1=$1;print}' file
line2
line3

How do I return all characters that begin and end with certain characters in Perl (Or C++)?

note: I'm running Perl 5 on Linux
I'm currently doing a project where I have to input a few words and then return words that begin with "d" and end with "e". I'm not using a pre-done list, for example I input into the console Done, Dish, Dome, and Death. I want it to return Done and Dome, but not the other words. I hope to receive help how to do this in Perl, but C++ would help if Perl doesn't work out.
perl -ne ' print if /^d/i && /e$/i ' < words
Since you are using Linux, it may be simpler to use grep(1):
grep -i '^d.*e$' < words
That's almost trivial in Perl:
$ perl -nE 'say "ok" if /^d.*e$/i'
Done
ok
Dish
Dome
ok
Death
It reads from STDIN and says ok if the line matched. This is useful while debugging regular expressions. You just want to output matching lines, so you could simply replace say "ok" by say
$ perl -nlE 'say if /^d.*e$/i' words
where words is the name of your words file; with -n, Perl reads its lines automatically. A short explanation of that regular expression match:
^ # start of the line
d # the literal character 'd' (case-insensitive because of the i switch)
.* # everything allowed here
e # the literal character 'e' (also case-insensitive)
$ # end of the line
Not often I answer perl questions, but I think this does the trick.
my @words = ...;
@words = grep(/^d.*e$/i, @words);
grep uses a regular expression to filter the words.
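A complete version of that idea, using the four words from the question as a hard-coded list (in a real program they would come from STDIN or the command line):

```perl
use strict;
use warnings;

# The sample words from the question, hard-coded for the sake of the example.
my @words   = qw(Done Dish Dome Death);
my @matches = grep { /^d.*e$/i } @words;

print "$_\n" for @matches;
```

This prints Done and Dome; Dish and Death are dropped because they do not end in e.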
How about:
#!/usr/bin/perl -Tw
use strict;
use warnings;
for my $word (@ARGV) {
    if ( $word =~ m{\A d .* e \z}xmsi ) {
        print "$word\n";
    }
}

Multi platform script perl or awk

I am trying to match records in following format:
(-,username,domain1.co.uk)\
(-,username,domain2.co.uk)
either awk or perl must be used. I am using cygwin and wrote following code which works and matches both above entries:
awk 'BEGIN {musr="(-,username,[^)]+.co.uk)"} {if ($0~musr) print $0}' netgroup
But if I try to modify this regexp to be more specific the output is nothing:
1st: match record then last backslash and then match newline:
"(-,username,[^)]+.co.uk)\\$"
2nd: match new line immediately after record without backslash:
"(-,username,[^)]+.co.uk)$"
So I decided to rewrite script into perl, hoping that perl can deal with backslashes and end of line symbols. For this purpose I used a2p this way:
echo 'BEGIN {musr="(-,username,[^)]+.co.uk)"} {if ($0~musr) print $0}' | a2p.exe
#!/usr/bin/perl
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
if $running_under_some_shell;
# this emulates #! processing on NIH machines.
# (remove #! line above if indigestible)
eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z_0-9]+=)(.*)/ && shift;
# process any FOO=bar switches
$, = ' '; # set output field separator
$\ = "\n"; # set output record separator
$musr = '(-,username,[^)]+.co.uk)';
while (<>) {
chomp; # strip record separator
if ($_ =~ $musr) {
print $_;
}
}
This generated Perl script also matches both entries. However, if I try to make it more specific, I get the following errors:
1st:
$musr = "(-,username,[^)]+.co.uk)\\";
Trailing \ in regex m/(-,username,[^)]+.co.uk)\/ at perlmatch.pl line 18, <> line 1.
2nd:
$musr = "(-,username,[^)]+.co.uk)$";
Final $ should be \$ or $name at perlmatch.pl line 14, within string
syntax error at perlmatch.pl line 14, near "= "(-,username,[^)]+.co.uk)$""
Execution of perlmatch.pl aborted due to compilation errors.
3rd:
$musr = "(-,username,[^)]+.co.uk)\$";
[the output is nothing]
What am I doing wrong? My question also points to the fact that if somebody needs to use a script on several platforms (AIX, Solaris, Linux), then using Perl should be a better approach than dealing with (non-)GNU utils and various (g|n)awk versions etc. Regards
Your problems arise from string quoting in Perl.
$musr = "(-,username,[^)]+.co.uk)\\"; replaces \\ with a single backslash when the string is created. But you would need to pass two backslashes to the regex. So you would have to put four in when you create the string.
$musr = "(-,username,[^)]+.co.uk)$"; tries to perform variable interpolation within the string.
In addition, parentheses should be escaped, as John Kugelman noted.
The solution is to use Perl's built-in delimiters for regular expressions, rather than normal quoted strings. The simple way is to put it right into your loop:
while (<>) {
    chomp; # strip record separator
    if ($_ =~ /\(-,username,[^)]+\.co\.uk\)$/) {
        print $_;
    }
}
If you do need to put the pattern into a variable first, use the special qr// operator.
my $musr = qr/\(-,username,[^)]+\.co\.uk\)$/;
while (<>) {
    chomp; # strip record separator
    if ($_ =~ $musr) {
        print $_;
    }
}
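To cover both of your cases (record followed by a backslash, and record ending the line), something along these lines should do. It is only a sketch: the two records are copied from your question, and the variable names are invented:

```perl
use strict;
use warnings;

my $plain     = qr/\(-,username,[^)]+\.co\.uk\)$/;     # record ends the line
my $continued = qr/\(-,username,[^)]+\.co\.uk\)\\$/;   # record followed by one backslash

# The two sample records; '\\' in single quotes is one literal backslash.
my @records = ('(-,username,domain1.co.uk)\\', '(-,username,domain2.co.uk)');

my @kinds;
for my $rec (@records) {
    my $kind = $rec =~ $continued ? 'continued'
             : $rec =~ $plain     ? 'plain'
             :                      'no match';
    push @kinds, $kind;
    print "$rec => $kind\n";
}
```

Inside qr// the backslash only needs to be doubled once, because no string-quoting layer strips one level before the regex engine sees it.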
(-,username,[^)]+.co.uk)\\$
The problem here is not with the backslash at the end of the line, it's the parentheses. Parentheses are used for grouping. You need to escape them to match literal ( ) characters. You should also escape the dots so they match literal dots instead of "any character".
$ awk '/\(-,username,[^)]+\.co\.uk\)$/ {print}' netgroup
(-,username,domain2.co.uk)
$ awk '/\(-,username,[^)]+\.co\.uk\)\\$/ {print}' netgroup
(-,username,domain1.co.uk)\
If you stick with plain awk and don't use [gn]awk-specific features, awk is very portable. More portable than perl is, I would think.
Parentheses must be escaped; otherwise they group expressions. To be more specific, match an optional backslash at the end of the line (the backslashes are doubled because they must also be escaped inside the awk string).
awk 'BEGIN {musr="\\(-,username,[^)]+.co.uk\\)\\\\?$"} {if ($0~musr) print $0}' netgroup