sed delete 1st line and remove leading/trailing white spaces - regex

I am trying to delete the first line and remove leading and trailing whitespace from the subsequent lines using sed.
If I have something like
line1
line2
line3
It should print
line2
line3
So I tried this command on unix shell:
sed '1d;s/^ [ \t]*//;s/[ \t]*$//' file.txt
and it works as expected.
When I try the same in my perl script:
my @templates = `sed '1d;s/^ [ \t]*//;s/[ \t]*$//' $MY_FILE`;
It gives me this message: "sed: -e expression #1, char 10: unterminated `s' command" and doesn't print anything. Can someone tell me where I am going wrong?

Why would you invoke Sed from Perl anyway? Replacing the sed with the equivalent Perl code is just a few well-planned keystrokes.
my @templates;
if (open (M, '<', $MY_FILE)) {
    @templates = map { s/(?:^\s*|\s*$)//g; $_ } <M>;
    shift @templates;
    close M;
} else {
    # die horribly?
}

Backticks work like double quotes: Perl interpolates variables inside them, as you already know from your use of $MY_FILE. What you may not know is that $/ is also a variable, the input record separator (a newline by default), so the $/ in your second substitution is replaced before sed ever sees it. Escape the dollar sign as \$ to prevent that. Something similar happens with \t: Perl interprets it and passes an actual tab character. Write \\t if you want sed to see \t itself, although a literal tab might work as well.

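Consider using a safe pipe open instead of backticks to avoid problems with escaping. For example:
my @templates = do {
    # "-|" reads from the command; the list form bypasses the shell
    open my $fh, "-|", 'sed', '1d;s/^ [ \t]*//;s/[ \t]*$//', $MY_FILE
        or die $!;
    local $/;    # slurp mode: the whole output arrives as one string
    <$fh>;
};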

You have a typo in your expression. You need a semicolon between the 2 substitution statements. You should use the following instead:
my @templates = `sed '1d;s/^ [ \\t]*//;s/[ \\t]*\$//' $MY_FILE`;
escaping $ and \ as suggested in the other answer. I should note that it also worked for me without escaping \ as it was replaced by a literal tab.

As others have mentioned, I would recommend you do this only in Perl, or only in Sed, because there's really no reason to use both for this task. Using Sed in Perl will mean you have to worry about escaping, quoting and capturing the output (unless reading from a pipe) somehow. Obviously, all that complicates things and it also makes the code very ugly.
Here is a Perl one-liner that will handle your reformatting:
perl -le 'my $line = <>; while (<>) { chomp; s/^\s*|\s*$//g; print $_; }' file.txt
Basically, you just take the first line and store in a variable that won't be used, then process the rest of the lines. Below is a small script version that you can add to your existing script.
#!/usr/bin/env perl
use strict;
use warnings;
my $usage = "$0 infile";
my $infile = shift or die $usage;
open my $in, '<', $infile or die "Could not open file: $infile";
my $first = <$in>;
while (<$in>) {
    chomp;
    s/^\s*|\s*$//g;    # /g so that both ends are trimmed
    # process your data here, or just print...
    print $_, "\n";
}
close $in;

This can also be done with awk:
awk 'NR>1 {$1=$1;print}' file
line2
line3
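One thing worth knowing about the $1=$1 trick: assigning to a field makes awk rebuild the record from its fields, so leading and trailing whitespace disappears, but runs of whitespace inside the line are also collapsed to single spaces. A small check on made-up input:

```shell
# Second line has padding on both ends and wide gaps between words.
out=$(printf 'line1\n  line2   has   gaps  \n' | awk 'NR>1 {$1=$1; print}')
printf '[%s]\n' "$out"
```

If the inner spacing must be preserved, the sed or Perl versions are the safer choice.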

Related

How to extract all text of the form "<key>=<value>" from a log file

Hi, I have a requirement where I need to pull text of the form <key>=<value> from a large log file.
The log file consists of data like this:
[accountNumber=0, email=tom.cruise@gmail.com, firstName=Tom, lastName= , message=Hello How are you doing today ?
The output I expect is:
accountNumber=0
email=tom.cruise@gmail.com
firstName=Tom
etc.
Can anyone please help ? Also please explain the solution so that I can extend it to cater to my similar needs.
I wrote a one-liner for this:
perl -nle 's/^\[//; for (split(/,/)){s/(?:^\s+|\s+$)//g; print}' input.txt
I also made another line of input to test with:
Matt@MattPC ~/perl/testing/13
$ cat input.txt
[accountNumber=0, email=tom.cruise@gmail.com, firstName=Tom, lastName= , message=Hello How are you doing today ?
[accountNumber=2, email=john.smith@gmail.com, firstName=John, lastName= , message=What is up with you?
Here is the output:
Matt@MattPC ~/perl/testing/13
$ perl -nle 's/^\[//; for (split(/,/)){s/(?:^\s+|\s+$)//g; print}' input.txt
accountNumber=0
email=tom.cruise@gmail.com
firstName=Tom
lastName=
message=Hello How are you doing today ?
accountNumber=2
email=john.smith@gmail.com
firstName=John
lastName=
message=What is up with you?
Explanation:
Expanded code:
perl -nle '
s/^\[//;
for (split(/,/)){
s/(?:^\s+|\s+$)//g;
print
}'
input.txt
Line by line explanation:
perl -nle calls perl with the command line options -n, -l, and -e. The -n adds a while loop around the program like this:
LINE:
while (<>) {
... # your program goes here
}
The -l adds a newline at the end of every print. And the -e specifies my code which will be in single quotes (').
s/^\[//; removes the first [ if there is one. This searches and replaces on $_ which is equal to the line.
for (split(/,/)){ begins the for loop, which iterates over the list returned by split(/,/). Because split was called with only a pattern, it splits $_ on commas. $_ was equal to the line, but inside the for loop $_ is set to the current element of the list.
s/(?:^\s+|\s+$)//g; this line removes leading and trailing white space.
print will print $_ followed by a newline, which is our string=value.
}' close the for loop and finish the '.
input.txt provide input to the program.
Going off your specific data and desired output, you could try the following:
use strict;
use warnings;
open my $fh, '<', 'file.txt' or die "Can't open file $!";
my $data = do { local $/; <$fh> };
my @matches = $data =~ /(\w+=\S+),/g;
print join "\n", @matches;
Perl One-Liner
Use this:
perl -0777 -ne 'while(m/[^ ,=]+=[^,]*/g){print "$&\n";}' yourfile
Assuming that each line of the log ends with a closing square bracket, you can use this:
#!/usr/bin/perl
use strict;
use warnings;
my $line = '[accountNumber=0, email=tom.cruise@gmail.com, firstName=Tom, lastName= , message=Hello How are you doing today ?]';
while($line =~ /([^][,\s][^],]*?)\s*[],]/g) {
print $1 . "\n";
}

Matching string in perl which is parsed with awk

I did the following with awk:
awk '/secon/ {print $1}' vladtest.sh |cut -c2-6
How do I match this string properly in a Perl conditional:
my $var1= `awk '/secon/ {print $1}' vladtest.sh |cut -c2-6`
if ($var1 eq "secon"){
print "OK";
} else {
print "FALSE"
}
First of all, your problem is the fact that the result of `...` includes the newline. So $var1 is not "secon", but "secon\n". You could deal with that any number of ways - wrapping chomp(...) around the whole assignment to $var1, or changing the right side of the eq to include the "\n", or using a regex instead: if ($var1 =~ /^secon$/) works with or without the chomp, because $ matches at a string-final newline if present.
Second of all, you're using about two programs too many here:
my $var1= `awk '/secon/ {print $1}' vladtest.sh |cut -c2-6`
I mean, awk can do anything cut can do, and perl can do anything either can do. When I'm typing one-liners at the shell prompt, I'm a big fan of awk and cut, simply for their economy of expression; within their respective specialities, their command lines are generally shorter than the equivalent Perl (even using perl -a etc). But since you're already in a Perl program, there's no reason to shell out to something else here.
Here's an example drop-in replacement for the above:
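my $var1;
{
    open(my $fh, '<', 'vladtest.sh') or die "Can't open vladtest.sh: $!";
    while (<$fh>) {
        # (split ' ')[0] is awk's $1; substr(..., 1, 5) is cut -c2-6
        $var1 .= substr((split ' ')[0], 1, 5) . "\n" if /secon/;
    }
}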
But you don't need to go through all that if you just want to detect if there's a match.
{open(my $fh, '<', 'vladtest.sh');
print 'NOT ' unless grep { /^\s*\Ssecon/ } <$fh>;
say 'OK';} # or print "OK\n"; if you don't have 5.10 extensions enabled.
My awk's a bit rusty, but you're grabbing $1 then looking for characters 2 through 6 to match 'secon'. If any line matches this, then print "OK", otherwise "NOT OK". Here's a perl only solution:
use 5.10.0;
while (<>) {
if (/^\s*\Ssecon/) {
say 'OK';
exit 0;
}
}
say 'NOT OK';
exit 1;
The regexp is made up of:
^ match the start of each line, followed by
\s* zero or more whitespace characters, followed by
\S one non-whitespace character, followed by
secon the literal string that you're interested in
As soon as we have a line that matches, we can print 'OK' and exit. Otherwise we'll fall through the end of the loop and print 'NOT OK'.
The use 5.10.0; line is needed so you can use say().
There are better ways to implement this, but to stay with your current approach, change the first line as follows. The \$1 is escaped so that Perl does not interpolate its own $1 inside the backticks:
my $var1 = `awk '\$1~/secon/ {print substr(\$1,2,5)}' vladtest.sh`;

Multi platform script perl or awk

I am trying to match records in following format:
(-,username,domain1.co.uk)\
(-,username,domain2.co.uk)
either awk or perl must be used. I am using cygwin and wrote following code which works and matches both above entries:
awk 'BEGIN {musr="(-,username,[^)]+.co.uk)"} {if ($0~musr) print $0}' netgroup
But if I try to modify this regexp to be more specific the output is nothing:
1st: match record then last backslash and then match newline:
"(-,username,[^)]+.co.uk)\\$"
2nd: match newline immediately after record without backslash:
"(-,username,[^)]+.co.uk)$"
So I decided to rewrite script into perl, hoping that perl can deal with backslashes and end of line symbols. For this purpose I used a2p this way:
echo 'BEGIN {musr="(-,username,[^)]+.co.uk)"} {if ($0~musr) print $0}' | a2p.exe
#!/usr/bin/perl
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
if $running_under_some_shell;
# this emulates #! processing on NIH machines.
# (remove #! line above if indigestible)
eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z_0-9]+=)(.*)/ && shift;
# process any FOO=bar switches
$, = ' '; # set output field separator
$\ = "\n"; # set output record separator
$musr = '(-,username,[^)]+.co.uk)';
while (<>) {
chomp; # strip record separator
if ($_ =~ $musr) {
print $_;
}
}
This generated perl script also matches both entries, however if I try modify this script to more specific I get the following errors:
1st:
$musr = "(-,username,[^)]+.co.uk)\\";
Trailing \ in regex m/(-,username,[^)]+.co.uk)\/ at perlmatch.pl line 18, <> line 1.
2nd:
$musr = "(-,username,[^)]+.co.uk)$";
Final $ should be \$ or $name at perlmatch.pl line 14, within string
syntax error at perlmatch.pl line 14, near "= "(-,username,[^)]+.co.uk)$""
Execution of perlmatch.pl aborted due to compilation errors.
3rd:
$musr = "(-,username,[^)]+.co.uk)\$";
[the output is nothing]
What am I doing wrong? My question also points to the fact that if somebody needs to use a script on several platforms (AIX, Solaris, Linux), then Perl should be a better approach than dealing with (non-)GNU utilities, various (g|n)awk versions, etc. Regards
Your problems arise from string quoting in Perl.
$musr = "(-,username,[^)]+.co.uk)\\"; replaces \\ with a single backslash when the string is created. But you would need to pass two backslashes to the regex. So you would have to put four in when you create the string.
$musr = "(-,username,[^)]+.co.uk)$"; tries to perform variable interpolation within the string.
In addition, parentheses should be escaped, as John Kugelman noted.
The solution is to use Perl's built-in delimiters for regular expressions, rather than normal quoted strings. The simple way is to put it right into your loop:
while (<>) {
chomp; # strip record separator
if ($_ =~ /\(-,username,[^)]+.co.uk\)$/) {
print $_;
}
}
If you do need to put the pattern into a variable first, use the special qr// operator.
my $musr = qr/\(-,username,[^)]+.co.uk\)$/;
while (<>) {
chomp; # strip record separator
if ($_ =~ $musr) {
print $_;
}
}
(-,username,[^)]+.co.uk)\\$
The problem here is not with the backslash at the end of the line, it's the parentheses. Parentheses are used for grouping. You need to escape them to match literal ( ) characters. You should also escape the dots so they match literal dots instead of "any character".
$ awk '/\(-,username,[^)]+\.co\.uk\)$/ {print}' netgroup
(-,username,domain2.co.uk)
$ awk '/\(-,username,[^)]+\.co\.uk\)\\$/ {print}' netgroup
(-,username,domain1.co.uk)\
If you stick with plain awk and don't use [gn]awk-specific features awk is very portable. More portable than perl is, I would think.
Parentheses must be escaped; otherwise they group expressions. To be more specific, match an optional backslash at the end of the line (backslashes are doubled because the pattern is given as a string, where they must be escaped as well).
awk 'BEGIN {musr="\\(-,username,[^)]+.co.uk\\)\\\\?$"} {if ($0~musr) print $0}' netgroup

Removing CRLF (0D 0A) from string in Perl

I've got a Perl script which consumes an XML file on Linux, and occasionally there are CRLFs (hex 0D 0A, DOS newlines) in some of the node values.
The system which produces the XML file writes it all as a single line, and it looks as if it occasionally decides that this is too long and writes a CRLF into one of the data elements. Unfortunately there's nothing I can do about the providing system.
I just need to remove these from the string before I process it.
I've tried all sorts of regex replacement using the perl char classes, hex values, all sorts and nothing seems to work.
I've even run the input file through dos2unix before processing and I still can't get rid of the erroneous characters.
Does anyone have any ideas?
Many Thanks,
Typical, After battling for about 2 hours, I solved it within 5 minutes of asking the question..
$output =~ s/[\x0A\x0D]//g;
Finally got it.
$output =~ tr/\x{d}\x{a}//d;
These are both whitespace characters, so if the terminators are always at the end, you can right-trim with
$output =~ s/\s+\z//;
A few options:
1. Replace all occurrences of cr/lf with lf: $output =~ s/\r\n/\n/g; # instead of \r\n you might want to use \015\012
2. Remove all trailing whitespace: $output =~ s/\s+$//;
3. Slurp and split:
#!/usr/bin/perl -w
use strict;
use LWP::Simple;
sub main{
createfile();
outputfile();
}
main();
sub createfile{
(my $file = $0)=~ s/\.pl/\.txt/;
open my $fh, ">", $file;
print $fh "1\n2\r\n3\n4\r\n5";
close $fh;
}
sub outputfile{
(my $filei = $0)=~ s/\.pl/\.txt/;
(my $fileo = $0)=~ s/\.pl/out\.txt/;
open my $fin, "<", $filei;
local $/; # slurp the file
my $text = <$fin>; # store the text
my @text = split(/(?:\r\n|\n)/, $text); # split on dos or unix newlines
close $fin;
local $" = ", "; # change array scalar separator
open my $fout, ">", $fileo;
print $fout "@text"; # should output numbers separated by comma space
close $fout;
}

Using Perl, how can I replace newlines with commas?

I gave up on sed and I've heard it is better in Perl.
I would like a script that can be called from the 'unix' command line and converts DOS line endings CRLF from the input file and replaces them with commas in the output file:
like
myconvert infile > outfile
where infile was:
1
2
3
and would result in outfile:
1,2,3
I would prefer more explicit code with some minimal comments over "the shortest possible solution", so I can learn from it, I have no perl experience.
In shell, you can do it in many ways:
cat input | xargs echo | tr ' ' ,
or
perl -pe 's/\r?\n/,/' input > output
I know you wanted this to be longer, but I don't really see the point of writing multi line script to solve such simple task - simple regexp (in case of perl solution) is fully workable, and it's not something artificially shortened - it's the type of code that I would use on daily basis to solve the issue at hand.
#!/bin/perl
while(<>) { # Read from stdin one line at a time
s:\r\n:,:g; # Replace CRLF in current line with comma
print; # Write out the new line
}
use strict;
use warnings;
my $infile = $ARGV[0] or die "$0 Usage:\n\t$0 <input file>\n\n";
open(my $in_fh , '<' , $infile) or die "$0 Error: Couldn't open $infile for reading: $!\n";
my $file_contents;
{
local $/; # slurp in the entire file. Limit change to $/ to enclosing block.
$file_contents = <$in_fh>
}
close($in_fh) or die "$0 Error: Couldn't close $infile after reading: $!\n";
# change DOS line endings to commas
$file_contents =~ s/\r\n/,/g;
$file_contents =~ s/,$//; # get rid of last comma
# finally output the resulting string to STDOUT
print $file_contents . "\n";
Your question text and example output were not consistent. If you're converting all line endings to commas, you will end up with an extra comma at the end, from the last line ending. But your example shows only commas between the numbers. I assumed you wanted the code output to match your example and that the question text was incorrect; however, if you want the last comma, just remove the line with the comment "get rid of last comma".
If any command is not clear, http://perldoc.perl.org/ is your friend (there is a search box at the top right corner).
It's as simple as:
tr '\n' , <infile >outfile
Avoid slurping, don't tack on a trailing comma and print out a well-formed text file (all lines must end in newlines):
#!/usr/bin/perl
use strict;
use warnings;
my $line = <>;
while ( 1 ) {
my $next = <>;
s{(?:\015\012?|\012)+$}{} for $line, $next;
if ( length $next ) {
print $line, q{,};
$line = $next;
}
else {
print $line, "\n";
last;
}
}
__END__
Personally I would avoid having to look a line ahead (as in Sinar's answer). Sometimes you need to but I have sometimes done things wrong in processing the last line.
use strict;
use warnings;
my $outputcomma = 0; # No comma before first line
while ( <> )
{
print ',' if $outputcomma ;
$outputcomma = 1 ; # output commas from now on
s/\r?\n$// ;
print ;
}
print "\n" ;
BTW: In sed, it would be:
sed ':a;{N;s/\r\n/,/;ba}' infile > outfile
with Perl
$\ = "\n"; # set output record separator
$, = ',';
$/ = "\n\n";
while (<>) {
chomp;
@f = split('\s+', $_);
print join($,,@f);
}
in unix, you can also use tools such as awk or tr
awk 'BEGIN{OFS=",";RS=""}{$1=$1}1' file
or
tr "\n" "," < file