Perl Parse $1 from ARGV - regex

Is it possible to parse a $1 from ARGV in Perl?
I am trying to parse a passed in $1 from ARGV, but its treating it as a literal string, rather than the $1.
This is just a simple script trying to show proof of concept.
#!/usr/bin/perl
my $from = '^(.*/)([^/]*)$';
my $dir = "/var/foo/baz";
$dir =~ /$from/;
# This works and prints the expected output I would like
print "Dir $1\n";
print "File $2\n";
# This is printing a literal '$1' and '$2'
print "Dir $ARGV[0]\n";
print "File $ARGV[1]\n";
This is what I am running and my output:
$ ./test.pl '$1' '$2'
Dir /var/foo/
File baz
Dir $1
File $2

You want to use String::Substitution.
$ perl -E'
use String::Substitution qw( interpolate_match_vars last_match_vars );
my ($str, $pat, $x_template, $y_template) = #ARGV;
$str =~ $pat
or die("Didn'\''t match\n");
my $x = interpolate_match_vars($x_template, last_match_vars());
my $y = interpolate_match_vars($y_template, last_match_vars());
say "x: $x";
say "y: $y";
' \
abcdef '(.)c(.)' '$1' '$2'
x: b
y: d

Related

Delete everything except what matches a regex pattern

I have a string and I want to replace everything but the pattern.
Right now I know what I want to do is
$line =~ s/[A-Z]{4}[0-9]{4}//g;
but inverted so that it replaces everything with nothing except the pattern.
It is a bit unclear what you are asking, but you may wish to try something like the following, which captures the pattern and then replaces the line with the capture:
#!/usr/bin/env perl
use warnings;
use strict;
my #lines = (
'HELLO WORLD',
'HELLO ABCD1234 WORLD',
'HELLOABCD1234WORLD',
'H E L LO ABCD1234 WORLD',
);
my $re_match = qr([A-Z]{4}[0-9]{4});
for my $line (#lines) {
print "$line => ";
if ($line =~ $re_match) {
$line =~ s|^.*($re_match).*$|$1|;
print $line . "\n";
} else {
print "does not match pattern $re_match \n";
}
}
Output
HELLO WORLD => does not match pattern (?^:[A-Z]{4}[0-9]{4})
HELLO ABCD1234 WORLD => ABCD1234
HELLOABCD1234WORLD => ABCD1234
H E L LO ABCD1234 WORLD => ABCD1234
perl -E '$_="xxABCD1234xxABCD1234xx"; #m = $_ =~ /[A-Z]{4}[0-9]{4}/g; #m and $_ = join "", #m; say'
Output:
ABCD1234ABCD1234

translating awk script into perl

I'm trying to translate this code into perl.
gawk '/^>c/ {OUT=substr($0,2) ".fa";print " ">OUT}; OUT{print >OUT}' your_input
Can someone help me?
Perl has a utility to do this for you called a2p. If your script is call script.awk then you would run:
$ a2p script.awk
Which produces:
#!/usr/bin/perl
eval 'exec /usr/bin/perl -S $0 ${1+"$#"}'
if $running_under_some_shell;
# this emulates #! processing on NIH machines.
# (remove #! line above if indigestible)
eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z_0-9]+=)(.*)/ && shift;
# process any FOO=bar switches
$, = ' '; # set output field separator
$\ = "\n"; # set output record separator
while (<>) {
chomp; # strip record separator
if (/^>c/) {
$OUT = substr($_, (2)-1) . '.fa';
&Pick('>', $OUT) &&
(print $fh ' ');
}
;
if ($OUT) {
&Pick('>', $OUT) &&
(print $fh $_);
}
}
sub Pick {
local($mode,$name,$pipe) = #_;
$fh = $name;
open($name,$mode.$name.$pipe) unless $openammeamme}++;
}
To save this to a file, use redirection:
$ a2p script.awk > script.pl
Perl also provides a tool for converting sed scripts: s2p.
#!/usr/bin/perl
my ($outf,$OUT) ;
while(<>){
if(/^>(c.*)/){ $OUT = "$1.fa";
close($outf) if $outf;
open($outf,">",$OUT);
print OUT " \n"}
if($outf){ print $outf $_ }
}
if input is:
>caaa
sdf
sdff
>cbbb
ew
ew
Creats 2 files:
==> caaa.fa <==
>caaa
sdf
sdff
==> cbbb.fa <==
>cbbb
ew
ew
This perl one liner should be equivalent of that awk command:
perl -ane 'if($F[0] =~ /^>c/){$OUT=substr($F[0],1).".fa"; if(OUT==null) {open(OUT,">$OUT");} print OUT " \n"} if ($OUT){print OUT $_} END{close(OUT)}' file
Indented command line:
perl -ane 'if ($F[0] =~ /^>c/) {
$OUT = substr($F[0], 1).".fa";
if (OUT==null) { open(OUT, ">$OUT") }
print OUT " \n"
}
if ($OUT) {
print OUT $_
}
END{close(OUT)
}' file

match using regex in perl

HI I am trying to exract some data from a text file in perl. My file looks like this
Name:John
FirstName:Smith
Name:Alice
FirstName:Meyers
....
I want my string to look like John Smith and Alice Meyers
I tried something like this but I'm stuck and I don't know how to continue
while (<INPUT>) {
if (/^[Name]/) {
$match =~ /(:)(.*?)(\n) /
$string = $string.$2;
}
if (/^[FirstName]/) {
$match =~ /(:)(.*?)(\n)/
$string = $string.$2;
}
}
What I try to do is that when I match Name or FirstName to copy to content between : and \n but I get confused which is $1 and $2
This will put you first and last names in a hash:
use strict;
use warnings;
use Data::Dumper;
open my $in, '<', 'in.txt';
my (%data, $names, $firstname);
while(<$in>){
chomp;
($names) = /Name:(.*)/ if /^Name/;
($firstname) = /FirstName:(.*)/ if /^FirstName/;
$data{$names} = $firstname;
}
print Dumper \%data;
Through perl one-liner,
$ perl -0777 -pe 's/(?m).*?Name:([^\n]*)\nFirstName:([^\n]*).*/\1 \2/g' file
John Smith
Alice Meyers
while (<INPUT>) {
/^([A-Za-z])+\:\s*(.*)$/;
if ($1 eq 'Name') {
$surname = $2;
} elsif ($1 eq 'FirstName') {
$completeName = $2 . " " . $surname;
} else {
/* Error */
}
}
You might want to add some error handling, e.g. make sure that a Name is always followed by a FirstName and so on.
$1 $2 $3 .. $N , it's the capture result of () inside regex.
If you do something like that , you cant avoid using $1 like variables.
my ($matched1,$matched2) = $text =~ /(.*):(.*)/
my $names = [];
my $name = '';
while(my $row = <>){
$row =~ /:(.*)/;
$name = $name.' '.$1;
push(#$names,$name) if $name =~ / /;
$name = '' if $name =~ / /;
}
`while(<>){
}
`
open (FH,'abc.txt');
my(%hash,#array);
map{$_=~s/.*?://g;chomp($_);push(#array,$_)} <FH>;
%hash=#array;
print Dumper \%hash;

Find text enclosed by # and replace the inside

The problem:
Find pieces of text in a file enclosed by # and replace the inside
Input:
#abc# abc #ABC#
cba #cba CBA#
Deisred output:
абц abc АБЦ
cba цба ЦБА
I have the following:
#!/usr/bin/perl
use strict;
use warnings;
use Encode;
my $output;
open FILE,"<", 'test.txt';
while (<FILE>) {
chomp(my #chars = split(//, $_));
for (#chars) {
my #char;
$_ =~ s/a/chr(0x430)/eg;
$_ =~ s/b/chr(0x431)/eg;
$_ =~ s/c/chr(0x446)/eg;
$_ =~ s/d/chr(0x434)/eg;
$_ =~ s/e/chr(0x435)/eg;
$_ =~ s/A/chr(0x410)/eg;
$_ =~ s/B/chr(0x411)/eg;
$_ =~ s/C/chr(0x426)/eg;
push #char, $_;
$output = join "", #char;
print encode("utf-8",$output);}
print "\n";
}
close FILE;
But I'm stuck on how to process further
Thanks for help in advance!
Kluther
Here my solution. (you will fixed it, yes. It is prototype)
for (my $data = <DATA>){
$data=~s/[#]([\s\w]+)[#]/func($1)/ge;
print $data;
# while($data=~m/[#]([\s\w]+)[#]/g){
# print "marked: ",$1,"\n";
# print "position:", pos();
# }
# print "not marked: ";
}
sub func{
#do your magic here ;)
return "<< #_ >>";
}
__DATA__
#abc# abc #ABC# cba #cba CBA#
What happens here?
First, I read data. You can do it yourself.
for (my $data = <DATA>){...}
Next, I need to search your pattern and replace it.
What should I do?
Use substition operator: s/pattern/replace/
But in interesting form:
s/pattern/func($1)/ge
Key g mean Global Search
Key e mean Evaluate
So, I think, that you need to write your own func function ;)
Maybe better to use transliteration operator: tr/listOfSymbolsToBeReplaced/listOfSymbolsThatBePlacedInstead/
With minimal changes to your algorithm you need to keep track of whether you are inside the #marks or not. so add something like this
my $bConvert = 0;
chomp(my #chars = split(//, $_));
for (#chars) {
my $char = $_;
if (/#/) {
$bConvert = ($bConvert + 1) % 2;
next;
}
elsif ($bConvert) {
$char =~ s/a/chr(0x430)/eg;
$char =~ s/b/chr(0x431)/eg;
$char =~ s/c/chr(0x446)/eg;
$char =~ s/d/chr(0x434)/eg;
$char =~ s/e/chr(0x435)/eg;
$char =~ s/A/chr(0x410)/eg;
$char =~ s/B/chr(0x411)/eg;
$char =~ s/C/chr(0x426)/eg;
}
print encode("utf-8",$char);
}
Try this after $output is processed.
$output =~ s/\#//g;
my #split_output = split(//, $output);
$output = "";
my $len = scalar(#split_output) ;
while ($len--) {
$output .= shift(#split_output);
}
print $output;
It can be done with a single regex and no splitting of the string:
use strict;
use warnings;
use Encode;
my %chars = (
a => chr(0x430),
b => chr(0x431),
c => chr(0x446),
d => chr(0x434),
e => chr(0x435),
A => chr(0x410),
B => chr(0x411),
C => chr(0x426),
);
my $regex = '(' . join ('|', keys %chars) . ')';
while (<DATA>) {
1 while ($_ =~ s|\#(?!\s)[^#]*?\K$regex(?=[^#]*(?!\s)\#)|$chars{$1}|eg);
print encode("utf-8",$_);
}
It does require repeated runs of the regex due to the overlapping nature of the matches.

How can I count the amount of spaces at the start of a string in Perl?

How can I count the amount of spaces at the start of a string in Perl?
I now have:
$temp = rtrim($line[0]);
$count = ($temp =~ tr/^ //);
But that gives me the count of all spaces.
$str =~ /^(\s*)/;
my $count = length( $1 );
If you just want actual spaces (instead of whitespace), then that would be:
$str =~ /^( *)/;
Edit: The reason why tr doesn't work is it's not a regular expression operator. What you're doing with $count = ( $temp =~ tr/^ // ); is replacing all instances of ^ and with itself (see comment below by cjm), then counting up how many replacements you've done. tr doesn't see ^ as "hey this is the beginning of the string pseudo-character" it sees it as "hey this is a ^".
You can get the offset of a match using #-. If you search for a non-whitespace character, this will be the number of whitespace characters at the start of the string:
#!/usr/bin/perl
use strict;
use warnings;
for my $s ("foo bar", " foo bar", " foo bar", " ") {
my $count = $s =~ /\S/ ? $-[0] : length $s;
print "'$s' has $count whitespace characters at its start\n";
}
Or, even better, use #+ to find the end of the whitespace:
#!/usr/bin/perl
use strict;
use warnings;
for my $s ("foo bar", " foo bar", " foo bar", " ") {
$s =~ /^\s*/;
print "$+[0] '$s'\n";
}
Here's a script that does this for every line of stdin. The relevant snippet of code is the first in the body of the loop.
#!/usr/bin/perl
while ($x = <>) {
$s = length(($x =~ m/^( +)/)[0]);
print $s, ":", $x, "\n";
}
tr/// is not a regex operator. However, you can use s///:
use strict; use warnings;
my $t = (my $s = " \t\n sdklsdjfkl");
my $n = 0;
++$n while $s =~ s{^\s}{};
print "$n \\s characters were removed from \$s\n";
$n = ( $t =~ s{^(\s*)}{} ) && length $1;
print "$n \\s characters were removed from \$t\n";
Since the regexp matcher returns the parenthesed matches when called in a list context, CanSpice's answer can be written in a single statement:
$count = length( ($line[0] =~ /^( *)/)[0] );
This prints amount of white space
echo " hello" |perl -lane 's/^(\s+)(.*)+$/length($1)/e; print'
3