Finding partial matches - regex

Say, I have lines like:
SOMETHING.AA.AA.DARKSIDE
BLaH.AA.AA.Blah
I want to find for each line $before = $1; $after = $2; of the $middle = ”AA”
Such that for example for line 1 I get:
$before= “SOMETHING.”
$After = “.AA.DARKSIDE”
And also
$before= “SOMETHING.AA”
$After = “.DARKSIDE”
My code looks like this:
$middle = “AA”;
foreach (#lines){
$line = $_;
while ($line =~m/^(.+)$middle(.+)$/g){
$before = $1;
$after = $2;
}
}
Is there a simple way to change regex in my while?
PS: $middle will be a variable so i cannot hardcode it.
Thank you for help.

Why do you want to use regexes for that?
($before, $after) = split(/$middle\.$middle/, $line);
And then use $before and $after each once without and once with $middle concatenated to the end and start of the string respectively.

Related

Perl regex to replace part of one string with a portion of another

I have a need in Perl to replace a section of one string with most of another. :-) This needs be done for multiple pairs of strings.
For example, I need to replace
"/root_vdm_2/fs_clsnymigration"
within
/root_vdm_2/fs_clsnymigration/CLSNYMIGRATION/NY_HQ_S1
with
rfsn_clsnymigration
so that I end up with
/rfsn_clsnymigration/CLSNYMIGRATION/NY_HQ_S1
(without the leading "/root_vdm_2" part) ... but I am sufficiently sleep-deprived to have lost sight of how to accomplish this.
Help ?
Try this regex:
^\/root_vdm_2\/fs_clsnymigration
Substitute with:
\/rfsn_clsnymigration
example:
$string = "/root_vdm_2/fs_clsnymigration/CLSNYMIGRATION/NY_HQ_S1";
$string=~s/^\/root_vdm_2\/fs_clsnymigration/\/rfsn_clsnymigration/;
print $string;
Output:
/rfsn_clsnymigration/CLSNYMIGRATION/NY_HQ_S1
EDIT 1
$string = "/root_vdm_2/fs_clsnymigration/CLSNYMIGRATION/NY_HQ_S1/LISU,rfsn_clsnymigration
/root_vdm_2/fs_users/users/Marketing,rfsw_users
/root_vdm_3/fs_sandi/sandi_users,rfsw_sandi
/root_vdm_3/fs_pci/Analytics,rfsw_pci
/root_vdm_4/fs_camnt01/camnt01/AV,rfsw_camnt01
/root_vdm_1/fs_stcloud01/sfa,rfss_stcloud01
/root_vdm_3/fs_stcloud03/data4,rfss_stcloud03
/root_vdm_2/fs_stcloud02/depart1,rfss_stcloud02";
$string=~s/^\/root_vdm_.\/fs_[^\/]*/\/rfsn_clsnymigration/gm;
print $string;
Output:
/rfsn_clsnymigration/CLSNYMIGRATION/NY_HQ_S1/LISU,rfsn_clsnymigration
/rfsn_clsnymigration/users/Marketing,rfsw_users
/rfsn_clsnymigration/sandi_users,rfsw_sandi
/rfsn_clsnymigration/Analytics,rfsw_pci
/rfsn_clsnymigration/camnt01/AV,rfsw_camnt01
/rfsn_clsnymigration/sfa,rfss_stcloud01
/rfsn_clsnymigration/data4,rfss_stcloud03
/rfsn_clsnymigration/depart1,rfss_stcloud02
use strict;
use warnings;
while (<DATA>) {
chomp;
my ($lhs, $rhs) = split(/,/, $_, 2);
my #parts = split(/\//, $lhs);
splice(#parts, 1, 2, $rhs);
print join('/', #parts) . "\n";
}
__DATA__
/root_vdm_2/fs_clsnymigration/CLSNYMIGRATION/NY_HQ_S1/LISU,rfsn_clsnymigration
/root_vdm_2/fs_users/users/Marketing,rfsw_users
/root_vdm_3/fs_sandi/sandi_users,rfsw_sandi
/root_vdm_3/fs_pci/Analytics,rfsw_pci
/root_vdm_4/fs_camnt01/camnt01/AV,rfsw_camnt01
/root_vdm_1/fs_stcloud01/sfa,rfss_stcloud01
/root_vdm_3/fs_stcloud03/data4,rfss_stcloud03
/root_vdm_2/fs_stcloud02/depart1,rfss_stcloud02
My challenge was to replace part of $string1 with all of $string2, split on the commas.
/root_vdm_2/fs_clsnymigration/CLSNYMIGRATION/NY_HQ_S1/LISU,rfsn_clsnymigration
/root_vdm_2/fs_users/users/Marketing,rfsw_users
/root_vdm_3/fs_sandi/sandi_users,rfsw_sandi
/root_vdm_3/fs_pci/Analytics,rfsw_pci
/root_vdm_4/fs_camnt01/camnt01/AV,rfsw_camnt01
/root_vdm_1/fs_stcloud01/sfa,rfss_stcloud01
/root_vdm_3/fs_stcloud03/data4,rfss_stcloud03
/root_vdm_2/fs_stcloud02/depart1,rfss_stcloud02
The difficulty I saw initially was how to replace /root_vdm_2/fs_clsnymigration with rfsn_clsnymigration, and I allowed myself to think that a regexp was the best approach.
Although far less eloquent, this gets the job done:
foreach $line (#lines) {
chop $line;
($orig,$replica) = split /\,/, $line;
chop substr $orig, 0, 1;
#pathparts = split /\//, $orig;
$rootvdm = shift #pathparts;
#pathparts[0] = $replica;
$newpath = "/" . join ('/', #pathparts);
print " here's \$newpath:$newpath\n";
}
... which yields something like this:
here's $newpath:/rfsn_clsnymigration/CLSNYMIGRATION/NY_HQ_S1/LISU
here's $newpath:/rfsw_users/users/Marketing
here's $newpath:/rfsw_sandi/sandi_users
here's $newpath:/rfsw_pci/Analytics
here's $newpath:/rfsw_camnt01/camnt01/AV
here's $newpath:/rfss_stcloud01/sfa
here's $newpath:/rfss_stcloud03/data4
here's $newpath:/rfss_stcloud02/depart1

How to find values and use their subtraction in a search/replace with a perl script

I need to get values from lines like <fpage>327</fpage> and <lpage>335</lpage> and use their difference to replace NaNin a line with <page-count count="NaN"/>.
sample input file:
...many lines
<volume>74</volume>
<issue>3</issue>
<fpage>327</fpage>
<lpage>335</lpage>
...many lines
<counts><fig-count count="3"/><table-count count="2"/><equation-count count="0"/><ref-count count="37"/><page-count count="0"/></counts>
...many lines
sample output file desired:
...many lines
<volume>74</volume>
<issue>3</issue>
<fpage>327</fpage>
<lpage>335</lpage>
...many lines
<counts><fig-count count="3"/><table-count count="2"/><equation-count count="0"/><ref-count count="37"/><page-count count="8"/></counts>
...many lines
Here is what I am trying but I am getting <page-count count="0"/>:
while ( <$input> ) {
my $fpage = $1 if $fpage =~ m/<fpage>(\d+)/;
my $lpage = $1 if $lpage =~ m/<lpage>(\d+)/;
my $pages = $lpage - $fpage;
$_ =~ s!<page-count count="NaN"/>!<page-count count="${pages}"/>!;
print {$output} $_;
}
What am I doing wrong?
You're not actually testing the input for fpage and lpage. Try something like:
while ( my $in = <$input> ) {
my $fpage = $1 if $in =~ /<fpage>(\d+)/;
my $lpage = $1 if $in =~ /<lpage>(\d+)/;
my $pages = $lpage - $fpage;
$in =~ s!<page-count count="NaN"/>!<page-count count="${pages}"/>!;
print {$output} $in;
}
Note: This will only work if the entire block of text you're matching and replacing is available in each iteration of the while loop.
1) Your variables $fpage and $lpage go out of scope on each iteration. You can extend the scope by moving their decleration outsidde the while loop.
2) =~ does not do what you seem to want it to do. the command $fpage =~ m/<fpage>(\d+)/ is telling the Regex to search inside the variable $fpage. The default variable to search is $_, so your use of $_ =~ is redundant (and bad style)
3) If NaN occurs multiple times, your current code would only catch the first occurence.
my $fpage;
my $lpage;
while ( <$input> ) {
$fpage = $1 if $fpage =~ m/<fpage>(\d+)/;
$lpage = $1 if $lpage =~ m/<lpage>(\d+)/;
my $pages = $lpage - $fpage;
s!<page-count count="NaN"/>!<page-count count="${pages}"/>!;
print {$output} $_;
}
This performs no checks that you actually find fpage and lpage before the page-count.

match using regex in perl

HI I am trying to exract some data from a text file in perl. My file looks like this
Name:John
FirstName:Smith
Name:Alice
FirstName:Meyers
....
I want my string to look like John Smith and Alice Meyers
I tried something like this but I'm stuck and I don't know how to continue
while (<INPUT>) {
if (/^[Name]/) {
$match =~ /(:)(.*?)(\n) /
$string = $string.$2;
}
if (/^[FirstName]/) {
$match =~ /(:)(.*?)(\n)/
$string = $string.$2;
}
}
What I try to do is that when I match Name or FirstName to copy to content between : and \n but I get confused which is $1 and $2
This will put you first and last names in a hash:
use strict;
use warnings;
use Data::Dumper;
open my $in, '<', 'in.txt';
my (%data, $names, $firstname);
while(<$in>){
chomp;
($names) = /Name:(.*)/ if /^Name/;
($firstname) = /FirstName:(.*)/ if /^FirstName/;
$data{$names} = $firstname;
}
print Dumper \%data;
Through perl one-liner,
$ perl -0777 -pe 's/(?m).*?Name:([^\n]*)\nFirstName:([^\n]*).*/\1 \2/g' file
John Smith
Alice Meyers
while (<INPUT>) {
/^([A-Za-z])+\:\s*(.*)$/;
if ($1 eq 'Name') {
$surname = $2;
} elsif ($1 eq 'FirstName') {
$completeName = $2 . " " . $surname;
} else {
/* Error */
}
}
You might want to add some error handling, e.g. make sure that a Name is always followed by a FirstName and so on.
$1 $2 $3 .. $N , it's the capture result of () inside regex.
If you do something like that , you cant avoid using $1 like variables.
my ($matched1,$matched2) = $text =~ /(.*):(.*)/
my $names = [];
my $name = '';
while(my $row = <>){
$row =~ /:(.*)/;
$name = $name.' '.$1;
push(#$names,$name) if $name =~ / /;
$name = '' if $name =~ / /;
}
`while(<>){
}
`
open (FH,'abc.txt');
my(%hash,#array);
map{$_=~s/.*?://g;chomp($_);push(#array,$_)} <FH>;
%hash=#array;
print Dumper \%hash;

replace starting characters in string

I have a string in which i need to replace the starting set of characters with mod1.
Its like xyz_gf_111_yz to mod1_111_yz.
bcd_df_222_xx to mod2_222_xx and so on.
can anybody suggest sol, as the starting string is not fixed and im beginner in perl
thanks!
my #strings = qw(xyz_gf_111_yz bcd_df_222_xx asd_cv_333_dd);
my $i = 1;
for my $str (#strings)
{
my $after = $str;
$after =~ s/^\w{3}[_]\w{2}/mod$i/;
$i++;
print "$str -> $after\n";
}
Something like the following could get you started:
my #strings = qw(xyz_gf_111_yz bcd_df_222_xx);
my $i = 0;
for my $str (#strings) {
my $after = $str;
$i++;
$after =~ s/[^_]+/mod$i/;
print "$str -> $after\n";
}
#Miller,
I suggest a different solution, assuming that you want to replace the starting substring (all chars to the left the first digit) and the associated digit to the "mod" string is given by the first digit of the number substring the following could be a way.
my #strings = qw(xyz_gf_111_yz bcd_df_222_xx asd_cv_333_dd);
for my $str (#strings) {
print "bfr:".$str."\n";
$str =~ s/^([^\d]+?)_(\d)/mod$2_$2/;
print "aft:".$str."\n";
}
Here's another option:
use strict;
use warnings;
my $i;
my #strings = ( 'xyz_gf_111_yz', 'bcd_df_222_xx' );
for (#strings) {
print $_, "\n" if s/.+?_[^_]+/'mod'.++$i/e;
}
Output:
mod1_111_yz
mod2_222_xx

adding regex capabilities on simple cgi search

I have this simple cgi script working just fine but I want to add regex capabilities. is that possible? if so what I need to add. thanks.
#!/usr/local/bin/perl
read(STDIN, $buffer,$ENV{'CONTENT_LENGTH'});
#pairs = split(/&/, $buffer);
foreach $pair (#pairs) {
($key, $value) = split(/=/, $pair);
foreach $pair (#pairs) {
($key, $value) = split(/=/, $pair);
$value =~ tr/+/ /;
$value =~ s/%([a-zA-Z0-9][a-zA-Z0-9])/pack("C", hex($1))/eg;
$formdata{$key}.= "$value";
}
}
$search = $formdata{'search'};
open(INFO, "/test/myfile");
#array=<INFO>;
close (INFO);
...code truncate
To find lines that end with ".cgi":
my #array = grep /\.cgi$/, <INFO>;