Given a URL, the following regular expression can insert a word at certain points in the URL.
Code:
#!/usr/bin/perl
use strict;
use warnings;
#use diagnostics;
my @insert_words = qw/HELLO GOODBYE/;
my $word = 0;
my $match;
while (<DATA>) {
    chomp;
    foreach my $word (@insert_words)
    {
        my $repeat = 1;
        while ((my $match=$_) =~ s|(?<![/])(?:[/](?![/])[^/]*){$repeat}[^/]*\K|$word|)
        {
            print "$match\n";
            $repeat++;
        }
        print "\n";
    }
}
__DATA__
http://www.stackoverflow.com/dog/cat/rabbit/
http://www.superuser.co.uk/dog/cat/rabbit/hamster/
10.15.16.17/dog/cat/rabbit/
The output (for the first example URL in __DATA__, with the word HELLO):
http://www.stackoverflow.com/dogHELLO/cat/rabbit/
http://www.stackoverflow.com/dog/catHELLO/rabbit/
http://www.stackoverflow.com/dog/cat/rabbitHELLO/
http://www.stackoverflow.com/dog/cat/rabbit/HELLO
Where I am now stuck:
I would now like to alter the regular expression so that the output will look like what is shown below:
http://www.stackoverflow.com/dogHELLO/cat/rabbit/
http://www.stackoverflow.com/dog/catHELLO/rabbit/
http://www.stackoverflow.com/dog/cat/rabbitHELLO/
http://www.stackoverflow.com/dog/cat/rabbit/HELLO
#above is what it already does at the moment
#below is what i also want it to be able to do as well
http://www.stackoverflow.com/HELLOdog/cat/rabbit/ #<-puts the word at the start of the string
http://www.stackoverflow.com/dog/HELLOcat/rabbit/
http://www.stackoverflow.com/dog/cat/HELLOrabbit/
http://www.stackoverflow.com/dog/cat/rabbit/HELLO
http://www.stackoverflow.com/HELLO/cat/rabbit/ #<- now also replaces the string with the word
http://www.stackoverflow.com/dog/HELLO/rabbit/
http://www.stackoverflow.com/dog/cat/HELLO/
http://www.stackoverflow.com/dog/cat/rabbit/HELLO
But I am having trouble getting it to automatically do this within the one regular expression.
Any help with this matter would be highly appreciated, many thanks
One solution:
use strict;
use warnings;
use URI qw( );
my @insert_words = qw( HELLO );
while (<DATA>) {
    chomp;
    my $url = URI->new($_);
    my $path = $url->path();
    for (@insert_words) {
        # Use package vars to communicate with /(?{})/ blocks.
        local our $insert_word = $_;
        local our @paths;
        $path =~ m{
            ^(.*/)([^/]*)((?:/.*)?)\z
            (?{
                push @paths, "$1$insert_word$2$3";
                if (length($2)) {
                    push @paths, "$1$insert_word$3";
                    push @paths, "$1$2$insert_word$3";
                }
            })
            (?!)
        }x;
        for (@paths) {
            $url->path($_);
            print "$url\n";
        }
    }
}
__DATA__
http://www.stackoverflow.com/dog/cat/rabbit/
http://www.superuser.co.uk/dog/cat/rabbit/hamster/
http://10.15.16.17/dog/cat/rabbit/
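The answer above relies on a code block, (?{ ... }), followed by (?!), which always fails and so forces the engine to backtrack through every way the pattern could match. A minimal sketch of that idiom on a simpler problem, listing every prefix/suffix split of a string, might look like this. The sub name all_splits and the package variable @splits are made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return every "prefix|suffix" split of $str,
# in the order the regex engine tries them.
sub all_splits {
    my ($str) = @_;
    local our @splits;    # package var, visible inside the (?{ }) block
    $str =~ m{
        ^(.*)(.*)\z                     # any prefix, then the rest
        (?{ push @splits, "$1|$2" })    # record this alternative
        (?!)                            # always fail, so the next one is tried
    }xs;
    return @splits;
}

print "$_\n" for all_splits('abc');
```

The overall match always fails; only the side effect of the code block matters, which is why the results are collected in a variable rather than read from the match itself.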
Without crazy regexes:
use strict;
use warnings;
use URI qw( );
my @insert_words = qw( HELLO );
while (<DATA>) {
    chomp;
    my $url = URI->new($_);
    my $path = $url->path();
    for my $insert_word (@insert_words) {
        my @parts = $path =~ m{/([^/]*)}g;
        my @paths;
        for my $part_idx (0..$#parts) {
            my $orig_part = $parts[$part_idx];
            local $parts[$part_idx];
            {
                $parts[$part_idx] = $insert_word . $orig_part;
                push @paths, join '', map "/$_", @parts;
            }
            if (length($orig_part)) {
                {
                    $parts[$part_idx] = $insert_word;
                    push @paths, join '', map "/$_", @parts;
                }
                {
                    $parts[$part_idx] = $orig_part . $insert_word;
                    push @paths, join '', map "/$_", @parts;
                }
            }
        }
        for (@paths) {
            $url->path($_);
            print "$url\n";
        }
    }
}
__DATA__
http://www.stackoverflow.com/dog/cat/rabbit/
http://www.superuser.co.uk/dog/cat/rabbit/hamster/
http://10.15.16.17/dog/cat/rabbit/
One more solution:
#!/usr/bin/perl
use strict;
use warnings;
my @insert_words = qw/HELLO GOODBYE/;
while (<DATA>) {
    chomp;
    /(?<![\/])(?:[\/](?![\/])[^\/]*)/p;
    my $begin_part = ${^PREMATCH};
    my $tail = ${^MATCH} . ${^POSTMATCH};
    my @tail_chunks = split /\//, $tail;
    foreach my $word (@insert_words) {
        for my $index (1..$#tail_chunks) {
            my @new_tail = @tail_chunks;
            $new_tail[$index] = $word . $tail_chunks[$index];
            my $str = $begin_part . join "/", @new_tail;
            print $str, "\n";
            $new_tail[$index] = $tail_chunks[$index] . $word;
            $str = $begin_part . join "/", @new_tail;
            print $str, "\n";
        }
        print "\n";
    }
}
__DATA__
http://www.stackoverflow.com/dog/cat/rabbit/
http://www.superuser.co.uk/dog/cat/rabbit/hamster/
10.15.16.17/dog/cat/rabbit/
Related
I have an XML file. I need to replace the digits in comment="18" with comment="my string", where "my string" comes from my @array ($array[18] = "my string").
<rule ccType="inst" comment="18" domain="icc" entityName="thens" entityType="toggle" excTime="1605163966" name="exclude" reviewer="hpanjali" user="1" vscope="default"></rule>
This is what I have tried.
while (my $line = <FH>) {
chomp $line;
$line =~ s/comment="(\d+)"/comment="$values[$1]"/ig;
#print "$line \n";
print FH1 $line, "\n";
}
Here is an example using XML::LibXML:
use strict;
use warnings;
use XML::LibXML;
my $fn = 'test.xml';
my @array = map { "string$_" } 0..20;
my $doc = XML::LibXML->load_xml(location => $fn);
for my $node ($doc->findnodes('//rule')) {
    my $idx = $node->getAttribute('comment');
    $node->setAttribute('comment', $array[$idx]);
}
print $doc->toString();
Here's an XML::Twig example. It's basically the same idea as the XML::LibXML example done in a different way with a different tool:
use XML::Twig;
my $xml =
    qq(<rule ccType="inst" comment="18"></rule>);
my @array;
$array[18] = 'my string';
my $twig = XML::Twig->new(
    twig_handlers => {
        rule => \&update_comment,
    },
);
$twig->parse( $xml );
$twig->print;
sub update_comment {
    my( $t, $e ) = @_;
    my $n = $e->{att}{comment};
    $e->set_att( comment => $array[$n] );
}
I have a list like:
DD2aaQQmmm
AA34DDmm
And I want to print the upper-case letters, the lower-case letters and the digits separately, so it would look like:
DDQQ aammm 2
AADD mm 34
How should I do this in Perl, using a regex?
I have tried for the upper case this:
#!/usr/bin/perl
my @array = <>;
chomp(@array);
foreach (@array){
    if ($_ = ~ /([A-Z][a-z])/){
        print $_ "\n"
    }
}
But this only prints the words that start with an upper-case letter.
The task is quite simple -- separate the wheat from the chaff
use strict;
use warnings;
use feature 'say';
while(my $line = <DATA>) {
    my @out;
    $out[0] = $line =~ s/[^A-Z]//gr;
    $out[1] = $line =~ s/[^a-z]//gr;
    $out[2] = $line =~ s/[^0-9]//gr;
    say join(' ', @out);
}
__DATA__
DD2aaQQmmm
AA34DDmm
Output
DDQQ aammm 2
AADD mm 34
Of course, the final script can read from a pipe or from a file given on the command line:
use strict;
use warnings;
use feature 'say';
while(<>) {
    my @out;
    $out[0] = s/[^A-Z]//gr;
    $out[1] = s/[^a-z]//gr;
    $out[2] = s/[^0-9]//gr;
    say join(' ', @out);
}
Another way is to use the tr operator.
use Modern::Perl;
while(my $str = <DATA>) {
    chomp $str;
    my @out;
    push @out, $str =~ tr/A-Z//cdr;
    push @out, $str =~ tr/a-z//cdr;
    push @out, $str =~ tr/0-9//cdr;
    say "@out";
}
__DATA__
DD2aaQQmmm
AA34DDmm
Output:
DDQQ aammm 2
AADD mm 34
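As a brief sketch of what the tr flags in this answer do: c complements the search list, d deletes the matched characters, and r returns the result instead of modifying the string in place. The variable names $upper and $count are only for illustration.

```perl
use strict;
use warnings;
use 5.014;    # the r flag needs Perl 5.14 or later

my $str = 'DD2aaQQmmm';

# c: match everything that is NOT in A-Z
# d: delete the matched characters
# r: return the new string and leave $str untouched
my $upper = $str =~ tr/A-Z//cdr;

# With no flags and an empty replacement list, tr changes nothing
# and simply returns how many characters matched.
my $count = ($str =~ tr/A-Z//);

say "$upper $count";    # prints "DDQQ 4"
```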
tr is about 6 times quicker than s///; here is a benchmark:
use Modern::Perl;
use Benchmark qw(:all);
my $str = "DD2aaQQmmm";
my $count = -3;
cmpthese($count, {
    'tr' => sub {
        my @out;
        push @out, $str =~ tr/A-Z//cdr;
        push @out, $str =~ tr/a-z//cdr;
        push @out, $str =~ tr/0-9//cdr;
    },
    'subst' => sub {
        my @out;
        $out[0] = $str =~ s/[^A-Z]//gr;
        $out[1] = $str =~ s/[^a-z]//gr;
        $out[2] = $str =~ s/[^0-9]//gr;
    },
});
Output:
          Rate subst    tr
subst  58165/s    --  -84%
tr    357629/s  515%    --
Polar Bear's solution is absolutely correct, but it needs Perl 5.14+. Here is a simple workaround for older Perl versions.
use strict;
use warnings;
while(my $line = <DATA>) {
    my @out = ($line, $line, $line);
    $out[0] =~ s/[^A-Z]//g;
    $out[1] =~ s/[^a-z]//g;
    $out[2] =~ s/[^0-9]//g;
    print join(' ', @out) . "\n";
}
__DATA__
DD2aaQQmmm
AA34DDmm
Suppose we have the following case:
my $str = <<EO_STR;
Name=Value1 Adress=Value4
Name=Value2 Adress=Value5
Name=Value3 Adress=Value6
EO_STR
I have a table "T1" in the database with the columns ("Name", "Address"). I want to put the values "Value1,Value2,Value3" into the column "Name" and the values "Value4,Value5,Value6" into the column "Adress".
In this case we have:
my @matches = $str =~ /Name=(.*?)\nAdress=(.*?)\n/g;
How can we use $1 and $2 with @matches to get all occurrences of Name and Adress separately, in order to insert them into the table T1?
All captures of all matches are returned, so you'd have to group them up.
use List::Util 1.29 qw( pairs );
for ( pairs( $str =~ /Name=(.*) Adress=(.*)/g ) ) {
    my @matches = @$_;
...
}
That said, it's far more common to grab the matches iteratively.
while ($str =~ /Name=(.*) Adress=(.*)/g) {
    my @matches = ( $1, $2 );
...
}
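As a sketch of the iterative approach applied to the data from the question, the two captures can be collected into separate arrays, one per column. This assumes the spelling "Adress" used in the data and values without spaces; the array names @names and @addresses are only for illustration.

```perl
use strict;
use warnings;

my $str = <<'EO_STR';
Name=Value1 Adress=Value4
Name=Value2 Adress=Value5
Name=Value3 Adress=Value6
EO_STR

my (@names, @addresses);

# In scalar context, /g resumes after the previous match on each iteration,
# so $1 and $2 hold the captures of one record at a time.
while ($str =~ /Name=(\S+)\s+Adress=(\S+)/g) {
    push @names,     $1;
    push @addresses, $2;
}

print join(',', @names), "\n";        # Value1,Value2,Value3
print join(',', @addresses), "\n";    # Value4,Value5,Value6
```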
Regex is not always the right tool for the job. Your data looks a lot like it's just key/value pairs. Use split to break it up. No need for a pattern match here.
Your code and data don't match, so I've gone with what the code said.
use strict;
use warnings;
my $str = <<EO_STR;
Name=Value1
Adress=Value4
Name=Value2
Adress=Value5
Name=Value3
Adress=Value6
EO_STR
my $fields;
foreach my $pair (split /\n/, $str) {
    my ($key, $value) = split /=/, $pair;
    $key =~ s/^\s+//;
    push @{ $fields->{$key} }, $value;
}
use Data::Dumper;
print Dumper $fields;
The code will create this data structure:
$VAR1 = {
'Name' => [
'Value1',
'Value2',
'Value3'
],
'Adress' => [
'Value4',
'Value5',
'Value6'
]
};
You can now access these two array references and use them to insert data into your table.
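One way to get from the two array references to table rows is to pair the values up by index. The sketch below assumes both arrays have the same length and rebuilds the structure shown above by hand; the actual insert is only hinted at in a comment, because the database handle and statement are not part of the original.

```perl
use strict;
use warnings;

# Same shape as the $fields structure built above.
my $fields = {
    Name   => [qw(Value1 Value2 Value3)],
    Adress => [qw(Value4 Value5 Value6)],
};

# Pair the i-th name with the i-th address: one array ref per row.
my @rows = map { [ $fields->{Name}[$_], $fields->{Adress}[$_] ] }
           0 .. $#{ $fields->{Name} };

# Each row could then be bound to placeholders, for example with DBI:
#   $sth->execute(@$_) for @rows;   # $sth is a hypothetical prepared INSERT
print join(' | ', @$_), "\n" for @rows;
```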
I have done the following:
#!/usr/bin/env perl
use v5.28;
my $str = <<EO_STR;
Name=Value1 Adress=Value4
Name=Value2 Adress=Value5
Name=Value3 Adress=Value6
EO_STR
my @array;
for my $a (split(/\n/, $str)) {
    my %res = $a =~ m/(\w+)=(\w+)/g;
    push @array, \%res;
}
for my $a (@array) {
    for my $b (sort keys %{$a}) {
        say $b.'->'.$a->{$b};
    }
}
It creates this structure:
@array = [
{
Name->Value1,
Adress->Value4
},
...
];
I have a file with lines similar to following:
abcd1::101:xyz1,user,user1,abcd1,pqrs1,userblah,abcd1
I want to keep everything up to the last ":" and remove all occurrences of abcd1 after it.
In the end, I need to have below:
abcd1::101:xyz1,xyz2,xyz3,pqrs1,xyz4
I tried the code below, but for some reason it is not working. Please help.
The account name is "abcd1".
sub UpdateEtcGroup {
    my $account = shift;
    my $file = "/tmp/group";
    @ARGV = ($file);
    $^I = ".bak";
    while (<>){
        s#^($account::\d{1,$}:)$account,?#$1#g;
        s/,$//; # to remove the last "," if there
        print;
    }
}
split is the tool for the job, not a regex, because split lets you reliably separate the field you want to operate on from the ones you don't. Like this:
#!/usr/bin/env perl
use strict;
use warnings;
my $username = 'abcd1';
while ( <DATA> ) {
    chomp;    # drop the newline so the last user name compares cleanly
    my @fields = split /:/;
    my @users = split ( /,/, pop ( @fields ) );
    print join ( ":", @fields,
        join ( ",", grep { not m/^$username$/ } @users ) ),"\n";
}
__DATA__
abcd1::101:xyz1,user,user1,abcd1,pqrs1,userblah,abcd1
Don't use a regular expression for this.
use strict;
use warnings;
while (<DATA>) {
    chomp;
    my @parts = split(/:/, $_);
    $parts[-1] = join(',', grep { !/^abcd/ } split(/,/, $parts[-1]));
    print join(':', @parts) . "\n";
}
__DATA__
abcd1::101:xyz1,user,user1,abcd1,pqrs1,userblah,abcd1
abcd2::102:user1,xyz2,otheruser,abcd2,pqrs1,xyz4,abcd2
Output:
abcd1::101:xyz1,user,user1,pqrs1,userblah
abcd2::102:user1,xyz2,otheruser,pqrs1,xyz4
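The split approach from these answers can be folded into a small helper that takes the account name as a parameter and compares names exactly rather than by pattern. This is only a sketch; the sub name remove_member is made up. As an aside, in the original attempt the text $account:: inside the pattern is parsed by Perl as a package-qualified variable name rather than as $account followed by two colons, which is probably one reason that substitution did not behave as intended.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every exact occurrence of $account from the
# member list, which is the last colon-separated field of a group line.
sub remove_member {
    my ($line, $account) = @_;
    chomp $line;
    my @fields  = split /:/, $line, -1;    # -1 keeps an empty trailing field
    my @members = grep { $_ ne $account } split /,/, pop(@fields);
    return join ':', @fields, join(',', @members);
}

print remove_member("abcd1::101:xyz1,user,user1,abcd1,pqrs1,userblah,abcd1\n", 'abcd1'), "\n";
```

Because the comparison uses ne on whole names, a member such as abcd10 is left alone, and the group name in the first field is never touched.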
The problem:
Find pieces of text in a file that are enclosed by # and replace the characters inside them.
Input:
#abc# abc #ABC#
cba #cba CBA#
Desired output:
абц abc АБЦ
cba цба ЦБА
I have the following:
#!/usr/bin/perl
use strict;
use warnings;
use Encode;
my $output;
open FILE,"<", 'test.txt';
while (<FILE>) {
    chomp(my @chars = split(//, $_));
    for (@chars) {
        my @char;
        $_ =~ s/a/chr(0x430)/eg;
        $_ =~ s/b/chr(0x431)/eg;
        $_ =~ s/c/chr(0x446)/eg;
        $_ =~ s/d/chr(0x434)/eg;
        $_ =~ s/e/chr(0x435)/eg;
        $_ =~ s/A/chr(0x410)/eg;
        $_ =~ s/B/chr(0x411)/eg;
        $_ =~ s/C/chr(0x426)/eg;
        push @char, $_;
        $output = join "", @char;
        print encode("utf-8",$output);
    }
    print "\n";
}
close FILE;
But I'm stuck on how to process further
Thanks for help in advance!
Kluther
Here is my solution. (It is a prototype; you will need to finish it.)
for (my $data = <DATA>){
    $data=~s/[#]([\s\w]+)[#]/func($1)/ge;
    print $data;
    # while($data=~m/[#]([\s\w]+)[#]/g){
    #     print "marked: ",$1,"\n";
    #     print "position:", pos();
    # }
    # print "not marked: ";
}
sub func{
    #do your magic here ;)
    return "<< @_ >>";
}
__DATA__
#abc# abc #ABC# cba #cba CBA#
What happens here?
First, I read the data. You can do that yourself.
for (my $data = <DATA>){...}
Next, I need to search for your pattern and replace it.
What should I do?
Use the substitution operator: s/pattern/replace/
But in an interesting form:
s/pattern/func($1)/ge
The g flag means global search.
The e flag means the replacement is evaluated as code.
So I think you need to write your own func function ;)
It may be better to use the transliteration operator: tr/listOfSymbolsToBeReplaced/listOfSymbolsThatBePlacedInstead/
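Putting the two ideas together, a sketch of func built on tr could look like the following. It assumes the letter mapping from the question (a, b, c, d, e, A, B, C), that the marks come in pairs on one line, and it uses a made-up sub name, convert_marked.

```perl
use strict;
use warnings;
use utf8;                              # this source contains Cyrillic literals
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: transliterate only the text between a pair of # marks
# and drop the marks themselves. An unpaired # is left untouched.
sub convert_marked {
    my ($line) = @_;
    $line =~ s{\#([^#]*)\#}{ (my $t = $1) =~ tr/abcdeABC/абцдеАБЦ/; $t }ge;
    return $line;
}

print convert_marked('#abc# abc #ABC#'), "\n";
print convert_marked('cba #cba CBA#'), "\n";
```

The capture is copied into $t first because $1 is read-only and tr modifies its target in place.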
With minimal changes to your algorithm, you need to keep track of whether you are inside the # marks or not, so add something like this:
my $bConvert = 0;
chomp(my @chars = split(//, $_));
for (@chars) {
    my $char = $_;
    if (/#/) {
        $bConvert = ($bConvert + 1) % 2;
        next;
    }
    elsif ($bConvert) {
        $char =~ s/a/chr(0x430)/eg;
        $char =~ s/b/chr(0x431)/eg;
        $char =~ s/c/chr(0x446)/eg;
        $char =~ s/d/chr(0x434)/eg;
        $char =~ s/e/chr(0x435)/eg;
        $char =~ s/A/chr(0x410)/eg;
        $char =~ s/B/chr(0x411)/eg;
        $char =~ s/C/chr(0x426)/eg;
    }
    print encode("utf-8",$char);
}
Try this after $output is processed.
$output =~ s/\#//g;
my @split_output = split(//, $output);
$output = "";
my $len = scalar(@split_output) ;
while ($len--) {
    $output .= shift(@split_output);
}
print $output;
It can be done with a single regex and no splitting of the string:
use strict;
use warnings;
use Encode;
my %chars = (
a => chr(0x430),
b => chr(0x431),
c => chr(0x446),
d => chr(0x434),
e => chr(0x435),
A => chr(0x410),
B => chr(0x411),
C => chr(0x426),
);
my $regex = '(' . join ('|', keys %chars) . ')';
while (<DATA>) {
1 while ($_ =~ s|\#(?!\s)[^#]*?\K$regex(?=[^#]*(?!\s)\#)|$chars{$1}|eg);
print encode("utf-8",$_);
}
It does require repeated runs of the regex due to the overlapping nature of the matches.