Find and replace any or both patterns in a string - regex

I have a list of URLs and need to strip the protocol from each one.
Some contain only http://, some contain only www., and some contain both.
This is the code I have written:
my @list = qw'http://de.yahoo.com http://mail.example.org http://www.aol.com';
foreach (@list)
{
my $string = $_;
$string =~ s/http:\/\///;
$string =~ s/www.//;
print $string,"\n";
}
It works fine but is there a better way to write it in one line?

This should do the trick:
my @list = qw(http://de.yahoo.com http://mail.example.org http://www.aol.com);
foreach (@list)
{
my $string = $_;
$string =~ s/^(?:http:\/\/)?(?:www\.)?//;
print $string,"\n";
}
For future reference, http://www.regextester.com/ is your friend :)
** Edit ** Modified to use ikegami's suggestion of (?:...) as it should be more efficient when the values captured are not needed.

I guess you may want:
s!^(http://)?(www\.)?!!;
A few points:
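Use s!a!b! instead of s/a/b/; this saves escaping the slashes as \/\/.
Use ^; this ensures that http:// is matched only at the start of the string.
As a single line (the /r flag, available from Perl 5.14, makes s/// return the modified copy rather than the number of substitutions):
print join("\n", map { s!^(http://)?(www\.)?!!r } @list);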

Yes:
s{http://(.*)www.|www.(.*)http://|http://|www.}{$1$2}g;
But you probably meant to do:
s{^http://}{};
s{^www\.}{};
which can be combined into:
s{^(?:http://)?(?:www\.)?}{};
http://www.foo.bar/www.html?http://xxx => foo.bar/www.html?http://xxx
http://foo.bar/www.html?http://xxx => foo.bar/www.html?http://xxx
www.foo.bar/www.html?http://xxx => foo.bar/www.html?http://xxx
foo.bar/www.html?http://xxx => foo.bar/www.html?http://xxx
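The combined substitution can be wrapped in a small helper so that each case is easy to check. This is only a sketch: the name strip_prefix is hypothetical, and the /r flag assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: remove an optional leading "http://" and an
# optional leading "www." while leaving later occurrences untouched.
sub strip_prefix {
    my ($url) = @_;
    return $url =~ s{^(?:http://)?(?:www\.)?}{}r;   # /r returns the modified copy
}

print strip_prefix($_), "\n"
    for qw(http://de.yahoo.com http://mail.example.org http://www.aol.com);
```

Because the pattern is anchored with ^, a www. or http:// that appears later in the URL (for example in a query string) is not removed.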

Related

Perl regular expression to get 'This text':(

For example, if I have 'Sample text':(
my $1 should be Sample text, that is, any string that is enclosed in ' ' and comes directly before :(
The following will do the trick:
/'([^']*)':\(/
For example,
my $str = "'Sample text':(";
my ($match) = $str =~ /'([^']*)':\(/
or die("No match\n");
say $match; # Sample text
Here is the simplest code to answer the question:
#!/usr/bin/perl
use strict;
"'Sample text':(" =~ /'(.*)':\(/;
print "$1\n";
I'm not adding anything new here, just a sample pattern that you can test yourself to see how it works. It makes two assumptions:
No escaped quotes inside the string, such as 'Sam\'ple text'.
:( comes directly after the closing quote. If it doesn't, you'd want to use a lookahead instead.
Also note that + requires at least one character in the string, so 'a':( would match but '':( would not. If you want to allow empty strings, use * instead of +.
'([^']+)':\(
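To see the difference between + and * in practice, here is a minimal sketch. The helper name quoted_before_frown and its $allow_empty flag are hypothetical and exist only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the quoted text that directly precedes ":(",
# or undef when there is no match.
sub quoted_before_frown {
    my ($str, $allow_empty) = @_;
    my $re = $allow_empty ? qr/'([^']*)':\(/    # * accepts an empty string
                          : qr/'([^']+)':\(/;   # + needs at least one character
    my ($match) = $str =~ $re;                  # list context returns the capture
    return $match;
}

print quoted_before_frown("'Sample text':("), "\n";
```

Using [^'] rather than . also keeps the capture from running across several quoted strings on the same line, which a greedy .* would do.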

Perl regex return matches from substitution

I am trying to simultaneously remove and store (into an array) all matches of some regex in a string.
To return matches from a string into an array, you could use
my @matches = $string =~ /$pattern/g;
I would like to use a similar pattern for a substitution regex. Of course, one option is:
my @matches = $string =~ /$pattern/g;
$string =~ s/$pattern//g;
But is there really no way to do this without running the regex engine over the full string twice? Something like
my @matches = $string =~ s/$pattern//g;
Except that this returns only the number of substitutions, even in list context. As a consolation prize, I would also accept a way to use qr// where I could simply turn the quoted regex into a substitution, but I don't know if that's possible either (and it wouldn't avoid searching the same string twice).
Perhaps the following will be helpful:
use warnings;
use strict;
my $string = 'I thistle thing am thinking this Thistle a changed thirsty string.';
my $pattern = '\b[Tt]hi\S+\b';
my @matches;
$string =~ s/($pattern)/push @matches, $1; ''/ge;
print "New string: $string; Removed: @matches\n";
Output:
New string: I am a changed string.; Removed: thistle thing thinking this Thistle thirsty
Here is another way to do it without executing Perl code inside the substitution. The trick is that s/// (used here without /g) removes one match per call and returns false once nothing matches any more, which ends the while loop.
use strict;
use warnings;
use Data::Dump;
my $string = "The example Kenosis came up with was way better than mine.";
my @matches;
push @matches, $1 while $string =~ s/(\b\w{4}\b)\s//;
dd @matches, $string;
__END__
(
"came",
"with",
"than",
"The example Kenosis up was way better mine.",
)
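The s///ge approach can also be packaged as a reusable routine. The following is a sketch under the assumption that the caller passes a reference to the string; the name extract_all is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every match of $re from the string that
# $str_ref points to and return the deleted pieces, using one s///g pass.
sub extract_all {
    my ($str_ref, $re) = @_;
    my @removed;
    # /e runs the replacement as code: remember the match, then replace it with ''.
    $$str_ref =~ s/($re)/push @removed, $1; ''/ge;
    return @removed;
}

my $text    = 'a1b22c333';
my @numbers = extract_all(\$text, qr/\d+/);
print "Left: $text; removed: @numbers\n";   # Left: abc; removed: 1 22 333
```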

perl regex replace only part of string

I need to write a perl regex to convert
site.company.com => dc=site,dc=company,dc=com
Unfortunately I am not able to remove the trailing "," using the regex I came up with below. I could of course remove the trailing "," in the next statement, but I would prefer it to be handled as part of the regex.
$data="site.company.com";
$data =~ s/([^.]+)\.?/dc=$1,/g;
print $data;
This above code prints:
dc=site,dc=company,dc=com,
Thanks in advance.
When handling urls it may be a good idea to use a module such as URI. However, I do not think it applies in this case.
This task is most easily solved with a split and join, I think:
my $url = "site.company.com";
my $string = join ",", # join the parts with comma
map "dc=$_", # add the dc= to each part
split /\./, $url; # split into parts
s/\./,dc=/g && s/^/dc=/ for $data; # "for" aliases $_ to $data so both substitutions act on it
Tested below:
> echo "site.company.com" | perl -pe 's/\./,dc=/g&&s/^/dc=/g'
dc=site,dc=company,dc=com
Try doing this :
my $x = "site.company.com";
my @a = split /\./, $x;
map { s/^/dc=/; } @a;
print join ",", @a;
just put like this,
$data="site.company.com";
$data =~ s/([^.]+)\.?/dc=$1,/g; # the substitution from the question, which leaves a trailing ","
$data =~ s/,$//; # remove that trailing ","
print $data;
I'm going to try the /ge route:
$data =~ s{^|(\.)}{
( defined $1 ? ',' : '' ) . 'dc='
}ge;
e = evaluate replacement as Perl code.
In other words: at the start of the string, or at each dot, make the following replacement. If a dot was captured, emit a ','. Either way, append 'dc='.
Note that I like to use brace-style delimiters on all my evaluated replacements.
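For comparison, here is a sketch that puts the /ge substitution next to the split-and-join version so their results can be checked against each other. The names to_dn and to_dn_split are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper using the /ge route: "dc=" at the start of the
# string, ",dc=" in place of every dot.
sub to_dn {
    my ($host) = @_;
    (my $dn = $host) =~ s{^|(\.)}{ (defined $1 ? ',' : '') . 'dc=' }ge;
    return $dn;
}

# Hypothetical helper using split and join.
sub to_dn_split {
    my ($host) = @_;
    return join ',', map { "dc=$_" } split /\./, $host;
}

print to_dn('site.company.com'), "\n";         # dc=site,dc=company,dc=com
print to_dn_split('site.company.com'), "\n";   # dc=site,dc=company,dc=com
```

Neither version leaves a trailing comma, because the separator is only ever emitted between parts.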

Perl regex replace in same case

If you have a simple regex replace in perl as follows:
$line =~ s/JAM/AAA/g;
how would I modify it so that it looks at the match and gives the replacement the same case as the match? For example:
'JAM' would become 'AAA'
and 'jam' would become 'aaa'
Unicode-based solution:
use Unicode::UCD qw(charinfo);
my %category_mapping = (
Lu # upper-case Letter
=> 'A',
Ll # lower-case Letter
=> 'a',
);
join q(), map { $category_mapping{charinfo(ord $_)->{category}} } split //, 'jam';
# returns aaa
join q(), map { $category_mapping{charinfo(ord $_)->{category}} } split //, 'JAM';
# returns AAA
Here the unhandled characters, or rather their categories, are a bit easier to spot than in the other answers.
In Perl 5 you can do something like:
$line =~ s/JAM/$_=$&; tr!A-Z!A!; tr!a-z!a!; $_/gie;
It handles all different cases of JAM, like Jam, and it's easy to add other words, eg:
$line =~ s/JAM|SPAM/$_=$&; tr!A-Z!A!; tr!a-z!a!; $_/gie;
Something like this perhaps?
http://perldoc.perl.org/perlfaq6.html#How-do-I-substitute-case-insensitively-on-the-LHS-while-preserving-case-on-the-RHS%3f
Doing it in two-steps is probably a better/simpler idea...
Using the power of Google I found this (note that it is Perl 6 syntax, not Perl 5):
The :samecase modifier, short :ii (since it's a variant of :i), preserves case.
my $x = 'Abcd';
$x ~~ s:ii/^../foo/;
say $x; # Foocd
$x = 'ABC';
$x ~~ s:ii/^../foo/;
say $x # FOO
This is very useful if you want to globally rename your module Foo, to Bar,
but for example in environment variables it is written as all uppercase.
With the :ii modifier the case is automatically preserved.
$line =~ s/JAM/$& eq 'jam' ? 'aaa' : 'AAA'/gie;
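A more general Perl 5 variant copies the case of each matched character onto the replacement. This is only a sketch for ASCII letters; the helper name copy_case is hypothetical, and it assumes the matched text is not empty.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the upper/lower pattern of $matched to
# $replacement, character by character (ASCII letters only).
sub copy_case {
    my ($matched, $replacement) = @_;
    my @model = split //, $matched;
    my $out   = '';
    for my $i (0 .. length($replacement) - 1) {
        my $char = substr $replacement, $i, 1;
        # If the replacement is longer than the match, reuse the last character.
        my $m = $i < @model ? $model[$i] : $model[-1];
        $out .= $m =~ /[A-Z]/ ? uc($char) : lc($char);
    }
    return $out;
}

my $line = 'JAM jam Jam';
$line =~ s/(jam)/copy_case($1, 'aaa')/gie;
print "$line\n";   # AAA aaa Aaa
```

Unlike a comparison against fixed strings, this also handles mixed-case matches such as Jam.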

Match regex and assign results in single line of code

I want to be able to do a regex match on a variable and assign the results to the variable itself. What is the best way to do it?
I want to essentially combine lines 2 and 3 in a single line of code:
$variable = "some string";
$variable =~ /(find something).*/;
$variable = $1;
Is there a shorter/simpler way to do this? Am I missing something?
my($variable) = "some string" =~ /(e\s*str)/;
This works because
If the /g option is not used, m// in list context returns a list consisting of the subexpressions matched by the parentheses in the pattern, i.e., ($1, $2, $3 …).
and because my($variable) = ... (note the parentheses around the scalar) supplies list context to the match.
If the pattern fails to match, $variable gets the undefined value.
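The effect of the parentheses is easiest to see side by side. A small sketch; the variable names are chosen only for illustration.

```perl
use strict;
use warnings;

my $text = "some string";

my ($captured) = $text =~ /(e\s*str)/;        # list context: receives $1
my $matched    = $text =~ /(e\s*str)/;        # scalar context: receives true or false
my ($missing)  = $text =~ /(no such text)/;   # failed match: empty list, so undef

print "captured: '$captured'\n";   # captured: 'e str'
print "matched:  '$matched'\n";    # matched:  '1'
print "missing is ", (defined $missing ? 'defined' : 'undef'), "\n";
```

Without the parentheses around the variable, the assignment stores the success flag of the match rather than the captured text.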
Why do you want it to be shorter? Does it really matter?
$variable = $1 if $variable =~ /(find something).*/;
If you are worried about the variable name or doing this repeatedly, wrap the thing in a subroutine and forget about it:
some_sub( $variable, qr/pattern/ );
sub some_sub { $_[0] = $1 if eval { $_[0] =~ m/$_[1]/ }; $1 };
However you implement it, the point of the subroutine is to make the code reusable: you give a particular set of lines a short name that stands in their place.
Several other answers mention a destructive substitution:
( my $new = $variable ) =~ s/pattern/replacement/;
I tend to keep the original data around, and Perl v5.14 has an /r flag that leaves the original alone and returns a new string with the replacement (instead of the count of replacements):
my $match = $variable =~ s/pattern/replacement/r;
Well, you could say
my $variable;
($variable) = ($variable = "find something soon") =~ /(find something).*/;
or
(my $variable = "find something soon") =~ s/^.*?(find something).*/$1/;
You can do substitution as:
$a = 'stackoverflow';
$a =~ s/(\w+)overflow/$1/;
$a is now "stack"
From Perl Cookbook 2nd ed
6.1 Copying and Substituting Simultaneously
$dst = $src;
$dst =~ s/this/that/;
becomes
($dst = $src) =~ s/this/that/;
I just assumed everyone did it this way, amazed that no one gave this answer.
Almost ....
You can combine the match and retrieve the matched value with a substitution.
$variable =~ s/.*(find something).*/$1/;
AFAIK, you will always have to copy the value, though, unless you don't mind clobbering the original.
$variable2 = "stackoverflow";
(my $variable1) = ($variable2 =~ /stack(\w+)/);
$variable1 now equals "overflow".
I do this:
#!/usr/bin/perl
$target = "n: 123";
my ($target) = $target =~ /n:\s*(\d+)/g;
print $target; # the var $target now is "123"
Also, to amplify the accepted answer using the ternary operator to allow you to specify a default if there is no match:
my $match = $variable =~ /(pattern).*/ ? $1 : $default_value;
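The ternary form can be wrapped in a small routine if it is needed in several places. This is a sketch; the name first_capture is hypothetical, and it assumes the pattern contains at least one capture group.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first capture of $re in $str,
# or $default when the pattern does not match.
sub first_capture {
    my ($str, $re, $default) = @_;
    return $str =~ $re ? $1 : $default;
}

print first_capture('n: 123', qr/n:\s*(\d+)/, 'none'), "\n";   # 123
print first_capture('n: abc', qr/n:\s*(\d+)/, 'none'), "\n";   # none
```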