Find and replace any or both patterns in a string

I have a list of URLs and need to strip the protocol from each one.
Some have only http://, some have only www., and some have both.
I have written this code for it:
my @list = qw'http://de.yahoo.com http://mail.example.org http://www.aol.com';
foreach(@list)
{
my $string = $_;
$string =~ s/http:\/\///;
$string =~ s/www.//;
print $string,"\n";
}
It works fine but is there a better way to write it in one line?

This should do the trick:
my @list = qw(http://de.yahoo.com http://mail.example.org http://www.aol.com);
foreach(@list)
{
my $string = $_;
$string =~ s/^(?:http:\/\/)?(?:www\.)?//;
print $string,"\n";
}
For future reference, http://www.regextester.com/ is your friend :)
** Edit ** Modified to use ikegami's suggestion of (?:...) as it should be more efficient when the values captured are not needed.

I guess you may want:
s!^(http://)?(www\.)?!!;
A few points:
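Use s!a!b! instead of s/a/b/; this saves the \/\/ escapes.
Use ^; this ensures http:// is matched only at the start of the string.
As a single line (the /r flag, available from Perl 5.14, makes s/// return the modified string instead of the number of substitutions):
print join("\n", map { s!^(http://)?(www\.)?!!r } @list);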

Yes:
s{http://(.*)www.|www.(.*)http://|http://|www.}{$1$2}g;
But you probably meant to do:
s{^http://}{};
s{^www\.}{};
which can be combined into:
s{^(?:http://)?(?:www\.)?}{};
http://www.foo.bar/www.html?http://xxx => foo.bar/www.html?http://xxx
http://foo.bar/www.html => foo.bar/www.html
www.foo.bar/www.html => foo.bar/www.html
foo.bar/www.html => foo.bar/www.html
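
As a rough sketch of the combined pattern in use, the snippet below wraps it in a small helper and runs it over a few sample URLs. The name strip_prefix and the extra www.example.com entry are made up for illustration, and the /r flag assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: drop an optional leading "http://" and an optional
# "www." that follows it. /r returns the modified copy (Perl 5.14+),
# so the caller's string is left untouched.
sub strip_prefix {
    my ($url) = @_;
    return $url =~ s{^(?:http://)?(?:www\.)?}{}r;
}

my @list = qw(http://de.yahoo.com http://mail.example.org http://www.aol.com www.example.com);
print strip_prefix($_), "\n" for @list;
```

Because the pattern is anchored with ^, a later "www." or "http://" inside the path or query string is not touched.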

Related

Perl regular expression to get 'This text':(

For example, if I have 'Sample text':(
then $1 should be Sample text, i.e. any string that is enclosed in ' ' and comes directly before :(

The following will do the trick:
/'([^']*)':\(/
For example,
use feature 'say';
my $str = "'Sample text':(";
my ($match) = $str =~ /'([^']*)':\(/
or die("No match\n");
say $match; # Sample text

Here is the simplest code to answer the question:
#!/usr/bin/perl
use strict;
"'Sample text':(" =~ /'(.*)':\(/;
print "$1\n";

I'm not adding anything new here, just a sample pattern that you can test yourself to see how it works. There are two assumptions here:
No escaped quotes inside the string, such as 'Sam\'ple text'.
:( comes directly after the closing quote. If it doesn't, you'd want to use a lookahead instead.
Note also that + requires at least one character in the string, so 'a':( would match but '':( would not. If you want to allow empty strings, use * instead of +.
'([^']+)':\(
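
A minimal sketch of the lookahead variant, assuming "not directly after" means that other characters may sit between the closing quote and :( on the same line. The helper name quoted_before_frown is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first non-empty single-quoted string that
# has ":(" somewhere after it on the same line, or undef if there is none.
# The lookahead (?=.*:\() checks for ":(" without consuming any text.
sub quoted_before_frown {
    my ($str) = @_;
    my ($match) = $str =~ /'([^']+)'(?=.*:\()/;
    return $match;
}

for my $str ("'Sample text':(", "'Sample text' oh no :(", "'':(", "'no frown'") {
    my $match = quoted_before_frown($str);
    print "$str -> ", defined $match ? $match : '(no match)', "\n";
}
```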

Perl regex return matches from substitution

I am trying to simultaneously remove and store (into an array) all matches of some regex in a string.
To return matches from a string into an array, you could use
my @matches = $string =~ /$pattern/g;
I would like to use a similar pattern for a substitution regex. Of course, one option is:
my @matches = $string =~ /$pattern/g;
$string =~ s/$pattern//g;
But is there really no way to do this without running the regex engine over the full string twice? Something like
my @matches = $string =~ s/$pattern//g;
except that this only returns the number of substitutions, regardless of list context. I would also take, as a consolation prize, a method to use qr// where I could simply modify the quoted regex into a substitution regex, but I don't know if that's possible either (and that wouldn't preclude searching the same string twice).

Perhaps the following will be helpful:
use warnings;
use strict;
my $string = 'I thistle thing am thinking this Thistle a changed thirsty string.';
my $pattern = '\b[Tt]hi\S+\b';
my @matches;
$string =~ s/($pattern)/push @matches, $1; ''/ge;
print "New string: $string; Removed: @matches\n";
Output:
New string: I   am    a changed  string.; Removed: thistle thing thinking this Thistle thirsty

Here is another way to do it without executing Perl code inside the substitution. The trick is that s/// without /g removes only the first remaining match on each call and returns false once nothing matches, which ends the while loop.
use strict;
use warnings;
use Data::Dump;
my $string = "The example Kenosis came up with was way better than mine.";
my @matches;
push @matches, $1 while $string =~ s/(\b\w{4}\b)\s//;
dd @matches, $string;
__END__
(
"came",
"with",
"than",
"The example Kenosis up was way better mine.",
)
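
Either approach can be wrapped in a helper so that callers get the removed pieces back in a single call. The sketch below uses the /ge form; the name extract_all and the sample string are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every match of the pattern from the string
# passed as the first argument (modified in place through the @_ alias)
# and return the removed pieces in the order they were found.
sub extract_all {
    my $pattern = $_[1];
    my @removed;
    # /e runs the replacement as code: record the match, then yield ''.
    $_[0] =~ s/($pattern)/push @removed, $1; ''/ge;
    return @removed;
}

my $string  = 'a1b22c333';
my @matches = extract_all($string, qr/\d+/);
print "Removed: @matches\n";    # Removed: 1 22 333
print "Left:    $string\n";     # Left:    abc
```

The string is scanned only once, at the cost of running a small piece of Perl code for every match.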

perl regex replace only part of string

I need to write a Perl regex to convert
site.company.com => dc=site,dc=company,dc=com
Unfortunately I am not able to remove the trailing "," using the regex I came up with below. I could of course remove the trailing "," in the next statement, but would prefer that to be handled as part of the regex.
$data="site.company.com";
$data =~ s/([^.]+)\.?/dc=$1,/g;
print $data;
The above code prints:
dc=site,dc=company,dc=com,
Thanks in advance.

When handling urls it may be a good idea to use a module such as URI. However, I do not think it applies in this case.
This task is most easily solved with a split and join, I think:
my $url = "site.company.com";
my $string = join ",", # join the parts with comma
map "dc=$_", # add the dc= to each part
split /\./, $url; # split into parts

$data =~ s/\./,dc=/g; $data =~ s/^/dc=/;
Tested below as a one-liner (where both substitutions act on $_):
> echo "site.company.com" | perl -pe 's/\./,dc=/g&&s/^/dc=/g'
dc=site,dc=company,dc=com

Try this:
my $x = "site.company.com";
my @a = split /\./, $x;
s/^/dc=/ for @a;
print join ",", @a;

just put like this,
$data="site.company.com";
$data =~ s/,dc=$1/dc=$1/g; #(or) $data =~ s/,dc/dc/g;
print $data;

I'm going to try the /ge route:
$data =~ s{^|(\.)}{
( defined $1 ? ',' : '' ) . 'dc='
}ge;
e = evaluate replacement as Perl code.
So, it says given the start of the string, or a dot, make the following replacement. If it captured a period, then emit a ','. Regardless of this result, insert 'dc='.
Note, that I like to use a brace style of delimiter on all my evaluated replacements.
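
To see the /ge replacement at work on more than one input, here is a small sketch. The helper name to_dc and the extra sample inputs are hypothetical; the pattern is the one described above, with an explicit defined test so that no warning is raised when the start-of-string branch matches.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "site.company.com" into "dc=site,dc=company,dc=com".
# The alternation matches either the start of the string (nothing captured)
# or a literal dot (captured in $1). A dot becomes ",dc=" and the start
# of the string becomes "dc=".
sub to_dc {
    my ($name) = @_;
    $name =~ s{^|(\.)}{ (defined $1 ? ',' : '') . 'dc=' }ge;
    return $name;
}

print to_dc($_), "\n" for qw(site.company.com com a.b);
```

Since the dots are replaced rather than appended to, there is no trailing comma to clean up afterwards.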

Perl regex replace in same case

If you have a simple regex replacement in Perl such as:
$line =~ s/JAM/AAA/g;
how would I modify it so that it looks at the match and gives the replacement the same case as the match? For example:
'JAM' would become 'AAA'
and 'jam' would become 'aaa'

Unicode-based solution:
use Unicode::UCD qw(charinfo);
my %category_mapping = (
Lu # upper-case Letter
=> 'A',
Ll # lower-case Letter
=> 'a',
);
join q(), map { $category_mapping{charinfo(ord $_)->{category}} } split //, 'jam';
# returns aaa
join q(), map { $category_mapping{charinfo(ord $_)->{category}} } split //, 'JAM';
# returns AAA
Here the unhandled characters, or rather their categories, are a bit easier to see than in the other answers.

In Perl 5 you can do something like:
$line =~ s/JAM/$_=$&; tr!A-Z!A!; tr!a-z!a!; $_/gie;
It handles all the different cases of JAM, such as Jam, and it's easy to add other words, e.g.:
$line =~ s/JAM|SPAM/$_=$&; tr!A-Z!A!; tr!a-z!a!; $_/gie;

Something like this perhaps?
http://perldoc.perl.org/perlfaq6.html#How-do-I-substitute-case-insensitively-on-the-LHS-while-preserving-case-on-the-RHS%3f
Doing it in two-steps is probably a better/simpler idea...
Using the power of Google I found this (note that it is Perl 6 syntax):
The :samecase modifier, short :ii (since it's a variant of :i), preserves case.
my $x = 'Abcd';
$x ~~ s:ii/^../foo/;
say $x; # Foocd
$x = 'ABC';
$x ~~ s:ii/^../foo/;
say $x; # FOOC
This is very useful if you want to globally rename your module Foo, to Bar,
but for example in environment variables it is written as all uppercase.
With the :ii modifier the case is automatically preserved.

$line =~ s/JAM/$& eq 'jam' ? 'aaa' : 'AAA'/gie;
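
For Perl 5, a slightly more general sketch is to compute the replacement in a helper called from /e. The name match_case is hypothetical, and this version deliberately recognises only three shapes (all upper case, capitalised, everything else lower case), so mixed case such as jAm is not preserved.

```perl
use strict;
use warnings;

# Hypothetical helper: give $new the overall case shape of $old.
# ALL CAPS stays all caps, Capitalised stays capitalised, and anything
# else (including mixed case like "jAm") falls back to lower case.
sub match_case {
    my ($old, $new) = @_;
    return uc $new         if $old eq uc $old;
    return ucfirst lc $new if $old eq ucfirst lc $old;
    return lc $new;
}

my $line = 'JAM, jam and Jam';
$line =~ s/(jam)/match_case($1, 'aaa')/gie;
print "$line\n";    # AAA, aaa and Aaa
```

Adding more words only means extending the pattern, for example (jam|spam), since the helper never looks at the word itself.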

Match regex and assign results in single line of code

I want to be able to do a regex match on a variable and assign the results to the variable itself. What is the best way to do it?
I want to essentially combine lines 2 and 3 in a single line of code:
$variable = "some string";
$variable =~ /(find something).*/;
$variable = $1;
Is there a shorter/simpler way to do this? Am I missing something?

my($variable) = "some string" =~ /(e\s*str)/;
This works because
If the /g option is not used, m// in list context returns a list consisting of the subexpressions matched by the parentheses in the pattern, i.e., ($1, $2, $3 …).
and because my($variable) = ... (note the parentheses around the scalar) supplies list context to the match.
If the pattern fails to match, $variable gets the undefined value.

Why do you want it to be shorter? Does it really matter?
$variable = $1 if $variable =~ /(find something).*/;
If you are worried about the variable name or doing this repeatedly, wrap the thing in a subroutine and forget about it:
some_sub( $variable, qr/pattern/ );
sub some_sub { $_[0] = $1 if eval { $_[0] =~ m/$_[1]/ }; $1 };
However you implement it, the point of the subroutine is to make it reusable, so you give a particular set of lines a short name that stands in their place.
Several other answers mention a destructive substitution:
( my $new = $variable ) =~ s/pattern/replacement/;
I tend to keep the original data around, and Perl v5.14 has an /r flag that leaves the original alone and returns a new string with the replacement (instead of the count of replacements):
my $match = $variable =~ s/pattern/replacement/r;

Well, you could say
my $variable;
($variable) = ($variable = "find something soon") =~ /(find something).*/;
or
(my $variable = "find something soon") =~ s/^.*?(find something).*/$1/;

You can do substitution as:
$a = 'stackoverflow';
$a =~ s/(\w+)overflow/$1/;
$a is now "stack"

From Perl Cookbook 2nd ed
6.1 Copying and Substituting Simultaneously
$dst = $src;
$dst =~ s/this/that/;
becomes
($dst = $src) =~ s/this/that/;
I just assumed everyone did it this way, amazed that no one gave this answer.

Almost ....
You can combine the match and retrieve the matched value with a substitution.
$variable =~ s/.*(find something).*/$1/;
AFAIK, you will always have to copy the value though, unless you do not mind clobbering the original.

$variable2 = "stackoverflow";
(my $variable1) = ($variable2 =~ /stack(\w+)/);
$variable1 now equals "overflow".

I do this:
#!/usr/bin/perl
$target = "n: 123";
my ($target) = $target =~ /n:\s*(\d+)/g;
print $target; # the var $target now is "123"

Also, to amplify the accepted answer using the ternary operator to allow you to specify a default if there is no match:
my $match = $variable =~ /(*pattern*).*/ ? $1 : *defaultValue*;
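
The same idea can be sketched with a list-context match, which avoids reading $1 after a match that may have failed. The helper name first_capture and the sample inputs are made up for illustration, and the pattern is assumed to contain at least one capture group.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first capture of $re in $text,
# or $default when the pattern does not match.
sub first_capture {
    my ($text, $re, $default) = @_;
    my ($match) = $text =~ $re;    # list context: captures, or () on failure
    return defined $match ? $match : $default;
}

print first_capture('n: 123', qr/n:\s*(\d+)/, 'none'), "\n";    # 123
print first_capture('n: abc', qr/n:\s*(\d+)/, 'none'), "\n";    # none
```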
