How do I print what was replaced in a substitution operation?


I understand this is a very simple question, but I really couldn't find the answer by googling it. =(
I've got something like this:
$a =~ s/(\w*)/--word was here--/g;
And I want to put into a log file which words were replaced.
aa 123 bb 234 cc → --word was here-- 123 --word was here-- 234 --word was here--
And that's okay, but I also want to remember aa, bb and cc and write them to a log file. What should I do?
In fact I have a link remover script but I need to remember which links were removed. I tried to simplify my task for you but made it much harder to understand - sorry.

You can use the e modifier, which evaluates the right-hand side as an expression (log_it here stands for your own logging sub):
$a =~ s/(\w*)/log_it($1), ""/ge;
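A runnable sketch of the /e approach, applied to the sample input from the question. log_it is not a built-in: here it is a hypothetical helper that only records each word in an array (a real script would write to a log file instead). The sketch also assumes you want to keep the "--word was here--" replacement, and it uses [a-zA-Z]+ instead of \w* so that the digits are left alone and no empty matches get logged.

```perl
use strict;
use warnings;

my @replaced;    # words removed by the substitution

# Hypothetical logging helper: records the word; a real script
# might print it to a log filehandle instead.
sub log_it {
    my ($word) = @_;
    push @replaced, $word;
}

my $text = 'aa 123 bb 234 cc';

# /e evaluates the replacement as Perl code; the comma operator
# runs log_it($1) first and then yields the replacement string.
$text =~ s/([a-zA-Z]+)/log_it($1), '--word was here--'/ge;

print "$text\n";
print "replaced: @replaced\n";
```

Because the logging happens inside the replacement expression, each word is recorded at the moment it is replaced, so the log order matches the order in the string.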

You can do it in a loop instead of using /g:
my $string = "xxx ; yyy ; zzz";
my @replaced;
while ($string =~ s/(\w+)//) { push @replaced, $1 };
print join(",",@replaced);
# OUTPUT: xxx,yyy,zzz
Please note that \w matches WORD characters, not just letters: besides letters it also matches the digits 0-9 and the underscore.
If you only want to match letters, use the [[:alpha:]] class.

The following stores all captured words into an array:
use strict;
use warnings;
use Data::Dumper;
my $s = 'cat dog';
my @words;
while ($s =~ s/(\w+)//) {
push @words, $1;
}
print Dumper(\@words);
__END__
$VAR1 = [
'cat',
'dog'
];
Update: Now that you have added input and output data, it looks like you want to exclude numbers. In that case, you could use [a-zA-Z] instead of \w.
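A sketch that combines the update with the question's desired output. Instead of the destructive while loop, it first collects the letter-only words with a /g match in list context (which leaves the string untouched) and then runs the substitution on a copy. This is a swapped-in technique, not the answer's own, and [a-zA-Z] assumes plain ASCII input.

```perl
use strict;
use warnings;

my $s = 'aa 123 bb 234 cc';

# A /g match in list context returns every capture without modifying $s.
my @words = $s =~ /([a-zA-Z]+)/g;

# Copy $s into $result, then replace the same words; digits are untouched.
(my $result = $s) =~ s/[a-zA-Z]+/--word was here--/g;

print "removed: @words\n";
print "$result\n";
```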

Related

Perl - Match string between two colons

My string looks like this:
important stuff: some text 2: some text 3.
I want to print only "important stuff", i.e. everything up to the first colon. I'm sure this is simple, but my regex-fu is not so good.
Edit: Sorry I was doing something stupid and gave you a bad example line. It has been corrected.

Just restrict what you're matching to non-colons, [^:]*. Note that the ^ anchor and the trailing : aren't actually needed, but they help document the intent behind the regex.
my $text = "important stuff: some text 2: some text 3.";
if ($text =~ /^([^:]*):/) {
print $1;
}
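A small variation on the same regex, as a sketch: a successful match in list context returns its captures, so the text can be assigned straight to a variable, and the variable stays undef when there is no colon.

```perl
use strict;
use warnings;

my $text = 'important stuff: some text 2: some text 3.';

# List-context match: ($important) receives $1, or undef if the match fails.
my ($important) = $text =~ /^([^:]*):/;

print defined $important ? "$important\n" : "no colon found\n";
```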

Consider just splitting on the colon:
use strict;
use warnings;
my $string = 'important stuff: some text 2: some text 3.';
my $important = ( split /:/, $string )[0];
print $important;
Output:
important stuff

Well, assume it's a string:
my $test = "sass sg22gssg 22222 2222: important important :";
Assume you want all the characters between the colons.
Wrong answer: $test =~ /:(.+):/; # thank you for the change from .{1,}
Corrected:
$test =~ /:([^:]*):/;
print $1; # $1 holds the captured text; you can also copy it into a variable
my $found = $1;
For more, see a Perl regex cheat sheet.
I did test it.
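To see why /:(.+):/ is the wrong answer, here is a sketch with more than two colons (the sample string is made up for illustration). The greedy .+ runs to the last colon it can find, while [^:]* cannot cross a colon.

```perl
use strict;
use warnings;

my $test = 'first: second: third: fourth';

# Greedy .+ grabs as much as possible, so it ends at the LAST colon.
my ($greedy) = $test =~ /:(.+):/;

# [^:]* cannot match a colon, so it stops at the next one.
my ($between) = $test =~ /:([^:]*):/;

print "greedy:  '$greedy'\n";
print "between: '$between'\n";
```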

How to remove duplicate substrings from an undelimited string in perl?

I have an odd situation where I want to remove all but the first match of a substring inside of a very long undelimited string. I have found some similar topics here, but none quite like mine.
For simplicity's sake, here are some pseudo before-and-after strings.
I have an undelimited file where "c" could be thousands of random characters but "bbb" is a unique string:
aaabbbbbbccccccbbbccccccbbbccccccaaa
I want to remove all but the first bbb:
aaabbbccccccccccccccccccaaa
Also, I would like to be able to use this as a perl script I can pipe through:
cat file.in | something | perl -pe 's/bbb//g' | somethingelse > file.out
But, unlike my example above, I want to leave the first occurrence of "bbb" intact.
This seems like it should be fairly easy, but it is stumping me.
Any ideas?
Thanks in advance!

Perhaps the following will be helpful:
use strict;
use warnings;
my $string = 'aaabbbbbbccccccbbbccccccbbbccccccaaa';
$string =~ s/(?<=bbb).*?\Kbbb//g;
print $string;
Output:
aaabbbccccccccccccccccccaaa

use strict;
use warnings;
use feature 'say';
my $string = 'aaabbbbbbccccccbbbccccccbbbccccccaaa';
my $seen = 0;
sub first {
$seen++;
return $_[0] if $seen == 1; # keep the first match
return ''; # drop every later one
}
$string =~ s/(bbb)/first($1)/ge;
say $string;
Outputs:
aaabbbccccccccccccccccccaaa
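For the pipeline in the question, the same counter idea works line by line, because a package variable in a perl -pe one-liner keeps its value across input lines, e.g. perl -pe 's/bbb/$seen++ ? "" : $&/ge'. Below is a sketch of that logic as a reusable closure; make_first_only is a hypothetical name, and the input lines are given as a list in place of STDIN purely for the demo.

```perl
use strict;
use warnings;

# Returns a filter that keeps only the first occurrence of $needle,
# even when later occurrences arrive on different lines.
sub make_first_only {
    my ($needle) = @_;
    my $seen = 0;    # shared across calls through the closure
    return sub {
        my ($line) = @_;
        # \Q...\E escapes any regex metacharacters in $needle.
        $line =~ s/\Q$needle\E/$seen++ ? '' : $needle/ge;
        return $line;
    };
}

my $filter = make_first_only('bbb');
my @out = map { $filter->($_) } ('aaabbbbbbccc', 'cccbbbccc', 'bbbaaa');
print "$_\n" for @out;
```

Keeping the counter in a closure rather than a global means each call to make_first_only starts with a fresh count.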

perl regex match closest

I'm trying to match from the occurrence of an item closest to a final word.
For instance, the b closest to dog in
"abcbdog"
Should be "bdog"
But instead I'm getting "bcbdog"
How can I only match from the last occurrence "b" before "dog"
Here is my current regex:
/b.*?dog/si
Thank you!

Regexes match from left to right, but you want to go from right to left, so just reverse your string, reverse your pattern, and reverse the match:
my $search_this = 'abcbdog';
my $item_name = 'dog';
my $find_closest = 'b';
my $pattern = reverse($item_name)
. '.*?'
. reverse($find_closest);
my $reversed = reverse($search_this);
$reversed =~ /$pattern/si;
my $what_matched = reverse($&);
print "$what_matched\n";
# prints bdog

Try this:
/b[^b]*dog/si
Match b, then anything that isn't a b (including nothing), and then dog.
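[^b] works only because the start marker is a single character. As a sketch of the same idea for arbitrary start and end strings, a "tempered" token (?:(?!START).) matches one character as long as it does not begin START. closest is a hypothetical helper name, and the second sample string is made up.

```perl
use strict;
use warnings;

# Shortest span that starts with $start, ends with $end and contains
# no other $start in between. \Q...\E escapes regex metacharacters.
sub closest {
    my ($string, $start, $end) = @_;
    my ($found) = $string =~ /(\Q$start\E(?:(?!\Q$start\E).)*?\Q$end\E)/s;
    return $found;
}

print closest('abcbdog', 'b', 'dog'), "\n";
print closest('cat x cat y dog', 'cat', 'dog'), "\n";
```

The lazy *? makes the span end at the nearest $end, and the negative lookahead prevents it from starting before the last $start.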

TIMTOWTDI:
This method can even find multiple matches through the string, or may be optimized if the start or end words will be more common. Edit: Now uses zero-width matches to avoid removing then adding the start and end strings.
#!/usr/bin/env perl
use strict;
use warnings;
use v5.10; #say
my $string = 'abcbdog';
my $start = 'b';
my $end = 'dog';
my @found =
grep { s/(?<=$end).*// }
split( /(?=$start)/, $string );
say for @found;

If you don't already know which character comes right before dog, this just works (it captures exactly one character plus dog):
my $str = 'abcbdog';
my @r = $str =~ /(.dog)/;
print @r;
This prints bdog.

The accepted answer seems a little complicated. If you're just trying to match the closest 'b' to 'dog', including dog, you only need to make the part of the pattern before the term you're looking for greedy: a greedy .* runs as far right as it can, so the b left for the capture is the last one. (Use .* rather than .+ so that a string starting with b, such as 'bdog', still matches.) For example:
# First example
my $string1 = 'abcbdog';
if ( $string1 =~ /.*(b.*dog)/ ) {
print $1;
# prints 'bdog'
}
# Second example, different string, same regex.
my $string2 = 'abcbmoretextdog';
if ( $string2 =~ /.*(b.*dog)/ ) {
print $1;
# prints 'bmoretextdog'
}
Or am I missing something? If you want to change the captured string to match what you want, just shift the brackets.

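Try this code (the leading greedy .* pushes the capture to the last b before dog; the result is in $1):
$string =~ /.*(b.*?dog)/si;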

Regex to get just everything in CAPS

I am looking for a regex to get just the words in CAPS.
For example, I have an array storing file paths, which could follow either of these patterns:
images/p/n/ct/XYZ-WW_V1.jpg
images/p/c/ABC-TY_V2.jpg
So basically I want just "XYZ-WW" and "ABC-TY".
Any suggestions on what regex to use in my split code? I am using the following:
foreach (@filefound){
my @result = split('_',$_);
push @split1, $result[0];
}
This just splits at the _ and takes element [0], but now I want to get just the part that is in CAPS.
Any suggestions, please?

No reason to use split at all. Just grab the bits you want via a regular expression. From your example, it looks like you want everything which is made of capital ASCII letters and dashes:
my #bignames;
foreach (@filefound){
if ( /([A-Z-]+)/ ) {
push @bignames, $1;
}
}

I'm thinking this should work:
[A-Z]+-[A-Z]+

foreach (@filefound) {
# anchoring on / makes the capture start at the beginning of the file name
if ($_ =~ m{/([A-Z]+-[A-Z]+)_[A-Z]\d\..{3}$}) {
push @split1, $1;
}
}

You can try this:
if ($_ =~ /[\-A-Z]+/) {
push @split1, $&;
}
That will match any combination of uppercase letters and -. Or, if you want stricter control, this:
if ($_ =~ /\/([A-Z]{3}-[A-Z]{2})_/) {
push @split1, $1;
}
which matches exactly three uppercase letters, a -, and two uppercase letters, preceded by a / and followed by an _ (those two are not captured).
From these examples you can build the exact regex that you need.

Keep in mind that a match in list context will return captured strings.
#!/usr/bin/perl
use warnings; use strict;
use File::Basename qw(basename);
my @files = qw(
images/p/n/ct/XYZ-WW_V1.jpg
images/p/c/ABC-TY_V2.jpg
);
my @prefixes = map { (basename $_) =~ /^( [A-Z]+ - [A-Z]+ )/x } @files;
print "$_\n" for @prefixes;

What does $1 mean in Perl?

What does $1 mean in Perl? Further, what does $2 mean?
How many $number variables are there?

The $number variables contain the parts of the string that matched the capture groups ( ... ) in the pattern for your last regex match if the match was successful.
For example, take the following string:
$text = "the quick brown fox jumps over the lazy dog.";
After the statement
$text =~ m/ (b.+?) /;
$1 equals the text "brown".
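As for how many there are: there is one numbered variable per capture group in the pattern, numbered by the position of each group's opening parenthesis, left to right, so nested groups get consecutive numbers. A sketch (the date string is just sample data); $#+ reports how many capture groups the last successful pattern had.

```perl
use strict;
use warnings;

my $date = '2024-05-17';
my @parts;
my $group_count;

# Groups are numbered by their opening parenthesis:
#   $1 = the outer ((\d{4})-(\d{2})), $2 = (\d{4}), $3 = (\d{2}),
#   $4 = the final (\d{2})
if ($date =~ /^((\d{4})-(\d{2}))-(\d{2})$/) {
    @parts = ($1, $2, $3, $4);
    $group_count = $#+;    # number of capture groups in the last successful match
    print "1=$1 2=$2 3=$3 4=$4\n";
    print "groups: $group_count\n";
}
```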

The number variables are the matches from the last successful match or substitution operator you applied:
my $string = 'abcdefghi';
if ($string =~ /(abc)def(ghi)/) {
print "I found $1 and $2\n";
}
Always test that the match or substitution was successful before using $1 and so on. Otherwise, you might pick up the leftovers from another operation.
Perl regular expressions are documented in perlre.

$1, $2, etc. contain the values of the captures from the last successful match. It's important to check whether the match succeeded before accessing them, i.e.
if ( $var =~ m/( )/ ) {
# use $1 etc...
}
An example of the problem - $1 contains 'Quick' in both print statements below:
#!/usr/bin/perl
'Quick brown fox' =~ m{ ( quick ) }ix;
print "Found: $1\n";
'Lazy dog' =~ m{ ( quick ) }ix;
print "Found: $1\n";
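The fix for that problem, as a sketch: read $1 only inside the branch where the match succeeded.

```perl
use strict;
use warnings;

my @found;
for my $phrase ('Quick brown fox', 'Lazy dog') {
    # Only use $1 when this match actually succeeded.
    if ($phrase =~ m{ ( quick ) }ix) {
        push @found, "Found: $1";
    }
    else {
        push @found, "No match in '$phrase'";
    }
}
print "$_\n" for @found;
```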

As others have pointed out, $1, $2, ... are capture variables for regular expressions, allowing you to reference sections of a matched pattern.
Perl also supports named captures which might be easier for humans to remember in some cases.
Given input: 111 222
/(\d+)\s+(\d+)/
$1 is 111
$2 is 222
One could also say:
/(?<myvara>\d+)\s+(?<myvarb>\d+)/
$+{myvara} is 111
$+{myvarb} is 222
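A runnable sketch of the named-capture version (named groups need Perl 5.10 or later). %+ maps each group name to the text it captured, and named groups are still numbered, so $1 and $2 hold the same values.

```perl
use strict;
use warnings;

my $input = '111 222';
my ($first, $second);

if ($input =~ /(?<myvara>\d+)\s+(?<myvarb>\d+)/) {
    # %+ maps each named group to the text it captured.
    $first  = $+{myvara};
    $second = $+{myvarb};
    # Named groups are numbered as well.
    print "myvara=$first myvarb=$second (\$1=$1, \$2=$2)\n";
}
```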

These are called "match variables". As previously mentioned, they contain the text from your last successful regular expression match.
More information is in Essential Perl (see its 'Match Variables' section).

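Since you asked about capture groups, you might want to know about $+ too: it holds the text matched by the highest-numbered capture group that took part in the last successful match, which is useful when you don't know which of several alternatives matched.
use Data::Dumper;
my $text = "hiabc ihabc ads byexx eybxx";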
while ($text =~ /(hi|ih)abc|(bye|eyb)xx/igs)
{
print Dumper $+;
}
OUTPUT:
$VAR1 = 'hi';
$VAR1 = 'ih';
$VAR1 = 'bye';
$VAR1 = 'eyb';

The variables $1, $2, and so on are also read-only, so you can't explicitly assign a value to them:
$1 = 'foo'; print $1;
That will return an error: Modification of a read-only value attempted at script line 1.
You also can't use numbers for the beginning of variable names:
$1foo = 'foo'; print $1foo;
The above will also return an error.

I would suspect that there can be as many as 2**32 -1 numbered match variables, on a 32-bit compiled Perl binary.
