Extract only pattern matched text - regex

I have written a basic program using a regular expression.
However the entire line is being returned instead of the matched part.
I want to extract the number only.
use strict;
use warnings;
my $line = "ABMA 1234";
$line =~ /(\s)(\d){4}/;
print $line; #prints *ABMA 1234*
Is my regular expression incorrect?

If you want to print 1234, move the {4} quantifier inside the capture group and print the second capture, $2:
use strict;
use warnings;
my $line = "ABMA 1234";
$line =~ /(\s)(\d{4})/;
print $2;
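To see why the position of the quantifier matters, here is a minimal sketch comparing the two patterns (the variable names $last_digit and $number are my own, not from the question). A quantified group such as (\d){4} is overwritten on each repetition, so it keeps only the last digit, whereas (\d{4}) captures all four.

```perl
use strict;
use warnings;

my $line = "ABMA 1234";

# Quantifier outside the group: the capture is overwritten on every
# repetition, so only the final digit survives.
my ($last_digit) = $line =~ /\s(\d){4}/;

# Quantifier inside the group: all four digits are captured together.
my ($number) = $line =~ /\s(\d{4})/;

print "$last_digit\n";   # 4
print "$number\n";       # 1234
```

Assigning the match to a parenthesised list also avoids relying on $1 or $2 after the fact.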

You can use a substitution to replace the whole match with just the captured digits. Your original regex never removes the leading text:
use strict;
use warnings;
my $line = "ABMA 1234";
$line =~ s/([A-Za-z]*)\s+(\d+)/$2/;
print $line; #prints only 1234
If you want to store the value in a new string instead:
(my $newstring = $line) =~ s/([A-Za-z]*)\s+(\d+)/$2/;
print $newstring; #prints only 1234
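As a possible alternative, assuming Perl 5.14 or later is available, the /r modifier returns the modified copy and leaves the original string untouched. This is only a sketch of the same idea, not part of the answer above:

```perl
use 5.014;
use warnings;

my $line = "ABMA 1234";

# /r returns the result of the substitution instead of modifying $line.
my $newstring = $line =~ s/[A-Za-z]*\s+(\d+)/$1/r;

print "$newstring\n";   # 1234
print "$line\n";        # ABMA 1234
```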
Just try this:

I don't know how you output the match in Perl, but with the regex below the full match is just the number. With your current regex, the full match also includes the leading space.
\b[\d]{4}

Related

Assigning a regex pattern match to a variable in perl

I want to match a regex pattern and assign the matched pattern to a variable.
I then want to print the value assigned to the variable.
My attempt
#!/usr/bin/perl -w
use strict;
my $variable = 'iaw75w8yu';
my $value =~ /w[0-9][0-9]w[0-9]/;
print $value;
Desired output
w75w8
Actual output
Use of uninitialized value $value in pattern match (m//) at ./temp.pl line 5.
Use of uninitialized value $value in print at ./temp.pl line 6.
This works too:
#!/usr/bin/env perl
use strict;
use warnings;
my $variable = 'iaw75w8yu';
my $value;
$value = $1 if $variable =~ /(w[0-9][0-9]\w[0-9])/;
print $value if defined ($value);
This seems closer to what you initially intended to try. The shortfall of this method is that $value may end up undefined when the match fails. That is important to keep in mind once you start having optional captures in an m// statement.
"This should be printing w78w8"
No, it shouldn't! Even if your code was correct, it would never print w78w8 because the source string you're trying to match against doesn't contain w78w8. In the examples below, I'm going to assume you meant to say w75w8.
The problem is that you're trying to match an undefined value, the result of which is undefined. Your code is printing warnings, which is exactly what it should be printing since you specified -w.
If you want to capture something, you need a capture group:
use strict;
use warnings;
use Data::Dumper;
'iaw75w8yu' =~ /(w\d\dw\d)/;
print Dumper($1);
Outputs:
$VAR1 = 'w75w8';
If you want to store the matches in a variable of your choosing, you need to evaluate the expression in list context:
my @matches = ('iaw75w8yu' =~ /(w\d\dw\d)/);
print Dumper(\@matches);
Outputs:
$VAR1 = [
'w75w8'
];
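When only a single capture is wanted, the same list-context idea can assign straight to a scalar by wrapping it in parentheses. A minimal sketch follows; the second string and the $missing variable are made up to show the failure case:

```perl
use strict;
use warnings;

my $variable = 'iaw75w8yu';

# The parentheses around $value force list context, so the match
# returns its captures instead of a true/false result.
my ($value) = $variable =~ /(w\d\dw\d)/;

# A failed match returns an empty list, leaving the variable undefined.
my ($missing) = 'no digits here' =~ /(w\d\dw\d)/;

print defined $value   ? "$value\n"   : "no match\n";   # w75w8
print defined $missing ? "$missing\n" : "no match\n";   # no match
```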
I usually do it like this
#!/usr/bin/perl -w
use strict;
# the variable
my $variable = 'iaw75w8yu';
# the value
my $value;
# regex for the variable
if ( $variable =~ /(w[0-9][0-9]\w[0-9])/ ) {
$value = $1;
}
else {
$value = "regex not found";
}
print $value;

Need a regex for the following expression?

#! /usr/bin/perl
$str = "ab_cde,efg_gh,drg_fgt,main_xx,sum(abc),avg(def)";
or
$str = "ab_cde,bc_bn,gy_ihf,efg_gh,drg_fgt,main_xx,sum(abc),avg(def)";
The part of the string before main_xx is dynamic: there can be any number of elements of the form xx_xx, xxx_xx, xx_xxx, or xxx_xxxx, with any number of characters before and after the underscore. I want to match the string up to and including main_xx, because main_xx will always be the last element I need, and I want to ignore everything after main_xx. Please help me create a regex for this.
#!/usr/bin/perl -w
use strict;
my $str = "ab_cde,efg_gh,drg_fgt,main_xx,sum(abc),avg(def)";
(my $result) = ($str =~ m/(.*main_xx)/);
print $result;
The output will be everything up to main_xx (given xx is just the string made of x's).
Try this
my $str = "ab_cde,efg_gh,drg_fgt,main_xx,sum(abc),avg(def)";
my ($match)= $str =~m/(.+main_xx)/;
print $match;
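Both answers rely on a greedy quantifier, which matches up to the last main_xx in the string. That makes no difference for the sample input, but the sketch below uses a made-up string in which main_xx occurs twice to show how a non-greedy, anchored pattern stops at the first occurrence instead:

```perl
use strict;
use warnings;

# Hypothetical input in which the marker "main_xx" appears twice.
my $str = "a_b,main_xx,c_d,main_xx,sum(abc)";

my ($greedy) = $str =~ /(.*main_xx)/;     # runs to the LAST main_xx
my ($lazy)   = $str =~ /^(.*?main_xx)/;   # stops at the FIRST main_xx

# The matched prefix can then be split into its elements.
my @fields = split /,/, $lazy;

print "$greedy\n";            # a_b,main_xx,c_d,main_xx
print "$lazy\n";              # a_b,main_xx
print scalar(@fields), "\n";  # 2
```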

Perl replace delimiters

I have CSV text like
1,2,3,{4,5,6,7,8},9,10,100
I want to replace the delimiter of fields between {}. The text should look like:
1,2,3,{4|5|6|7|8},9,10,100
I tried perl -0777 -pe 's/\{.*?,\}/|/g'
but nothing happens. What should I do instead?
This will do as you ask. It replaces all commas that are followed by a sequence of characters that are not braces { }, and then a closing brace
use strict;
use warnings;
use 5.010;
my $s = '1,2,3,{4,5,6,7,8},9,10,100';
$s =~ s/,(?=[^{}]*\})/|/g;
say $s;
output
1,2,3,{4|5|6|7|8},9,10,100
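The lookahead approach should also cope with more than one braced group on a line, as long as the braces are not nested. A small sketch with a made-up input:

```perl
use strict;
use warnings;

# Hypothetical input containing two separate braced groups.
my $s = '1,{2,3},4,{5,6,7},8';

# A comma is replaced only if a closing brace follows it
# with no other brace in between.
(my $t = $s) =~ s/,(?=[^{}]*\})/|/g;

print "$t\n";   # 1,{2|3},4,{5|6|7},8
```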
You can use the following regex with $1$2| replacement string:
(\{\s*|(?<!^)\G)(\d+),(?=[,0-9]*\})
Output:
1,2,3,{4|5|6|7|8},9,10,100
Sample code:
#!/usr/bin/perl
$txt = "1,2,3,{4,5,6,7,8},9,10,100";
$txt =~ s/(\{\s*|(?<!^)\G)(\d+),(?=[,0-9]*\})/$1$2|/g;
print $txt;
Here's a command line version for Perl 5.14 and greater.
perl -pe 's/([{][\d,]+[}])/$1 =~ s~,~|~gr/ge'
The /e modifier makes Perl evaluate the replacement as a Perl expression rather than as an ordinary interpolated string. The expression takes the value of the first capture ($1) and performs a substitution with return (/r) on it, which avoids the error you would get from trying to modify the read-only value $1.
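The same nested substitution can be written as a script. This is a sketch assuming Perl 5.14 or later (for /r); the inner substitution uses brace delimiters only so that it does not clash with the outer slashes:

```perl
use 5.014;
use warnings;

my $s = '1,2,3,{4,5,6,7,8},9,10,100';

# Outer s///ge: for each braced group, run the replacement as Perl code.
# Inner s{}{}gr: return a copy of $1 with every comma turned into a pipe.
(my $out = $s) =~ s/(\{[\d,]+\})/ $1 =~ s{,}{|}gr /ge;

say $out;   # 1,2,3,{4|5|6|7|8},9,10,100
```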
You can try this:
$st = "1,2,3,{4,5,6,7,8},9,10,100";
if ( $st=~/\{(.*)\}/ ) {
$tr = $1;
$tr =~ s/,/|/g;
$st =~ s/\{.*\}/{$tr}/;
print "$st \n"
}
Output:
1,2,3,{4|5|6|7|8},9,10,100

Perl regex return matches from substitution

I am trying to simultaneously remove and store (into an array) all matches of some regex in a string.
To return matches from a string into an array, you could use
my @matches = $string =~ /$pattern/g;
I would like to use a similar pattern for a substitution regex. Of course, one option is:
my @matches = $string =~ /$pattern/g;
$string =~ s/$pattern//g;
But is there really no way to do this without running the regex engine over the full string twice? Something like
my @matches = $string =~ s/$pattern//g
Except that this will only return the number of substitutions, regardless of list context. I would also take, as a consolation prize, a method to use qr// where I could simply turn the quoted regex into a substitution regex, but I don't know if that's possible either (and that wouldn't preclude searching the same string twice).
Perhaps the following will be helpful:
use warnings;
use strict;
my $string = 'I thistle thing am thinking this Thistle a changed thirsty string.';
my $pattern = '\b[Tt]hi\S+\b';
my @matches;
$string =~ s/($pattern)/push @matches, $1; ''/ge;
print "New string: $string; Removed: @matches\n";
Output:
New string: I am a changed string.; Removed: thistle thing thinking this Thistle thirsty
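A smaller sketch of the same technique, using a made-up string and pattern, also shows what the question observed: the substitution itself still returns only the number of replacements, while the captures accumulate in the array as a side effect of the /e code.

```perl
use strict;
use warnings;

# Hypothetical input: remove every run of digits and remember each one.
my $string = 'a1b22c333';
my @matches;

# The replacement is Perl code: save the capture, then evaluate to ''
# so that the matched text is deleted.
my $count = $string =~ s/(\d+)/push @matches, $1; ''/ge;

print "$string\n";     # abc
print "@matches\n";    # 1 22 333
print "$count\n";      # 3
```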
Here is another way to do it without executing Perl code inside the substitution. The trick is that s/// without the g modifier removes one match per call and returns false when nothing matches, which ends the while loop.
use strict;
use warnings;
use Data::Dump;
my $string = "The example Kenosis came up with was way better than mine.";
my @matches;
push @matches, $1 while $string =~ s/(\b\w{4}\b)\s//;
dd @matches, $string;
__END__
(
"came",
"with",
"than",
"The example Kenosis up was way better mine.",
)

How do I substitute with an evaluated expression in Perl?

There's a file dummy.txt
The contents are:
9/0/2010
9/2/2010
10/11/2010
I have to change the month portion (0,2,11) to +1, ie, (1,3,12)
I wrote the substitution regex as follows
$line =~ s/\/(\d+)\//\/\1+1\//;
It is printing
9/0+1/2010
9/2+1/2010
10/11+1/2010
How do I make it add numerically (giving 3) rather than perform string concatenation (giving 2+1)?
Three changes:
1. You'll have to use the e modifier to allow an expression in the replacement part.
2. To make the replacement global, use the g modifier. This is not needed if you have one date per line.
3. Use $1 on the replacement side, not a backreference such as \1.
This should work:
$line =~ s{/(\d+)/}{'/'.($1+1).'/'}eg;
Also, if your regex contains the delimiter you're using (/ in your case), it's better to choose a different delimiter ({} above). That way you don't have to escape the delimiter, which keeps the regex clean.
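Applied to the three sample dates from the question, the idea can be sketched as follows. The @dates and @fixed arrays are my own scaffolding, and the second, lookaround-based variant is an assumption-labelled alternative that matches only the digits so the slashes need not be rebuilt:

```perl
use strict;
use warnings;

my @dates = ('9/0/2010', '9/2/2010', '10/11/2010');
my @fixed;

for my $line (@dates) {
    # /e evaluates the replacement as code, so $1 + 1 is real arithmetic.
    (my $copy = $line) =~ s{/(\d+)/}{'/' . ($1 + 1) . '/'}e;
    push @fixed, $copy;
}
print "$_\n" for @fixed;   # 9/1/2010, 9/3/2010, 10/12/2010

# Alternative sketch: lookarounds leave the slashes out of the match.
(my $alt = '9/2/2010') =~ s{(?<=/)(\d+)(?=/)}{$1 + 1}e;
print "$alt\n";            # 9/3/2010
```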
This works (the e modifier evaluates the replacement as an expression; see the perlrequick documentation):
$line = '8/10/2010';
$line =~ s!/(\d+)/!('/'.($1+1).'/')!e;
print $line;
It helps to use ! or some other character as the delimiter if your regular expression has / itself.
You can also use the interpolation trick from the question "Can Perl string interpolation perform any expression evaluation?":
$line = '8/10/2010';
$line =~ s!/(\d+)/!("/@{[$1+1]}/")!e;
print $line;
but if this is a homework question, be ready to explain when the teacher asks you how you reach this solution.
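For anyone who does need to explain it: the @{[ ... ]} construct builds an anonymous array from the expression and immediately dereferences it, and arrays are interpolated inside double-quoted strings. A minimal sketch outside of any substitution, with made-up variable names:

```perl
use strict;
use warnings;

my $month = 11;

# [ $month + 1 ] creates an anonymous array holding 12;
# @{ ... } dereferences it so the string can interpolate it.
my $text = "next month: @{[ $month + 1 ]}";

print "$text\n";   # next month: 12
```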
How about this?
$ cat date.txt
9/0/2010
9/2/2010
10/11/2010
$ perl chdate.pl
9/1/2010
9/3/2010
10/12/2010
$ cat chdate.pl
use strict;
use warnings;
open my $fp, '<', "date.txt" or die $!;
while (<$fp>) {
chomp;
my @arr = split (/\//, $_);
my $temp = $arr[1]+1;
print "$arr[0]/$temp/$arr[2]\n";
}
close $fp;
$