I am trying to extract a number from a string. The number can be pure digits, e.g. 12334, or digit groups separated by underscores, e.g. 12_345.
I tried the code below but was unable to get anything from it.
my $string = "this is a 141_153_923 number : $_123_456";
if ($string =~ /\b\d*(?:\d+\_?\d+)*\d*\b/) {
print "$&\n";
}
The expected output is 141_153_923.
I have also tried with the string 141_153_923 alone, and it still does not return anything, even with
$string =~ /\b\d\b/
on the string 141_153_923
I hope the variable $_123_456 is declared in your Perl code; otherwise you'll get a warning.
Now the regex. Try this one:
if ($string =~ /\b(\d+(?:_\d+)*)\b/) {
print "$1\n";
}
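As for why the original attempt printed nothing useful: every piece of /\b\d*(?:\d+\_?\d+)*\d*\b/ is optional, so the pattern succeeds with an empty match at the first word boundary, which is the very start of the string. /\b\d\b/ needs a single digit with non-word characters on both sides, and 141_153_923 never offers one because the underscore counts as a word character. Below is a minimal sketch of the corrected pattern. first_number is a hypothetical helper name, and the sample string is single-quoted so that $_123_456 is not interpolated.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first run of digits, optionally split
# into underscore-separated groups, or undef when there is none.
sub first_number {
    my ($text) = @_;
    return $text =~ /\b(\d+(?:_\d+)*)\b/ ? $1 : undef;
}

# Single quotes: $_123_456 stays literal text instead of being interpolated.
my $string = 'this is a 141_153_923 number : $_123_456';

print first_number($string), "\n";               # 141_153_923
print first_number('plain 12334 here'), "\n";    # 12334

# The original pattern does match, but only the empty string at offset 0.
if ($string =~ /\b\d*(?:\d+\_?\d+)*\d*\b/) {
    print "original pattern matched ", length($&), " characters\n";    # 0
}
```

The trailing \b is what keeps the match from ending on a stray underscore, which the looser /((?:\d+\_?)+)/ would allow.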
Try this regex: /((?:\d+\_?)+)/.
...
my $string = "this is a 141_153_923 number : \$_123_456";
my $num;
if (($num) = $string =~ /((?:\d+\_?)+)/) {
print "first: $num\n";
}
$string = "this is a 141153923 number : \$_123_456";
if (($num) = $string =~ /((?:\d+\_?)+)/) {
print "second: $num\n";
}
...
output:
first: 141_153_923
second: 141153923
I am trying to find out the number of occurrences of "The/the". Below is the code I tried:
print ("Enter the String.\n");
$inputline = <STDIN>;
chop($inputline);
$regex="\[Tt\]he";
if($inputline ne "")
{
@splitarr= split(/$regex/,$inputline);
}
$scalar=@splitarr;
print $scalar;
The string is:
Hello the how are you the wanna work on the project but i the u the The
The output that it gives is 7. However with the string :
Hello the how are you the wanna work on the project but i the u the
the output is 5. I suspect my regex. Can anyone help point out what's wrong?
I get the correct number, 6, for the first string.
However, your method is wrong: if you count the number of pieces you get by splitting on the regex pattern, the result depends on whether the word appears at the beginning of the string. You should also put word boundaries \b into your regular expression to prevent it from matching something like theory.
Also, it is unnecessary to escape the square brackets, and you can use the /i modifier to do a case-independent match.
Try something like this instead:
use strict;
use warnings;
print 'Enter the String: ';
my $inputline = <>;
chomp $inputline;
my $regex = 'the';
if ( $inputline ne '' ) {
my @matches = $inputline =~ /\b$regex\b/gi;
print scalar @matches, " occurrences\n";
}
With split, you're counting the substrings between the the's. Use match instead:
#!/usr/bin/perl
use warnings;
use strict;
my $regex = qr/[Tt]he/;
for my $string ('Hello the how are you the wanna work on the project but i the u the The',
'Hello the how are you the wanna work on the project but i the u the',
'the theological cathedral'
) {
my $count = () = $string =~ /$regex/g;
print $count, "\n";
my @between = split /$regex/, $string;
print 0 + @between, "\n";
print join '|', @between;
print "\n";
}
Note that both methods return the same number for the two inputs you mentioned (and the first one returns 6, not 7).
The following snippet uses a code side-effect to increment a counter, followed by an always-failing match to keep searching. It produces the correct answer for matches that overlap (e.g. "aaaa" contains "aa" 3 times, not 2). The split-based answers don't get that right.
my $i;
my $string;
$i = 0;
$string = "aaaa";
$string =~ /aa(?{$i++})(?!)/;
print "'$string' contains /aa/ x $i (should be 3)\n";
$i = 0;
$string = "Hello the how are you the wanna work on the project but i the u the The";
$string =~ /[tT]he(?{$i++})(?!)/;
print "'$string' contains /[tT]he/ x $i (should be 6)\n";
$i = 0;
$string = "Hello the how are you the wanna work on the project but i the u the";
$string =~ /[tT]he(?{$i++})(?!)/;
print "'$string' contains /[tT]he/ x $i (should be 5)\n";
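An alternative that also counts overlapping matches, without the (?{...}) code block, is to wrap the pattern in a zero-width lookahead and count the matches in list context. This is a sketch of a different technique from the snippet above, and count_overlapping is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: count every offset at which $re matches,
# including matches that overlap each other.
sub count_overlapping {
    my ($text, $re) = @_;
    # (?=...) consumes nothing, so the engine retries at every offset.
    my $count = () = $text =~ /(?=$re)/g;
    return $count;
}

print count_overlapping('aaaa', qr/aa/), "\n";                           # 3
print count_overlapping('Hello the how are you the', qr/[tT]he/), "\n";  # 2
```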
What you need is the so-called 'countof' operator, = () =, to count the number of matches:
my $string = "Hello the how are you the wanna work on the project but i the u the The";
my $count = () = $string =~/[Tt]he/g;
print $count;
If you want to count only the whole word the or The, add word boundaries:
my $string = "Hello the how are you the wanna work on the project but i the u the The";
my $count = () = $string =~/\b[Tt]he\b/g;
print $count;
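The = () = idiom works because a list assignment evaluated in scalar context returns the number of elements on its right-hand side, which here is the list of all /g matches. The sketch below wraps it in a hypothetical count_matches helper to show the effect of the word boundaries on a string containing "theological" and "cathedral".

```perl
use strict;
use warnings;

# Hypothetical helper: number of non-overlapping matches of $re in $text.
sub count_matches {
    my ($text, $re) = @_;
    # The empty list forces list context on the match; the outer scalar
    # assignment then receives the number of matches returned.
    my $count = () = $text =~ /$re/g;
    return $count;
}

my $words = 'the theological cathedral';
print count_matches($words, qr/[Tt]he/), "\n";       # 3
print count_matches($words, qr/\b[Tt]he\b/), "\n";   # 1
```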
I am trying to replace all words from a text except some that I have in an array. Here's my code:
my $text = "This is a text!And that's some-more text,text!";
while ($text =~ m/([\w']+)/g) {
next if $1 ~~ @ignore_words;
my $search = $1;
my $replace = uc $search;
$text =~ s/$search/$replace/e;
}
However, the program doesn't work. Basically, I am trying to make all words uppercase but skip the ones in @ignore_words. I know the problem is with the variables being used in the regular expression, but I can't figure it out.
#!/usr/bin/perl
my $text = "This is a text!And that's some-more text,text!";
my @ignorearr=qw(is some);
my %h1=map{$_ => 1}@ignorearr;
$text=~s/([\w']+)/($h1{$1})?$1:uc($1)/ge;
print $text;
Running this prints:
THIS is A TEXT!AND THAT'S some-MORE TEXT,TEXT!
You can fix the problem if, instead of modifying the same variable that controls the while loop, you let s/../../eg do the replacement globally for you:
my $text = "This is a text!And that's some-more text,text!";
my @ignore_words = qw{ is more };
$text =~ s/([\w']+)/$1 ~~ @ignore_words ? $1 : uc($1)/eg;
print $text;
And on running:
THIS is A TEXT!AND THAT'S SOME-more TEXT,TEXT!
DESCR: "10GE SR"
I need to match the part above, which is part of a longer string. I'm using a regex in Perl.
I tried:
if ($line =~ /DESCR: \"([a-zA-Z0-9)\"/) {
print "$1\n";
}
but I don't understand how to allow for spaces inside the string. The spaces can occur anywhere within the quotes. Can someone help me out?
$str = 'DESCR: "10GE SR"';
if ($str =~ /DESCR: \"([a-zA-Z0-9\s]+)\"/) {
print "$1\n";
}
Take a look: this pattern matches a double-quoted string, including backslash-escaped characters inside it:
if ($line =~ /DESCR: \"((?:[^\\"]|\\.)*)\"/) {
print "$1\n";
}
It may be simpler:
if ( $line =~ /DESCR: "([^"]+)"/ ) {
print "$1\n";
}
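To compare the approaches: the character class [a-zA-Z0-9\s] only accepts letters, digits and whitespace, while [^"] accepts anything up to the closing quote. A small sketch follows, in which descr_of and the sample lines are hypothetical and not taken from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: text between the quotes after "DESCR: ",
# or undef when the line has no such field.
sub descr_of {
    my ($line) = @_;
    return $line =~ /DESCR: "([^"]+)"/ ? $1 : undef;
}

print descr_of('NAME: "1", DESCR: "10GE SR"'), "\n";    # 10GE SR
print descr_of('DESCR: "10GE-SR x"'), "\n";             # 10GE-SR x

# The stricter class rejects the hyphen, so this prints "no match".
print 'DESCR: "10GE-SR x"' =~ /DESCR: "([a-zA-Z0-9\s]+)"/ ? "$1\n" : "no match\n";
```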
I have a strange problem in matching a pattern.
Consider the Perl code below
#!/usr/bin/perl -w
use strict;
my @Array = ("Hello|World","Good|Day");
function();
function();
function();
sub function
{
foreach my $pattern (@Array)
{
$pattern =~ /(\w+)\|(\w+)/g;
print $1."\n";
}
print "\n";
}
__END__
The output I expect should be
Hello
Good
Hello
Good
Hello
Good
But what I get is
Hello
Good
Use of uninitialized value $1 in concatenation (.) or string at D:\perlfiles\problem.pl line 28.
Use of uninitialized value $1 in concatenation (.) or string at D:\perlfiles\problem.pl line 28.
Hello
Good
What I observed was that the pattern matches only on alternate calls.
Can someone explain to me what the problem with this code is?
To fix this I changed the function subroutine to something like this:
sub function
{
my $string;
foreach my $pattern (@Array)
{
$string .= $pattern."\n";
}
while ($string =~ m/(\w+)\|(\w+)/g)
{
print $1."\n";
}
print "\n";
}
Now I get the output as expected.
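It is the global /g modifier that is at work. In scalar context, a /g match remembers the position where the last match on that string ended (see pos), and the next match starts from there. When a match fails at the end of the string, the position is reset and the following match starts over from the beginning. Because foreach aliases $pattern to the elements of @Array, that position survives between calls to function().
Remove the /g modifier, and it will act as you expect.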
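The match/fail/match alternation can be reproduced on a single string. This is a minimal sketch; repeated_g_match is a hypothetical helper that runs the same scalar-context /g match several times and records what happened each time.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the same scalar-context /g match $times
# times to one string and describe the outcome of each attempt.
sub repeated_g_match {
    my ($string, $times) = @_;
    my @results;
    for (1 .. $times) {
        if ($string =~ /(\w+)\|(\w+)/g) {
            # pos() is where the next /g match on $string would start.
            push @results, "match, pos is " . pos($string);
        }
        else {
            # The failed match has reset pos(), so the next try starts over.
            push @results, "no match";
        }
    }
    return @results;
}

print "$_\n" for repeated_g_match("Hello|World", 3);
# match, pos is 11
# no match
# match, pos is 11
```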
How can I count the number of spaces at the start of a string in Perl?
I now have:
$temp = rtrim($line[0]);
$count = ($temp =~ tr/^ //);
But that gives me the count of all spaces.
$str =~ /^(\s*)/;
my $count = length( $1 );
If you just want actual spaces (instead of whitespace), then that would be:
$str =~ /^( *)/;
Edit: The reason tr doesn't work is that it's not a regular expression operator. What $count = ( $temp =~ tr/^ // ); does is replace every ^ and every space with itself (see comment below by cjm) and return how many characters it replaced. tr doesn't see ^ as "hey, this is the beginning-of-string pseudo-character"; it sees it as "hey, this is a ^".
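The difference is easy to see side by side. In this sketch, leading_spaces is a hypothetical helper, and the test string is built with the x operator so that the number of leading spaces is unambiguous.

```perl
use strict;
use warnings;

# Hypothetical helper: number of literal spaces at the start of $str.
sub leading_spaces {
    my ($str) = @_;
    my ($lead) = $str =~ /^( *)/;    # always matches, possibly zero spaces
    return length $lead;
}

my $temp = (' ' x 2) . 'a ^ b';      # two leading spaces, then "a ^ b"

# tr treats "^" as an ordinary character: this counts every caret and
# every space anywhere in the string (4 spaces + 1 caret).
my $all = ($temp =~ tr/^ //);

print "tr counted $all, leading spaces: ", leading_spaces($temp), "\n";
# tr counted 5, leading spaces: 2
```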
You can get the offset of a match using @-. If you search for a non-whitespace character, this will be the number of whitespace characters at the start of the string:
#!/usr/bin/perl
use strict;
use warnings;
for my $s ("foo bar", " foo bar", " foo bar", " ") {
my $count = $s =~ /\S/ ? $-[0] : length $s;
print "'$s' has $count whitespace characters at its start\n";
}
Or, even better, use @+ to find the end of the whitespace:
#!/usr/bin/perl
use strict;
use warnings;
for my $s ("foo bar", " foo bar", " foo bar", " ") {
$s =~ /^\s*/;
print "$+[0] '$s'\n";
}
Here's a script that does this for every line of stdin. The relevant snippet of code is the first in the body of the loop.
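#!/usr/bin/perl
use strict;
use warnings;
while (my $x = <>) {
# ( *) also matches zero spaces, so the count is 0 rather than undef
my $s = length(($x =~ m/^( *)/)[0]);
print $s, ":", $x, "\n";
}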
tr/// is not a regex operator. However, you can use s///:
use strict; use warnings;
my $t = (my $s = " \t\n sdklsdjfkl");
my $n = 0;
++$n while $s =~ s{^\s}{};
print "$n \\s characters were removed from \$s\n";
$n = ( $t =~ s{^(\s*)}{} ) && length $1;
print "$n \\s characters were removed from \$t\n";
Since the regexp matcher returns the parenthesized captures when called in list context, CanSpice's answer can be written in a single statement:
$count = length( ($line[0] =~ /^( *)/)[0] );
This prints the amount of leading whitespace:
echo "   hello" | perl -lane 's/^(\s+)(.*)+$/length($1)/e; print'
3