regex pattern match and extraction

I am trying to write a Perl regex to extract the words longer than 2 letters after the colon (:). For example, if the input is subject:I am about to write a regex, I need $variable to contain only the words longer than 2 letters, i.e. $variable = "subject:about write regex".
Here is my program where the regex and pattern matching are done, but when I print, my variable is empty. What am I doing wrong?
#!/usr/bin/perl
while (<STDIN>) {
foreach my $query_part (split(/\s+/, $_)) {
my($query_part_subject) = $query_part =~ /([^\w\#\.]+)?((?:\w{3,}|[\$\#()+.])+)(?::(\w{3,}.+))?/ ;
print "query_part : $query_part_subject \n";
}
}
exit(0);

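Try this:
#!/usr/bin/perl
use strict; use warnings;
while (<DATA>) {
# strip everything up to and including the first colon
s/.*?://;
# split on whitespace and keep only the words longer than 2 characters
print join("\n", grep { length($_) > 2 } split), "\n";
}
__DATA__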
subject:a bb ccc dddd fffff
OUTPUT
ccc
dddd
fffff
NOTE
From my understanding of your question, this displays only the words longer than 2 characters that follow the : character.
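
If the result is needed in the form shown in the question (prefix kept, short words dropped), the same idea can be wrapped in a small helper. This is only a sketch: the name filter_words is made up for illustration, and it assumes that words are separated by whitespace and that only the first colon is significant.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the "prefix:" part and only those words
# after the first colon that are longer than two characters.
sub filter_words {
    my ($line) = @_;
    my ($prefix, $rest) = split /:/, $line, 2;   # split at the first colon only
    return $line unless defined $rest;           # no colon: return unchanged
    my @long = grep { length($_) > 2 } split ' ', $rest;
    return "$prefix:" . join(' ', @long);
}

print filter_words('subject:I am about to write a regex'), "\n";
# prints: subject:about write regex
```

Splitting with a limit of 2 avoids having to write a single regex that both finds the colon and filters the words, which is where the pattern in the question goes wrong.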

Your question isn't entirely clear. Is this what you are looking for?
$txt='I am about to create regex';
$x='(I)';
$y='.*?';
$z='(am)';
$re1=$x.$y.$z;
if ($txt =~ m/$re1/is)
{
$var1=$1;
$word1=$2;
print "($var1) ($word1) \n";
}

Related

How to capture string before the symbol

Morning all,
I want to capture the string before the colon,
then compare it with the values after the colon
and remove the values that are equal to the one before the colon.
For example:
aaa:aaa-bbb-ccc
Output:
aaa others:bbb,ccc
my code below
$string = "aaa:aaa-bbb-ccc";
$first =~ /(:.*\)/; //get aaa before the colon
$others =$string=~ s/$first//; remove the same values after colon
Can you please help me ?? Thank you..

This works:
use strict;
use warnings;
my $s='aaa:aaa-bbb-ccc';
if ($s=~/^([^:]+):/) {
my $first=$1;
$s=~s/^\Q$first\E://; # remove the leading "aaa:" (\Q...\E quotes any metacharacters)
print "$first others:" . join ',', grep { $_ ne $first } split '-', $s;
}
# aaa others:bbb,ccc

Counting occurrences of a word in a string in Perl

I am trying to find out the number of occurrences of "The/the". Below is the code I tried:
print ("Enter the String.\n");
$inputline = <STDIN>;
chop($inputline);
$regex="\[Tt\]he";
if($inputline ne "")
{
@splitarr= split(/$regex/,$inputline);
}
$scalar=@splitarr;
print $scalar;
The string is:
Hello the how are you the wanna work on the project but i the u the The
The output that it gives is 7. However, with the string:
Hello the how are you the wanna work on the project but i the u the
the output is 5. I suspect my regex. Can anyone help in pointing out what's wrong.

I get the correct number - 6 - for the first string.
However, your method is wrong: if you count the number of pieces you get by splitting on the regex pattern, the result depends on whether the word appears at the beginning or end of the string (split keeps a leading empty field but discards trailing empty ones). You should also put word boundaries \b into your regular expression to prevent it from matching something like theory.
Also, it is unnecessary to escape the square brackets, and you can use the /i modifier to do a case-independent match.
Try something like this instead:
use strict;
use warnings;
print 'Enter the String: ';
my $inputline = <>;
chomp $inputline;
my $regex = 'the';
if ( $inputline ne '' ) {
my @matches = $inputline =~ /\b$regex\b/gi;
print scalar @matches, " occurrences\n";
}

With split, you're counting the substrings between the the's. Use match instead:
#!/usr/bin/perl
use warnings;
use strict;
my $regex = qr/[Tt]he/;
for my $string ('Hello the how are you the wanna work on the project but i the u the The',
'Hello the how are you the wanna work on the project but i the u the',
'the theological cathedral'
) {
my $count = () = $string =~ /$regex/g;
print $count, "\n";
my @between = split /$regex/, $string;
print 0 + @between, "\n";
print join '|', @between;
print "\n";
}
Note that both methods return the same number for the two inputs you mentioned (and the first one returns 6, not 7).

The following snippet uses a code side-effect to increment a counter, followed by an always-failing match to keep searching. It produces the correct answer for matches that overlap (e.g. "aaaa" contains "aa" 3 times, not 2). The split-based answers don't get that right.
my $i;
my $string;
$i = 0;
$string = "aaaa";
$string =~ /aa(?{$i++})(?!)/;
print "'$string' contains /aa/ x $i (should be 3)\n";
$i = 0;
$string = "Hello the how are you the wanna work on the project but i the u the The";
$string =~ /[tT]he(?{$i++})(?!)/;
print "'$string' contains /[tT]he/ x $i (should be 6)\n";
$i = 0;
$string = "Hello the how are you the wanna work on the project but i the u the";
$string =~ /[tT]he(?{$i++})(?!)/;
print "'$string' contains /[tT]he/ x $i (should be 5)\n";
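
A different way to count overlapping matches, not taken from the answer above, is to wrap the pattern in a zero-width lookahead so that each match consumes no characters. The sketch below assumes the pattern contains no capture groups of its own (they would inflate the count), and the helper name count_overlapping is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: count matches of $pattern in $string, including
# overlapping ones, by matching a zero-width lookahead at each position.
sub count_overlapping {
    my ($string, $pattern) = @_;
    # Assigning to the empty list forces list context; the outer scalar
    # assignment then yields the number of matches.
    my $count = () = $string =~ /(?=$pattern)/g;
    return $count;
}

print count_overlapping('aaaa', 'aa'), "\n";                           # prints 3
print count_overlapping('Hello the how are you the', '[tT]he'), "\n";  # prints 2
```

Because the lookahead itself has zero width, the regex engine advances one character after every match, so every starting position is tried.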

What you need is the 'countof' idiom, = () =, which forces list context and then counts the number of matches:
my $string = "Hello the how are you the wanna work on the project but i the u the The";
my $count = () = $string =~/[Tt]he/g;
print $count;
If you want to count only the whole word the or The, add word boundaries:
my $string = "Hello the how are you the wanna work on the project but i the u the The";
my $count = () = $string =~/\b[Tt]he\b/g;
print $count;

How to grep the word between two symbols in Perl?

I need to grep the word between the symbols as shown below in an array.
my $string = "?hi how r u?what is your name?what is your age?";
It's to be converted to array where array should be like:
my $array[0]="hi how r u";
my $array[1]="what is your name";
my $array[2]="what is your age";

To ignore empty results you can match the input with regex and store matched results in an array:
use strict;
use warnings;
my $string = "?hi how r u?what is your name?what is your age?";
my @matches = ( $string =~ /(?<=\?)[^?]+/g );
foreach my $i (@matches) {
print $i . "\n";
}
Output:
hi how r u
what is your name
what is your age

You can use the split function; however, you have to escape the ? character so that it isn't treated as a regular expression metacharacter. Note that because the string starts with ?, the first element of the resulting array is an empty string.
my @array = split '\\?', $string;
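
When a string starts with the delimiter, split returns an empty first field. A minimal sketch of one way to drop it, assuming empty fields are never wanted, is to filter the result with grep:

```perl
use strict;
use warnings;

my $string = "?hi how r u?what is your name?what is your age?";

# Split on a literal "?" and discard empty fields (here, the leading one).
# Trailing empty fields are already removed by split itself.
my @array = grep { length($_) } split /\?/, $string;

print "$_\n" for @array;
```

This gives the same three elements as the lookbehind-based match, without needing a lookbehind.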

Matching words with exactly one vowel

I want to match only the strings that have exactly one vowel.
I tried this code, and it works but it also matches those strings that haven't any vowels (for example hshs, ksks, lslsl) and I need only the strings that have just one vowel
if ( $string !~ /\*w[aeiou]\w*[aeiou]\W*/ ) {
print $string;
}

You can use tr/// to count the occurrences of letters in a string.
Something like this perhaps
use strict;
use warnings;
for my $string ( qw/ a fare is paid for every cab /) {
if ( $string =~ tr/aeiuoAEIOU// == 1 ) {
print $string, "\n";
}
}
output
a
is
for
cab

Make it simple, at least one vowel:
if ($string =~ /[aeiou]/i) {
print $string;
}
exactly one vowel:
if ($string =~ /^[^aeiou]*[aeiou][^aeiou]*$/i) {
print $string;
}

pattern matching in regular expression (Perl)

Make a pattern that will match three consecutive copies of whatever is currently contained in $what. That is, if $what is fred, your pattern should match fredfredfred. If $what is fred|barney, your pattern should match fredfredbarney, barneyfredfred, barneybarneybarney, or many other variations. (Hint: You should set $what at the top of the pattern test program with a statement like my $what = 'fred|barney';)
But my solution to this is just too easy, so I'm assuming it's wrong. My solution is:
#! usr/bin/perl
use warnings;
use strict;
while (<>){
chomp;
if (/fred|barney/ig) {
print "pattern found! \n";
}
}
It displays what I want, and I didn't even have to save the pattern in a variable. Can someone help me through this, or enlighten me if I'm misunderstanding the problem?

This example should clear up what was wrong with your solution:
my @tests = qw(xxxfooxx oofoobar bar bax rrrbarrrrr);
my $str = 'foo|bar';
for my $test (@tests) {
my $match = $test =~ /$str/ig ? 'match' : 'not match';
print "$test did $match\n";
}
OUTPUT
xxxfooxx did match
oofoobar did match
bar did match
bax did not match
rrrbarrrrr did match
SOLUTION
#!/usr/bin/perl
use warnings;
use strict;
# notice the example has the `|`. Meaning
# match "fred" or "barney" 3 times.
my $str = 'fred|barney';
my @tests = qw(fred fredfredfred barney barneybarneybarny barneyfredbarney);
for my $test (@tests) {
if( $test =~ /^($str){3}$/ ) {
print "$test matched!\n";
} else {
print "$test did not match!\n";
}
}
OUTPUT
$ ./test.pl
fred did not match!
fredfredfred matched!
barney did not match!
barneybarneybarny did not match!
barneyfredbarney matched!

use strict;
use warnings;
my $s="barney/fred";
my @ra=split("/", $s);
my $test="barneybarneyfred"; #etc, this will work on all permutations
if ($test =~ /^(?:$ra[0]|$ra[1]){3}$/)
{
print "Valid\n";
}
else
{
print "Invalid\n";
}
split breaks the string apart on "/". (?:$ra[0]|$ra[1]) groups, without capturing, "barney" or "fred", and {3} requires exactly three copies. Add an i after the closing "/" if case doesn't matter. The ^ anchors the match at the start of the string and the $ anchors it at the end.
EDIT:
If you need the format to be barney\fred, use this:
my $s="barney\\fred";
my @ra=split(/\\/, $s);
If you know that the matching will always be on fred and barney, then you can just replace $ra[0], $ra[1] with fred and barney.
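
For the exercise as stated, with the alternatives held in $what, a sketch that interpolates the variable directly might look like the following. It is unanchored because the exercise only asks for three consecutive copies somewhere in the input, and the test words are made up for illustration.

```perl
use strict;
use warnings;

my $what = 'fred|barney';

# The non-capturing group keeps the whole alternation inside the repetition;
# without it, /fred|barney{3}/ would apply {3} only to the final "y".
my $re = qr/(?:$what){3}/;

for my $test (qw(fredfredbarney barneyfredfred fred fredbarney)) {
    printf "%-16s %s\n", $test, ($test =~ $re ? 'matches' : 'does not match');
}
```

Compiling the pattern once with qr// keeps the grouping in one place, so changing $what does not require touching the loop.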
