regular expression for matching a string - regex

I'm trying to remove part of a given string using either of these two rules:
Eliminate all the consonant(s) at the beginning of a string
Eliminate all but the consonants at the beginning of a string.
Suppose my string is str. Is ${str%%[aeoui]{1}*} correct for the second rule? I'm not sure what to do for the first rule.

I'm not sure what language you are trying to implement this in, so I'll just use some generic syntax.
1. s/^[^aeiouAEIOU]*(.*)/\1/
2. s/^[aeiouAEIOU]*(.*)/\1/
There are ways to make it case insensitive, but I like being specific like this just for clarity.
The only difference between the two is ^ inside the [] in #1 which just negates it.
* means zero or more. If you use +, for instance, there would have to be at least one consonant in #1 and at least one vowel in #2 or the test would fail.
In my generic syntax here \1 returns what was found by (.*).
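Here's some very crude Perl to demonstrate (where $1 in the print statements behaves as \1 in my example above). Note that "Test 1" in the script uses pattern 2 (leading vowels) and "Test 2" uses pattern 1 (leading consonants):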
#!/usr/bin/perl
$string1="abcdef";
$string2="fedcba";
if ($string1 =~ /^[aeiouAEIOU]*(.*)/) {
    print "Test 1 on $string1: $1\n";
}
if ($string2 =~ /^[aeiouAEIOU]*(.*)/) {
    print "Test 1 on $string2: $1\n";
}
if ($string1 =~ /^[^aeiouAEIOU]*(.*)/) {
    print "Test 2 on $string1: $1\n";
}
if ($string2 =~ /^[^aeiouAEIOU]*(.*)/) {
    print "Test 2 on $string2: $1\n";
}
And here's the output:
Test 1 on abcdef: bcdef
Test 1 on fedcba: fedcba
Test 2 on abcdef: abcdef
Test 2 on fedcba: edcba
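If the goal is to get the shortened string back rather than print a capture, the same two patterns can be written as substitutions. This is only a minimal sketch: the function names are made up for illustration, and [^aeiouAEIOU] treats every non-vowel (digits and punctuation included) as a "consonant".

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of the original answer.
sub strip_leading_consonants {
    my ($s) = @_;
    $s =~ s/^[^aeiouAEIOU]*//;   # drop everything before the first vowel
    return $s;
}

sub strip_leading_vowels {
    my ($s) = @_;
    $s =~ s/^[aeiouAEIOU]*//;    # drop everything before the first non-vowel
    return $s;
}

print strip_leading_consonants("fedcba"), "\n";   # edcba
print strip_leading_vowels("abcdef"), "\n";       # bcdef
```

Because * allows zero repetitions, a string that does not start with the targeted class is returned unchanged.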

Related

How to verify if a variable value contains a character and ends with a number using Perl

I am trying to check whether a variable contains the character "C" and ends with a number, in the minor version. I have:
my $str1 = "1.0.99.10C9";
my $str2 = "1.0.99.10C10";
my $str3 = "1.0.999.101C9";
my $str4 = "1.0.995.511";
my $str5 = "1.0.995.AC";
I would like a regex that prints a message if the variable has a C in the 4th field and ends with a number, so for str1, str2 and str3 it should print "matches". I am trying the regexes below, but none of them is working. Can you help me correct them?
my $str1 = "1.0.99.10C9";
if ( $str1 =~ /\D+\d+$/ ) {
    print "Candy match1\n";
}
if ( $str1 =~ /\D+C\d+$/ ) {
    print "Candy match2\n";
}
if ($str1 =~ /\D+"C"+\d+$/) {
    print "candy match3";
}
if ($str1 =~ /\D+[Cc]+\d+$/) {
    print "candy match4";
}
if ($str1 =~ /\D+\\C\d+$/) {
    print "candy match5";
}
if ($str1 =~ /C[^.]*\d$/)
C matches the letter C.
[^.]* matches any number of characters that aren't a literal dot. This ensures that the match won't span multiple fields of the version number; it can only match within the last field.
\d matches a digit.
$ matches the end of the string. So the digit has to be at the end.
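To see the pattern against all five sample strings from the question, a small loop like the following can help. It is a sketch only; the helper name is invented here, and the pattern is exactly the one explained above.

```perl
use strict;
use warnings;

# Hypothetical helper name; returns 1 when the last field contains a C
# and the string ends in a digit, otherwise 0.
sub has_c_then_trailing_digit {
    my ($version) = @_;
    return $version =~ /C[^.]*\d$/ ? 1 : 0;
}

for my $v (qw(1.0.99.10C9 1.0.99.10C10 1.0.999.101C9 1.0.995.511 1.0.995.AC)) {
    printf "%-15s %s\n", $v, has_c_then_trailing_digit($v) ? "matches" : "no match";
}
```

The first three strings are reported as matches and the last two are not. If a trailing newline must not be tolerated, \z is stricter than $.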
I found it really helpful to use https://www.regextester.com/109925 to test and analyse my regex strings.
Let me know if this regex works for you:
((.*\.){3}(.*C\d{1}))
Following your format, this regex assumes three dots with characters between them, and then checks that the rest of the string after the third dot contains a C followed by a digit.
EDIT:
If you want to make sure the string ends in a digit, and don't want to use it to check longer strings containing the formula, use:
^((.*\.){3}(.*C\d{1}))$
Let's look at what the regex should look like:
start{digit}.{digit}.{2-3 digits}.{2-3 digits}C{1-2 digits}end
very very strict qr/^1\.0\.9{2,3}\.101?C\d+\z/ - must start with 1.0.99[9]?.
very strict qr/^1\.0\.\d{2,3}\.\d{2,3}C\d{1,2}\z/ - must start with 1.0.
strict qr/^\d\.\d\.\d{2,3}\.\d{2,3}C\d{1,2}\z/
relaxed qr/^\d\.\d\.\d+\.\d+C\d+\z/
very relaxed qr/\.\d+C\d+\z/
use strict;
use warnings;
use feature 'say';

my @data = qw/1.0.99.10C9 1.0.99.10C10 1.0.999.101C9 1.0.995.511 1.0.995.AC/;

# Relaxed alternative, left commented out:
#my $re = qr/^\d\.\d\.\d+\.\d+C\d+\z/;
my $re = qr/^\d\.\d\.\d{2,3}\.\d{2,3}C\d+\z/;

say '--- Input Data ---';
say for @data;

say '--- Matching -----';
for (@data) {
    say 'match ' . $_ if /$re/;
}
Output
--- Input Data ---
1.0.99.10C9
1.0.99.10C10
1.0.999.101C9
1.0.995.511
1.0.995.AC
--- Matching -----
match 1.0.99.10C9
match 1.0.99.10C10
match 1.0.999.101C9

Same regex doesn't match twice

Trying to solve a problem in my perl script I finally could break it down to this situation:
my $content = 'test';
if($content =~ m/test/g) {
print "1\n";
}
if($content =~ m/test/g) {
print "2\n";
}
if($content =~ m/test/g) {
print "3\n";
}
Output:
1
3
My real case is just a bit different, but in the end it's the same thing: I'm confused why regex 2 isn't matching. Does anyone have an explanation for this? I realized that /g seems to be the reason, and of course it is not needed in my example. But (why) is this output normal behaviour?
This is exactly what /g in scalar context is supposed to do.
The first time it matches "test". The second match tries to start matching in the string after where the previous match left off, and fails. The third match then tries again from the beginning of the string (and succeeds) because the second match failed and you didn't also specify /c.
(/c keeps it from restarting at the beginning if a match fails; if your second match was /test/gc, the second and third match would both fail.)
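The position that /g remembers can be inspected with pos(). The following sketch (the function name and output format are invented for illustration) runs the same match three times and records the result and pos() after each attempt, once with /g and once with /gc.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "result:pos" for three consecutive
# scalar-context matches against the same string.
sub trace_matches {
    my ($content, $use_c) = @_;
    my @trace;
    for (1 .. 3) {
        my $ok = $use_c ? ($content =~ /test/gc) : ($content =~ /test/g);
        my $pos = defined(pos($content)) ? pos($content) : 'undef';
        push @trace, ($ok ? 1 : 0) . ":$pos";
    }
    return @trace;
}

print join(' ', trace_matches('test', 0)), "\n";   # 1:4 0:undef 1:4
print join(' ', trace_matches('test', 1)), "\n";   # 1:4 0:4 0:4
```

With plain /g the failed second match resets pos() to undef, so the third attempt starts over. With /gc the position stays at 4 and every later attempt fails.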
Generally speaking, if (/.../g) makes no sense and should be replaced with if (/.../) [1].
You wouldn't expect the following to match twice:
my $content = "test";
while ($content =~ /test/g) {
print(++$i, "\n");
}
So why would you expect the following to match twice:
my $content = "test";
if ($content =~ /test/g) {
print(++$i, "\n");
}
if ($content =~ /test/g) {
print(++$i, "\n");
}
They're the same!
Let's imagine $content contains testtest.
The 1st time $content =~ /test/g is evaluated in scalar context, it matches the first test.
The 2nd time $content =~ /test/g is evaluated in scalar context, it matches the second test.
The 3rd time $content =~ /test/g is evaluated in scalar context, it returns false to indicate there are no more matches. This also resets the position at which future matches against $content will start.
The 4th time $content =~ /test/g is evaluated in scalar context, it matches the first test.
...
[1] There are advanced uses for if (/\G.../gc), but that's different. if (/.../g) only makes sense if you're unrolling a while loop (e.g. while (1) { ...; last if !/.../g; ... }).
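As an illustration of the if (/\G.../gc) idiom mentioned above, here is a minimal tokenizer sketch. The token names and the tiny grammar are invented for the example; the point is only that \G anchors each attempt at the current pos() and /c keeps that position when an alternative fails.

```perl
use strict;
use warnings;

# Hypothetical tokenizer: tries several patterns at the current position.
sub tokenize {
    my ($input) = @_;
    my @tokens;
    pos($input) = 0;
    while (pos($input) < length($input)) {
        if    ($input =~ /\G\s+/gc)              { next }                      # skip whitespace
        elsif ($input =~ /\G(\d+)/gc)            { push @tokens, "NUM:$1" }
        elsif ($input =~ /\G([A-Za-z_]\w*)/gc)   { push @tokens, "ID:$1" }
        elsif ($input =~ /\G(.)/gcs)             { push @tokens, "OP:$1" }     # any other single character
    }
    return @tokens;
}

print join(' ', tokenize('x = 42+y')), "\n";   # ID:x OP:= NUM:42 OP:+ ID:y
```

Without /c, the first failing alternative would reset pos() and the later alternatives would start from the beginning of the string again.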

RegEx - Find everything that does not contain a pattern

I got this regex code:
((\w)(\w)\3\2)
It matches everything that contains something like anna, otto, xyyx ...
But I want to match everything that does NOT contain such a pattern.
How can I do that?
This issue has already been raised on this SO post. You should try this :
^((?!(\w)(\w)\3\2).)*$
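When the check happens in Perl code rather than inside a single pattern, negating the match with !~ is often simpler than a lookahead. The sketch below compares both approaches on a few sample strings; the function names are hypothetical, and the lookahead version uses a non-capturing outer group, so its backreferences are \1 and \2 rather than \2 and \3. It assumes single-line input, since . does not match a newline.

```perl
use strict;
use warnings;

# Hypothetical helper: true when no "xyyx" run occurs anywhere in $s.
sub lacks_abba_negated {
    my ($s) = @_;
    return $s !~ /(\w)(\w)\2\1/ ? 1 : 0;
}

# Same test as one pattern: at every position, assert that no "xyyx"
# starts there before consuming a character.
sub lacks_abba_lookahead {
    my ($s) = @_;
    return $s =~ /^(?:(?!(\w)(\w)\2\1).)*$/ ? 1 : 0;
}

for my $s (qw(anna otto xyyx hello xottoy abcd)) {
    printf "%-7s negated=%d lookahead=%d\n",
        $s, lacks_abba_negated($s), lacks_abba_lookahead($s);
}
```

Both functions return 0 for anna, otto, xyyx and xottoy, and 1 for hello and abcd.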
Initially I thought this kind of does what you are asking. But for the reasons raised by @WiktorStribiżew below, it does not work. In particular, test strings such as AAAB and ABBC are supposed to match the pattern below but do not:
^((\w)(\w)(?!\3)(?!\2))
My second thought is to use
^((\w)(\w)(?!\3\2))
And this does seem to work.
New test program. This generates all possible strings from AAAA to ZZZZ. Then a non-regexp check is used to decide whether each string should match or not. Finally, each string is checked for compliance against both the positive
$findsrepeats, ^((\w)(\w)(\3)(\2)) matches abba
and the negative
$repeatnomatch, ^((\w)(\w)(?!\3\2)) matches ab not followed by ba
use strict;
use warnings;

my @fourchar   = ('AAAA' .. 'ZZZZ');
my @norepeats  = ();
my @hasrepeats = ();

# Classify every four-letter string with a plain (non-regex) check.
for my $pattern ('AAAA' .. 'ZZZZ') {
    if (checkstring($pattern)) {
        push @hasrepeats, $pattern;
    } else {
        push @norepeats, $pattern;
    }
}
print scalar(@hasrepeats), " strings with repeated abba\n";
print scalar(@norepeats), " strings with ab[not b][not a]\n";

my $findsrepeats  = qr/^((\w)(\w)(\3)(\2))/;
my $repeatnomatch = qr/^((\w)(\w)(?!\3\2))/;

# Each regex must agree with the plain check; die on the first mismatch.
for my $example (@hasrepeats) {
    die $example if (not($example =~ $findsrepeats));
    die $example if ($example =~ $repeatnomatch);
}
for my $example (@norepeats) {
    die $example if (not($example =~ $repeatnomatch));
    die $example if ($example =~ $findsrepeats);
}
print "pass\n";

# True when the string has the form abba.
sub checkstring {
    my $s = shift;
    my @element = split(//, $s);
    return ($element[0] eq $element[3] &&
            $element[1] eq $element[2]);
}
Running the above Perl program should produce this output:
$ perl nr3.pl
676 strings with repeated abba
456300 strings with ab[not b][not a]
pass

Reversing a string in perl without using "reverse" function

I was looking for clues on how to reverse a string in Perl without using the builtin reverse function and came across the following piece of code for reversing $str.
print +($str =~ /./g)[-$_] for (1 .. $#{[$str =~ /./g]} + 1);
I was trying to understand how this works a bit more and expanded the above code to something like this.
for (1 .. $#{[$str =~ /./g]} + 1) {
$rev_str_1 = ($str =~ /./g)[-$_];
print $rev_str_1;
}
The above code snippet also works fine. But, the problem comes when I add any print inside the for loop to understand how the string manipulation is working.
for (1 .. $#{[$str =~ /./g]} + 1) {
$rev_str_1 = ($str =~ /./g)[-$_];
print "\nin loop now ";
print $rev_str_1;
}
For input string of stressed, following is the output for above code
in loop now d
in loop now e
in loop now s
in loop now s
in loop now e
in loop now r
in loop now t
in loop now s
It seems like the entire string reversal is happening in this part ($str =~ /./g)[-$_] but I am trying to understand why is it not working when I add an extra print. Appreciate any pointers.
You're assuming that the string is reversed before being printed, but the program just prints all the characters in the string one at a time in reverse order
Here's how it works
It's based around the expression $str =~ /./g which uses a global regex match with a pattern that matches any single character. In list context it returns all the characters in the string as a list. Note that a dot . without the /s pattern modifier doesn't match linefeed. That's a bug, but probably isn't critical in this situation
This expression
$#{ [ $str =~ /./g ] } + 1
creates an anonymous array of the characters in $str with [ $str =~ /./g ]. Then uses $# to get the index of the last element of the array, and adds 1 to get the total number of characters (because the index is zero-based). So the loop is executing with $_ in the range 1 to the number of characters in $str. This is unnecessarily obscure and should probably be written 1 .. length($str) except for the special case of linefeed characters mentioned above
The body of the loop uses ($str =~ /./g)[-$_], which splits $str into characters again in the same way as before, and then uses the fact that negative indexes in Perl refer to elements relative to the end of the array or list. So the last character in $str is at index -1, the second to last at index -2 and so on. Again, this is unnecessarily arcane; the expression is exactly equivalent to substr($str, -$_, 1), again with the exception that the regex version ignores linefeed characters
Printing the characters one at a time like this results in $str being printed in reverse
It may be easier to understand if the string is split into a real array, and the reversed string is accumulated into a buffer, like this
my $reverse = '';
my @str = $str =~ /./sg;
for ( 1 .. @str ) {
    $reverse .= $str[-$_];
}
print $reverse, "\n";
Or, using length and substr as described above, this is equivalent to
my $reverse = '';
$reverse .= substr($str, -$_, 1) for 1 .. length($str);
print $reverse, "\n";
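The linefeed caveat mentioned above is easy to check. In this sketch (function names invented for illustration), the first function splits with /./g as in the original one-liner, and the second uses substr; they differ only when the input contains a newline.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse via a regex split without /s, so "\n" is dropped.
sub reverse_regex {
    my ($str) = @_;
    my @chars = $str =~ /./g;
    my $out = '';
    $out .= $chars[-$_] for 1 .. @chars;
    return $out;
}

# Hypothetical helper: reverse via substr, which keeps every character.
sub reverse_substr {
    my ($str) = @_;
    my $out = '';
    $out .= substr($str, -$_, 1) for 1 .. length($str);
    return $out;
}

print reverse_regex("stressed"), "\n";    # desserts
print reverse_substr("stressed"), "\n";   # desserts
print length(reverse_regex("ab\ncd")), " vs ", length(reverse_substr("ab\ncd")), "\n";   # 4 vs 5
```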

Perl regex digit

Consider my regex in this code section:
use strict;
my @list = ("1", "2", "123");
&chk(@list);
sub chk {
    my @num = split (" ", "@_");
    foreach my $chk (@num) {
        chomp $chk;
        if ($chk =~ m/\d{1,2}?/) {
            print "$chk\n";
        }
    }
}
The \d{4} will print nothing. The \d{3} will print only 123. But if I change to \d{1,2}? it will print all. I thought, according to all the sources I have read so far, that {1,2} means: one digit but no more than two. So it should have printed only 1 and 2, correct?
What do I need in order to extract items that contain only one or two digits?
\d{1,2} succeeds if it finds 1 or 2 digits anywhere in the string provided. Additional string content does not cause the match to fail. If you want to match only when the string contains exactly 1 or 2 digits, do this: ^\d{1,2}$
You should anchor your regular expression for the desired effect. The built-in function grep suits better here since it is a selection from an array that is to be done:
#!/usr/bin/env perl
use strict;
use warnings;
my @list = ( 1, 2, 123 );
print join "\n", grep /^\d{1,2}$/, @list;
It appears to be working perfectly!
Here's a hint: Use the Perl variables $`, $&, and $'. These variables are special regular expression variables that show the part of the string before the match, what was matched, and the post matched string.
Here's a sample program:
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use Scalar::Util;
my @list = ("1", "2", "123");
foreach my $string (@list) {
    if ($string =~ /\d{1,2}?/) {
        say qq(We have a match for "$string"!);
        say qq("$`" "$&" "$'");
    }
    else {
        say "No match makes David Sad";
    }
}
The output will be:
We have a match for "1"!
"" "1" ""
We have a match for "2"!
"" "2" ""
We have a match for "123"!
"" "1" "23"
What this does is divide up the string into three sections: The section of the string before the regular expression match, the section of the string that matches the regular expression, and the section of the string after the regular expression match.
In each case, there was no pre-match because the regular expression matches from the start of the string. We also see that \d{1,2}? matches a single digit in each case even though 123 could have matched two digits. Why? Because the question mark on the end of the match specifier tells the regular expression not to be greedy. In this case, we tell the regular expression to match either one or two characters. Fine, it matches on one. Remove the question mark, and the last line would have looked like this:
We have a match for "123"!
"" "12" "3"
If you want to match on one or two digits, but not three or more digits, you'll have to specify the part of your string before and after the one or two digits. Something like this:
/\D\d{1,2}\D/
This would match your string foo12bar, but not foo123bar. But what if the string is 12? In that case, we want to say that either we have the beginning of the string, or a non-digit before our one or two character match, and we either have a non-digit or the end of the string at the end of our one or two character match:
/(\D|^)\d{1,2}(\D|$)/
A quick explanation:
(\D|^): A non-digit or the beginning of the string (The ^ anchor)
\d{1,2}: One or two digits
(\D|$): A non-digit or the end of the string (The $ anchor)
Now, this will match 12, but not 123, and it will match foo12 and foo12bar, but not foo123 or foo123bar.
Just looking for a one or two digit number, we can simply specify the anchors:
/^\d{1,2}$/;
Now, that will match 1, 12, but not foo12 or 123.
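The three variants discussed above can be compared side by side. This is a small sketch with an invented helper name and made-up labels ("unanchored", "bounded", "anchored") that reports which patterns match each sample string.

```perl
use strict;
use warnings;

# Hypothetical helper: lists which of the three patterns match $s.
sub which_patterns_match {
    my ($s) = @_;
    my @hits;
    push @hits, 'unanchored' if $s =~ /\d{1,2}/;
    push @hits, 'bounded'    if $s =~ /(\D|^)\d{1,2}(\D|$)/;
    push @hits, 'anchored'   if $s =~ /^\d{1,2}$/;
    return join ',', @hits;
}

for my $s (qw(1 12 123 foo12 foo12bar foo123)) {
    printf "%-9s %s\n", $s, which_patterns_match($s) || 'none';
}
```

Only 1 and 12 satisfy all three; foo12 and foo12bar satisfy the first two; 123 and foo123 satisfy only the unanchored pattern.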
The main thing is to use the $`, $&, and $' variables in order to help see exactly what your regular expression is matching on and what's before and after your match.
No, because while the regex only matches two digits, $chk still contains 123. If you want to only print the part that is matched, use
if ($chk =~ m/(\d{1,2})/) {
print "$1\n";
}
Note the parentheses and the $1. This causes it to print only that which is in the parentheses.
Also, this code doesn't make much sense:
sub chk {
    my @num = split (" ", "@_");
Because @_ already is an array, it makes no sense to make it into a string and then split it. Simply do:
sub chk {
    foreach my $chk (@_) {
You also do not need to use chomp for data that is not coming from user input, as it is intended to remove the trailing newline. There is no newline in any of this data.
#!/usr/bin/perl
use strict;
my @list = ("1", "2", "123");
&chk(\@list);
sub chk {
    foreach my $chk (@{$_[0]}) {
        print "$chk\n" if $chk =~ m/^\d{1,2}$/ ;
    }
}
#!/usr/bin/perl
use strict;
use warnings;
my @list = ("1", "2", "123");
&chk(@list);
sub chk {
    my @num = split (" ", "@_");
    foreach my $chk (@num) {
        chomp $chk;
        if ($chk =~ m/\d{1,2}/ && length($chk) <= 2) {
            print "$chk\n";
        }
    }
}