My question is basically as the title says. How can I check, in Perl, whether a user-supplied string
is exactly a certain number of characters long (for example, 3 characters long), and
contains only certain characters (for example, only the characters 'u', 'a', 'g', and 'c')?
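my $input = <>;
chomp $input;   # remove the trailing newline read from the input
if ($input =~ /^[uagc]{3}$/) {   # no /g needed for a simple yes/no test
# input is valid: your code here
} else {
# input is invalid: exit or die
}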
$s =~ /^[uagc]{3}\z/
or die("usage\n");
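The two answers above can be combined into a small runnable sketch. It assumes the input arrives with a trailing newline, and the helper name is_valid_codon is hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true only for exactly three characters drawn from u, a, g, c.
sub is_valid_codon {
    my ($s) = @_;
    return $s =~ /\A[uagc]{3}\z/ ? 1 : 0;
}

# Sample inputs standing in for lines read from <>.
for my $raw ("uag\n", "uagc\n", "uax\n") {
    my $input = $raw;
    chomp $input;    # drop the trailing newline before validating
    printf "%-5s %s\n", $input, is_valid_codon($input) ? 'ok' : 'rejected';
}
```

Note that $ also matches just before a final newline, so /^[uagc]{3}$/ would accept "uag\n" without the chomp, whereas \z only matches at the very end of the string.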
I have the following piece of code:
$url = "http://www.example.com/url.html";
$content=Encode::encode_utf8(get $url);
$nameaux = Encode::encode_utf8($DBfield);
if($content =~ />$nameaux<\/a><\/td><td class="class1">(.*?)<\/td>/ ||
$content =~ />$nameaux<\/a><\/td><td class="class2">(.*?)<\/td>/ ||
$content =~ />$nameaux<\/a><\/td><td class="class3">(.*?)<\/td>/ ) {
... more code ...
}
This piece of code works great except when $DBfield is a string containing a plus sign (e.g. A+1), even though that string exists in $content.
Could someone explain to me how to handle this?
If $nameaux can contain regex characters (like +), you need to escape the field to a regex literal by wrapping with \Q ... \E.
$content =~ />\Q$nameaux\E<\/a><\/td><td class="class1">(.*?)<\/td>/ ||
So + will be just a plus sign and not mean "one or more of", which is why your regex doesn't match.
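A minimal sketch of the difference, using a made-up HTML fragment in place of the fetched page. The sample value A+1 comes from the question; everything else here is assumed for illustration.

```perl
use strict;
use warnings;

my $nameaux = 'A+1';                                        # sample value from the question
my $content = '>A+1</a></td><td class="class1">42</td>';    # made-up HTML fragment

# Without quoting, "+" is a quantifier: the pattern means "one or more A, then 1".
my $unquoted = $content =~ />$nameaux<\/a>/ ? 1 : 0;

# With \Q...\E every regex metacharacter in the variable is matched literally.
my ($value) = $content =~ />\Q$nameaux\E<\/a><\/td><td class="class1">(.*?)<\/td>/;

print "unquoted match: $unquoted\n";
print "quoted capture: ", ( defined $value ? $value : 'none' ), "\n";

# quotemeta() is the function form of \Q...\E.
print "quotemeta: ", quotemeta($nameaux), "\n";
```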
I am trying to find the number of occurrences of "The/the". Below is the code I tried:
print ("Enter the String.\n");
$inputline = <STDIN>;
chop($inputline);
$regex="\[Tt\]he";
if($inputline ne "")
{
@splitarr= split(/$regex/,$inputline);
}
$scalar=@splitarr;
print $scalar;
The string is :
Hello the how are you the wanna work on the project but i the u the
The
The output that it gives is 7. However with the string :
Hello the how are you the wanna work on the project but i the u the
the output is 5. I suspect my regex. Can anyone help point out what's wrong?
I get the correct number, 6, for the first string.
However, your method is wrong: if you count the pieces you get by splitting on the regex pattern, the result differs depending on whether the word appears at the beginning or end of the string. You should also put word boundaries \b into your regular expression to prevent it from matching something like theory.
Also, it is unnecessary to escape the square brackets, and you can use the /i modifier to do a case-independent match.
Try something like this instead:
use strict;
use warnings;
print 'Enter the String: ';
my $inputline = <>;
chomp $inputline;
my $regex = 'the';
if ( $inputline ne '' ) {
my @matches = $inputline =~ /\b$regex\b/gi;
print scalar @matches, " occurrences\n";
}
With split, you're counting the substrings between the the's. Use match instead:
#!/usr/bin/perl
use warnings;
use strict;
my $regex = qr/[Tt]he/;
for my $string ('Hello the how are you the wanna work on the project but i the u the The',
'Hello the how are you the wanna work on the project but i the u the',
'the theological cathedral'
) {
my $count = () = $string =~ /$regex/g;
print $count, "\n";
my @between = split /$regex/, $string;
print 0 + @between, "\n";
print join '|', @between;
print "\n";
}
Note that both methods return the same number for the two inputs you mentioned (and the first one returns 6, not 7).
The following snippet uses a code side-effect to increment a counter, followed by an always-failing match to keep searching. It produces the correct answer for matches that overlap (e.g. "aaaa" contains "aa" 3 times, not 2). The split-based answers don't get that right.
my $i;
my $string;
$i = 0;
$string = "aaaa";
$string =~ /aa(?{$i++})(?!)/;
print "'$string' contains /aa/ x $i (should be 3)\n";
$i = 0;
$string = "Hello the how are you the wanna work on the project but i the u the The";
$string =~ /[tT]he(?{$i++})(?!)/;
print "'$string' contains /[tT]he/ x $i (should be 6)\n";
$i = 0;
$string = "Hello the how are you the wanna work on the project but i the u the";
$string =~ /[tT]he(?{$i++})(?!)/;
print "'$string' contains /[tT]he/ x $i (should be 5)\n";
What you need is the so-called 'countof' operator, = () =, to count the number of matches:
my $string = "Hello the how are you the wanna work on the project but i the u the The";
my $count = () = $string =~/[Tt]he/g;
print $count;
If you want to select only the word the or The, add word boundary:
my $string = "Hello the how are you the wanna work on the project but i the u the The";
my $count = () = $string =~/\b[Tt]he\b/g;
print $count;
I need to sort lowercase strings, read from a file, based on the vowels appearing in each string. I need to print the sorted list of strings on the command prompt.
That is, vowelA is the substring of vowels appearing in stringA, and vowelB is the corresponding substring of stringB. stringA appears before stringB in the output if vowelA sorts before vowelB in ascending ASCII order.
What I have currently:
#!/usr/bin/perl -w
use warnings;
use strict;
open my $INFILE, '<', $ARGV[0] or die $!;
while( my $line = <$INFILE> ) {
sub sort_vowels {
my $vowels_a = $a;
my $vowels_b = $b;
$vowels_a =~ s/[^aeiou]//g; # only vowels
$vowels_b =~ s/[^aeiou]//g;
return $vowels_a cmp $vowels_b; # compare the substrings
}
}
print sort { sort_vowels }; # print the sorted strings
close $INFILE;
Sample Input:
albacore
albatross
vermeil
panacea
apparate
parmesan
candelabra
fanfare
false
beans
This should output:
apparate
fanfare
panacea
albatross
albacore
false
parmesan
candelabra
beans
vermeil
The error I'm getting:
syntax error at sort_strings.pl line 22, near "};"
Execution of sort_strings.pl aborted due to compilation errors.
Not sure where I went wrong. Any help would be greatly appreciated!
Perhaps print sort { sort_vowels } <$INFILE>; is what you're looking for.
while and foreach loops allow you to work with a single element at a time, but sort requires an entire list as its input.
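As a sketch of how the pieces fit together once sort receives the whole list: the comparison routine is defined once at file level, and the sample words are taken from the question in place of reading a file. The name by_vowels is an assumption for this example.

```perl
use strict;
use warnings;

# Comparison routine for sort: compares only the vowels of $a and $b.
sub by_vowels {
    ( my $vowels_a = $a ) =~ s/[^aeiou]//g;
    ( my $vowels_b = $b ) =~ s/[^aeiou]//g;
    return $vowels_a cmp $vowels_b;
}

# Sample words from the question; a real script would read them from a file.
my @words = qw(albacore albatross vermeil panacea apparate
               parmesan candelabra fanfare false beans);

my @sorted = sort by_vowels @words;
print "$_\n" for @sorted;
```

When reading from a file instead, remember to chomp the lines first, or print them without adding another newline.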
Well, if you consider a vowels-only version of the string a key to the sort order of the words, then you can process each word like so:
push @{ $hash{ lc ( $word =~ s/[^aeiou]//igr ) } }, $word;
Starting with Perl 5.14 the /r flag returns the result. Same could be done this way, pre-5.14:
push @{ $hash{ lc( join( '', $word =~ m/([aeiou]+)/ig )) } }, $word;
Then outputting the order is only a matter of getting a sorted set of keys and dereffing the list of words stored within those keys:
say foreach map { @{ $hash{ $_ } } } sort keys %hash;
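Put together, the hash-based approach might look like the sketch below. The word list is sample data taken from the question, and the variable names are assumptions. It needs Perl 5.14 or later because of the /r flag.

```perl
use strict;
use warnings;
use feature 'say';    # say needs Perl 5.10+, the /r flag needs 5.14+

my @words = qw(albacore albatross vermeil panacea apparate);    # sample data
my %hash;

# Bucket each word under its vowels-only key.
push @{ $hash{ lc( $_ =~ s/[^aeiou]//igr ) } }, $_ for @words;

# Walk the keys in sorted order and flatten the buckets into one list.
my @ordered = map { @{ $hash{$_} } } sort keys %hash;
say for @ordered;
```

Words that share the same vowel key stay in their input order within a bucket.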
I am pretty new to regular expressions. I want to write a regular expression which validates whether the given string has only certain characters. If the string has any other characters than these it should not be matched.
The characters I want are:
& ' : , / - ( ) . # " ; A-Z a-z 0-9
Try this:
$val =~ m/^[&':,\/\-().#";A-Za-z0-9]+$/;
$val will match if it has at least one character and consists entirely of characters in that character set. An empty string will not be matched (if you want an empty string to match, change the last + to a *).
You can test it out yourself:
# Here's the file contents. $ARGV[0] is the first command-line parameter.
# We print out the matched text if we have a match, or nothing if we don't.
[/tmp]> cat regex.pl
$val = $ARGV[0];
print ($val =~ m/^[&':,\/\-().#";A-Za-z0-9]+$/g);
print "\n";
Some examples:
# Have to escape ( and & in the shell, since they have meaning.
[/tmp]> perl regex.pl a\(bc\&
a(bc&
[/tmp]> perl regex.pl abbb%c
[/tmp]> perl regex.pl abcx
abcx
[/tmp]> perl regex.pl 52
52
[/tmp]> perl regex.pl 5%2
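/\A[A-Za-z0-9&':,\/().#";-]+\z/
Those so-called special characters are not special in a character class. Only the / needs a backslash here, because it is also the regex delimiter, and the - is literal because it comes last.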
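One practical difference between the ^...$ and \A...\z forms shown in these answers is how they treat a trailing newline. The following sketch illustrates it; the variable names and sample strings are made up for the example.

```perl
use strict;
use warnings;

# The allowed character set from the question, compiled once.
my $class = qr/[A-Za-z0-9&':,\/().#";-]/;

my $clean   = 'a(bc&';
my $with_nl = "abc\n";     # e.g. a line read from a file without chomp
my $bad     = 'abbb%c';

# $ matches at the end of the string or just before a final newline.
my $dollar_accepts_nl = $with_nl =~ /^$class+$/ ? 1 : 0;

# \z matches only at the very end of the string.
my $z_accepts_nl = $with_nl =~ /\A$class+\z/ ? 1 : 0;

print "clean accepted:    ", ( $clean =~ /\A$class+\z/ ? 1 : 0 ), "\n";
print "bad accepted:      ", ( $bad   =~ /\A$class+\z/ ? 1 : 0 ), "\n";
print "\$ accepts newline: $dollar_accepts_nl\n";
print "\\z accepts newline: $z_accepts_nl\n";
```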
There are two main ways to construct a regular expression for this purpose. The first checks that every character is in the allowed set. The second checks that no character falls outside it. You can also use the transliteration operator instead. Here's a benchmark:
use Benchmark 'cmpthese';
my @chars = ('0' .. '9', 'A' .. 'Z', 'a' .. 'z');
my $randstr = join '', map $chars[rand @chars], 1 .. 16;   # join needed: map in scalar context returns a count
sub nextstr() { return $randstr++ }
cmpthese 1000000, {
regex1 => sub { nextstr =~ /\A["#&'(),\-.\/0-9:;A-Za-z]*\z/ },
regex2 => sub { nextstr !~ /[^"#&'(),\-.\/0-9:;A-Za-z]/ },
tr => sub { (my $dummy = nextstr) !~ y/"#&'(),\-.\/0-9:;A-Za-z/"#&'(),\-.\/0-9:;A-Za-z/c },
};
Results:
           Rate regex1 regex2     tr
regex1 137552/s     --   -41%   -60%
regex2 231481/s    68%     --   -32%
tr     341297/s   148%    47%     --
/^[&':,\/\-().#";A-Za-z0-9]*$/
I need to extract abbreviations such as ABS, TVS, and PERL from a file, that is, any abbreviations written in uppercase letters. I'd prefer to do this with a regular expression. Any help is appreciated.
It would have been nice to hear what part you were particularly having trouble with.
my %abbr;
open my $inputfh, '<', 'filename'
or die "open error: $!\n";
while ( my $line = readline($inputfh) ) {
while ( $line =~ /\b([A-Z]{2,})\b/g ) {
$abbr{$1}++;
}
}
for my $abbr ( sort keys %abbr ) {
print "Found $abbr $abbr{$abbr} time(s)\n";
}
Reading text to be searched from standard input and writing
all abbreviations found to standard output, separated by spaces:
my $text;
# Slurp all text
{ local $/ = undef; $text = <>; }
# Extract all sequences of 2 or more uppercase characters
my @abbrevs = $text =~ /\b([[:upper:]]{2,})\b/g;
# Output separated by spaces
print join(" ", @abbrevs), "\n";
Note the use of the POSIX character class [:upper:], which will match
all uppercase characters, not just English ones (A-Z).
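A small sketch of that difference, using a made-up sample string that contains three Greek capital letters written as code points. Characters above U+00FF make Perl apply Unicode rules to the match, which is an assumption this example relies on.

```perl
use strict;
use warnings;

# Sample text: Greek Alpha, Beta, Gamma as a stand-in "abbreviation", plus ABS.
my $text = "see \x{391}\x{392}\x{393} and ABS";

# A-Z only covers the 26 ASCII capitals.
my @ascii_only = $text =~ /\b([A-Z]{2,})\b/g;

# [:upper:] also covers non-ASCII uppercase letters in this string.
my @posix = $text =~ /\b([[:upper:]]{2,})\b/g;

binmode STDOUT, ':encoding(UTF-8)';    # avoid "Wide character" warnings when printing
print "A-Z found:       @ascii_only\n";
print "[:upper:] found: @posix\n";
```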
Untested:
my %abbr;
open (my $input, "<", "filename")
|| die "open: $!";
while ( <$input> ) {   # no spaces inside <...>, otherwise Perl treats it as a glob
while (s/([A-Z][A-Z]+)//) {
$abbr{$1}++;
}
}
Modified it to look for at least two consecutive capital letters.
#!/usr/bin/perl
use strict;
use warnings;
my %abbrs = ();
while(<>){
my @words = split ' ', $_;
foreach my $word(@words){
$word =~ /([A-Z]{2,})/ && $abbrs{$1}++;
}
}
# %abbrs now contains all abbreviations