I am using the following regex to check a name field for invalid characters...
if (!preg_match("/^[a-zA-Z ]*$/",$mystring))
Is there a way to also detect if the string is blank using regex? Or am I better off doing that using PHP?
You can just do a simple check without a regex:
if( $string == "") //do something
or
if(strlen($string) == 0) //do something
or
if(strlen(trim($string)) == 0) //do something
Or,
<?php
$str = "\r\n\t\0 ";
if (trim($str) == "") {
echo "This string is blank";
}
Trimming all whitespace characters from the string (including runs of one or more spaces) and then comparing the result to an empty string will detect a blank string. The advantage here is that you need only one function, namely trim().
One may certainly use trim() and strlen() together to achieve the same result, but that requires two functions instead of one.
Using strlen() without trimming the input $str could lead to accepting a "blank" line, as follows:
<?php
$content = 'Content: ';
$str = " \r\n\t\0 ";
if ( strlen( $str ) == 0) {
echo 'blank line',"\n";
}
else
{
$content .= $str;
}
echo $content;
Because the string is not trimmed of whitespace characters, its length in this case is six, so the check fails and the visibly blank $str gets appended to $content.
I have the following piece of code:
$url = "http://www.example.com/url.html";
$content=Encode::encode_utf8(get $url);
$nameaux = Encode::encode_utf8($DBfield);
if($content =~ />$nameaux<\/a><\/td><td class="class1">(.*?)<\/td>/ ||
$content =~ />$nameaux<\/a><\/td><td class="class2">(.*?)<\/td>/ ||
$content =~ />$nameaux<\/a><\/td><td class="class3">(.*?)<\/td>/ ) {
... more code ...
}
This piece of code works great except when $DBfield is a string containing a plus sign (e.g. A+1) that exists in $content.
Could someone explain to me how to handle this?
If $nameaux can contain regex characters (like +), you need to escape the field to a regex literal by wrapping with \Q ... \E.
$content =~ />\Q$nameaux\E<\/a><\/td><td class="class1">(.*?)<\/td>/ ||
So + will be just a plus sign and not mean "one or more of", which is why your regex doesn't match.
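To see the difference, here is a minimal sketch. The HTML fragment and the helper name cell_after are made up for illustration and are not part of the original code; quotemeta() is the function form of \Q ... \E.

```perl
use strict;
use warnings;

# Hypothetical page fragment; only the "A+1" value mirrors the question.
my $content = '<td><a href="x.html">A+1</a></td><td class="class1">42</td>';

# cell_after is an illustrative helper (an assumption, not the asker's code).
# It returns the text of the class1 cell that follows the link named $name.
sub cell_after {
    my ($html, $name, $quote) = @_;
    # quotemeta() escapes regex metacharacters, exactly like \Q...\E
    my $pattern = $quote ? quotemeta($name) : $name;
    return $html =~ m{>$pattern</a></td><td class="class1">(.*?)</td>} ? $1 : undef;
}

my $quoted   = cell_after($content, 'A+1', 1);
my $unquoted = cell_after($content, 'A+1', 0);
print 'quoted: ',   defined $quoted   ? $quoted   : 'no match', "\n";
print 'unquoted: ', defined $unquoted ? $unquoted : 'no match', "\n";
```

Without quoting, A+1 is read as "one or more A followed by 1", so the literal text A+1 in the page is not found.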
I am using a regex but am getting some odd, unexpected "matches". "Names" are sent to a subroutine to be compared to an array called @ASlist, which contains multiple rows. The first element of each row is also a name, followed by 0 to several synonyms. The goal is to match the incoming "name" to any row in @ASlist that has a matching cell.
Sample input, from which $names is derived for the comparison against @ASlist:
13 1 13 chr7 7 70606019 74345818 Otud7a Klf13 E030018B13Rik Trpm1 Mir211 Mtmr10 Fan1 Mphosph10 Mcee Apba2 Fam189a1 Ndnl2 Tjp1 Tarsl2 Tm2d3 1810008I18Rik Pcsk6 Snrpa1 H47 Chsy1 Lrrk1 Aldh1a3 Asb7 Lins Lass3 Adamts17
Sample lines from @ASlist:
HSPA5 BIP FLJ26106 GRP78 MIF2
NDUFA5 B13 CI-13KD-B DKFZp781K1356 FLJ12147 NUFM UQOR13
ACAN AGC1 AGCAN CSPG1 CSPGCP MSK16 SEDK
The code:
my ($name) = @_; ## this comes in from another loop elsewhere in code I did not include
chomp $name;
my @collectmatches = (); ## container to collect matches
foreach my $ASline ( @ASlist ){
    my @synonyms = split("\t", $ASline );
    for ( my $i = 0; $i < scalar @synonyms; $i++ ){
        chomp $synonyms[ $i ];
        #print "COMPARE $name TO $synonyms[ $i ]\n";
        if ( $name =~m/$synonyms[$i]/ ){
            print "\tname $name from block matches\n\t$synonyms[0]\n\tvia $synonyms[$i] from AS list\n";
            push ( @collectmatches, $synonyms[0], $synonyms[$i] );
        }
        else {
            # print "$name does not match $synonyms[$i]\n";
        }
    }
}
The script is working but also reports weird matches. For example, when $name is "E030018B13Rik" it matches "NDUFA5" when it occurs in @ASlist. These two should not be matched up.
If I change the regex from ~m/$synonyms[$i]/ to ~m/^$synonyms[$i]$/, the "weird" matches go away, BUT the script misses the vast majority of matches.
The NDUFA5 record contains B13 as a pattern, which will match E030018<B13>Rik.
If you want to be more literal, then add boundary conditions to your regular expression: /\b...\b/. You should probably also escape regular expression special characters using quotemeta (or \Q ... \E).
if ( $name =~ m/\b\Q$synonyms[$i]\E\b/ ) {
Or if you want to test straight equality, then just use eq
if ( $name eq $synonyms[$i] ) {
Another, more Perlish way to test for string equality is to use a hash.
You don't show any real test data, but this short Perl program builds a hash from your array @ASlist of lines of match strings. After that, most of the work is done.
The subsequent for loop tests just E030018B13Rik to see if it is one of the keys of the new %ASlist and prints an appropriate message.
use strict;
use warnings;
my @ASlist = (
    'HSPA5 BIP FLJ26106 GRP78 MIF2',
    'NDUFA5 B13 CI-13KD-B DKFZp781K1356 FLJ12147 NUFM UQOR13',
    'ACAN AGC1 AGCAN CSPG1 CSPGCP MSK16 SEDK',
);
my %ASlist = map { $_ => 1 } map /\S+/g, @ASlist;
for (qw/ E030018B13Rik /) {
    printf "%s %s\n", $_, $ASlist{$_} ? 'matches' : 'doesn\'t match';
}
output
E030018B13Rik doesn't match
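Since the original loop also collects the first name of the matching row, a variant of the hash idea can map every synonym to that primary name. This is only a sketch: it assumes tab-separated rows as in the question's split("\t", ...), and the names %primary_for and primary_name are invented for illustration.

```perl
use strict;
use warnings;

# Two sample rows from the question, tab-separated as in the original split().
my @ASlist = (
    "HSPA5\tBIP\tFLJ26106\tGRP78\tMIF2",
    "NDUFA5\tB13\tCI-13KD-B\tDKFZp781K1356\tFLJ12147\tNUFM\tUQOR13",
);

# Map every name and synonym to the primary name (first field) of its row.
# If a synonym appears in several rows, the first row wins (an assumption).
my %primary_for;
for my $line (@ASlist) {
    my @synonyms = split /\t/, $line;
    for my $synonym (@synonyms) {
        $primary_for{$synonym} = $synonyms[0]
            unless exists $primary_for{$synonym};
    }
}

# Hypothetical lookup helper: returns the primary name, or undef if unknown.
sub primary_name {
    my ($name) = @_;
    return $primary_for{$name};
}

for my $name (qw/ B13 GRP78 E030018B13Rik /) {
    my $primary = primary_name($name);
    printf "%s %s\n", $name,
        defined $primary ? "matches $primary" : "doesn't match";
}
```

Each lookup is then a single exact-match hash access instead of a nested loop over every synonym.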
Since you only need to compare two strings, you can simply use eq:
if ( $name eq $synonyms[$i] ){
You are using B13 as the regular expression. As none of the characters has a special meaning, any string containing the substring B13 matches the expression.
E030018B13Rik
       ^^^
If you want the expression to match the whole string, use anchors:
if ($name =~m/^$synonyms[$i]$/) {
Or, use index or eq to detect substrings (or identical strings, respectively), as your input doesn't seem to use any features of regular expressions.
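The three kinds of test mentioned in these answers can be compared side by side. This is a small sketch; compare_name is a made-up helper name, and the pair of strings is the one from the question.

```perl
use strict;
use warnings;

# compare_name is an illustrative helper (not from the original code).
# It reports how each kind of test treats one name/synonym pair.
sub compare_name {
    my ($name, $synonym) = @_;
    return (
        substring => ( index($name, $synonym) >= 0 ? 1 : 0 ),    # plain substring search
        bounded   => ( $name =~ /\b\Q$synonym\E\b/ ? 1 : 0 ),    # literal text between word boundaries
        exact     => ( $name eq $synonym           ? 1 : 0 ),    # whole-string equality
    );
}

my %result = compare_name('E030018B13Rik', 'B13');
print join(', ', map { "$_=$result{$_}" } sort keys %result), "\n";
```

Only the substring test reports a hit for this pair, which is the "weird" match from the question.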
I am trying to write a perl script that contains an if statement, and I want this if statement to check if a string is found via regex a certain number of times in a saved string. I would like to do this in a single line if possible, imagined like so:
$saved_string = "This abc is my abc test abc";
if( #something_to_denote_counting ($saved_string =~ /abc/) == 3)
{
print "abc was found in the saved string exactly 3 times";
}
else
{
print "abc wasn't found exactly 3 times";
}
...But I don't know what I need to do in that if statement to check for the number of times the regex matches. Can someone please tell me if this is possible? Thanks!
if ( 3 == ( () = $saved_string =~ /abc/g ) ) {
print "abc was found in the saved string exactly 3 times";
}
To get the count, you need to use /g in list context. So you could do:
@matches = $saved_string =~ /abc/g;
if ( @matches == 3 ) {
but Perl provides a little help to make it easier; a list assignment, placed in scalar context (such as is provided by ==), returns the count of elements on the right side of the assignment. This enables code like:
while ( my ($key, $value) = each %hash ) {
So you could do:
if ( 3 == ( @matches = $saved_string =~ /abc/g ) ) {
but using an array isn't even necessary; assigning into an empty list is sufficient (and has become an idiom wherever you need to execute code in list context but only get a count of results).
Save the matches to an anonymous array reference, dereference it using @{}, and compare the element count to a number:
if( @{[ $saved_string =~ /abc/g ]} == 3) {
print "abc was found in the saved string exactly 3 times";
}
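If the count is needed in more than one place, the empty-list idiom can be wrapped in a small sub. A minimal sketch; count_matches is a name invented here, and it assumes the pattern has no capture groups (with captures, /g in list context returns the captured pieces, so the number would differ).

```perl
use strict;
use warnings;

# count_matches is an illustrative wrapper around the empty-list idiom.
sub count_matches {
    my ($string, $regex) = @_;
    # The list assignment to () is evaluated in scalar context,
    # which yields the number of elements /g returned.
    my $count = () = $string =~ /$regex/g;
    return $count;
}

my $saved_string = "This abc is my abc test abc";
if ( count_matches($saved_string, qr/abc/) == 3 ) {
    print "abc was found in the saved string exactly 3 times\n";
}
```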
I'm not entirely sure why my if-statements are not validating user input. Here's my code.
The statements that contain regular expressions are supposed to allow leading, and trailing white space.
sub Menu
{
&processlist;
&creating_Refs;
print "[Sort by COLUMN|sortup|sortdown| quit]:";
my $user_input = <STDIN>;
chomp($user_input);
if($user_input =~ m/[quit\s]/)
{
exit;
}
elsif($user_input eq 'sortup')
{
print "working bro\n\n";
@$VAR1 = sort sortup @$VAR1;
foreach my $ref (@$VAR1)
{
print "$ref->{PID}, $ref->{USER}, $ref->{PR}, $ref->{NI}, $ref->{VIRT}, $ref->{RES}, $ref->{SHR}, $ref->{S}, $ref->{CPU}, $ref->{MEM}, $ref->{TIME}, $ref->{COMMAND} \n";
}
}
elsif($user_input eq 'sortdown \n')
{
print "working on sortdown\n\n";
}
elsif($user_input =~ m/[sort by]+\w/)
{
}
else
{
print "Error, please re-enter command \n\n";
&Menu;
}
}
A character class like [abcd] allows any one of the characters specified in the square brackets. When you say [sort by], it is equivalent to /s|o|r|t| |b|y/, which will match any one of those characters, only once. If you want to match sort by, use /sort by/.
And in your case:
if($user_input =~ m/quit/){
exit;
}
and to match exact words use word boundaries:
if($user_input =~ m/\bquit\b/){
exit;
}
Also, chomp removes the trailing \n, and inside single quotes \n is a literal backslash followed by n, not a newline. So:
elsif($user_input eq 'sortdown \n')
will never be true.
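Putting these points together, the input could be checked with anchored patterns that tolerate leading and trailing whitespace. This is only a sketch: parse_command is a hypothetical helper, and the assumption that the column name is a single word after "sort by" comes from the prompt text, not from the asker's code.

```perl
use strict;
use warnings;

# parse_command is an illustrative helper (not part of the original Menu sub).
# It returns a normalized command string, or undef for unrecognized input.
sub parse_command {
    my ($input) = @_;
    return 'quit'        if $input =~ /^\s*quit\s*$/;
    return 'sortup'      if $input =~ /^\s*sortup\s*$/;
    return 'sortdown'    if $input =~ /^\s*sortdown\s*$/;
    # Assumes the column is one word, e.g. "Sort by PID".
    return "sort by $1"  if $input =~ /^\s*sort\s+by\s+(\w+)\s*$/i;
    return;
}

for my $input ("  quit ", "sortdown\n", " Sort by PID ", "quite") {
    my $command = parse_command($input);
    print defined $command ? "$command\n" : "Error, please re-enter command\n";
}
```

Because each pattern is anchored with ^ and $, input such as "quite" is rejected instead of being treated as quit.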
I'm trying to come up with a regex that will remove all space chars from a string as long as they are not inside double quotes (").
Example string:
some string with "text in quotes"
Result:
somestringwith"text in quotes"
So far I've come up with something like this:
$str =~ /"[^"]+"|/g;
But it doesn't seem to be giving the intended result.
I'm honestly very new at perl and haven't had too much regexp experience. So if anyone willing to answer would also be willing to provide some insight into the why and how that would be great!
Thanks!
EDIT
String will not contain escaped "'s
It should actually always be formatted like this:
Some.String = "Some Value"
Result would be
Some.String="Some Value"
Here is a technique using split to separate the quoted strings. It relies on your data being consistent and will not work with loose quotes.
use strict;
use warnings;
my @line = split /("[^"]*")/;
for (@line) {
    unless (/^"/) {
        s/[ \t]+//g;
    }
}
print @line; # line is altered
Basically, you split up the string in order to isolate the quoted strings. Once that is done, perform the substitution on all other strings. Since the array elements are aliased in the loop, substitutions are performed on the actual array.
You can run this script like so:
perl -n script.pl inputfile
To see the output. Or
perl -n -i.bak script.pl inputfile
To do in-place edit on inputfile, while saving backup in inputfile.bak.
With that said, I'm not sure what your edit means. Do you want to change
Some.String = "Some Value"
to
Some.String="Some Value"
Text::ParseWords is tailor-made for this:
#!/usr/bin/env perl
use strict;
use warnings;
use Text::ParseWords;
my @strings = (
    q{This.string = "Hello World"},
    q{That " string " and "another shoutout to my bytes"},
);
for my $s ( @strings ) {
    my @words = quotewords '\s+', 1, $s;
    print join('', @words), "\n";
}
Output:
This.string="Hello World"
That" string "and"another shoutout to my bytes"
Using Text::ParseWords means if you ever had to deal with quoted strings with escaped quotation marks in them, you'd be ready ;-)
Also, this sounds like you have a configuration file of some sort and you're trying to parse it. If that is the case, there are probably better solutions.
I suggest removing the quoted substrings using split and then recombining them with join after removing whitespace from the intermediate text.
Note that if the regex used for split contains captures then the captured values will also be included in the list returned.
Here's some sample code.
use strict;
use warnings;
my $source = <<END;
Some.String = "Some Value";
Other.String = "Other Value";
Last.String = "Last Value";
END
print join '', map { s/\s+//g unless /"/; $_ } split /("[^"]*")/, $source;
output
Some.String="Some Value";Other.String="Other Value";Last.String="Last Value";
I would simply loop through the string char by char. This way you can handle escaped strings too (just add an isEscaped variable).
my $text = 'lala "some thing with quotes " lala ... ';
my $quoteOpen = 0;
my $out = '';
foreach my $char (split //, $text) {
    if ($char eq "\"" && $quoteOpen == 0) {
        # opening quote: keep it and remember we are inside quotes
        $quoteOpen = 1;
        $out .= $char;
    } elsif ($char eq "\"" && $quoteOpen == 1) {
        # closing quote
        $quoteOpen = 0;
        $out .= $char;
    } elsif ($char =~ /\s/ && $quoteOpen == 1) {
        # whitespace inside quotes is kept
        $out .= $char;
    } elsif ($char !~ /\s/) {
        # any non-whitespace character is kept
        $out .= $char;
    }
}
print "$out\n";
Splitting on double quotes, removing spaces only from every other field, starting with the first (i.e. the fields outside the quotes):
sub remove_spaces {
    my $string = shift;
    my @fields = split /"/, $string . ' '; # trailing space needed to keep final " in output
    my $flag = 1;
    return join '"', map { s/ +//g if $flag; $flag = ! $flag; $_} @fields;
}
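It can be done with a regex, as long as the quoted alternative is tried first and the unquoted alternative cannot swallow a quote character:
s/("[^"]*"|[^ "]*) */$1/g
Note that this won't handle any kind of escapes inside the quotes.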
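A related single-substitution form, offered only as a sketch: match either a whole quoted string (kept as is) or a run of whitespace (dropped). The sub name strip_unquoted_spaces is made up, and like the approaches above it assumes balanced quotes with no escaped quotes inside.

```perl
use strict;
use warnings;

# strip_unquoted_spaces is an illustrative helper name (an assumption).
sub strip_unquoted_spaces {
    my ($str) = @_;
    # Alternative 1 captures a complete quoted string and puts it back.
    # Alternative 2 matches whitespace outside quotes; $1 is then undefined,
    # so the match is replaced with nothing.
    $str =~ s/("[^"]*")|\s+/defined $1 ? $1 : ''/ge;
    return $str;
}

print strip_unquoted_spaces('some string with "text in quotes"'), "\n";
print strip_unquoted_spaces('Some.String = "Some Value"'), "\n";
```

The quoted alternative comes first so that, at a quote character, the whole quoted span is consumed before the whitespace alternative can see the spaces inside it.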