Strings passed through quotemeta do not match a specific regex

For the following strings, which were passed through quotemeta, the if statements do not match the .cpp and .o file names. Am I doing anything wrong here?
E\:\\P4\\NTG5\\PATHOLOGY_products\\arm\-qnx\-m650\-4\.4\.2\-osz\-trc\-dbg\\gen\\deliveries\\ntg5\\arm\\api\\sys\\most\\pf\\mss\\src\\private\\DSIDSYSMOSTServerMoCCAStream\.cpp\
E\:\\P4\\NTG5\\PATHOLOGY_products\\arm\-qnx\-m650\-4\.4\.2\-osz\-trc\-dbg\\bin\\deliveries\\ntg5\\arm\\api\\sys\\most\\pf\\mss\\src\\DSIDSYSMOSTServerMoCCAStream\.o\
if ($a_path =~ m/[\\>](\w+\.(?:cpp|c))/) {
$compile_line_array->source_filename($a_path);
$compile_line_array->include_list_index($include_path_cnt);
$j=0;
last;
}
if($a_path =~ m/[\\>](\w+\.(?:o))/) {
$compile_line_array->object_file($a_path);
}

The regexes match one or more word characters followed by a literal .; if your strings have a backslash before every ., they will not match.
Somehow, you are not thinking about this correctly: quotemeta isn't something that is enabled or disabled; it is a function that puts a backslash before every ASCII non-word character in your string. Why are you using it in the first place?

Why do you run your filenames through quotemeta? As you've demonstrated, that backslash-escapes every . in them. If that's what you want to match against, you'll have to add some backslashes to your regex.
if ($a_path =~ m/[\\>](\w+\\\.(?:cpp|c))/) {
or
if($a_path =~ m/[\\>](\w+\\\.(?:o))/) {
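A minimal sketch of the difference, using a shortened, hypothetical path in the same shape as the question's. quotemeta turns every "." into "\.", so the original pattern fails and the adjusted pattern succeeds (with the backslash included in the capture). Matching against the unescaped path is usually simpler.

```perl
use strict;
use warnings;

# Hypothetical, shortened path in the same shape as the question's.
# Single quotes keep the backslashes literal.
my $raw    = 'E:\P4\src\private\DSIDSYSMOSTServerMoCCAStream.cpp';
my $quoted = quotemeta $raw;    # backslash before every non-word character

# Original pattern: expects a bare "." after the word characters, so it fails.
my ($original) = $quoted =~ m/[\\>](\w+\.(?:cpp|c))/;

# Adjusted pattern: expects "\." (backslash + dot), as produced by quotemeta.
my ($adjusted) = $quoted =~ m/[\\>](\w+\\\.(?:cpp|c))/;

# Usually simpler: match against the unescaped path instead.
my ($plain) = $raw =~ m/[\\>](\w+\.(?:cpp|c))$/;

print "quoted:   $quoted\n";
print "original: ", defined $original ? $original : '(no match)', "\n";
print "adjusted: $adjusted\n";
print "plain:    $plain\n";
```

Note that the adjusted capture still contains the escaping backslash (DSIDSYSMOSTServerMoCCAStream\.cpp), which is one more reason to match before calling quotemeta, if you can.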

Related

perl regex remove newlines in string

I have a Perl script that runs over a database dump in a plain text file, trying to remove all newlines, and possibly other odd characters, inside strings between quotes:
INSERT INTO ... VALUES ( "... these are the lines I'm interested in." )
I slurp in the file:
@file = <FILE>;
and:
foreach my $line (@file) {
$line =~ s/"[^"]*(\R)+[^"]*"//g;
# I want to get rid of newlines in strings
# And other odd characters I might come across
}
One character class I used instead of (\R) was:
([\r\n\t\v\f]+)
and I would try to:
$line =~ s/"[^"]+?([\r\n\t\v\f]+)[^"]*"//g;
I'm sure I'm missing something. I try to start matching with a literal double quote, scan past anything that is not a double quote (non-greedy, at least one character), reach the characters I want to get rid of, and keep scanning past characters that are not a double quote (any number of them) until I reach the closing double quote.
So I want to replace the $1 capture above with nothing.
I've tried online regex builders, and
/"[^"]*?([\r\n\t\f\v]+)[^"]*"/
worked in an online test, using a short paragraph with newlines and tabs in it, although it was in PHP PCRE mode. I thought it would also work in Perl.
Perhaps I'm not escaping some characters properly in the regex for Perl? Or is the pattern just not going to work the way I want it to, because it's wrong?
Thank you, any help appreciated.
The regex at regex101.com:
"[^"]*?([\r\n\f\t\v]+)[^"]*?"
matches for strings like this:
"This is
my\t test
string.
So there!"
I'm thoroughly puzzled now. :)
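The real problem is that you will only find one group of \R's when there could be many groups between quotes. Also, s/// replaces the whole match, not just $1, so your substitution deletes the entire quoted string. The best thing to do is use an evaluated replacement (the /e modifier) with a general match between quotes, then remove the \R's inside the replacement.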
something like:
sub repl {
my ($content) = @_;
$content =~ s/\R+//g;
return $content;
}
# Capture the quotes as well, so they survive the substitution.
$input =~ s/("[^"]*")/ repl($1) /ge;
edit: If you're looking for only one linebreak cluster, you have to exclude linebreaks leading up to it. For example: [^"\r\n]+
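edit2: To slurp the file into $input, undefine the input record separator $/ (with local, so the change doesn't leak into the rest of the program):
my $input = do { local $/; <$fh> };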
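Putting the pieces together, a minimal sketch that slurps the whole file and strips line breaks and the other characters from the question's class only inside double-quoted strings. The in-memory filehandle and sample data are stand-ins for the real dump file, strip_odd_chars is a hypothetical helper name, and it assumes quoted strings never contain an escaped double quote.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the dump file, read through an in-memory filehandle.
my $dump = qq{INSERT INTO t VALUES ( "This is\nmy\t test\nstring.\nSo there!" );\n}
         . qq{INSERT INTO t VALUES ( "single line" );\n};
open my $fh, '<', \$dump or die "Cannot open in-memory file: $!";

# Slurp: with $/ undefined, <$fh> returns the whole file as one string,
# so a quoted string that spans several lines is seen in one piece.
my $input = do { local $/; <$fh> };
close $fh;

# Hypothetical helper: delete every run of the question's "odd" characters.
sub strip_odd_chars {
    my ($content) = @_;
    $content =~ s/[\r\n\t\v\f]+//g;
    return $content;
}

# Match each whole quoted string (quotes included) and replace it with the
# cleaned version; /e evaluates the replacement as Perl code.
$input =~ s/("[^"]*")/strip_odd_chars($1)/ge;

print $input;
```

Deleting the characters outright joins neighbouring words ("This ismy ..."); substituting a single space inside strip_odd_chars may be closer to what you want.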

Regex Match "words" That Contain Periods perl

I am going through a tcpdump file (in .txt format) and trying to pull out any line that contains the "word" V.I.D.C.A.M. It is embedded among a bunch of periods and contains periods itself, which is really tripping me up. Here is an example:
E..)..#.#.8Q...obr.f...$[......TP..P<........SMBs......................................NTLMSSP.........0.........`b.m..........L.L.<...V.I.D.C.A.M.....V.I.D.C.A.M.....V.I.D.C.A.M.....V.I.D.C.A.M.....V.I.D.C.A.M..............W.i.n.d.o.w.s. .5...1...W.i.n.d.o.w.s. .2.0.0.0. .L.A.N. .M.a.n.a.g.e.r..
How do you handle something like that?
You need to escape the periods:
if ($string =~ m{V\.I\.D\.C\.A\.M\.}) {
...
}
Or, if the whole pattern is literal text, use \Q, which escapes any metacharacters that follow it:
if ($string =~ m{\QV.I.D.C.A.M.}) {
...
}
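A small sketch contrasting the escaped and unescaped patterns. The sample lines are hypothetical stand-ins for lines of the tcpdump text file; for a real file you would loop with while (my $line = <$fh>) and apply the same test to each line.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the tcpdump text file.
my @lines = (
    'E..)..#.#.8Q...L.L.<...V.I.D.C.A.M.....V.I.D.C.A.M.....',
    '..W.i.n.d.o.w.s. .2.0.0.0. .L.A.N. .M.a.n.a.g.e.r..',
    'VXIXDXCXAXMX',    # no periods, but fits an unescaped /V.I.D.C.A.M./
);

my $word = 'V.I.D.C.A.M.';

# \Q...\E escapes the periods, so each one matches only a literal ".".
my @hits = grep { /\Q$word\E/ } @lines;

# Unescaped, each "." means "any character", so the pattern over-matches.
my @loose = grep { /V.I.D.C.A.M./ } @lines;

print "matching line: $_\n" for @hits;
print scalar(@hits), " strict hit(s), ", scalar(@loose), " loose hit(s)\n";
```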

Search for a pattern in a string

I need to determine whether
+review CR @someuser
exists in a string. How can I do that?
You will need to escape the metacharacters and use the match operator m//.
if ($string =~ /\+review CR \@someuser/) {
# do something
}
Note that you cannot use the \Q ... \E escape sequence to escape the metacharacters here, since @someuser will still be interpolated as an array before \Q is applied. You could use it for the +, but then you would still need to escape the @, so this way is simpler. You can also use the quotemeta function; however, in this case that might be overkill.
Read more about this in perldoc perlop.
Use index:
my $search_string = "+review CR \@someuser";
if (index($string, $search_string) != -1) {
# found
}
Or, if you use a regex, you'll want to make sure the '+' and '@' are properly escaped:
if ( $string =~ m#\+review CR \@someuser# ) {
# found
}
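A minimal sketch comparing the three approaches, assuming the target text is +review CR @someuser (the @ is what makes interpolation an issue). The sample $string is hypothetical; building the needle in single quotes means @someuser is never interpolated, after which \Q...\E is safe to use.

```perl
use strict;
use warnings;

# Hypothetical input string.
my $string = 'please +review CR @someuser before merging';

# Single quotes: no interpolation, so "@someuser" stays literal text.
my $needle = '+review CR @someuser';

# 1. index: plain substring search, no metacharacters to worry about.
my $by_index = index($string, $needle) != -1;

# 2. \Q...\E around an already-built variable: $needle is interpolated
#    first, then quotemeta is applied to its value.
my $by_quoted = $string =~ /\Q$needle\E/;

# 3. Hand-escaped regex literal: "+" and "@" are backslashed.
my $by_literal = $string =~ /\+review CR \@someuser/;

print "index: $by_index, \\Q: $by_quoted, literal: $by_literal\n";
```

For a fixed substring like this, index is the most direct option; the regex forms only pay off if you later need anchors or alternatives.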

What regex should be used to avoid symbols before text in perl?

I have the following variable. I only want to print "yes" if the variable has "imoport/canada/campingplaces/tobermory" with no # (or anything else) in front of it. What should I put in the regex for this kind of thing?
my $textfile = "# imoport/canada/campingplaces/tobermory
imoport/canada/campingplaces/tobermory
#imoport/canada/campingplaces/tobermory";
my $textNeeded = "imoport/canada/campingplaces/tobermory";
This is what I am using:
if ($textfile =~ m/$textNeeded/i) {
print "yes working"
}
Note: I am getting data from different text files, so some text files might only have "#imoport/canada/campingplaces/tobermory". I want to avoid those.
Despite the quite vague problem description, I think I have puzzled out what you mean. You mean you may have lines where the text is commented out with #, and you want to avoid matching those.
print "yes" if $textfile =~ /^\s*$textNeeded/im;
This will match any line inside $textfile that starts with optional whitespace followed by your string. The /m option makes the regex multiline, meaning that ^ and $ also match at the start and end of each line inside a larger string, not only at the start and end of the whole string.
You may wish to be wary of regex metacharacters in your search string. If, for example, your search string is foo[bar].txt, the brackets will be interpreted as a character class (and the . as "any character"). In that case you would use
/^\s*\Q$textNeeded\E/im
instead. The \Q ... \E makes the interpolated text match only literally.
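A self-contained sketch of this answer, using the question's data with \Q...\E added. The second string, containing only commented-out lines, is an extra hypothetical case to show that the pattern rejects it.

```perl
use strict;
use warnings;

my $textfile = "# imoport/canada/campingplaces/tobermory\n"
             . "imoport/canada/campingplaces/tobermory\n"
             . "#imoport/canada/campingplaces/tobermory";
my $textNeeded = "imoport/canada/campingplaces/tobermory";

# /m:      ^ matches at the start of every line, not just the whole string.
# \s*:     allow leading spaces or tabs, nothing else (so "#..." is rejected).
# \Q...\E: treat any metacharacters in $textNeeded literally.
my $found = $textfile =~ /^\s*\Q$textNeeded\E/im;
print "yes working\n" if $found;

# Hypothetical input where every occurrence is commented out: no match.
my $commented = "# imoport/canada/campingplaces/tobermory\n"
              . "#imoport/canada/campingplaces/tobermory";
my $found_commented = $commented =~ /^\s*\Q$textNeeded\E/im;
print "no uncommented line\n" unless $found_commented;
```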
I think you need to create an anchor to say you want a match only if your target string appears at the BEGINNING of a line. This uses the caret symbol, together with the /m modifier so that ^ matches at the start of every line rather than only at the start of the whole string:
if ($textfile =~ m/^$textNeeded/im) {
print "yes working"
}
This won't report a match if you have spaces or tabs before your textNeeded string.
To simply return the rows having no leading hash, something like this:
my $textfile = "# imoport/canada/campingplaces/tobermory
imoport/canada/campingplaces/tobermory
#imoport/canada/campingplaces/tobermory";
for (split /^/, $textfile) {
print $_ if(m/^\s*[a-zA-Z].*/);
}
Prints:
imoport/canada/campingplaces/tobermory

Slashes and hashes in Perl and metacharacters

I asked earlier about escaping special characters and I understand the rules around the // and ## delimiters, yet the example below doesn't work. By my understanding, I need to escape the escape character. I want to match the literal \ between the names. I'm stumped. Please help; this has me in knots, despite probably appearing easy to most people. I know I could have written it as $userInfo =~ #\#;
#!C:\strawberry\perl\bin\perl.exe
#strict
#diagnostics
$userInfo = "firstname\middlename\lastname.";
if($userInfo =~ m/\\/){
print("Found it");
}
else{
print("No match found");
}
The problem is that you also have to escape the backslashes in your $userInfo assignment:
$userInfo = "firstname\\middlename\\lastname.";
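You are trying to search a string which you expect to contain the literal backslash character \. Double-quoted strings process backslash escapes, so \m becomes a plain m and \l lowercases the next character; your string contains no backslashes at all. Use single quotes instead.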
use warnings;
use strict;
my $userInfo = 'firstname\middlename\lastname.';
if ($userInfo =~ m/\\/){
print("Found it");
}
else{
print("No match found");
}
The warnings pragma would have warned you about this (Unrecognized escape \m passed through).
See also: Quote and Quote-like Operators
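To see why the original version never matches, a short sketch that builds the string the three ways discussed here and runs the same m/\\/ test on each. The no warnings 'misc' block only exists to silence the expected warning for the broken double-quoted version.

```perl
use strict;
use warnings;

# Double quotes process escapes: "\m" is unrecognized and becomes a plain "m"
# (with a warning, silenced here), and "\l" lowercases the next character.
my $double = do {
    no warnings 'misc';
    "firstname\middlename\lastname.";
};

# Single quotes keep both backslashes; so does doubling them in double quotes.
my $single  = 'firstname\middlename\lastname.';
my $escaped = "firstname\\middlename\\lastname.";

for my $str ($double, $single, $escaped) {
    # m/\\/ matches one literal backslash.
    print "$str: ", ($str =~ m/\\/ ? "Found it" : "No match found"), "\n";
}
```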
I agree with toolic: if you can, use single quotes.
Then you don't have to think about which backslash sequences double quotes will interpret.
However, if you really need double quotes, escape the backslashes like this:
#!C:\strawberry\perl\bin\perl.exe
#strict
#diagnostics
$userInfo = "firstname\\middlename\\lastname."; #please note escaped backslashes
if($userInfo =~ m/\\/)
{
print("Found it");
}
else
{
print("No match found");
}