Regex match a specific filename format Perl - regex

I'm trying to match a filename format which is filename_nrows_ncols. I came up with (_[\d]+_[\d]+)$ and tested it in Rubular and it works there. http://www.rubular.com/r/W7DKNhmpMV
But when I try to assign the match to a variable in my Perl code, I get a "Use of uninitialized value..." error. What's wrong with my regex? Thanks in advance.
$match =~ /(_[\d]+_[\d]+)$/;

Without seeing your code, it's hard to say, but I'd imagine it should look something like this:
if ($filename =~ /(\d+_\d+)$/) {
# Do something
}
By the way the [] around [\d] isn't necessary in this case. If you had something other than the \d within it, it would be.
-- EDIT --
I think I see what's wrong. You want the results of the regex to go into $match. If that's the case, assuming your filename is in the default variable, then you probably want this:
my ($match) = /(\d+_\d+)$/;
or if it's in another variable
my ($match) = $filename =~ /(\d+_\d+)$/;
The error, by the way, only appears to be a warning from "use warnings" or -W. It's a good one, though.

You'll need to provide the entire line of code that's causing that error. The regular expression itself looks fine, although you may be better off with something like ^(.+)_(\d+)_(\d+)$ if you plan on doing anything with the filename, nrows, or ncols (which would then be stored in $1, $2, and $3 respectively).
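A minimal sketch of that three-capture suggestion, assuming the name always ends in _nrows_ncols. The helper name parse_grid_filename and the sample filenames are made up for illustration; they are not from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: split "name_nrows_ncols" into its three parts.
# Returns an empty list when the name does not match.
sub parse_grid_filename {
    my ($filename) = @_;
    # List assignment in boolean context is false when the match fails.
    my ($base, $nrows, $ncols) = $filename =~ /^(.+)_(\d+)_(\d+)$/
        or return;
    return ($base, $nrows, $ncols);
}

my ($base, $nrows, $ncols) = parse_grid_filename('matrix_10_20');
print "$base has $nrows rows and $ncols cols\n";
```

Assigning the match in list context avoids touching $1, $2, and $3 directly, so a failed match cannot leave stale or undefined values behind.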

Split using Regular Expression, with negative look behind and ahead to skip inside a block

Given a string, I need to split the string on forward slashes, but only if those forward slashes don't appear in a {} block.
I know this can be accomplished in a variety of other ways. At this point, I just want to see if this is possible and what the regex will look like. And if it is functional, it would likely speed up the program a little bit too. Win win. :)
Using perl in the following examples, though it may ultimately be implemented in another language.
This is essentially what I want to do:
#!/bin/perl -w
use strict;
use Data::Dumper;
my @strings = (
"/this/that",
"/this/",
"/this/{ID=/foo\/bar/}",
"/this/{ID=foo/bar}",
"/this/{/}",
"/this/{ID=/foobar/}/that/foo/",
"/this/{ID=f/o/o/b/a/r/}",
"/this/{ID=/foobar/}/that/{bar}/that"
);
foreach my $string (@strings) {
print $string."\n";
my @items = split(/(?<!{.*?)\/(?!.*?})/,$string);
print Dumper(\@items);
}
The problem is that you can't use a variable-length lookbehind.
So I've been experimenting with using only lookaheads to accomplish the same thing.
The closest I've been able to come is using this line for the split:
my @items = split(/\/(?![^{].*?}|})/,$string);
That almost gets it, but it doesn't split on a / that comes before a {} block, so I end up with results like this:
$VAR1 = [
'/this',
'{ID=/foobar/}/that',
'{bar}',
'that'
];
where it should be:
$VAR1 = [
'this',
'{ID=/foobar/}',
'that',
'{bar}',
'that'
];
Thanks in advance.
You can change your current regex to:
/(?![^{]*\})
It matches a / only if no } follows it without a { in between, that is, only if the slash is outside a {} block.
Splitting on that pattern gives a split at every such slash.
But I think it might be easier to match the pieces instead of splitting:
\{[^}]*\}|[^/]+
Now, the above assumes there's no nesting of braces within the strings.
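Both approaches can be tried side by side in Perl. This is only a sketch under the same no-nesting assumption; the helper names split_outside_braces and match_segments are invented here, and the empty leading field that split produces for a leading / is filtered out explicitly.

```perl
use strict;
use warnings;

# Hypothetical helper: split on "/" unless the slash sits inside a {} block.
sub split_outside_braces {
    my ($path) = @_;
    # The lookahead rejects a "/" that has a "}" ahead with no "{" in between.
    # grep drops the empty first field caused by a leading "/".
    return grep { $_ ne '' } split /\/(?![^{]*\})/, $path;
}

# Hypothetical helper: match whole {...} blocks or runs of non-slash characters.
sub match_segments {
    my ($path) = @_;
    return $path =~ /\{[^}]*\}|[^\/]+/g;
}

my $path = '/this/{ID=/foobar/}/that/{bar}/that';
print join(' | ', split_outside_braces($path)), "\n";
print join(' | ', match_segments($path)), "\n";
```

For this input both helpers return this, {ID=/foobar/}, that, {bar}, that. The match version needs no lookahead, but it treats a { that appears in the middle of a segment as ordinary text.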

QRegex look ahead/look behind

I have been pondering this for quite a while and still can't figure it out: regex lookaheads and lookbehinds. I'm not sure which to use in my situation, and I am still having trouble grasping the concept. Let me give you an example.
I have a name....My Business,LLC (Milwaukee,WI)_12345678_12345678
What I want to do is remove any commas in the name, no matter how many there are. At the same time, if there is no comma in the name, the line should still be read. The one-liner I have is listed below.
s/(.*?)(_)(\d+_)(\d+$)/$1$2$3$4/gi;
I want to remove any comma from $1 (My Business,LLC (Milwaukee,WI)). I could match the commas literally in the regex, as in (.*?),(.*?),(.*?)(_.*?$), if it were this EXACT situation every time, but it is not.
I want it to omit commas and match 'My Business, LLC_12345678_12345678' or just 'My Business_12345678_12345678', even though there is no comma.
In any situation I want it to match the line, comma or not, and remove any commas(if any) no matter how many or where.
If someone can help me understand this concept, it will be a breakthrough!!
Use Perl's /e modifier so that the replacement side of s/// is evaluated as code, which lets you call a function on the captured text:
$str = 'My Business,LLC (Milwaukee,WI)_12345678_12345678';
## modified your regex as well using lookahead
$str =~ s/(.*?)(?=_\d+_\d+$)/funct($1)/ge;
print $str;
sub funct{
my $val = shift;
## delete every comma; change this to whatever replacement you want
$val =~ s/,//g;
return $val;
}
With funct($1) in the replacement, each match is replaced by the return value of funct() called with $1 as its argument.
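The same idea can be written without a separate function by putting the comma removal directly in the /e replacement. This is a sketch, assuming every line of interest ends in _digits_digits; the helper name strip_commas_from_name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: delete commas from the name part only, leaving the
# trailing "_digits_digits" suffix untouched.
sub strip_commas_from_name {
    my ($line) = @_;
    # The lookahead finds where the suffix starts without consuming it.
    # With /e the replacement is code: copy $1, delete its commas, return it.
    $line =~ s{^(.*?)(?=_\d+_\d+$)}{ (my $name = $1) =~ tr/,//d; $name }e;
    return $line;
}

print strip_commas_from_name('My Business,LLC (Milwaukee,WI)_12345678_12345678'), "\n";
print strip_commas_from_name('My Business_12345678_12345678'), "\n";
```

A line with no comma still matches and comes back unchanged, which covers the "comma or not" requirement in the question.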

How to find value after specific/static string using regex(perl)?

I'm still learning regex, and have a long ways to go so would appreciate help from any of you with more regex experience. I'm working on a perl script to parse multiple log files, and parse for certain values. In this case, I'm trying to get a list of user names.
Here's what my log file looks like:
[date timestamp]UserName = Joe_Smith
[date timestamp]IP Address = 10.10.10.10
..
Just testing, I've been able to pull it out using \UserName\s\=\s\w+, however I just want the actual UserName value, and not include the 'UserName =' part. Ideally if I can get this to work, I should be able to apply the same logic for pulling out the IP Address etc, but just hoping to get list of Usernames for the moment.
Also, the usernames are always in the format above of Firstname_Lastname, so I believe \w+ should always get everything I need.
Appreciate any help!
You should capture the part of the matched string that you are interested in using parentheses in the regular expression.
If the match succeeds, then captures are available in the built-in variables $1, $2 etc, numbered in the order that their opening parenthesis appears in the regular expressions.
In this case you need only a single capture so you need look only at $1.
Beware that you should always check that a regex match succeeded before using the values in the capture variables, as they retain the values from the last successful match and a failed match doesn't reset them.
use strict;
use warnings;
my $str = '[date timestamp]UserName = Joe_Smith';
if ($str =~ /UserName = (\w+)/) {
print $1, "\n";
}
output
Joe_Smith
Another way to do it:
my ($username) = $str =~ /UserName\s\=\s(\w+)/
or warn "no username parsed from '$str'\n";
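Alternatively, write the regex as UserName\s=\s(\w+)$ (without a backslash before the U, since \U in a Perl pattern starts uppercase conversion rather than matching a literal U). After a successful match, the part in the parentheses is available in the variable $1. My Perl is a bit rusty, so if it doesn't work right, look at http://www.troubleshooters.com/codecorn/littperl/perlreg.htm#StringSelections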
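Since the question mentions reusing the same logic for the IP address, here is a sketch that captures both the key and the value from each line. It assumes every line of interest looks like "[...]Key = Value", as in the sample log; the helper name parse_log_fields is invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: collect "Key = Value" pairs that follow a "[...]" prefix.
# Returns a hash mapping each key to an array ref of the values seen.
sub parse_log_fields {
    my (@lines) = @_;
    my %fields;
    for my $line (@lines) {
        # Use the captures only inside the if, so a failed match is simply skipped.
        if ($line =~ /^\[[^\]]*\]\s*([^=]+?)\s*=\s*(.*?)\s*$/) {
            push @{ $fields{$1} }, $2;
        }
    }
    return %fields;
}

my %fields = parse_log_fields(
    '[date timestamp]UserName = Joe_Smith',
    '[date timestamp]IP Address = 10.10.10.10',
    '..',
);
print "users: @{ $fields{UserName} }\n";
```

Lines that do not fit the pattern, such as the ".." line, are ignored rather than reported.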

Perl pattern match variable question

I'm trying to open a file, match a particular line, and then wrap HTML tags around that line. Seems terribly simple but apparently I'm missing something and don't understand the Perl matched pattern variables correctly.
I'm matching the line with this:
$line =~ m/(Number of items:.*)/i;
Which puts the entire line into $1. I try to then print out my new line like this:
print "<p>" . $1 . "<\/p>";
I expect it to print this:
<p>Number of items: 22</p>
However, I'm actually getting this:
</p>umber of items: 22
I've tried all kinds of variations - printing each bit on a separate line, setting $1 to a new variable, using $+ and $&, etc. and I always get the same result.
What am I missing?
You have an \r in your match, which when printed results in the malformed output.
edit:
To explain further: chances are your file has Windows-style \r\n line endings. On a system whose line ending is \n, chomp removes only the \n, so the \r gets slurped into your greedy match and produces the unpleasant output (\r means go back to the start of the line and continue printing).
You can remove the \r by adding something like
$line =~ tr/\015//d;
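As a sketch of how this fits into the original task, the line ending can be stripped before matching so that neither \n nor \r\n reaches the capture. The helper name wrap_item_count is hypothetical, and the pattern assumes the count is a run of digits.

```perl
use strict;
use warnings;

# Hypothetical helper: strip the line ending, then wrap the item count in <p>.
# Returns nothing (undef in scalar context) when the line has no item count.
sub wrap_item_count {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;    # removes "\n" or "\r\n" at the end of the line
    return unless $line =~ /(Number of items:\s*\d+)/i;
    return "<p>$1</p>";
}

print wrap_item_count("Number of items: 22\r\n"), "\n";
```

Unlike tr/\015//d, the substitution only touches a carriage return that is part of the line ending.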
Can you provide a complete code snippet that demonstrates your problem? I'm not seeing it.
One thing to be cautious of is that $1 and friends refer to captures from the last successful match in that dynamic scope. You should always verify that a match succeeds before using one:
$line = "Foo Number of items: 97\n";
if ( $line =~ m/(Number of items:.*)/i ) {
print "<p>" . $1 . "<\/p>\n";
}
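The stale-capture behaviour is easy to reproduce. In this sketch the two sample lines and the array names are made up; the first half uses $1 without checking the match, and the second half checks it.

```perl
use strict;
use warnings;

my $first  = 'Number of items: 22';
my $second = 'No count on this line';

# Unchecked: the second match fails, so $1 still holds "22" from the first.
my @unchecked;
$first =~ /(\d+)/;
push @unchecked, $1;
$second =~ /(\d+)/;
push @unchecked, $1;

# Checked: $1 is read only when the match on that line succeeded.
my @checked;
for my $line ($first, $second) {
    push @checked, ($line =~ /(\d+)/ ? $1 : 'no match');
}

print "unchecked: @unchecked\n";
print "checked:   @checked\n";
```

The unchecked list reports 22 twice, although the second line contains no number at all.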
You've just learned (for future reference) how dangerous .* can be.
Having banged my head against similar unpleasantnesses, these days I like to be as precise as I can about what I expect to capture. Maybe
$line =~ m/(Number of items:\s+\d+)/;
Then I'm sure of not capturing the offending control character in the first place. Whatever Cygwin may be doing with Windows files, I can remain blissfully ignorant.

Embedding evaluations in Perl regex

So I'm writing a quick Perl script that cleans up some HTML code and runs it through an HTML -> PDF program. I want to lose as little information as possible, so I'd like to extend my textareas to fit all the text that is currently in them. This means, in my case, setting the number of rows to a calculated value based on the length of the string inside the textarea.
This is the regex I'm currently using:
$file=~s/<textarea rows="(.+?)"(.*?)>(.*?)<\/textarea>/<textarea rows="(?{ length($3)/80 })"$2>$3<\/textarea>/gis;
Unfortunately, Perl doesn't seem to recognize what I was told is the syntax for embedding Perl code inside search-and-replace regexes.
Are there any Perl junkies out there willing to tell me what I'm doing wrong?
Regards,
Zach
The (?{...}) pattern is an experimental feature for executing code on the match side, but you want to execute code on the replacement side. Use the /e regular-expression switch for that:
#! /usr/bin/perl
use warnings;
use strict;
use POSIX qw/ ceil /;
while (<DATA>) {
s[<textarea rows="(.+?)"(.*?)>(.*?)</textarea>] {
my $rows = ceil(length($3) / 80);
qq[<textarea rows="$rows"$2>$3</textarea>];
}egis;
print;
}
__DATA__
<textarea rows="123" bar="baz">howdy</textarea>
Output:
<textarea rows="1" bar="baz">howdy</textarea>
The syntax you are using to embed code is only valid in the "match" portion of the substitution (the left hand side). To embed code in the right hand side (which is a normal Perl double quoted string), you can do this:
$file =~ s{<textarea rows="(.+?)"(.*?)>(.*?)</textarea>}
{<textarea rows="@{[ length($3)/80 ]}"$2>$3</textarea>}gis;
This uses the Perl idiom of "some string @{[ embedded_perl_code() ]} more string".
But if you are working with a very complex statement, it may be easier to put the substitution into "eval" mode, where it treats the replacement string as Perl code:
$file =~ s{<textarea rows="(.+?)"(.*?)>(.*?)</textarea>}
{'<textarea rows="' . (length($3)/80) . qq{"$2>$3</textarea>}}gise;
Note that in both examples the regex is structured as s{}{}. This not only eliminates the need to escape the slashes, but also allows you to spread the expression over multiple lines for readability.
Must this be done with regex? Parsing any markup language (or even CSV) with regex is fraught with error. If you can, try to utilize a standard library:
http://search.cpan.org/dist/HTML-Parser/Parser.pm
Otherwise you risk the revenge of Cthulhu:
http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html
(Yes, the article leaves room for some simple string-manipulation, so I think your soul is safe, though. :-)
I believe your problem is an unescaped /
If it's not the problem, it certainly is a problem.
Try this instead, note the \/80
$file=~s/<textarea rows="(.+?)"(.*?)>(.*?)<\/textarea>/<textarea rows="(?{ length($3)\/80 })"$2>$3<\/textarea>/gis;
The basic pattern for this code is:
$file =~ s/some_search/some_replace/gis;
The gis are options: g = global (replace every match), i = case insensitive, s = let . match newlines as well.
First, you need to quote the / inside the expression in the replacement text (otherwise perl will see a s/// operator followed by the number 80 and so on). Or you can use a different delimiter; for complex substitutions, matching brackets are a good idea.
Then you get to the main problem, which is that (?{...}) is only available in patterns. The replacement text is not a pattern, it's (almost) an ordinary string.
Instead, there is the e modifier to the s/// operator, which lets you write a replacement expression rather than replacement string.
$file =~ s(<textarea rows="(.+?)"(.*?)>(.*?)</textarea>)
("<textarea rows=\"" . (length($3)/80) . "\"$2>$3</textarea>")egis;
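Wrapped up as a function, the /e approach might look like the sketch below. The helper name resize_textareas and the $cols parameter are assumptions made for illustration; the 80-characters-per-row figure comes from the question, and rounding up with ceil (with a minimum of one row) is a choice made here so the result is a whole number.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical helper: set rows="..." on each textarea from its text length.
# $cols is the assumed number of characters per row (80 in the question).
sub resize_textareas {
    my ($html, $cols) = @_;
    $cols ||= 80;
    $html =~ s{<textarea rows="[^"]*"(.*?)>(.*?)</textarea>}{
        my ($attrs, $text) = ($1, $2);
        my $rows = ceil(length($text) / $cols) || 1;
        qq[<textarea rows="$rows"$attrs>$text</textarea>];
    }egis;
    return $html;
}

print resize_textareas('<textarea rows="123" bar="baz">howdy</textarea>'), "\n";
```

Like the answers above, this only handles markup where rows is the first attribute; for anything less regular, an HTML parser is the safer tool.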
As per http://perldoc.perl.org/perlrequick.html#Search-and-replace, this can be accomplished with the "evaluation modifier s///e", i.e., your gis must have an extra e in it.
The evaluation modifier s///e wraps an eval{...} around the replacement string and the evaluated result is substituted for the matched substring. Some examples:
# convert percentage to decimal
$x = "A 39% hit rate";
$x =~ s!(\d+)%!$1/100!e; # $x contains "A 0.39 hit rate"