I'm looking for a perl code line that may contain regexps and comments
i need to capture everything until a comment. so i want all characters until # but I AM INTERESTED in capturing #
for example, if the line was:
if ($line=/\#/) { #captures lines with '#'
I want to capture:
if ($line=/\#/) {
Give this a try:
use PPI;
my $ppi = PPI::Document->new('source.pl');
my $source = '';
for my $token ( #{ $ppi->find("PPI::Token") } ) {
last if $token->isa("PPI::Token::Comment");
$source .= $token;
}
print $source;
This should handle pretty much everything except here-docs. If you need to deal with those, start by copying PPI::Document::serialize and modify it to stop on the first comment.
Try this
^(?:[^#]|(?<=\\)#)+
See it here on Regexr
This will match anything from the start of the string (^), that is not a # ([^#]) OR
a # that is preceeded by a backslash ((?<=\\)#)
Related
I am using Perl to do some prototyping.
I need an expression to replace e by [ee] if the string is exactly 2 chars and finishes by "e".
le -> l [ee]
me -> m [ee]
elle -> elle : no change
I cannot test the length of the string, I need one expression to do the whole job.
I tried:
`s/(?=^.{0,2}\z).*e\z%/[ee]/g` but this is replacing the whole string
`s/^[c|d|j|l|m|n|s|t]e$/[ee]/g` same result (I listed the possible letters that could precede my "e")
`^(?<=[c|d|j|l|m|n|s|t])e$/[ee]/g` but I have no match, not sure I can use ^ on a positive look behind
EDIT
Guys you're amazing, hours of search on the web and here I get answers minutes after I posted.
I tried all your solutions and they are working perfectly directly in my script, i.e. this one:
my $test2="le";
$test2=~ s/^(\S)e$/\1\[ee\]/g;
print "test2:".$test2."\n";
-> test2:l[ee]
But I am loading these regex from a text file (using Perl for proto, the idea is to reuse it with any language implementing regex):
In the text file I store for example (I used % to split the line between match and replace):
^(\S)e$% \1\[ee\]
and then I parse and apply all regex like that:
my $test="le";
while (my $row = <$fh>) {
chomp $row;
if( $row =~ /%/){
my #reg = split /%/, $row;
#if no replacement, put empty string
if($#reg == 0){
push(#reg,"");
}
print "reg found, reg:".$reg[0].", replace:".$reg[1]."\n";
push #regs, [ #reg ];
}
}
print "orgine:".$test."\n";
for my $i (0 .. $#regs){
my $p=$regs[$i][0];
my $r=$regs[$i][1];
$test=~ s/$p/$r/g;
}
print "final:".$test."\n";
This technique is working well with my other regex, but not yet when I have a $1 or \1 in the replace... here is what I am obtaining:
final:\1\ee\
PS: you answered to initial question, should I open another post ?
Something like s/(?i)^([a-z])e$/$1[ee]/
Why aren't you using a capture group to do the replacement?
`s/^([c|d|j|l|m|n|s|t])e$/\1 [ee]/g`
If those are the characters you need and if it is indeed one word to a line with no whitespace before it or after it, then this will work.
Here's another option depending on what you are looking for. It will match a two character string consisting of one a-z character followed by one 'e' on its own line with possible whitespace before or after. It will replace this will the single a-z character followed by ' [ee]'
`s/^\s*([a-z])e\s*$/\1 [ee]/`
^(\S)e$
Try this.Replace by $1 [ee].See demo.
https://regex101.com/r/hR7tH4/28
I'd do something like this
$word =~ s/^(\w{1})(e)$/$1$2e/;
You can use following regex which match 2 character and then you can replace it with $1\[$2$2\]:
^([a-zA-Z])([a-zA-Z])$
Demo :
$my_string =~ s/^([a-zA-Z])([a-zA-Z])$/$1[$2$2]/;
See demo https://regex101.com/r/iD9oN4/1
I am looking for a Perl Regex to extract all images from the following code:
slideshowimages("images/image1.jpg","images/image2.png","images/image3.jpg")
slideshowimages('images/image4.jpg','images/image5.png','images/image6.jpg')
The output would be an array with:
images/image1.jpg, images/image2.png, images/image3.jpg, images/image4.jpg,
images/image5.png, images/image6.jpg
The things to pay attention to are:
Image extension could be .jpg / .png
Image could be inside singe or double quotes
What i've tried so far:
while ($html =~ /["|'|=](.*\.jpg|gif|png|bmp|swf).*"/g) {
my $item = $1;
$item =~ s/\"|\'|=//g;
push (#images, $item);
}
But this doesn't really work well.
Thank you for your help.
my #images = $html =~ /["'] (.+?) ["']/gx; # /g for multiple hits on a line
A simple regex could be:
["'][^'"]*[.](jpg|png)["']
It would fail though on paths including ' or " like slideshowimages("images/image'1'.jpg"). The expression than can handle this case would be:
('.*?\.(jpg|png)')|(".*?\.(jpg|png)")
/(["']).*?\.(jpg|gif|png|bmp|swf)\1/
You need to capture what you're using, a single or a double quote, and refer back to that.
Also, you need to group your choices for the extension. Otherwise, you're saying 'anything ending with jpg OR just plain gif (, png, and so on)'.
Adjusting #mpapec's answer:
my #images = $html =~ /(["'])[^"']*?\.(jpg|gif|png|bmp|swf)\1/g;
(Disallowing " and ' within the filename is probably a good idea too).
my $html = <<EOF;
slideshowimages("images/image'1'.jpg","images/image2.png","images/image3.jpg")
slideshowimages('images/image4.jpg','images/image5.png','images/image6jpg')
EOF
my #images = ();
while ($html =~ s/(["'])(.+?\/.+?\.(?:jpg|png|gif))\1//) {
push #images, $2;
}
foreach my $image (#images) {
print "$image, ";
}
This is a bit more restrictive so it does not get "images/image6png"
Does not fail on "images/image'1'.jpg"
(["']) - either a single or double quote (any character between the brackets) and capture
( - begin capture
.+?/ - one or more characters up to first slash
.+?. - one or more characters up to first period
(?: - begin grouping without capture
jpg|png|gif - any of the strings
) - end grouping without capture
) - end capture
\1 - captured single or double quote
In a Perl/Tk code I found a conditional statement as below
if (s/^\+//)
{
#do something
}
elsif (/^-/)
{
#do another thing
}
Seems like some pattern matching has been done. But I cannot understand it. Can anyone help me understanding this pattern matching?
They are both regular expressions. You can read up on them at perlre and perlretut. You can play around with them on http://www.rubular.com.
They both implicitly do something with $_. There probably is a while or foreach around your lines of code without a loop variable. In that case, $_ becomes that loop variable. It might for instance contain the current line of a file that is being read.
If the current value of $_ contains a + (plus) sign as the first character at the beginning of the string, #do somehting.
Else if it contains a - (minus) sign, #do another thing.
In case 1. it also replaces that + sign with nothing (i.e. removes it). It does not remove the - in 2. however.
Let's look at an explanation with YAPE::Regex::Explain.
use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new(qr/^\+/)->explain();
Here it is. Not really helpful in our case, but a nice tool nonetheless. Note that the (?-imsx and ) parts are the default things Perl implies. They are always there unless you change them.
The regular expression:
(?-imsx:^\+)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
\+ '+'
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
Update: As Mikko L in the comments pointed out, you should maybe refactor/change this piece of code. While it probably does what it is supposed to, I believe it would be a good idea to make it more readable. Whoever wrote it obviously didn't care about you as the later maintainer. I suggest you do. You could change it to:
# look at the content of $_ (current line?)
if ( s/^\+// )
{
# the line starts with a + sign,
# which we remove!
#do something
}
elsif ( m/^-/ )
{
# the line starts witha - sign
# we do NOT remove the - sign!
#do another thing
}
Those are regular expressions, used for pattern matching and substitution.
You should read up on the concept, but as for your question:
s/^\+//
If the string started with a plus, remove that plus (the "s" means "substitute"), and return true.
/^-/
True if the string starts with a minus.
This code is equivalent to
if ($_ =~ s/^\+//) { # s/// modifies $_ by default
#do something
}
elsif ($_ =~ m/^-/) { # m// searches $_ by default
#do another thing
}
s/// and m// are regexp quote-like operators. You can read about them in perlop.
The other answers have given a summary of how the code works, but not much of why. Here is a simple example of why one might use such logic.
#!/usr/bin/env perl
use strict;
use warnings;
my $args = {};
for ( #ARGV ) {
if ( s/^no// ) {
$args->{$_} = 0;
} else {
$args->{$_} = 1;
}
}
use Data::Dumper;
print Dumper $args;
When you call the script like
./test.pl hi nobye
you get
$VAR1 = {
'hi' => 1,
'bye' => 0
};
The key is the string, however if it is preceded by no then remove it (to get the key in question) and instead set the value to 0.
The OP's example is a little more involved, but follows the same logic.
if the key starts with a +, remove it and do something
if the key starts with a -, don't remove it and do something else
$match="";
for($i=0;$i<=$#wor;$i++)
{
$match=$match.$letter[$wor[$i]];
}
print $match;
open ABC,"<words.txt";
while(<ABC>)
{
if($_ =~ /^$match$/ )
{
print "$_";
print "\n";
}
}
In the following code, I am not able to match the line of the file i.e. $_ with the variable $match (which contains the actual metacharacters which are to be matched )?
And hence no output is produced
What changes are needed?
You need to remove the ^ and $ anchors from your regexp which match the beginning and end of a string.
With them, the regexp will only match lines which only contain the meta-characters.
You probably also want to wrap $match in [ .. ] characters, to indicate that it's a range of characters, and not a word.
For example, if you wanted to exclude any line containing _ or % your $match would need to contain [_$]
EDIT if, per the comments, you only want to match if the meta characters are found at either end, use:
if (/^${match}/ || /${match}$/) {
...
}
Perl (no modules loaded and -Tw & strict)
I found much info on regex and pattern matching here but not exactly what I need.
I want to know if this is the correct way to validate a couple things.
Sorry about the beginner attempt here. I am quite new to this.
my $this = "12345678";
if ($this != m/\b[0-9]{8}\b/x) { print "$this is bad"; }
my $that = "12345678.gif";
if ($that != m/\b[0-9]{8}\.gif\b/x) { print "$that is bad"; }
or
if ($that != m/\b[0-9]{8}\.(jpe?g|gif|png)\b/x) { print "$that is bad"; }
my ($ext) = $that =~ m/\.([^\.]+)$/x;
# verify extension
if ($ext != m/\.(jpe?g|png|gif)$/x ){ print "$ext is bad"; }
# for content type
if ($ext eq "jpg") {$ext = "jpeg";}
I use the /x because perl::critic indicated I needed it. Passes with /x so...
\d is not an option and should be avoided from what I have read here.
The file name "that" has to be 8 digits + an image type. The other set of digits "this" is actually a folder name. This provides a little error checking for an image serving script. The root htaccess sends calls to images in a specific folder to said script. I grab the directory and image name off via path info.
If you are running under taint checking, this isn't the way to do it. You need to match the pattern you need then remember that in a memory variable to clear the taint:
my $this = ...;
my $regex = qr/
^ # beginning of string
( # start of $1
[0-9]{8}
\.
(gif|jpg) # extension in $2
)
\z #end of string
/x;
my( $cleansed, $extension ) = do {
if( $this =~ m/$regex/ ) { ( $1, $2 ) }
else { die "Bad filename!" }
};
I'm not sure why you have a \b at the beginning of your regex. It probably doesn't do what you think it does. If you want the file name to be only the digits, use the ^ beginning of string anchor instead. That way, nothing can come before the digits. Likewise, the end of string anchor \z says that nothing can come after the extension.
If you then need to match an extension to a content-type for an HTTP response, which I'm guessing that your doing, you can use a hash to make the map:
my %types = (
jpg => jpeg,
gif => gif,
...
);
Now that you have the hash, you can use it as another level of validation:
unless( exists $types{$extension} ) { die "Unsupported type!" }
Most of what you have above looks fine. A few points:
if ($ext != m/pattern/) is wrong - the != operator should be !~
\d is just fine, if you're parsing filenames or anything else not likely to be unicode
avoid /x unless you really need it (you're not splitting the regexp up over multiple lines for readability). For that matter, avoid any of the flags unless you need them
(jpe?g|gif|png) can be modified to (?:jpe?g|gif|png) to disable capturing on that set of parentheses (the efficiency increase is nominal in most cases, but it occasionally can make a difference e.g. in a rapid loop, so I make it a habit to not capture unless I need to)
you don't need to escape . inside a character class - i.e. [^\.] can be [^.] (I believe the only character you need to escape is ] itself, but don't take this as gospel) :)
it's "Perl", not "PERL" :)
You need to use =~ and !~ instead of == and != for regex matching. Also after removing redundant code and optimizing, I would write it this way.
my $that = "12345678.gif";
if ($that =~ m/\b[0-9]{8}\.(jpe?g|gif|png)\b/x)
{
my $ext = $1;
if ($ext eq "jpg") {$ext = "jpeg";}
}
else
{
print "$that is bad";
}