preg_replace regular expression omits a character - regex

I use this function to replace relative links with absolute ones and pass them as parameters, so that the page can be streamed with file_get_contents. I think there is a problem in my regular expression: it omits a character.
This is the function:
$pattern = "/<a([^>]*) " .
"href=\"[^http|ftp|https|mailto]([^\"]*)\"/";
$replace = "<a\${1} href=\"?u=" . $base . "\${2}\"";
$text = preg_replace($pattern, $replace, $text);
$pattern = "/<a([^>]*) " .
"href='[^http|ftp|https|mailto]([^\']*)'/";
$replace = "<a\${1} href=\"?u=" . $base . "\${2}\"";
$text = preg_replace($pattern, $replace, $text);
$pattern = "/<img([^>]*) " .
"src=\"[^http|ftp|https]([^\"]*)\"/";
$replace = "<img\${1} src=\"" . $base . "\${2}\"";
$text = preg_replace($pattern, $replace, $text);
$pattern = "/<a([^>]*) " .
"href=\"([^\"]*)\"/";
$replace = "<a\${1} href=\"?u=" . "\${2}\"";
$text = preg_replace($pattern, $replace, $text);
So, for example,
"UsersList.aspx?dir=09"
with this $base URL:
http://www.some-url.com/Members/
should be replaced with
"?u=http://www.some-url.com/Members/UsersList.aspx?dir=09"
but I get
"?u=http://www.some-url.com/Members/sersList.aspx?dir=09"
I don't know what the problem in my regular expression is or how to fix it.

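I guess your a tag has a relative href such as "UsersList.aspx?dir=09", and this pattern will not give the desired result for it:
$pattern = "/<a([^>]*) " . "href=\"[^http|ftp|https|mailto]([^\"]*)\"/";
In that pattern,
[^http|ftp|https|mailto] -- this is a negated character class, not a list of alternatives. It matches exactly one character that is not one of the listed letters or '|', and it sits outside the capture group, so the 'U' is consumed and goes missing.
Try removing it (note that absolute links will then match as well):
$pattern = "/<a([^>]*) " . "href=\"([^\"]*)\"/";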
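The difference between the negated character class and a negative lookahead can be sketched in Python's re module, which treats these two constructs the same way as PCRE. This is only an illustration, not the asker's code: the sample `html` string, the `rewrite` helper, and the lookahead pattern in `fixed` are assumptions introduced here.

```python
import re

base = "http://www.some-url.com/Members/"
# Hypothetical sample markup: one relative link and one absolute link
html = ('<a class="x" href="UsersList.aspx?dir=09">list</a> '
        '<a href="http://example.com/">abs</a>')

# Negated character class: consumes exactly one character ("U") outside the group
broken = re.compile(r'<a([^>]*) href="[^http|ftp|https|mailto]([^"]*)"')
# Negative lookahead: checks that no scheme follows, but consumes nothing
fixed = re.compile(r'<a([^>]*) href="(?!https?:|ftp:|mailto:)([^"]*)"')

def rewrite(pattern, text):
    # group(1) holds the other attributes, group(2) the captured URL
    return pattern.sub(
        lambda m: f'<a{m.group(1)} href="?u={base}{m.group(2)}"', text)

broken_out = rewrite(broken, html)
fixed_out = rewrite(fixed, html)
print(broken_out)  # the relative link loses its first character
print(fixed_out)   # the relative link is kept whole; the absolute link is untouched
```

With the lookahead, relative links keep their first character and absolute links are skipped, which removing the check entirely would not do.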

Related

Perl string replace not working with $1 and $2

Search and replace is not working when I use $1 and $2 defined earlier.
It works when I store it in a new variable.
Does not work as intended.
perl -e'
my $name = "start middle end";
my $rep = "";
my $orig = "";
if ($name =~ /sta(.*?)\s\w+\s(.*)/) {
$orig = $1;
$rep = $2;
$name =~ s/$1/$2/;
print "$name\n";
}
'
sta middle end
Is it because $1 and $2 are getting replaced in the new $name =~ I am doing?
Works as intended.
perl -e'
my $name = "start middle end";
my $rep = "";
my $orig = "";
if ($name =~ /sta(.*?)\s\w+\s(.*)/) {
$orig = $1;
$rep = $2;
$name =~ s/$orig/${rep}/;
print "$name\n";
}
'
staend middle end
Is there a better one liner to do this? I do not want to define new variables.
The capture variables are reset by running the match in the first part of the s/// operator, for the replacement to use. The m// operator in list context will return the captured values so you can easily assign them there. Also you may want to use \Q (quotemeta) if your search string is not a regex.
perl -e'
my $name = "start middle end";
if (my ($orig, $rep) = $name =~ /sta(.*?)\s\w+\s(.*)/) {
$name =~ s/\Q$orig/$rep/;
print "$name\n";
}
'
staend middle end
Yes, the new successful regex match replaces $1 and $2.
You could avoid the global vars entirely as follows:
perl -e'
my $name = "start middle end";
if ( my ($orig, $rep) = $name =~ /sta(.*?)\s\w+\s(.*)/ ) {
$name =~ s/\Q$orig/$rep/;
CORE::say $name;
}
'
Better yet, you could avoid doing two matches as follows:
perl -e'
my $name = "start middle end";
if ( $name =~ s/sta\K.*?(?=\s\w+\s(.*))/$1/ ) {
CORE::say $name;
}
'
However, I'd use the following:
perl -e'
my $name = "start middle end";
if ( my ($prefix, $suffix, $foo) = $name =~ /^(.*?sta).*?(\s\w+\s(.*))/ ) {
CORE::say "$prefix$foo$suffix";
}
'
Note that your code suffered from a code injection bug which I fixed using quotemeta (as \Q).
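The effect of escaping a search string before interpolating it into a pattern can be sketched in Python, where re.escape plays a role comparable to Perl's quotemeta. The sample strings below are assumptions chosen to make the difference visible.

```python
import re

text = "abc a.c"
search = "a.c"  # meant as literal text, but "." is a regex metacharacter

# Unescaped: "." matches any character, so "abc" is hit first
unescaped = re.sub(search, "X", text, count=1)
# Escaped: only the literal string "a.c" matches
escaped = re.sub(re.escape(search), "X", text, count=1)

print(unescaped)
print(escaped)
```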
If the input might contain unexpected extra spaces, we could also try this expression:
(sta)([a-z]*)\s+(\w+)\s+(.+)
It's just another option.
TEST
perl -e'
my $name = "start middle end";
$name =~ s/(sta)([a-z]*)\s+(\w+)\s+(.+)/$1$4 $3 $4/;
print "$name\n";
'
OUTPUT
staend middle end
$2 in the replacement part refers to the capture group from the pattern part of the same substitution. Therefore, you only need one variable to remember $2.
perl -lwe '$_ = "start middle end" ; if (/sta(.*?)\s\w+\s(.*)/) {my $rep = $2; s/$1/$rep/; print}'
staend middle end
You can avoid other variables by using the last match start and end global arrays @- and @+ and just doing a substring replace:
my $name = "start middle end";
if ($name =~ /sta(.*?)\s\w+\s(.*)/) {
substr($name, $-[1], $+[1]-$-[1], $2);
print "$name\n";
}
See the entry for @- in perldoc perlvar
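The same offset-based replacement can be sketched in Python, where a match object's start() and end() methods serve a purpose similar to Perl's offset arrays. This is an analogy under that assumption, not a translation of Perl semantics.

```python
import re

name = "start middle end"
m = re.search(r"sta(.*?)\s\w+\s(.*)", name)
if m:
    # Splice group 2 into the span that group 1 occupied
    name = name[:m.start(1)] + m.group(2) + name[m.end(1):]
print(name)
```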
The regex capture variables can behave in surprising ways depending on code
flow, function calls, and so on.
Explaining this fully would take a few pages.
For now, avoid the whole mess and just use a single regex:
perl -e'
my $name = "start middle end";
$name =~ s/^(sta)(.*?)(\s\w+\s)(.*)/$1$4$3$4/;
print "$name\n";
'

Replace lines in a multi-line string

I have an Oracle WebLogic config.xml file read into a string. I'm looking to update a series of lines in it. I've verified that I'm reading the file, getting the lines set, and able to update the correct line with the parameters I'm looking for, but I can't seem to update the original string.
Here's the main loop:
while ( $lines =~ m{(<arguments>.*?</arguments>)}mgs ) {
my $nchunk = my $ochunk = $1;
print "#" . '=' x 70 . "\n";
my ($ms) = $ochunk =~ m{.*/(.*?)\.out.*};
my $nname = $monster->{$domain}->{$ms}->{nodeName};
my $tname = $monster->{$domain}->{$ms}->{tierName};
my $newentry = sprintf(" %s %s.nodeName=-Dappdynamics.agent.nodeName=%s",
$appdjar, $ms, $nname);
$newentry .= " $ms.appdynamics.tierName=-Dappdynamics.tierName=$tname";
$nchunk =~ s/(<\/arguments>)/$newentry\1/g;
print "$ochunk\n";
print "#" . '-' x 70 . "\n";
print "$nchunk\n";
# $lines =~ s!$ochunk!!msg;
# $lines =~ s!$ochunk!$nchunk!msg;
}
As written, that results in:
#======================================================================
<arguments>-Xms512m -Xmx512m -Dweblogic.system.BootIdentityFile=/opt/app/oracle/user_projects/domains/AccountingServices_Domain/boot.properties -Dweblogic.Stdout=/opt/app/oracle/user_projects/logs/AccountingServices_Domain/AccountingCommon_MS1.out -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/app/oracle/user_projects/logs/AccountingServices_Domain/dumps -XX:+UnlockCommercialFeatures -XX:+FlightRecorder -Dcom.sun.management.jmxremote.port=40124 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:FlightRecorderOptions=defaultrecording=true,disk=true,repository=/opt/app/oracle/user_projects/logs/AccountingServices_Domain,maxage=10m,dumponexit=true,dumponexitpath=/opt/app/oracle/user_projects/logs/AccountingServices_Domain -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -Dlog4j.configuration=file:/opt/app/oracle/user_projects/applications/AccountingServices_Domain/log4j.xml</arguments>
#----------------------------------------------------------------------
<arguments>-Xms512m -Xmx512m -Dweblogic.system.BootIdentityFile=/opt/app/oracle/user_projects/domains/AccountingServices_Domain/boot.properties -Dweblogic.Stdout=/opt/app/oracle/user_projects/logs/AccountingServices_Domain/AccountingCommon_MS1.out -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/app/oracle/user_projects/logs/AccountingServices_Domain/dumps -XX:+UnlockCommercialFeatures -XX:+FlightRecorder -Dcom.sun.management.jmxremote.port=40124 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:FlightRecorderOptions=defaultrecording=true,disk=true,repository=/opt/app/oracle/user_projects/logs/AccountingServices_Domain,maxage=10m,dumponexit=true,dumponexitpath=/opt/app/oracle/user_projects/logs/AccountingServices_Domain -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -Dlog4j.configuration=file:/opt/app/oracle/user_projects/applications/AccountingServices_Domain/log4j.xml -javaagent:/opt/app/appdynamics/universal-agent/monitor/java/javaagent.jar AccountingCommon_MS1.nodeName=-Dappdynamics.agent.nodeName=AccountingCommon_2123 AccountingCommon_MS1.appdynamics.tierName=-Dappdynamics.tierName=AccountingCommon</arguments>
[[snip]]
I can't seem to 're-find' the source chunk as indicated by one of those commented $lines trying to replace $ochunk with nothing.
You're going about this in a very round-about way, which is why I couldn't fathom what you were trying to do for the longest time. What you're actually trying to do is
Insert an additional string after the existing text in an arguments element
And you need just a substitution. I've left it global in case there really are multiple such elements in the XML. I've not been able to test it, but I do know that it compiles
$lines =~ s{ (<arguments>) (.*?) (</arguments>) }{
my ($otag, $text, $ctag) = ($1, $2, $3);
my ($ms) = $text =~ m{.*/(.*?)\.out};
my $msdata = $monster->{$domain}{$ms};
my $node = $msdata->{nodeName};
my $tier = $msdata->{tierName};
my $newentry = " $appdjar $ms.nodeName=-Dappdynamics.agent.nodeName=$node";
$newentry .= " $ms.appdynamics.tierName=-Dappdynamics.tierName=$tier";
$otag . $text . $newentry . $ctag;
}segx
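The idea of computing the replacement inside the substitution itself can be sketched in Python with a callback passed to re.sub. The `servers` dictionary, `agent_jar`, and the shortened `xml` sample are hypothetical stand-ins for $monster->{$domain}, $appdjar, and the real config.xml contents.

```python
import re

# Hypothetical stand-ins for $monster->{$domain} and $appdjar
servers = {
    "AccountingCommon_MS1": {
        "nodeName": "AccountingCommon_2123",
        "tierName": "AccountingCommon",
    }
}
agent_jar = "-javaagent:/opt/app/appdynamics/universal-agent/monitor/java/javaagent.jar"

# Shortened, made-up sample of one <arguments> element
xml = ("<arguments>-Xms512m "
       "-Dweblogic.Stdout=/opt/logs/AccountingCommon_MS1.out -Xmx512m</arguments>")

def add_agent(match):
    open_tag, text, close_tag = match.groups()
    found = re.search(r".*/(.*?)\.out", text, re.S)
    if found is None or found.group(1) not in servers:
        return match.group(0)  # leave unknown servers unchanged
    name = found.group(1)
    data = servers[name]
    entry = (f" {agent_jar} {name}.nodeName=-Dappdynamics.agent.nodeName={data['nodeName']}"
             f" {name}.appdynamics.tierName=-Dappdynamics.tierName={data['tierName']}")
    return open_tag + text + entry + close_tag

result = re.sub(r"(<arguments>)(.*?)(</arguments>)", add_agent, xml, flags=re.S)
print(result)
```

Because the callback returns the whole element, there is no need to find the original chunk again in the source string.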

If a regular expression is in a string, how can I execute the regex?

Like this:
$pattern = " /(\d{4})\s*-\s*(\d{2})\s*-\s*(\d{2})\s*(\d{2}):(\d{2}):(\d{2})\.(\d+)/" ;
$_ = "1972-01-01 00:00:00.0" ;
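How can I execute the regex in $pattern using Perl?
You have to use the match operator. The string should contain only the pattern itself, without the surrounding slashes or the leading space, and it should be single-quoted (or built with qr//) so that the backslashes are preserved:
$_ =~ /$pattern/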
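The same idea, a pattern held in a string and then applied to some text, can be sketched in Python. The raw string keeps the backslashes intact, and the variable names are assumptions for illustration.

```python
import re

# Raw string: backslashes reach the regex engine unchanged; no delimiters needed
pattern = r"(\d{4})\s*-\s*(\d{2})\s*-\s*(\d{2})\s*(\d{2}):(\d{2}):(\d{2})\.(\d+)"
text = "1972-01-01 00:00:00.0"

regex = re.compile(pattern)  # compile the string into a reusable regex object
m = regex.search(text)
fields = m.groups() if m else ()
print(fields)
```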

PowerShell Replace number in string

I'm not much of a regex expert, and frankly I try to avoid regexes whenever I can.
I would like to create a new $String in which the number within the string is incremented by 1. This number can be one or two digits and will always be between square brackets.
From:
$String = "\\Server\c$\share_1\share2_\Taget2[1] - 2014-07-29.log"
To:
$String = "\\Server\c$\share_1\share2_\Taget2[2] - 2014-07-29.log"
Thank you for your help.
If you want to avoid the regex:
$String = "\\Server\c$\share_1\share2_\Taget2[1] - 2014-07-29.log"
$parts = $string.Split('[]')
$Newstring = '{0}[{1}]{2}' -f $parts[0],(1 + $parts[1]),$parts[2]
$Newstring
\\Server\c$\share_1\share2_\Taget2[2] - 2014-07-29.log
Another option is using the Replace() method of the Regex class with a scriptblock (code taken from this answer by Roman Kuzmin):
$callback = {
$v = [int]$args[0].Groups[1].Value
$args[0] -replace $v,++$v
}
$filename = "\\Server\c$\share_1\share2_\Taget2[1] - 2014-07-29.log"
$re = [Regex]"\[(\d+)\]"
$re.Replace($filename, $callback)
Existing files could be handled like this:
...
$re = [Regex]"\[(\d+)\]"
while (Test-Path -LiteralPath $filename) {
$filename = $re.Replace($filename, $callback)
}
Note that you must use Test-Path with the parameter -LiteralPath here, because your filename contains square brackets, which would otherwise be interpreted as wildcard characters.
With regex:
$s = "\\Server\c$\share_1\share2_\Taget2[1] - 2014-07-29.log"
$s -replace "(?<=\[)(\d+)","bla"
Result:
\\Server\c$\share_1\share2_\Taget2[bla] - 2014-07-29.log
So you can do something like this:
$s = "\\Server\c$\share_1\share2_\Taget2[1] - 2014-07-29.log"
$s -match "(?<=\[)(\d+)" | Out-Null ## find regex matches
$newNumber = [int]$matches[0] + 1
$s -replace "(?<=\[)(\d+)",$newNumber
Result:
\\Server\c$\share_1\share2_\Taget2[2] - 2014-07-29.log
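The callback approach can also be sketched in Python, where re.sub accepts a function that computes each replacement. This is an illustration of the technique, not PowerShell code; `path` and `new_path` are names introduced here.

```python
import re

path = r"\\Server\c$\share_1\share2_\Taget2[1] - 2014-07-29.log"

# Replace the first bracketed number with that number plus one
new_path = re.sub(r"\[(\d+)\]",
                  lambda m: f"[{int(m.group(1)) + 1}]",
                  path, count=1)
print(new_path)
```

Converting the captured digits to an integer before adding 1 avoids string concatenation, and matching the brackets keeps other digits in the path, such as the date, untouched.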

How to replace a variable with another variable in Perl?

I am trying to replace all words from a text except some that I have in an array. Here's my code:
my $text = "This is a text!And that's some-more text,text!";
while ($text =~ m/([\w']+)/g) {
next if $1 ~~ @ignore_words;
my $search = $1;
my $replace = uc $search;
$text =~ s/$search/$replace/e;
}
However, the program doesn't work. Basically I am trying to make all words uppercase but skip the ones in @ignore_words. I know it's a problem with the variables being used in the regular expression, but I can't figure the problem out.
#!/usr/bin/perl
my $text = "This is a text!And that's some-more text,text!";
my @ignorearr=qw(is some);
my %h1=map{$_ => 1}@ignorearr;
$text=~s/([\w']+)/($h1{$1})?$1:uc($1)/ge;
print $text;
On running this,
THIS is A TEXT!AND THAT'S some-MORE TEXT,TEXT!
You can fix your code if, instead of applying a substitution to the same variable that controls the while loop, you just let s/../../eg do the work globally for you:
my $text = "This is a text!And that's some-more text,text!";
my @ignore_words = qw{ is more };
$text =~ s/([\w']+)/$1 ~~ @ignore_words ? $1 : uc($1)/eg;
print $text;
And on running:
THIS is A TEXT!AND THAT'S SOME-more TEXT,TEXT!
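For comparison, the single global substitution with a per-match decision can be sketched in Python using a callback and a set for the ignore list. The names `ignore_words`, `convert`, and `result` are assumptions for illustration.

```python
import re

text = "This is a text!And that's some-more text,text!"
ignore_words = {"is", "more"}  # set lookup, comparable to the hash approach

def convert(match):
    word = match.group(0)
    # Leave ignored words as they are, uppercase everything else
    return word if word in ignore_words else word.upper()

result = re.sub(r"[\w']+", convert, text)
print(result)
```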