PowerShell, how many replacements did you make?

I need to know how many replacements PowerShell makes when using either the -replace operator or the Replace() method. Or, if that's not possible, whether it made any replacements at all.
For example, in Perl, because the substitution operation returns the number of replacements made, and zero is false while non-zero is true in a boolean context, one can write:
$greeting = "Hello, Earthlings";
if ($greeting =~ s/Earthlings/Martians/) { print "Mars greeting ready." }
However, in PowerShell the operator and the method return the new string. It appears that the operator provides some additional information if one knows how to ask for it (e.g., captured groups are stored in a new variable it creates in the current scope), but I can't find a way to get a count or a success value.
I could just compare the before and after values, but that seems entirely inefficient.

You're right; I don't think you can squeeze anything extra out of -replace. However, you can find the number of matches using [regex]::Matches(). For example:
> $greeting = "Hello, Earthlings"
> $needle = "l"
> $([regex]::matches($greeting, $needle)).Length # cast explicitly to an array
3
You can then use the -replace operator, which uses the same matching engine.
After looking a little deeper, I found an overload of Replace that takes a MatchEvaluator delegate, which is called each time a match is made. So, if we use that as an accumulator, it can count the number of replacements in one go.
> $count = 0
> $matchEvaluator = [System.Text.RegularExpressions.MatchEvaluator]{$count++}
> [regex]::Replace("Hello, Earthlings","l",$matchEvaluator)
Heo, Earthings
> $count
3
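For comparison only (this is Python, not PowerShell): Python's standard re module exposes the count directly. re.subn returns a (new_string, count) tuple, and re.sub accepts a callable that plays roughly the same role as a MatchEvaluator. A minimal sketch; the function names below are hypothetical.

```python
import re


def remove_and_count(text, pattern):
    """Delete every match of pattern and report how many were removed."""
    # re.subn performs the substitution and counts it in a single pass.
    return re.subn(pattern, "", text)


def remove_and_count_with_callback(text, pattern):
    """Same result, using a callback as an accumulator (like a MatchEvaluator)."""
    count = 0

    def evaluator(match):
        # nonlocal makes the increment update the enclosing variable instead of
        # creating a new local one, a scoping concern similar in spirit to the
        # one discussed for the PowerShell evaluator.
        nonlocal count
        count += 1
        return ""  # replace each match with nothing

    return re.sub(pattern, evaluator, text), count


print(remove_and_count("Hello, Earthlings", "l"))                # ('Heo, Earthings', 3)
print(remove_and_count_with_callback("Hello, Earthlings", "l"))  # ('Heo, Earthings', 3)
```

As in the PowerShell transcript, the callback returns an empty string, so the matches are removed rather than replaced.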

Here is a complete, working example that preserves the replacement behavior and counts the number of matches:
$Script:Count = 0
$Result = [regex]::Replace($InputText, $Regex, [System.Text.RegularExpressions.MatchEvaluator] {
param($Match)
$Script:Count++
return $Match.Result($Replacement)
})

None of the above answers both performs an actual replacement and works in recent PowerShell versions:
James Kolpack: shows how to count matches that are removed (not replaced);
Kino101: incomplete answer; the variables are not defined;
Annarfych: outdated answer; in recent PowerShell versions the evaluator's count variable needs to be global.
Here is how you can do a replacement and count it:
$String = "Hello World"
$Regex = "l|o" #search for 'l' or 'o'
$ReplaceWith = "?"
$Global:Count = 0 # initialize in the same (global) scope the evaluator increments
$Result = [regex]::Replace($String, $Regex, { param($found); $Global:Count++; return $found.Result($ReplaceWith) })
$Result
$Global:Count
Result in PowerShell 5.1:
He??? W?r?d
5
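A Python sketch of the same count-and-replace idea, for comparison only: Match.expand fills in a replacement template (including backreferences such as \1), which is roughly what $found.Result($ReplaceWith) does in .NET, although .NET uses $1 syntax instead. The function name replace_and_count is hypothetical.

```python
import re


def replace_and_count(text, pattern, template):
    """Replace every match of pattern with template and count the replacements."""
    count = 0

    def evaluator(match):
        nonlocal count
        count += 1
        # expand() substitutes backreferences like \1 in the template
        return match.expand(template)

    return re.sub(pattern, evaluator, text), count


result, count = replace_and_count("Hello World", "l|o", "?")
print(result)  # He??? W?r?d
print(count)   # 5

# Backreferences in the template are honoured:
print(replace_and_count("Hello World", r"(l+)", r"[\1]"))  # ('He[ll]o Wor[l]d', 2)
```

When no per-match logic is needed, re.subn(pattern, template, text) returns the same (result, count) pair without a callback.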

A version of the script that actually replaces matches instead of deleting them:
$greeting = "Hello, earthlings. Mars greeting ready"
$counter = 0
$search = '\s'
$replace = ''
$evaluator = [System.Text.RegularExpressions.MatchEvaluator] {
param($found)
$counter++
Write-Output ([regex]::Replace($found, [regex] $search, $replace))
}
[regex]::Replace($greeting, [regex] $search, $evaluator);
$counter
->
> Hello,earthlings.Marsgreetingready
> 4


Jenkinsfile/Groovy: how to use variables in regex pattern find-counts?

In the following declarative syntax pipeline:
pipeline {
agent any
stages {
stage( "1" ) {
steps {
script {
orig = "/path/to/file"
two_lev_down = (orig =~ /^(?:\/[^\/]*){2}(.*)/)[0][1]
echo "${two_lev_down}"
depth = 2
two_lev_down = (orig =~ /^(?:\/[^\/]*){depth}(.*)/)[0][1]
echo "${two_lev_down}"
}
}
}
}
}
...the regex is meant to match everything after the third instance of "/".
The first, i.e. (orig =~ /^(?:\/[^\/]*){2}(.*)/)[0][1] works.
But the second, (orig =~ /^(?:\/[^\/]*){depth}(.*)/)[0][1] does not. It generates this error:
java.util.regex.PatternSyntaxException: Illegal repetition near index 10
^(?:/[^/]*){depth}(.*)
I assume the problem is the use of the variable depth instead of a hardcoded integer, since that's the only difference between the working code and error-generating code.
How can I use a Groovy variable in a regex pattern find-count? Or what is the Groovy-language idiomatic way to write a regex that returns everything after the nth occurrence of a pattern?
You are missing the $ in front of your variable. It should be:
orig = "/path/to/file"
depth = 2
two_lev_down = (orig =~ /^(?:\/[^\/]*){$depth}(.*)/)[0][1]
assert '/file' == two_lev_down
Why?
In Groovy, String interpolation (producing a GString) works in the following String literals:
Ordinary double-quoted Strings: "Hello $world, my name is ${name.toUpperCase()}"
Slashy strings, usually used as regexp literals: /.{$depth}/
Multi-line (triple) double-quoted Strings:
def email = """
Dear ${user}.
Thank you for blablah.
"""

Unanchored substring searching: index vs regex?

I am writing some Perl scripts where I need to do a lot of string matching.
For example:
my $str1 = "this is a test string";
my $str2 = "test";
To see if $str1 contains $str2 - I found that there are 2 approaches:
Approach 1:
use Index function:
if ( index($str1, $str2) != -1 ) { .... }
Approach 2:
use regular expression:
if( $str1 =~ /$str2/ ) { .... }
Which is better, and when should we use each one over the other?
Here are the results of a benchmark:
use Benchmark qw(:all) ;
my $count = -1;
my $str1 = "this is a test string";
my $str2 = "test";
my $str3 = qr/test/;
cmpthese($count, {
'type1' => sub { if ( index($str1, $str2) != -1 ) { 1 } },
'type2' => sub { if( $str1 =~ $str3 ) { 1 } },
});
Result (when a match happens):
           Rate type2 type1
type2 1747627/s    --  -70%
type1 5770465/s  230%    --
To draw a conclusion, also test the case where there is no match:
my $str2 = "text";
my $str3 = qr/text/;
Result (when a match does not happen):
           Rate type2 type1
type2 1857295/s    --  -67%
type1 5560630/s  199%    --
Conclusion:
The index function is much faster than the regexp match.
When I see code that uses index, I usually see an index within an index within an index, etc. There's also more branching: "if found, look for this; otherwise, since not found, look for that." Almost always a single regex would have worked. So, for me, I almost always use a regex unless there's some specific reason to use index.
Unfortunately, most programmers I run into don't read regexes well, so for maintainability's sake the index approach should probably be used more than I use it.
If you need a substring match, use index. If you need a regexp match (with special meaning for regexp metacharacters), use =~. A substring match is usually faster, but regexps in Perl are quite well optimized, and simple regexp matches can be surprisingly fast. Benchmark it for yourself.
Since Perl 5.6, Perl is smart enough to recompile the regexp in $str =~ /$str2/ if and only if $str2 has changed since the last compilation. To fully control when your regexp is compiled, use qr/$str2/. See "Does the 'o' modifier for Perl regular expressions still provide any benefit?" for m/.../o (obsolete) and qr/.../ (not needed most of the time, but can be useful).
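For readers coming from Python, a rough analogue of the two approaches (a sketch, not a benchmark): the in operator and str.find do a plain substring search like index, while re.search interprets metacharacters unless the needle is escaped with re.escape (the counterpart of Perl's \Q...\E). Precompiling with re.compile is loosely analogous to qr//. The function names are hypothetical.

```python
import re


def contains_substring(haystack, needle):
    # Plain substring search, like Perl's index($str1, $str2) != -1
    return haystack.find(needle) != -1  # equivalently: needle in haystack


def contains_regex(haystack, needle):
    # re.escape makes metacharacters in needle literal, like \Q...\E in Perl
    return re.search(re.escape(needle), haystack) is not None


str1 = "this is a test string"
print(contains_substring(str1, "test"), contains_regex(str1, "test"))  # True True
print(contains_substring(str1, "text"), contains_regex(str1, "text"))  # False False

# Unescaped, "." is a metacharacter that matches any character:
print(re.search("t.st", str1) is not None)  # True
print(contains_regex(str1, "t.st"))         # False: "." taken literally

# A precompiled pattern object can be reused explicitly, loosely like qr//.
test_re = re.compile(r"test")
print(test_re.search(str1) is not None)  # True
```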

Passing a parameter to a regular expression to match the first letter in a word in perl

So here is what I'm doing. This is for homework, and I know I can't come on here and get you guys to do my homework for me, but I'm stuck. We have to use Perl (first time ever using it, so forgive my stupidity) to make a function starts_with that takes two parameters, $str0 and $prefix. If $str0 starts with $prefix, the function returns true; if it doesn't, it returns false. Pretty simple. We have to use regular expressions because that is the whole point of the exercise, so here is my code:
sub starts_with
{
$str0 = $_[0];
$prefix = $_[1];
if($prefix =~ /^($str0)/)
{
print $str0."\n";
print m/^(prefix)/."\n";
$startsWith = "Y"
}
if ($startsWith eq "Y")
{
print $str0." starts with ".$prefix."\n";
}
else
{
print $str0." does not start with ".$prefix."\n";
}
}
I'm almost ashamed to put this up here because I have no idea what I'm doing yet, but I am trying to learn. I don't know how to do true/false in Perl; that's why I have the $startsWith variable. You can fix that if you want. The part I need to fix is the line
if(str0 =~ /^($prefix)/)
I also need to find out how to refer to the first letter in $str0... I think.
A couple points without giving away the answer:
1) Arguments to functions are passed in a special array called @_, which is what you are accessing when you write $_[0] and $_[1]. This can be written much more concisely by assigning the argument list (@_) to your variables in list context:
sub starts_with {
my ($str0, $prefix) = @_;
...
}
2) This statement: if($prefix =~ /^($str0)/) tests the exact opposite of the condition you are trying to prove. It asks whether the prefix starts with the value of the variable $str0. What you really want to test is whether $str0 starts with $prefix.
It might also be useful to write your pattern with the match operator m, as in m/PATTERN/, which means "match this pattern".
3) You don't have a return statement in your function. As @M42 points out, the result of the last expression evaluated is returned; since that expression is a print, the function will return true. You probably want to return true or false explicitly.
See if you can use this to get started.
What I would do :
use Modern::Perl; # or use strict; use warnings; use feature qw/say/;
sub starts_with {
# better to use @_, the default array, instead of individual elements of it
# such as $_[0]
my ($str, $pref) = @_;
# very short expression: the pattern match returns a boolean.
# \Q\E is there to treat the prefix as-is (no metacharacters)
return $str =~ /^\Q$pref\E/;
}
# using our function
if (starts_with("foobar", "f")) {
say "TRUE";
}
else {
say "FALSE";
}
Golfing it a bit...
sub starts_with { $_[0] =~ /^\Q$_[1]/ }
Don't hand that version in though :-)
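For comparison, a Python sketch of the same function, assuming the same semantics as the Perl version above: re.escape plays the role of \Q...\E, and re.match only matches at the beginning of the string, so no ^ is needed. (Outside an exercise that requires regexes, str.startswith does this directly.)

```python
import re


def starts_with(string, prefix):
    """True if string begins with prefix, treating prefix literally."""
    # re.match is anchored at the start of the string;
    # re.escape stops characters like "." or "*" in prefix acting as metacharacters.
    return re.match(re.escape(prefix), string) is not None


print("TRUE" if starts_with("foobar", "f") else "FALSE")  # TRUE
print(starts_with("foobar", "bar"))  # False
print(starts_with("a.c", "a."))      # True
print(starts_with("abc", "a."))      # False: the "." is taken literally
```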

Regular expression help: catch this: |TrxId=475665|

For example, I have a string:
MsgNam=WMS.WEATXT|VersionsNr=0|TrxId=475665|MndNr=0257|Werk=0000|WeaNr=0171581054|WepNr=|WeaTxtTyp=110|SpraNam=ru|WeaTxtNr=2|WeaTxtTxt=100 111|
and I want to catch this: |TrxId=475665|
After TrxId= there can be any digits, and any number of them, so the regex should also catch:
|TrxId=111333| and |TrxId=0000011112222| and |TrxId=123|
TrxId=(\d+)
That would give a group (1) with the TrxId.
PS: Use the global modifier.
The regex should look somewhat like this:
TrxId=[0-9]+
It will match TrxId= followed by at least one digit.
An example solution in Python (after import re):
In [107]: data = 'MsgNam=WMS.WEATXT|VersionsNr=0|TrxId=475665|MndNr=0257|Werk=0000|WeaNr=0171581054|WepNr=|WeaTxtTyp=110|SpraNam=ru|WeaTxtNr=2|WeaTxtTxt=100 111|'
In [108]: m = re.search(r'\|TrxId=(\d+)\|', data)
In [109]: m.group(0)
Out[109]: '|TrxId=475665|'
In [110]: m.group(1)
Out[110]: '475665'
/MsgNam\=.*?\|(TrxId\=\d+)\|.*/
For example, in Perl:
$a = "MsgNam=WMS.WEATXT|VersionsNr=0|TrxId=475665|MndNr=0257|Werk=0000|WeaNr=0171581054|WepNr=|WeaTxtTyp=110|SpraNam=ru|WeaTxtNr=2|WeaTxtTxt=100111|";
$a =~ /MsgNam\=.*?\|(TrxId\=\d+)\|.*/;
print $1;
will print TrxId=475665
You know what your delimiters look like, so you don't need a regex; you need split. Here's an implementation in Perl.
use strict;
use warnings;
my $input = "MsgNam=WMS.WEATXT|VersionsNr=0|TrxId=475665|MndNr=0257|Werk=0000|WeaNr=0171581054|WepNr=|WeaTxtTyp=110|SpraNam=ru|WeaTxtNr=2|WeaTxtTxt=100 111|";
my @first_array = split(/\|/,$input); #splitting $input on "|"
#split already discards trailing empty fields (the input ends with "|"),
#but filter out any undefined elements defensively.
@first_array = grep{defined} @first_array;
#Also filter out elements that do not have an equals sign appearing.
@first_array = grep{/=/} @first_array;
#Now, put these elements into an associative array:
my %assoc_array;
foreach(@first_array)
{
if(/^([^=]+)=(.+)$/)
{
$assoc_array{$1} = $2;
}
else
{
#Something weird may be happening...
#we may have an element starting with "=" for example.
#Do what you want: throw a warning, die, silently move on, etc.
}
}
if(exists $assoc_array{TrxId})
{
print "|TrxId=" . $assoc_array{TrxId} . "|\n";
}
else
{
print "Sorry, TrxId not found!\n";
}
The code above yields the expected output:
|TrxId=475665|
Now, obviously this is more complex than some of the other answers, but it's also a bit more robust in that it allows you to search for more keys as well.
This approach does have a potential issue if your keys appear more than once. In that case, it's easy enough to modify the code above to collect an array reference of values for each key.
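A Python sketch of the split-based approach, including the variation where a key may appear more than once, so each key maps to a list of values. The name parse_fields is hypothetical, and it assumes fields are separated by "|" and keys contain no "=". Unlike the Perl version, empty values such as WepNr= are kept as empty strings.

```python
from collections import defaultdict


def parse_fields(record):
    """Split a "key=value|key=value|" record into a dict of value lists."""
    fields = defaultdict(list)
    for item in record.split("|"):
        if "=" not in item:
            continue  # skips the empty string left after the trailing "|"
        key, _, value = item.partition("=")
        if key:  # ignore malformed items such as "=value"
            fields[key].append(value)
    return fields


data = ("MsgNam=WMS.WEATXT|VersionsNr=0|TrxId=475665|MndNr=0257|Werk=0000|"
        "WeaNr=0171581054|WepNr=|WeaTxtTyp=110|SpraNam=ru|WeaTxtNr=2|WeaTxtTxt=100 111|")
fields = parse_fields(data)
if "TrxId" in fields:
    print("|TrxId=" + fields["TrxId"][0] + "|")  # |TrxId=475665|
else:
    print("Sorry, TrxId not found!")
```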

In Perl, how many groups are in the matched regex?

I would like to tell the difference between a number 1 and string '1'.
The reason I want to do this is that I want to determine the number of capturing parentheses in a regular expression after a successful match. According to the perlop documentation, a list (1) is returned when there are no capturing groups in the pattern. So if I get a successful match and a list (1), I cannot tell whether the pattern has no parentheses or has one capturing group that matched a '1'. I can resolve that ambiguity if there is a difference between the number 1 and the string '1'.
You can tell how many capturing groups are in the last successful match by using the special @+ array. $#+ is the number of capturing groups. If that's 0, then there were no capturing parentheses.
For example, bitwise operators behave differently for strings and integers:
~1 = 18446744073709551614 (on a 64-bit perl)
~'1' = Î ('1' = 0x31, ~'1' = ~0x31 = 0xce = 'Î')
#!/usr/bin/perl
($b) = ('1' =~ /(1)/);
print isstring($b) ? "string\n" : "int\n";
($b) = ('1' =~ /1/);
print isstring($b) ? "string\n" : "int\n";
sub isstring {
return ($_[0] & ~$_[0]);
}
isstring returns either 0 (the result of a numeric bitwise op), which is false, or "\0" (the result of a string bitwise op; see perldoc perlop), which is true because it is a non-empty string.
If you want to know the number of capture groups a regex matched, just count them. Don't look at the values they return, which appears to be your problem:
You can get the count by looking at the result of the list assignment, which in scalar context returns the number of items on the right-hand side of the list assignment:
my $count = my @array = $string =~ m/.../g;
If you don't need to keep the capture buffers, assign to an empty list:
my $count = () = $string =~ m/.../g;
Or do it in two steps:
my @array = $string =~ m/.../g;
my $count = @array;
You can also use the @+ or @- arrays, using some of the tricks I show in the first pages of Mastering Perl. These arrays have the starting and ending positions of each of the capture buffers. The values at index 0 apply to the entire pattern, the values at index 1 are for $1, and so on. The last index, then, is the total number of capture buffers. See perlvar.
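For comparison only, Python's re module sidesteps the ambiguity: a compiled pattern reports its number of capturing groups in Pattern.groups (no match needed, unlike $#+), and counting over re.finditer gives the number of matches, roughly the counterpart of my $count = () = ... in Perl. A minimal sketch with hypothetical function names:

```python
import re


def capture_group_count(pattern):
    # Number of capturing groups in the pattern; (?:...) groups are not counted
    return re.compile(pattern).groups


def match_count(pattern, string):
    # Number of non-overlapping matches, like my $count = () = $string =~ m/.../g
    return sum(1 for _ in re.finditer(pattern, string))


print(capture_group_count(r"1"))            # 0
print(capture_group_count(r"(1)"))          # 1
print(capture_group_count(r"(?:a)(b)(c)"))  # 2
print(match_count(r"o", "foo boo"))         # 4
```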
Perl converts between strings and numbers automatically as needed. Internally, it tracks the values separately. You can use Devel::Peek to see this in action:
use Devel::Peek;
$x = 1;
$y = '1';
Dump($x);
Dump($y);
The output is:
SV = IV(0x3073f40) at 0x3073f44
  REFCNT = 1
  FLAGS = (IOK,pIOK)
  IV = 1
SV = PV(0x30698cc) at 0x3073484
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x3079bb4 "1"\0
  CUR = 1
  LEN = 4
Note that the dump of $x has a value in the IV slot, while the dump of $y doesn't but does have a value in the PV slot. Also note that simply using the values in a different context can trigger stringification or numification and populate the other slots. For example, if you did $x . '' or $y + 0 before peeking at the value, you'd get this:
SV = PVIV(0x2b30b74) at 0x3073f44
  REFCNT = 1
  FLAGS = (IOK,POK,pIOK,pPOK)
  IV = 1
  PV = 0x3079c5c "1"\0
  CUR = 1
  LEN = 4
At which point 1 and '1' are no longer distinguishable at all.
Check for the definedness of $1 after a successful match. The logic goes like this:
If the list is empty then the pattern match failed
Else if $1 is defined then the list contains all the captured substrings
Else the match was successful, but there were no captures
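The same three-way check, sketched in Python for comparison: re.search returns None on failure, and a group that did not participate in the match is None, the analogue of an undefined $1. The function name classify and its return strings are hypothetical.

```python
import re


def classify(pattern, string):
    match = re.search(pattern, string)
    if match is None:
        return "no match"
    # Mirror the Perl logic: captures exist only if group 1 exists and is set
    # (match.group(1) would raise IndexError if the pattern had no groups).
    if match.re.groups and match.group(1) is not None:
        return "match with captures: " + repr(match.groups())
    return "match without captures"


print(classify(r"foo", "bar"))       # no match
print(classify(r"foo", "foo"))       # match without captures
print(classify(r"foo(1)?", "foo"))   # match without captures
print(classify(r"foo(1)?", "foo1"))  # match with captures: ('1',)
```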
Your question doesn't make a lot of sense, but it appears you want to know the difference between:
$a = "foo";
@f = $a =~ /foo/;
and
$a = "foo1";
@f = $a =~ /foo(1)?/;
since both return the same thing regardless of whether a capture was made.
The answer is: don't try to use the returned array. Check whether $1 is not equal to "".