Match true or false between strings - regex

I'm trying to create a regex that works with the -match operator. The following already works:
# Exact match
$string1 = [Regex]::Escape('C:\Fruit\kiwi')
$regex = [regex] '^{0}$' -f $string1
'C:\Fruit\kiwi' -match $regex
# Match where trail is allowed ( -like 'c:\folder*')
$string1 = [Regex]::Escape('C:\Fruit\kiwi')
$regex = [regex] '^{0}' -f $string1
'C:\Fruit\kiwi\other folder' -match $regex
Now we're trying to have a match when there is something between two strings, but this fails:
# Match in between strings
$string1 = [Regex]::Escape("C:\Fruit")
$string2 = [Regex]::Escape("\kiwi")
$regex = [regex] '(?is)(?<=\b{0}\b)+?(?=\b{1}\b)' -f $string1, $string2
'C:\Fruit\d\kiwi' -match $regex
According to the docs:
'*' matches 0 or more times
'+' matches 1 or more times
'?' matches 1 or 0 times
'*?' matches 0 or more times, but as few as possible
'+?' matches 1 or more times, but as few as possible
So I was expecting that anything between C:\Fruit and \kiwi would result in true but this is not the case. What are we doing wrong? We're only after true false, because in the end we will glue these pieces together like regex1|regex2|...

You may fix your current code by using
$regex = [regex] "(?is)(?<=$string1).+?(?=$string2)"
Here,
.+? matches one or more characters, as few as possible. A quantifier has to be applied to a consuming pattern, not to a lookbehind.
Double quotation marks form an expandable string literal, which supports string interpolation. For more options, see How to use a variable as part of a regular expression in PowerShell?
Remove the \b word boundaries, as they are context-dependent (you may add further restrictions later if need be).
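The same idea can be sketched outside PowerShell. The following is a minimal Python illustration, not the answer's own code: `between_regex` is a hypothetical helper name, `re.escape` stands in for `[Regex]::Escape`, and the flags mirror `(?is)`. It assumes Python's `re` module treats this pattern the same way the .NET engine does, which should hold here because the escaped lookbehind has a fixed width.

```python
import re

def between_regex(start, end):
    """Build a pattern that matches at least one character between two literals."""
    # re.escape plays the role of [Regex]::Escape (assumption: same intent).
    return re.compile(
        "(?<=" + re.escape(start) + ").+?(?=" + re.escape(end) + ")",
        re.IGNORECASE | re.DOTALL,  # equivalent of the inline (?is) options
    )

pattern = between_regex("C:\\Fruit", "\\kiwi")

# Something sits between the two literals, so this matches.
print(pattern.search("C:\\Fruit\\d\\kiwi") is not None)  # True
# Nothing sits between them, so .+? has nothing to consume.
print(pattern.search("C:\\Fruit\\kiwi") is not None)     # False
```

The quantifier is attached to `.`, which consumes characters; the lookbehind and lookahead only assert what surrounds the consumed text.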

Related

Need a regex where two different substrings must not be included in a string

I have the following strings:
$string = @(
'Get-WindowsDevel'
'Put-WindowsDevel'
'Get-LinuxDevel'
'Put-LinuxDevel'
)
Now I need one regex with the following two rules:
$string must not start with "Get-"
$string must not contain "Linux"
This exclude the "Get-" at the beginning:
PS C:\> $string | Where-Object { $_ -match "^(?!Get-).*" }
Put-WindowsDevel
Put-LinuxDevel
I would expect that the following command does not match "Put-LinuxDevel" but it does:
PS C:\> $string | Where-Object { $_ -match "^(?!Get-).*(?!Linux)" }
Put-WindowsDevel
Put-LinuxDevel
So, what I need is a regex that is valid for this string only:
Put-WindowsDevel
Use -notmatch (or, if case-sensitive matching is needed, -cnotmatch) - i.e., a negated match - in combination with alternation (|):
PS> $string -notmatch '(^Get-|Linux)'
Put-WindowsDevel
The -match operator and its variations (as well as many other operators) can directly act on arrays as the LHS, in which case the operator acts as a filter and returns only matching elements.
Using -match on an array is much faster than using the Where-Object cmdlet in a pipeline for filtering.
Regex (^Get-|Linux) matches either Get- at the start of the string (^) or (|) substring Linux anywhere in the string.
In other words, this regex matches the strings that you don't want, and the negated form of -match, -notmatch, excludes those strings, as desired.
If you really want to express your regex as a positive match:
PS> $string -match '^(?!Get)((?!Linux).)*$'
Put-WindowsDevel
Note, however, that not only is this regex much more complex, it will also perform worse (albeit only slightly).
As for what you tried:
The .*(?!Linux) part of your regex - involving a negative lookahead assertion ((?!...)) - is not effective at excluding strings that contain substring Linux; e.g.:
PS> 'Linux' -match '.*(?!Linux)'
True # !!
The reason is that .* matches the entire string and then looks ahead to see if Linux isn't there - which is obviously true at the end of the string.
To effectively rule out a substring, the assertion must be applied around each character of the entire string:
PS> '', 'inux', 'Linux', 'a Linux', 'aLinuxb' -match '^((?!Linux).)*$'
# '' (empty string) matched
inux # 'inux' matched
Note how 'Linux', 'a Linux', and 'aLinuxb' were correctly excluded.
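For comparison, here is a small Python sketch of the three patterns discussed above. It is only an illustration under stated assumptions: the variable names are made up, and Python's `re` is case-sensitive by default, whereas PowerShell's -match is not, so the inputs use the exact casing from the question.

```python
import re

names = ["Get-WindowsDevel", "Put-WindowsDevel", "Get-LinuxDevel", "Put-LinuxDevel"]

# Negated match: describe what is unwanted, then keep everything else.
unwanted = re.compile(r"^Get-|Linux")
negated = [n for n in names if not unwanted.search(n)]

# Positive match: the lookahead is re-checked before every single character.
tempered = re.compile(r"^(?!Get-)((?!Linux).)*$")
positive = [n for n in names if tempered.search(n)]

# Naive attempt: .* consumes the whole string first, so the lookahead is
# only tested at the very end, where "Linux" can no longer follow.
naive = re.compile(r".*(?!Linux)")

print(negated)                       # ['Put-WindowsDevel']
print(positive)                      # ['Put-WindowsDevel']
print(bool(naive.search("Linux")))   # True
```

Both filters give the same result; the negated form is simply easier to read and extend.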
this seems to do what you seek [grin] ...
$StringList = @(
'Get-WindowsDevel'
'Put-WindowsDevel'
'Get-LinuxDevel'
'Put-LinuxDevel'
)
$ExcludeList = @(
'^get'
'linux'
)
$RegexExcludeList = $ExcludeList -join '|'
$StringList -notmatch $RegexExcludeList
output ...
Put-WindowsDevel

Replace text after special character

I have a string in which the numeric part should be replaced with text. In my case the variable is:
$string = '18.3.0-31290741.41742-1'
I want to replace everything after the first '-' with "SNAPSHOT", so that echo $string shows the result below. I tried LastIndexOf(), Trim() and other things, but couldn't work out how to do it.
Expected result:
PS> echo $string
18.3.0-SNAPSHOT
The following may be a step in the right direction, but when the string contains two '-' characters it replaces after the last one rather than the first:
$string = "18.3.0-31290741.41742-1" -replace '(.*)-(.*)', '$1-SNAPSHOT'
.* is a greedy match, meaning it will produce the longest matching (sub)string. In your case that would be everything up to the last hyphen. You need either a non-greedy match (.*?) or a pattern that won't match hyphens (^[^-]*).
Demonstration:
PS C:\> '18.3.0-31290741.41742-1' -replace '(^.*?)-.*', '$1-SNAPSHOT'
18.3.0-SNAPSHOT
PS C:\> '18.3.0-31290741.41742-1' -replace '(^[^-]*)-.*', '$1-SNAPSHOT'
18.3.0-SNAPSHOT
By using a positive lookbehind assertion ((?<=...)) you could eliminate the need for a capturing group and backreference:
PS C:\> "18.3.0-31290741.41742-1" -replace '(?<=^.*?-).*', 'SNAPSHOT'
18.3.0-SNAPSHOT
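The difference between the greedy and non-greedy forms can also be checked in a short Python sketch. This is an illustration only; it assumes Python's `re.sub` behaves like -replace for these patterns, and it omits the lookbehind variant because Python's `re` does not accept variable-width lookbehinds.

```python
import re

version = "18.3.0-31290741.41742-1"

# Greedy (.*) runs to the LAST hyphen, so only the final "1" is replaced.
greedy = re.sub(r"(.*)-(.*)", r"\1-SNAPSHOT", version)

# Lazy (.*?) stops at the FIRST hyphen.
lazy = re.sub(r"^(.*?)-.*", r"\1-SNAPSHOT", version)

# A negated character class cannot cross a hyphen at all.
negated_class = re.sub(r"^([^-]*)-.*", r"\1-SNAPSHOT", version)

print(greedy)         # 18.3.0-31290741.41742-SNAPSHOT
print(lazy)           # 18.3.0-SNAPSHOT
print(negated_class)  # 18.3.0-SNAPSHOT
```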
You could use Select-String and a regular expression to match everything up to and including the first hyphen, then pass the match to ForEach-Object (commonly shortened to its alias %) to construct the final string:
$string = "18.3.0-31290741.41742-1" | Select-String -Pattern '^[^-]*-' | % { "$($_.Matches.Value)SNAPSHOT" }
$string

How to extract sub-pattern from regex match

When I run either of the regex match commands below:
'abc123' -match '(\d+)|(\w+)|(abc123)|(25)'
or
[regex]::matches('abc123', '(\d+)|(\w+)|(abc123)|(25)')
is there a way for me to extract the matching sub-pattern? In this case it would be the third capture block: 'abc123'
As far as I'm aware, you can't directly get the part of the regex that matched your string. If you build the regex from a list of alternatives, though, you can easily automate it:
$ToMatch = 'abc123FOO'
$PossibleMatches = @('\d+','\w+','abc123.+','25')
$JoinOn = ')|('
$Regex = "($($PossibleMatches -join $JoinOn))"
$CaughtGroup = [Regex]::Matches($ToMatch,$Regex).Groups | ? {$_.Success -and $_.Name -ne '0'}
$CaughtIndex = [int]$CaughtGroup.Name
$CaughtMatch = $PossibleMatches[$CaughtIndex - 1]  # capture groups are numbered from 1, array indexes from 0
"Matched Group $($CaughtIndex) '$($CaughtMatch)'"
will give you
Matched Group 2 '\w+'
If this isn't OK for you (e.g. your regexes vary wildly), you might want to break up the program flow and try matching against an array of possible patterns first.
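A comparable approach can be sketched in Python, where the match object reports the number of the group that participated. This is a hedged illustration: `matched_alternative` is a hypothetical helper, and it assumes none of the alternatives contains capturing groups of its own, otherwise the group numbers no longer line up with the list. Note that capture groups are numbered from 1 while list indexes start at 0.

```python
import re

def matched_alternative(alternatives, text):
    """Return (group_number, alternative) for the first alternative that matched."""
    # Wrap each alternative in its own capture group: (a)|(b)|(c)
    combined = "|".join("(" + alt + ")" for alt in alternatives)
    m = re.search(combined, text)
    if m is None:
        return None
    # lastindex is the 1-based number of the group that matched.
    return m.lastindex, alternatives[m.lastindex - 1]

alternatives = [r"\d+", r"\w+", r"abc123.+", r"25"]
print(matched_alternative(alternatives, "abc123FOO"))
```

Alternation is tried left to right at each position, so for 'abc123FOO' the engine settles on `\w+` before `abc123.+` is ever attempted; ordering the list from most to least specific changes which alternative is reported.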

Regex greediness REasking

I have this text $line = "config.txt.1", and I want to match it with regex and extract the number
part of it. I am using two versions:
$line = "config.txt.1";
(my $result) = $line =~ /(\d*).*/; #ver 1, matched, but returns nothing
(my $result) = $line =~ /(\d).*/; #ver 2, matched, returns 1
(my $result) = $line =~ /(\d+).*/; #ver 3, matched, returns 1
I think the * is what's messing things up. I have been looking at this, but I still
don't understand the greedy mechanism in the regex engine. If I start from the left of the regex, there might be no digits at that point in the text, so ver 1 will match anyway. But
ver 3 wouldn't match there. Can someone explain why that is, and how
I should write what I want? (potentially with a number, not necessarily a single digit)
Edit
Requirement: there may be a number, not necessarily a single digit; the match may capture nothing, but it should not fail
The output must be as follows (for the above example):
config.txt 1
The regex /(\d*).*/ always matches immediately, because it can match zero characters. It translates to match as many digits at this position as possible (zero or more). Then, match as many non-newline characters as possible. Well, the match starts looking at the c of config. Ok, it matches zero digits.
You probably want to use a regex like /\.(\d+)$/ -- this matches an integer number between a period . and the end of string.
Use the literal '.' as a reference to match before the number:
#!/usr/bin/perl
use strict;
use warnings;
my @line = qw(config.txt file.txt config.txt.1 config.foo.2 config.txt.23 differentname.fsdfsdsdfasd.2444);
my (@capture1, @capture2);
foreach (@line){
    my (@filematch) = ($_ =~ /(\w+\.\w+)/);
    my (@numbermatch) = ($_ =~ /\w+\.\w+\.?(\d*)/);
    my $numbermatch = $numbermatch[0] // $numbermatch[1];
    push @capture1, @filematch;
    push @capture2, @numbermatch;
}
print "$capture1[$_]\t$capture2[$_]\n" for 0 .. $#capture1;
Output:
config.txt
file.txt
config.txt 1
config.foo 2
config.txt 23
differentname.fsdfsdsdfasd 2444
Thanks guys, I think I figured out myself what I want:
my ($match) = $line =~ /\.(\d+)?/; #this will match and capture any digit
#number if there was one, and not fail
#if there wasn't one
To capture all digits following a final . and not fail the match if the string doesn't end with digits, use /(?:\.(\d+))?$/
perl -E 'if ("abc.123" =~ /(?:\.(\d+))?$/) { say "matched $1" } else { say "match failed" }'
matched 123
perl -E 'if ("abc" =~ /(?:\.(\d+))?$/) { say "matched $1" } else { say "match failed" }'
matched
You do not need .* at all. These two statements assign the exact same number:
my ($match1) = $str =~ /(\d+).*/;
my ($match1) = $str =~ /(\d+)/;
A regex by default matches partially, you do not need to add wildcards.
The reason your first match does not capture a number is because * can match zero times as well. And since it does not have to match your number, it does not. Which is why .* is actually detrimental in that regex. Unless something is truly optional, you should use + instead.
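The behaviour described in these answers can be reproduced in a short Python sketch. It is an illustration only, assuming Python's `re` handles these patterns like Perl does; `trailing_number` is a hypothetical helper name.

```python
import re

line = "config.txt.1"

# \d* is satisfied with zero digits at position 0, so the capture is empty.
star = re.search(r"(\d*).*", line).group(1)

# \d+ needs at least one digit, so the engine moves forward until it finds one.
plus = re.search(r"(\d+)", line).group(1)

# Anchoring on the final period and the end of the string is more precise.
anchored = re.search(r"\.(\d+)$", line).group(1)

def trailing_number(name):
    """Return the digits after a final '.', or None; the match itself never fails."""
    m = re.search(r"(?:\.(\d+))?$", name)
    return m.group(1)

print(repr(star), plus, anchored)   # '' 1 1
print(trailing_number("abc.123"))   # 123
print(trailing_number("abc"))       # None
```

Making the whole `\.(\d+)` group optional, rather than only the digits, is what lets the match succeed with an empty capture when there is no trailing number.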

How can I extract a substring up to the first digit?

How can I find the first substring until I find the first digit?
Example:
my $string = 'AAAA_BBBB_12_13_14' ;
Result expected: 'AAAA_BBBB_'
Judging from the tags you want to use a regular expression. So let's build this up.
We want to match from the beginning of the string so we anchor with a ^ metacharacter at the beginning
We want to match anything but digits so we look at the character classes and find out this is \D
We want 1 or more of these so we use the + quantifier which means 1 or more of the previous part of the pattern.
This gives us the following regular expression:
^\D+
Which we can use in code like so:
my $string = 'AAAA_BBBB_12_13_14';
$string =~ /^\D+/;
my $result = $&;
Most people got half of the answer right, but they missed several key points.
You can only trust the match variables after a successful match. Don't use them unless you know you had a successful match.
The $&, $`, and $' variables have well-known performance penalties across all regexes in your program.
You need to anchor the match to the beginning of the string. Since Perl now has user-settable default match flags, you want to stay away from the ^ beginning of line anchor. The \A beginning of string anchor won't change what it does even with default flags.
This would work:
my $substring = $string =~ m/\A(\D+)/ ? $1 : undef;
If you really wanted to use something like $&, use Perl 5.10's per-match version instead. The /p switch provides non-global-performance-sucking versions:
my $substring = $string =~ m/\A\D+/p ? ${^MATCH} : undef;
If you're worried about what might be in \D, you can specify the character class yourself instead of using the shortcut:
my $substring = $string =~ m/\A[^0-9]+/p ? ${^MATCH} : undef;
I don't particularly like the conditional operator here, so I would probably use the match in list context:
my( $substring ) = $string =~ m/\A([^0-9]+)/;
If there must be a number in the string (so that you don't match an entire string that has no digits), you can throw in a lookahead, which won't be part of the capture:
my( $substring ) = $string =~ m/\A([^0-9]+)(?=[0-9])/;
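The two variants, with and without the lookahead, can be compared in a small Python sketch. This is an illustration under assumptions: `prefix_before_first_digit` and its `require_digit` parameter are hypothetical names, and the result is `None` where the Perl code would leave the variable undefined.

```python
import re

def prefix_before_first_digit(text, require_digit=False):
    """Return the leading run of non-digits, or None if there is none."""
    if require_digit:
        # The lookahead insists that a digit follows but does not capture it.
        pattern = r"\A([^0-9]+)(?=[0-9])"
    else:
        pattern = r"\A([^0-9]+)"
    m = re.search(pattern, text)
    # Only read the capture after a successful match.
    return m.group(1) if m else None

print(prefix_before_first_digit("AAAA_BBBB_12_13_14"))          # AAAA_BBBB_
print(prefix_before_first_digit("no digits here"))              # no digits here
print(prefix_before_first_digit("no digits here", True))        # None
```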
$str =~ /(\d)/; print $`;
This prints the part of the string that precedes the match.
perl -le '$string=q(AAAA_BBBB_12_13_14);$string=~m{(\D+)} and print $1'
AAAA_BBBB_