PowerShell Regex match statement

I'm trying to get the nxxxxx number as the output from the input below:
uniqueMember: uid=n039833,ou=people,ou=networks,o=test,c=us
uniqueMember: uid=N019560, ou=people, ou=Networks, o=test, c=Us
I tried:
[Regex]::Matches($item, "uid=([^%]+)\,")
but this gives:
Groups : {0, 1}
Success : True
Name : 0
Captures : {0}
Index : 14
Length : 43
Value : uid=N018315,ou=people,ou=Networks,o=test,
Success : True
Name : 1
Captures : {1}
Index : 18
Length : 38
Value : N018315,ou=people,ou=Networks,o=test
Any help improving the match statement would be appreciated.

You can use
[Regex]::Matches($s, "(?<=uid=)[^,]+").Value
To save the results in a variable (avoid naming it $matches, which is an automatic variable that the -match operator overwrites):
$ids = [Regex]::Matches($s, "(?<=uid=)[^,]+").Value
Output:
n039833
N019560
Details:
(?<=uid=) - a positive lookbehind that requires uid= text to appear immediately to the left of the current location
[^,]+ - one or more chars other than a comma.
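The same lookbehind pattern also works with Select-String, which can be handy when the lines come from a file (for example Get-Content on a hypothetical members.ldif). This sketch simply feeds it the question's two lines:

```powershell
# The question's two sample lines
$lines = @(
    'uniqueMember: uid=n039833,ou=people,ou=networks,o=test,c=us'
    'uniqueMember: uid=N019560, ou=people, ou=Networks, o=test, c=Us'
)
# Select-String applies the pattern line by line; because the lookbehind
# does not consume "uid=", each match's Value is just the ID itself
$ids = $lines |
    Select-String -Pattern '(?<=uid=)[^,]+' |
    ForEach-Object { $_.Matches[0].Value }
$ids
```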

You can use a capture group and exclude the comma from the character class; if you also don't want to match %, you can exclude that too.
$s = "uniqueMember: uid=n039833,ou=people,ou=networks,o=test,c=us`nuniqueMember: uid=N019560, ou=people, ou=Networks, o=test, c=Us"
[regex]::Matches($s, 'uid=([^,]+)') | ForEach-Object { $_.Groups[1].Value }
Output
n039833
N019560
Note that your current pattern requires a trailing comma. If one is not always present, you can omit it from the pattern. If you only want to exclude commas, the pattern is:
uid=([^,]+)
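If you prefer PowerShell's -match operator to [regex]::Matches, a line-by-line sketch could look like this. Each successful -match fills the automatic $Matches table, so $Matches[1] is the capture group. The here-string input just reuses the question's two lines:

```powershell
$text = @'
uniqueMember: uid=n039833,ou=people,ou=networks,o=test,c=us
uniqueMember: uid=N019560, ou=people, ou=Networks, o=test, c=Us
'@
# Split on LF or CRLF, then test each line separately
$ids = foreach ($line in ($text -split '\r?\n')) {
    # Group 1 = everything between "uid=" and the next comma
    if ($line -match 'uid=([^,]+)') { $Matches[1] }
}
$ids
```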

Related

Regex to capture a variable number of items?

I am trying to use a regex to capture values from SPACE-delimited items. Yes, I know that I could use [string]::Split() or -split. The goal is to use a regex so that it fits into another, larger regex.
There is a variable number of items in the string; in this example there are four (4). The resulting $m variable has the full string in every Value member. I also tried the regex '^((.*)\s*)+', but that resulted in '' for all except the first .Value.
How can I write a regex to capture a variable number of items?
PS C:\src\t> $s = 'now is the time'
PS C:\src\t> $m = [regex]::Matches($s, '^((.*)\s*)')
PS C:\src\t> $m
Groups : {0, 1, 2}
Success : True
Name : 0
Captures : {0}
Index : 0
Length : 15
Value : now is the time
ValueSpan :
PS C:\src\t> $m.Groups.Value
now is the time
now is the time
now is the time
PS C:\src\t> $PSVersionTable.PSVersion.ToString()
7.2.2
You can use [regex]::Match() to find the first matching substring, then call NextMatch() to advance through the input string until no further matches can be made.
I've taken the liberty of simplifying the expression to \S+ (consecutive non-whitespace characters):
$string = 'now is the time'
$regex = [regex]'\S+'
$match = $regex.Match($string)
while ($match.Success) {
    Write-Host "Match at index [$($match.Index)]: '$($match.Value)'"
    # advance to the next match, if any
    $match = $match.NextMatch()
}
Which will print:
Match at index [0]: 'now'
Match at index [4]: 'is'
Match at index [7]: 'the'
Match at index [11]: 'time'
Mathias' answer shows an iterative approach to retrieving all matches, which may or may not be needed.
Building on your own attempt to use [regex]::Matches(), the solution is as simple as:
$s = 'now is the time'
[regex]::Matches($s, '\S+').Value # -> @('now', 'is', 'the', 'time')
As noted, \S+ matches any non-empty run (+) of non-whitespace characters (\S).
Thanks to member-access enumeration, accessing the .Value property on the method call's result (a collection of System.Text.RegularExpressions.Match instances) returns each instance's .Value property; with two or more instances, the result is an array of values.
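Since the goal is to embed the item matching inside a larger regex, it may also help that .NET records every capture of a repeated group, not only the last one (many other regex engines keep just the last). A sketch using Group.Captures on the question's sample string:

```powershell
$s = 'now is the time'
# (\S+) sits inside a repeated group, so it is captured once per item.
# .NET keeps all of those captures in Groups[1].Captures;
# Groups[1].Value on its own only reports the last one.
$m = [regex]::Match($s, '^(?:(\S+)\s*)+$')
$items = $m.Groups[1].Captures | ForEach-Object { $_.Value }
$items               # now, is, the, time (one per line)
$m.Groups[1].Value   # time
```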
I guess the following will work for you
[^\s]+
[^\s] means "not a whitespace character" (space, tab, newline, and so on)
+ means one or more such characters
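A quick check that [^\s]+ and \S+ return the same items, including when the separators are tabs or newlines rather than spaces (the mixed-separator input is made up for illustration):

```powershell
# Sample input with a space, a tab and a newline as separators
$s = "now is`tthe`ntime"
$a = [regex]::Matches($s, '[^\s]+').Value   # negated whitespace class
$b = [regex]::Matches($s, '\S+').Value      # shorthand non-whitespace class
$a -join '|'                                # now|is|the|time
Compare-Object $a $b                        # no output: the two lists are identical
```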

How do I pass a RegEx token to a PowerShell subexpression in a RegEx substitution?

I have the following code:-
'\u0026' -replace '(\\u)(\d{4})', '$$([char]0x$2)'
That will obviously result in:-
$([char]0x0026)
If I make the RegEx substitution into an expandable string with:-
'\u0026' -replace '(\\u)(\d{4})', "$([char]0x`${2})"
Then I will get:-
Unexpected token '0x`$' in expression or statement.
If I simplify things to:-
'\u0026' -replace '(\\u)(\d{4})', "0x`${2}"
Then I can get:-
0x0026
But what I want is to cast that '0x0026' to a char so that '\u0026' is replaced with '&'. However, it seems impossible to pass a RegEx-substituted token to a PowerShell subexpression in this way. If you separate the two languages with:-
'\u0026' -replace '(\\u)(\d{4})', "$([char]0x0026) 0x`${2}"
Then the below will result:-
& 0x0026
Which is great, as the converted ampersand shows that PowerShell subexpressions do work in RegEx substitutions.
I am new to RegEx. Have I hit my limit already?
Apparently, you want to unescape an escaped regular expression. You can do this using the .NET [regex]::Unescape() method:
[Regex]::Unescape('Jack\u0026Jill')
Yields:
Jack&Jill
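Regex.Unescape is not limited to \uXXXX sequences; it also converts other regex escapes such as \t. A small sketch with a made-up input string:

```powershell
# Hypothetical input mixing \uXXXX escapes with a \t escape
$escaped = 'Tom \u0026 Jerry\u0021\tdone'
$plain = [regex]::Unescape($escaped)
$plain   # Tom & Jerry!<TAB>done
```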
There's a way in PowerShell 7: the second operand of -replace can be a scriptblock. Getting the second capture group takes a bit more work, using $_:
'\u0026' -replace '(\\u)(\d{4})', { $b = $_ }
$b.groups
Groups : {0, 1, 2}
Success : True
Name : 0
Captures : {0}
Index : 0
Length : 6
Value : \u0026
Success : True
Name : 1
Captures : {1}
Index : 0
Length : 2
Value : \u
Success : True
Name : 2
Captures : {2}
Index : 2
Length : 4
Value : 0026
'\u0026' -replace '(\\u)(\d{4})', { [char][int]('0x' + $_.groups[2]) }
&
Note that \d won't match the hex digits a-f, and the POSIX class [[:xdigit:]] doesn't work in .NET regex. An explicit character class does:
'\u002b' -replace '(\\u)([0-9a-f]{4})', { [char][int]('0x' + $_.groups[2]) }
+
Use a scriptblock substitution (6.2 and up):
'\u0026' -replace '(\\u)(\d{4})', {"0x$($_.Groups[2].Value)"}
In earlier versions of PowerShell you can do the same by calling [Regex]::Replace():
[regex]::Replace('\u0026', '(\\u)(\d{4})', {param($m) "0x$($m.Groups[2].Value)"})
In both cases, the block will act as a callback for every single match, allowing you to construct the replacement string after getting access to the matched substring(s), but before the substitution takes place:
PS ~> [regex]::Replace('\u0026', '(\\u)(\d{4})', {param($m) "0x$($m.Groups[2].Value)"})
0x0026
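To get the actual character rather than the text 0x0026, the callback can convert the captured hex digits itself. A sketch using [regex]::Replace (so it also runs in Windows PowerShell), assuming every escape has exactly four hex digits:

```powershell
$escaped = 'Jack\u0026Jill\u0021'
# The scriptblock is converted to a MatchEvaluator delegate; $m is each Match.
# [regex]::Replace is case-sensitive by default, hence both a-f and A-F.
$plain = [regex]::Replace($escaped, '\\u([0-9a-fA-F]{4})', {
    param($m)
    # Parse the four hex digits and return the character as a string
    [string][char][Convert]::ToInt32($m.Groups[1].Value, 16)
})
$plain   # Jack&Jill!
```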

Using Regex for complicated naming convention

I'm writing a PowerShell script that validates computer names against a naming convention, following documentation my client provided. My first approach was to break each computer name into parts and process each part, but I ran into issues near the end. I was thinking it would be easier to use a single regex to ensure the name is valid. Here is what I am trying to do:
Naming rules:
Required: 1st is an alpha character; must be: A, B, or C
Required: 2nd is an alpha character; must be: L, K, or H
Required: The next 4 characters must be alpha and match one of: DTLB, SOCK, or NUUB
Required: The next 6 characters are digits, but the first part can be: 1, 8, 13, or 83; the rest doesn't matter as long as they are digits
Optional: The next two characters can be alpha or alphanumeric but must be one of: PE, Y1, or AC
Here are some tests with a regex that I understand, which is basic but works. It doesn't validate the actual characters or positions; it just checks the first 12 characters and whether there are two more characters at the end:
$regex = '^\w{12}(?:\w{2})$|^\w{12}$'
'AKDTLB123456' -match $regex; $Matches
True
'ALSOCK834561PE' -match $regex; $Matches
True
What I am trying to do is split the name into named parts and determine whether each value matches the right part, like this (this is an example):
$regex = '(<type>^\w{1}[ABC])(?<form>\w{1,1}[LKH])(<locale>\w{4,4}[DTLB|SOCK|NUUB])(<identifier>\d{1,1}[1|8]|<identifier>\d{1,2}[13|83])(<unique>\d+)(<role>\w{2,2}[PE|AC]$|<role>\w{1}[Y]\d{1}$)'
My goal is to get it to output like this:
Example 1:
'AKDTLB893456' -match $regex; $Matches
True
Name Value
---- -----
type A
form K
locale DTLB
identifier 1
Unique 23456
0 AKDTLB893456
Example 2:
'ALSOCK123456PE' -match $regex; $Matches
True
Name Value
---- -----
type A
form L
locale SOCK
identifier 1
Unique 23456
0 ALSOCK123456PE
Example 3:
'ALSUCK123456PE' -match $regex; $Matches
False
Name Value
---- -----
type A
form L
locale <--not matched
identifier 1
Unique 23456
0 ALSuCK123456PE
The best I can do is:
$regex = '(^\w{1})(\w{1,1})(\w{4,4})(\d{2}[13|83]|\d{1}[1|2|3])(\d{4,5})(\w{2,2}$|$)'
'ALSOCK124561PE' -match $regex; $Matches
True
Name Value
---- -----
6 PE
5 4561
4 12
3 SOCK
2 L
1 A
0 ALSOCK834561PE
However, this doesn't check whether the identifier is just 1 (as in 112345) rather than 13.
I have been all over the internet and have used an online regex tool, but I am unable to come up with a solution. It may not be possible to do both. I find that if the result is false, it doesn't output the parts that matched and flag the rest. Is there a way to get "why it doesn't match" as output?
Any ideas?
You may use
^(?<type>[ABC])(?<form>[LKH])(?<locale>DTLB|SOCK|NUUB)(?<identifier>[18](3)?)(?<unique>\d{4}(?(1)|\d))(?<role>PE|Y1|AC)?$
Note: If the pattern must be matched in a case-sensitive way, replace -match with -cmatch.
See the .NET regex demo (do not test at regex101.com!)
Details
^ - start of a string
(?<type>[ABC]) - Group "type": A, B or C
(?<form>[LKH]) - Group "form": L, K or H
(?<locale>DTLB|SOCK|NUUB) - Group "locale": DTLB, SOCK or NUUB
(?<identifier>[18](3)?) - Group "identifier": 1 or 8 and then an optional 3 digit captured into Group 1
(?<unique>\d{4}(?(1)|\d)) - Group "unique": four digits and if Group 1 did not match, one more digit is required to match then
(?<role>PE|Y1|AC)? - an optional Group "role": PE, Y1 or AC
$ - end of string.
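One way to try the pattern is to run it over the sample names from the question and collect the named groups into objects. This is a sketch: the result object's shape and the use of -cmatch are illustration choices, not part of the answer.

```powershell
# Pattern copied from the answer above
$regex = '^(?<type>[ABC])(?<form>[LKH])(?<locale>DTLB|SOCK|NUUB)(?<identifier>[18](3)?)(?<unique>\d{4}(?(1)|\d))(?<role>PE|Y1|AC)?$'

# Sample names taken from the question
$names = 'AKDTLB893456', 'ALSOCK123456PE', 'ALSuCK123456PE', 'ALSOCK834561PE'

$results = foreach ($name in $names) {
    # -cmatch is case-sensitive, so a lowercase letter fails the check
    $ok = $name -cmatch $regex
    # $Matches is left unchanged by a failed match, so only read it on success
    $identifier = $null; $unique = $null; $role = $null
    if ($ok) {
        $identifier = $Matches['identifier']
        $unique     = $Matches['unique']
        $role       = $Matches['role']   # $null when the optional suffix is absent
    }
    [pscustomobject]@{
        Name       = $name
        Valid      = $ok
        Identifier = $identifier
        Unique     = $unique
        Role       = $role
    }
}
$results | Format-Table
```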
The second part of your question asked if you could determine where the match failed. You can do something similar to below to test each part of the regex in sections starting from the left:
$strings = 'AKDTLB893456','ALSOCK123456PE','ALSuCK123456PE','ALSOCK123456R2'
$regexes = [ordered]@{}
$regexes.Add('type','^(?<type>[ABC])')
$regexes.Add('form','(?<form>[LKH])')
$regexes.Add('locale','(?<locale>DTLB|SOCK|NUUB)')
$regexes.Add('identifier','(?<identifier>[18])')
$regexes.Add('unique','(?<unique>\d{5})')
$regexes.Add('optional','(?<optional>PE|Y1|AC)?$')
foreach ($string in $strings) {
    $test = [text.stringbuilder]''
    $regexes.GetEnumerator() | ForEach-Object {
        $null = $test.Append($_.Value)
        if (!($string -match $test.ToString())) {
            "$String failed at $($_.Key)"
            continue
        }
    }
    $matches
}
Explanation:
$regexes is an ordered hash table to which we can add key-value pairs. Order matters here because we want to test matching from left to right in your string.
With a [Text.StringBuilder] object, we can append strings with the Append() method. The idea is to append each new regex fragment you want to test to the pattern built so far. If $string keeps matching through every value of the hash table, $matches is output. Otherwise, the first failed match outputs the key of the fragment being tested.
Note that this will not perform as well as a single -match operation. You could test the entire match first and only perform sectional testing when a $false is returned. That will increase performance.
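The "full match first, sections only on failure" idea from the note above could be wrapped in a function. Test-ComputerName is a hypothetical name, and the fragments are copied from the answer's hash table:

```powershell
# Hypothetical helper: one full -match first, section-by-section only on failure
function Test-ComputerName {
    param([string]$Name)

    # Same fragments as the answer above, in left-to-right order
    $sections = [ordered]@{
        type       = '^(?<type>[ABC])'
        form       = '(?<form>[LKH])'
        locale     = '(?<locale>DTLB|SOCK|NUUB)'
        identifier = '(?<identifier>[18])'
        unique     = '(?<unique>\d{5})'
        optional   = '(?<optional>PE|Y1|AC)?$'
    }

    # Fast path: a single match against the concatenated pattern
    $full = @($sections.Values) -join ''
    if ($Name -match $full) { return 'OK' }

    # Slow path: grow the pattern until the first section that stops matching
    $pattern = ''
    foreach ($key in $sections.Keys) {
        $pattern += $sections[$key]
        if ($Name -notmatch $pattern) { return "failed at $key" }
    }
}

'AKDTLB893456', 'ALSuCK123456PE', 'ALSOCK123456R2' |
    ForEach-Object { "$_ : $(Test-ComputerName $_)" }
```

Because the last sectional pattern equals the full pattern, a name that fails the fast path always gets a "failed at" result.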

Search group in big text with regex

I need to find the groups in a big text, given:
A word that defines the start of a group
A word contained in the group
A word that defines the end of the group
The start word is: begin
The contained word is: 536916223
The end word is: end
In the text at the bottom, I need to find 2 groups.
I have tried to use:
\bbegin.*(\n*.*)*536916223(\n*.*)*\bbegin
but when I try the previous regex on the site "http://regexr.com/"
it times out... so I think the regex is not very good :(
The text is:
begin active link
export-version : 11
actlink-order : 2
wk-conn-type : 1
schema-name : HelpDesk
actlink-mask : 1
actlink-control: 750000002
enable : 1
action {
set-field : 0\536916222\101\4\1\1\
}
errhandler-name:
end
begin active link
export-version : 11
actlink-order : 2
wk-conn-type : 1
schema-name : HelpDesk
actlink-mask : 1
actlink-control: 610000092
enable : 1
permission : 0
action {
id : 536916223
focus : 0
access-opt : 1
option : 0
}
action {
set-field : 0\536916222\101\4\1\1\
}
errhandler-opt : 0
errhandler-name:
end
begin active link
actlink-order : 12
wk-conn-type : 1
schema-name : HelpDesk
actlink-mask : 2064
enable : 1
permission : 0
action {
id : 536916223
focus : 0
access-opt : 1
option : 0
}
action {
set-field : 0\536916222\101\4\1\1\
}
errhandler-opt : 0
errhandler-name:
end
Can someone suggest an optimized regex for this?
Regards,
Vincenzo
Use an unrolled tempered greedy token:
/\bbegin.*(?:\n(?!begin|end(?:$|\n)).*)*\b536916223\b.*(?:\n(?!begin|end(?:$|\n)).*)*\nend/g
or a shorter version if we add the MULTILINE modifier:
/^begin.*(?:\n(?!begin|end$).*)*\b536916223\b.*(?:\n(?!begin|end$).*)*\nend$/gm
Details:
\bbegin - a word begin (a word boundary \b can be added after it for surer matches)
.* - the rest of the line after begin
(?:\n(?!begin|end(?:$|\n)).*)* - the unrolled tempered greedy token (?:(?!\n(?:begin|end(?:$|\n)))[\s\S])* matching any sequence but begin at the beginning of a line and end as a whole line
\b536916223\b - the whole word 536916223
.* - the rest of the line after the number
(?:\n(?!begin|end(?:$|\n)).*)* - another unrolled tempered greedy token
\nend - the end word after a newline (a (?:$|\n) can be added after it for surer matches)
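Here is how the MULTILINE version might be used from PowerShell, where the /.../gm literal syntax does not apply: .NET takes the m flag as the inline option (?m), and [regex]::Matches already returns every match, which stands in for g. The sample text is a trimmed copy of the question's data (shortened for brevity):

```powershell
# Trimmed-down copy of the question's text: three begin...end blocks,
# the last two of which mention 536916223
$text = @'
begin active link
actlink-control: 750000002
action {
set-field : 0\536916222\101\4\1\1\
}
end
begin active link
action {
id : 536916223
}
end
begin active link
action {
id : 536916223
}
errhandler-name:
end
'@
# Normalize any CRLF line endings to LF so that \n and $ line up
$text = $text -replace "`r`n", "`n"

# (?m) = MULTILINE: ^ and $ match at line starts and ends
$pattern = '(?m)^begin.*(?:\n(?!begin|end$).*)*\b536916223\b.*(?:\n(?!begin|end$).*)*\nend$'
$blocks = [regex]::Matches($text, $pattern)
"Found $($blocks.Count) matching blocks"
$blocks | ForEach-Object { $_.Value; '-----' }
```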
The .*(\n*.*)* part is a bit complicated and results in a lot of backtracking.
Since . does not match a newline character, we can use a class such as [\S\s] to match any character. Another possible improvement (and possibly correction) is to use a lazy quantifier, i.e. *?
The following pattern seems to work fine
\bbegin[\S\s]*?536916223[\S\s]*?\bend
Regex (with m modifier set):
^begin(?:(?!^end)[\s\S])*?536916223[\s\S]*?end
Explanation:
^begin # Match `begin` at start of line
(?: # Start of non-capturing group (a)
(?!^end)[\s\S] # A character which is not followed by `end` delimiter
)*? # Zero or more times (un-greedy)
536916223 # Up to special word
[\s\S]*? # Match any other characters
end # Up to first `end` delimiter
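The plain lazy pattern earlier in this thread and the tempered one above behave differently when an earlier group lacks the ID. A small sketch with made-up two-block input shows that the lazy [\S\s]*? can start at block A's begin and run past its end, while the (?!^end) guard cannot:

```powershell
# Two small blocks; only the second one contains the ID
$text = "begin A`nid : 111`nend`nbegin B`nid : 536916223`nend"

$lazy     = '\bbegin[\S\s]*?536916223[\S\s]*?\bend'
$tempered = '(?m)^begin(?:(?!^end)[\s\S])*?536916223[\s\S]*?end'

$lazyMatch     = [regex]::Match($text, $lazy)
$temperedMatch = [regex]::Match($text, $tempered)

# The lazy pattern starts at the first "begin" and swallows block A as well
$lazyMatch.Value.StartsWith('begin A')       # True
# The tempered token refuses to step over a line starting with "end"
$temperedMatch.Value.StartsWith('begin B')   # True
```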
A much more efficient version (with the m modifier set):
^begin.*(?:\n(?!^end).*)*536916223(?:.*\n)*?^end

.NET regex with quote and space

I'm trying to create a regex to match this:
/tags/ud617/?sort=active&page=2" >2
So basically, "[number]" is the only dynamic part:
/tags/ud617/?sort=active&page=[number]" >[number]
The closest I've been able to get (in PowerShell) is:
[regex]::Matches('/tags/ud617/?sort=active&page=2" >2', '/tags/ud617/\?sort=active&page=[0-9]+')
But this doesn't provide me with a full match of the dynamic string.
Ultimately, I'll be creating a capture group:
/tags/ud617/?sort=active&page=([number])
Seems easy enough:
$regex = '/tags/ud617/\?sort=active&page=(\d+)"\s>2'
'/tags/ud617/?sort=active&page=2" >2' -match $regex > $null
$matches[1]
2
[regex]::matches('/tags/ud617/?sort=active&page=3000 >2','/tags/ud617/\?sort=active&page=(\d+) >(\d+)')
Outputs:
Groups : {/tags/ud617/?sort=active&page=3000 >2, 3000, 2}
Success : True
Captures : {/tags/ud617/?sort=active&page=3000 >2}
Index : 0
Length : 41
Value : /tags/ud617/?sort=active&page=3000 >2
This captures the page value and the number after the greater-than sign (i.e. 2). Note that this input string omits the double quote that appears in the question's string.
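The question's own string has a double quote before the space, which the last pattern does not allow for. A variant that assumes the quote may or may not be present makes it optional with "? and names both groups (the group names page and label are made up for illustration):

```powershell
$links = '/tags/ud617/?sort=active&page=2" >2', '/tags/ud617/?sort=active&page=3000 >2'
# "? makes the double quote optional; \s* tolerates any spacing before the >
$pattern = '/tags/ud617/\?sort=active&page=(?<page>\d+)"?\s*>(?<label>\d+)'

$results = foreach ($link in $links) {
    if ($link -match $pattern) {
        [pscustomobject]@{ Page = $Matches['page']; Label = $Matches['label'] }
    }
}
$results
```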