Using Regex for complicated naming convention - regex

I'm writing a PowerShell script that validates computer names against a naming convention, following documentation my client provided. My first approach was to break each computer name into parts and process each part, but I ran into issues near the end. I was thinking it would be easier to use a single regex to ensure the name is valid. Here is what I am trying to do:
Naming rules:
Required: 1st is an alpha character; must be: A,B, or C
Required: 2nd is an alpha character; must be: L,K, or H
Required: The next 4 characters must be alpha and match either: DTLB,SOCK, or NUUB
Required: The next 6 characters are digits but the first part can either be: 1, 8, 13, or 83; the rest doesn't matter as long as they are digits
Optional: The next two characters can be alpha or alpha numeric but must be either: PE, Y1, or AC
Here are some tests with a regex that I understand, which is basic but works. It doesn't validate the actual characters or their positions; it just checks that there are 12 characters, optionally followed by two more at the end:
$regex = '^\w{12}(?:\w{2})$|^\w{12}$'
'AKDTLB123456' -match $regex; $Matches
True
'ALSOCK834561PE' -match $regex; $Matches
True
What I am trying to do is split these up into named parts and determine if the value matches the right area like (this is an example):
$regex = '(<type>^\w{1}[ABC])(?<form>\w{1,1}[LKH])(<locale>\w{4,4}[DTLB|SOCK|NUUB])(<identifier>\d{1,1}[1|8]|<identifier>\d{1,2}[13|83])(<unique>\d+)(<role>\w{2,2}[PE|AC]$|<role>\w{1}[Y]\d{1}$)'
My goal is to get it to output like this:
Example 1
'AKDTLB893456' -match $regex; $Matches
True
Name Value
---- -----
type A
form K
locale DTLB
identifier 1
Unique 23456
0 AKDTLB893456
Example 2:
'ALSOCK123456PE' -match $regex; $Matches
True
Name Value
---- -----
type A
form L
locale SOCK
identifier 1
Unique 23456
0 ALSOCK123456PE
Example 3
'ALSUCK123456PE' -match $regex; $Matches
False
Name Value
---- -----
type A
form L
locale <--not matched
identifier 1
Unique 23456
0 ALSuCK123456PE
The best I can do is:
$regex = '(^\w{1})(\w{1,1})(\w{4,4})(\d{2}[13|83]|\d{1}[1|2|3])(\d{4,5})(\w{2,2}$|$)'
'ALSOCK124561PE' -match $regex; $Matches
True
Name Value
---- -----
6 PE
5 4561
4 12
3 SOCK
2 L
1 A
0 ALSOCK834561PE
However, this doesn't catch the case where the number is just 112345 rather than starting with 13.
I have been all over the internet and have tried an online regex tool, but I am unable to come up with a solution. It may not be possible to do both. I find that if the result is false, it doesn't output the parts that did match, only nothing at all. Is there a way to get the "why it doesn't match" as an output?
Any ideas?

You may use
^(?<type>[ABC])(?<form>[LKH])(?<locale>DTLB|SOCK|NUUB)(?<identifier>[18](3)?)(?<unique>\d{4}(?(1)|\d))(?<role>PE|Y1|AC)?$
Note: If the pattern must be matched in a case sensitive way, replace -match with -cmatch.
See the .NET regex demo (do not test at regex101.com!)
Details
^ - start of a string
(?<type>[ABC]) - Group "type": A, B or C
(?<form>[LKH]) - Group "form": L, K or H
(?<locale>DTLB|SOCK|NUUB) - Group "locale": DTLB, SOCK or NUUB
(?<identifier>[18](3)?) - Group "identifier": 1 or 8 and then an optional digit 3, captured into Group 1
(?<unique>\d{4}(?(1)|\d)) - Group "unique": four digits and if Group 1 did not match, one more digit is required to match then
(?<role>PE|Y1|AC)? - an optional Group "role": PE, Y1 or AC
$ - end of string.
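To see the named groups in action, here is a small sketch that runs the pattern above against sample names from the question. The variable names $pattern, $names and $results are only illustrative, and -cmatch is used on the assumption that the convention is case sensitive.

```powershell
# Pattern from the answer above; (3)? is the only unnamed group, so .NET numbers it 1.
$pattern = '^(?<type>[ABC])(?<form>[LKH])(?<locale>DTLB|SOCK|NUUB)(?<identifier>[18](3)?)(?<unique>\d{4}(?(1)|\d))(?<role>PE|Y1|AC)?$'

$names = 'AKDTLB893456', 'ALSOCK834561PE', 'ALSUCK123456PE'

$results = foreach ($name in $names) {
    if ($name -cmatch $pattern) {
        # $Matches only holds groups that took part in the match
        [pscustomobject]@{
            Name       = $name
            Type       = $Matches['type']
            Form       = $Matches['form']
            Locale     = $Matches['locale']
            Identifier = $Matches['identifier']
            Unique     = $Matches['unique']
            Role       = $Matches['role']   # $null when the optional suffix is absent
        }
    }
    else {
        Write-Warning "$name does not follow the naming convention"
    }
}
$results | Format-Table
```

With these inputs, the first name should yield identifier 8 and unique 93456, the second identifier 83 and unique 4561, and the third is rejected because SUCK is not an allowed locale.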

The second part of your question asked if you could determine where the match failed. You can do something similar to below to test each part of the regex in sections starting from the left:
$strings = 'AKDTLB893456','ALSOCK123456PE','ALSuCK123456PE','ALSOCK123456R2'
$regexes = [ordered]@{}
$regexes.Add('type','^(?<type>[ABC])')
$regexes.Add('form','(?<form>[LKH])')
$regexes.Add('locale','(?<locale>DTLB|SOCK|NUUB)')
$regexes.Add('identifier','(?<identifier>[18])')
$regexes.Add('unique','(?<unique>\d{5})')
$regexes.Add('optional','(?<optional>PE|Y1|AC)?$')
foreach ($string in $strings) {
    $test = [text.stringbuilder]''
    $regexes.GetEnumerator() | Foreach-Object {
        $null = $test.Append($_.Value)
        if (!($string -match $test.ToString())) {
            "$String failed at $($_.Key)"
            # Continue moves on to the next $string in the enclosing foreach loop
            Continue
        }
    }
    $matches
}
Explanation:
$regexes is an ordered hash table where we can append some key-value pairs. Order is good here because we want to test matching from left to right in your string.
With a [Text.StringBuilder] object, we can append strings with the Append() method. The idea is to append the next regex section you want to test. If $string continues to match through every value of the hash table, $matches will output. Otherwise, the key of the section that was being tested when the match failed is output.
Note that this will not perform as well as a single -match operation. You could test the entire match first and only perform sectional testing when a $false is returned. That will increase performance.
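A minimal sketch of that optimization, assuming the same simplified sections as above. The function name Test-ComputerName and its return strings are hypothetical, chosen only for illustration.

```powershell
# Hypothetical helper: try the full pattern first, fall back to section-by-section testing.
function Test-ComputerName {
    param([string] $Name)

    # Sections in left-to-right order, copied from the answer above.
    $sections = [ordered]@{
        type       = '^(?<type>[ABC])'
        form       = '(?<form>[LKH])'
        locale     = '(?<locale>DTLB|SOCK|NUUB)'
        identifier = '(?<identifier>[18])'
        unique     = '(?<unique>\d{5})'
        optional   = '(?<optional>PE|Y1|AC)?$'
    }

    # Fast path: a single match against the concatenated pattern.
    $full = $sections.Values -join ''
    if ($Name -match $full) { return 'OK' }

    # Slow path: grow the pattern one section at a time to find the first failure.
    $partial = ''
    foreach ($key in $sections.Keys) {
        $partial += $sections[$key]
        if ($Name -notmatch $partial) { return "failed at $key" }
    }
}

'AKDTLB893456', 'ALSOCK123456PE', 'ALSuCK123456PE', 'ALSOCK123456R2' |
    ForEach-Object { '{0}: {1}' -f $_, (Test-ComputerName $_) }
```

Valid names only pay for one -match; the per-section loop runs only for names that have already failed.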

Related

Regex to capture a variable number of items?

I am trying to use a regex to capture values from SPACE delimited items. Yes, I know that I could use [string]::Split() or -split. The goal is to use a regex in order fit it into the regex of another, larger regex.
There are a variable number of items in the string. In this example there are four (4). The resulting $Matches variable has the full string for all Value members. I also tried the regex '^((.*)\s*)+', but that resulted in '' for all except the first .Value.
How can I write a regex to capture a variable number of items?
PS C:\src\t> $s = 'now is the time'
PS C:\src\t> $m = [regex]::Matches($s, '^((.*)\s*)')
PS C:\src\t> $m
Groups : {0, 1, 2}
Success : True
Name : 0
Captures : {0}
Index : 0
Length : 15
Value : now is the time
ValueSpan :
PS C:\src\t> $m.Groups.Value
now is the time
now is the time
now is the time
PS C:\src\t> $PSVersionTable.PSVersion.ToString()
7.2.2
You can use [regex]::Match() to find the first matching substring, then call NextMatch() to advance through the input string until no further matches can be made.
I've taken the liberty of simplifying the expression to \S+ (consecutive non-whitespace characters):
$string = 'now is the time'
$regex = [regex]'\S+'
$match = $regex.Match($string)
while($match.Success){
Write-Host "Match at index [$($match.Index)]: '$($match.Value)'"
# advance to the next match, if any
$match = $match.NextMatch()
}
Which will print:
Match at index [0]: 'now'
Match at index [4]: 'is'
Match at index [7]: 'the'
Match at index [11]: 'time'
Mathias' answer shows an iterative approach to retrieving all matches, which may or may not be needed.
Building on your own attempt to use [regex]::Matches(), the solution is as simple as:
$s = 'now is the time'
[regex]::Matches($s, '\S+').Value # -> @('now', 'is', 'the', 'time')
As noted, \S+ matches any non-empty run (+) of non-whitespace characters (\S).
The method call returns a collection of System.Text.RegularExpressions.Match instances. Thanks to member-access enumeration, accessing .Value on that collection returns each instance's .Value property, which in the case of two or more instances results in an array of values.
I guess the following will work for you
[^\s]+
[^\s] means "not a whitespace character" (equivalent to \S)
+ means 1 or more characters
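Since the question mentions embedding this in a larger regex, another option worth sketching is .NET's Captures collection: when a group repeats inside a single match, each repetition is kept there. The pattern and the group name "item" below are illustrative assumptions, not taken from the answers above.

```powershell
$s = 'now is the time'

# One overall match; the named group 'item' repeats once per word.
$m = [regex]::Match($s, '^(?:(?<item>\S+)\s*)+$')

# Groups['item'].Value would only hold the last repetition;
# .Captures keeps every repetition in order.
$items = $m.Groups['item'].Captures.Value
$items
```

This relies on .NET keeping the full capture history of a group, so it will not carry over to regex engines that retain only the last capture.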

Change 3rd octet of IP in string format using PowerShell

Think I've found the worst way to do this:
$ip = "192.168.13.1"
$a,$b,$c,$d = $ip.Split(".")
[int]$c = $c
$c = $c+1
[string]$c = $c
$newIP = $a+"."+$b+"."+$c+"."+$d
$newIP
But what is the best way? Has to be string when completed. Not bothered about validating its a legit IP.
Using your example for how you want to modify the third octet, I'd do it pretty much the same way, but I'd compress some of the steps together:
$IP = "192.168.13.1"
$octets = $IP.Split(".") # or $octets = $IP -split "\."
$octets[2] = [string]([int]$octets[2] + 1) # or other manipulation of the third octet
$newIP = $octets -join "."
$newIP
You can simply use the -replace operator of PowerShell and a look ahead pattern. Look at this script below
Set-StrictMode -Version "2.0"
$ErrorActionPreference="Stop"
cls
$ip1 = "192.168.13.123"
$tests=@("192.168.13.123" , "192.168.13.1" , "192.168.13.12")
foreach($test in $tests)
{
$patternRegex="\d{1,3}(?=\.\d{1,3}$)"
$newOctet="420"
$ipNew=$test -replace $patternRegex,$newOctet
$msg="OLD ip={0} NEW ip={1}" -f $test,$ipNew
Write-Host $msg
}
This will produce the following:
OLD ip=192.168.13.123 NEW ip=192.168.420.123
OLD ip=192.168.13.1 NEW ip=192.168.420.1
OLD ip=192.168.13.12 NEW ip=192.168.420.12
How to use the -replace operator?
https://powershell.org/2013/08/regular-expressions-are-a-replaces-best-friend/
Understanding the pattern that I have used
The (?=...) in \d{1,3}(?=\.\d{1,3}$) is a look-ahead assertion.
The (?=\.\d{1,3}$) part requires that a literal dot and 1-3 digits, running up to the end of the string, follow the match, without making them part of it.
The leading \d{1,3} matches 1-3 digits.
All combined, in plain English: "Give me the 1-3 digits that are immediately followed by a period and the final 1-3 digits at the right-hand end of the string."
Look ahead regex
https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference
CORRECTION
An earlier version of this answer called the pattern a look behind; it is a look ahead.
If you have PowerShell Core (v6.1 or higher), you can combine -replace with a script block-based replacement:
PS> '192.168.13.1' -replace '(?<=^(\d+\.){2})\d+', { 1 + $_.Value }
192.168.14.1
Positive look-behind assertion (?<=^(\d+\.){2}) matches everything up to, but not including, the 3rd octet - without considering it part of the overall match to replace.
(?<=...) is the look-behind assertion, \d+ matches one or more (+) digits (\d), \. a literal ., and {2} matches the preceding subexpression ((...)) 2 times.
\d+ then matches just the 3rd octet; since nothing more is matched, the remainder of the string (. and the 4th octet) is left in place.
Inside the replacement script block ({ ... }), $_ refers to the result of the match, in the form of a [System.Text.RegularExpressions.Match] instance; its .Value is the matched string, i.e. the 3rd octet, to which 1 can be added.
Data type note: by using 1, an implicit [int], as the LHS, the RHS (the .Value string) is implicitly coerced to [int] (you may choose to use an explicit cast).
On output, whatever the script block returns is automatically coerced back to a string.
If you must remain compatible with Windows PowerShell, consider Jeff Zeitlin's helpful answer.
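Another option that should also work in Windows PowerShell is to call the .NET [regex]::Replace() method with a script block, which PowerShell converts to a MatchEvaluator delegate. This is a sketch using the same look-behind pattern; the variable names are illustrative.

```powershell
$ip = '192.168.13.1'

# The script block receives the Match object as its parameter and returns the replacement text.
$newIp = [regex]::Replace(
    $ip,
    '(?<=^(\d+\.){2})\d+',
    { param($m) [string]([int]$m.Value + 1) }
)

$newIp   # 192.168.14.1
```

The explicit [string] cast is a precaution so that the delegate clearly returns a string.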
To complete your method, but more concisely:
$a,$b,$c,$d = "192.168.13.1".Split(".")
$IP="$a.$b.$([int]$c+1).$d"
function Replace-3rdOctet {
    Param(
        [string]$GivenIP,
        [string]$New3rdOctet
    )
    # Escape the dots so they match literal periods rather than any character
    $GivenIP -match '(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})' | Out-Null
    $Output = "$($matches[1]).$($matches[2]).$New3rdOctet.$($matches[4])"
    Return $Output
}
Copy to a ps1 file and dot source it from command line, then type
Replace-3rdOctet -GivenIP '100.201.190.150' -New3rdOctet '42'
Output: 100.201.42.150
From there you could add extra error handling, for example for input that is not a valid IP address.
here's a slightly different method. [grin] i managed to not notice the answer by JeffZeitlin until after i finished this.
[edit - thanks to JeffZeitlin for reminding me that the OP wants the final result as a string. oops! [*blush*]]
what it does ...
splits the string on the dots
puts that into an [int] array & coerces the items into that type
increments the item in the targeted slot
joins the items back into a string with a dot for the delimiter
converts that to an IP address type
adds a line to convert the IP address to a string
here's the code ...
$OriginalIPv4 = '1.1.1.1'
$TargetOctet = 3
$OctetList = [int[]]$OriginalIPv4.Split('.')
$OctetList[$TargetOctet - 1]++
$NewIPv4 = [ipaddress]($OctetList -join '.')
$NewIPv4
'=' * 30
$NewIPv4.IPAddressToString
output ...
Address : 16908545
AddressFamily : InterNetwork
ScopeId :
IsIPv6Multicast : False
IsIPv6LinkLocal : False
IsIPv6SiteLocal : False
IsIPv6Teredo : False
IsIPv4MappedToIPv6 : False
IPAddressToString : 1.1.2.1
==============================
1.1.2.1

Regex Get a substring from a string nearest to the end

I'm trying to get a substring from a string using a powershell script and regex.
For example I'm trying to get a year that's part of a filename.
Example Filename "Expo.2000.Brazilian.Pavillon.after.Something.2016.SomeTextIDontNeed.jpg"
The problem is that the regex gives me "2000" and no other matches. I need to get "2016" matched. Sadly, $matches only has one matched instance. Have I missed something? I feel like I'm going nuts ;)
If $matches would contain all instances found I could handle getting the nearest to end instance with:
$Year = $matches[$matches.Count-1]
Powershell Code:
# Function to get the images year and clean up image information after it.
Function Remove-String-Behind-Year
{
param
(
[string]$OriginalFileName # Provide the BaseName of the image file.
)
[Regex]$RegExYear = [Regex]"(?<=\.)\d{4}(?=\.|$)" # Regex to match a four-digit string, preceded by a dot and followed by a dot or the end of the string.
$OriginalFileName -match $RegExYear # Matches the Original Filename with the Regex
Write-Host "Count: " $matches.Count # Why I only get 1 result?
Write-Host "BLA: " $matches[0] # First and only match is "2000"
}
Wanted Result Table:
"x.2000.y.2016.z" => "2016" (Does not work)
"x.y.2016" => "2016" (Works)
"x.y.2016.z" => "2016" (Works)
"x.y.20164.z" => "" (Works)
"x.y.201.z" => "" (Works)
PowerShell's -match operator only ever finds (at most) one match (although multiple substrings of that one match may be found with capture groups).
However, using the fact that quantifier * is greedy (by default), we can still use that one match to find the last match in the input:
-match '^.*\.(\d{4})\b' finds the longest prefix of the input that ends in a 4-digit sequence preceded by a literal . and followed by a word boundary, so that $matches[1] then contains the last occurrence of such a 4-digit sequence.
Function Extract-Year
{
param
(
[string] $OriginalFileName # Provide the BaseName of the image file.
)
if ($OriginalFileName -match '^.*\.(\d{4})\b') {
$matches[1] # output last 4-digit sequence found
} else {
'' # output empty string to indicate that no 4-digit sequence was found.
}
}
'x.2000.y.2016.z', 'x.y.2016', 'x.y.2016.z', 'x.y.20164.z', 'x.y.201.z' |
% { Extract-Year $_ }
yields
2016
2016
2016
# empty line
# empty line
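If you would rather keep the original look-around pattern and work with all matches, as the question first intended, [regex]::Matches() returns every match and the last one can be picked from the pipeline. This is a sketch; the function name Get-LastYear is hypothetical.

```powershell
# Hypothetical helper: return the last dot-delimited 4-digit run, or '' if there is none.
function Get-LastYear {
    param([string] $Name)

    # Unlike -match, [regex]::Matches() returns all matches, in order of appearance.
    $last = [regex]::Matches($Name, '(?<=\.)\d{4}(?=\.|$)') | Select-Object -Last 1

    if ($last) { $last.Value } else { '' }
}

'x.2000.y.2016.z', 'x.y.2016', 'x.y.2016.z', 'x.y.20164.z', 'x.y.201.z' |
    ForEach-Object { Get-LastYear $_ }
```

For the five sample inputs this should give 2016 three times, followed by two empty strings.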

Regex against an odd value

I am parsing through a csv to export into two categories: Matches and NonMatches via powershell.
The data in the csv is formatted as follows:
SamAccountName
xx9999xx
aa0000aa
ab0909xc
etc
I need a regex that will allow me to filter out all the entries that follow the aa0000bb naming convention and export them into a csv (and do the same for the ones that don't match the aa0000bb convention).
Convention (because it's odd): the first 2 are letters ranging from a-z,then 4 numbers 0-9, then 2 letters a-z. Meaning you can have aa0000aa and by7690zi
Right now I have:
Import-Csv "C:\Path\to\csv" | ForEach {
$Name =$_.SamAccountName;
$Name -Match '[a-z]+[a-z]+[0-9]+[0-9]+[0-9]+[0-9]+[a-z]+[a-z]'} | Out-Null; $Matches
or
ForEach ($Name in $CSV)
{
If ($Name -Match '[a-z]+[a-z]+[0-9]+[0-9]+[0-9]+[0-9]+[a-z]+[a-z]')
{
Write-Host "$Matches"
}else{
Write-Host "No Match"
}
}
Both seem to output random things that are not correct.
I'm thinking (hoping really) that there is a way to match by the following:
a-z #For the first set of letters
0-9 #For the 4 digits
a-z #For the second set of letters
ie [a-z{2}]+[0-9{4}]+[a-z{2}]
I can run a normal 'xx9999xx' -Match '[a-z]+[a-z]+[0-9]+[0-9]+[0-9]+[0-9]+[a-z]+[a-z]' and it returns true and then
PS C:\Windows> $Matches
Name Value
---- -----
0 xx9999xx
But I still don't understand how to do that on a mass scale.
The correct regex to match your convention should be as follows.
Regex: [a-z]{2}[0-9]{4}[a-z]{2}
Regex101 Demo
noob has a correct regex to match your problem. To understand why yours was matching more than you expected it is because the plus sign is a quantifier meta character in regex. It means
Between one and unlimited times, as many times as possible, giving back as needed
So if the name has more letters or digits than the convention allows, your pattern simply keeps matching them. This would therefore be a valid match:
aaaa00000000000aaaa
Using a fixed quantifier {2}, {4} etc. will match exactly what you want.
Regex101.com is a great resource for quick testing. It also gives a very good explanation of your regex.
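To apply this on a mass scale, the rows can be split into two sets with Where-Object. The sketch below uses inline sample data in place of Import-Csv, and it adds ^ and $ anchors, which is an assumption on my part so that longer names containing a valid substring are rejected. The output paths are hypothetical.

```powershell
# Inline sample data standing in for: Import-Csv "C:\Path\to\csv"
$rows = @'
SamAccountName
xx9999xx
aa0000aa
ab0909xc
aaaa00000000000aaaa
abc123
'@ | ConvertFrom-Csv

# Anchored so the whole value must follow the convention (use -cmatch for lowercase only).
$pattern = '^[a-z]{2}[0-9]{4}[a-z]{2}$'

$matching    = @($rows | Where-Object { $_.SamAccountName -match $pattern })
$nonMatching = @($rows | Where-Object { $_.SamAccountName -notmatch $pattern })

# Hypothetical output paths:
# $matching    | Export-Csv .\Matches.csv    -NoTypeInformation
# $nonMatching | Export-Csv .\NonMatches.csv -NoTypeInformation

$matching.SamAccountName
```

Here the first three sample names end up in $matching and the last two in $nonMatching.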

Arithmetic Calculation in Perl Substitute Pattern Matching

Using just one Perl substitute regular expression statement (s///), how can we write below:
Every success match contains just a string of Alphabetic characters A..Z. We need to substitute the match string with a substitution that will be the sum of character index (in alphabetical order) of every character in the match string.
Note: For A, character index would be 1, for B, 2 ... and for Z would be 26.
Please see example below:
success match: ABCDMNA
substitution result: 38
Note:
1 + 2 + 3 + 4 + 13 + 14 + 1 = 38;
since
A = 1, B = 2, C = 3, D = 4, M = 13, N = 14 and A = 1.
I will post this as an answer, I guess, though the credit for coming up with the idea should go to abiessu for the idea presented in his answer.
perl -ple'1 while s/(\d*)([A-Z])/$1+ord($2)-64/e'
Since this is clearly homework and/or of academic interest, I will post the explanation in spoiler tags.
- We match an optional number (\d*), followed by a letter ([A-Z]). The number is the running sum, and the letter is what we need to add to the sum.
- By using the /e modifier, we can do the math, which is add the captured number to the ord() value of the captured letter, minus 64. The sum is returned and inserted instead of the number and the letter.
- We use a while loop to rinse and repeat until all letters have been replaced, and all that is left is a number. We use a while loop instead of the /g modifier to reset the match to the start of the string.
Just split, translate, and sum:
use strict;
use warnings;
use List::Util qw(sum);
my $string = 'ABCDMNA';
my $sum = sum map {ord($_) - ord('A') + 1} split //, $string;
print $sum, "\n";
Outputs:
38
Can you use the /e modifier in the substitution?
$s = "ABCDMNA";
$s =~ s/(.)/$S += ord($1) - ord "@"; 1 + pos $s == length $s ? $S : ""/ge;
print "$s\n"
Consider the following matching scenario:
my $text = "ABCDMNA";
my $val = $text =~ s!(\d)*([A-Z])!($1+ord($2)-ord('A')+1)!gr;
(Without having tested it...) This should repeatedly go through the string, replacing one character at a time with its ordinal value added to the current sum which has been placed at the beginning. Once there are no more characters the copy (/r) is placed in $val which should contain the translated value.
Or a short alternative:
echo ABCDMNA | perl -nlE 'm/(.)(?{$s+=-64+ord$1})(?!)/;say$s'
or readable
$s = "ABCDMNA";
$s =~ m/(.)(?{ $sum += ord($1) - ord('A')+1 })(?!)/;
print "$sum\n";
prints
38
Explanation:
The regex tries to match any character that is not followed by the "empty regex": /.(?!)/
Because an empty regex matches everywhere, the "not followed by" condition is never true.
Therefore the regex engine moves on to the next character and tries the match again.
This is repeated until the whole string is exhausted.
Because we want to capture the character, we use a capture group: /(.)(?!)/
The (?{...}) block runs Perl code that adds the value of the captured character, stored in $1, to the sum.
When the regex is exhausted (and fails), the final say $s prints the value of the sum.
from the perlre
(?{ code })
This zero-width assertion executes any embedded Perl code. It always
succeeds, and its return value is set as $^R .
WARNING: Using this feature safely requires that you understand its
limitations. Code executed that has side effects may not perform
identically from version to version due to the effect of future
optimisations in the regex engine. For more information on this, see
Embedded Code Execution Frequency.