RegEx omit optional prefix in UPN or displayName - regex

I am trying to get only the "nonpersonalizedusername" including its number or the surname.
To add more detail, I'd like to accomplish something like:
If there's an #-Symbol, get me everything that is in front of that #-Symbol, otherwise get me the whole string.
Plus, if then there's a dot "." in it, get me everything after that dot.
Let's assume I have the following strings of userPrincipalNames and/or displayNames:
nonpersonalizedusername004
nonpersonalizedusername019#domaina.local
prefixc.nonpersonalizedusername044#domaina.local
nonpersonalizedusername038#domainb.local
prefixa.nonpersonalizedusername002#domaina.local
prefixb.nonpersonalizedusername038#domainb.local
givenname.surname
givenname.surname#domaina.local
What I got so far is this expression:
^(?:.*?\.)?(.+?)(?:#.*)?$
but this only works, if there's an #-Symbol AND that "prefixing"-Dot in the string OR neither Dot nor #-Symbol.
If there's an #-Symbol, but no prefixing-dot, I'm getting only that "local"-part from the end.
https://regex101.com/r/1aflGH/1

You can use
^(?:[^#.]*\.)?([^#]+)(?:#.*)?$
See the regex demo. The \n is added to the negated character classes at regex101 as the test is run against a single multiline string.
Details:
^ - start of string
(?:[^#.]*\.)? - an optional sequence of any zero or more chars other than # and . and then a .
([^#]+) - Group 1: one or more chars other than # char
(?:#.*)? - an optional sequence of # and then the rest of the line
$ - end of string.
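To check the pattern against the sample strings outside regex101, here is a minimal sketch in Python (the thread itself targets PowerShell; the helper name short_name is made up for illustration, and falling back to the unchanged input is an assumption):

```python
import re

# Pattern from the answer above: optional "prefix." part, then everything up to "#".
PATTERN = re.compile(r'^(?:[^#.]*\.)?([^#]+)(?:#.*)?$')

def short_name(value):
    """Return group 1 of the pattern, or the input unchanged if nothing matches."""
    m = PATTERN.match(value)
    return m.group(1) if m else value

samples = [
    "nonpersonalizedusername004",
    "nonpersonalizedusername019#domaina.local",
    "prefixc.nonpersonalizedusername044#domaina.local",
    "givenname.surname",
    "givenname.surname#domaina.local",
]

for s in samples:
    print(s, "->", short_name(s))
```

Because each string is matched on its own here, the \n that the regex101 demo adds to the negated character classes is not needed.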

You might optionally repeat matching up to the last dot before the #, and then capture the rest after that dot up to the # in group 1.
^(?:[^#.]*\.)*([^#.]+)
The pattern matches:
^ Start of string
(?: Non capture group
[^#.]*\. Optionally repeat matching any char except # or ., then match .
)* Close non capture group and optionally repeat
( Capture group 1
[^#.]+
) Close group 1
Regex demo
Powershell example
$s = @"
nonpersonalizedusername004
nonpersonalizedusername019#domaina.local
prefixc.nonpersonalizedusername044#domaina.local
nonpersonalizedusername038#domainb.local
prefixa.nonpersonalizedusername002#domaina.local
prefixb.nonpersonalizedusername038#domainb.local
givenname.surname
givenname.surname#domaina.local
"@
Select-String '(?m)^(?:[^#.\n]*\.)*([^#.\n]+)' -input $s -AllMatches | Foreach-Object {$_.Matches} | Foreach-Object {$_.Groups[1].Value}
Output
nonpersonalizedusername004
nonpersonalizedusername019
nonpersonalizedusername044
nonpersonalizedusername038
nonpersonalizedusername002
nonpersonalizedusername038
surname
surname
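For comparison, the same multiline pattern can be tried in Python. This is only a sketch with a shortened sample list; the variable names are illustrative and not part of the original answer:

```python
import re

# Same pattern as in the Select-String call. \n is excluded from the character
# classes so a match never runs across a line break in the multiline input.
PATTERN = re.compile(r'^(?:[^#.\n]*\.)*([^#.\n]+)', re.MULTILINE)

text = "\n".join([
    "nonpersonalizedusername004",
    "nonpersonalizedusername019#domaina.local",
    "prefixc.nonpersonalizedusername044#domaina.local",
    "givenname.surname",
    "givenname.surname#domaina.local",
])

# With exactly one capture group, findall returns the group 1 value of every match.
names = PATTERN.findall(text)
for name in names:
    print(name)
```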

Powershell - Should take only set of numbers from file name

I have a script that reads a file name from a path, takes only the digits from it, and does something with them. It works fine until I run into this situation.
For an example:
For the file name Patch_1348968.vip it takes the number 1348968.
If the file name is Patch_1348968_v1.zip, it takes the number 13489681, which is wrong.
I am using the line below to fetch the number. In general the name always starts with patch_#####.vip with 7-8 digits, so I want to take only the digits before any sign like _ or -.
$PatchNumber = $file.Name -replace "[^0-9]" , ''
You can use
$PatchNumber = $file.Name -replace '.*[-_](\d+).*', '$1'
See the regex demo.
Details:
.* - any chars other than newline char as many as possible
[-_] - a - or _
(\d+) - Group 1 ($1): one or more digits
.* - any chars other than newline char as many as possible.
I suggest using -match instead, so you describe what to keep rather than what to remove:
if( $file.Name -match '\d+' ) {
$PatchNumber = $matches[0]
}
\d+ matches the first consecutive sequence of digits. The automatic variable $matches contains the full match at index 0, if the -match operator successfully matched the input string against the pattern.
If you want to be more specific, you could use a more complex pattern and extract the desired sub string using a capture group:
if( $file.Name -match '^Patch_(\d+)' ) {
$PatchNumber = $matches[1]
}
Here, the anchor ^ makes sure the match starts at the beginning of the input string, then Patch_ gets matched literally (case-insensitive), followed by a group of consecutive digits which gets captured () and can be extracted using $matches[1].
You can get an even more detailed explanation of the RegEx and the ability to experiment with it at regex101.com.
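The capture-group idea is not specific to PowerShell. A minimal sketch in Python, assuming the file names look like the two examples in the question (the function name patch_number is hypothetical):

```python
import re

def patch_number(file_name):
    """Return the digits that directly follow 'Patch_', or None if the name does not fit."""
    # re.IGNORECASE mirrors the case-insensitive behaviour of PowerShell's -match.
    m = re.search(r'^Patch_(\d+)', file_name, flags=re.IGNORECASE)
    return m.group(1) if m else None

print(patch_number("Patch_1348968.vip"))
print(patch_number("Patch_1348968_v1.zip"))
```

Both calls print 1348968, because \d+ stops at the first non-digit and the trailing _v1 is never reached.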

How do I capture the 2nd match for each line?

Basically, I need to match with 1 per line but right now, my regex is matching 2 per line.
https://regex101.com/r/KmgGwS/8
My regex looks for two backslashes and returns the string between them, but my path has multiple backslashes and I only need the second such match on each line:
(?<=\\).*?(?=\\)
This is my PowerShell code:
if ( $_.PSPath -match ("(?<=::).*?(?=\\)")) {
$user = $matches.Values
}
For example:
Microsoft.PowerShell.Security\Certificate::CurrentUser\Root\CDD4EEAE6000AC7F40C3802C171E30148030C072
Microsoft.PowerShell.Security\Certificate::CurrentUser\Root\BE36A4562FB2EE05DBB3D32323ADF445084ED656
What my code does is it gets
Certificate::CurrentUserRoot
Certificate::CurrentUserRoot
but what I only really need is get the string to the 2nd match \ ___\ which is:
Root
Root
You could use the anchor ^ to assert the start of the string, then match twice a run of characters that are neither a backslash nor a newline, each followed by a backslash.
Use a capturing group to match what follows:
^[^\\\r\n]*\\[^\\\r\n]*\\([^\\\r\n]+)
About the pattern
^ Start of string
[^\\\r\n]*\\[^\\\r\n]*\\ Match 2 times not \ or a newline, then \
( Capture group 1
[^\\\r\n]+ Match 1+ times not \ or a newline
) Close group 1
Regex demo | Try it online
The value is in the first capturing group:
$user = $matches[1]
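As a quick cross-check of the capture-group pattern, here is a sketch in Python (not the asker's PowerShell; the function name is invented for this example):

```python
import re

# Skip two backslash-terminated segments from the start, then capture the next one.
PATTERN = re.compile(r'^[^\\\r\n]*\\[^\\\r\n]*\\([^\\\r\n]+)')

def segment_after_second_backslash(path):
    """Return the text between the 2nd and 3rd backslash, or None if there is none."""
    m = PATTERN.match(path)
    return m.group(1) if m else None

path = r'Microsoft.PowerShell.Security\Certificate::CurrentUser\Root\CDD4EEAE6000AC7F40C3802C171E30148030C072'
print(segment_after_second_backslash(path))
```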
If you want the match only to use your script instead of group 1, you could make use of a positive lookbehind to assert what is on the left is 2 times not \ followed by \
(?<=^[^\\\r\n]*\\[^\\\r\n]*\\)[^\\\r\n]+
Regex demo | Try it online
I'm guessing that, maybe an expression similar to,
(?<=\\)[^\\]*(?=\\[A-Z0-9]{40}$)
might be an option to look into.
Demo 1
Or maybe just,
[^\\]*(?=\\[A-Z0-9]{40}$)
or
[^\\]*(?=\\[A-F0-9]{40}$)
would simply return Root, where 40 is the length of the trailing [A-F0-9] substring. For a more flexible quantifier, this expression might work:
[^\\]*(?=\\[A-F0-9]*$)
Demo 2
To offer a pragmatic alternative using PowerShell's -split operator:
PS> 'Microsoft.PowerShell.Security\Certificate::CurrentUser\Root\CDD4EEAE6000AC7F40C3802C171E30148030C072',
'Microsoft.PowerShell.Security\Certificate::CurrentUser\Root\BE36A4562FB2EE05DBB3D32323ADF445084ED656' |
ForEach-Object { ($_ -split '::|\\')[3] }
Root
Root
The above tokenizes each input string by separators \ or ::, and extracts the 4th token.
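The same tokenizing idea, splitting on either :: or a single backslash and taking the 4th token (index 3), can be sketched in Python. The function name is hypothetical:

```python
import re

def fourth_token(ps_path):
    """Split on '::' or a backslash and return the 4th token (index 3)."""
    tokens = re.split(r'::|\\', ps_path)
    return tokens[3]

paths = [
    r'Microsoft.PowerShell.Security\Certificate::CurrentUser\Root\CDD4EEAE6000AC7F40C3802C171E30148030C072',
    r'Microsoft.PowerShell.Security\Certificate::CurrentUser\Root\BE36A4562FB2EE05DBB3D32323ADF445084ED656',
]

for p in paths:
    print(fourth_token(p))
```

Note that this sketch raises an IndexError for inputs with fewer than four tokens; a real script would probably want to guard against that.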

Regex: Matches a multi-line pattern until the same one occurs again

I need to match 3 parts in the following bit:
# [1.3.3] (2019-04-16)
### Blah
* Loreum ipsum
# [1.3.0] (2019-04-01)
### Foo
* Loreum ipsum
# [1.2.0] (2019-03-05)
### Foo
* Loreum ipsum
Basically the first one would be
# [1.3.3] (2019-04-16)
### Blah
* Loreum ipsum
and so on.
I tried the following:
(# \[.*\] \([0-9\-]{10}\)(\n|.)*)
But that basically goes on to match the whole document. I need to tell it to stop matching when a new line starts with (# \[) (which would be ^(?!(# \[)).*$)
You could use the first part of your pattern to match the first line and then use a negative lookahead (?!# ) to match the following lines if they don't start with # followed by a space:
^# \[[^]]+\] \([\d-]{10}\)\n(?:(?!# ).*(?:\n|$))*
About the pattern
^# Start of string followed by # and space
\[[^]]+\] Match from opening till closing square bracket using a negated character class
\([\d-]{10}\)\n Match opening parenthesis then match 10 times what is listed in the character class followed by a closing parenthesis and a newline
(?: Non capturing group
(?!# ) Negative lookahead, assert what is on the right is not # and a space
.*(?:\n|$) Match any char except newline and match either a newline or assert end of the string
)* Close non capturing group and repeat 0+ times
Regex demo
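To see the three sections come out separately, here is a small sketch in Python. It assumes the multiline flag is on, so that ^ and $ work per line; the variable names are illustrative only:

```python
import re

changelog = (
    "# [1.3.3] (2019-04-16)\n"
    "### Blah\n"
    "* Loreum ipsum\n"
    "# [1.3.0] (2019-04-01)\n"
    "### Foo\n"
    "* Loreum ipsum\n"
    "# [1.2.0] (2019-03-05)\n"
    "### Foo\n"
    "* Loreum ipsum\n"
)

# Header line, then every following line that does not start with "# ".
SECTION = re.compile(
    r'^# \[[^]]+\] \([\d-]{10}\)\n(?:(?!# ).*(?:\n|$))*',
    re.MULTILINE,
)

sections = [m.group(0) for m in SECTION.finditer(changelog)]
for section in sections:
    print(repr(section))
```

Lines such as ### Blah pass the (?!# ) check because the character after the first # is another #, not a space.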
You can use the following regex:
(# \[.*\] \([0-9\-]{10}\)(\n|[^#]|###)*)
This will match any text until the next hash (except if that hash is part of a group of three hashes ###).
If you need to modify it for a varying number of hashes (strictly greater than 1), you could use
(# \[.*\] \([0-9\-]{10}\)(\n|[^#]|##+)*)
You may use
^\#\s+\[.+?(?=^\#\s+\[|\Z)
See a demo on regex101.com and mind the modifiers (singleline and multiline, s and m).
Broken down this is
^\#\s+\[ # start of the line, followed by "# ["
.+? # everything else afterwards until ...
(?=
^\#\s+\[ # ... the pattern from above right at the start of a new line
| # or
\Z # the very end of the string
)
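A sketch of the lazy-match-plus-lookahead variant in Python, with both modifiers set (re.MULTILINE for ^ and re.DOTALL so that . also matches newlines). The sample text is shortened to two entries and the names are illustrative:

```python
import re

changelog = (
    "# [1.3.3] (2019-04-16)\n"
    "### Blah\n"
    "* Loreum ipsum\n"
    "# [1.3.0] (2019-04-01)\n"
    "### Foo\n"
    "* Loreum ipsum\n"
)

# Lazily consume text until the next "# [" at a line start, or the end of the string.
ENTRY = re.compile(r'^\#\s+\[.+?(?=^\#\s+\[|\Z)', re.MULTILINE | re.DOTALL)

# The pattern has no capture groups, so findall returns the full matches.
entries = ENTRY.findall(changelog)
for entry in entries:
    print(repr(entry))
```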
The fastest way to go would be:
^#.*(\r?\n(?!# ).*)+
To make it more precise:
^# \[\d.*(?:\r?\n(?!# ).*)+
See live demo here

PowerShell -replace to get string between two different characters

I am currently using split to get what I need, but I am hoping there is a better way in PowerShell.
Here is the string:
server=ss8.server.com;database=CSSDatabase;uid=WS_CSSDatabase;pwd=abc123-1cda23-123-A7A0-CC54;Max Pool Size=5000
I want to get the server and database without the database= or the server=
Here is the method I am currently using:
$databaseserver = (($details.value).split(';')[0]).split('=')[1]
$database = (($details.value).split(';')[1]).split('=')[1]
This outputs to:
ss8.server.com
CSSDatabase
I would like it to be as simple as possible.
Thank you in advance
Replacing approach
You may use the following regex replace:
$s = 'server=ss8.server.com;database=CSSDatabase;uid=WS_CSSDatabase;pwd=abc123-1cda23-123-A7A0-CC54;Max Pool Size=5000'
$dbserver = $s -replace '^server=([^;]+).*', '$1'
$db = $s -replace '^[^;]*;database=([^;]+).*', '$1'
The technique is to match and capture (with (...)) what we need and just match what we need to remove.
Pattern details:
^ - start of the line
server= - a literal substring
([^;]+) - Group 1 (what $1 refers to) matching 1+ chars other than ;
.* - any 0+ chars other than a newline, as many as possible
Pattern 2 is almost the same, the capturing group is shifted a bit to capture another detail, and some more literal values are added to match the right context.
Note: if the values you need to extract may appear anywhere in the string, replace ^ in the first one and ^[^;]*; pattern in the second one with .*?\b (any 0+ chars other than a newline, as few as possible followed with a word boundary).
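The "match everything, keep only the captured part" technique carries over to other regex flavours. A minimal sketch in Python, where the replacement backreference is written \1 instead of $1 (variable names are illustrative):

```python
import re

s = ('server=ss8.server.com;database=CSSDatabase;uid=WS_CSSDatabase;'
     'pwd=abc123-1cda23-123-A7A0-CC54;Max Pool Size=5000')

# The whole string is matched and replaced by the captured group only.
dbserver = re.sub(r'^server=([^;]+).*', r'\1', s)
db = re.sub(r'^[^;]*;database=([^;]+).*', r'\1', s)

print(dbserver)
print(db)
```

One caveat of the replacing approach: if the pattern does not match at all, the input string comes back unchanged, so it can be worth checking for a match first.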
Matching approach
With a -match, you may do it the following way:
$s -match '^server=(.+?);database=([^;]+)'
The $Matches[1] will contain the server details and $Matches[2] will hold the DB info:
Name Value
---- -----
2 CSSDatabase
1 ss8.server.com
0 server=ss8.server.com;database=CSSDatabase
Pattern details
^ - start of string
server= - literal substring
(.+?) - Group 1: any 1+ non-linebreak chars as few as possible
;database= - literal substring
([^;]+) - 1+ chars other than ;
Another solution with a RegEx and named capture groups, similar to Wiktor's Matching Approach.
$s = 'server=ss8.server.com;database=CSSDatabase;uid=WS_CSSDatabase;pwd=abc123-1cda23-123-A7A0-CC54;Max Pool Size=5000'
$RegEx = '^server=(?<databaseserver>[^;]+);database=(?<database>[^;]+)'
if ($s -match $RegEx){
$Matches.databaseserver
$Matches.database
}
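For readers outside PowerShell, named groups look almost the same in Python, except that the group syntax is (?P<name>...). This is only a sketch; the function name parse_connection is hypothetical:

```python
import re

PATTERN = re.compile(r'^server=(?P<databaseserver>[^;]+);database=(?P<database>[^;]+)')

def parse_connection(conn):
    """Return a dict with 'databaseserver' and 'database', or None if the string does not fit."""
    m = PATTERN.match(conn)
    return m.groupdict() if m else None

s = ('server=ss8.server.com;database=CSSDatabase;uid=WS_CSSDatabase;'
     'pwd=abc123-1cda23-123-A7A0-CC54;Max Pool Size=5000')
info = parse_connection(s)
print(info["databaseserver"])
print(info["database"])
```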

Selecting if no delimiter, and no selecting if it is

I have a string like "smth 2sg. smth", and sometimes "smth 2sg.| smth.".
What pattern should I use to select "2sg." if the string does not contain "|", and select nothing if the string does contain "|"?
I have 2 methods. They both use something called a Negative Lookahead, which is used like so:
(?!data)
When this is inserted into a RegEx, it means the match fails at that position if data comes immediately next.
More info on the Negative Lookahead can be found here
Method 1 (shorter)
Just capture 2sg.
Try this RegEx:
(\dsg\.)(?!\|)
Use (\d+... if the number could be longer than 1 digit
Live Demo on RegExr
How it works:
( # To capture (2sg.)
\d # Digit (2)
sg # (sg)
\. # . (Dot)
)
(?!\|) # Do not match if immediately followed by |
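A small sketch in Python of how the lookahead behaves on the two sample strings, using the \d+ variant mentioned above (the function name find_form is made up for this example):

```python
import re

# "digits + sg." that is NOT immediately followed by a pipe character.
PATTERN = re.compile(r'(\d+sg\.)(?!\|)')

def find_form(text):
    """Return the captured form, or None when it is directly followed by '|'."""
    m = PATTERN.search(text)
    return m.group(1) if m else None

print(find_form("smth 2sg. smth"))
print(find_form("smth 2sg.| smth."))
```

The lookahead only inspects the character right after the dot, so a | elsewhere in the string does not prevent a match.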
Method 2 (longer but safer)
Match the whole string and capture 2sg.
Try this RegEx:
^\w+\s*(\dsg\.)(?!\|)\s*\w+\.?$
Use (\d+sg... if the number could be longer than 1 digit
Live Demo on RegExr
How it works:
^ # String starts with ...
\w+\s* # Letters then Optional Whitespace (smth )
( # To capture (2sg.)
\d # Digit (2)
sg # (sg)
\. # . (Dot)
)
(?!\|) # Do not match if immediately followed by |
\s* # Optional Whitespace
\w+ # Letters (smth)
\.? # Optional . (Dot)
$ # ... Strings ends with
Something like this might work for you:
(\d*sg\.)(?!\|)
It assumes that there is a number (or no number) followed by sg. and not followed by |.
^.*(\dsg\.)[^\|]*$
Explanation:
^ : starts from the beginning of the string
.* : accepts any number of initial characters (even nothing)
(\dsg\.) : looks for the group of digit + "sg."
[^\|]* : considers any number of following characters except for |
$ : stops at the end of the string
You can now select your string by getting the first group from your regex
Try:
(\d+sg\.(?!\|))
Depending on your programming environment, the syntax can be a little bit different, but this will get your result.
For more information see Negative Lookahead