PowerShell Regular Expression match Y or Z - regex

I am trying to match some strings using a regular expression in PowerShell, but the original strings I'm extracting from differ in format, and that is causing me difficulty. Admittedly, I am not very strong at writing regular expressions.
I need to extract the numbers from each of these strings. They can vary in length, but in both cases they are preceded by FOO:
PC1-FOO1234567
PC2-FOO1234567/FOO98765
This works for the second example:
'PC2-FOO1234567/FOO98765' -match 'FOO(.*?)\/FOO(.*?)\z'
It lets me access the matched strings using $matches[1] and $matches[2] which is great.
It obviously doesn't work for the first example. I suspect I need some way to match on either / or the end of the string but I'm not sure how to do this and end up with my desired match.
Suggestions?

You may use
'FOO(.*?)(?:/FOO(.*))?$'
It matches FOO, then captures any 0 or more chars, as few as possible, into Group 1. It then tries to optionally match a sequence of patterns: /FOO followed by any 0 or more chars, as many as possible, captured into Group 2. Finally, the end-of-string position must follow.
Details
FOO - literal substring
(.*?) - Group 1: any zero or more chars other than newline, as few as possible
(?:/FOO(.*))? - an optional non-capturing group matching 1 or 0 repetitions of:
/FOO - a literal substring
(.*) - Group 2: any 0+ chars other than newline as many as possible (* is greedy)
$ - end of string.
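To experiment with this pattern outside PowerShell, here is a minimal Python sketch. It assumes Python's re module treats this particular regex the same way .NET does, which should hold for the constructs used here, although the two engines differ elsewhere. re.IGNORECASE is added because PowerShell's -match is case-insensitive by default, and the helper name extract_foo_numbers is hypothetical.

```python
import re

# Same pattern as the answer; IGNORECASE mirrors PowerShell's case-insensitive -match.
PATTERN = re.compile(r'FOO(.*?)(?:/FOO(.*))?$', re.IGNORECASE)

def extract_foo_numbers(s):
    """Return (group1, group2); group2 is None when there is no /FOO part."""
    m = PATTERN.search(s)
    if m is None:
        return None
    return m.group(1), m.group(2)

for s in ['PC1-FOO1234567', 'PC2-FOO1234567/FOO98765']:
    print(s, '->', extract_foo_numbers(s))
```

In the first string the optional group does not participate, so Group 2 is None. The lazy (.*?) therefore has to expand all the way to the end of the string before $ can match.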

This is a somewhat different approach. It splits on foo, replaces the unwanted / with nothing, and finally filters out any string that contains letters.
The pure regex solutions others offered will likely be faster, but this may be slightly easier to understand, and therefore easier to maintain.
# fake reading in a text file
# in real life, use Get-Content
$InStuff = @'
PC1-FOO1234567
PC2-FOO1234567/FOO98765
'@ -split '\r?\n'
$InStuff -split 'foo' -replace '/' -notmatch '[a-z]'
output ...
1234567
1234567
98765
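For comparison, the same split, replace, and filter pipeline can be sketched in Python. This is a loose translation and not the answer's code. The hypothetical helper split_replace_filter handles each line separately, and re.IGNORECASE stands in for the case-insensitivity of PowerShell's -split and -notmatch.

```python
import re

# Hypothetical stand-in for the file contents (the original uses a here-string).
in_stuff = "PC1-FOO1234567\nPC2-FOO1234567/FOO98765"

def split_replace_filter(text):
    result = []
    for line in text.splitlines():
        # -split 'foo' is case-insensitive in PowerShell, hence IGNORECASE here
        for part in re.split('foo', line, flags=re.IGNORECASE):
            part = part.replace('/', '')                     # -replace '/'
            if not re.search('[a-z]', part, re.IGNORECASE):  # -notmatch '[a-z]'
                result.append(part)
    return result

print(split_replace_filter(in_stuff))
```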

To offer a more concise alternative with the -split operator, which obviates the need to access $Matches afterwards to extract the numbers:
PS> 'PC1-FOO1234568', 'PC2-FOO1234567/FOO98765' -split '(?:^PC\d+-|/)FOO' -ne ''
1234568 # single match from 1st input string
1234567 # first of 2 matches from 2nd input string
98765
Note: -split always returns a [string[]] array, even if only 1 string is returned; result strings from multiple input strings are combined into a single, flat array.
^PC\d+-|/ matches either (|) PC followed by 1 or more (+) digits (\d) and a - at the start of the string (^), or a / char. Combined with the FOO that follows, the separator therefore matches both PC2-FOO at the beginning and /FOO later in the string.
(?:...), a non-capturing subexpression, must be used to prevent -split from including what the subexpression matched in the results array.
-ne '' filters out the empty elements that result from the input strings starting with a separator.
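Here is a rough Python equivalent of this -split call, assuming re.split as the stand-in for -split. Like -split, re.split includes the text of capturing groups in its result, which is why the separator keeps the (?:...) non-capturing group. The name split_numbers is hypothetical.

```python
import re

# Separator from the answer; ^ only matches at the start of each input string.
SEPARATOR = re.compile(r'(?:^PC\d+-|/)FOO', re.IGNORECASE)

def split_numbers(inputs):
    # Mirror -split on an array: split each string, flatten, then drop '' (like -ne '').
    return [part for s in inputs for part in SEPARATOR.split(s) if part != '']

print(split_numbers(['PC1-FOO1234568', 'PC2-FOO1234567/FOO98765']))
```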
To learn more about the regex-based -split operator and in what ways it is more powerful than the string literal-based .NET String.Split() method, see this answer.

Related

Powershell - Should take only set of numbers from file name

I have a script that reads a file name from a path location, takes only the numbers from it, and does something with them. It worked fine until I ran into the following situation.
For example:
For the file name Patch_1348968.vip it takes the number 1348968.
If the file name is Patch_1348968_v1.zip, it takes the number 13489681, which is wrong.
In general, the name always starts with patch_#####.vip, with 7-8 digits, so I want to take only the digits before any character such as _ or -.
This is what I am currently using to fetch the numbers:
$PatchNumber = $file.Name -replace "[^0-9]" , ''
You can use
$PatchNumber = $file.Name -replace '.*[-_](\d+).*', '$1'
Details:
.* - any chars other than newline char as many as possible
[-_] - a - or _
(\d+) - Group 1 ($1): one or more digits
.* - any chars other than newline char as many as possible.
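To check the -replace pattern outside PowerShell, here is a minimal Python sketch. It assumes re.sub with \1 behaves like -replace with '$1' for this pattern, and patch_number_replace is a hypothetical name. As with -replace, a name that doesn't match is returned unchanged.

```python
import re

def patch_number_replace(name):
    # Greedy .* backtracks to the LAST - or _ that is followed by digits;
    # the whole name is then replaced by the captured digits (\1, like $1).
    return re.sub(r'.*[-_](\d+).*', r'\1', name)

for name in ['Patch_1348968.vip', 'Patch_1348968_v1.zip']:
    print(name, '->', patch_number_replace(name))
```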
I suggest using -match instead, so you don't have to think in reverse:
if( $file.Name -match '\d+' ) {
$PatchNumber = $matches[0]
}
\d+ matches the first consecutive sequence of digits. The automatic variable $matches contains the full match at index 0, if the -match operator successfully matched the input string against the pattern.
If you want to be more specific, you could use a more complex pattern and extract the desired substring using a capture group:
if( $file.Name -match '^Patch_(\d+)' ) {
$PatchNumber = $matches[1]
}
Here, the anchor ^ makes sure the match starts at the beginning of the input string. Then Patch_ is matched literally (case-insensitively), followed by one or more consecutive digits, which are captured by the group (...) and can be extracted using $matches[1].
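Both -match variants translate roughly to Python as follows. re.search stands in for -match, and m.group(0) and m.group(1) play the roles of $matches[0] and $matches[1]. The function names are hypothetical, and re.IGNORECASE is an assumption added to mirror the default case-insensitivity of -match.

```python
import re

def first_digits(name):
    # Like: if ($file.Name -match '\d+') { $matches[0] }
    m = re.search(r'\d+', name)
    return m.group(0) if m else None

def patch_digits(name):
    # Like: if ($file.Name -match '^Patch_(\d+)') { $matches[1] }
    m = re.search(r'^Patch_(\d+)', name, re.IGNORECASE)
    return m.group(1) if m else None

print(first_digits('Patch_1348968_v1.zip'), patch_digits('patch_1348968.vip'))
```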
You can get an even more detailed explanation of the RegEx and the ability to experiment with it at regex101.com.

Perl Regular expression to replace the last matching string after dot

I have the string $someString = "XXX.v2016.12.016". I am trying to replace the last three digits (after the last dot) by incrementing them by one (output: "XXX.v2016.12.017"). Does anyone have an idea how to do this with a regex?
This problem has two parts: Matching the digits after the last dot, and replacing/incrementing them.
It's possible to do this with s///:
$someString =~ s{\.([0-9]+)\z}{
my $n = $1;
"." . ++$n
}e;
The regex matches a dot, followed by 1 or more digits, followed by the end of the string. This takes care of matching the last digit group.
The replacement part of a substitution normally behaves like a double-quoted string, but with the e flag it turns into a block of code.
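We assign the captured group of digits ($1) to a temporary variable, $n. This is because we want to use the increment operator ++ on it rather than just add 1, and ++ modifies its operand, which is not allowed on the read-only $1. The ++ operator is special in that it handles strings: applied to a string such as "016" that has only been used as a string, it performs a "magic" string increment that preserves leading zeroes.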
The return value of the replacement block is a string consisting of a . (to replace the one we matched), followed by the incremented digit string.
If the last group always has three digits, you can also use sprintf to pad the incremented number:
$someString =~ s{\.([0-9]+)\z}{ sprintf ".%03d", $1 + 1 }e;
If you don't want to hardcode the length (maybe because it varies), you can use the following:
$someString =~ s{\.([0-9]+)\z}{ sprintf ".%0*d", length($1), $1 + 1 }e;
In both cases, you can use \K to avoid having to re-add the ., but it actually makes the solution slightly longer.
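Python has no counterpart to Perl's magic string increment. This comparison sketch therefore takes the sprintf ".%0*d" route instead: re.sub with a callback replaces the /e flag, and str.zfill pads the incremented number to the original width. Python's \Z is used as the equivalent of Perl's \z, and increment_last_group is a hypothetical name.

```python
import re

def increment_last_group(s):
    # A dot, one or more digits, then the true end of the string (\Z here, \z in Perl).
    def bump(m):
        digits = m.group(1)
        # Pad to at least the original width, like sprintf ".%0*d", length($1), $1 + 1
        return '.' + str(int(digits) + 1).zfill(len(digits))
    return re.sub(r'\.([0-9]+)\Z', bump, s)

print(increment_last_group('XXX.v2016.12.016'))
```

Like the sprintf version, zfill never truncates, so a group of 999 becomes 1000.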

PowerShell -replace to get string between two different characters

I am currently using split to get what I need, but I am hoping there is a better way in PowerShell.
Here is the string:
server=ss8.server.com;database=CSSDatabase;uid=WS_CSSDatabase;pwd=abc123-1cda23-123-A7A0-CC54;Max Pool Size=5000
I want to get the server and database values without the server= or database= prefixes.
Here is what I am currently doing:
$databaseserver = (($details.value).split(';')[0]).split('=')[1]
$database = (($details.value).split(';')[1]).split('=')[1]
This outputs:
ss8.server.com
CSSDatabase
I would like it to be as simple as possible.
Thank you in advance
Replacing approach
You may use the following regex replace:
$s = 'server=ss8.server.com;database=CSSDatabase;uid=WS_CSSDatabase;pwd=abc123-1cda23-123-A7A0-CC54;Max Pool Size=5000'
$dbserver = $s -replace '^server=([^;]+).*', '$1'
$db = $s -replace '^[^;]*;database=([^;]+).*', '$1'
The technique is to match and capture (with (...)) what we need to keep, and just match what we need to remove.
Pattern details:
^ - start of the string
server= - a literal substring
([^;]+) - Group 1 (what $1 refers to) matching 1+ chars other than ;
.* - any 0+ chars other than a newline, as many as possible
Pattern 2 is almost the same, the capturing group is shifted a bit to capture another detail, and some more literal values are added to match the right context.
Note: if the values you need to extract may appear anywhere in the string, replace ^ in the first one and ^[^;]*; pattern in the second one with .*?\b (any 0+ chars other than a newline, as few as possible followed with a word boundary).
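The two -replace patterns, plus the "anywhere in the string" variant from the note, can be tried in Python. This assumes re.sub with \1 behaves like -replace with '$1' for these patterns. flags=re.IGNORECASE mirrors the default case-insensitivity of -replace, and the variable names are illustrative.

```python
import re

s = 'server=ss8.server.com;database=CSSDatabase;uid=WS_CSSDatabase;pwd=abc123-1cda23-123-A7A0-CC54;Max Pool Size=5000'

# Capture what to keep, match what to discard; \1 plays the role of $1.
db_server = re.sub(r'^server=([^;]+).*', r'\1', s, flags=re.IGNORECASE)
db = re.sub(r'^[^;]*;database=([^;]+).*', r'\1', s, flags=re.IGNORECASE)

# Variant from the note: .*?\b instead of the ^[^;]*; prefix.
db_anywhere = re.sub(r'.*?\bdatabase=([^;]+).*', r'\1', s, flags=re.IGNORECASE)

print(db_server, db, db_anywhere)
```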
Matching approach
With a -match, you may do it the following way:
$s -match '^server=(.+?);database=([^;]+)'
The $Matches[1] will contain the server details and $Matches[2] will hold the DB info:
Name Value
---- -----
2 CSSDatabase
1 ss8.server.com
0 server=ss8.server.com;database=CSSDatabase
Pattern details
^ - start of string
server= - literal substring
(.+?) - Group 1: any 1+ non-linebreak chars as few as possible
;database= - literal substring
([^;]+) - Group 2: 1+ chars other than ;
Another solution with a RegEx and named capture groups, similar to Wiktor's Matching Approach.
$s = 'server=ss8.server.com;database=CSSDatabase;uid=WS_CSSDatabase;pwd=abc123-1cda23-123-A7A0-CC54;Max Pool Size=5000'
$RegEx = '^server=(?<databaseserver>[^;]+);database=(?<database>[^;]+)'
if ($s -match $RegEx){
$Matches.databaseserver
$Matches.database
}
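Here is a Python sketch of the named-group approach. The main assumption to watch is a syntax difference: Python's re module spells named groups (?P<name>...), while .NET also accepts (?<name>...). Named groups are still numbered, so group(1) and group(2) correspond to $Matches[1] and $Matches[2] in the -match approach.

```python
import re

s = 'server=ss8.server.com;database=CSSDatabase;uid=WS_CSSDatabase;pwd=abc123-1cda23-123-A7A0-CC54;Max Pool Size=5000'

# Python syntax for named groups: (?P<name>...)
regex = re.compile(r'^server=(?P<databaseserver>[^;]+);database=(?P<database>[^;]+)', re.IGNORECASE)

m = regex.search(s)
if m:
    print(m.group('databaseserver'))
    print(m.group('database'))
```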

Convert line uri to UK friendly format

I have downloaded a script from Microsoft which will allow us to take a string and convert it into a friendly format to display on user profiles.
The original string is tel:+441234123456;ext=3456.
What I need to do is convert it into a UK friendly format so that the converted string is 01234 123456.
The steps I think I need to take are:
Removing the tel:+44 and replacing with 0.
After first 4 digits add a space.
Finish the variable with the last 6 digits.
Remove the ;ext=3456
A similar process was suggested for US numbers, but since I don't know regex, it goes slightly over my head:
$tel = $LineURI -replace 'tel:(\+1)([2-9]\d{2})([2-9]\d{2})(\d{4});ext=\d{4}','$1 ($2) $3-$4;'
This approach uses more than one -replace to simplify things, at the cost of some performance:
$tel = $LineURI -replace 'tel:\+\d\d','0' -replace ';.+' -replace '(^.{5})','$1 '
A single regular expression should suffice:
PS C:\> 'tel:+441234123456;ext=3456' -replace '^tel:\+\d{2}(\d{4})(\d+);.*$', '0$1 $2'
01234 123456
Regular expression breakdown:
^tel:\+\d{2} matches a literal tel:+ followed by two digits at the beginning of the string (^).
(\d{4}) matches four subsequent digits. The parentheses group the match so that it can be referenced in the replacement as $1.
(\d+) matches the longest sequence of subsequent digits after the above, but at least one digit. This too is grouped by parentheses so that it can be referenced in the replacement as $2.
;.*$ matches the remainder of the string starting with a semicolon.
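For a quick check outside PowerShell, the same single regex can be sketched with Python's re.sub, using \1 and \2 in place of $1 and $2. The name uk_format is hypothetical. Because the pattern requires a ; after the digits, a line URI without an ;ext=... suffix would not match and would be returned unchanged.

```python
import re

def uk_format(line_uri):
    # ^tel:\+\d{2} drops the prefix and 2-digit country code;
    # group 1 = next four digits, group 2 = the remaining digits before ';'.
    return re.sub(r'^tel:\+\d{2}(\d{4})(\d+);.*$', r'0\1 \2', line_uri)

print(uk_format('tel:+441234123456;ext=3456'))
```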

Match a number in a string with letters and numbers

I need to write a Perl regex to match numbers in a word with both letters and numbers.
Example: test123. I want to write a regex that matches only the number part and captures it.
I am trying \S*(\d+)\S*, but it captures only 3, not 123.
Quantified regex atoms are greedy: they match as much as they can.
Initially, the first \S* matched "test123", but the regex engine had to backtrack to allow \d+ to match. The result is:
+-------------------- Matches "test12"
|     +-------------- Matches "3"
|     |      +------- Matches ""
|     |      |
---   -----  ---
\S*   (\d+)  \S*
All you need is:
my ($num) = "test123" =~ /(\d+)/;
It tries to match at position 0, then position 1, and so on until it finds a digit; then it matches as many digits as it can.
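The backtracking described above can be reproduced with Python's re module, whose greedy quantifiers backtrack the same way as Perl's for this pattern. This is only a demonstration in another language, not Perl code.

```python
import re

s = 'test123'
# Greedy \S* first consumes all of "test123", then gives back one char at a time,
# so (\d+) is left with only the final "3".
greedy = re.search(r'\S*(\d+)\S*', s).group(1)
# Without the surrounding \S*, the search scans forward to the first digit
# and \d+ takes the whole run.
plain = re.search(r'(\d+)', s).group(1)
print(greedy, plain)  # 3 123
```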
The * quantifiers in your regex are greedy; that's why they also "eat" the digits. As @Marc said, you don't need them.
perl -e '$_ = "qwe123qwe"; s/(\d+)/$numbers=$1/e; print $numbers . "\n";'
In list context, "something122320" =~ /(\d+)/ returns 122320 (in scalar context it returns only a true value); this is probably what you're trying to do ;)
\S matches any non-whitespace characters, including digits. You want \d+:
my ($number) = 'test123' =~ /(\d+)/;
Were it a case where a non-digit was required (say before, per your example), you could use the following non-greedy expressions:
/\w+?(\d+)/ or /\S+?(\d+)/
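(The second one is more in tune with your \S* specification.) Note that \w and \S also match digits, so neither expression actually requires a non-digit: on 123 both still match, capturing 23. Use /\D+?(\d+)/ if a preceding non-digit must really be present.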
Your expression matches any string containing one or more digits, and that may be what you want. It would even match a string of digits surrounded by spaces (" 123 "), because \S* can match zero characters: the border between the last space and the first digit satisfies zero-or-more non-space, and the same is true of the border between the 3 and the following space.
Chances are that you don't need any specification and capturing the first digits in the string is enough. But when it's not, it's good to know how to specify expected patterns.
I think parentheses signify capture groups, which is exactly what you don't want. Remove them. You're looking for /\d+/ or /[0-9]+/