PowerShell - Should take only set of numbers from file name - regex

I have a script that reads a file name from a path and then keeps only the digits to do something with them. It works fine until I run into this situation.
For example:
For the file name Patch_1348968.vip it takes the number 1348968.
For the file name Patch_1348968_v1.zip it takes the number 13489681, which is wrong.
I am using the line below to fetch the number. In general the name always starts with patch_#####.vip with 7-8 digits, so I want to take only those digits,
stopping before any following sign like _ or -.
$PatchNumber = $file.Name -replace "[^0-9]" , ''

You can use
$PatchNumber = $file.Name -replace '.*[-_](\d+).*', '$1'
See the regex demo.
Details:
.* - any chars other than newline char as many as possible
[-_] - a - or _
(\d+) - Group 1 ($1): one or more digits
.* - any chars other than newline char as many as possible.
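To make the difference concrete, here is a minimal sketch that runs both replacements on the two file names from the question. The variable names ($names, $results) and the property names are illustrative assumptions, not part of the original script.

```powershell
# File names taken from the question; $names is an illustrative variable.
$names = 'Patch_1348968.vip', 'Patch_1348968_v1.zip'

$results = foreach ($name in $names) {
    [pscustomobject]@{
        Name       = $name
        # Original approach: strips every non-digit, so the 1 from "_v1" is glued on.
        StripAll   = $name -replace '[^0-9]', ''
        # Capture-group approach: keeps only a run of digits that directly follows - or _.
        FirstGroup = $name -replace '.*[-_](\d+).*', '$1'
    }
}
$results | Format-Table -AutoSize
```

One caveat: the leading .* is greedy, so the last - or _ that is directly followed by digits wins. A hypothetical name such as Patch_1348968_2.zip would yield 2; a lazy .*? at the start would pick the first run instead.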

I suggest using -match instead, so you can describe what you want to keep rather than what to remove:
if( $file.Name -match '\d+' ) {
    $PatchNumber = $matches[0]
}
\d+ matches the first consecutive sequence of digits. The automatic variable $matches contains the full match at index 0, if the -match operator successfully matched the input string against the pattern.
If you want to be more specific, you could use a more complex pattern and extract the desired sub string using a capture group:
if( $file.Name -match '^Patch_(\d+)' ) {
    $PatchNumber = $matches[1]
}
Here, the anchor ^ makes sure the match starts at the beginning of the input string, then Patch_ gets matched literally (case-insensitive), followed by a group of consecutive digits which gets captured () and can be extracted using $matches[1].
You can get an even more detailed explanation of the RegEx and the ability to experiment with it at regex101.com.
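As a rough sketch of how the anchored pattern behaves over several names, the loop below collects the patch numbers and warns about names that do not match. The list of names (including readme.txt) and the variable $patchNumbers are assumptions made for illustration.

```powershell
# Illustrative inputs; the last one deliberately does not match.
$names = 'Patch_1348968.vip', 'Patch_1348968_v1.zip', 'readme.txt'

$patchNumbers = foreach ($name in $names) {
    # Read $Matches only after a successful -match: when -match fails,
    # $Matches still holds the groups from the previous successful match.
    if ($name -match '^Patch_(\d+)') {
        $Matches[1]
    }
    else {
        # Warnings go to the warning stream, so they are not collected.
        Write-Warning "No patch number found in '$name'"
    }
}
$patchNumbers
```

Keeping the read of $Matches inside the if block is the design point here: it avoids silently reusing a stale value for a file that did not match.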

RegEx omit optional prefix in UPN or displayName

I am trying to get only the "nonpersonalizedusername" including its number or the surname.
To add more detail, I'd like to accomplish something like:
If there's an #-Symbol, get me everything that is in front of that #-Symbol, otherwise get me the whole string.
Plus, if then there's a dot "." in it, get me everything after that dot.
Let's assume I have the following strings of userPrincipalNames and/or displayNames:
nonpersonalizedusername004
nonpersonalizedusername019#domaina.local
prefixc.nonpersonalizedusername044#domaina.local
nonpersonalizedusername038#domainb.local
prefixa.nonpersonalizedusername002#domaina.local
prefixb.nonpersonalizedusername038#domainb.local
givenname.surname
givenname.surname#domaina.local
What I got so far is this expression:
^(?:.*?\.)?(.+?)(?:#.*)?$
but this only works if the string contains both an #-Symbol and that "prefixing" dot, or neither of them.
If there's an #-Symbol but no prefixing dot, I only get that "local" part from the end.
https://regex101.com/r/1aflGH/1
You can use
^(?:[^#.]*\.)?([^#]+)(?:#.*)?$
See the regex demo. The \n is added to the negated character classes at regex101 as the test is run against a single multiline string.
Details:
^ - start of string
(?:[^#.]*\.)? - an optional sequence of any zero or more chars other than # and . and then a .
([^#]+) - Group 1: one or more chars other than # char
(?:#.*)? - an optional sequence of # and then the rest of the line
$ - end of string.
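The pattern above can be tried directly in PowerShell with -match, one string at a time. This is only a sketch: the sample strings are a subset of those in the question, and $names, $pattern and $extracted are illustrative names.

```powershell
# A subset of the sample strings from the question.
$names = 'nonpersonalizedusername004',
         'nonpersonalizedusername019#domaina.local',
         'prefixc.nonpersonalizedusername044#domaina.local',
         'givenname.surname',
         'givenname.surname#domaina.local'

$pattern = '^(?:[^#.]*\.)?([^#]+)(?:#.*)?$'

$extracted = foreach ($name in $names) {
    # Group 1 holds the part after the optional prefix and before the optional #.
    if ($name -match $pattern) { $Matches[1] }
}
$extracted
```

Because each string is tested on its own, the \n that the regex101 demo adds to the negated character classes is not needed here.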
You might optionally repeat matches until the last dot before the #, and then capture the rest after that dot up to the # in group 1.
^(?:[^#.]*\.)*([^#.]+)
The pattern matches:
^            Start of string
(?:          Non capture group
  [^#.]*\.   Optionally repeat matching any char except # or ., then match .
)*           Close non capture group and optionally repeat
(            Capture group 1
  [^#.]+     Match 1+ chars other than # or .
)            Close group 1
Regex demo
Powershell example
$s = @"
nonpersonalizedusername004
nonpersonalizedusername019#domaina.local
prefixc.nonpersonalizedusername044#domaina.local
nonpersonalizedusername038#domainb.local
prefixa.nonpersonalizedusername002#domaina.local
prefixb.nonpersonalizedusername038#domainb.local
givenname.surname
givenname.surname#domaina.local
"@
Select-String '(?m)^(?:[^#.\n]*\.)*([^#.\n]+)' -input $s -AllMatches | Foreach-Object {$_.Matches} | Foreach-Object {$_.Groups[1].Value}
Output
nonpersonalizedusername004
nonpersonalizedusername019
nonpersonalizedusername044
nonpersonalizedusername038
nonpersonalizedusername002
nonpersonalizedusername038
surname
surname

Renaming files with criteria

Need some advice. I'm trying to do something with regular expressions that might not be possible, and if it is possible it's over my head. I can't get anything to work. I'm trying to create a tagging system for my PDF files. So if I have this file name:
"csharp 8 in a nutshell[studying programming csharp ebooks].pdf"
I would like all the words inside the '[ ]' to have a '#' in front of them. So the above file name would look like this:
"csharp 8 in a nutshell[#studying #programming #csharp #ebooks].pdf"
The problem is keeping the '#' inside the '[ ]'. For example I'd rather the 'csharp' at the very front of the file name not have the '#'.
Also, I'm using a bulk renamer called 'Bulk Rename Utility' to help me.
Can this be done?
If it can, any hints on how?
Thanks.
Bulk Rename Utility does not support replacing multiple matches; you can only match the whole file name and perform replacements using capturing groups/backreferences.
Since you are using Windows, I suggest using Powershell:
cd 'C:\YOUR_FOLDER\HERE'
Get-ChildItem -File | Rename-Item -NewName { $_.Name -replace '(?<=\[[^][]*?)\w+(?=[^][]*])','#$&' }
See this regex demo and the proof it works with .NET regex flavor.
(?<=\[[^][]*?) - right before this location, there must be a [ and then any amount of chars other than [ and ], as few as possible
\w+ - 1+ word chars
(?=[^][]*]) - right after this location, there must be any amount of chars other than [ and ], as many as possible, and then a ] char.
The replacement is # + the whole match value ($&).
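Before renaming real files, the replacement can be checked on a plain string. A minimal sketch, using the file name from the question; $name and $newName are illustrative variables.

```powershell
# File name from the question; nothing on disk is touched here.
$name = 'csharp 8 in a nutshell[studying programming csharp ebooks].pdf'

# Single quotes keep PowerShell from expanding $& itself,
# so the regex engine sees it as "the whole match".
$newName = $name -replace '(?<=\[[^][]*?)\w+(?=[^][]*])', '#$&'
$newName
```

For the actual rename, adding -WhatIf to Rename-Item prints what would be renamed without changing anything, which is a cheap way to preview the result on a folder.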
Also, you may use
Get-ChildItem -File | Rename-Item -NewName { $_.Name -replace '(\G(?!\A)[^][\w]+|\[)(\w+)','$1#$2' }
See this regex demo and .NET regex test.
(\G(?!\A)[^][\w]+|\[) - Group 1 ($1): either the end of the previous match and 1+ chars other than ], [ and word chars, or a [ char
(\w+) - Group 2 ($2): one or more word chars.
If you only want to rename *.pdf files, replace Get-ChildItem -File with Get-ChildItem *.pdf.
I assume there is at most one bracket-delimited substring.
You can replace zero-length matches of the following regular expression with '#' when using Perl (click "Perl" then check the global and case-insensitive options), Ruby, Python's alternative regex engine, R with perl=TRUE, or any language that uses the PCRE regex engine, which includes PHP. With the exception of Ruby, the case-insensitive (/i) and global (/g) flags need to be set. Ruby only requires the case-insensitive flag.
r = /(?:^.*\[ *|\G(?<!^)|[a-z]+ +)\K(?<=\[| )(?=[a-z][^\[\]]*\])/
If using Ruby, for example, one would execute
str = "csharp 8 in a nutshell[studying programming csharp ebooks].pdf"
str.gsub(r,'#')
#=> "csharp 8 in a nutshell[#studying #programming #csharp #ebooks].pdf"
I believe all of the languages I named above allow one to run a short script from the command line. (I provide a Ruby script below.)
The regex engine performs the following operations.
(?:                  : begin non-capture group
  ^.*\[ *            : match beginning of string then 0+ characters then '['
                       then 0+ spaces
  |                  : or
  \G                 : asserts the position at the end of the previous match
                       or at the start of the string for the first match
  (?<!^)             : use a negative lookbehind to assert that the current
                       location is not the start of the string
  |                  : or
  [a-z]+ +           : match 1+ letters then 1+ spaces
)                    : end non-capture group
\K                   : reset beginning of reported match to current location
                       and discard all previously-matched characters from match
                       to be returned
(?<=                 : begin positive lookbehind
  \[|[ ]             : match '[' or a space
)                    : end positive lookbehind
(?=                  : begin positive lookahead
  [a-z][^\[\]]*\]    : match a letter then 0+ characters other than '[' and ']'
                       then ']'
)                    : end positive lookahead
Another possibility (illustrated with Ruby) is to break the string into three pieces, modify the middle one, then rejoin the pieces:
first, mid, last = str.split /(?<=\[)|(?=\])/
#=> ["csharp 8 in a nutshell[",
# "studying programming csharp ebooks",
# "].pdf"]
first + mid.gsub(/(?<=\A| )(?! )/,'#') + last
#=> "csharp 8 in a nutshell[#studying #programming #csharp #ebooks].pdf"
The regex used by split reads, "match a (zero-width) string that is preceded by '[' ((?<=\[) being a positive lookbehind) or is followed by ']' ((?=\]) being a positive lookahead)". By matching zero-width strings, split does not remove any characters.
gsub's regex reads, "match a zero-width string that is at the start of the string or is preceded by a space, and is followed by a character other than a space" ((?! ) being a negative lookahead). It could alternatively be written /(?<![^ ])(?! )/ ((?<![^ ]) being a negative lookbehind).
A variant:
first + mid.split.map { |s| '#' + s }.join(' ') + last
#=> "csharp 8 in a nutshell[#studying #programming #csharp #ebooks].pdf"
I created a file named 'in' that contains the following two lines:
Little [Miss Muffet sat on her] tuffet
eating her [curds and] whey
Here is an example of a (Ruby) script that could be run from the command line to perform the necessary replacements.
ruby -e "File.open('out', 'w') do |fout|
  File.foreach('in') do |str|
    first, mid, last = str.split(/(?<=\[)|(?=\])/)
    fout.puts(first + mid.gsub(/(?<=\A| )(?! )/,'#') + last)
  end
end"
This produces a file named 'out' that contains these two lines:
Little [#Miss #Muffet #sat #on #her] tuffet
eating her [#curds #and] whey

PowerShell Regular Expression match Y or Z

I am trying to match some strings using a regular expression in PowerShell, but I'm running into difficulty because the format of the original string varies. Admittedly, I am not very strong at writing regular expressions.
I need to extract the numbers from each of these strings. These can vary in length but in both cases will be preceded by FOO:
PC1-FOO1234567
PC2-FOO1234567/FOO98765
This works for the second example:
'PC2-FOO1234567/FOO98765' -match 'FOO(.*?)\/FOO(.*?)\z'
It lets me access the matched strings using $matches[1] and $matches[2] which is great.
It obviously doesn't work for the first example. I suspect I need some way to match on either / or the end of the string but I'm not sure how to do this and end up with my desired match.
Suggestions?
You may use
'FOO(.*?)(?:/FOO(.*))?$'
It will match FOO, then capture any 0 or more chars as few as possible into Group 1 and then will attempt to optionally match a sequence of patterns: /FOO, any 0 or more chars as many as possible captured into Group 2 and then the end of string position should follow.
See the regex demo
Details
FOO - literal substring
(.*?) - Group 1: any zero or more chars other than newline, as few as possible
(?:/FOO(.*))? - an optional non-capturing group matching 1 or 0 repetitions of:
/FOO - a literal substring
(.*) - Group 2: any 0+ chars other than newline as many as possible (* is greedy)
$ - end of string.
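A short sketch of the pattern applied to both sample strings follows. The names $samples, $pattern and $rows and the property names are assumptions for illustration only.

```powershell
# Both sample strings from the question.
$samples = 'PC1-FOO1234567', 'PC2-FOO1234567/FOO98765'
$pattern = 'FOO(.*?)(?:/FOO(.*))?$'

$rows = foreach ($text in $samples) {
    if ($text -match $pattern) {
        [pscustomobject]@{
            Text   = $text
            First  = $Matches[1]
            # Empty/$null when the optional /FOO... part is absent.
            Second = $Matches[2]
        }
    }
}
$rows | Format-Table -AutoSize
```

Checking Second for an empty value before using it is enough to handle the first format, so a single pattern covers both inputs.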
[edit - removed the unneeded pipe to Where-Object. thanks to mklement0 for that! [*grin*]]
this is a somewhat different approach. it splits on the foo, then replaces the unwanted / with nothing, and finally filters out any string that contains letters.
the pure regex solutions others offered will likely be faster, but this may be slightly easier to understand - and therefore to maintain. [grin]
# fake reading in a text file
# in real life, use Get-Content
$InStuff = @'
PC1-FOO1234567
PC2-FOO1234567/FOO98765
'@ -split [environment]::NewLine
$InStuff -split 'foo' -replace '/' -notmatch '[a-z]'
output ...
1234567
1234567
98765
To offer a more concise alternative with the -split operator, which obviates the need to access $Matches afterwards to extract the numbers:
PS> 'PC1-FOO1234568', 'PC2-FOO1234567/FOO98765' -split '(?:^PC\d+-|/)FOO' -ne ''
1234568 # single match from 1st input string
1234567 # first of 2 matches from 2nd input string
98765
Note: -split always returns a [string[]] array, even if only 1 string is returned; result strings from multiple input strings are combined into a single, flat array.
^PC\d+-|/ matches PC followed by 1 or more (+) digits (\d) and a - at the start of the string (^), or (|) a / char. Followed by the literal FOO, this matches both PC2-FOO at the beginning and /FOO.
(?:...), a non-capturing subexpression, must be used to prevent -split from including what the subexpression matched in the results array.
-ne '' filters out the empty elements that result from the input strings starting with a separator.
To learn more about the regex-based -split operator and in what ways it is more powerful than the string literal-based .NET String.Split() method, see this answer.
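To illustrate the point about (?:...), the sketch below splits the same string once with a capturing group and once with a non-capturing group. The variable names are illustrative.

```powershell
$text = 'PC2-FOO1234567/FOO98765'

# Capturing group: whatever the group matched is returned as extra elements.
$withCapture = $text -split '(^PC\d+-|/)FOO' -ne ''

# Non-capturing group: only the text between the separators is returned.
$withoutCapture = $text -split '(?:^PC\d+-|/)FOO' -ne ''

"With capture:    $($withCapture -join ' | ')"
"Without capture: $($withoutCapture -join ' | ')"
```

With the capturing version, the separators PC2- and / end up in the result alongside the numbers, which is why the non-capturing form is the one to use here.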

How to detect the character before a number in RegEx

I have a string test_demo_0.1.1.
I want in PowerShell script to add before the 0.1.1 some text, for example: test_demo_shay_0.1.1.
I succeeded to detect the first number with RegEx and add the text:
$str = "test_demo_0.1.1"
if ($str -match "(?<number>\d)")
{
    $newStr = $str.Insert($str.IndexOf($Matches.number) - 1, "_shay")
}
# $newStr = test_demo_shay_0.1.1
The problem is, sometimes my string includes a number in another location, for example: test_demo2_0.1.1 (and then the insert is not good).
So I want to detect the first number which the character before is _, how can I do it?
I tried "(_<number>\d)" and "([_]<number>\d)" but it doesn't work.
What you ask for is called a positive lookbehind (a construct that checks for the presence of some pattern immediately to the left of the current location):
"(?<=_)(?<number>\d)"
 ^^^^^^
However, it seems all you want is to insert _shay before the first digit preceded with _. A replace operation will suit here best:
$str -replace '_(\d.*)', '_shay_$1'
Result: test_demo_shay_0.1.1.
Details
_ - an underscore
(\d.*) - Capturing group #1: a digit and then any 0+ chars to the end of the line.
The $1 in the replacement pattern is the contents matched by the capturing group #1.
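Here is a small sketch of both ideas on the problematic input from the question, test_demo2_0.1.1. The variables $viaReplace, $m and $viaInsert are illustrative; the second variant uses the .NET [regex]::Match method to get the index of the digit, which is one possible way to combine the lookbehind with Insert.

```powershell
$str = 'test_demo2_0.1.1'

# Replace approach: the 2 in "demo2" is skipped because it is not preceded by _.
$viaReplace = $str -replace '_(\d.*)', '_shay_$1'

# Lookbehind approach: locate the first digit preceded by _ and insert at its index.
$m = [regex]::Match($str, '(?<=_)\d')
$viaInsert = if ($m.Success) { $str.Insert($m.Index, 'shay_') } else { $str }

$viaReplace
$viaInsert
```

Both variants leave the string unchanged when no digit follows an underscore, so they are safe to run over names that do not carry a version number.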

PowerShell -replace to get string between two different characters

I am currently using split to get what I need, but I am hoping there is a better way in PowerShell.
Here is the string:
server=ss8.server.com;database=CSSDatabase;uid=WS_CSSDatabase;pwd=abc123-1cda23-123-A7A0-CC54;Max Pool Size=5000
I want to get the server and database without the database= or the server= prefix.
Here is the method I am currently using:
$databaseserver = (($details.value).split(';')[0]).split('=')[1]
$database = (($details.value).split(';')[1]).split('=')[1]
This outputs to:
ss8.server.com
CSSDatabase
I would like it to be as simple as possible.
Thank you in advance
Replacing approach
You may use the following regex replace:
$s = 'server=ss8.server.com;database=CSSDatabase;uid=WS_CSSDatabase;pwd=abc123-1cda23-123-A7A0-CC54;Max Pool Size=5000'
$dbserver = $s -replace '^server=([^;]+).*', '$1'
$db = $s -replace '^[^;]*;database=([^;]+).*', '$1'
The technique is to match and capture (with (...)) what we need and just match what we need to remove.
Pattern details:
^ - start of the line
server= - a literal substring
([^;]+) - Group 1 (what $1 refers to) matching 1+ chars other than ;
.* - any 0+ chars other than a newline, as many as possible
Pattern 2 is almost the same, the capturing group is shifted a bit to capture another detail, and some more literal values are added to match the right context.
Note: if the values you need to extract may appear anywhere in the string, replace ^ in the first one and ^[^;]*; pattern in the second one with .*?\b (any 0+ chars other than a newline, as few as possible followed with a word boundary).
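The position-independent variant from the note can be sketched as follows. The input is the question's connection string with the keys reordered and the password omitted, purely as an illustration; the port key in the last line is hypothetical and does not appear in the string.

```powershell
# Same keys as in the question, but in a different order (illustrative input).
$s = 'uid=WS_CSSDatabase;database=CSSDatabase;server=ss8.server.com;Max Pool Size=5000'

# .*?\b skips ahead lazily to the first key that starts at a word boundary.
$dbserver = $s -replace '.*?\bserver=([^;]+).*', '$1'
$db       = $s -replace '.*?\bdatabase=([^;]+).*', '$1'

# When the key is absent, -replace returns the input string unchanged.
$missing = $s -replace '.*?\bport=([^;]+).*', '$1'

$dbserver
$db
$missing -eq $s
```

Because a failed -replace hands back the whole input, it is worth comparing the result with the original string (or using -match) before treating it as an extracted value.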
Matching approach
With a -match, you may do it the following way:
$s -match '^server=(.+?);database=([^;]+)'
The $Matches[1] will contain the server details and $Matches[2] will hold the DB info:
Name                           Value
----                           -----
2                              CSSDatabase
1                              ss8.server.com
0                              server=ss8.server.com;database=CSSDatabase
Pattern details
^ - start of string
server= - literal substring
(.+?) - Group 1: any 1+ non-linebreak chars as few as possible
;database= - literal substring
([^;]+) - 1+ chars other than ;
Another solution with a RegEx and named capture groups, similar to Wiktor's Matching Approach.
$s = 'server=ss8.server.com;database=CSSDatabase;uid=WS_CSSDatabase;pwd=abc123-1cda23-123-A7A0-CC54;Max Pool Size=5000'
$RegEx = '^server=(?<databaseserver>[^;]+);database=(?<database>[^;]+)'
if ($s -match $RegEx){
    $Matches.databaseserver
    $Matches.database
}