How do I capture the 2nd match for each line? - regex

Basically, I need one match per line, but right now my regex is matching two per line.
https://regex101.com/r/KmgGwS/8
My regex looks for two backslashes and returns the string between them. The problem is that my path has multiple backslashes, and I only need the 2nd such match on each line:
(?<=\\).*?(?=\\)
This is my PowerShell code:
if ( $_.PSPath -match ("(?<=::).*?(?=\\)")) {
$user = $matches.Values
}
For example:
Microsoft.PowerShell.Security\Certificate::CurrentUser\Root\CDD4EEAE6000AC7F40C3802C171E30148030C072
Microsoft.PowerShell.Security\Certificate::CurrentUser\Root\BE36A4562FB2EE05DBB3D32323ADF445084ED656
What my code does is it gets
Certificate::CurrentUserRoot
Certificate::CurrentUserRoot
but what I really need is only the string between the 2nd pair of backslashes (\ ___ \), which is:
Root
Root

You could use the anchor ^ to assert the start of the string, then match twice a run of characters other than a backslash or a newline, each followed by a backslash.
Use a capturing group to match what follows:
^[^\\\r\n]*\\[^\\\r\n]*\\([^\\\r\n]+)
About the pattern
^ Start of string
[^\\\r\n]*\\[^\\\r\n]*\\ Match 2 times not \ or a newline, then \
( Capture group 1
[^\\\r\n]+ Match 1+ times not \ or a newline
) Close group 1
The value is in the first capturing group:
$user = $matches[1]
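For readers who want to try the capturing-group pattern outside PowerShell, here is a minimal Python sketch of the same idea. The function name `third_segment` is hypothetical and only used for illustration; the pattern itself is the one shown above.

```python
import re

# Same pattern as above: skip two backslash-terminated segments,
# then capture the third segment in group 1.
THIRD_SEGMENT = re.compile(r'^[^\\\r\n]*\\[^\\\r\n]*\\([^\\\r\n]+)')


def third_segment(ps_path):
    """Return the third backslash-separated segment, or None if absent."""
    m = THIRD_SEGMENT.search(ps_path)
    return m.group(1) if m else None


path = r'Microsoft.PowerShell.Security\Certificate::CurrentUser\Root\CDD4EEAE6000AC7F40C3802C171E30148030C072'
print(third_segment(path))
```

Returning None when there is no match mirrors the `if ( ... -match ... )` guard in the PowerShell code, where `$matches` is only meaningful after a successful match.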
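If you want the value as the whole match instead of group 1 (so your script can keep using the match itself), you could use a positive lookbehind to assert that what is on the left is two runs of non-backslash characters, each followed by \. This relies on variable-length lookbehind, which the .NET regex engine used by PowerShell supports: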
(?<=^[^\\\r\n]*\\[^\\\r\n]*\\)[^\\\r\n]+

I'm guessing that, maybe an expression similar to,
(?<=\\)[^\\]*(?=\\[A-Z0-9]{40}$)
might be an option to look into.
Or maybe just,
[^\\]*(?=\\[A-Z0-9]{40}$)
or
[^\\]*(?=\\[A-F0-9]{40}$)
would simply return Root, where 40 is the length of the trailing [A-F0-9] substring. For a more flexible quantifier, this expression might work:
[^\\]*(?=\\[A-F0-9]*$)
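As a rough check of the lookahead variant, the following Python sketch applies the 40-character version of the pattern to the two sample paths. The helper name `store_name` is an assumption made for this example.

```python
import re

# A run of non-backslash characters that is followed by a backslash
# and a 40-character uppercase hex string at the end of the input.
BEFORE_THUMBPRINT = re.compile(r'[^\\]*(?=\\[A-F0-9]{40}$)')


def store_name(ps_path):
    """Return the segment just before the trailing thumbprint, or None."""
    m = BEFORE_THUMBPRINT.search(ps_path)
    return m.group(0) if m else None


paths = [
    r'Microsoft.PowerShell.Security\Certificate::CurrentUser\Root\CDD4EEAE6000AC7F40C3802C171E30148030C072',
    r'Microsoft.PowerShell.Security\Certificate::CurrentUser\Root\BE36A4562FB2EE05DBB3D32323ADF445084ED656',
]
names = [store_name(p) for p in paths]
print(names)
```

Unlike the anchored pattern, this one locates the segment by what follows it, so it does not depend on how many segments precede the thumbprint.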

To offer a pragmatic alternative using PowerShell's -split operator:
PS> 'Microsoft.PowerShell.Security\Certificate::CurrentUser\Root\CDD4EEAE6000AC7F40C3802C171E30148030C072',
'Microsoft.PowerShell.Security\Certificate::CurrentUser\Root\BE36A4562FB2EE05DBB3D32323ADF445084ED656' |
ForEach-Object { ($_ -split '::|\\')[3] }
Root
Root
The above tokenizes each input string by separators \ or ::, and extracts the 4th token.
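The same tokenizing idea can be sketched in Python with `re.split`, assuming the separators are exactly `::` or a single backslash; the function name `fourth_token` is hypothetical.

```python
import re


def fourth_token(ps_path):
    """Split on '::' or a backslash and return the 4th token (index 3)."""
    tokens = re.split(r'::|\\', ps_path)
    return tokens[3] if len(tokens) > 3 else None


path = r'Microsoft.PowerShell.Security\Certificate::CurrentUser\Root\CDD4EEAE6000AC7F40C3802C171E30148030C072'
tokens = re.split(r'::|\\', path)
print(tokens)
print(fourth_token(path))
```

Splitting is positional, so it only works while the provider prefix keeps the same number of segments.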

Related

Powershell - Should take only set of numbers from file name

I have a script that reads a file name from a path, takes only the digits from it, and does something with them. It works fine until I run into this situation.
For an example:
For the file name Patch_1348968.vip it takes the number 1348968.
In the case where the file name is Patch_1348968_v1.zip, it takes the number 13489681, which is wrong.
I am using the line below to fetch the number. In general the name always starts with patch_#####.vip with 7-8 digits, so I want to take only the digits before any sign like _ or -.
$PatchNumber = $file.Name -replace "[^0-9]" , ''
You can use
$PatchNumber = $file.Name -replace '.*[-_](\d+).*', '$1'
See the regex demo.
Details:
.* - any chars other than newline char as many as possible
[-_] - a - or _
(\d+) - Group 1 ($1): one or more digits
.* - any chars other than newline char as many as possible.
I suggest using -match instead, so you don't have to think in terms of what to remove:
if( $file.Name -match '\d+' ) {
$PatchNumber = $matches[0]
}
\d+ matches the first consecutive sequence of digits. The automatic variable $matches contains the full match at index 0, if the -match operator successfully matched the input string against the pattern.
If you want to be more specific, you could use a more complex pattern and extract the desired sub string using a capture group:
if( $file.Name -match '^Patch_(\d+)' ) {
$PatchNumber = $matches[1]
}
Here, the anchor ^ makes sure the match starts at the beginning of the input string, then Patch_ gets matched literally (case-insensitive), followed by a group of consecutive digits which gets captured () and can be extracted using $matches[1].
You can get an even more detailed explanation of the RegEx and the ability to experiment with it at regex101.com.
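To experiment with the anchored pattern outside PowerShell, here is a small Python sketch. The function name `patch_number` is hypothetical, and `re.IGNORECASE` is used to mirror the fact that PowerShell's -match is case-insensitive by default.

```python
import re


def patch_number(file_name):
    """Return the digits that directly follow a leading 'Patch_', or None."""
    # re.match anchors at the start of the string, like ^ in the pattern above.
    m = re.match(r'Patch_(\d+)', file_name, re.IGNORECASE)
    return m.group(1) if m else None


for name in ('Patch_1348968.vip', 'Patch_1348968_v1.zip', 'readme.txt'):
    print(name, '->', patch_number(name))
```

Because only the first run of digits after the prefix is captured, the trailing `1` in `_v1` no longer leaks into the result.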

RegEx omit optional prefix in UPN or displayName

I am trying to get only the "nonpersonalizedusername" including its number or the surname.
To add more detail, I'd like to accomplish something like:
If there's an #-Symbol, get me everything that is in front of that #-Symbol, otherwise get me the whole string.
Plus, if then there's a dot "." in it, get me everything after that dot.
Let's assume I have the following strings of userPrincipalNames and/or displayNames:
nonpersonalizedusername004
nonpersonalizedusername019#domaina.local
prefixc.nonpersonalizedusername044#domaina.local
nonpersonalizedusername038#domainb.local
prefixa.nonpersonalizedusername002#domaina.local
prefixb.nonpersonalizedusername038#domainb.local
givenname.surname
givenname.surname#domaina.local
What I got so far is this expression:
^(?:.*?\.)?(.+?)(?:#.*)?$
but this only works, if there's an #-Symbol AND that "prefixing"-Dot in the string OR neither Dot nor #-Symbol.
If there's an #-Symbol, but no prefixing-dot, I'm getting only that "local"-part from the end.
https://regex101.com/r/1aflGH/1
You can use
^(?:[^#.]*\.)?([^#]+)(?:#.*)?$
See the regex demo. The \n is added to the negated character classes at regex101 as the test is run against a single multiline string.
Details:
^ - start of string
(?:[^#.]*\.)? - an optional sequence of any zero or more chars other than # and . and then a .
([^#]+) - Group 1: one or more chars other than # char
(?:#.*)? - an optional sequence of # and then the rest of the line
$ - end of string.
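A quick way to check this pattern against all of the sample strings is a short Python sketch like the one below. The helper name `bare_name` is an assumption for illustration, and the samples use # as the separator exactly as in the question.

```python
import re

# Optional "prefix." part, then group 1 up to an optional "#domain" suffix.
NAME = re.compile(r'^(?:[^#.]*\.)?([^#]+)(?:#.*)?$')


def bare_name(value):
    """Return the name without optional prefix and domain, or None."""
    m = NAME.match(value)
    return m.group(1) if m else None


samples = [
    'nonpersonalizedusername004',
    'nonpersonalizedusername019#domaina.local',
    'prefixc.nonpersonalizedusername044#domaina.local',
    'nonpersonalizedusername038#domainb.local',
    'prefixa.nonpersonalizedusername002#domaina.local',
    'prefixb.nonpersonalizedusername038#domainb.local',
    'givenname.surname',
    'givenname.surname#domaina.local',
]
results = [bare_name(s) for s in samples]
print(results)
```

Excluding # from the prefix class is what stops the optional prefix from reaching the dot inside the domain part.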
You might optionally repeat matches until the last dot before the #, and then capture the rest after that dot up to the # in group 1.
^(?:[^#.]*\.)*([^#.]+)
The pattern matches:
^ Start of string
(?: Non capture group
[^#.]*\. Optionally repeat matching any char except # or ., then match .
)* Close non capture group and optionally repeat
( Capture group 1
[^#.]+ Match 1+ times any char except # or .
) Close group 1
Powershell example
$s = @"
nonpersonalizedusername004
nonpersonalizedusername019#domaina.local
prefixc.nonpersonalizedusername044#domaina.local
nonpersonalizedusername038#domainb.local
prefixa.nonpersonalizedusername002#domaina.local
prefixb.nonpersonalizedusername038#domainb.local
givenname.surname
givenname.surname#domaina.local
"@
Select-String '(?m)^(?:[^#.\n]*\.)*([^#.\n]+)' -input $s -AllMatches | Foreach-Object {$_.Matches} | Foreach-Object {$_.Groups[1].Value}
Output
nonpersonalizedusername004
nonpersonalizedusername019
nonpersonalizedusername044
nonpersonalizedusername038
nonpersonalizedusername002
nonpersonalizedusername038
surname
surname

Renaming files with criteria

Need some advice. I'm trying to do something with regular expressions that might not be possible, and if it is possible it's over my head. I can't get anything to work. I'm trying to create a tagging system for my PDF files. So if I have this file name:
"csharp 8 in a nutshell[studying programming csharp ebooks].pdf"
I would like all the words inside the '[ ]' to have a '#' in front of them. So the above file name would look like this:
"csharp 8 in a nutshell[#studying #programming #csharp #ebooks].pdf"
The problem is keeping the '#' inside the '[ ]'. For example I'd rather the 'csharp' at the very front of the file name not have the '#'.
Also, I'm using a bulk renamer called 'Bulk Rename Utility' to help me.
Can this be done?
If it can, any hints on how?
Thanks.
Bulk Rename Utility does not support replacing multiple matches, you can only match the whole file name and perform replacements using capturing groups/backreferences.
Since you are using Windows, I suggest using Powershell:
cd 'C:\YOUR_FOLDER\HERE'
Get-ChildItem -File | Rename-Item -NewName { $_.Name -replace '(?<=\[[^][]*?)\w+(?=[^][]*])','#$&' }
See this regex demo and the proof it works with .NET regex flavor.
(?<=\[[^][]*?) - right before this location, there must be a [ and then any amount of chars other than [ and ], as few as possible
\w+ - 1+ word chars
(?=[^][]*]) - right after this location, there must be any amount of chars other than [ and ], as many as possible, and then a ] char.
The replacement is # + the whole match value ($&).
Also, you may use
Get-ChildItem -File | Rename-Item -NewName { $_.Name -replace '(\G(?!\A)[^][\w]+|\[)(\w+)','$1#$2' }
See this regex demo and .NET regex test.
(\G(?!\A)[^][\w]+|\[) - Group 1 ($1): either the end of the previous match and 1+ chars other than ], [ and word chars, or a [ char
(\w+) - Group 2 ($2): one or more word chars.
If you only want to rename *.pdf files, replace Get-ChildItem -File with Get-ChildItem *.pdf.
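For comparison, here is a Python sketch of the same renaming rule. Python's `re` module does not support the variable-length lookbehind or the `\G` anchor used above, so this swaps in a different technique: match the whole bracketed part once, then prefix the words inside it in a callback. The function name `tag_bracketed_words` is hypothetical, and the sketch only computes the new name without touching any files.

```python
import re


def tag_bracketed_words(file_name):
    """Prefix every word inside [...] with '#', leaving the rest untouched."""
    def add_hashes(match):
        # match.group(1) is the text between the brackets.
        inner = re.sub(r'\w+', r'#\g<0>', match.group(1))
        return '[' + inner + ']'

    return re.sub(r'\[([^\[\]]*)\]', add_hashes, file_name)


name = 'csharp 8 in a nutshell[studying programming csharp ebooks].pdf'
print(tag_bracketed_words(name))
```

Names without a bracketed part come back unchanged, so the function is safe to apply to every file in a folder.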
I assume there is at most one bracket-delimited substring.
You can replace zero-length matches of the following regular expression with '#' when using Perl, Ruby, Python's alternative regex engine, R with perl=TRUE, or languages that use the PCRE regex engine, which includes PHP. With the exception of Ruby, the case-insensitive (/i) and global (/g) flags need to be set. Ruby only requires the case-insensitive flag.
r = /(?:^.*\[ *|\G(?<!^)|[a-z]+ +)\K(?<=\[| )(?=[a-z][^\[\]]*\])/
If using Ruby, for example, one would execute
str = "csharp 8 in a nutshell[studying programming csharp ebooks].pdf"
str.gsub(r,'#')
#=> "csharp 8 in a nutshell[#studying #programming #csharp #ebooks].pdf"
I believe all of the languages I named above allow one to run a short script from the command line. (I provide a Ruby script below.)
The regex engine performs the following operations.
(?: : begin non-capture group
^.*\[ * : match beginning of string then 0+ characters then '[' then 0+ spaces
| : or
\G : asserts the position at the end of the previous match or at the start of the string for the first match
(?<!^) : use a negative lookbehind to assert that the current location is not the start of the string
| : or
[a-z]+ + : match 1+ letters then 1+ spaces
) : end non-capture group
\K : reset beginning of reported match to current location and discard all previously-matched characters from match to be returned
(?<= : begin positive lookbehind
\[|[ ] : match '[' or a space
) : end positive lookbehind
(?= : begin positive lookahead
[a-z][^\[\]]*\] : match a letter then 0+ characters other than '[' and ']' then ']'
) : end positive lookahead
Another possibility (illustrated with Ruby) is to break the string into three pieces, modify the middle one, then rejoin the pieces:
first, mid, last = str.split /(?<=\[)|(?=\])/
#=> ["csharp 8 in a nutshell[",
# "studying programming csharp ebooks",
# "].pdf"]
first + mid.gsub(/(?<=\A| )(?! )/,'#') + last
#=> "csharp 8 in a nutshell[#studying #programming #csharp #ebooks].pdf"
The regex used by split reads, "match a (zero-width) string that is preceded by '[' or is followed by ']'" ((?<=\[) being a positive lookbehind and (?=\]) a positive lookahead). By matching zero-width strings, split does not remove any characters.
gsub's regex reads, "match a zero-width string that is at the start of the string or is preceded by a space, and is not followed by a space" ((?! ) being a negative lookahead). It could alternatively be written /(?<![^ ])(?! )/ ((?<![^ ]) being a negative lookbehind).
A variant:
first + mid.split.map { |s| '#' + s }.join(' ') + last
#=> "csharp 8 in a nutshell[#studying #programming #csharp #ebooks].pdf"
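The three-piece idea carries over to other languages as well. Here is a minimal Python sketch that uses `str.partition` instead of a zero-width split; the function name `tag_by_splitting` is an assumption, and like the Ruby variant it assumes at most one bracketed part.

```python
def tag_by_splitting(file_name):
    """Split around the first [...] pair, tag the middle words, and rejoin."""
    first, opening, rest = file_name.partition('[')
    mid, closing, last = rest.partition(']')
    if not opening or not closing:
        # No complete bracket pair: leave the name as it is.
        return file_name
    tagged = ' '.join('#' + word for word in mid.split())
    return first + '[' + tagged + ']' + last


print(tag_by_splitting('Little [Miss Muffet sat on her] tuffet'))
```

As in the `split.map.join` variant, runs of spaces inside the brackets collapse to a single space.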
I created a file named 'in' that contains the following two lines:
Little [Miss Muffet sat on her] tuffet
eating her [curds and] whey
Here is an example of a (Ruby) script that could be run from the command line to perform the necessary replacements.
ruby -e "File.open('out', 'w') do |fout|
File.foreach('in') do |str|
first, mid, last = str.split(/(?<=\[)|(?=\])/)
fout.puts(first + mid.gsub(/(?<=\A| )(?! )/,'#') + last)
end
end"
This produces a file named 'out' that contains these two lines:
Little [#Miss #Muffet #sat #on #her] tuffet
eating her [#curds #and] whey

PowerShell -replace to get string between two different characters

I am currently using split to get what I need, but I am hoping there is a better way in PowerShell.
Here is the string:
server=ss8.server.com;database=CSSDatabase;uid=WS_CSSDatabase;pwd=abc123-1cda23-123-A7A0-CC54;Max Pool Size=5000
I want to get the server and database values without the server= or database= prefixes.
Here is the method I am currently using:
$databaseserver = (($details.value).split(';')[0]).split('=')[1]
$database = (($details.value).split(';')[1]).split('=')[1]
This outputs to:
ss8.server.com
CSSDatabase
I would like it to be as simple as possible.
Thank you in advance
Replacing approach
You may use the following regex replace:
$s = 'server=ss8.server.com;database=CSSDatabase;uid=WS_CSSDatabase;pwd=abc123-1cda23-123-A7A0-CC54;Max Pool Size=5000'
$dbserver = $s -replace '^server=([^;]+).*', '$1'
$db = $s -replace '^[^;]*;database=([^;]+).*', '$1'
The technique is to match and capture (with (...)) what we need and just match what we need to remove.
Pattern details:
^ - start of the line
server= - a literal substring
([^;]+) - Group 1 (what $1 refers to) matching 1+ chars other than ;
.* - any 0+ chars other than a newline, as many as possible
Pattern 2 is almost the same, the capturing group is shifted a bit to capture another detail, and some more literal values are added to match the right context.
Note: if the values you need to extract may appear anywhere in the string, replace ^ in the first one and ^[^;]*; pattern in the second one with .*?\b (any 0+ chars other than a newline, as few as possible followed with a word boundary).
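The replace-with-a-capture technique is not specific to PowerShell. A minimal Python sketch with `re.sub` and the same two patterns might look like this; the variable names are chosen for illustration only.

```python
import re

s = ('server=ss8.server.com;database=CSSDatabase;uid=WS_CSSDatabase;'
     'pwd=abc123-1cda23-123-A7A0-CC54;Max Pool Size=5000')

# Match the whole string, keep only what group 1 captured.
db_server = re.sub(r'^server=([^;]+).*', r'\1', s)
db = re.sub(r'^[^;]*;database=([^;]+).*', r'\1', s)

print(db_server)
print(db)
```

One caveat of the replacing approach, in PowerShell as well as here: if the pattern does not match, the input string comes back unchanged, so a failed extraction is easy to miss.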
Matching approach
With a -match, you may do it the following way:
$s -match '^server=(.+?);database=([^;]+)'
The $Matches[1] will contain the server details and $Matches[2] will hold the DB info:
Name Value
---- -----
2 CSSDatabase
1 ss8.server.com
0 server=ss8.server.com;database=CSSDatabase
Pattern details
^ - start of string
server= - literal substring
(.+?) - Group 1: any 1+ non-linebreak chars as few as possible
;database= - literal substring
([^;]+) - 1+ chars other than ;
Another solution with a RegEx and named capture groups, similar to Wiktor's Matching Approach.
$s = 'server=ss8.server.com;database=CSSDatabase;uid=WS_CSSDatabase;pwd=abc123-1cda23-123-A7A0-CC54;Max Pool Size=5000'
$RegEx = '^server=(?<databaseserver>[^;]+);database=(?<database>[^;]+)'
if ($s -match $RegEx){
$Matches.databaseserver
$Matches.database
}
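Named groups work much the same way in other regex flavors. The Python sketch below uses the `(?P<name>...)` spelling that Python requires; the function name `parse_connection` is hypothetical.

```python
import re

CONNECTION = re.compile(
    r'^server=(?P<databaseserver>[^;]+);database=(?P<database>[^;]+)'
)


def parse_connection(text):
    """Return a dict with 'databaseserver' and 'database', or None."""
    m = CONNECTION.match(text)
    return m.groupdict() if m else None


s = ('server=ss8.server.com;database=CSSDatabase;uid=WS_CSSDatabase;'
     'pwd=abc123-1cda23-123-A7A0-CC54;Max Pool Size=5000')
info = parse_connection(s)
print(info)
```

Named groups keep the calling code readable when more fields are added later, since nothing depends on group numbering.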

Regex will not match all of my patterns

I have been trying to get this to work and I am nearly there, but I can't quite get the last match. This is the regex I'm using:
^`.*` (.*?)(\(.*?\))?\s
These are some examples of the patterns I'm trying to match
1.`asgKey` tinyblob
2.`is_asg` bit(1) DEFAULT NULL
3.`lastModified` datetime DEFAULT NULL
This regex will match 2 and 3 but not 1. I have tried adding ? and * to the space char, but then it doesn't match anything. I think I am misunderstanding the matching groups:
(.*?) - match any number of characters
(\(.*?\))? - if there are brackets match anything inside them else ignore
\s - space character
group 1 is the string group 2 is the contents of the brackets if they exist
You're matching them one at a time, right? Then what's the \s meant to match for #1?
`asgKey` tinyblob
^  ^   ^^   ^
|  |   ||   |
`  .*  `    (.*?)
There's nothing left, so \s can't match. Maybe you want (?:\s|$) to match a space or EOL.
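That said, consider using ([^\s(]+) instead of (.*?), as it only matches characters that are neither whitespace nor an opening parenthesis, and thus captures the same text with less backtracking. A plain (\S+) would be faster still, but it would also swallow the parenthesised part into group 1, leaving group 2 empty.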
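To see the effect of the (?:\s|$) suggestion, here is a small Python sketch that runs the adjusted pattern against the three sample lines. The function name `column_type` is an assumption, and the original (.*?) group is kept so that only the trailing \s changes.

```python
import re

# Original pattern with the trailing \s replaced by (?:\s|$).
COLUMN = re.compile(r'^`.*` (.*?)(\(.*?\))?(?:\s|$)')


def column_type(line):
    """Return (type, bracketed part or None) for a column definition."""
    m = COLUMN.search(line)
    return (m.group(1), m.group(2)) if m else None


lines = [
    '`asgKey` tinyblob',
    '`is_asg` bit(1) DEFAULT NULL',
    '`lastModified` datetime DEFAULT NULL',
]
parsed = [column_type(line) for line in lines]
print(parsed)
```

The first line now matches because the end of the string is accepted where a whitespace character was previously required.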