How do I parse this string in PowerShell? - regex

I have a block of text I need to parse (saved in a variable) but I'm unsure how to go about it. This block of text, saved in a variable we can call $block for simplicity's sake, includes all the whitespace shown below.
I would like the result to be an iterable list, the first value being Health_AEPOEP_Membership_Summary - Dev and the second one being Health_AEPOEP_YoY_Comparison_Summary - Dev. Assume this list of workbooks can be longer (up to 50) or shorter (minimum 1 workbook), and that all workbooks are formatted similarly (in terms of name_with_underscores - Dev). I'd try the $block.split(" ") method, but it produces many empty entries from the extra spaces, which may be hard to enumerate and account for.
Workbooks : Health_AEPOEP_Membership_Summary - Dev [Project: Health - Dev]
Health_AEPOEP_YoY_Comparison_Summary - Dev [Project: Health - Dev]
Any help is much appreciated!

You could write a multi-line regex pattern and try to extract the names, but it might be easier to reason about if you just break it into simpler steps:
$string = @'
Workbooks : Health_AEPOEP_Membership_Summary - Dev [Project: Health - Dev]
Health_AEPOEP_YoY_Comparison_Summary - Dev [Project: Health - Dev]
'@
# Split into one string per line
$strings = $string -split '\r?\n'
# Remove leading whitespace
$strings = $strings -replace '^\s*'
# Remove `Workbooks : ` prefix (strings that don't match will be left untouched)
$strings = $strings -replace '^Workbooks :\s*'
# Remove `[Project: $NAME]` suffix
$strings = $strings -replace '\s*\[Project: [^\]]+\]'
# Get rid of empty lines
$strings = $strings |Where-Object Length
$strings now contains the two workbook names.

If the text is in a file, this gets a little easier, and I would recommend this approach:
switch -Regex -File ($file) {
'(\w+_.+?- Dev)' { $Matches[1] }
}
Regex details
() - capture group
\w+ - match one or more word characters (letters, digits or underscores)
_ - match a literal underscore
.+? - match one or more of any character, as few as possible (lazy), so the match stops at the first "- Dev" on the line instead of the last one inside [Project: ... - Dev]
- Dev - literal match of dash, space, Dev
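To try the file-based approach without an existing file, here is a minimal sketch that writes the two sample lines to a temporary file (the temp file and its contents are only for illustration) and reads them back with switch -Regex -File. It uses the lazy .+? so that each match stops at the first "- Dev" on the line.

```powershell
# Write the sample lines to a temporary file (demo only).
$file = [System.IO.Path]::GetTempFileName()
Set-Content -Path $file -Value @(
    'Workbooks : Health_AEPOEP_Membership_Summary - Dev [Project: Health - Dev]'
    '            Health_AEPOEP_YoY_Comparison_Summary - Dev [Project: Health - Dev]'
)

# switch -File reads the file line by line; for each line that matches,
# $Matches[1] holds capture group 1 (the workbook name).
$workbooks = switch -Regex -File $file {
    '(\w+_.+?- Dev)' { $Matches[1] }
}

# Clean up the temporary file.
Remove-Item -Path $file

$workbooks
```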
If it's already in a variable, it depends on whether it's a string array or a single string. Assuming it's a single string, I'd recommend this approach:
$regex = [regex]'(\w+_.+)(?=(\s\[.+))'
$regex.Matches($block).value
Health_AEPOEP_Membership_Summary - Dev
Health_AEPOEP_YoY_Comparison_Summary - Dev
Regex details
Same as above but added the following
(?=) - Look ahead
\s\[.+ - match a whitespace character, a left square bracket, then one or more characters
Simply add a variable assignment $strings = before either of these to capture the output. Either approach works for one workbook or 500.
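A small usage sketch for the single-string case, assuming $block holds the two sample lines (the indentation of the second line is a guess). Wrapping the result in @() keeps it an array even when only one workbook matches, so the same foreach loop works for 1 to 50 entries.

```powershell
# Hypothetical $block: the two sample lines, with the second one indented.
$block = @'
Workbooks : Health_AEPOEP_Membership_Summary - Dev [Project: Health - Dev]
            Health_AEPOEP_YoY_Comparison_Summary - Dev [Project: Health - Dev]
'@

$regex = [regex]'(\w+_.+)(?=(\s\[.+))'

# .Value is collected from every Match in the MatchCollection (member enumeration).
# @() guarantees an array, even if only one workbook is found.
$strings = @($regex.Matches($block).Value)

foreach ($workbook in $strings) {
    "Workbook: $workbook"
}
```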

Related

PwSh RegEx capture Version information - omit (surrounding) x(32|64|86) & 32|64(-)Bit

I'm pulling my hair out trying to RegEx-tract the bare version information, e.g. "1.2.3.4", from some filenames.
Let's assume I have the following filenames:
VendorSetup-x64-1.23.4.exe
VendorSetup-1-2-3-4.exe
Vendor Setup 1.23.456Update.exe
SoftwareName-1.2.34.5-x64.msi
SoftwareName-1.2.3.4-64bit.msi
SoftwareName-64-Bit-1.2.3.4.msi
VendorName_SoftwareName_64_1.2.3_Setup.exe
(And I know there are still some filenames out there that have "x32" as well as "x86" in them, so I've added them to the title.)
First of all, I replaced the _'s and -'s with .'s, which I'd like to avoid in general, but I haven't found a cleverer approach. To be honest, it only works well if there's no other digit information in the string, unlike, for example, the 2nd filename.
I then tried to extract the version information using a regex like
-replace '^(?:\D+)?(\d+((\.\d+){1,4})?)(?:.*)?', '$1'
but it can't omit "x64", "64Bit", "64-Bit" or any other variation of that.
Additionally, I played around with regexes like
-replace '^(?:[xX]*\d{2})?(?:\D+)?(\d+((\.\d+){1,4})?)(?:.*)?$', '$1'
to try to omit a leading "x64" or "64", but with no success (most probably because of the replacement of -'s with .'s).
And before it gets even worse, I'd like to ask whether anybody could help me or point me in the right direction.
Thanks in advance!
This could be done using a single pattern, but splitting it into two separate patterns and letting PowerShell do some of the work makes the overall solution much easier.
Pattern 1 matches version numbers that are separated by . (dot):
(?<=[\s_-])\d+(?:\.\d+){1,3}
Pattern 2 matches version numbers that are separated by - (dash):
(?<=[\s_-])\d+(?:-\d+){1,3}
The patterns start with (?<=[\s_-]), a positive lookbehind assertion that requires the version to be preceded by a space, underscore or dash, without including that character in the matched value. This prevents the substring 64-1 in the first sample from matching as a version.
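To see what the lookbehind buys, here is a quick check against the first sample filename from the question. The expected values in the comments follow from the two patterns above; the variable names are only for this illustration.

```powershell
$name = 'VendorSetup-x64-1.23.4.exe'

# Dash pattern WITHOUT the lookbehind: "64-1" (from "x64-1.23.4") is taken as a version.
$noLookbehind = [regex]::Match($name, '\d+(?:-\d+){1,3}').Value       # 64-1

# Dash pattern WITH the lookbehind: "64" follows "x", and "1" is followed by ".",
# so there is no dash-separated version in this name.
$withLookbehind = [regex]::Match($name, '(?<=[\s_-])\d+(?:-\d+){1,3}') # .Success is False

# Dot pattern WITH the lookbehind finds the real version.
$dotted = [regex]::Match($name, '(?<=[\s_-])\d+(?:\.\d+){1,3}').Value  # 1.23.4

$noLookbehind, $withLookbehind.Success, $dotted
```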
Detailed explanations of the pattern can be found at regex101.
PowerShell code:
# Create an array of sample filenames
$names = @'
VendorSetup-2022-05-x64-1.23.4.exe
VendorSetup-x64-1.23.4-2022-05.exe
VendorSetup-1-2-3-4.exe
VendorSetup_2022-05_1-2-3-4.exe
Vendor Setup 1.23.456Update.exe
SoftwareName-1.2.34.5-x64.msi
SoftwareName-1.2.3.4-64bit.msi
SoftwareName-64-Bit-1.2.3.4.msi
VendorName_SoftwareName_64_1.2.3_Setup.exe
NoVersion.exe
'@ -split '\r?\n'
# Array of RegEx patterns in order of precedence.
$versionPatterns = '(?<=[\s_-])\d+(?:\.\d+){1,3}', # 2..4 numbers separated by '.'
'(?<=[\s_-])\d+(?:-\d+){1,3}' # 2..4 numbers separated by '-'
foreach( $name in $names ) {
$version = $versionPatterns.
ForEach{ [regex]::Match( $name, $_, 'RightToLeft' ).Value }. # Apply each pattern from right to left of string.
Where({ $_ }, 'First'). # Get first matching pattern (non-empty value).
ForEach{ $_ -replace '\D+', '.' }[0] # Normalize the number separator and get single string.
# Output custom object for nice table formatting
[PSCustomObject]@{ Name = $name; Version = $version }
}
Output:
Name                                       Version
----                                       -------
VendorSetup-2022-05-x64-1.23.4.exe         1.23.4
VendorSetup-x64-1.23.4-2022-05.exe         1.23.4
VendorSetup-1-2-3-4.exe                    1.2.3.4
VendorSetup_2022-05_1-2-3-4.exe            1.2.3.4
Vendor Setup 1.23.456Update.exe            1.23.456
SoftwareName-1.2.34.5-x64.msi              1.2.34.5
SoftwareName-1.2.3.4-64bit.msi             1.2.3.4
SoftwareName-64-Bit-1.2.3.4.msi            1.2.3.4
VendorName_SoftwareName_64_1.2.3_Setup.exe 1.2.3
NoVersion.exe
Explanation of the PowerShell code:
To resolve ambiguities when a filename has multiple matches of the patterns, we use the following rules:
Version with . separator is preferred over version with - separator. We simply apply the patterns in this order and stop when the first pattern matches.
Rightmost version is preferred (by passing the RightToLeft flag to [regex]::Match()).
.ForEach and .Where are PowerShell intrinsic methods. They are basically faster variants of the ForEach-Object and Where-Object cmdlets.
The index operator [0] after the last .ForEach is required because .ForEach and .Where always return a collection, even if it contains only a single value, contrary to the behaviour of cmdlets.
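A short sketch of the last two points, using a made-up array of candidate strings: .Where({ ... }, 'First') stops at the first truthy result but still wraps it in a collection, which is why the [0] is needed, whereas the equivalent cmdlet pipeline hands back a plain string.

```powershell
# Made-up candidates: an empty string (no match) followed by two versions.
$candidates = '', '1.2.3', '4-5-6'

# 'First' mode returns only the first element for which the script block
# is truthy; the empty string is skipped.
$first = $candidates.Where({ $_ }, 'First')

# Even one result comes back as a collection, not as a plain string ...
$first -is [System.Collections.ObjectModel.Collection[psobject]]
$first.Count

# ... so [0] is used to get the single value itself.
$version = $first[0]

# The cmdlet pipeline, by contrast, unwraps a single result on its own.
$viaCmdlet = $candidates | Where-Object { $_ } | Select-Object -First 1
$viaCmdlet -is [string]
```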

Powershell: Can't Get RegEx to work on multiple lines

I am getting notes from a ticket that come in the form of:
[Employee ID]:
[First Name]: Test
[Last Name]: User
[Middle Initial]:
[Email]:
[Phone]:
[* Last 4 of SSN]: 1234
I've tried the following code to get the first name (in this example it would be 'Test'):
if ($incNotes -match '(^\[First Name\]:)(. * ?$)')
{
Write-Host $_.matches.groups[0].value
Write-Host $_.matches.groups[1].value
}
But I get nothing. Is there a way I can use just one long regex pattern to get the information I need? The information stays in the same format on every ticket that comes through.
How would I get the information after [First Name]:, and so on?
You can use
if ($incNotes -match '(?m)^\[First Name]: *(\S+)') {
Write-Host $matches[1]
}
See the regex demo. If you can have any kind of horizontal whitespace chars between : and the name, replace the space with [\p{Zs}\t], or some kind of [\s-[\r\n]].
Details:
(?m) - the RegexOptions.Multiline option, which makes ^ match at the start of any line and $ match at the end of any line
^ - start of a line
\[First Name]: - a [First Name]: string
* - zero or more spaces
(\S+) - Capturing group 1: one or more non-whitespace chars (replace with \S.* or \S[^\n\r]* to match any text up to the end of the line).
Note that -match is a case-insensitive regex matching operator; use -cmatch if you need case-sensitive behavior. Also, it only finds the first match, and $matches[1] returns the Group 1 value.
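If you want every field in one pass, one option is to loop over all matches of a single multiline pattern and collect them into an ordered hashtable. This is a sketch that assumes every field sits on its own line in the [Name]: value form shown in the question; the group names key and value are arbitrary choices.

```powershell
# Hypothetical sample notes, copied from the question.
$incNotes = @'
[Employee ID]:
[First Name]: Test
[Last Name]: User
[Middle Initial]:
[Email]:
[Phone]:
[* Last 4 of SSN]: 1234
'@

# (?m) lets ^ match at the start of every line.
# key   = text between the square brackets
# value = rest of the line after ":", excluding any \r from CRLF line endings
$pattern = '(?m)^\[(?<key>[^\]]+)\]:[ \t]*(?<value>[^\r\n]*)'

$fields = [ordered]@{}
foreach ($m in [regex]::Matches($incNotes, $pattern)) {
    $fields[$m.Groups['key'].Value] = $m.Groups['value'].Value
}

$fields['First Name']
$fields['Last Name']
```

Empty fields such as [Email]: end up as empty strings, so the hashtable always has one entry per line.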

Powershell regex missing ones with CR etc

I'm working on a regular expression to extract a map of keys and their associated strings.
For some reason, it works for calls that fit on one line, but it misses the ones that are split across lines.
This is what I'm using:
$errorMap = [ordered]@{}
# process the lines one-by-one
switch -Regex ($fileContent -split ';') {
'InsertCodeInfo\(([\w]*), "(.*)"' { # key etc., followed by string like "Media size cassette missing"
$key,$value = ($matches[1,2])|ForEach-Object Trim
$errorMap[$key] = $value
}
}
This is an example of $fileContent:
InsertCodeInfo(pjlWarnCommunications,
"communications error");
InsertCodeInfo(pjlNormalOnline,
"Online");
InsertCodeInfo(pjlWarnOffline,
"offline");
InsertCodeInfo(pjlNormalAccessing, "Accessing"); #this is first match :(
InsertCodeInfo(pjlNormalArrive, "Normal arrive");
InsertCodeInfo(pljNormalProcessing, "Processing");
InsertCodeInfo(pjlNormalDataInBuffer, "Data in buffer");
It's returning the pairs from pjlNormalAccessing down, where it doesn't have a line split. I thought that using the semicolon to split the regex content would fix it, but it didn't help. I was formerly splitting regex content with
'\r?\n'
I thought maybe there was something going on with VSCode so I have exited and re-opened it, and re-running the script had the same result. Any idea how to get it to match every InsertCodeInfo through the semicolon line with the key-value pair?
This is using VSCode and Powershell 5.1.
Update:
Someone asked how $fileContent is created:
I call my method with the filenamepath ($FileHandler), and from/to strings/methodNames ($matchFound2 becomes $fileContent later as a method parameter):
$matchFound2 = Get-MethodContents -codePath $FileHandler -methodNameToReturn "OkStatusHandler::PopulateCodeInfo" -followingMethodName "OkStatusHandler::InsertCodeInfo"
Function Get-MethodContents{
[cmdletbinding()]
Param ( [string]$codePath, [string]$methodNameToReturn, [string]$followingMethodName)
Process
{
$contents = ""
Write-Host "In GetMethodContents method File:$codePath method:$methodNameToReturn followingMethod:$followingMethodName" -ForegroundColor Green
$contents = Get-Content $codePath -Raw #raw gives content as single string instead of a list of strings
$null = $contents -match "($methodNameToReturn[\s\S]*)$followingMethodName" #| Out-Null
return $Matches.Item(1)
}#End of Process
}#End of Function
You can use
InsertCodeInfo\((\w+),\s*"([^"]*)
See the online regex demo.
Details:
InsertCodeInfo\( - a literal InsertCodeInfo( text
(\w+) - Group 1: one or more word chars (letters, digits, diacritics or underscores (connector punctuation))
, - a comma
\s* - zero or more whitespaces
" - a " char
([^"]*) - Group 2: zero or more chars other than a " char.
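Dropped into the switch from the question, the pattern could look like the sketch below. $fileContent here is a shortened, hypothetical copy of the sample; because \s* also matches line breaks, the calls that wrap onto a second line are matched too.

```powershell
# Hypothetical, shortened $fileContent: two calls wrap onto a second line.
$fileContent = @'
InsertCodeInfo(pjlWarnCommunications,
    "communications error");
InsertCodeInfo(pjlWarnOffline,
    "offline");
InsertCodeInfo(pjlNormalAccessing, "Accessing");
'@

$errorMap = [ordered]@{}

# Same loop as in the question, only the pattern has changed:
# \s* between the comma and the quote also matches \r and \n,
# and [^"]* stops at the closing quote, so no Trim() is needed.
switch -Regex ($fileContent -split ';') {
    'InsertCodeInfo\((\w+),\s*"([^"]*)' {
        $errorMap[$Matches[1]] = $Matches[2]
    }
}

$errorMap
```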
This regular expression seems to catch all lines, including ones with a newline in the middle. Thanks for the suggestion, @WiktorStribizew. I tweaked your suggestion, and it helped.
InsertCodeInfo\(([\w]*),[\s]*"([^"]*)
It might not be the most succinct, but it's catching all lines. Feel free as always to post alternative suggestions. This is why I didn't accept my own answer.

Remove all values in a string before a backslash

I have this string
AttendanceList
XXXXXX
US\abraham
EU\sarah
US\gerber
When I try to use -replace, it removes every character listed inside the square brackets, including those in the first line, AttendanceList:
$attendance_new = $attendance -replace "[EU\\]", "" -replace "[US\\]", ""
echo $attendance_new
AttndancLit
XXXXXX
abraham
arah
grbr
I was hoping to get this sample output (and possibly append the string "_IN" to each value):
AttendanceList
XXXXXX
abraham_IN
sarah_IN
gerber_IN
I'm new to regex and still trying to figure out how to handle special characters.
You can use
$attendance_new = $attendance -replace '(?m)^(?:US|EU)\\(.*)', '$1_IN'
See this demo (.NET regex demo here). Details:
(?m) - multiline option enabling ^ to match start of any line position
^ - line start
(?:US|EU) - EU or US
\\ - a \ char
(.*) - Group 1: any zero or more chars other than a line feed char. Note that .* also matches a carriage return, so with Windows (CRLF) line endings _IN would be placed after the \r; in that case, replace it with ([^\r\n]*)
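A sketch of the CRLF case, with a made-up $attendance string that uses Windows line endings. ${1}_IN is just the explicit way of writing $1_IN, which removes any doubt about where the group reference ends.

```powershell
# Hypothetical input with explicit CRLF line endings, as from a file saved on Windows.
$attendance = "AttendanceList`r`nXXXXXX`r`nUS\abraham`r`nEU\sarah`r`nUS\gerber"

# [^\r\n]* stops before the carriage return, so "_IN" ends up directly
# after the name instead of after the invisible CR character.
$attendance_new = $attendance -replace '(?m)^(?:US|EU)\\([^\r\n]*)', '${1}_IN'

$attendance_new
```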

Powershell Regex expression to get part of a string

I would like to take part of a string to use it elsewhere. For example, I have the following string:
Project XYZ is the project name - 20-12-11
I would like to get the value "XYZ is the project name" from the string. The word "Project" and character "-" before the number will always be there.
I think a lookaround regular expression would work here since "Project" and "-" are always there:
(?<=Project ).+?(?= -)
A lookaround can be useful for cases that deal with getting a substring.
Explanation:
(?<= = positive lookbehind
Project = starting string (including space)
) = closing positive lookbehind
.+? = matches anything in between (lazily, as few characters as possible)
(?= = positive lookahead
" -" = ending string (a space followed by a dash)
) = closing positive lookahead
Example in PowerShell:
Function GetProjectName($InputString) {
$regExResult = $InputString | Select-String -Pattern '(?<=Project ).+?(?= -)'
$regExResult.Matches[0].Value
}
$projectName = GetProjectName -InputString "Project XYZ is the project name - 20-12-11"
Write-Host "Result = '$($projectName)'"
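The ? in .+? matters when the project name itself contains " -". A quick comparison on a hypothetical input string:

```powershell
# Hypothetical input where the name part itself contains " - ".
$s = 'Project A - B - 20-12-11'

# Lazy .+? stops at the first " -" ...
$lazy = [regex]::Match($s, '(?<=Project ).+?(?= -)').Value    # A

# ... while greedy .+ runs on to the last " -" (the one before the date).
$greedy = [regex]::Match($s, '(?<=Project ).+(?= -)').Value   # A - B

$lazy, $greedy
```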
here is yet another regex version. [grin] it may be easier to understand since it uses somewhat basic regex patterns.
what it does ...
defines the input string
defines the prefix to match on
this will keep only what comes after it.
defines the suffix to match on
this part will keep only what is before it.
trigger the replace
the part in the () is what will be placed into the 1st capture group.
show what was kept
the code ...
$InString = 'Project XYZ is the project name - 20-12-11'
# "^" = start of string
$Prefix = '^project '
# ".+" = one or more of any character
# "$" = end of string
$Suffix = ' - .+$'
# "$1" holds the content of the 1st [and only] capture group
$OutString = $InString -replace "$Prefix(.+)$Suffix", '$1'
$OutString
# define the input string
$str = 'Project XYZ is the project name - 20-12-11'
# use regex (-match) including the lazy .*? pattern
# this pattern means (.) any char, (*) any number of times, (?) as few times as possible (lazy, the opposite of greedy)
# to capture (in parentheses) the text between "Project " and the first " - "
$null = $str -match '^Project (.*?) - '
# show result (the first capturing group)
$matches[1]