PowerShell: Can't get RegEx to work on multiple lines

I am getting notes from a ticket that come in the form of:
[Employee ID]:
[First Name]: Test
[Last Name]: User
[Middle Initial]:
[Email]:
[Phone]:
[* Last 4 of SSN]: 1234
I've tried the following code to get the first name (in this example it would be 'Test'):
if ($incNotes -match '(^\[First Name\]:)(.*?$)')
{
Write-Host $_.matches.groups[0].value
Write-Host $_.matches.groups[1].value
}
But I get nothing. Is there a way I can use just one long regex pattern to get the information I need? The information stays in the same format on every ticket that comes through.
How would I get the information after the [First Name]: and so on....

You can use
if ($incNotes -match '(?m)^\[First Name]: *(\S+)') {
Write-Host $matches[1]
}
See the regex demo. If you can have any kind of horizontal whitespace chars between : and the name, replace the space with [\p{Zs}\t], or some kind of [\s-[\r\n]].
Details:
(?m) - a RegexOptions.Multiline option that makes ^ match start of any line position, and $ match end of lines
^ - start of a line
\[First Name]: - a [First Name]: string
(a space followed by *) - zero or more spaces
(\S+) - Capturing group 1: one or more non-whitespace chars (replace with \S.* or \S[^\n\r]* to match any text till the end of the line).
Note that -match is a case insensitive regex matching operator, use -cmatch if you need a case sensitive behavior. Also, it only finds the first match and $matches[1] returns the Group 1 value.
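To pull every field with one pattern instead of one -match per field, a possible sketch (not part of the answer above) uses [regex]::Matches(), which returns all matches rather than only the first. The sample $incNotes text and the $fields variable name are assumptions for illustration.

```powershell
# Sample ticket notes (assumed to be a single multi-line string)
$incNotes = @'
[Employee ID]:
[First Name]: Test
[Last Name]: User
[Middle Initial]:
[Email]:
[Phone]:
[* Last 4 of SSN]: 1234
'@

# Group 1: label between the brackets; Group 2: rest of the line (may be empty).
# [^\r\n]* is used instead of .* so a trailing CR is never captured.
$pattern = '(?m)^\[([^\]]+)\]:[ \t]*([^\r\n]*)'

$fields = [ordered]@{}
foreach ($m in [regex]::Matches($incNotes, $pattern)) {
    $fields[$m.Groups[1].Value] = $m.Groups[2].Value.Trim()
}

$fields['First Name']   # Test
$fields['Last Name']    # User
```

Fields with no value, such as [Email], end up as empty strings, so the caller can test for them explicitly.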

Related

Powershell regex missing ones with CR etc

I'm working on a regular expression to extract a map of key and associated string.
For some reason, it's working for lines that don't show a line split, but misses where there are line splits.
This is what I'm using:
$errorMap = [ordered]@{}
# process the lines one-by-one
switch -Regex ($fileContent -split ';') {
'InsertCodeInfo\(([\w]*), "(.*)"' { # key etc., followed by string like "Media size cassette missing"
$key,$value = ($matches[1,2])|ForEach-Object Trim
$errorMap[$key] = $value
}
}
This is an example of $fileContent:
InsertCodeInfo(pjlWarnCommunications,
"communications error");
InsertCodeInfo(pjlNormalOnline,
"Online");
InsertCodeInfo(pjlWarnOffline,
"offline");
InsertCodeInfo(pjlNormalAccessing, "Accessing"); #this is first match :(
InsertCodeInfo(pjlNormalArrive, "Normal arrive");
InsertCodeInfo(pljNormalProcessing, "Processing");
InsertCodeInfo(pjlNormalDataInBuffer, "Data in buffer");
It's returning the pairs from pjlNormalAccessing down, where it doesn't have a line split. I thought that using the semicolon to split the regex content would fix it, but it didn't help. I was formerly splitting regex content with
'\r?\n'
I thought maybe there was something going on with VSCode so I have exited and re-opened it, and re-running the script had the same result. Any idea how to get it to match every InsertCodeInfo through the semicolon line with the key-value pair?
This is using VSCode and Powershell 5.1.
Update:
Someone asked how $fileContent is created:
I call my method with the filenamepath ($FileHandler), and from/to strings/methodNames ($matchFound2 becomes $fileContent later as a method parameter):
$matchFound2 = Get-MethodContents -codePath $FileHandler -methodNameToReturn "OkStatusHandler::PopulateCodeInfo" -followingMethodName "OkStatusHandler::InsertCodeInfo"
Function Get-MethodContents{
[cmdletbinding()]
Param ( [string]$codePath, [string]$methodNameToReturn, [string]$followingMethodName)
Process
{
$contents = ""
Write-Host "In GetMethodContents method File:$codePath method:$methodNameToReturn followingMethod:$followingMethodName" -ForegroundColor Green
$contents = Get-Content $codePath -Raw #raw gives content as single string instead of a list of strings
$null = $contents -match "($methodNameToReturn[\s\S]*)$followingMethodName" #| Out-Null
return $Matches.Item(1)
}#End of Process
}#End of Function
You can use
InsertCodeInfo\((\w+),\s*"([^"]*)
See the online regex demo.
Details:
InsertCodeInfo\( - a literal InsertCodeInfo( text
(\w+) - Group 1: one or more word chars (letters, digits, diacritics or underscores (connector punctuation))
, - a comma
\s* - zero or more whitespaces
" - a " char
([^"]*) - Group 2: zero or more chars other than a " char.
This regular expression seems to be catching all lines, including ones with a newline in the middle. Thanks for the suggestion @WiktorStribizew. I tweaked your suggestion, and it helped.
InsertCodeInfo\(([\w]*),[\s]*"([^"]*)
It might not be the most succinct, but it's catching all lines. Feel free as always to post alternative suggestions. This is why I didn't accept my own answer.
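The original pattern requires exactly one space between the comma and the opening quote, so entries broken across lines never match. As a rough sketch of how the suggested pattern could be applied to the whole text without splitting it first, [regex]::Matches() can walk every entry; the shortened sample content below and the closing quote added to the pattern are assumptions.

```powershell
# Shortened sample of $fileContent (assumed to be one multi-line string)
$fileContent = @'
InsertCodeInfo(pjlWarnCommunications,
    "communications error");
InsertCodeInfo(pjlNormalOnline,
    "Online");
InsertCodeInfo(pjlNormalAccessing, "Accessing");
'@

# \s* also matches line breaks, so wrapped and single-line entries both match
$pattern = 'InsertCodeInfo\((\w+),\s*"([^"]*)"'

$errorMap = [ordered]@{}
foreach ($m in [regex]::Matches($fileContent, $pattern)) {
    $errorMap[$m.Groups[1].Value] = $m.Groups[2].Value
}

$errorMap['pjlWarnCommunications']   # communications error
```

Because no splitting is involved, it no longer matters whether the file uses CRLF or LF line endings.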

Regex capture multi matches in Group

I'm not sure if this is possible. I'm searching for a way to capture multiple matches in a group.
This works perfectly fine:
"Catch me if you can" -match "(?=.*(Catch))"
Result: Catch
I would like to have the result of two matches in the group:
"Catch me if you can" -match "(?=.*Catch)(?=.*me)"
Expected Result: Catch me
Note: If hard-coding the result of both regex subexpressions matching is sufficient, simply use:
if ('Catch me if you can' -match '(?=.*Catch)(?=.*me)') { 'Catch me' }
You're trying to:
match two separate regex subexpressions,
and report what specific strings they matched only if BOTH matched.
Note:
While it is possible to use a variation of your regex, which concatenates two look-ahead assertions ((?=.*(Catch))(?=.*(me))), to extract what the two subexpressions of interest captured, the captured substrings would be reported in the order in which the subexpressions are specified in the regex, not in the order in which the substrings appear in the input string. E.g., input string 'me Catch if you can' would also result in output string 'Catch me'
The following solution uses the [regex]::Match() .NET API for preserving the input order of the captured substrings by sorting the captures by their starting position in the input string:
$match = [regex]::Match('me Catch if you can', '(?=.*(Catch))(?=.*(me))', 'IgnoreCase')
if ($match.Success) { ($match.Groups | Select-Object -Skip 1 | Sort-Object Position).Value -join ' ' }
Note the use of the IgnoreCase option, so as to match PowerShell's default behavior of case-insensitive matching.
The above outputs 'me Catch', i.e. the captured substrings in the order in which they appear in the input string.
If instead you prefer that the captured substrings be reported in the order in which the subexpressions that matched them appear in the regex ('Catch me'), simply omit | Sort-Object Position from the command above.
Alternatively, you then could make your -match operation work, as follows, by enclosing the subexpressions of interest in (...) to form capture groups and then accessing the captured substrings via the automatic $Matches variable - but note that no information about matching positions is then available:
if ('me Catch if you can' -match '(?=.*(Catch))(?=.*(me))') {
$Matches[1..2] -join ' ' # -> 'Catch me'
}
Note that this only works because a single match result captures both substrings of interest, due to the concatenation of two look-ahead assertions ((?=...)); because -match only ever looks for one match, the simpler 'Catch|me' regex would not work, as it would stop matching once either subexpression is found.
See also:
GitHub issue #7867, which suggests introducing a -matchall operator that returns all matches found in the input string.
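Until such an operator exists, one possible workaround is to call [regex]::Matches() directly, which does return every match, so the simpler alternation can be used. This is only a sketch: the \b word boundaries and the variable names $text, $values, and $result are assumptions added for illustration.

```powershell
$text = 'me Catch if you can'

# Matches() returns all matches in input order; IgnoreCase mirrors -match
$found  = [regex]::Matches($text, '\b(?:Catch|me)\b', 'IgnoreCase')
$values = @($found | ForEach-Object { $_.Value })

# Report the words only if BOTH were found (-contains is case-insensitive)
$result = if (($values -contains 'Catch') -and ($values -contains 'me')) {
    $values -join ' '
}

$result   # me Catch
```

The explicit -contains checks replace the two look-ahead assertions: the alternation alone would also succeed when only one of the words is present.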
The (?= is a LookAhead, but you don't have it looking ahead of anything. In this example LookAhead is looking ahead of "Catch" to see if it can find ".*me".
Catch(?=.*me)
Also, do you really want to match "catchABCme"? I would think you would want to match "catch ABC me", but not "catchABCme", "catchABC me", or "catch ABCme".
Here is some test code to play with:
$Lines = @(
'catch ABC me if you can',
'catch ABCme if you can',
'catchABC me if you can'
)
$RegExCheckers = @(
'Catch(?=.*me)',
'Catch(?=.*\s+me)',
'Catch\s(?=(.*\s+)?me)'
)
foreach ($RegEx in $RegExCheckers) {
$RegExOut = "`"$RegEx`"".PadLeft(22,' ')
foreach ($Line in $Lines) {
$LineOut = "`"$Line`"".PadLeft(26,' ')
if($Line -match $RegEx) {
Write-Host "$RegExOut matches $LineOut"
} else {
Write-Host "$RegExOut didn't match $LineOut"
}
}
Write-Host
}
And here is the output:
"Catch(?=.*me)" matches "catch ABC me if you can"
"Catch(?=.*me)" matches "catch ABCme if you can"
"Catch(?=.*me)" matches "catchABC me if you can"
"Catch(?=.*\s+me)" matches "catch ABC me if you can"
"Catch(?=.*\s+me)" didn't match "catch ABCme if you can"
"Catch(?=.*\s+me)" matches "catchABC me if you can"
"Catch\s(?=(.*\s+)?me)" matches "catch ABC me if you can"
"Catch\s(?=(.*\s+)?me)" didn't match "catch ABCme if you can"
"Catch\s(?=(.*\s+)?me)" didn't match "catchABC me if you can"
As you can see, the last RegEx expression requires a space after "catch" and before "me".
Also, a great place to test RegEx is regex101.com, you can place the RegEx at the top and multiple lines you want to test it against in the box in the middle.

Remove all values in a string before a backslash

I have this string
AttendanceList
XXXXXX
US\abraham
EU\sarah
US\gerber
When I try to use -replace, it removes every character listed inside the square brackets (including those in the first line, AttendanceList):
$attendance_new = $attendance -replace "[EU\\]", "" -replace "[US\\]", ""
echo $attendance_new
AttndancLit
XXXXXX
abraham
arah
grbr
I was hoping to get this sample output (and possibly concatenate a string "_IN" after all values)
AttendanceList
XXXXXX
abraham_IN
sarah_IN
gerber_IN
I'm new to regex and still trying to figure out the regex code for special characters
You can use
$attendance_new = $attendance -replace '(?m)^(?:US|EU)\\(.*)', '$1_IN'
See this demo (.NET regex demo here). Details:
(?m) - multiline option enabling ^ to match start of any line position
^ - line start
(?:US|EU) - EU or US
\\ - a \ char
(.*) - Group 1: any zero or more chars other than a line feed char (note you might need to replace it with ([^\r\n]*) if you start getting weird results)
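For context, "[EU\\]" is a character class that matches any single E, U, or \ character (case-insensitively with -replace), which is why letters disappeared from every line. A minimal sketch of the replacement applied to the sample data follows; it assumes $attendance is one multi-line string and uses the ([^\r\n]*) variant so that a trailing carriage return is not pulled into the group.

```powershell
# Sample input (assumed to be a single multi-line string)
$attendance = @'
AttendanceList
XXXXXX
US\abraham
EU\sarah
US\gerber
'@

# Drop the US\ or EU\ prefix at the start of a line and append _IN to the name
$attendance_new = $attendance -replace '(?m)^(?:US|EU)\\([^\r\n]*)', '$1_IN'

$attendance_new
```

Lines that do not start with US\ or EU\ (the first two here) are left untouched.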

How do I parse this string in PowerShell?

I have a block of text I need to parse (saved in a variable) but I'm unsure how to go about it. This block of text, saved in a variable we can call $block for simplicity's sake, includes all the whitespace shown below.
I would like the result to be an iterable list, the first value being Health_AEPOEP_Membership_Summary - Dev and the second one being Health_AEPOEP_YoY_Comparison_Summary - Dev. Assume this list of workbooks can be longer (up to 50) or shorter (minimum 1 workbook), and all workbooks are formatted similarly (in terms of name_with_underscores - Dev). I'd try the $block.split(" ") method, but it produces many empty entries for the spaces, which may be hard to enumerate and account for.
Workbooks : Health_AEPOEP_Membership_Summary - Dev [Project: Health - Dev]
Health_AEPOEP_YoY_Comparison_Summary - Dev [Project: Health - Dev]
Any help is much appreciated!
You could write a multi-line regex pattern and try to extract the names, but it might be easier to reason about if you just break it into simple(r) steps:
$string = @'
Workbooks : Health_AEPOEP_Membership_Summary - Dev [Project: Health - Dev]
Health_AEPOEP_YoY_Comparison_Summary - Dev [Project: Health - Dev]
'@
# Split into one string per line
$strings = $string -split '\r?\n'
# Remove leading whitespace
$strings = $strings -replace '^\s*'
# Remove `Workbooks : ` prefix (strings that don't match will be left untouched)
$strings = $strings -replace '^Workbooks :\s*'
# Remove `[Project $NAME]` suffix
$strings = $strings -replace '\s*\[Project: [^\]]+\]'
# Get rid of empty lines
$strings = $strings |Where-Object Length
$strings now contains the two project names
If the text is in a file it would make this a little easier, and I would recommend this approach
switch -Regex -file ($file){
'(\w+_.+- Dev)' {$matches.1}
}
Regex details
() - capture group
\w+ - match one or more word characters (letters, digits, or underscores)
_ - match literal underscore
.+ - match one or more of any character
- Dev - literal match of dash space Dev
If it's already in a variable, it would depend if it's a string array or a single string. Assuming it's a single string, I'd recommend this approach
$regex = [regex]'(\w+_.+)(?=(\s\[.+))'
$regex.Matches($block).value
Health_AEPOEP_Membership_Summary - Dev
Health_AEPOEP_YoY_Comparison_Summary - Dev
Regex details
Same as above but added the following
(?=) - Look ahead
\s\[.+ - match a space, a left square bracket, one or more characters
Simply add a variable assignment $strings = before either of these to capture the output. Either would work on one or 500 workbooks.

Regex Get a substring from a string nearest to the end

I'm trying to get a substring from a string using a powershell script and regex.
For example I'm trying to get a year that's part of a filename.
Example Filename "Expo.2000.Brazilian.Pavillon.after.Something.2016.SomeTextIDontNeed.jpg"
The problem is that the regex gives me "2000" and no other matches. I need to get "2016" matched. Sadly, $matches only has one matched instance. Have I missed something? I feel like I'm going nuts ;)
If $matches would contain all instances found I could handle getting the nearest to end instance with:
$Year = $matches[$matches.Count-1]
Powershell Code:
# Function to get the images year and clean up image information after it.
Function Remove-String-Behind-Year
{
param
(
[string]$OriginalFileName # Provide the BaseName of the image file.
)
[Regex]$RegExYear = [Regex]"(?<=\.)\d{4}(?=\.|$)" # Regex to match a four digit string, preceded by a dot and followed by a dot or the end of the string.
$OriginalFileName -match $RegExYear # Matches the Original Filename with the Regex
Write-Host "Count: " $matches.Count # Why I only get 1 result?
Write-Host "BLA: " $matches[0] # First and only match is "2000"
}
Wanted Result Table:
"x.2000.y.2016.z" => "2016" (Does not work)
"x.y.2016" => "2016" (Works)
"x.y.2016.z" => "2016" (Works)
"x.y.20164.z" => "" (Works)
"x.y.201.z" => "" (Works)
PowerShell's -match operator only ever finds (at most) one match (although multiple substrings of that one match may be found with capture groups).
However, using the fact that quantifier * is greedy (by default), we can still use that one match to find the last match in the input:
-match '^.*\.(\d{4})\b' finds the longest prefix of the input that ends in a 4-digit sequence preceded by a literal . and followed by a word boundary, so that $matches[1] then contains the last occurrence of such a 4-digit sequence.
Function Extract-Year
{
param
(
[string] $OriginalFileName # Provide the BaseName of the image file.
)
if ($OriginalFileName -match '^.*\.(\d{4})\b') {
$matches[1] # output last 4-digit sequence found
} else {
'' # output empty string to indicate that no 4-digit sequence was found.
}
}
'x.2000.y.2016.z', 'x.y.2016', 'x.y.2016.z', 'x.y.20164.z', 'x.y.201.z' |
% { Extract-Year $_ }
yields
2016
2016
2016
# empty line
# empty line
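If you would rather keep the original look-around pattern and pick the last of all matches, as the question's $matches[$matches.Count-1] idea suggests, a possible sketch uses [regex]::Matches(), which does return every match. The function name Get-LastYear and its parameter name are hypothetical.

```powershell
function Get-LastYear {
    param (
        [string] $Name   # base name of the image file
    )
    # All 4-digit runs preceded by a dot and followed by a dot or end of string
    $all = [regex]::Matches($Name, '(?<=\.)\d{4}(?=\.|$)')
    if ($all.Count -gt 0) {
        $all[$all.Count - 1].Value   # last match, i.e. nearest to the end
    } else {
        ''                           # no year found
    }
}

Get-LastYear 'x.2000.y.2016.z'   # 2016
Get-LastYear 'x.y.20164.z'       # empty string
```

The greedy single-match version above does the same job with -match alone; this variant is mainly useful when the other matches are needed as well.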