Remove all values in a string before a backslash - regex

I have this string
AttendanceList
XXXXXX
US\abraham
EU\sarah
US\gerber
When I try to use -replace, it removes every character listed inside the square brackets (including matching letters in the first line, AttendanceList):
$attendance_new = $attendance -replace "[EU\\]", "" -replace "[US\\]", ""
echo $attendance_new
AttndancLit
XXXXXX
abraham
arah
grbr
I was hoping to get this sample output (and possibly append the string "_IN" to each name):
AttendanceList
XXXXXX
abraham_IN
sarah_IN
gerber_IN
I'm new to regex and still trying to figure out the regex code for special characters

You can use
$attendance_new = $attendance -replace '(?m)^(?:US|EU)\\(.*)', '$1_IN'
See this demo (.NET regex demo here). Details:
(?m) - multiline option enabling ^ to match start of any line position
^ - line start
(?:US|EU) - EU or US
\\ - a \ char
(.*) - Group 1: any zero or more chars other than a line feed char (note you might need to replace it with ([^\r\n]*) if you start getting weird results)
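For comparison, here is a minimal sketch of the same replacement ported to Python's re module (an assumption on my part: the question itself is PowerShell, where the one-liner above is all you need). The variable names mirror the question, and \g<1> is Python's spelling of the .NET $1 token.

```python
import re

# Sample input copied from the question (lines joined with "\n").
attendance = "AttendanceList\nXXXXXX\nUS\\abraham\nEU\\sarah\nUS\\gerber"

# (?m) lets ^ match at the start of every line, so only lines that
# begin with US\ or EU\ are rewritten; everything else is untouched.
pattern = re.compile(r"(?m)^(?:US|EU)\\(.*)")

# Keep the captured name (group 1) and append the _IN suffix.
attendance_new = pattern.sub(r"\g<1>_IN", attendance)

print(attendance_new)
```

Because the prefix is an alternation rather than a character class, the letters e, u and s elsewhere in the text are left alone.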

Related

Powershell: Can't Get RegEx to work on multiple lines

I am getting notes from a ticket that come in the form of:
[Employee ID]:
[First Name]: Test
[Last Name]: User
[Middle Initial]:
[Email]:
[Phone]:
[* Last 4 of SSN]: 1234
I've tried the following code to get the first name (in this example it would be 'Test'):
if ($incNotes -match '(^\[First Name\]:)(. * ?$)')
{
Write-Host $_.matches.groups[0].value
Write-Host $_.matches.groups[1].value
}
But I get nothing. Is there a way I can use just one long regex pattern to get the information I need? The information stays in the same format on every ticket that comes through.
How would I get the information after the [First Name]: and so on....
You can use
if ($incNotes -match '(?m)^\[First Name]: *(\S+)') {
Write-Host $matches[1]
}
See the regex demo. If you can have any kind of horizontal whitespace chars between : and the name, replace the space with [\p{Zs}\t], or some kind of [\s-[\r\n]].
Details:
(?m) - a RegexOptions.Multiline option that makes ^ match start of any line position, and $ match end of lines
^ - start of a line
\[First Name]: - a [First Name]: string
 * (a space followed by *) - zero or more spaces
(\S+) - Capturing group 1: one or more non-whitespace chars (replace with \S.* or \S[^\n\r]* to match any text till the end of the line).
Note that -match is a case insensitive regex matching operator, use -cmatch if you need a case sensitive behavior. Also, it only finds the first match and $matches[1] returns the Group 1 value.
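To cover the "and so on" part of the question, the same pattern can be parameterised by field label. The following is a rough Python sketch, not PowerShell; get_field is a hypothetical helper name, and the (?i) flag is there only to imitate the case-insensitive behaviour of -match.

```python
import re

# Ticket notes in the format shown in the question.
inc_notes = "\n".join([
    "[Employee ID]:",
    "[First Name]: Test",
    "[Last Name]: User",
    "[Middle Initial]:",
    "[Email]:",
    "[Phone]:",
    "[* Last 4 of SSN]: 1234",
])


def get_field(notes, label):
    """Return the first non-whitespace token after '[label]:', or None."""
    # re.escape protects characters such as * inside the label.
    # A literal space (not \s) after the colon keeps the match on one line.
    pattern = r"(?mi)^\[" + re.escape(label) + r"\]: *(\S+)"
    match = re.search(pattern, notes)
    return match.group(1) if match else None


print(get_field(inc_notes, "First Name"))
print(get_field(inc_notes, "* Last 4 of SSN"))
```

Empty fields such as [Employee ID]: yield None here, because a literal space cannot cross the line break the way \s* would.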

Powershell regex missing ones with CR etc

I'm working on a regular expression to extract a map of key and associated string.
For some reason, it's working for lines that don't show a line split, but misses where there are line splits.
This is what I'm using:
$errorMap = [ordered]@{}
# process the lines one-by-one
switch -Regex ($fileContent -split ';') {
'InsertCodeInfo\(([\w]*), "(.*)"' { # key etc., followed by string like "Media size cassette missing"
$key,$value = ($matches[1,2])|ForEach-Object Trim
$errorMap[$key] = $value
}
}
This is an example of $fileContent:
InsertCodeInfo(pjlWarnCommunications,
"communications error");
InsertCodeInfo(pjlNormalOnline,
"Online");
InsertCodeInfo(pjlWarnOffline,
"offline");
InsertCodeInfo(pjlNormalAccessing, "Accessing"); #this is first match :(
InsertCodeInfo(pjlNormalArrive, "Normal arrive");
InsertCodeInfo(pljNormalProcessing, "Processing");
InsertCodeInfo(pjlNormalDataInBuffer, "Data in buffer");
It's returning the pairs from pjlNormalAccessing down, where it doesn't have a line split. I thought that using the semicolon to split the regex content would fix it, but it didn't help. I was formerly splitting regex content with
'\r?\n'
I thought maybe there was something going on with VSCode so I have exited and re-opened it, and re-running the script had the same result. Any idea how to get it to match every InsertCodeInfo through the semicolon line with the key-value pair?
This is using VSCode and Powershell 5.1.
Update:
Someone asked how $fileContent is created:
I call my method with the filenamepath ($FileHandler), and from/to strings/methodNames ($matchFound2 becomes $fileContent later as a method parameter):
$matchFound2 = Get-MethodContents -codePath $FileHandler -methodNameToReturn "OkStatusHandler::PopulateCodeInfo" -followingMethodName "OkStatusHandler::InsertCodeInfo"
Function Get-MethodContents{
[cmdletbinding()]
Param ( [string]$codePath, [string]$methodNameToReturn, [string]$followingMethodName)
Process
{
$contents = ""
Write-Host "In GetMethodContents method File:$codePath method:$methodNameToReturn followingMethod:$followingMethodName" -ForegroundColor Green
$contents = Get-Content $codePath -Raw #raw gives content as single string instead of a list of strings
$null = $contents -match "($methodNameToReturn[\s\S]*)$followingMethodName" #| Out-Null
return $Matches.Item(1)
}#End of Process
}#End of Function
You can use
InsertCodeInfo\((\w+),\s*"([^"]*)
See the online regex demo.
Details:
InsertCodeInfo\( - a literal InsertCodeInfo( text
(\w+) - Group 1: one or more word chars (letters, digits, diacritics or underscores (connector punctuation))
, - a comma
\s* - zero or more whitespaces
" - a " char
([^"]*) - Group 2: zero or more chars other than a " char.
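As an illustration of why this pattern copes with the line splits, here is a small sketch in Python rather than PowerShell (my assumption for the sake of a runnable example). It runs the pattern over the whole text at once instead of splitting on ; first; the sample is shortened from the question and the indentation of the continuation lines is assumed.

```python
import re

# Shortened sample from the question; continuation-line indentation is assumed.
file_content = '''InsertCodeInfo(pjlWarnCommunications,
    "communications error");
InsertCodeInfo(pjlNormalOnline,
    "Online");
InsertCodeInfo(pjlNormalAccessing, "Accessing");
InsertCodeInfo(pjlNormalArrive, "Normal arrive");
'''

# \s* after the comma absorbs spaces AND line breaks, and [^"]* stops
# at the closing quote, so wrapped and unwrapped calls both match.
pattern = re.compile(r'InsertCodeInfo\((\w+),\s*"([^"]*)')

# findall returns (key, value) tuples; dicts keep insertion order.
error_map = {key: value.strip() for key, value in pattern.findall(file_content)}

for key, value in error_map.items():
    print(key, "=>", value)
```

The original pattern failed on the wrapped calls because it required exactly one space after the comma, and . does not match a line break.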
This regular expression seems to be catching all lines, including ones with a newline in the middle. Thanks for the suggestion @WiktorStribizew. I tweaked your suggestion, and it helped.
InsertCodeInfo\(([\w]*),[\s]*"([^"]*)
It might not be the most succinct, but it's catching all lines. Feel free as always to post alternative suggestions. This is why I didn't accept my own answer.

Capture (remove) all double-quotes after colon

I'm trying to clean up a string. An example string:
{
"NodeID": "${NodeID}",
"EventID": "${EventID}"
}
I want to capture all double quotes which occur after the colon, so that the end string will be:
{
"NodeID": ${NodeID},
"EventID": ${EventID}
}
I know that it's JSON, and that technically it is a string in those positions, but they're macros that will be interpreted by a system which generates the actual JSON string and replaces the macros with data, so in my use case this text isn't JSON yet. I can deal with the text line-by-line to make it easier.
I'll be using the regex pattern in both PowerShell and Python.
The closest I've gotten so far have been: (?<=[^*:])("), and (?<=:)(.*)(?<!,)
This is working, but seems incredibly kludgy and inelegant:
$String = '{
"NodeID": "${NodeID}",
"EventID": "${EventID}"
}'
# The Regex to match the text after the colon
[regex]$Regex = '(?<=:)(.*)'
# Splitting each line of the string into an ArrayList element
[System.Collections.ArrayList]$StringArray = $String.Split([string[]][Environment]::NewLine, [StringSplitOptions]::None)
# Declaring an output string
$OutPutString = ''
# Loop through the ArrayList
$i = 1
foreach ($Row in $StringArray) {
# Split each element string at the RegEx match
$RowArray = $Row -split $Regex
[String]$RowString1 = $RowArray[0]
[String]$RowString2 = $RowArray[1]
# Reassemble the element string after replacing the double quotes in the 2nd half
$FullRowString = $RowString1 + $RowString2.Replace('"','')
# If this is the first line in the string, don't add a new line charact in front
if ($i -gt 1) {
$NewLine = "`n"
}
# Reassemble the string
$OutPutString += $NewLine + $FullRowString
$i++
}
$OutPutString
Any better ideas?
👉 For the regex to work as expected, it is important to know which regex engine the scripting/programming language uses.
Please always add this information as tags besides regex.
Here: powershell, python
Regex to match a JSON text-field and capture the raw-value
Tested on Python, see regex101 demo:
(?<=:\s\s)\"([^\"]*)\"
💡 Components
To explain the composition of the regex and its working in steps:
(?<=:\s\s): positive lookbehind (?<=...) for a colon followed by 2 whitespace chars \s\s
to neglect the field-name also enclosed in double-quotes
\" and \": matching double-quotes before and after the capture group
the unwanted enclosing of the field-value
([^\"]*): capture-group denoted by parentheses surround any non-double-quote character [^\"]*
the wanted raw field-value (string) without enclosing double-quotes
ℹ Note:
The character-group [^\"] matches any non (^) double-quote \".
It will start matching at the leading double-quote and stop matching as soon as a double-quote is detected. So the final \" in the regex is optional: It is not required for matching/capturing, but will ensure that each matched field-value is correctly enclosed by double-quotes.
Result
Matching following input lines:
{
"NodeID": "${NodeID}",
"EventID": "${EventID}"
}
Will give the desired raw field-values in group 1 for each match:
e.g.
${NodeID} for the first match
${EventID} for the second match
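If the goal is to produce the cleaned string rather than just capture the values, a substitution does it in one pass. Below is a minimal Python sketch of a variant (my own, not the pattern above): it captures the colon and any following whitespace in a group instead of using a fixed-width lookbehind, so it does not depend on the exact number of spaces after the colon.

```python
import re

# Template text from the question (macros, not yet valid JSON).
text = '''{
"NodeID": "${NodeID}",
"EventID": "${EventID}"
}'''

# Group 1: the colon plus any whitespace after it.
# Group 2: the value between the double quotes.
pattern = re.compile(r'(:\s*)"([^"]*)"')

# Re-emit both groups and drop the two quotes around the value.
cleaned = pattern.sub(r"\1\2", text)

print(cleaned)
```

The field names keep their quotes because no colon precedes them. Values that themselves contain a colon followed by a quote would need a stricter pattern.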
📚 Working with JSON in PowerShell
Assuming your context is parsing JSON, the following related links may be useful:
Microsoft Scripting Blog: Working with JSON data in PowerShell
Related Question: PowerShell parsing JSON
PowerShell Explained: Powershell: The many ways to use regex

How can I delete the rest of the line after the second pipe character "|" for every line with python?

I am using Notepad++ and I want to get rid of everything after the second pipe character (including the second pipe character itself) on every line in my txt file.
Basically, the txt file has the following format:
3.1_1.wav|I like apples.|I like apples|I like bananas
3.1_2.wav|Isn't today a lovely day?|Right now it is 1 in the afternoon.|....
The result should be:
3.1_1.wav|I like apples.
3.1_2.wav|Isn't today a lovely day?
I have tried using \|.* but then everything after the first pipe character is matched.
In Notepad++ do this:
Find what: ^([^\|]*\|[^\|]*).*
Replace with: $1
check "Regular expression", and "Replace All"
Explanation:
^ - anchor at start of line
( - start group, can be referenced as $1
[^\|]* - scan over any character other than |
\| - scan over |
[^\|]* - scan over any character other than |
) - end group
.* - scan over everything until end of line
in replace reference the captured group with $1
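Since the title asks about Python, here is a rough sketch of the same find/replace with Python's re module. The two sample lines come from the question; \1 is Python's equivalent of the $1 used in Notepad++.

```python
import re

lines = [
    "3.1_1.wav|I like apples.|I like apples|I like bananas",
    "3.1_2.wav|Isn't today a lovely day?|Right now it is 1 in the afternoon.|....",
]

# Group 1: everything up to, but not including, the second pipe.
# The trailing .* consumes the rest of the line so that it is dropped.
pattern = re.compile(r"^([^|]*\|[^|]*).*")

trimmed = [pattern.sub(r"\1", line) for line in lines]

for line in trimmed:
    print(line)
```

Lines with fewer than two pipes come back unchanged, since there is nothing after group 1 to drop.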
I'm not sure if this is the best way to do it, but try this:
[^wav]\|.*

How do I parse this string in Powershell?

I have a block of text I need to parse (saved in a variable) but I'm unsure how to go about it. This block of text, saved in a variable we can call $block for simplicity's sake, includes all the whitespace shown below.
I would like the result to be an iterable list, the first value being Health_AEPOEP_Membership_Summary - Dev and the second one being Health_AEPOEP_YoY_Comparison_Summary - Dev. Assume this list of workbooks can be longer (up to 50) or shorter (minimum 1 workbook), and all workbooks are formatted similarly (in terms of name_with_underscores - Dev). I'd try the $block.split(" ") method, but this method gives many spaces which may be hard to enumerate and account for.
Workbooks : Health_AEPOEP_Membership_Summary - Dev [Project: Health - Dev]
Health_AEPOEP_YoY_Comparison_Summary - Dev [Project: Health - Dev]
Any help is much appreciated!
You could write a multi-line regex pattern and try to extract the names, but it might be easier to reason about if you just break it into simple(r) steps:
$string = @'
Workbooks : Health_AEPOEP_Membership_Summary - Dev [Project: Health - Dev]
Health_AEPOEP_YoY_Comparison_Summary - Dev [Project: Health - Dev]
'@
# Split into one string per line
$strings = $string -split '\r?\n'
# Remove leading whitespace
$strings = $strings -replace '^\s*'
# Remove `Workbooks : ` prefix (strings that don't match will be left untouched)
$strings = $strings -replace '^Workbooks :\s*'
# Remove `[Project: $NAME]` suffix
$strings = $strings -replace '\s*\[Project: [^\]]+\]'
# Get rid of empty lines
$strings = $strings |Where-Object Length
$strings now contains the two workbook names
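For readers who want to check the logic outside PowerShell, here is a Python sketch that follows the same steps one by one. Treat it as an illustration under assumptions: the leading whitespace on the second input line is made up, since the original indentation is not visible here.

```python
import re

# Input as in the question; the indentation of line 2 is assumed.
block = (
    "Workbooks : Health_AEPOEP_Membership_Summary - Dev [Project: Health - Dev]\n"
    "            Health_AEPOEP_YoY_Comparison_Summary - Dev [Project: Health - Dev]\n"
)

names = []
for line in re.split(r"\r?\n", block):                  # one string per line
    line = re.sub(r"^\s*", "", line)                    # strip leading whitespace
    line = re.sub(r"^Workbooks :\s*", "", line)         # strip the prefix, if present
    line = re.sub(r"\s*\[Project: [^\]]+\]", "", line)  # strip the [Project: ...] suffix
    if line:                                            # skip empty lines
        names.append(line)

print(names)
```

Each substitution is a no-op on lines it does not apply to, which is why the steps can be run blindly on every line.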
If the text is in a file it would make this a little easier, and I would recommend this approach
switch -Regex -file ($file){
'(\w+_.+- Dev)' {$matches.1}
}
Regex details
() - capture group
\w+ - match one or more word characters (letters, digits or underscores)
_ - match literal underscore
.+ - match one or more of any character
- Dev - literal match of dash space Dev
If it's already in a variable, it would depend if it's a string array or a single string. Assuming it's a single string, I'd recommend this approach
$regex = [regex]'(\w+_.+)(?=(\s\[.+))'
$regex.Matches($block).value
Health_AEPOEP_Membership_Summary - Dev
Health_AEPOEP_YoY_Comparison_Summary - Dev
Regex details
Same as above but added the following
(?=) - Look ahead
\s\[.+ - match a space, a left square bracket, one or more characters
Simply add a variable assignment $strings = before either of these to capture the output. Either would work on one or 500 workbooks.
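The lookahead version can be tried in Python too. This is a minimal sketch assuming a single string as input; the redundant capture groups from the pattern above are dropped because only the overall match is used, and the indentation of the second line is again an assumption.

```python
import re

# Single-string input; the indentation of line 2 is assumed.
block = (
    "Workbooks : Health_AEPOEP_Membership_Summary - Dev [Project: Health - Dev]\n"
    "            Health_AEPOEP_YoY_Comparison_Summary - Dev [Project: Health - Dev]\n"
)

# \w+_  : a word that contains at least one underscore
# .+    : the rest of the name
# (?=…) : stop just before " [" without consuming it
pattern = re.compile(r"\w+_.+(?=\s\[.+)")

names = [match.group(0) for match in pattern.finditer(block)]

print(names)
```

The Workbooks label is skipped because it contains no underscore, so the match can only start at the workbook name itself.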