Regex Get a substring from a string nearest to the end - regex

I'm trying to get a substring from a string using a powershell script and regex.
For example I'm trying to get a year that's part of a filename.
Example Filename "Expo.2000.Brazilian.Pavillon.after.Something.2016.SomeTextIDontNeed.jpg"
The problem is that the result of the regex gives me "2000" and no other matches. I need to get "2016" matched. Sadly $matches only has one matched instance. Do I have missed something? I feel getting nuts ;)
If $matches would contain all instances found I could handle getting the nearest to end instance with:
$Year = $matches[$matches.Count-1]
Powershell Code:
# Function to get the images year and clean up image information after it.
Function Remove-String-Behind-Year
{
param
(
[string]$OriginalFileName # Provide the BaseName of the image file.
)
[Regex]$RegExYear = [Regex]"(?<=\.)\d{4}(?=\.|$)" Regex to match a four digit string, prepended by a dot and followed by a dot or the end of the string.
$OriginalFileName -match $RegExYear # Matches the Original Filename with the Regex
Write-Host "Count: " $matches.Count # Why I only get 1 result?
Write-Host "BLA: " $matches[0] # First and only match is "2000"
}
Wanted Result Table:
"x.2000.y.2016.z" => "2016" (Does not work)
"x.y.2016" => "2016" (Works)
"x.y.2016.z" => "2016" (Works)
"x.y.20164.z" => "" (Works)
"x.y.201.z" => "" (Works)

PowerShell's -match operator only ever finds (at most) one match (although multiple substrings of that one match may be found with capture groups).
However, using the fact that quantifier * is greedy (by default), we can still use that one match to find the last match in the input:
-match '^.*\.(\d{4})\b' finds the longest prefix of the input that ends in a 4-digit sequence preceded by a literal . and followed by a word boundary, so that $matches[1] then contains the last occurrence of such a 4-digit sequence.
Function Extract-Year
{
param
(
[string] $OriginalFileName # Provide the BaseName of the image file.
)
if ($OriginalFileName -match '^.*\.(\d{4})\b') {
$matches[1] # output last 4-digit sequence found
} else {
'' # output empty string to indicate that no 4-digit sequence was found.
}
}
'x.2000.y.2016.z', 'x.y.2016', 'x.y.2016.z', 'x.y.20164.z', 'x.y.201.z' |
% { Extract-Year $_ }
yields
2016
2016
2016
# empty line
# empty line

Related

Powershell: Can't Get RegEx to work on multiple lines

I am getting notes from a ticket that come in the form of:
[Employee ID]:
[First Name]: Test
[Last Name]: User
[Middle Initial]:
[Email]:
[Phone]:
[* Last 4 of SSN]: 1234
I've tried the following code to get the first name (in this example it would be 'Test':
if ($incNotes -match '(^\[First Name\]:)(. * ?$)')
{
Write-Host $_.matches.groups[0].value
Write-Host $_.matches.groups[1].value
}
But I get nothing. Is there a way I can use just one long regex pattern to get the information I need? The information stays in the same format on every ticket that comes through.
How would I get the information after the [First Name]: and so on....
You can use
if ($incNotes -match '(?m)^\[First Name]: *(\S+)') {
Write-Host $matches[1]
}
See the regex demo. If you can have any kind of horizontal whitespace chars between : and the name, replace the space with [\p{Zs}\t], or some kind of [\s-[\r\n]].
Details:
(?m) - a RegexOptions.Multiline option that makes ^ match start of any line position, and $ match end of lines
^ - start of a line
\[First Name]: - a [First Name]: string
* - zero or more spaces
(\S+) - Capturing group 1: one or more non-whitespace chars (replace with \S.* or \S[^\n\r]* to match any text till end of string).
Note that -match is a case insensitive regex matching operator, use -cmatch if you need a case sensitive behavior. Also, it only finds the first match and $matches[1] returns the Group 1 value.

Powershell regex missing ones with CR etc

I'm working on a regular expression to extract a map of key and associated string.
For some reason, it's working for lines that don't show a line split, but misses where there are line splits.
This is what I'm using:
$errorMap = [ordered]#{}
# process the lines one-by-one
switch -Regex ($fileContent -split ';') {
'InsertCodeInfo\(([\w]*), "(.*)"' { # key etc., followed by string like "Media size cassette missing"
$key,$value = ($matches[1,2])|ForEach-Object Trim
$errorMap[$key] = $value
}
}
This is an example of $fileContent:
InsertCodeInfo(pjlWarnCommunications,
"communications error");
InsertCodeInfo(pjlNormalOnline,
"Online");
InsertCodeInfo(pjlWarnOffline,
"offline");
InsertCodeInfo(pjlNormalAccessing, "Accessing"); #this is first match :(
InsertCodeInfo(pjlNormalArrive, "Normal arrive");
InsertCodeInfo(pljNormalProcessing, "Processing");
InsertCodeInfo(pjlNormalDataInBuffer, "Data in buffer");
It's returning the pairs from pjlNormalAccessing down, where it doesn't have a line split. I thought that using the semicolon to split the regex content would fix it, but it didn't help. I was formerly splitting regex content with
'\r?\n'
I thought maybe there was something going on with VSCode so I have exited and re-opened it, and re-running the script had the same result. Any idea how to get it to match every InsertCodeInfo through the semicolon line with the key-value pair?
This is using VSCode and Powershell 5.1.
Update:
Someone asked how $fileContent is created:
I call my method with the filenamepath ($FileHandler), and from/to strings/methodNames ($matchFound2 becomes $fileContent later as a method parameter):
$matchFound2 = Get-MethodContents -codePath $FileHandler -methodNameToReturn "OkStatusHandler::PopulateCodeInfo" -followingMethodName "OkStatusHandler::InsertCodeInfo"
Function Get-MethodContents{
[cmdletbinding()]
Param ( [string]$codePath, [string]$methodNameToReturn, [string]$followingMethodName)
Process
{
$contents = ""
Write-Host "In GetMethodContents method File:$codePath method:$methodNameToReturn followingMethod:$followingMethodName" -ForegroundColor Green
$contents = Get-Content $codePath -Raw #raw gives content as single string instead of a list of strings
$null = $contents -match "($methodNameToReturn[\s\S]*)$followingMethodName" #| Out-Null
return $Matches.Item(1)
}#End of Process
}#End of Function
You can use
InsertCodeInfo\((\w+),\s*"([^"]*)
See the online regex demo.
Details:
InsertCodeInfo\( - a literal InsertCodeInfo( text
(\w+) - Group 1: one or more word chars (letters, digits, diacritics or underscores (connector punctuation)
, - a comma
\s* - zero or more whitespaces
" - a " char
([^"]*) - Group 2: zero or more chars other than a " char.
See the regex graph:
This regular expression seems to be catching all lines, including ones with newline in the middle. Thanks for the suggestion #WiktorStribizew. I tweaked your suggestion, and it helped.
InsertCodeInfo\(([\w]*),[\s]*"([^"]*)
It might be the most succinct, but it's catching all lines. Feel free as always to post alternative suggestions. This is why I didn't accept my own answer.

Capture (remove) all double-quotes after colon

I'm trying to clean up a string. An example string:
{
"NodeID": "${NodeID}",
"EventID": "${EventID}"
}
I want to capture all double quotes which occur after the colon, so that the end string will be:
{
"NodeID": ${NodeID},
"EventID": ${EventID}
}
I know that it's JSON, and that technically it is a string in those positions, but they're macros that will be interpreted by a system which generates the actual JSON string and replaces the macros with data, so in my use case this text isn't JSON yet. I can deal with the text line-by-line to make it easier.
I'll be using the regex pattern in both PowerShell and Python.
The closest I've gotten so far have been: (?<=[^*:])("), and (?<=:)(.*)(?<!,)
This is working, but seems incredibly kludgy and inelegant:
$String = '{
"NodeID": "${NodeID}",
"EventID": "${EventID}"
}'
# The Regex to match the text after the colon
[regex]$Regex = '(?<=:)(.*)'
# Splitting each line of the string into an ArrayList element
[System.Collections.ArrayList]$StringArray = $String.Split([string[]][Environment]::NewLine, [StringSplitOptions]::None)
# Declaring an output string
$OutPutString = ''
# Loop through the ArrayList
$i = 1
foreach ($Row in $StringArray) {
# Split each element string at the RegEx match
$RowArray = $Row -split $Regex
[String]$RowString1 = $RowArray[0]
[String]$RowString2 = $RowArray[1]
# Reassemble the element string after replacing the double quotes in the 2nd half
$FullRowString = $RowString1 + $RowString2.Replace('"','')
# If this is the first line in the string, don't add a new line charact in front
if ($i -gt 1) {
$NewLine = "`n"
}
# Reassemble the string
$OutPutString += $NewLine + $FullRowString
$i++
}
$OutPutString
Any better ideas?
👉ī¸ For the regex to be functional as expected, the regex-engine indicated by scripting/programming language is important to know.
Please always add this information as tags besides regex.
Here: powershell, python
Regex to match a JSON text-field and capture the raw-value
Tested on Python, see regex101 demo:
(?<=:\s\s)\"([^\"]*)\"
💡ī¸ Components
To explain the composition of the regex and its working in steps:
(?<=:\s\s): positive look behind ?<=: for 2 white-spaces \s\s
to neglect the field-name also enclosed in double-quotes
\" and \": matching double-quotes before and after the capture group
the unwanted enclosing of the field-value
([^\"]*): capture-group denoted by parentheses surround any non-double-quote character [^\"]*
the wanted raw field-value (string) without enclosing double-quotes
ℹī¸ Note:
The character-group [^\"] matches any non (^) double-quote \".
It will start matching at the leading double-quote and stop matching as soon as a double-quote is detected. So the final \" in the regex is optional: It is not required for matching/capturing, but will ensure that each matched field-value is correctly enclosed by double-quotes.
Result
Matching following input lines:
{
"NodeID": "${NodeID}",
"EventID": "${EventID}"
}
Will give the desired raw field-values in group 1 for each match:
e.g.
${NodeID} for the first match
${EventID} for the second match
📚ī¸ Working with JSON in PowerShell
For your context assumed as parsing JSON following related links may be useful:
Microsoft Scripting Blog: Working with JSON data in PowerShell
Related Question: PowerShell parsing JSON
PowerShell Explained: Powershell: The many ways to use regex

Powershell Regex expression to get part of a string

I would like to take part of a string to use it elsewhere. For example, I have the following strings:
Project XYZ is the project name - 20-12-11
I would like to get the value "XYZ is the project name" from the string. The word "Project" and character "-" before the number will always be there.
I think a lookaround regular expression would work here since "Project" and "-" are always there:
(?<=Project ).+?(?= -)
A lookaround can be useful for cases that deal with getting a sub string.
Explanation:
(?<= = negative lookbehind
Project = starting string (including space)
) = closing negative lookbehind
.+? = matches anything in between
(?= = positive lookahead
- = ending string
) = closing positive lookahead
Example in PowerShell:
Function GetProjectName($InputString) {
$regExResult = $InputString | Select-String -Pattern '(?<=Project ).+?(?= -)'
$regExResult.Matches[0].Value
}
$projectName = GetProjectName -InputString "Project XYZ is the project name - 20-12-11"
Write-Host "Result = '$($projectName)'"
here is yet another regex version. [grin] it may be easier to understand since it uses somewhat basic regex patterns.
what it does ...
defines the input string
defines the prefix to match on
this will keep only what comes after it.
defines the suffix to match on
this part will keep only what is before it.
trigger the replace
the part in the () is what will be placed into the 1st capture group.
show what was kept
the code ...
$InString = 'Project XYZ is the project name - 20-12-11'
# "^" = start of string
$Prefix = '^project '
# ".+' = one or more of any character
# "$" = end of string
$Suffix = ' - .+$'
# "$1" holds the content of the 1st [and only] capture group
$OutString = $InString -replace "$Prefix(.+)$Suffix", '$1'
$OutString
# define the input string
$str = 'Project XYZ is the project name - 20-12-11'
# use regex (-match) including the .*? regex pattern
# this patterns means (.)any char, (*) any times, (?) maximum greed
# to capture (into brackets) the desired pattern substring
$str -match "(Project.*?is the project name)"
# show result (the first capturing group)
$matches[1]

Change 3rd octet of IP in string format using PowerShell

Think I've found the worst way to do this:
$ip = "192.168.13.1"
$a,$b,$c,$d = $ip.Split(".")
[int]$c = $c
$c = $c+1
[string]$c = $c
$newIP = $a+"."+$b+"."+$c+"."+$d
$newIP
But what is the best way? Has to be string when completed. Not bothered about validating its a legit IP.
Using your example for how you want to modify the third octet, I'd do it pretty much the same way, but I'd compress some of the steps together:
$IP = "192.168.13.1"
$octets = $IP.Split(".") # or $octets = $IP -split "\."
$octets[2] = [string]([int]$octets[2] + 1) # or other manipulation of the third octet
$newIP = $octets -join "."
$newIP
You can simply use the -replace operator of PowerShell and a look ahead pattern. Look at this script below
Set-StrictMode -Version "2.0"
$ErrorActionPreference="Stop"
cls
$ip1 = "192.168.13.123"
$tests=#("192.168.13.123" , "192.168.13.1" , "192.168.13.12")
foreach($test in $tests)
{
$patternRegex="\d{1,3}(?=\.\d{1,3}$)"
$newOctet="420"
$ipNew=$test -replace $patternRegex,$newOctet
$msg="OLD ip={0} NEW ip={1}" -f $test,$ipNew
Write-Host $msg
}
This will produce the following:
OLD ip=192.168.13.123 NEW ip=192.168.420.123
OLD ip=192.168.13.1 NEW ip=192.168.420.1
OLD ip=192.168.13.12 NEW ip=192.168.420.12
How to use the -replace operator?
https://powershell.org/2013/08/regular-expressions-are-a-replaces-best-friend/
Understanding the pattern that I have used
The (?=) in \d{1,3}(?=.\d{1,3}$) means look behind.
The (?=.\d{1,3}$ in \d{1,3}(?=.\d{1,3}$) means anything behind a DOT and 1-3 digits.
The leading \d{1,3} is an instruction to specifically match 1-3 digits
All combined in plain english "Give me 1-3 digits which is behind a period and 1-3 digits located towards the right side boundary of the string"
Look ahead regex
https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference
CORRECTION
The regex pattern is a look ahead and not look behind.
If you have PowerShell Core (v6.1 or higher), you can combine -replace with a script block-based replacement:
PS> '192.168.13.1' -replace '(?<=^(\d+\.){2})\d+', { 1 + $_.Value }
192.168.14.1
Negative look-behind assertion (?<=^(\d+\.){2}) matches everything up to, but not including, the 3rd octet - without considering it part of the overall match to replace.
(?<=...) is the look-behind assertion, \d+ matches one or more (+) digits (\d), \. a literal ., and {2} matches the preceding subexpression ((...)) 2 times.
\d+ then matches just the 3rd octet; since nothing more is matched, the remainder of the string (. and the 4th octet) is left in place.
Inside the replacement script block ({ ... }), $_ refers to the results of the match, in the form of a [MatchInfo] instance; its .Value is the matched string, i.e. the 3rd octet, to which 1 can be added.
Data type note: by using 1, an implicit [int], as the LHS, the RHS (the .Value string) is implicitly coerced to [int] (you may choose to use an explicit cast).
On output, whatever the script block returns is automatically coerced back to a string.
If you must remain compatible with Windows PowerShell, consider Jeff Zeitlin's helpful answer.
For complete your method but shortly :
$a,$b,$c,$d = "192.168.13.1".Split(".")
$IP="$a.$b.$([int]$c+1).$d"
function Replace-3rdOctet {
Param(
[string]$GivenIP,
[string]$New3rdOctet
)
$GivenIP -match '(\d{1,3}).(\d{1,3}).(\d{1,3}).(\d{1,3})' | Out-Null
$Output = "$($matches[1]).$($matches[2]).$New3rdOctet.$($matches[4])"
Return $Output
}
Copy to a ps1 file and dot source it from command line, then type
Replace-3rdOctet -GivenIP '100.201.190.150' -New3rdOctet '42'
Output: 100.201.42.150
From there you could add extra error handling etc for random input etc.
here's a slightly different method. [grin] i managed to not notice the answer by JeffZeitlin until after i finished this.
[edit - thanks to JeffZeitlin for reminding me that the OP wants the final result as a string. oops! [*blush*]]
what it does ...
splits the string on the dots
puts that into an [int] array & coerces the items into that type
increments the item in the targeted slot
joins the items back into a string with a dot for the delimiter
converts that to an IP address type
adds a line to convert the IP address to a string
here's the code ...
$OriginalIPv4 = '1.1.1.1'
$TargetOctet = 3
$OctetList = [int[]]$OriginalIPv4.Split('.')
$OctetList[$TargetOctet - 1]++
$NewIPv4 = [ipaddress]($OctetList -join '.')
$NewIPv4
'=' * 30
$NewIPv4.IPAddressToString
output ...
Address : 16908545
AddressFamily : InterNetwork
ScopeId :
IsIPv6Multicast : False
IsIPv6LinkLocal : False
IsIPv6SiteLocal : False
IsIPv6Teredo : False
IsIPv4MappedToIPv6 : False
IPAddressToString : 1.1.2.1
==============================
1.1.2.1