Unexpected result from PowerShell regex

I am trying to identify errors in a log file. The application uses five uppercase letters followed by three digits followed by 'E' as an error code. The error code is followed by a non-word character. I was identifying cases with:
$errors=Select-string -Path "logfile.txt" -Pattern "[A-Z]{5}[0-9]{3}E\W"
However the remainder of the content now includes
ab1bea8a-a00e-4211-b1db-2facecfd725e.
Which is being matched by the regex and flagged as an error. I changed the regex to
\p{Lu}{5}[0-9]{3}E\W
(which I expected to match five upper case characters), but why does it still match the non-error lower case pattern?

The "case-insensitive" regex flag is set by Select-String, which makes \p{Lu} case-insensitive, just as it does with [A-Z].
Try adding the -CaseSensitive parameter to the command.
You can confirm this by running some .NET code, for example in LINQPad:
(new Regex(@"\p{Lu}", RegexOptions.IgnoreCase)).IsMatch("a")
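The same effect can be reproduced directly in PowerShell. The following is a minimal sketch using the GUID fragment from the question; the sample error code ABCDE123E and the variable names are made up for illustration.

```powershell
# GUID fragment from the question, plus a hypothetical genuine error code.
$guid    = 'ab1bea8a-a00e-4211-b1db-2facecfd725e.'
$code    = 'ABCDE123E '
$pattern = '\p{Lu}{5}[0-9]{3}E\W'

# Default (case-insensitive) matching flags the GUID fragment as an error.
$loose = $guid | Select-String -Pattern $pattern

# With -CaseSensitive, only the uppercase error code is reported.
$strict = $guid, $code | Select-String -Pattern $pattern -CaseSensitive

# PowerShell equivalent of the LINQPad check above.
$ignoreCase     = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase
$luMatchesLower = [regex]::IsMatch('a', '\p{Lu}', $ignoreCase)

$loose.Matches[0].Value    # the text that was wrongly flagged
$strict.Line               # the line that still matches
$luMatchesLower
```

Keeping the pattern in a variable makes it easy to compare the two Select-String calls side by side.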

PowerShell regular expression matching is case-insensitive by default. There are several ways for making matches case-sensitive, though.
Add the -CaseSensitive switch when using the Select-String cmdlet:
-CaseSensitive
Makes matches case-sensitive. By default, matches are not case-sensitive.
C:\> 'abc' | Select-String -Pattern 'A'
abc
C:\> 'ABC' | Select-String -Pattern 'A'
ABC
C:\> 'abc' | Select-String -Pattern 'A' -CaseSensitive # ← no match here
C:\> 'ABC' | Select-String -Pattern 'A' -CaseSensitive
ABC
Use the case-sensitive version of the regular expression matching operators:
By default, all comparison operators are case-insensitive. To make a comparison operator case-sensitive, precede the operator name with a c. For example, the case-sensitive version of -eq is -ceq. To make the case-insensitivity explicit, precede the operator with an i. For example, the explicitly case-insensitive version of -eq is -ieq.
C:\> 'abc' -match 'A'
True
C:\> 'ABC' -match 'A'
True
C:\> 'abc' -cmatch 'A' # ← no match here
False
C:\> 'ABC' -cmatch 'A'
True
Force a case-sensitive match by adding a miscellaneous construct ((?...), not to be confused with non-capturing groups (?:...)) with the inverted "case-insensitive" regex option to the regular expression (this works with both Select-String cmdlet and -match operator):
C:\> 'abc' | Select-String -Pattern '(?-i)A' # ← no match here
C:\> 'ABC' | Select-String -Pattern '(?-i)A'
ABC
C:\> 'abc' -match '(?-i)A' # ← no match here
False
C:\> 'ABC' -match '(?-i)A'
True
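Applied to the error-code pattern from the first question, the operator and inline-option approaches might look like the sketch below. The two sample log lines and the variable names are invented for illustration.

```powershell
$lines = @(
    'session ab1bea8a-a00e-4211-b1db-2facecfd725e. closed',   # GUID from the question
    'step failed: ABCDE123E something went wrong'             # hypothetical error line
)
$pattern = '[A-Z]{5}[0-9]{3}E\W'

# Case-sensitive operator: keep only lines where -cmatch returns True.
$viaOperator = $lines | Where-Object { $_ -cmatch $pattern }

# Inline option: (?-i) switches case-insensitivity off inside the pattern itself.
$viaInline = $lines | Where-Object { $_ -match "(?-i)$pattern" }

$viaOperator
$viaInline
```

The inline option is handy when the pattern is passed to code that does not expose a case-sensitivity switch.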

Related

Powershell regex only select digits

I have a script that I am working on to parse each line in the log. My issue is the regex I use matches from src= until space.
I only want the IP address, not the src= part. I still need to match from src= up to the space, but store only the address in the result. Below is what I use, but it sucks really badly. Any help would be appreciated.
#example text
$destination="src=192.168.96.112 dst=192.168.5.22"
$destination -match 'src=[^\s]+'
$result = $matches.Values
#turn it into string since trim doesn’t work
$result=echo $result
$result=$result.trim("src=")
You can use a lookbehind here, and since -match only returns the first match, you will be able to access the matched value using $matches[0]:
$destination -match '(?<=src=)\S+' | Out-Null
$matches[0]
# => 192.168.96.112
Pattern details:
(?<=src=) - matches a location immediately preceded with src=
\S+ - one or more non-whitespace chars.
To extract all these values, use
Select-String '(?<=src=)\S+' -input $destination -AllMatches | Foreach {$_.Matches} | Foreach-Object {$_.Value}
or
Select-String '(?<=src=)\S+' -input $destination -AllMatches | % {$_.Matches} | % {$_.Value}
Another way could be using a capturing group:
src=(\S+)
For example
$destination="src=192.168.96.112 dst=192.168.5.22"
$pattern = 'src=(\S+)'
Select-String $pattern -input $destination -AllMatches | Foreach-Object {$_.Matches} | Foreach-Object {$_.Groups[1].Value}
Output
192.168.96.112
Or, to be a bit more specific, match the digits and dots of an IPv4-style address:
src=(\d{1,3}(?:\.\d{1,3}){3})
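A minimal sketch of using this more specific pattern with the -match operator follows. The [ipaddress] cast is an extra validation step added here as an assumption, not part of the answers above, and the variable names are illustrative.

```powershell
$destination = 'src=192.168.96.112 dst=192.168.5.22'

# Group 1 holds the dotted-quad text after src=; group 0 would be the whole match.
if ($destination -match 'src=(\d{1,3}(?:\.\d{1,3}){3})') {
    $srcText = $Matches[1]
}

# Assumption: casting to [ipaddress] serves as an additional validity check,
# since the regex alone would also accept out-of-range values such as 999.1.1.1.
$srcIp = [ipaddress]$srcText

$srcText
$srcIp.AddressFamily
```

The capture group keeps src= in the match but out of the stored value, which is what the question asked for.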

Powershell capture multiple values?

The following code returns only one match.
$s = 'x.a,
x.b,
x.c
'
$s -match 'x\.(.*?)[,$]'
$Matches.Count # return 2
$Matches[1] # returns a only
Expected to return a, b, c.
The -match operator only finds the first match. Select-String with -AllMatches fetches all matches in the input. Also, [,$] matches a literal , or $ character: inside a character class, $ is not an end-of-string/line anchor.
A possible solution may look like
Select-String 'x\.([^,]+)' -input $s -AllMatches | Foreach {$_.Matches} | Foreach-Object {$_.Groups[1].Value}
The pattern is x\.([^,]+), it matches x. and then captures into Group 1 any one or more chars other than ,.
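Another option is the .NET [regex]::Matches method, which also returns every match. This is a minimal sketch, assuming the same $s as in the question; [^,\s]+ is a variation on the pattern above that also stops at whitespace, so the line break after x.c is not captured.

```powershell
$s = 'x.a,
x.b,
x.c
'

# [regex]::Matches returns every match, unlike -match, which stops at the first.
$values = [regex]::Matches($s, 'x\.([^,\s]+)') |
    ForEach-Object { $_.Groups[1].Value }

$values -join ', '
```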

Select-string apply to multiple lines of input

I have a file like:
abc WANT THIS
def NOT THIS
ghijk WANT THIS
lmno DO NOT LIKE
pqr WANT THIS
...
From which I want to extract:
abc
ghijk
pqr
When I apply the following:
(Select-String -Path $infile -Pattern "([^ ]+) WANT THIS").Matches.Groups[1].Value >$outfile
It only returns the match for the first line:
abc
(adding -AllMatches did not change the behaviour)
You may use
Select-String -path $infile -Pattern '^\s*(\S+) WANT THIS' -AllMatches | Foreach-Object {$_.Matches} | Foreach-Object {$_.Groups[1].Value} > $outfile
The ^\s*(\S+) WANT THIS pattern will match
^ - start of a line
\s* - 0+ whitespaces
(\S+) - Group 1: one or more non-whitespace chars
WANT THIS - a literal substring.
Select-String emits one match object per matching line, and -AllMatches collects every match within each line. Iterate over the matches with Foreach-Object {$_.Matches}, access the Group 1 values with Foreach-Object {$_.Groups[1].Value}, and then save the results to the output file.
Re-reading the code, it's matching them all, but only writing the value of the first match (doh!):
(Select-String -Path $scriptfile -Pattern "([^ ]+) WANT THIS").Matches.Groups.Value >$tmpfile
OTOH, it appears that the groups in the output object also contain the whole matched text, not just the captured part (Groups[0] is always the entire match).
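The difference between the whole match and the captured group can be seen in a small sketch. It pipes in-memory sample lines (taken from the question) instead of reading $infile, and the variable names are illustrative.

```powershell
$lines = 'abc WANT THIS', 'def NOT THIS', 'ghijk WANT THIS', 'lmno DO NOT LIKE', 'pqr WANT THIS'

# One match object is emitted per matching line.
$found = $lines | Select-String -Pattern '(\S+) WANT THIS'

# Groups[0] is the whole matched text; Groups[1] is the first capture.
$whole    = $found | ForEach-Object { $_.Matches[0].Groups[0].Value }
$captured = $found | ForEach-Object { $_.Matches[0].Groups[1].Value }

$whole
$captured
```

Indexing Groups[1] per match, rather than indexing the flattened collection once, is what yields one value per matching line.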

How to modify this regex to work in Powershell

So I have this regex https://regex101.com/r/xG8oX2/2 which gives me the matches I want.
But when I run this powershell script, it give me no matches. What should I modify in this regex to be able to get the same matches in powershell?
$pattern2 = '\d{4}\/\d{2}\/\d{2}.*]\s(?<reportHash>.*):.*Start.*\r*\n*.*\n.*ReportLayoutID=(\d{1,7})';
$reportLayoutIDList = Get-Content -Path bigOptions.txt | Out-String |
Select-String -Pattern $pattern2 -AllMatches |
Select-Object -ExpandProperty Matches |
Select-Object @{n="ReportHash";e={$_.Groups["reportHash"]}},
@{n="LayoutID";e={$_.Groups["reportLayoutID"]}};
$reportLayoutIDList |
Export-csv reportLayoutIDList.csv;
The problem is your line breaks. In Windows, line breaks are CRLF (\r\n), while in UNIX etc. they're just LF (\n).
So either you need to modify the input to only use LF or you need to replace \n with \r\n in your regex.
As @briantist mentioned, using \r?\n will match either way.
Thank you to both Frode F and briantist.
This is the regex pattern that worked in Powershell:
$pattern2 = '\d{4}\/\d{2}\/\d{2}.*]\s(?<reportHash>.*):.*Start.*[\r?\n]*.*[\r?\n].*ReportLayoutID=(?<reportLayoutID>\d+)';
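Note that [\r?\n] is a character class: it matches a single \r, a literal ?, or \n, which is why it happens to work here, whereas \r?\n means "an optional CR followed by LF". The sketch below shows the difference using made-up two-line strings rather than the real log file; all variable names are illustrative.

```powershell
$crlf = "Start`r`nReportLayoutID=42"   # Windows-style line break
$lf   = "Start`nReportLayoutID=42"     # Unix-style line break

# \r?\n accepts both kinds of line break.
$pattern  = 'Start\r?\nReportLayoutID=(?<reportLayoutID>\d+)'
$crlfOk   = $crlf -match $pattern
$lfOk     = $lf   -match $pattern
$layoutId = $Matches['reportLayoutID']

# Inside [...] the ? is a literal character, not a quantifier.
$questionMarkOk = 'a?b' -match 'a[\r?\n]b'

$crlfOk, $lfOk, $layoutId, $questionMarkOk
```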

Find multiple lines spanning text and replace using PowerShell

I am using a regular expression search to match up and replace some text. The text can span multiple lines (may or may not have line breaks).
Currently I have this:
$regex = "\<\?php eval.*?\>"
Get-ChildItem -exclude *.bak | Where-Object {$_.Attributes -ne "Directory"} |ForEach-Object {
$text = [string]::Join("`n", (Get-Content $_))
$text -replace $RegEx ,"REPLACED"}
Try this:
$regex = New-Object Text.RegularExpressions.Regex "\<\?php eval.*?\>", ('singleline', 'multiline')
Get-ChildItem -exclude *.bak |
Where-Object {!$_.PsIsContainer} |
ForEach-Object {
$text = (Get-Content $_.FullName) -join "`n"
$regex.Replace($text, "REPLACED")
}
A regular expression is explicitly created via New-Object so that options can be passed in.
Try changing your regex pattern to:
"(?s)\<\?php eval.*?\>"
to get singleline (dot matches any char including line terminators). Since you aren't using the ^ or $ metacharacters I don't think you need to specify multiline (^ & $ match embedded line terminators).
Update: It seems that -replace makes sure the regex is case-insensitive so the i option isn't needed.
Alternatively, use the (.|\n)+ expression to cross line boundaries, since . doesn't match newlines by default.
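The approaches in these answers can be compared on a small in-memory string. This is a sketch only: the sample text is invented, and the (.|\n) variant is made lazy with +? here so that it stops at the first closing >.

```powershell
$text = "before <?php eval(`n'code'`n)?> after"

# Without (?s), . stops at the newline, so nothing is replaced.
$plain = $text -replace '\<\?php eval.*?\>', 'REPLACED'

# With (?s), . also matches newlines and the whole block is replaced.
$single = $text -replace '(?s)\<\?php eval.*?\>', 'REPLACED'

# The (.|\n) alternative crosses line boundaries as well.
$alt = $text -replace '\<\?php eval(.|\n)+?\>', 'REPLACED'

$plain -ceq $text
$single
$alt
```

The (?s) form is usually the simpler of the two, since it leaves the rest of the pattern unchanged.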