Powershell - Regular Expression Multiple Matches - regex

Maybe my reasoning is faulty, but I can't get this working.
Here's my regex: (Device\s#\d(\n.*)*?(?=\n\s*Device\s#|\Z))
Try it: http://regex101.com/r/jQ6uC8/6
$getdevice is the input string. I'm getting this string from the Stream/Output from a command line tool.
$dstate = $getdevice |
select-string -pattern '(Device\s#\d(\n.*)*?(?=\n\s*SSD\s+|\Z))' -AllMatches |
% { $_ -match '(Device\s#\d(\n.*)*?(?=\n\s*SSD\s+|\Z))' > $null; $matches[0] }
Write-Host $dstate
Output:
Device #0 Device #1 Device #2 Device #3 Device #4
Same output for the $matches[1], $matches[2] is empty.
Is there a way I can get all matches, like on regex101.com? I'm trying to split the Output/String into separate variables (one for Device0, one for Device1, Device2, and so on).
Update: Here's the Output from the command line tool: http://pastebin.com/BaywGtFE

I used your sample data in a here-string for my testing. This should work although it can depend on where your sample data comes from.
Using powershell 3.0 I have the following
$getdevice |
select-string -pattern '(?smi)(Device\s#\d+?(.*?)*?(?=Device\s#|\Z))' -AllMatches |
ForEach-Object {$_.Matches} |
ForEach-Object {$_.Value}
or, if your PowerShell version supports it...
($getdevice | select-string -pattern '(?smi)(Device\s#\d+?(.*?)*?(?=Device\s#|\Z))' -AllMatches).Matches.Value
Which returns 4 objects with their device IDs. I don't know if you wanted those or not, but the regex can be modified with lookarounds if you don't need them. I also updated the regex to account for device IDs with more than one digit, in case that happens.
The modifiers that I used:
s modifier: single line. Dot matches newline characters
m modifier: multi-line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
Another, shorter regex pattern that works the same way:
'(?smi)(Device\s#).*?(?=Device\s#|\Z)'
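As a minimal sketch of the same idea with [regex]::Matches(): the sample text below is made up (it only stands in for the tool's real output), and $sample and $blocks are names chosen for illustration.

```powershell
# Hypothetical sample text standing in for the command line tool's output.
$sample = "Header line`nDevice #0`n   State : Online`nDevice #1`n   State : Offline`n"

# (?s) lets '.' cross newlines; the lookahead stops each match
# right before the next "Device #" (or at the end of the string).
$blocks = [regex]::Matches($sample, '(?s)Device\s#\d+.*?(?=Device\s#|\z)') |
    ForEach-Object { $_.Value.Trim() }

$blocks.Count   # 2
$blocks[0]      # first device block, ready to assign to its own variable
```

Each element of $blocks is one device's text, so $blocks[0], $blocks[1], and so on can be used instead of separate variables.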

With your existing regex, to get a list of all matches in a string, use one of these options:
Option 1
$regex = [regex] '(Device\s#\d(\n.*)*?(?=\n\s*Device\s#|\Z))'
$allmatches = $regex.Matches($yourString);
if ($allmatches.Count -gt 0) {
# Get the individual matches with $allmatches[0], $allmatches[1], ...
} else {
# Nah, no match
}
Option 2
$resultlist = new-object System.Collections.Specialized.StringCollection
$regex = [regex] '(Device\s#\d(\n.*)*?(?=\n\s*Device\s#|\Z))'
$match = $regex.Match($yourString)
while ($match.Success) {
$resultlist.Add($match.Value) | out-null
$match = $match.NextMatch()
}
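For what it's worth, here is a small sketch of reading the results back out of the MatchCollection from Option 1. The input string and the variable names ($text, $found, $first, $all) are assumptions made for the example only.

```powershell
# Hypothetical one-line input; the real string would come from the tool.
$text  = 'Device #0 ok; Device #1 failed'
$found = [regex]::Matches($text, 'Device\s#\d')

if ($found.Count -gt 0) {
    # A MatchCollection can be indexed like an array...
    $first = $found[0].Value
    # ...or enumerated; each Match also carries .Index and .Length.
    $all = foreach ($m in $found) { $m.Value }
}
```

Note that comparisons in PowerShell use -gt, -lt, -eq and friends; a bare > is the output redirection operator.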

While it doesn't exactly answer your question, I'll offer a slightly different approach:
($getdevice) -split '\s+(?=Device #\d)' | select -Skip 1
Just for fun,
$drives =
($getdevice) -split '\s+(?=Device #\d)' |
select -Skip 1 |
foreach { $Stringdata =
$_.replace(' : ','=') -replace 'Device #(\d)','Device = $1' -Replace 'Device is a (\w+)','DeviceIs = $1'
New-Object PSObject -Property $(ConvertFrom-StringData $Stringdata)
}
$drives | select Device,DeviceIs,'Total Size'
Device DeviceIs Total Size
------ -------- ----------
0 Hard drive 70007 MB
1 Hard drive 70007 MB
2 Hard drive 286102 MB
3 Hard drive 286102 MB

try this variant:
[regex]::Matches($data,'(?im)device #\d((?!\s*Device #\d)\r?\n.)*?') | select value
Value
-----
Device #0
Device #1
Device #2
Device #3
Device #4

Related

Extract string from text file via Powershell

I have been trying to extract certain values from multiple lines inside a .txt file with PowerShell.
Host
Class
INCLUDE vmware:/?filter=Displayname Equal "server01" OR Displayname Equal "server02" OR Displayname Equal "server03 test"
This is what I want :
server01
server02
server03 test
I have code so far :
$Regex = [Regex]::new("(?<=Equal)(.*)(?=OR")
$Match = $Regex.Match($String)
You may use
[regex]::matches($String, '(?<=Equal\s*")[^"]+')
See the regex demo.
See more ways to extract multiple matches here. However, your main problem is the regex pattern. The (?<=Equal\s*")[^"]+ pattern matches:
(?<=Equal\s*") - a location preceded with Equal and 0+ whitespaces and then a "
[^"]+ - consumes 1+ chars other than double quotation mark.
Demo:
$String = "Host`nClass`nINCLUDE vmware:/?filter=Displayname Equal ""server01"" OR Displayname Equal ""server02"" OR Displayname Equal ""server03 test"""
[regex]::matches($String, '(?<=Equal\s*")[^"]+') | Foreach {$_.Value}
Output:
server01
server02
server03 test
Here is a full snippet reading the file in, getting all matches and saving to file:
$newfile = 'file.txt'
$file = 'newtext.txt'
$regex = '(?<=Equal\s*")[^"]+'
Get-Content $file |
Select-String $regex -AllMatches |
Select-Object -Expand Matches |
ForEach-Object { $_.Value } |
Set-Content $newfile
Another option (PSv3+), combining [regex]::Matches() with the -replace operator for a concise solution:
$str = @'
Host
Class
INCLUDE vmware:/?filter=Displayname Equal "server01" OR Displayname Equal "server02" OR Displayname Equal "server03 test"
'@
[regex]::Matches($str, '".*?"').Value -replace '"'
Regex ".*?" matches all "..."-enclosed tokens; .Value extracts them, and -replace '"' strips the " chars.
It may not be obvious, but this happens to be the fastest solution among the answers here, based on my tests - see bottom.
As an aside: The above would be even more PowerShell-idiomatic if the -match operator - which only looks for a (one) match - had a variant named, say, -matchall, so that one could write:
# WISHFUL THINKING (as of PowerShell Core 6.2)
$str -matchall '".*?"' -replace '"'
See this feature suggestion on GitHub.
Optional reading: performance comparison
Pragmatically speaking, all solutions here are helpful and may be fast enough, but there may be situations where performance must be optimized.
Generally, using Select-String (and the pipeline in general) comes with a performance penalty - while offering elegance and memory-efficient streaming processing.
Also, repeated invocation of script blocks (e.g., { $_.Value }) tends to be slow - especially in a pipeline with ForEach-Object or Where-Object, but also - to a lesser degree - with the .ForEach() and .Where() collection methods (PSv4+).
In the realm of regexes, you pay a performance penalty for variable-length look-behind expressions (e.g. (?<=EQUAL\s*")) and the use of capture groups (e.g., (.*?)).
Here is a performance comparison using the Time-Command function, averaging 1000 runs:
Time-Command -Count 1e3 { [regex]::Matches($str, '".*?"').Value -replace '"' },
{ [regex]::matches($String, '(?<=Equal\s*")[^"]+') | Foreach {$_.Value} },
{ [regex]::Matches($str, '\"(.*?)\"').Groups.Where({$_.name -eq '1'}).Value },
{ $str | Select-String -Pattern '(?<=Equal\s*")[^"]+' -AllMatches | ForEach-Object{$_.Matches.Value} } |
Format-Table Factor, Command
Sample timings from my MacBook Pro; the exact times aren't important (you can remove the Format-Table call to see them), but the relative performance is reflected in the Factor column, from fastest to slowest.
Factor Command
------ -------
1.00 [regex]::Matches($str, '".*?"').Value -replace '"' # this answer
2.85 [regex]::Matches($str, '\"(.*?)\"').Groups.Where({$_.name -eq '1'}).Value # AdminOfThings'
6.07 [regex]::matches($String, '(?<=Equal\s*")[^"]+') | Foreach {$_.Value} # Wiktor's
8.35 $str | Select-String -Pattern '(?<=Equal\s*")[^"]+' -AllMatches | ForEach-Object{$_.Matches.Value} # LotPings'
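Time-Command is not a built-in cmdlet, so as a rough sketch the same kind of comparison can be approximated with the built-in Measure-Command. The sample string, the run count and the variable names below are assumptions for illustration, and the absolute numbers will vary by machine.

```powershell
# Hypothetical one-line sample; the real input is the multi-line $str above.
$str  = 'Displayname Equal "server01" OR Displayname Equal "server02"'
$runs = 1000

# [regex]::Matches() plus -replace, no pipeline involved.
$fast = Measure-Command {
    foreach ($i in 1..$runs) {
        $null = [regex]::Matches($str, '".*?"').Value -replace '"'
    }
}

# Select-String with a look-behind, going through the pipeline.
$slow = Measure-Command {
    foreach ($i in 1..$runs) {
        $null = $str |
            Select-String -Pattern '(?<=Equal\s*")[^"]+' -AllMatches |
            ForEach-Object { $_.Matches.Value }
    }
}

'{0:N1} ms vs. {1:N1} ms' -f $fast.TotalMilliseconds, $slow.TotalMilliseconds
```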
You can modify your regex to use a capture group, which is indicated by the parentheses. The backslashes just escape the quotes. This allows you to just capture what you are looking for and then filter it further. The capture group here is automatically named 1 since I didn't provide a name. Capture group 0 is the entire match including quotes. I switched to the Matches method because that encompasses all matches for the string whereas Match only captures the first match.
$regex = [regex]'\"(.*?)\"'
$regex.matches($string).groups.where{$_.name -eq 1}.value
If you want to export the results, you can do the following:
$regex = [regex]'\"(.*?)\"'
$regex.matches($string).groups.where{$_.name -eq 1}.value | sc "c:\temp\export.txt"
An alternative that reads the file directly with Select-String, using Wiktor's good RegEx:
Select-String -Path .\file.txt -Pattern '(?<=Equal\s*")[^"]+' -AllMatches|
ForEach-Object{$_.Matches.Value} | Set-Content NewFile.txt
Sample output:
> Get-Content .\NewFile.txt
server01
server02
server03 test

Powershell RegEx Match Index is always the same

I'm using the following code snippet to search for several system names in a text file and save them in an array.
Now, I need to save the position of the matches but just get always only the position of the first match.
$pattern_sysname = '(?<=Computername).+?($)'
Get-Content $path | Foreach {if ([Regex]::IsMatch($_, $pattern_sysname)) {
$arr_sysname += [Regex]::Match($_, $pattern_sysname)
}
}
$arr_sysname.index
I need the position of every single match.
See this demo:
#demo data
@'
Computername12
This is Computername1
ComputernameABC
NotMatched
'@ | out-file regex.test
$pattern_sysname = '(?<=Computername).+?$'
Select-String -Path regex.test -Pattern $pattern_sysname -AllMatches |
select LineNumber,@{N='OffsetInLine';E={$_.Matches[0].Index}}
Result:
LineNumber OffsetInLine
---------- ------------
1 12
2 20
3 12
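The demo only reports Matches[0] for each line. If a line could contain more than one match, a sketch along these lines emits one object per match instead. The in-memory $lines array is a stand-in for the file, and $positions is a name picked for the example; [pscustomobject] needs PowerShell 3.0 or later.

```powershell
# Hypothetical in-memory lines instead of a file on disk.
$lines   = 'Computername12', 'This is Computername1', 'NotMatched'
$pattern = '(?<=Computername).+?$'

$positions = $lines | Select-String -Pattern $pattern -AllMatches | ForEach-Object {
    $lineNumber = $_.LineNumber
    # Walk every match on the line, not just the first one.
    foreach ($m in $_.Matches) {
        [pscustomobject]@{
            LineNumber   = $lineNumber
            OffsetInLine = $m.Index
            Value        = $m.Value
        }
    }
}

$positions
```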

Capturing multiple points of data from a text file using Regex in powershell

So I got this regex expression to work in Regex101 and it captures exactly what I want to capture. https://regex101.com/r/aJ1bZ4/3
But when I try the same thing in powershell all I get is the first set of matches. I've tried using the (?s:), the (?m:) but none of these modifiers seem to do the job. Here is my powershell script.
$reportTitleList = type ReportExecution.log | Out-String |
where {$_ -match "(?<date>\d{4}\/\d{2}\/\d{2}).*ID=(?<reportID>.*):.*Started.*Title=(?<reportName>.*)\[.*\n.*Begin ....... (?<reportHash>.*)"} |
foreach {
new-object PSObject -prop @{
Date=$matches['date']
ReportID=$matches['reportID']
ReportName=$matches['reportName']
ReportHash=$matches['reportHash']
}
}
$reportTitleList > reportTitleList.txt
What am I doing wrong? Why am I not getting all the matches as the regex101 example?
-match only finds the first match. To use a global search you need to use [regex]::Matches() or Select-String with the -AllMatches switch. Ex:
#In PowerShell 3.0+ you can replace `Get-Content | Out-String` with `Get-Content -Raw`
$reportlist = Get-Content -Path ReportExecution.log | Out-String |
Select-String -Pattern $pattern -AllMatches |
Select-Object -ExpandProperty Matches |
Select-Object @{n="Date";e={$_.Groups["date"]}},
@{n="ReportID";e={$_.Groups["reportID"]}},
@{n="ReportName";e={$_.Groups["reportName"]}},
@{n="ReportHash";e={$_.Groups["reportHash"]}}
#Show output
$reportlist
Output:
Date ReportID ReportName ReportHash
---- -------- ---------- ----------
2015/03/23 578 Calendar Day Activity/Calendar Day Activity 38C19F4E790446709B8C7A32FF97BC...
2015/03/23 861 Program Format Report/Program Format Report 3C9CB2150AF14B15A1B361729C007B...
2015/03/23 1077 Multi-Station Program Availability/Multi-Station Program Availability 52526430EE4E401BA4376B38A2D88B...
2015/03/23 1299 Program Audit Trail/Program Audit Trail FDD1B7D9F34E46549A377A17B9A7A1...
2015/03/23 1541 Program Availability/Program Availability 843B44F4475C4950A7784C8961B642...
2015/03/23 1756 Program Description Export/Program Description Export E5800A76C68E4D5281B8D680DB2E93...
-match returns as soon as it finds a match (they should have a -matches operator right?). If you want multiple matches, use:
$mymatches = [regex]::Matches($text, $pattern) # $text holds the string to search; $input is an automatic variable and is best avoided
The output will be different from that of -match, however, and you'll have to massage it a bit, something like this (see here for another example of conversion):
$mymatches | ForEach-Object { if ($_.Success) { $_.Value } }
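As a sketch of that "massaging" step, named groups can be read off each Match and turned into objects. The log text and pattern below are simplified, made-up stand-ins (the real ReportExecution.log format is not shown here), and [pscustomobject] assumes PowerShell 3.0+.

```powershell
# Hypothetical, simplified log text and pattern for illustration only.
$log     = "2015/03/23 ID=578 Title=Calendar`n2015/03/24 ID=861 Title=Format`n"
$pattern = '(?m)^(?<date>\d{4}/\d{2}/\d{2}) ID=(?<id>\d+) Title=(?<title>.+)$'

# One object per match; each named group becomes a property.
$reports = foreach ($m in [regex]::Matches($log, $pattern)) {
    [pscustomobject]@{
        Date     = $m.Groups['date'].Value
        ReportID = $m.Groups['id'].Value
        Title    = $m.Groups['title'].Value
    }
}

$reports
```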

Regex to match only words without _ or -

I am trying to extract word out of a text file which contains exactly one word per each line. But I only want to match the word if there are no "_"(underscore) or "-" (dash) in the word:
File might look like :
< someword
< SomeOtherword
< wordwith-dash-anotherd
< wordwith_under_anotheru
I only want to extract line 1 & 2 and ignore line 3 & 4
(i.e. result when regex match each line should be: someword SomeOtherword without "<" and space for each line)
I have been trying with "[\w-]+" which matches words with both _ & -
I am using PowerShell regex engine.
I am processing a file with close to 100000 lines. I don't want to loop through each line as need the processing time to be very quick. code I am using:
$rx = '[\w-]+'
Get-Content $filename | Select-String -Pattern $rx -AllMatches | select -ExpandProperty Matches | select -ExpandProperty Value | out-file $outputfile
If you are performance sensitive, this approach is measurably faster (2.6 secs vs. 80 millisecs):
(Select-String '^[a-zA-Z]+$' file.txt -AllMatches).Matches.Value
This does require a feature that is new to PowerShell v3. You don't say which version you are using.
To do a regex match in powershell you can use either -match operator or select-string. There is also a -notmatch operator and a -NotMatch flag for select-string. Both filter for the absence of a match.
So one option is
gc 'file.txt' | where { $_ -notmatch '-|_' } | foreach { $_.Trim('<', ' ') }
and another is
gc 'file.txt' | select-string -NotMatch '-|_' | foreach { $_.Line.Trim('<', ' ') }
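Another possibility, sketched here under the assumption that the whole file fits in memory as one string: let a single anchored regex do both the filtering and the extraction, so no per-line script block runs. The $text value below is built from the sample lines in the question; $words is a name chosen for the example.

```powershell
# Sample lines from the question, joined into one string (as Get-Content -Raw would give).
$text = "< someword`n< SomeOtherword`n< wordwith-dash-anotherd`n< wordwith_under_anotheru"

# (?m) makes ^ and $ work per line; only lines consisting of "<", optional
# blanks and pure letters match, and group 1 holds the word itself.
$words = [regex]::Matches($text, '(?m)^<[ \t]*([A-Za-z]+)\r?$') |
    ForEach-Object { $_.Groups[1].Value }

$words   # someword, SomeOtherword
```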

Multiline regex to match config block

I am having some issues trying to match a certain config block (multiple ones) from a file. Below is the block that I'm trying to extract from the config file:
ap71xx 00-01-23-45-67-89
use profile PROFILE
use rf-domain DOMAIN
hostname ACCESSPOINT
area inside
!
There are multiple ones just like this, each with a different MAC address. How do I match a config block across multiple lines?
The first problem you may run into is that in order to match across multiple lines, you need to process the file's contents as a single string rather than by individual line. For example, if you use Get-Content to read the contents of the file then by default it will give you an array of strings - one element for each line. To match across lines you want the file in a single string (and hope the file isn't too huge). You can do this like so:
$fileContent = [io.file]::ReadAllText("C:\file.txt")
Or in PowerShell 3.0 you can use Get-Content with the -Raw parameter:
$fileContent = Get-Content c:\file.txt -Raw
Then you need to specify a regex option to match across line terminators i.e.
SingleLine mode (. matches any char including line feed), as well as
Multiline mode (^ and $ match embedded line terminators), e.g.
(?smi) - note the "i" is to ignore case
e.g.:
C:\> $fileContent | Select-String '(?smi)([0-9a-f]{2}(-|\s*$)){6}.*?!' -AllMatches |
Foreach {$_.Matches} | Foreach {$_.Value}
00-01-23-45-67-89
use profile PROFILE
use rf-domain DOMAIN
hostname ACCESSPOINT
area inside
!
00-01-23-45-67-89
use profile PROFILE
use rf-domain DOMAIN
hostname ACCESSPOINT
area inside
!
Use the Select-String cmdlet to do the search because you can specify -AllMatches and it will output all matches whereas the -match operator stops after the first match. Makes sense because it is a Boolean operator that just needs to determine if there is a match.
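To see what the two modes actually change, here is a tiny sketch on a made-up two-line string (the variable names are only for illustration):

```powershell
$text = "first`nsecond"

# Without (?s), '.' does not match the newline between the two words.
$dotDefault    = $text -match 'first.second'      # False
$dotSingleLine = $text -match '(?s)first.second'  # True

# Without (?m), '^' only matches at the very start of the string.
$caretDefault   = $text -match '^second'          # False
$caretMultiline = $text -match '(?m)^second'      # True
```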
In case this may still be of value to someone and depending on the actual requirement, the regex in Keith's answer doesn't need to be that complicated. If the user simply wants to output each block the following will suffice:
$fileContent = [io.file]::ReadAllText("c:\file.txt")
$fileContent |
Select-String '(?smi)ap71xx[^!]+!' -AllMatches |
%{ $_.Matches } |
%{ $_.Value }
The regex ap71xx[^!]+! will perform better, and using .* in a regular expression is best avoided because it can produce unexpected results. The pattern [^!]+! matches one or more characters other than an exclamation mark, followed by an exclamation mark.
If the start of the block isn't required in the output, the updated script is:
$fileContent |
Select-String '(?smi)ap71xx([^!]+!)' -AllMatches |
%{ $_.Matches } |
%{ $_.Groups[1] } |
%{ $_.Value }
Groups[0] contains the whole matched string, Groups[1] will contain the string match within the parentheses in the regex.
If $fileContent isn't required for any further processing, the variable can be eliminated:
[io.file]::ReadAllText("c:\file.txt") |
Select-String '(?smi)ap71xx([^!]+!)' -AllMatches |
%{ $_.Matches } |
%{ $_.Groups[1] } |
%{ $_.Value }
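A short sketch of the Groups[0] versus Groups[1] distinction, using a made-up two-block config string instead of the file (the second MAC address and profile name are invented for the example):

```powershell
# Hypothetical config with two blocks.
$config = "ap71xx 00-01-23-45-67-89`n use profile PROFILE`n!`n" +
          "ap71xx 00-01-23-45-67-8A`n use profile OTHER`n!`n"

$found = [regex]::Matches($config, 'ap71xx([^!]+!)')

$found.Count                          # 2
$whole = $found[0].Groups[0].Value    # entire match, starts with "ap71xx"
$inner = $found[0].Groups[1].Value    # only the part inside the parentheses
```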
This regex will search for the text ap followed by any number of characters and new lines ending with a !:
(?si)(ap).+?\!{1}
So I was a little bored. I wrote a script that will break up the text file as you described (as long as it only contains the lines you displayed). It might work with other random lines, as long as they don't contain the key words: ap, profile, domain, hostname, or area. It will import them, and check line by line for each of the properties (MAC, Profile, domain, hostname, area) and place them into an object that can be used later. I know this isn't what you asked for, but since I spent time working on it, hopefully it can be used for some good. Here is the script if anyone is interested. It will need to be tweaked to your specific needs:
$Lines = Get-Content "c:\test\test.txt"
$varObjs = @()
for ($num = 0; $num -lt $lines.Count; $num =$varLast ) {
#Checks to make sure the line isn't blank or a !. If it is, it skips to next line
if ($Lines[$num] -match "!") {
$varLast++
continue
}
if (([regex]::Match($Lines[$num],"^\s.*$")).success) {
$varLast++
continue
}
$Index = [array]::IndexOf($lines, $lines[$num])
$b=0
$varObj = New-Object System.Object
while ($Lines[$num + $b] -notmatch "!" ) {
#Checks line by line to see what it matches, adds to the $varObj when it finds what it wants.
if ($Lines[$num + $b] -match "ap") { $varObj | Add-Member -MemberType NoteProperty -Name Mac -Value $([regex]::Split($lines[$num + $b],"\s"))[1] }
if ($lines[$num + $b] -match "profile") { $varObj | Add-Member -MemberType NoteProperty -Name Profile -Value $([regex]::Split($lines[$num + $b],"\s"))[3] }
if ($Lines[$num + $b] -match "domain") { $varObj | Add-Member -MemberType NoteProperty -Name rf-domain -Value $([regex]::Split($lines[$num + $b],"\s"))[3] }
if ($Lines[$num + $b] -match "hostname") { $varObj | Add-Member -MemberType NoteProperty -Name hostname -Value $([regex]::Split($lines[$num + $b],"\s"))[2] }
if ($Lines[$num + $b] -match "area") { $varObj | Add-Member -MemberType NoteProperty -Name area -Value $([regex]::Split($lines[$num + $b],"\s"))[2] }
$b ++
} #end While
#Adds the $varObj to $varObjs for future use
$varObjs += $varObj
$varLast = ($b + $Index) + 2
}#End for ($num = 0; $num -lt $lines.Count; $num = $varLast)
#displays the $varObjs
$varObjs
To me, a very clean and simple approach is to use a multiline regex block with named captures, like this:
# Based on this text configuration:
$configurationText = @"
ap71xx 00-01-23-45-67-89
use profile PROFILE
use rf-domain DOMAIN
hostname ACCESSPOINT
area inside
!
"@
# We can build a multiline regex block with the strings to be captured.
# Here I am using the regex '.*?', which roughly means 'capture anything, as little as possible'.
# A more specific regex can be defined for each field to capture.
# ( ) in the regex is for defining a group
# ?<> is for naming a group
$regex = @"
(?<userId>.*?) (?<userCode>.*?)
use profile (?<userProfile>.*?)
use rf-domain (?<userDomain>.*?)
hostname (?<hostname>.*?)
area (?<area>.*?)
!
"@
# Lets see if this matches !
if($configurationText -match $regex)
{
# it does !
Write-Host "Config text is successfully matched, here are the matches:"
$Matches
}
else
{
Write-Host "Config text could not be matched."
}
This script outputs the following:
PS C:\Users\xdelecroix> C:\FusionInvest\powershell\regex-capture-multiline-stackoverflow.ps1
Config text is successfully matched, here are the matches:
Name Value
---- -----
hostname ACCESSPOINT
userProfile PROFILE
userCode 00-01-23-45-67-89
area inside
userId ap71xx
userDomain DOMAIN
0 ap71xx 00-01-23-45-67-89...
For more flexibility, you can use Select-String instead of -match, but this is not really important here, in the context of this sample.
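If the file holds several blocks, -match will only ever report the first one. As a rough sketch (not the answer's original script), the same named-capture idea can be combined with [regex]::Matches() to get one object per block. The second block, the group names and the \r?\n joins (to tolerate both LF and CRLF line endings) are assumptions made for the example.

```powershell
# Hypothetical text with two config blocks.
$configurationText = "ap71xx 00-01-23-45-67-89`n use profile PROFILE`n use rf-domain DOMAIN`n hostname ACCESSPOINT`n area inside`n!`n" +
                     "ap71xx 00-01-23-45-67-8A`n use profile P2`n use rf-domain D2`n hostname AP2`n area outside`n!`n"

# One regex fragment per config line, joined by a line-ending pattern.
$pattern = @(
    '(?m)^(?<model>\S+) (?<mac>\S+)'
    '\s*use profile (?<profile>\S+)'
    '\s*use rf-domain (?<domain>\S+)'
    '\s*hostname (?<hostname>\S+)'
    '\s*area (?<area>\S+)'
    '!'
) -join '\r?\n'

# One object per matched block.
$aps = foreach ($m in [regex]::Matches($configurationText, $pattern)) {
    [pscustomobject]@{
        Mac      = $m.Groups['mac'].Value
        Profile  = $m.Groups['profile'].Value
        Domain   = $m.Groups['domain'].Value
        Hostname = $m.Groups['hostname'].Value
        Area     = $m.Groups['area'].Value
    }
}

$aps | Format-Table
```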
Here's my take. If you don't need the regex, you can use -like or .contains(). The question never says what the search pattern is. Here's an example with a windows text file.
$file = (get-content -raw file.txt) -replace "`r" # avoid the line ending issue
$pattern = 'two
three
f.*' -replace "`r"
# just showing what they really are
$file -replace "`r",'\r' -replace "`n",'\n'
$pattern -replace "`r",'\r' -replace "`n",'\n'
$file -match $pattern
$file | select-string $pattern -quiet