Select-string apply to multiple lines of input - regex

I have a file like:
abc WANT THIS
def NOT THIS
ghijk WANT THIS
lmno DO NOT LIKE
pqr WANT THIS
...
From which I want to extract:
abc
ghijk
pqr
When I apply the following:
(Select-String -Path $infile -Pattern "([^ ]+) WANT THIS").Matches.Groups[1].Value >$outfile
It only returns the match for the first line:
abc
(adding -AllMatches did not change the behaviour)

You may use
Select-String -path $infile -Pattern '^\s*(\S+) WANT THIS' -AllMatches | Foreach-Object {$_.Matches} | Foreach-Object {$_.Groups[1].Value} > $outfile
The ^\s*(\S+) WANT THIS pattern will match
^ - start of a line
\s* - zero or more whitespace chars
(\S+) - Group 1: one or more non-whitespace chars
WANT THIS - a literal substring.
Now, -AllMatches makes Select-String return every match on each line rather than only the first. You then need to iterate over the matches with Foreach-Object {$_.Matches}, read the Group 1 value of each one with Foreach-Object {$_.Groups[1].Value}, and save the results to the output file.
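Here is a minimal, self-contained sketch of the same pipeline. It assumes the file contents are supplied as an in-memory array of strings; the $lines variable is hypothetical and stands in for -Path $infile. Each matching line becomes one MatchInfo object, so reading Groups[1] per match yields every name. Indexing the flattened .Matches.Groups collection returns only the first one, which is the behavior described in the question.

```powershell
# Hypothetical sample data standing in for the contents of $infile
$lines = @(
    'abc WANT THIS'
    'def NOT THIS'
    'ghijk WANT THIS'
    'lmno DO NOT LIKE'
    'pqr WANT THIS'
)

# Strings piped to Select-String are matched one by one, like lines of a file
$wanted = $lines |
    Select-String -Pattern '^\s*(\S+) WANT THIS' -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Groups[1].Value }

# Member enumeration flattens all groups of all matches into one list
# (whole match, group 1, whole match, group 1, ...), so [1] is group 1 of the first match only
$flattened = ($lines | Select-String -Pattern '([^ ]+) WANT THIS').Matches.Groups

$wanted              # abc, ghijk, pqr
$flattened[1].Value  # abc
```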

Re-reading the code, it's matching them all, but only writing the value of the first match (doh!):
(Select-String -Path $scriptfile -Pattern "([^ ]+) WANT THIS").Matches.Groups.Value >$tmpfile
OTOH, it appears that the groups in the output object also contain the "non-captured" content: Groups[0] is always the entire match, and only Groups[1] holds the captured text!

Related

Regex Powershell shows too much

I am new to PowerShell. I am trying to automate my work a bit and need simple extraction of the following pattern from all file types:
([0-9A-Z]{2,4}.[0-9A-Z]{8}.[0-9A-Z]{8}.[0-9A-Z]{4})
Example:
*lots of text*
X-xdaemon-transaction-id: string=9971.0A67341C.6147B834.0043,ee=3,shh,rec=0.0,recu=0.0,reid=0.0,cu=3,cld=1
X-xdaemon-transaction-id: string=AA71.0A67341C.6147B442.0043,ee=3,shh,rec=0.0,recu=0.0,reip=0.0,cu=3,cld=1
*lots of text*
Unfortunately, I am receiving output like this:
1mAAAA-0005nG-TN-H:220:X-xdaemon-transaction-id: string=AA71.0A67341C.6147B442.0043,ee=3,shh,rec=0.0,recu=0.0,reip=0.0,cu=3,cld=1
My 'code' is as follows:
Select-String -Path C:\Samples\* -Pattern "(0001.[0-9A-Z]{8}.[0-9A-Z]{8}.[0-9A-Z]{4})" -CaseSensitive
And I'd like to receive only the patterns: AA71.0A67341C.6147B442.0043 without anything added
Thanks for any help!
You can use
$rx = '\b[0-9A-Z]{2,4}\.[0-9A-Z]{8}\.[0-9A-Z]{8}\.[0-9A-Z]{4}\b'
Select-String -AllMatches -Pattern $rx -Path 'C:\Samples\*' -CaseSensitive | % { $_.matches.value }
That is,
Add word boundaries to match your expected strings as whole words, and escape the literal . chars.
Use -AllMatches (to get multiple matches per line, if any) and access the value of each match with $_.matches.value.
PS test:
PS C:\Users\admin> $B = Select-String -AllMatches -Pattern '\b[0-9A-Z]{2,4}\.[0-9A-Z]{8}\.[0-9A-Z]{8}\.[0-9A-Z]{4}\b' -Path 'C:\Samples\*' -CaseSensitive | % { $_.matches.value }
PS C:\Users\admin> $B
9971.0A67341C.6147B834.0043
AA71.0A67341C.6147B442.0043
PS C:\Users\admin>
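To see what -AllMatches adds, here is a small sketch with a hypothetical line holding two IDs. The $line text is made up for illustration. Without -AllMatches, only the first ID on the line is returned. With it, both are returned.

```powershell
$rx = '\b[0-9A-Z]{2,4}\.[0-9A-Z]{8}\.[0-9A-Z]{8}\.[0-9A-Z]{4}\b'

# Hypothetical line containing two transaction IDs
$line = 'ids: 9971.0A67341C.6147B834.0043,AA71.0A67341C.6147B442.0043'

# Default: at most one match per input line
$firstOnly = Select-String -InputObject $line -Pattern $rx -CaseSensitive |
    ForEach-Object { $_.Matches.Value }

# -AllMatches: every non-overlapping match on the line
$all = Select-String -InputObject $line -Pattern $rx -CaseSensitive -AllMatches |
    ForEach-Object { $_.Matches.Value }

$firstOnly  # 9971.0A67341C.6147B834.0043
$all        # 9971.0A67341C.6147B834.0043, AA71.0A67341C.6147B442.0043
```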
Try:
$find = Get-ChildItem *.txt | Select-String -Pattern '\b[0-9A-Z]{2,4}\.[0-9A-Z]{8}\.[0-9A-Z]{8}\.[0-9A-Z]{4}\b' -CaseSensitive
$find.Matches.Value
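In this kind of pattern, an unescaped . matches any character, not just a literal dot. The following comparison sketch runs on made-up lines (the $text sample is hypothetical). It shows that the unescaped version also accepts an ID-like string separated by dashes.

```powershell
# Hypothetical sample lines; the last one uses dashes instead of dots
$text = @(
    'X-xdaemon-transaction-id: string=9971.0A67341C.6147B834.0043,ee=3,shh'
    'X-xdaemon-transaction-id: string=AA71.0A67341C.6147B442.0043,ee=3,shh'
    'noise: AA71-0A67341C-6147B442-0043'
)

# \. only matches a literal dot
$escaped   = '\b[0-9A-Z]{2,4}\.[0-9A-Z]{8}\.[0-9A-Z]{8}\.[0-9A-Z]{4}\b'
# . matches any character except a newline
$unescaped = '\b[0-9A-Z]{2,4}.[0-9A-Z]{8}.[0-9A-Z]{8}.[0-9A-Z]{4}\b'

$strict = $text | Select-String -Pattern $escaped -CaseSensitive -AllMatches |
    ForEach-Object { $_.Matches.Value }
$loose  = $text | Select-String -Pattern $unescaped -CaseSensitive -AllMatches |
    ForEach-Object { $_.Matches.Value }

$strict.Count  # 2
$loose.Count   # 3 (the dashed string slips through)
```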

Powershell regex only select digits

I have a script that I am working on to parse each line in the log. My issue is the regex I use matches from src= until space.
I only want the IP address, not the src= part. I still need to match from src= up to the space, but only store the address in the result. Below is what I use, but it works really badly, so any help is appreciated.
# example text
$destination="src=192.168.96.112 dst=192.168.5.22"
$destination -match 'src=[^\s]+'
$result = $matches.Values
# turn it into string since trim doesn't work
$result=echo $result
$result=$result.trim("src=")
You can use a lookbehind here, and since -match only returns the first match, you will be able to access the matched value using $matches[0]:
$destination -match '(?<=src=)\S+' | Out-Null
$matches[0]
# => 192.168.96.112
(?<=src=) - matches a location immediately preceded with src=
\S+ - one or more non-whitespace chars.
To extract all these values, use
Select-String '(?<=src=)\S+' -input $destination -AllMatches | Foreach {$_.Matches} | Foreach-Object {$_.Value}
or
Select-String '(?<=src=)\S+' -input $destination -AllMatches | % {$_.Matches} | % {$_.Value}
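The following short sketch contrasts the two approaches on a hypothetical log line with two src= fields (the second pair is invented for illustration). -match stops at the first match, while Select-String -AllMatches returns both.

```powershell
# Hypothetical log line with two src= entries
$destination = 'src=192.168.96.112 dst=192.168.5.22 src=10.0.0.7 dst=10.0.0.8'

# -match fills the automatic $Matches variable with the first match only
$null = $destination -match '(?<=src=)\S+'
$first = $Matches[0]

# Select-String -AllMatches returns every match in the input string
$all = Select-String '(?<=src=)\S+' -InputObject $destination -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Value }

$first  # 192.168.96.112
$all    # 192.168.96.112, 10.0.0.7
```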
Another way could be using a capturing group:
src=(\S+)
For example:
$destination="src=192.168.96.112 dst=192.168.5.22"
$pattern = 'src=(\S+)'
Select-String $pattern -input $destination -AllMatches | Foreach-Object {$_.Matches} | Foreach-Object {$_.Groups[1].Value}
Output
192.168.96.112
Or, a bit more specific, match only the digits and dots (a stricter pattern could also check that each octet is in the 0-255 range):
src=(\d{1,3}(?:\.\d{1,3}){3})
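Here is a sketch that plugs this stricter pattern into the same capturing-group pipeline, run on a made-up line. A src= value that is not dotted digits is skipped. Note that \d{1,3} still accepts out-of-range octets such as 999, so this is a shape check, not full IP validation.

```powershell
# Hypothetical input: one IPv4-shaped value and one host name
$destination = 'src=192.168.96.112 dst=192.168.5.22 src=gateway.local'
$pattern = 'src=(\d{1,3}(?:\.\d{1,3}){3})'

$ips = Select-String $pattern -InputObject $destination -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Groups[1].Value }

$ips  # 192.168.96.112
```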

Powershell capture multiple values?

The following code returns only one match.
$s = 'x.a,
x.b,
x.c
'
$s -match 'x\.(.*?)[,$]'
$Matches.Count # return 2
$Matches[1] # returns a only
Expected to return a, b, c.
The -match operator only finds the first match, while Select-String with -AllMatches fetches all matches in the input. Also, [,$] matches a literal , or $ char: inside a character class, $ is not a string/line-end anchor.
A possible solution may look like
Select-String 'x\.([^,\r\n]+)' -input $s -AllMatches | Foreach {$_.Matches} | Foreach-Object {$_.Groups[1].Value}
The pattern is x\.([^,\r\n]+). It matches x. and then captures into Group 1 one or more chars other than , and line breaks. Excluding \r and \n keeps the newline after the last value out of the capture.
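Here is a runnable sketch of that answer. $s is rebuilt with explicit `n line feeds, and the character class excludes line breaks as well as commas, so the last value does not carry the trailing newline.

```powershell
# Same input as the question, with explicit line feeds
$s = "x.a,`nx.b,`nx.c`n"

# [^,\r\n]+ stops at a comma or at a line break
$values = Select-String 'x\.([^,\r\n]+)' -InputObject $s -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Groups[1].Value }

$values  # a, b, c

# For comparison, -match only sees the first occurrence
$null = $s -match 'x\.([^,\r\n]+)'
$Matches[1]  # a
```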

How to modify this regex to work in Powershell

So I have this regex https://regex101.com/r/xG8oX2/2 which gives me the matches I want.
But when I run this PowerShell script, it gives me no matches. What should I modify in this regex to get the same matches in PowerShell?
$pattern2 = '\d{4}\/\d{2}\/\d{2}.*]\s(?<reportHash>.*):.*Start.*\r*\n*.*\n.*ReportLayoutID=(\d{1,7})';
$reportLayoutIDList = Get-Content -Path bigOptions.txt | Out-String |
    Select-String -Pattern $pattern2 -AllMatches |
    Select-Object -ExpandProperty Matches |
    Select-Object @{n="ReportHash";e={$_.Groups["reportHash"]}},
                  @{n="LayoutID";e={$_.Groups["reportLayoutID"]}};
$reportLayoutIDList | Export-csv reportLayoutIDList.csv;
The problem is your line breaks. On Windows, line breaks are CRLF (\r\n), while on UNIX and similar systems they are just LF (\n).
So either you need to modify the input to only use LF or you need to replace \n with \r\n in your regex.
As @briantist mentioned, using \r?\n will match either way.
Thank you to both Frode F and briantist.
This is the regex pattern that worked in Powershell:
$pattern2 = '\d{4}\/\d{2}\/\d{2}.*]\s(?<reportHash>.*):.*Start.*[\r?\n]*.*[\r?\n].*ReportLayoutID=(?<reportLayoutID>\d+)';
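Inside square brackets, ? is a literal character, so [\r?\n] matches a single CR, LF, or ?. The sketch below uses the \r?\n form suggested above and runs it against a hypothetical three-line log (the sample text, hash, and ID are invented), once with CRLF and once with LF line endings. It also uses the named group reportLayoutID that the Select-Object expression refers to. This simplified pattern assumes exactly one line between the Start line and the ID line.

```powershell
# Hypothetical log excerpt: a header line, one line in between, then the ID
$pattern = '\d{4}/\d{2}/\d{2}.*]\s(?<reportHash>.*):.*Start.*\r?\n.*\r?\n.*ReportLayoutID=(?<reportLayoutID>\d+)'
$lines = '2016/05/01 10:00:00 [INFO] abc123: Start report', 'loading options', 'ReportLayoutID=42'

$results = foreach ($newline in "`r`n", "`n") {
    # Join with the given line ending, as Get-Content | Out-String would
    $text = ($lines -join $newline) + $newline

    Select-String -InputObject $text -Pattern $pattern -AllMatches |
        Select-Object -ExpandProperty Matches |
        Select-Object @{n = 'ReportHash'; e = { $_.Groups['reportHash'].Value } },
                      @{n = 'LayoutID';   e = { $_.Groups['reportLayoutID'].Value } }
}

$results | Format-Table -AutoSize   # one row per line-ending style, both abc123 / 42
```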

How do I return only the matching regular expression when I select-string(grep) in PowerShell?

I am trying to find a pattern in files. When I get a match using Select-String I do not want the entire line, I just want the part that matched.
Is there a parameter I can use to do this?
For example:
If I did
select-string .-.-.
and the file contained a line with:
abc 1-2-3 abc
I'd like to get a result of just 1-2-3 instead of the entire line getting returned.
I would like to know the PowerShell equivalent of grep -o.
Or just:
Select-String .-.-. .\test.txt -All | Select Matches
David's on the right path. [regex] is a type accelerator for System.Text.RegularExpressions.Regex.
[regex]$regex = '.-.-.'
$regex.Matches('abc 1-2-3 abc') | foreach-object {$_.Value}
$regex.Matches('abc 1-2-3 abc 4-5-6') | foreach-object {$_.Value}
You could wrap that in a function if that is too verbose.
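Here is one way to do that wrapping, as a sketch: a small grep -o style function. The name Get-RegexMatch and its parameters are hypothetical; it is not a built-in cmdlet.

```powershell
function Get-RegexMatch {
    # Hypothetical helper: emits only the matched text, like grep -o
    param(
        [Parameter(Mandatory)] [string] $Pattern,
        [Parameter(Mandatory, ValueFromPipeline)] [string] $InputObject
    )
    process {
        # [regex]::Matches returns every match in the current input string
        foreach ($m in [regex]::Matches($InputObject, $Pattern)) {
            $m.Value
        }
    }
}

'abc 1-2-3 abc 4-5-6', 'no match here' | Get-RegexMatch -Pattern '.-.-.'
# 1-2-3, 4-5-6
```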
I tried another approach: Select-String returns objects with a Matches property that can be used. To get all the matches, you have to specify -AllMatches; otherwise it returns only the first match on each line.
My test file content:
test test1 alk atest2 asdflkj alj test3 test
test test3 test4
test2
The script:
select-string -Path c:\temp\select-string1.txt -Pattern 'test\d' -AllMatches | % { $_.Matches } | % { $_.Value }
returns
test1 #from line 1
test2 #from line 1
test3 #from line 1
test3 #from line 2
test4 #from line 2
test2 #from line 3
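Here is the same experiment without a file, as a sketch. The three lines of the test file are fed through the pipeline as strings. Dropping -AllMatches leaves one value per line, which shows the "only the first one" behavior described above.

```powershell
# Contents of the test file, as in-memory lines
$content = 'test test1 alk atest2 asdflkj alj test3 test', 'test test3 test4', 'test2'

$all = $content | Select-String -Pattern 'test\d' -AllMatches |
    ForEach-Object { $_.Matches } | ForEach-Object { $_.Value }

$firstPerLine = $content | Select-String -Pattern 'test\d' |
    ForEach-Object { $_.Matches } | ForEach-Object { $_.Value }

$all           # test1, test2, test3, test3, test4, test2
$firstPerLine  # test1, test3, test2
```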
Select-String at technet.microsoft.com
In the spirit of teach a man to fish ...
What you want to do is pipe the output of your Select-String command into Get-Member, so you can see what properties the objects have. Once you do that, you'll see "Matches", and you can select just that by piping your output to Select-Object Matches.
My suggestion is to use something like: select linenumber, filename, matches
For example, on stej's sample:
sls .\test.txt -patt 'test\d' -All |select lineNumber,fileName,matches |ft -auto
LineNumber Filename Matches
---------- -------- -------
         1 test.txt {test1, test2, test3}
         2 test.txt {test3, test4}
         3 test.txt {test2}
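If you would rather have one row per match instead of a Matches collection per line, a sketch like this flattens the matches while keeping the line number. The $content sample repeats stej's test file, and the Value column name is a choice made here.

```powershell
$content = 'test test1 alk atest2 asdflkj alj test3 test', 'test test3 test4', 'test2'

$rows = $content | Select-String -Pattern 'test\d' -AllMatches | ForEach-Object {
    $lineNumber = $_.LineNumber   # capture before $_ is rebound by the inner ForEach-Object
    $_.Matches | ForEach-Object {
        [pscustomobject]@{ LineNumber = $lineNumber; Value = $_.Value }
    }
}

$rows | Format-Table -AutoSize
```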
None of the above answers worked for me. The below did.
Get-Content -Path $pathToFile | Select-String -Pattern 'test\d' -AllMatches | foreach {$_.Matches.Value}
Get-Content -Path $pathToFile |                    # Get-Content will divide into single lines for us
    Select-String -Pattern 'test\d' -AllMatches |  # Define the regex; -AllMatches keeps every match on a line
    foreach {$_.Matches.Value}                     # Only return the values of each object's Matches field
Instead of piping to % or select, you can use the simpler .Property member-enumeration syntax, which also works when the object is a collection of several elements:
(Select-String .-.-. .\test.txt -All).Matches.Value
or, with fewer parentheses:
$m = Select-String .-.-. .\test.txt -All
$m.Matches.Value
If you don't want to use ForEach-Object (% or foreach), you can use only pipes and Select -Expand.
For example, to get only the path after C:\, you could use:
Get-ChildItem | Select-String -Pattern "(C:\\)(.*)" | Select -Expand Matches | Select -Expand Groups | Where Name -eq 2 | Select -Expand Value
Where Name -eq 2 only selects capture group 2, i.e. the (.*) part of the regex pattern specified.
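Here is a variation on the same idea, as a sketch. Naming the group (here path, a name chosen for illustration) makes the Where filter self-describing. The sample strings are hypothetical and are piped in directly instead of coming from Get-ChildItem.

```powershell
# Hypothetical input lines
$lines = 'log written to C:\temp\run.log', 'nothing here'

# Unnamed groups are numbered first (0, 1), then the named group 'path'
$paths = $lines |
    Select-String -Pattern '(C:\\)(?<path>.*)' |
    Select-Object -ExpandProperty Matches |
    Select-Object -ExpandProperty Groups |
    Where-Object Name -eq 'path' |
    Select-Object -ExpandProperty Value

$paths  # temp\run.log
```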
You can use the System.Text.RegularExpressions namespace:
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx