So I have this regex https://regex101.com/r/xG8oX2/2 which gives me the matches I want.
But when I run this PowerShell script, it gives me no matches. What should I modify in this regex to get the same matches in PowerShell?
$pattern2 = '\d{4}\/\d{2}\/\d{2}.*]\s(?<reportHash>.*):.*Start.*\r*\n*.*\n.*ReportLayoutID=(\d{1,7})';
$reportLayoutIDList = Get-Content -Path bigOptions.txt | Out-String |
Select-String -Pattern $pattern2 -AllMatches |
Select-Object -ExpandProperty Matches |
Select-Object @{n="ReportHash";e={$_.Groups["reportHash"]}},
@{n="LayoutID";e={$_.Groups["reportLayoutID"]}};
$reportLayoutIDList |
Export-csv reportLayoutIDList.csv;
The problem is your line breaks. In Windows, line breaks are CRLF (\r\n), while on UNIX-like systems they're just LF (\n).
So you either need to modify the input to use only LF, or you need to replace \n with \r\n in your regex.
As @briantist mentioned, using \r?\n will match either way.
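A minimal sketch of that point, using two made-up sample strings rather than the asker's log file (the variable names and the shortened pattern are assumptions for illustration): the same pattern with \r?\n matches regardless of which line-ending style the text uses.

```powershell
# Hypothetical sample data: the same two-line text with Windows and Unix line endings.
$crlfText = "Report Start`r`nReportLayoutID=42"
$lfText   = "Report Start`nReportLayoutID=42"

# \r?\n accepts an optional carriage return before the line feed.
$pattern = 'Start\r?\nReportLayoutID=(?<reportLayoutID>\d+)'

# Collect the named group from each text that matches.
$results = foreach ($text in $crlfText, $lfText) {
    if ($text -match $pattern) { $Matches['reportLayoutID'] }
}
$results
```

Both strings should yield the captured ID, so the pattern no longer depends on how the file was saved.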
Thank you to both Frode F and briantist.
This is the regex pattern that worked in Powershell:
$pattern2 = '\d{4}\/\d{2}\/\d{2}.*]\s(?<reportHash>.*):.*Start.*[\r?\n]*.*[\r?\n].*ReportLayoutID=(?<reportLayoutID>\d+)';
I have a script that I am working on to parse each line in the log. My issue is the regex I use matches from src= until space.
I only want the IP address, not the src= part. I still need to match from src= up to the space, but the result should store only the address. Below is what I use, but it sucks really badly, so any help would be appreciated.
#example text
$destination="src=192.168.96.112 dst=192.168.5.22"
$destination -match 'src=[^\s]+'
$result = $matches.Values
#turn it into string since trim doesn't work
$result=echo $result
$result=$result.trim("src=")
You can use a lookbehind here, and since -match only returns the first match, you will be able to access the matched value using $matches[0]:
$destination -match '(?<=src=)\S+' | Out-Null
$matches[0]
# => 192.168.96.112
See the .NET regex demo.
(?<=src=) - matches a location immediately preceded with src=
\S+ - one or more non-whitespace chars.
To extract all these values, use
Select-String '(?<=src=)\S+' -input $destination -AllMatches | Foreach {$_.Matches} | Foreach-Object {$_.Value}
or
Select-String '(?<=src=)\S+' -input $destination -AllMatches | % {$_.Matches} | % {$_.Value}
Another way could be using a capturing group:
src=(\S+)
Regex demo | Powershell demo
For example
$destination="src=192.168.96.112 dst=192.168.5.22"
$pattern = 'src=(\S+)'
Select-String $pattern -input $destination -AllMatches | Foreach-Object {$_.Matches} | Foreach-Object {$_.Groups[1].Value}
Output
192.168.96.112
Or a bit more specific matching the dot and the digits (or see this page for an even more specific match for an ip number)
src=(\d{1,3}(?:\.\d{1,3}){3})
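As a rough usage sketch for that stricter pattern (the variable name $sourceIp is made up for illustration), the -match operator fills the automatic $Matches table, and index 1 holds the first capturing group. Note that the pattern only checks the dotted shape of the address; it does not verify that each octet is in the 0-255 range.

```powershell
$destination = "src=192.168.96.112 dst=192.168.5.22"

# Four groups of 1-3 digits separated by dots, captured without the src= prefix.
$pattern = 'src=(\d{1,3}(?:\.\d{1,3}){3})'

# $Matches[1] is the first capturing group; $Matches[0] would be the whole match.
$sourceIp = if ($destination -match $pattern) { $Matches[1] }
$sourceIp
```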
I have a file like:
abc WANT THIS
def NOT THIS
ghijk WANT THIS
lmno DO NOT LIKE
pqr WANT THIS
...
From which I want to extract:
abc
ghijk
pqr
When I apply the following:
(Select-String -Path $infile -Pattern "([^ ]+) WANT THIS").Matches.Groups[1].Value >$outfile
It only returns the match for the first line:
abc
(adding -AllMatches did not change the behaviour)
You may use
Select-String -path $infile -Pattern '^\s*(\S+) WANT THIS' -AllMatches | Foreach-Object {$_.Matches} | Foreach-Object {$_.Groups[1].Value} > $outfile
The ^\s*(\S+) WANT THIS pattern will match
^ - start of a line
\s* - 0+ whitespaces
(\S+) - Group 1: one or more non-whitespace chars
WANT THIS - a literal substring.
Now, -AllMatches will collect all matches, then, you need to iterate over all matches with Foreach-Object {$_.Matches} and access Group 1 values (with Foreach-Object {$_.Groups[1].Value}), and then save the results to the output file.
Re-reading the code, it's matching them all, but only writing the value of the first match (doh!):
(Select-String -Path $scriptfile -Pattern "([^ ]+) WANT THIS").Matches.Groups.Value >$tmpfile
OTOH, it appears that the "captures" in the pattern output object also contain the "non-captured" content!!!!
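A small sketch of why that happens, using in-memory lines in place of the file (the $lines, $found and $report names are assumptions for illustration): in a .NET match, Groups[0] is always the whole match and the parenthesised captures start at Groups[1], so enumerating every group returns both.

```powershell
# Hypothetical in-memory lines standing in for the file from the question.
$lines = 'abc WANT THIS', 'def NOT THIS', 'ghijk WANT THIS'

# Select-String emits one MatchInfo object per matching line.
$found = $lines | Select-String -Pattern '([^ ]+) WANT THIS'

# Show the whole match (group 0) next to the capture (group 1) for each line.
$report = $found | ForEach-Object {
    $m = $_.Matches[0]
    [pscustomobject]@{
        WholeMatch = $m.Groups[0].Value
        Capture    = $m.Groups[1].Value
    }
}
$report
```

Picking Groups[1] explicitly, as in the answer above, is what keeps the "non-captured" text out of the output.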
I am learning regex and am trying to get a better understanding by using a text file with the value $100,000 in it. What I am trying to do is search the text file for the string "$100,000" and, if it is there, export the value to a new CSV. This is what I'm using so far.
[io.file]::readalltext("c:\utilities\notes_$datetime.txt") -match("[$][0-9][0-9][0-9],[0-9][0-9][0-9]") | Out-File C:\utilities\amount.txt -Encoding ascii -Force
Which returns true. Can someone point me in the right direction as to grabbing the string value that it finds into a new CSV?
many thanks!
You're reading the file into a single string, not an array of lines, so you should use Select-String -AllMatches instead of the -match operator:
[IO.File]::ReadAllText("c:\utilities\notes_$datetime.txt") |
Select-String '\$\d{3},\d{3}' -AllMatches |
% { $_.Matches.Groups.Value } |
Out-File C:\utilities\amount.txt -Encoding ascii -Force
As a side note, using Get-Content -Raw would be slightly more PoSh than using .Net methods, although .Net methods provide better performance.
Get-Content "c:\utilities\notes_$datetime.txt" -Raw |
Select-String '\$\d{3},\d{3}' -AllMatches |
% { $_.Matches.Groups.Value } |
Out-File C:\utilities\amount.txt -Encoding ascii -Force
I prefer to use [regex]::match for that:
$x = 'text bla $100,000 text text'
[regex]::Match($x,"\$[\d]{3},[\d]{3}").Groups[0].Value
I also changed the expression a little bit ($ followed by 3 digits, followed by a "," and another 3 digits).
So your script could look like this:
$fileContent = Get-Content "c:\utilities\notes_$datetime.txt"
[regex]::Match($fileContent,"\$[\d]{3},[\d]{3}").Groups[0].Value | Out-File C:\utilities\amount.txt -Encoding ascii -Force
Why not use the Select-String cmdlet - far easier:
Select-String .\infile.csv -pattern '\$[\d]{3},[\d]{3}' | Select Line | Out-File outfile.txt
You can then process multiple files like so:
Get-Childitem *.csv | Select-String -pattern '\$[\d]{3},[\d]{3}' | Select Line | Out-File outfile.txt
The objects returned by Select-String have the following properties:
Line - the line where the regex found a match
LineNumber - the line number in the file where the match was found
Filename - the name of the file the match was found in
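To see those properties in one place, here is a sketch that writes a throwaway file to the temp folder (the file name and its contents are made up for illustration) and formats each hit from the returned objects. The Matches property is used as well, to pull out just the matched amount instead of the whole line.

```powershell
# Hypothetical sample file in the temp folder; name and contents are illustrative only.
$sampleFile = Join-Path ([System.IO.Path]::GetTempPath()) 'select-string-demo.txt'
Set-Content -Path $sampleFile -Value 'no amount here', 'total is $100,000 today'

# Each hit carries Filename, LineNumber, Line and Matches.
$hits = Select-String -Path $sampleFile -Pattern '\$\d{3},\d{3}'
$hits | ForEach-Object {
    '{0}:{1}: {2}' -f $_.Filename, $_.LineNumber, $_.Matches[0].Value
}

# Clean up the sample file.
Remove-Item $sampleFile
```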
A simple enough question I hope.
I have a text log file that includes the following line:
123,010502500114082000000009260000000122001T
I want to search through the log file and return the "00000000926" section of the above text. So I wrote a regular expression:
(?<=123.{17}).{11}
So when the look behind text equals '123' with 17 characters, return the next 11. This works fine when tested on online regex editors. However in Powershell the entire line is returned instead of the 11 characters I want and I can't understand why.
$InputFile = get-content logfile.log
$regex = '(?<=123.{17}).{11}'
$Inputfile | select-string $regex
(entire line is returned).
Why is powershell returning the entire line?
Don't discount Select-String just yet. Like Briantist says it is doing what you want it to but you need to extract the data you actually want in one of two ways. Select-String returns Microsoft.PowerShell.Commands.MatchInfo objects and not just raw strings. Also we are going to use Select-String's ability to take file input directly.
$InputFile = "logfile.log"
$regex = '(?<=123.{17}).{11}'
Select-string $InputFile -Pattern $regex | Select-Object -ExpandProperty Matches | Select-Object -ExpandProperty Value
Or if you have at least PowerShell 3.0
(Select-string $InputFile -Pattern $regex).Matches.Value
Which gives in both cases
00000009260
It's because you're using Select-String which returns the line that matches (think grep).
$InputFile = get-content logfile.log | ForEach-Object {
if ($_ -match '(?<=123.{17})(.{11})') {
$Matches[1]
}
}
Haven't tested this, but it should work (or something similar).
You don't really need the lookaround regex for that:
$InputFile = get-content logfile.log
$InputFile -match '123.{28}' -replace '123.{17}(.{11}).+','$1'
I am using a regular expression search to match up and replace some text. The text can span multiple lines (may or may not have line breaks).
Currently I have this:
$regex = "\<\?php eval.*?\>"
Get-ChildItem -exclude *.bak | Where-Object {$_.Attributes -ne "Directory"} |ForEach-Object {
$text = [string]::Join("`n", (Get-Content $_))
$text -replace $RegEx ,"REPLACED"}
Try this:
$regex = New-Object Text.RegularExpressions.Regex "\<\?php eval.*?\>", ('singleline', 'multiline')
Get-ChildItem -exclude *.bak |
Where-Object {!$_.PsIsContainer} |
ForEach-Object {
$text = (Get-Content $_.FullName) -join "`n"
$regex.Replace($text, "REPLACED")
}
A regular expression is explicitly created via New-Object so that options can be passed in.
Try changing your regex pattern to:
"(?s)\<\?php eval.*?\>"
to get singleline (dot matches any char including line terminators). Since you aren't using the ^ or $ metacharacters I don't think you need to specify multiline (^ & $ match embedded line terminators).
Update: It seems that -replace makes sure the regex is case-insensitive so the i option isn't needed.
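A quick way to check the effect of (?s), sketched with a made-up multi-line string rather than a real file (the $text, $withoutS and $withS names are assumptions for illustration): without the option the pattern cannot cross the line break and nothing is replaced, while with it the whole block is replaced.

```powershell
# Hypothetical multi-line input; `n is a newline inside a double-quoted string.
$text = "before <?php eval(`n  'x'`n); ?> after"

# Without (?s), . stops at the newline, so the lazy .*? never reaches the closing >.
$withoutS = $text -replace '<\?php eval.*?>', 'REPLACED'

# With (?s), . also matches newlines and the block is replaced as a whole.
$withS = $text -replace '(?s)<\?php eval.*?>', 'REPLACED'

$withoutS
$withS
```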
Since . doesn't match newlines by default, you can also use the (.|\n)+ expression to cross line boundaries.