regex select multilines in powershell - regex

I created a file like this
echo "test 1", Hello, foo, bar, "a better world", "test 2" > test.txt
and the result is this:
test 1
Hello
foo
bar
a better world
test 2
I need to remove all the text starting with the keyword "Hello" and ending with "world", including both keywords.
Something like this
test 1
test 2
I tried
$pattern='(?s)(?<=/Hello/\r?\n).*?(?=world)'
(Get-Content -Path .\test.txt -Raw) -replace $pattern, "" | Set-Content -Path .\test.txt
but nothing happened.
What can I try?

Assuming you want to remove the starting and ending keywords you could use either (?s)\s*Hello.*world or (?s)\s*Hello.*?world depending on if you want .* to be greedy or lazy.
(Get-Content path\to\file.txt -Raw) -replace '(?s)\s*Hello.*world' |
Set-Content path\to\result.txt
Use -creplace for case sensitive matching of the keywords.
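To make the greedy versus lazy difference concrete, here is a minimal sketch. The sample text is hypothetical (it is not the asker's file) and deliberately contains two occurrences of world so that the two patterns give different results.

```powershell
# Hypothetical sample text with two 'world' occurrences.
$text = "test 1`nHello`nfoo`nworld`nkeep me`nworld`ntest 2"

# Greedy: .* runs to the LAST 'world', so 'keep me' is removed too.
$greedy = $text -replace '(?s)\s*Hello.*world'

# Lazy: .*? stops at the FIRST 'world', so 'keep me' and the second 'world' survive.
$lazy = $text -replace '(?s)\s*Hello.*?world'

$greedy
'---'
$lazy
```

With only one world in the input, as in the question, both variants remove the same text.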

Leaving aside that there are extraneous / characters in your regex, reformulate it as follows (tip of the hat to Santiago Squarzon):
$pattern = '(?sm)^Hello\r?\n.*?world\r?\n'
(Get-Content -Path .\test.txt -Raw) -replace $pattern |
Set-Content -Path .\test.txt
This removes the line starting with Hello all the way through the (first) subsequent line that ends in world, including the next newline.
This yields the desired output, as shown in your question.
As for what you tried:
Aside from the extraneous / chars., your primary problem is that you are using look-around assertions ((?<=...), (?=...)): whatever they match is not included in the overall match and is therefore not replaced by -replace.
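The effect of look-arounds can be seen in a small sketch. The in-memory string below is a hypothetical stand-in for the file content, and the first pattern is the asker's attempt with the stray / characters removed.

```powershell
# Hypothetical stand-in for the raw file content.
$text = "test 1`nHello`nfoo`nworld`ntest 2"

# Look-arounds only assert context, so 'Hello' and 'world' stay in the result.
$lookaround = $text -replace '(?s)(?<=Hello\n).*?(?=world)'

# Consuming the keywords (and the trailing newline) removes them as well.
$consumed = $text -replace '(?sm)^Hello\r?\n.*?world\r?\n'

$lookaround
'---'
$consumed
```

Only the second form produces the two-line result asked for in the question.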

I think this is a duplicate of "How can I deleted lines from a certain position?" or of any of the other duplicates linked there:
'test1', 'Hello', 'foo', 'bar', 'world', 'test2' |SelectString -From '(?=Hello)' -To '(?<=world)'

Related

Extract string from text file via Powershell

I have been trying to extract certain values from multiple lines inside a .txt file with PowerShell.
Host
Class
INCLUDE vmware:/?filter=Displayname Equal "server01" OR Displayname Equal "server02" OR Displayname Equal "server03 test"
This is what I want :
server01
server02
server03 test
I have code so far :
$Regex = [Regex]::new("(?<=Equal)(.*)(?=OR")
$Match = $Regex.Match($String)
You may use
[regex]::matches($String, '(?<=Equal\s*")[^"]+')
See the regex demo.
See more ways to extract multiple matches here. However, your main problem is the regex pattern. The (?<=Equal\s*")[^"]+ pattern matches:
(?<=Equal\s*") - a location preceded with Equal and 0+ whitespaces and then a "
[^"]+ - consumes 1+ chars other than double quotation mark.
Demo:
$String = "Host`nClass`nINCLUDE vmware:/?filter=Displayname Equal ""server01"" OR Displayname Equal ""server02"" OR Displayname Equal ""server03 test"""
[regex]::matches($String, '(?<=Equal\s*")[^"]+') | Foreach {$_.Value}
Output:
server01
server02
server03 test
Here is a full snippet reading the file in, getting all matches and saving to file:
$newfile = 'file.txt'
$file = 'newtext.txt'
$regex = '(?<=Equal\s*")[^"]+'
Get-Content $file |
Select-String $regex -AllMatches |
Select-Object -Expand Matches |
ForEach-Object { $_.Value } |
Set-Content $newfile
Another option (PSv3+), combining [regex]::Matches() with the -replace operator for a concise solution:
$str = @'
Host
Class
INCLUDE vmware:/?filter=Displayname Equal "server01" OR Displayname Equal "server02" OR Displayname Equal "server03 test"
'@
[regex]::Matches($str, '".*?"').Value -replace '"'
Regex ".*?" matches all "..."-enclosed tokens; .Value extracts them, and -replace '"' strips the " chars.
It may not be obvious, but this happens to be the fastest solution among the answers here, based on my tests - see bottom.
As an aside: The above would be even more PowerShell-idiomatic if the -match operator - which only looks for a (one) match - had a variant named, say, -matchall, so that one could write:
# WISHFUL THINKING (as of PowerShell Core 6.2)
$str -matchall '".*?"' -replace '"'
See this feature suggestion on GitHub.
Optional reading: performance comparison
Pragmatically speaking, all solutions here are helpful and may be fast enough, but there may be situations where performance must be optimized.
Generally, using Select-String (and the pipeline in general) comes with a performance penalty - while offering elegance and memory-efficient streaming processing.
Also, repeated invocation of script blocks (e.g., { $_.Value }) tends to be slow - especially in a pipeline with ForEach-Object or Where-Object, but also - to a lesser degree - with the .ForEach() and .Where() collection methods (PSv4+).
In the realm of regexes, you pay a performance penalty for variable-length look-behind expressions (e.g. (?<=EQUAL\s*")) and the use of capture groups (e.g., (.*?)).
Here is a performance comparison using the Time-Command function, averaging 1000 runs:
Time-Command -Count 1e3 { [regex]::Matches($str, '".*?"').Value -replace '"' },
{ [regex]::matches($String, '(?<=Equal\s*")[^"]+') | Foreach {$_.Value} },
{ [regex]::Matches($str, '\"(.*?)\"').Groups.Where({$_.name -eq '1'}).Value },
{ $str | Select-String -Pattern '(?<=Equal\s*")[^"]+' -AllMatches | ForEach-Object{$_.Matches.Value} } |
Format-Table Factor, Command
Sample timings from my MacBook Pro; the exact times aren't important (you can remove the Format-Table call to see them), but the relative performance is reflected in the Factor column, from fastest to slowest.
Factor Command
------ -------
1.00 [regex]::Matches($str, '".*?"').Value -replace '"' # this answer
2.85 [regex]::Matches($str, '\"(.*?)\"').Groups.Where({$_.name -eq '1'}).Value # AdminOfThings'
6.07 [regex]::matches($String, '(?<=Equal\s*")[^"]+') | Foreach {$_.Value} # Wiktor's
8.35 $str | Select-String -Pattern '(?<=Equal\s*")[^"]+' -AllMatches | ForEach-Object{$_.Matches.Value} # LotPings'
You can modify your regex to use a capture group, which is indicated by the parentheses. The backslashes just escape the quotes. This allows you to just capture what you are looking for and then filter it further. The capture group here is automatically named 1 since I didn't provide a name. Capture group 0 is the entire match including quotes. I switched to the Matches method because that encompasses all matches for the string whereas Match only captures the first match.
$regex = [regex]'\"(.*?)\"'
$regex.matches($string).groups.where{$_.name -eq 1}.value
If you want to export the results, you can do the following:
$regex = [regex]'\"(.*?)\"'
$regex.matches($string).groups.where{$_.name -eq 1}.value | sc "c:\temp\export.txt"
An alternative that reads the file directly with Select-String, using Wiktor's good RegEx:
Select-String -Path .\file.txt -Pattern '(?<=Equal\s*")[^"]+' -AllMatches|
ForEach-Object{$_.Matches.Value} | Set-Content NewFile.txt
Sample output:
> Get-Content .\NewFile.txt
server01
server02
server03 test

Regular expression seems not to work in Where-Object cmdlet

I am trying to add quote characters around two fields in a file of comma separated lines. Here is one line of data:
1/22/2018 0:00:00,0000000,001B9706BE,1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0
which I would like to become this:
1/22/2018 0:00:00,"0000000","001B9706BE",1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0
I began developing my regular expression in a simple PowerShell script, and soon I have the following:
$strData = '1/29/2018 0:00:00,0000000,001B9706BE,1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0'
$strNew = $strData -replace "([^,]*),([^,]*),([^,]*),(.*)",'$1,"$2","$3",$4'
$strNew
which gives me this output:
1/29/2018 0:00:00,"0000000","001B9706BE",1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0
Great! I'm all set. Extend this example to the general case of a file of similar lines of data:
Get-Content test_data.csv | Where-Object -FilterScript {
$_ -replace "([^,]*),([^,]*),([^,]*),(.*)", '$1,"$2","$3",$4'
}
This is a listing of test_data.csv:
1/29/2018 0:00:00,0000000,001B9706BE,1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104938428,0016C4C483,1,45,0,1,0,0,0,0,0,0,0,0,0,0,35,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104943875,0016C4B0BC,1,31,0,1,0,0,0,0,0,0,0,0,0,0,25,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104948067,0016C4834D,1,33,0,1,0,0,0,0,0,0,0,0,0,0,23,0,1,0,0,0,0,0,0,0,0,0,0
This is the output of my script:
1/29/2018 0:00:00,0000000,001B9706BE,1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104938428,0016C4C483,1,45,0,1,0,0,0,0,0,0,0,0,0,0,35,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104943875,0016C4B0BC,1,31,0,1,0,0,0,0,0,0,0,0,0,0,25,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104948067,0016C4834D,1,33,0,1,0,0,0,0,0,0,0,0,0,0,23,0,1,0,0,0,0,0,0,0,0,0,0
I have also tried this version of the script:
Get-Content test_data.csv | Where-Object -FilterScript {
$_ -replace "([^,]*),([^,]*),([^,]*),(.*)", "`$1,`"`$2`",`"`$3`",$4"
}
and obtained the same results.
My simple test script has convinced me that the regex is correct, but something happens when I use that regex inside a filter script in the Where-Object cmdlet.
What simple, yet critical, detail am I overlooking here?
Here is my PSVersion:
Major Minor Build Revision
----- ----- ----- --------
5 0 10586 117
You're misunderstanding how Where-Object works. The cmdlet outputs those input lines for which the -FilterScript expression evaluates to $true. It does NOT output whatever you do inside that scriptblock (you'd use ForEach-Object for that).
You don't need either Where-Object or ForEach-Object, though. Just put Get-Content in parentheses and use that as the first operand for the -replace operator. You also don't need the 4th capturing group. I would recommend anchoring the expression at the beginning of the string, though.
(Get-Content test_data.csv) -replace '^([^,]*),([^,]*),([^,]*)', '$1,"$2","$3"'
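The difference between the two cmdlets can be sketched with a couple of in-memory lines. The shortened sample rows and the variable names are assumptions for illustration only.

```powershell
# Two shortened sample rows (hypothetical, trimmed from the question's data).
$lines = '1/29/2018 0:00:00,0000000,001B9706BE,1,21',
         '1/29/2018 0:00:00,104938428,0016C4C483,1,45'
$pattern     = '^([^,]*),([^,]*),([^,]*)'
$replacement = '$1,"$2","$3"'

# Where-Object: the non-empty string returned by -replace counts as $true,
# so the ORIGINAL, unmodified line is passed through.
$filtered = $lines | Where-Object { $_ -replace $pattern, $replacement }

# ForEach-Object: whatever the script block returns becomes the output.
$transformed = $lines | ForEach-Object { $_ -replace $pattern, $replacement }

$filtered
'---'
$transformed
```

This is why the original script printed the input unchanged: every line produced a non-empty replacement string, which Where-Object treated as a passing filter.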
This seems to work here. I used ForEach-Object to process each record.
Get-Content test_data.csv |
ForEach-Object { $_ -replace "([^,]*),([^,]*),([^,]*),(.*)", '$1,"$2","$3",$4' }
This also seems to work. Uses the ? to create a reluctant (lazy) capture.
Get-Content test_data.csv |
ForEach-Object { $_ -replace '(.*?),(.*?),(.*?),(.*)', '$1,"$2","$3",$4' }
I would just make a small change to what you have in order for this to work. Simply change the script to the following, noting that I replaced Where-Object -FilterScript with ForEach-Object and fixed a minor typo in the replacement string: inside double quotes, the $ of the last reference ($4) must be escaped with a backtick as well, otherwise PowerShell expands it as a variable:
Get-Content c:\temp\test_data.csv | ForEach-Object {
$_ -replace "([^,]*),([^,]*),([^,]*),(.*)", "`$1,`"`$2`",`"`$3`",`$4"
}
I tested this with the data you provided and it adds the quotes to the correct columns.

How to make select-string only match records that are 6 characters long?

I’m creating a script that reads a text file and compares the results to an array. It works fine, but I have some records that say they match but they don’t.
For example - TG1032 and TG match according to the select-string script.
Here is my select-string:
$Sel = select-string -pattern $strArrVal -path $txt
Is there a way to alter this to make select-string only match records that are 6 characters long?
I would still like to point out where your pattern is wrong but the solution will most likely be the same regardless. If you are looking to match lines that are exactly 6 characters then you could just use the pattern ^.{6}$.
$strArrVal = "^.{6}$"
Select-String -Pattern $strArrVal -Path $txt
If that is really all you are looking for then regex is not really required. You could do this with Get-Content with similar results
Get-Content $txt | Where-Object{$_.length -eq 6}
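If the underlying problem is that a short array value such as TG also matches inside TG1032, another option is to anchor every value so that it must match the whole line. The following is only a sketch: the sample values and lines are hypothetical, and it assumes each line of the file holds exactly one record.

```powershell
# Hypothetical array values and file lines.
$values = 'TG', 'TG1032'
$lines  = 'TG1032', 'TG', 'XTG1032Y'

# Unanchored: 'TG' matches anywhere in a line, so all three lines match.
$loose = $lines | Select-String -Pattern 'TG'

# Anchored: escape each value and wrap it in ^...$ so it must match the whole line.
$anchored = $values | ForEach-Object { '^' + [regex]::Escape($_) + '$' }
$whole = $lines | Select-String -Pattern $anchored

$whole.Line
```

With a file, the same anchored patterns can be passed together with -Path $txt instead of piping lines in.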

Regex replace contents of file and delete lines that don't match

I have a large log file where I want to extract certain types of lines. I have created a working regex to match these lines. How can I now use this regex to extract the lines and nothing else? I have tried
cat .\file | %{
if($_ -match "..."){
$_ -replace "...", '...'
}
else{
$_ -replace ".*", ""
}
}
Which almost works, but the lines that are not of interest still remain as blank lines (meaning the lines of interest are spaced VERY far apart).
The best way is to remove the else clause altogether. If you do that, then no object will be returned from that iteration of the ForEach-Object block.
cat .\file | %{
if($_ -match "..."){
$_ -replace "...", '...'
}
}
Just to add to briantist's answer: you don't even need the loop structure. With an array on the left-hand side, -match and -replace act as array operators, which removes the need for both the if and the ForEach-Object.
(Get-Content .\file) -match "..." -replace "...","..."
Get-Content being the target of the alias cat
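A small sketch of the array-operator behaviour, using hypothetical log lines in place of the file:

```powershell
# Hypothetical log lines standing in for (Get-Content .\file).
$lines = 'INFO start', 'ERROR disk full', 'INFO done', 'ERROR timeout'

# With an array on the left, -match FILTERS (returns the matching elements)
# and -replace is then applied to each remaining element.
$errors = $lines -match '^ERROR ' -replace '^ERROR ', 'E: '

$errors
```

One caveat: if the file has only a single line, Get-Content returns a plain string and -match then yields a Boolean instead of filtering. Wrapping the call as @(Get-Content .\file) keeps the left-hand side an array.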

Powershell replace exact string

I want to replace a simple string "WEEK." (with a dot) in a text file with the string "TEST"
$LOG= "C:\FILE.TXT"
$A= "TEST"
(Get-Content $LOG) | Foreach { $_ -Replace "WEEK.", $A } | Set-Content $LOG;
The problem is that my file has this content:
WEEK_A WEEK.
And when I run my script the result is:
TESTA TEST
and the result that I want is:
WEEK_A TEST
I tried with ^ "WEEK." and "^WEEK.$", but it did not work.
Can you help me with the regexp? Thanks
====== EDIT ==================
OK, I tried with
$LOG= "C:\FILE.TXT"
$A= "TEST"
(Get-Content $LOG) | Foreach { $_ -Replace "WEEK\.", $A } | Set-Content $LOG;
and it seems to work.
This happened because you used the pattern WEEK. with an unescaped dot. In a regular expression, the dot means "any character", which is why the pattern matched both WEEK_ and WEEK. and replaced each of them.
Once you added the backslash, the dot was escaped, i.e. it lost its special meaning and matches only a literal dot, which makes the replacement work.
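When the search text should always be taken literally, the escaping does not have to be done by hand. Here is a short sketch using the question's sample line; the variable names are made up for illustration.

```powershell
$line = 'WEEK_A WEEK.'

# Unescaped dot: 'WEEK.' also matches 'WEEK_', reproducing the problem.
$unescaped = $line -replace 'WEEK.', 'TEST'

# [regex]::Escape() turns the text into a pattern that matches it literally.
$escaped = $line -replace [regex]::Escape('WEEK.'), 'TEST'

# The .NET String.Replace() method does not use regular expressions at all.
$literal = $line.Replace('WEEK.', 'TEST')

$unescaped   # TESTA TEST
$escaped     # WEEK_A TEST
$literal     # WEEK_A TEST
```

Note that String.Replace() is case-sensitive, whereas -replace is case-insensitive by default.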