Remove lines from a file if they match a regular expression - regex

For every file in a directory, I want to use PowerShell to remove the lines that match a regular expression (for example, lines beginning with |B).
I think I can do this with Get-ChildItem on the directory, ForEach-Object, Get-Content and some sort of if -match, but I'm really struggling to fit it all together.
Any help would be massively appreciated. This is the first time I've ever written a PowerShell script.

Something like the following should point you in the right direction:
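# -File (PowerShell 3.0+) skips subdirectories, which Get-Content cannot read
$files = Get-ChildItem "C:\your\dir" -File
foreach ($file in $files) {
    # Keep only the lines that do NOT start with "|B" (the | must be escaped in a regex)
    $c = Get-Content $file.FullName | Where-Object { $_ -notmatch '^\|B' }
    # $c is fully collected before writing, so overwriting the same file is safe
    $c | Set-Content $file.FullName
}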
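To try the filter without touching any files, here is a minimal sketch that wraps the same -notmatch test in a function and runs it on an in-memory array. Remove-MatchingLines is a hypothetical helper name, not a built-in cmdlet, and the sample lines are made up. It relies on the fact that -notmatch applied to an array returns the elements that do not match.

```powershell
# Hypothetical helper (not a built-in cmdlet): drop every line that matches $Pattern
function Remove-MatchingLines {
    param(
        [string[]]$Lines,
        [string]$Pattern
    )
    # With an array on the left, -notmatch returns the elements that do NOT match
    $Lines -notmatch $Pattern
}

# Made-up sample lines; only lines that START with "|B" are removed
$sample = @('|B header', 'keep me', '|Bx also removed', 'A|B not at start')
Remove-MatchingLines -Lines $sample -Pattern '^\|B'
```

On a real file, read it in parentheses first so it is fully loaded before writing, e.g. Remove-MatchingLines -Lines (Get-Content $file.FullName) -Pattern '^\|B' | Set-Content $file.FullName.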

Related

PowerShell - How to Update a file based on content from another file

I've searched all over, including here on Stack Overflow, and I cannot find a solution to my problem. Here is my issue.
Let's say File1.txt contains the following (no blank lines between entries):
\\Serv02\LOC6\Client\726C30\032383\2200018023.pdf
\\Serv02\LOC6\Client\726C30\032383\2200718091.pdf
\\Serv02\LOC6\Client\726C30\030684\2300309040.pdf
\\Serv02\LOC6\Client\726C30\031274\2300429971.pdf
File2.txt contains the same information; however, I need to add a 1 right before the .pdf in each of these entries (within File2.txt).
Example:
\\Serv02\LOC6\Client\726C30\032383\22000180231.pdf
I can easily update file2.txt using a regex, but that updates every line that matches the regex, regardless of what is in file1.txt.
File2.txt contains a lot more data than file1.txt (more entries of exactly the same type). I only need to add the 1 right before .pdf in file2.txt for the entries that are listed in file1.txt.
Here is the code I am using. As you can see, it does not read file1.txt at all; I'm just using a regex to update file2.txt by adding the 1 before .pdf. (The code below does add the 1 before .pdf, but it doesn't iterate through file1.txt.)
clear-host
set-location c:\temp
$File = "C:\Temp\file1.txt"
$FileZ = "C:\Temp\file2.txt"
$File2 = (Get-ChildItem $fileZ) | Select -ExpandProperty BaseName
$regex01 = '(\\\\Serv02\\LOC6\\Client\\726C30\\\d{1,6}\\\d{1,10})(.pdf)$'
get-content $fileZ | % { $_ -replace $regex01, '${1}1${2}' -join "`r`n" } | out-file -Encoding default "c:\Temp\$File2.txt"
start-sleep -Seconds 2
$NewMRC = Get-ChildItem "$file2.txt" | Select -ExpandProperty Name
Get-ChildItem $NewMRC | rename-item -NewName {$_.Name -replace ".txt",".MRC2"}
If file1.txt had another line that didn't match the regex shown above, such as the following, the corresponding line in file2.txt would not be updated:
\\Serv03\LOC7\Client\780D30\031456\8675309123.pdf
I hope I have explained this well enough. I'm not new to PowerShell, but I am far from an expert. Any assistance is greatly appreciated.
I've modified your code as follows. The approach is to read the content of File1.txt and store it in a variable, then iterate over each line of File2.txt and check both that it matches the regex and that it is present in the File1.txt content. If so, replace it with whatever you want. Each line is written to a .tmp file in append mode; once all the lines in File2.txt are processed, the original file is replaced with the .tmp file.
clear-host
set-location c:\temp
$File = "file1.txt"
$FileZ = "file2.txt"
# PowerShell 2.0: read all of file1.txt as a single string
$File1 = get-content $File | Out-String
# PowerShell 3.0+: the -Raw switch does the same
# $File1 = get-content $File -Raw
$File2 = (Get-ChildItem $fileZ) | Select -ExpandProperty BaseName
# Start from a clean temp file, because the output below is appended
if( test-path "$File2.tmp" ) { remove-item "$File2.tmp" }
$regex01 = '(\\\\Serv02\\LOC6\\Client\\726C30\\\d{1,6}\\\d{1,10})(.pdf)$'
get-content $fileZ |% {
    $line = $_
    # Double each backslash so the line can be used as a regex pattern
    $find = $line -replace '\\','\\'
    # Update the line only if it matches the regex AND appears in file1.txt
    if ( ($line -match $regex01) -AND ( $File1 -match $find ) ) {
        $line -replace $regex01,'${1}1${2}'
    } else {
        $line
    }
} | out-file "$File2.tmp" -append
# Replace the original file with the updated copy
remove-item "$File2.txt"
rename-item "$File2.tmp" "$File2.txt"
#start-sleep -Seconds 2
#$NewMRC = Get-ChildItem "$file2.txt" | Select -ExpandProperty Name
#Get-ChildItem $NewMRC | rename-item -NewName {$_.Name -replace ".txt",".MRC2"}
Notes:
The last 3 lines of your code don't seem to be related to your problem statement, so I've commented them out.
$find = $line -replace '\\','\\': This replaces each single backslash \ with a double backslash \\. The first operand of -replace is a regular expression, so the backslash must be escaped there; the second operand is the replacement text, where it must NOT be escaped. So even though the two arguments look the same, they are interpreted differently.
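A small sketch of what that line produces, using a path from the question. For comparison it also shows [regex]::Escape, a .NET method that is not used in the answer above; it escapes other regex metacharacters too, such as the . in .pdf, which the backslash-doubling trick leaves as "any character".

```powershell
$line = '\\Serv02\LOC6\Client\726C30\032383\2200018023.pdf'

# Pattern '\\' matches ONE backslash; replacement '\\' inserts TWO literal backslashes
$find = $line -replace '\\', '\\'
$find

# Alternative: [regex]::Escape also escapes metacharacters such as '.'
$escaped = [regex]::Escape($line)
$escaped

# Either form can now be used as a pattern that matches the original line
$line -match $escaped
```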
One way to do this: read the content of the first file into an array, then read the content of the second file. For each line in the second file, if the first file's content contains a matching line, output the modified line; otherwise, output the current line unchanged.
$pattern = '(\\{2}(?:[^\\]+\\)+)([^\\\.]+)(\.pdf)'
# Array of lines from the first file
$file1Content = Get-Content "file1.txt"
Get-Content "file2.txt" | ForEach-Object {
    # -contains tests for an exact (case-insensitive) line match
    if ( $file1Content -contains $_ ) {
        $_ | Select-String $pattern | ForEach-Object {
            # Rebuild the path with "1" inserted before the extension
            "{0}{1}1{2}" -f
                $_.Matches[0].Groups[1].Value,
                $_.Matches[0].Groups[2].Value,
                $_.Matches[0].Groups[3].Value
        }
    }
    else {
        $_
    }
}
The first match group ($_.Matches[0].Groups[1].Value) is \\servername\sharename\path\ (including the trailing backslash), the second match group is the file name without extension, and the third match group is the file extension.
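The same idea can be written more compactly with -replace and ${n} group references, which also passes through unchanged any listed line that doesn't fit the pattern. This is a sketch, not the answer's code: Update-ListedLines is a hypothetical helper, and it works on arrays so it can be tried without the two files.

```powershell
# Hypothetical helper: add "1" before ".pdf" only for lines that also appear in $Reference
function Update-ListedLines {
    param([string[]]$Lines, [string[]]$Reference)
    $pattern = '(\\{2}(?:[^\\]+\\)+)([^\\\.]+)(\.pdf)'
    foreach ($line in $Lines) {
        if ($Reference -contains $line) {
            # ${1} = \\server\share\path\, ${2} = base name, ${3} = .pdf
            $line -replace $pattern, '${1}${2}1${3}'
        } else {
            $line
        }
    }
}

# Sample data taken from the question
$file1 = @('\\Serv02\LOC6\Client\726C30\032383\2200018023.pdf')
$file2 = @('\\Serv02\LOC6\Client\726C30\032383\2200018023.pdf',
           '\\Serv02\LOC6\Client\726C30\030684\2300309040.pdf')
Update-ListedLines -Lines $file2 -Reference $file1
```

With real files this would be called as Update-ListedLines -Lines (Get-Content "file2.txt") -Reference (Get-Content "file1.txt"), piping the result to Set-Content.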

Regex is not working in powershell code, returns nothing

I have a problem with my regex: it only selects one of the four errors.
When I use this regex in my PowerShell code, it does not work: it returns nothing.
My code is:
Get-ChildItem -Path '/Users/user/Documents/tmp' -filter '*.txt' | ForEach-Object {
$content = Get-Content $_.FullName
[regex]::Matches($content, "(ERROR\:[\S\s\n\r]*?\n)(C:)") | ForEach-Object {
$_.Groups[0].Value -replace '\r?\n'
}
}
My regex is:
https://regex101.com/r/kU9gR4/1
What is the problem in my regex and in my powershell code?
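One likely cause of the empty result, offered as a sketch rather than a confirmed answer: Get-Content without -Raw returns an array of lines, and when that array is passed to [regex]::Matches, PowerShell converts it to a single string joined with spaces, so there are no \n characters left for the pattern to match. Reading the file with Get-Content -Raw (PowerShell 3.0+) keeps the line breaks. The sample lines below are made up to simulate both cases without a file.

```powershell
$pattern = '(ERROR\:[\S\s\n\r]*?\n)(C:)'

# Simulates Get-Content WITHOUT -Raw: an array of lines (made-up sample data)
$lines = @('ERROR: first problem', 'C:\path\one', 'ERROR: second problem', 'C:\path\two')

# The array is converted to one string joined by spaces, so "\n" never occurs
$asArray = [regex]::Matches($lines, $pattern)

# Simulates Get-Content -Raw: one string that still contains the line breaks
$raw = $lines -join "`n"
$asRaw = [regex]::Matches($raw, $pattern)

"Without -Raw: $($asArray.Count) match(es); with -Raw: $($asRaw.Count) match(es)"
```

In the script itself, that would mean $content = Get-Content $_.FullName -Raw.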

powershell -replace regex

I have the following script, which I am trying to run on various HTML files:
$files = $args[0];
$string1 = $args[1];
$string2 = $args[2];
Write-Host "Replace $string1 with $string2 in $files";
gci -r -include "$files" |
foreach-object { $a = $_.fullname; ( get-content $a ) |
foreach-object {
$_ -replace "%string1" , "$string2" |
set-content $a
}
}
in an attempt to edit this line, which appears in all the files:
<tr><td>TestCase</td></tr>
I call the script (it's called replace.ps1) from PowerShell like this:
./replace *.html sampleTest myNewTest
but instead of changing sampleTest.html to myNewTest.html, it deletes everything in each file except the last line, leaving all of the files like this:
</html>
In fact, this seems to happen no matter what arguments I pass in.
Can anyone explain this/help me understand why it's happening?
Your loop structure is to blame here. Set-Content needs to sit outside the inner loop. As written, your code overwrites the file with a single line on every pass, so only the last line survives.
....
foreach-object { $a = $_.fullname; ( get-content $a ) |
    foreach-object {
        $_ -replace "$string1" , "$string2"
    } | set-content $a   # runs once, after every line of $a has been processed
}
It might also have been a typo, but you had "%string1" before, which, while syntactically valid, searches for the literal text %string1 and is not what you intended.
You could also have used Add-Content, but that would mean erasing the file first. Using set-content $a at the end of the pipe is more intuitive.
Your example doesn't actually need a regex. You could have used $_.Replace($string1,$string2) instead; note, though, that String.Replace is a literal, case-sensitive replacement, whereas -replace treats $string1 as a case-insensitive regular expression.
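The difference matters when the search text contains regex metacharacters, such as the . in a file name. A small illustration (the sample text is made up); [regex]::Escape is shown as one way to keep -replace but treat the search text literally.

```powershell
$text = 'sampleTest.html sampleTestXhtml SAMPLETEST.html'

# -replace: the pattern is a case-insensitive regex, and "." matches any character
$r1 = $text -replace 'sampleTest.html', 'myNewTest.html'

# Escaping the pattern makes "." literal; -replace is still case-insensitive
$r2 = $text -replace [regex]::Escape('sampleTest.html'), 'myNewTest.html'

# String.Replace: literal text, case-sensitive
$r3 = $text.Replace('sampleTest.html', 'myNewTest.html')

$r1, $r2, $r3
```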

PowerShell regex filter files

I am trying to filter files using PowerShell. I need to insert a newline between </tr><tr> to break the rows onto separate lines, then remove all lines that match <tr>, any characters, BTE, any characters, </tr>, and save the files in place.
Forgive me, I am new to PowerShell; this is simple in sed, but I must use PowerShell. This is what I have, but it could be completely wrong:
Get-Content *.htm | Foreach-Object {$_ -replace '</tr><tr>', '</tr>\r\n<tr>'; $_}
Get-Content *.htm | Foreach-Object {$_ -replace '<tr>.*BTE.*</tr>', ''; $_}
It sounds like you just need to save your changes back to the original files. We can also make both changes in one pass instead of reading the files twice.
Get-ChildItem *.htm | Foreach-Object {
$singleFileName = $_.FullName
(Get-Content $singleFileName) -replace '</tr><tr>', "</tr>`r`n<tr>" -replace '<tr>.*BTE.*</tr>' | Set-Content $singleFileName
}
We place (Get-Content $singleFileName) in parentheses so that the whole file is read at once, before anything is written. Without them, you would be reading from and writing to the same file in a single pipeline, which doesn't work:
Get-Content $singleFileName | Set-Content $singleFileName
As each line is passed down the pipe, the file is still held open for reading, so Set-Content can't write to it.
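Note that in the one-pass version above, -replace '<tr>.*BTE.*</tr>' clears the text of a BTE row but can leave an empty line behind. A sketch of a variant that splits the rows into separate lines first and then drops the BTE rows entirely; Remove-BteRows is a hypothetical helper, and the zero-width split (a lookbehind/lookahead between </tr> and <tr>) is a swapped-in technique, not the answer's code.

```powershell
# Hypothetical helper: put each <tr> row on its own line, then drop rows containing BTE
function Remove-BteRows {
    param([string[]]$Lines)
    $rows = foreach ($line in $Lines) {
        # Zero-width split between "</tr>" and "<tr>" keeps both tags intact
        $line -split '(?<=</tr>)(?=<tr>)'
    }
    # Keep only the rows that do not contain BTE
    $rows | Where-Object { $_ -notmatch '<tr>.*BTE.*</tr>' }
}

# Made-up sample: three rows on one physical line
$sample = @('<tr><td>A</td></tr><tr><td>x BTE y</td></tr><tr><td>B</td></tr>')
Remove-BteRows -Lines $sample
```

Applied to a file, it would be Remove-BteRows -Lines (Get-Content $singleFileName) | Set-Content $singleFileName, with the parentheses doing the same job as described above.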
You may not need to insert the line break at all if the regex captures each row directly, like this:
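Get-ChildItem *.htm | Foreach-Object {
    $singleFileName = $_.FullName
    # Capture each <tr>...</tr> row (lazy .*?), drop rows containing BTE, write one row per line
    # (.Value on the whole match collection needs PowerShell 3.0+)
    ([RegEx]::Matches((Get-Content $singleFileName),'<tr>.*?</tr>')).Value |
        ?{$_ -notlike '<tr>*BTE*</tr>'} |
        Set-Content $singleFileName
}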

Powershell ignoring look behind regular expression to return entire line

A simple enough question I hope.
I have a text log file that includes the following line:
123,010502500114082000000009260000000122001T
I want to search through the log file and return the "00000000926" section of the above text. So I wrote a regular expression:
(?<=123.{17}).{11}
That is, when the lookbehind matches '123' followed by 17 characters, return the next 11 characters. This works fine when I test it in online regex editors. However, in PowerShell the entire line is returned instead of the 11 characters I want, and I can't understand why.
$InputFile = get-content logfile.log
$regex = '(?<=123.{17}).{11}'
$Inputfile | select-string $regex
(The entire line is returned.)
Why is powershell returning the entire line?
Don't discount Select-String just yet. Like Briantist says, it is doing what you want, but you need to extract the data you actually want in one of two ways. Select-String returns Microsoft.PowerShell.Commands.MatchInfo objects, not just raw strings. We are also going to use Select-String's ability to read file input directly.
$InputFile = "logfile.log"
$regex = '(?<=123.{17}).{11}'
Select-string $InputFile -Pattern $regex | Select-Object -ExpandProperty Matches | Select-Object -ExpandProperty Value
Or, if you have at least PowerShell 3.0:
(Select-string $InputFile -Pattern $regex).Matches.Value
Both give:
00000009260
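To see why the whole line shows up, here is a small sketch that pipes the sample line from the question straight into Select-String (no log file needed) and inspects the MatchInfo object: its Line property holds the whole input line, which is what the console displays, while Matches holds only the text the regex matched.

```powershell
$line = '123,010502500114082000000009260000000122001T'
$regex = '(?<=123.{17}).{11}'

# Select-String emits a MatchInfo object rather than the matched text
$info = $line | Select-String -Pattern $regex

# .Line is the whole line; .Matches[0].Value is just the 11 matched characters
$info.Line
$info.Matches[0].Value
```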
It's because you're using Select-String, which returns the line that matches (think grep).
$InputFile = get-content logfile.log | ForEach-Object {
if ($_ -match '(?<=123.{17})(.{11})') {
$Matches[1]
}
}
Haven't tested this, but it should work (or something similar).
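A quick way to check the -match approach without a log file, assuming in-memory sample lines (the second line is made up and should produce no output). After a successful -match on a single string, the automatic $Matches hashtable holds the capture groups.

```powershell
# Sample data in place of logfile.log; the second line is made up and does not match
$lines = @('123,010502500114082000000009260000000122001T', 'no match on this line')

$values = $lines | ForEach-Object {
    # -match on a single string fills the automatic $Matches hashtable
    if ($_ -match '(?<=123.{17})(.{11})') {
        $Matches[1]
    }
}
$values
```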
You don't really need the lookaround regex for that:
$InputFile = get-content logfile.log
$InputFile -match '123.{28}' -replace '123.{17}(.{11}).+','$1'