PowerShell regex filter files - regex

I am trying to filter files using PowerShell, and I need to insert a new line character in between </tr><tr> to break those into separate lines and then remove all the lines that match <tr> lots of characters BTE lots of characters </tr> and save the files in place.
Forgive me, as I am new to PowerShell, and this is simple in SED, but I must use PowerShell. This is what I have but could be completely wrong.
Get-Content *.htm | Foreach-Object {$_ -replace '</tr><tr>', '</tr>\r\n<tr>'; $_}
Get-Content *.htm | Foreach-Object {$_ -replace '<tr>.*BTE.*</tr>', ''; $_}

So it just sounds like you need to save your changes back to the original files. Also we should just be able to make these changes in one pass instead of reading the files twice.
Get-ChildItem *.htm | Foreach-Object {
$singleFileName = $_.FullName
(Get-Content $singleFileName) -replace '</tr><tr>', "</tr>`r`n<tr>" -replace '<tr>.*BTE.*</tr>' | Set-Content $singleFileName
}
You can't read from and write to the same file in a single pipeline. We place (Get-Content $singleFileName) in parentheses so that the whole file is read before Set-Content opens it. By contrast, this would fail:
Get-Content $singleFileName | Set-Content $singleFileName
As each line is passed down the pipeline, the file is still open for reading, so Set-Content can't write to it.
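As a minimal sketch of the parenthesized form above, the following runs the same two replacements against a scratch file. The file name and its sample contents are assumptions made up for illustration.

```powershell
# Hypothetical scratch file; the name and contents are assumptions for illustration.
$demoFile = Join-Path ([IO.Path]::GetTempPath()) 'inplace-demo.htm'
Set-Content -Path $demoFile -Value '<tr>keep</tr><tr>x BTE y</tr><tr>also keep</tr>'

# The parentheses make Get-Content read the whole file (and release it)
# before Set-Content opens the same path for writing.
(Get-Content $demoFile) -replace '</tr><tr>', "</tr>`r`n<tr>" -replace '<tr>.*BTE.*</tr>' |
    Set-Content $demoFile

# Read the rewritten file back, then clean up the scratch file.
$inPlaceResult = Get-Content $demoFile
Remove-Item $demoFile
```

Here $inPlaceResult should no longer contain the BTE row, while the other rows end up on separate lines.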

I don't think you have to insert the line break if RegEx is able to capture the group like this.
Get-ChildItem *.htm | Foreach-Object {
$singleFileName = $_.FullName
([RegEx]::Matches((Get-Content $singleFileName),'<tr>.*?</tr>')).Value|?{$_ -notlike '<tr>*BTE*</tr>'} | Set-Content $singleFileName
}
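One possible variation, not taken from either answer: replacing a whole row with an empty string leaves a blank line behind, so splitting the rows and filtering them with Where-Object may be preferable. This is only a sketch on an in-memory string; $html and its contents are assumptions for illustration.

```powershell
# Sample markup standing in for one line of an .htm file (assumed for illustration).
$html = '<tr>keep</tr><tr>x BTE y</tr><tr>also keep</tr>'

# Insert a newline between adjacent rows, then split on it so each row is its own string.
$rows = ($html -replace '</tr><tr>', "</tr>`n<tr>") -split "`n"

# Keep only the rows that do not contain BTE; no blank lines are left over.
$kept = $rows | Where-Object { $_ -notmatch '<tr>.*BTE.*</tr>' }
```

To apply this to real files, $html would come from (Get-Content $singleFileName) and $kept would be piped to Set-Content, as in the first answer.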

Related

PowerShell replace unknown 3 letter word after operator [duplicate]

I have a simple textfile and I need a powershell script to replace some parts of the file content.
My current script is the following:
$content = Get-Content -path "Input.json"
$content -Replace '"(\d+),(\d{1,})"', '$1.$2' | Out-File "output.json"
Is it possible to write it in one line without the content variable, like this?
Get-Content -path "Input.json" | ??? -Replace '"(\d+),(\d{1,})"', '$1.$2' | Out-File "output.json"
I don't know how I can use the output of the first Get-Content cmdlet in the second command without the $content variable. Is there an automatic PowerShell variable?
Is it possible to do more replacements than one in a pipeline?
Get-Content -path "Input.json" | ??? -Replace '"(\d+),(\d{1,})"', '$1.$2' | ??? -Replace 'second regex', 'second replacement' | Out-File "output.json"
Yes, you can do that in one line and don't even need a pipeline, as -replace works on arrays like you would expect it to do (and you can chain the operator):
(Get-Content Input.json) `
-replace '"(\d+),(\d{1,})"', '$1.$2' `
-replace 'second regex', 'second replacement' |
Out-File output.json
(Line breaks added for readability.)
The parentheses around the Get-Content call are necessary to prevent the -replace operator being interpreted as an argument to Get-Content.
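A small sketch of the chained form on an in-memory array, so it can be tried without any files. The sample lines and the second pattern ('foo' to 'bar') are assumptions for illustration only.

```powershell
# Sample lines standing in for the contents of Input.json (assumed for illustration).
$sampleLines = '"price": "12,50"', '"name": "foo"'

# -replace is applied to every element of the array, and the operators chain left to right.
$chained = $sampleLines -replace '"(\d+),(\d{1,})"', '$1.$2' -replace 'foo', 'bar'
```

Note that the first pattern includes the surrounding double quotes, so they are dropped from the replaced value.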
Is it possible to write it in one line without the content variable, like this?
Yes: use ForEach-Object (or its alias %) and then $_ to reference the object on the pipeline:
Get-Content -path "Input.json" | % { $_ -Replace '"(\d+),(\d{1,})"', '$1.$2' } | Out-File "output.json"
Is it possible to do more replacements than one in a pipeline?
Yes.
As above: just add more ForEach-Object segments.
As -replace returns the result, they can be chained in a single expression:
($_ -replace $a,$b) -replace $c,$d
The parentheses are not required, because chained -replace operators are evaluated left to right, but I think they make the expression easier to read. More than a few chained operators (especially if the patterns or replacements are non-trivial) will still be hard to follow.
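Chained -replace operators are evaluated left to right, so the two spellings should behave the same. A quick check, with a made-up input string:

```powershell
# Made-up input; any string would do.
$line = 'a-b-c'

# Explicit grouping versus relying on left-to-right evaluation.
$withParens    = ($line -replace 'a', 'x') -replace 'c', 'z'
$withoutParens = $line -replace 'a', 'x' -replace 'c', 'z'
```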

Remove lines from file if do not match regular expression

For every file in a directory I wish to remove lines that match a regular expression (beginning with |B for example) using powershell.
I think I can do this via Get-ChildItem on the directory, foreach-object, get-content and some sort of if -match but I'm really struggling to fit it all together.
Any help would be massively appreciated. This is the first time I've ever written a powershell script.
Something like the below should get you going in the right direction:
$files = Get-ChildItem "C:\your\dir"
foreach ($file in $files) {
$c = Get-Content $file.fullname | where { $_ -notmatch "^\|B" }
$c | Set-Content $file.fullname
}
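The filter itself can be tried on an in-memory array first. The sample lines here are assumptions for illustration; the pattern is the one from the answer, written in single quotes.

```powershell
# Sample lines (assumed); only the first one begins with |B.
$lines = '|B drop me', 'keep me', '|A keep me too'

# -notmatch keeps every line that does NOT start with a literal |B.
# The backslash escapes the pipe, which would otherwise mean alternation in a regex.
$kept = $lines | Where-Object { $_ -notmatch '^\|B' }
```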

PowerShell regex export match contents

I am learning regex and am trying to get a better understanding by using a text file with the value $100,000 in it. What I am trying to do is search the text file for the string "$100,000" and, if it is there, export the value into a new CSV. This is what I'm using so far:
[io.file]::readalltext("c:\utilities\notes_$datetime.txt") -match("[$][0-9][0-9][0-9],[0-9][0-9][0-9]") | Out-File C:\utilities\amount.txt -Encoding ascii -Force
Which returns true. Can someone point me in the right direction as to grabbing the string value that it finds into a new CSV?
many thanks!
You're reading the file into a single string, not an array of lines, so you should use the Select-String -AllMatches instead of the -match operator:
[IO.File]::ReadAllText("c:\utilities\notes_$datetime.txt") |
Select-String '\$\d{3},\d{3}' -AllMatches |
% { $_.Matches.Groups.Value } |
Out-File C:\utilities\amount.txt -Encoding ascii -Force
As a side note, using Get-Content -Raw would be slightly more PoSh than using .Net methods, although .Net methods provide better performance.
Get-Content "c:\utilities\notes_$datetime.txt" -Raw |
Select-String '\$\d{3},\d{3}' -AllMatches |
% { $_.Matches.Groups.Value } |
Out-File C:\utilities\amount.txt -Encoding ascii -Force
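For context on why the original command wrote True: with a single string on the left, -match returns a Boolean and stores the first match in the automatic $Matches variable. A minimal sketch on a made-up string (the text in $text is an assumption for illustration):

```powershell
# Made-up text; single quotes keep PowerShell from treating $100 as a variable.
$text = 'Invoice total: $100,000 due; deposit $250,000'

# -match only reports success; the matched text of the first hit lands in $Matches[0].
if ($text -match '\$\d{3},\d{3}') {
    $firstAmount = $Matches[0]
}

# [regex]::Matches returns every hit, not just the first one.
$allAmounts = [regex]::Matches($text, '\$\d{3},\d{3}') | ForEach-Object { $_.Value }
```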
I prefer to use [regex]::match for that:
$x = 'text bla $100,000 text text'
[regex]::Match($x,"\$[\d]{3},[\d]{3}").Groups[0].Value
I also changed the expression a little bit ($ followed by 3 digits, followed by a "," and another 3 digits).
So your script could look like this:
$fileContent = Get-Content "c:\utilities\notes_$datetime.txt"
[regex]::Match($fileContent,"\$[\d]{3},[\d]{3}").Groups[0].Value | Out-File C:\utilities\amount.txt -Encoding ascii -Force
Why not use the Select-String cmdlet - far easier:
Select-String .\infile.csv -pattern '\$[\d]{3},[\d]{3}' | Select Line | Out-File outfile.txt
You can then process multiple files like so:
Get-Childitem *.csv | Select-String -pattern '\$[\d]{3},[\d]{3}' | Select Line | Out-File outfile.txt
The match objects that Select-String returns have, among others, the following properties:
Line - the line where the regex found a match
LineNumber - the line number in the file where the match was found
Filename - the name of the file the match was found in
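A short sketch that reads those three properties from a match. The scratch file name and its two lines are assumptions for illustration.

```powershell
# Hypothetical scratch file with two lines; only the second contains an amount.
$sample = Join-Path ([IO.Path]::GetTempPath()) 'select-string-demo.csv'
Set-Content -Path $sample -Value 'no amount here', 'total,$100,000'

# Select-String returns one match object per matching line.
$hit = Select-String -Path $sample -Pattern '\$\d{3},\d{3}'

# Combine the three properties described above into one summary string.
$summary = '{0}:{1}: {2}' -f $hit.Filename, $hit.LineNumber, $hit.Line
Remove-Item $sample
```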

Find combinations of two strings in multiple files

I want to search through files in a folder and find the following strings in each file and I want to output it to a file. I would like to find a combination of 2 strings in the files no matter how it is written in the file. I should be able to find these combination of strings even if a carriage return exists in the middle of these 2 strings.
Here's the code I have so far:
$Path = "C:\Promotion\Scripts"
$txt_string1 = "CREATE"
$txt_string2 = "PROC"
$PathArray = @()
$Results = "C:\Promotion\Errors\Deployment_Errors.txt"
# This code snippet gets all the files in $Path that end in ".sql".
Get-ChildItem $Path -Filter "*.sql" |
Where-Object { $_.Attributes -ne "Directory"} |
ForEach-Object {
If (Get-Content $_.FullName | Select-String -Pattern $txt_string2) {
$PathArray += $_.FullName
}
}
$PathArray | ForEach-Object {$_} | Out-File $Results
To find more than one string in a text file, you can use a pattern like this:
"hello","guy","hello guy" | Select-String -Pattern '(hello.*guy)|(guy.*hello)'
The result:
hello guy
Once you have found the strings, pipe them to Out-File, like this:
"hello","guy","hello guy" | Select-String -Pattern '(hello.*guy)|(guy.*hello)' | Out-File -FilePath c:\test.txt
Now test.txt contains:
PS C:\> Get-Content test.txt
hello guy
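You can do this without loops. Define the combinations of your two search terms as alternatives in a regular expression with the (?ms) inline options enabled. The s (single-line) option is the one that matters here: it lets . match line breaks, so the two terms can be on different lines.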
$basepath = 'C:\Promotion\Scripts'
$results = 'C:\Promotion\Errors\Deployment_Errors.txt'
$term1 = 'CREATE'
$term2 = 'PROC'
$pattern = "(?ms)($term1.*$term2|$term2.*$term1)"
Get-ChildItem "$basepath\*.sql" |
? { Get-Content $_.FullName -Raw | Select-String -Pattern $pattern } |
select -Unique -Expand FullName |
Out-File $results
Note that this will report any file that contains both terms anywhere in it, no matter what other text is between them. If you want to find only files that contain combinations of the two terms either not separated (CREATEPROC or PROCCREATE) or separated by nothing but whitespace, change the pattern to this:
$pattern = "(?ms)($term1\s*$term2|$term2\s*$term1)"
Depending on your search terms it may also be a good idea to escape them before building the regular expression, so that you don't get unwanted meta characters (not likely with the two string literals you have, but just to be on the safe side):
$term1 = [regex]::Escape('CREATE')
$term2 = [regex]::Escape('PROC')
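To see the effect of the s option, the pattern can be tested against a string with a line break between the two terms. This is only a sketch: the $sql text is a made-up sample, and the pattern uses (?s) alone because the m option is not needed for this check.

```powershell
# Escaped search terms, as suggested above.
$term1 = [regex]::Escape('CREATE')
$term2 = [regex]::Escape('PROC')

# Made-up SQL text with a carriage return and line feed between the two terms.
$sql = "CREATE`r`nPROC dbo.Demo"

# With (?s), . also matches the line feed, so the terms are found across lines.
$withSingleline = $sql -match "(?s)($term1.*$term2|$term2.*$term1)"

# Without it, . stops at the line feed and neither alternative matches.
$withoutSingleline = $sql -match "($term1.*$term2|$term2.*$term1)"

# Escape matters once a term contains regex metacharacters.
$escapedExample = [regex]::Escape('a.b')
```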

Use powershell ForEach-Object to match and replace string with regex

I use the below pipeline to read a file and replace a line in it and save it to another file, but found that the string in target file is not replaced, it's still the old one.
original line is : name-1a2b3c4d
new line should be: name-6a5e4r3h
(Get-Content "test1.xml") | ForEach-Object {$_ -replace '^name-.*$', "name-6a5e4r3h"} | Set-Content "test2.xml"
Anything missing there?
One thing you're missing is that the -replace operator works just fine on an array, which means you don't need that foreach-object loop at all:
(Get-Content "test1.xml") -replace '^name-.*$', 'name-6a5e4r3h' | Set-Content test2.xml
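A sketch of the array form on in-memory lines. The sample lines are assumptions for illustration. The last statement shows one possible reason a replacement like this can appear to do nothing: the ^ anchor does not match when the line starts with whitespace, which is only a guess about the real file.

```powershell
# Sample lines standing in for test1.xml (assumed for illustration).
$xmlLines = '<root>', 'name-1a2b3c4d', '</root>'

# -replace runs on each element; only the line that starts with name- changes.
$updated = $xmlLines -replace '^name-.*$', 'name-6a5e4r3h'

# With leading whitespace the anchored pattern does not match, so the line is returned unchanged.
$indented = '  name-1a2b3c4d' -replace '^name-.*$', 'name-6a5e4r3h'
```

If indentation turns out to be the cause, a pattern such as '^\s*name-.*$' would match, at the cost of dropping the indentation unless it is captured and reinserted.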
You're not changing the $_ variable.
You might try:
$lines = Get-Content $file
$len = $lines.count
for ($i = 0; $i -lt $len; $i++) {
$lines[$i] = $lines[$i] -replace $bad, $good
}
$lines > $outfile