Find multiple lines spanning text and replace using PowerShell - regex

I am using a regular expression search to match up and replace some text. The text can span multiple lines (may or may not have line breaks).
Currently I have this:
$regex = "\<\?php eval.*?\>"
Get-ChildItem -exclude *.bak | Where-Object {$_.Attributes -ne "Directory"} |ForEach-Object {
$text = [string]::Join("`n", (Get-Content $_))
$text -replace $RegEx ,"REPLACED"}

Try this:
$regex = New-Object Text.RegularExpressions.Regex "\<\?php eval.*?\>", ('singleline', 'multiline')
Get-ChildItem -exclude *.bak |
Where-Object {!$_.PsIsContainer} |
ForEach-Object {
$text = (Get-Content $_.FullName) -join "`n"
$regex.Replace($text, "REPLACED")
}
A regular expression is explicitly created via New-Object so that options can be passed in.

Try changing your regex pattern to:
"(?s)\<\?php eval.*?\>"
to get singleline (dot matches any char including line terminators). Since you aren't using the ^ or $ metacharacters I don't think you need to specify multiline (^ & $ match embedded line terminators).
Update: -replace is case-insensitive by default, so the i option isn't needed.
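To make the effect of (?s) concrete, here is a minimal sketch on an in-memory string rather than on files. The sample text is made up for illustration, and the pattern is written without the optional backslashes before < and >.

```powershell
# Hypothetical sample: the PHP snippet spans two lines (`n is a newline).
$text = "before <?php eval(`n  'x'); ?> after"

# Without (?s), '.' stops at the newline, so nothing matches and the text is unchanged.
$withoutS = $text -replace '<\?php eval.*?>', 'REPLACED'

# With (?s), '.' also matches the newline, so the whole snippet is replaced.
$withS = $text -replace '(?s)<\?php eval.*?>', 'REPLACED'
$withS    # before REPLACED after

# -replace ignores case by default; -creplace is the case-sensitive variant.
$upper = '<?PHP EVAL(1); ?>'
$upper -replace  '<\?php eval.*?>', 'REPLACED'   # REPLACED
$upper -creplace '<\?php eval.*?>', 'REPLACED'   # <?PHP EVAL(1); ?>
```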

Alternatively, use an expression such as (.|\n)+ to cross line boundaries,
since by default . doesn't match newlines.

Powershell Remove Special Character(s) from Filenames

I am looking for a way to remove several special characters from filenames via a powershell script.
My filenames look like this:
[Report]_first_day_of_month_01_(generated_by_powershell)_[repnbr1].txt
I have been puzzling over removing the [] and everything between them, the () and everything between those, and removing all the _'s as well, with the desired result being a filename that looks like this:
first day of month 01.txt
Thus far, I have tried the below solution to no avail. I have run each of these from the directory in which the files reside.
Get-ChildItem -Path .\ -Filter *.mkv | %{
$Name = $_.Name
$NewName = $Name -Replace "(\s*)\(.*\)",''
$NewName2 = $NewName -Replace "[\s*]\[.*\]",''
$NewName3 = $NewName2 -Replace "_",' '
Rename-Item -Path $_ -NewName $NewName3
}
It does not work even if I try one set at a time, like this:
Get-ChildItem -Path .\ -Filter *.mkv | %{
$Name = $_.Name
$NewName = $Name -Replace "(\s*)\(.*\)",''
Rename-Item -Path $_ -NewName $NewName
}
I assume there is an inherent flaw in the way I am trying to accomplish this task. That being said, I would prefer to use the Rename-Item cmdlet rather than using a move-item solution.
gci *.txt | Rename-Item -NewName {$_.Name -replace '_*(\[.*?\]|\(.*?\))_*' -replace '_+', ' '}
The -NewName script block uses a regex that matches [text] or (text) blocks and replaces them with nothing. Brackets and parentheses must be escaped in a regex to match them literally. The pattern also consumes any leading or trailing underscores, as in [Report]_ or _[repnbr1]; otherwise an underscore would be left at the start or end of the name and would become an annoying leading or trailing space. The second -replace then turns the remaining underscores into spaces.
See the regex working here: Regex101
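The two replacements can be checked on the sample filename from the question before touching any files. This is only a sketch on a plain string; the variable names are illustrative.

```powershell
$name = '[Report]_first_day_of_month_01_(generated_by_powershell)_[repnbr1].txt'

# Step 1: drop [..] and (..) blocks together with adjacent underscores.
# -replace with no replacement argument substitutes the empty string.
$step1 = $name -replace '_*(\[.*?\]|\(.*?\))_*'
$step1    # first_day_of_month_01.txt

# Step 2: turn each remaining run of underscores into a single space.
$step2 = $step1 -replace '_+', ' '
$step2    # first day of month 01.txt
```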

How to use regular expression matching groups in batch renames?

I'm trying to do some regular expression based bulk renames with PowerShell.
This successfully gives me only the files I need:
Get-ChildItem . | Where-Object { $_.Name -cmatch "(\b|_)(L|H|M|X{1,3})(_|\b)" }
(all those that contain an uppercase L, M, X, ...)
Next, I want to rename, i.e. mycustom_M.png to processed_M.png, another_L.png to processed_L.png, and so forth.
Basically, I would use the regexp .*?(?:\b|_)(L|H|M|X{1,3})(?:_|\b).* to select the names, and processed_\1.png to replace them if I was in Notepad++, but I can't get it to work in PowerShell (I'm surely missing the right syntax here):
[...] | Rename-Item -NewName { $_.Name -replace ".*?(?:\b|_)(L|H|M|X{1,3})(?:_|\b).*","banner_$Matches.groups[1].value" }
In a -replace replacement string, backreferences start with a $, not a \. However, you must either put the replacement string in single quotes or escape the $; otherwise PowerShell would expand $1 as a regular variable before the regex engine sees it:
$pattern = ".*?(?:\b|_)(L|H|M|X{1,3})(?:_|\b).*"
... | Rename-Item -NewName { $_.Name -replace $pattern, 'banner_$1' }
or
$pattern = ".*?(?:\b|_)(L|H|M|X{1,3})(?:_|\b).*"
... | Rename-Item -NewName { $_.Name -replace $pattern, "banner_`$1" }
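The quoting difference is easy to see on a plain string. The sketch below applies the pattern to one of the question's filenames and assumes that no PowerShell variable named 1 is defined and that strict mode is off. Note that the pattern's trailing .* also consumes the extension, so the extension has to be written into the replacement if it should survive.

```powershell
$pattern = '.*?(?:\b|_)(L|H|M|X{1,3})(?:_|\b).*'
$name    = 'mycustom_M.png'

# Single quotes: the regex engine receives $1 and substitutes the first group.
$single = $name -replace $pattern, 'banner_$1'
$single       # banner_M

# Double quotes with an escaped dollar sign: same result.
$escaped = $name -replace $pattern, "banner_`$1"
$escaped      # banner_M

# Double quotes, unescaped: PowerShell expands the (empty) variable $1 first.
$unescaped = $name -replace $pattern, "banner_$1"
$unescaped    # banner_

# The extension was matched by the trailing .*, so add it back explicitly.
$withExt = $name -replace $pattern, 'processed_$1.png'
$withExt      # processed_M.png
```

Because -replace ignores case, a lowercase l, h, m or x standing alone between underscores or word boundaries would also match; -creplace would keep the rename consistent with the -cmatch filter in the question.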

Remove lines from file if do not match regular expression

For every file in a directory I wish to remove lines that match a regular expression (beginning with |B for example) using powershell.
I think I can do this via Get-ChildItem on the directory, foreach-object, get-content and some sort of if -match but I'm really struggling to fit it all together.
Any help would be massively appreciated. This is the first time I've ever written a powershell script.
Something like the below should get you headed in the right direction:
$files = Get-ChildItem "C:\your\dir"
foreach ($file in $files) {
    # Keep only the lines that do not start with |B
    $c = Get-Content $file.FullName | Where-Object { $_ -notmatch "^\|B" }
    # Write the filtered lines back to the same file
    $c | Set-Content $file.FullName
}
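The filter itself can be tried on an in-memory array before running it against real files. The sample lines below are invented for illustration.

```powershell
# Hypothetical sample lines standing in for a file's content.
$lines = '|B drop me', 'keep me', '  |B indented, so kept', '|A kept'

# ^ anchors the match to the start of each line; \| is a literal pipe character.
$kept = $lines | Where-Object { $_ -notmatch '^\|B' }
$kept
# keep me
#   |B indented, so kept
# |A kept
```

Since -notmatch ignores case, a line starting with |b would be dropped as well; -cnotmatch makes the test case-sensitive.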

Powershell ignoring look behind regular expression to return entire line

A simple enough question I hope.
I have a text log file that includes the following line:
123,010502500114082000000009260000000122001T
I want to search through the log file and return the "00000000926" section of the above text. So I wrote a regular expression:
(?<=123.{17}).{11}
So when the look behind text equals '123' with 17 characters, return the next 11. This works fine when tested on online regex editors. However in Powershell the entire line is returned instead of the 11 characters I want and I can't understand why.
$InputFile = get-content logfile.log
$regex = '(?<=123.{17}).{11}'
$Inputfile | select-string $regex
(entire line is returned).
Why is powershell returning the entire line?
Don't discount Select-String just yet. As Briantist says, it is doing what you asked, but you need to extract the data you actually want, which you can do in one of two ways. Select-String returns Microsoft.PowerShell.Commands.MatchInfo objects, not raw strings. We will also use Select-String's ability to take file input directly.
$InputFile = "logfile.log"
$regex = '(?<=123.{17}).{11}'
Select-string $InputFile -Pattern $regex | Select-Object -ExpandProperty Matches | Select-Object -ExpandProperty Value
Or, if you have at least PowerShell 3.0:
(Select-string $InputFile -Pattern $regex).Matches.Value
Which gives in both cases
00000009260
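The difference between the matched line and the matched text can also be seen without a file, by piping the sample line from the question into Select-String. A minimal sketch:

```powershell
$line  = '123,010502500114082000000009260000000122001T'
$regex = '(?<=123.{17}).{11}'

# Select-String emits a MatchInfo object for each matching line.
$info = $line | Select-String -Pattern $regex

$info.Line               # the whole input line, which is what gets displayed by default
$info.Matches[0].Value   # 00000009260 (only the text the regex matched)
```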
It's because you're using Select-String which returns the line that matches (think grep).
$InputFile = get-content logfile.log | ForEach-Object {
if ($_ -match '(?<=123.{17})(.{11})') {
$Matches[1]
}
}
Haven't tested this, but it should work (or something similar).
You don't really need the lookaround regex for that:
$InputFile = get-content logfile.log
$InputFile -match '123.{28}' -replace '123.{17}(.{11}).+','$1'

Powershell regex group replacing

I want to replace some text in every script file in folder, and I'm trying to use this PS code:
$pattern = '(FROM [a-zA-Z0-9_.]{1,100})(?<replacement_place>[a-zA-Z0-9_.]{1,7})'
Get-ChildItem -Path 'D:\Scripts' -Recurse -Include *.sql | ForEach-Object { (Get-Content $_.fullname) -replace $pattern, 'replace text' | Set-Content $_.fullname }
But I have no idea how to keep first part of expression, and just replace the second one. Any idea how can I do this? Thanks.
I'm not sure the provided regex for table names is correct, but in any case you can refer to captures in the replacement with $1, $2 and so on, using the following syntax: 'Doe, John' -ireplace '(\w+), (\w+)', '$2 $1'
Note that the replacement pattern either needs to be in single quotes ('') or have the $ signs of the replacement group specifiers escaped ("`$2 `$1").
# A simpler pattern may work better: $pattern = '(FROM) (?<replacement_place>[a-zA-Z0-9_.]{1,7})'
$pattern = '(FROM [a-zA-Z0-9_.]{1,100})(?<replacement_place>[a-zA-Z0-9_.]{1,7})'
Get-ChildItem -Path 'D:\Scripts' -Recurse -Include *.sql | ForEach-Object {
    $path = $_.FullName
    # The parentheses read the whole file before Set-Content rewrites it
    (Get-Content $path) | ForEach-Object { $_ -replace $pattern, '$1 replace text' } |
        Set-Content $path -Force
}
If you need to reference other variables in your replacement expression (as you may), you can use a double-quoted string and escape the capture dollars with a backtick:
{ $_-replace $pattern, "`$1 replacement text with $somePoshVariable" } |
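It can help to check what each group actually captures before rewriting files. The sketch below runs the question's pattern against a made-up SQL line; the table name is hypothetical. Because the first group is greedy, it takes as much of the name as it can and leaves only the last character for replacement_place.

```powershell
$pattern = '(FROM [a-zA-Z0-9_.]{1,100})(?<replacement_place>[a-zA-Z0-9_.]{1,7})'
$sql     = 'SELECT * FROM dbo.Orders_2019 WHERE id = 1'   # hypothetical sample line

# -match fills the automatic $Matches table with the captured groups.
$null = $sql -match $pattern
$Matches[1]                     # FROM dbo.Orders_201
$Matches['replacement_place']   # 9

# Keep group 1 and replace only what the named group matched.
$result = $sql -replace $pattern, '$1 replace text'
$result    # SELECT * FROM dbo.Orders_201 replace text WHERE id = 1
```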