PowerShell Remove Special Character(s) from Filenames - regex

I am looking for a way to remove several special characters from filenames via a PowerShell script.
My filenames look like this:
[Report]_first_day_of_month_01_(generated_by_powershell)_[repnbr1].txt
I have been puzzling over how to remove the [] and everything between them and the () and everything between those, and how to turn all the _'s into spaces, with the desired result being a filename that looks like this:
first day of month 01.txt
Thus far, I have tried the below solution to no avail. I have run each of these from the directory in which the files reside.
Get-ChildItem -Path .\ -Filter *.mkv | %{
    $Name = $_.Name
    $NewName = $Name -Replace "(\s*)\(.*\)",''
    $NewName2 = $NewName -Replace "[\s*]\[.*\]",''
    $NewName3 = $NewName2 -Replace "_",' '
    Rename-Item -Path $_ -NewName $NewName3
}
It does not work even if I try to do one replacement at a time, like this:
Get-ChildItem -Path .\ -Filter *.mkv | %{
    $Name = $_.Name
    $NewName = $Name -Replace "(\s*)\(.*\)",''
    Rename-Item -Path $_ -NewName $NewName
}
I assume there is an inherent flaw in the way I am trying to accomplish this task. That being said, I would prefer to use the Rename-Item cmdlet rather than a Move-Item solution.

gci *.txt | Rename-Item -NewName {$_.Name -replace '_*(\[.*?\]|\(.*?\))_*' -replace '_+', ' '}
The first -replace uses a regex that matches [text] or (text) blocks and replaces them with nothing. Brackets and parentheses must be escaped (\[ \] \( \)) to match them literally, and the lazy .*? stops each match at the first closing bracket, so a single match cannot run from the first [ to the last ]. The pattern also takes optional leading or trailing underscores (as in [Report]_ or _[repnbr1]); otherwise those underscores would be left at the start or end of the name and become leading or trailing spaces. The second -replace then turns each remaining run of underscores into a single space.
See the regex working here: Regex101
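To check the pattern before touching any files, the same two replacements can be applied to a plain string. A minimal sketch; the function name ConvertTo-CleanFileName is hypothetical, and the sample name is the one from the question:

```powershell
# Hypothetical helper: applies the answer's two -replace steps to a plain string.
function ConvertTo-CleanFileName {
    param([Parameter(Mandatory)][string]$Name)

    # 1. Drop [...] and (...) blocks together with any adjacent underscores.
    #    The lazy .*? stops at the first closing bracket, so blocks are removed one by one.
    # 2. Turn each remaining run of underscores into a single space.
    $Name -replace '_*(\[.*?\]|\(.*?\))_*' -replace '_+', ' '
}

ConvertTo-CleanFileName '[Report]_first_day_of_month_01_(generated_by_powershell)_[repnbr1].txt'
# first day of month 01.txt
```

With the helper defined, Rename-Item's -WhatIf switch previews the renames without performing them: gci *.txt | Rename-Item -NewName { ConvertTo-CleanFileName $_.Name } -WhatIf. Note that because underscores on both sides of a block are consumed, a block sitting between two words (e.g. a_(b)_c.txt) glues them together (ac.txt); the pattern suits names like the one in the question, where the blocks sit at the ends.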

Related

How to match first characters with a PowerShell script

I am trying to move files to a certain folder if they start with a letter and delete them if they start with anything other than a letter.
My code:
Function moveOrDelete($source, $dest)
{
    $aToZ = '^[a-zA-Z].*'
    $notALetter = '^[^a-zA-Z].*'
    Get-ChildItem -Path $source\$aToZ -Recurse | Move-Item -Destination $dest
    Get-ChildItem -Path $source\$notALetter -Recurse | Remove-Item
}
As I understand it, the caret anchors the match at the first character when it's outside the brackets. In other words, the regex in the $aToZ variable will match anything that begins with a letter, and the .* part allows the rest of the file name to be anything. The caret inside the brackets negates the class, so if the file name begins with anything other than a letter, it will match. I can't get it to work, and I'm not getting any errors, which leads me to believe that my regex is wrong.
I have checked this with online tools including this one: https://regex101.com/ and they check out.
I have also used variations of the regex like ^[a-zA-Z] that don't work. Some patterns like [a-zA-Z]* move the files but it's not the pattern that I want.
Here is how I'm calling the function:
moveOrDelete ".\source" ".\dest"
And here are the sample file names I'm using:
a.txt
z.txt
1.txt
.txt
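The -Path parameter doesn't understand regular expressions: it takes a path that may contain wildcards (*, ?, and [ ] character sets), but it does no regex processing. For example, '^[a-zA-Z].*' is read as a wildcard that only matches names beginning with a literal ^ character, so nothing matches and no error is raised.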
So, you need to check the name of each file against the regex with the -match operator. The following should help:
Function moveOrDelete($source, $dest)
{
    $aToZ = '^[a-zA-Z].*'
    $notALetter = '^[^a-zA-Z].*'
    # -File skips directories, so a folder whose name starts with a digit is not removed
    Get-ChildItem -Path $source -Recurse -File | Where-Object { $_.Name -match $aToZ } | Move-Item -Destination $dest
    Get-ChildItem -Path $source -Recurse -File | Where-Object { $_.Name -match $notALetter } | Remove-Item
}
Here, Where-Object tests each item's name with -match, and only the matching items are piped on to Move-Item or Remove-Item.
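The difference can be seen on plain strings, using the sample names from the question. A small sketch that compares the same text used once as a wildcard (with -like, assuming, per about_Wildcards, that it follows the same wildcard rules as -Path) and once as a regex (with -match); no files are needed:

```powershell
# Sample names from the question.
$names = 'a.txt', 'z.txt', '1.txt', '.txt'

# As a wildcard, the metacharacters are *, ? and [ ], so '^' and '.' are literal:
# this asks for names starting with "^", and nothing is returned (and no error).
$asWildcard = @($names | Where-Object { $_ -like '^[a-zA-Z].*' })

# As a regex, ^ anchors at the first character and [^...] negates the class.
$startsWithLetter = @($names | Where-Object { $_ -match '^[a-zA-Z].*' })    # a.txt, z.txt
$startsOtherwise  = @($names | Where-Object { $_ -match '^[^a-zA-Z].*' })   # 1.txt, .txt
```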

How to use regular expression matching groups in batch renames?

I'm trying to do some regular expression based bulk renames with PowerShell.
This successfully gives me only the files I need:
Get-ChildItem . | Where-Object { $_.Name -cmatch "(\b|_)(L|H|M|X{1,3})(_|\b)" }
(all those that contain an uppercase L, M, X, ...)
Next, I want to rename them, e.g. mycustom_M.png to processed_M.png, another_L.png to processed_L.png, and so forth.
Basically, I would use the regex .*?(?:\b|_)(L|H|M|X{1,3})(?:_|\b).* to select the names, and processed_\1.png to replace them if I were in Notepad++, but I can't get it to work in PowerShell (I'm surely missing the right syntax here):
[...] | Rename-Item -NewName { $_.Name -replace ".*?(?:\b|_)(L|H|M|X{1,3})(?:_|\b).*","banner_$Matches.groups[1].value" }
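In the replacement string of PowerShell's -replace, backreferences are written $1, $2, and so on, not \1. (The $Matches variable doesn't help here: it is populated by -match, not by -replace.) However, you must either put the replacement expression in single quotes or escape the $; otherwise PowerShell would expand $1 as a regular variable: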
$pattern = ".*?(?:\b|_)(L|H|M|X{1,3})(?:_|\b).*"
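... | Rename-Item -NewName { $_.Name -replace $pattern, 'banner_$1.png' }
or, escaping the $ instead (in both versions .png is appended because the pattern's trailing .* also consumes the extension):
$pattern = ".*?(?:\b|_)(L|H|M|X{1,3})(?:_|\b).*"
... | Rename-Item -NewName { $_.Name -replace $pattern, "banner_`$1.png" }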
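The quoting rule is easy to check on plain strings. A sketch using the question's sample names and its processed_ prefix; it uses -creplace (the case-sensitive form of -replace) to mirror the question's -cmatch filter, since a case-insensitive -replace could also capture a lowercase l, h, m, or x at a word boundary:

```powershell
$pattern = '.*?(?:\b|_)(L|H|M|X{1,3})(?:_|\b).*'

# Single quotes: '$1' reaches the regex engine untouched and means "group 1".
# The trailing .* also consumes ".png", so the extension is written back explicitly.
'mycustom_M.png' -creplace $pattern, 'processed_$1.png'     # processed_M.png

# Double quotes with an escaped `$ give the engine the same replacement text.
'another_L.png' -creplace $pattern, "processed_`$1.png"     # processed_L.png

# Unescaped in double quotes, PowerShell expands $1 as a variable first
# (normally undefined, so empty) and the group reference is lost.
'another_L.png' -creplace $pattern, "processed_$1.png"      # processed_.png
```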

Replace all but last instance of a character

I am writing a quick PowerShell script to replace all periods except the last instance.
For example:
hello. this. is a file.name.doc → hello this is a filename.doc
So far, from another post, I was able to get this regex, but it does not work with PowerShell:
\.(?=[^.]*\.)
As per https://www.regex101.com/, it only matches the first occurrence of a period.
EDIT:
Basically I need to apply this match and replace to a directory with sub directories. So far I have this:
Get-ChildItem -Filter "*.*" | ForEach {
    $_.BaseName.Replace('.','') + $_.Extension
}
But it does not actually replace the items, and I do not think it is recursive.
I tried a few variations:
Get-Item -Filter "*.*" -Recurse |
Rename-Item -NewName {$_.BaseName.Replace(".","")}
but I get the error message
source and destination path must be different
I had the PowerShell side of things working but was stuck on the regex part. I was able to match either all the "." characters or only the last ".", which was part of the file extension. Then I found this post with the missing link: \.(?=[^.]*\.)
I added that to the rest of the PowerShell command, and it worked perfectly.
Get-ChildItem -File -Recurse | Rename-Item -NewName {$_.Name -replace '\.(?=[^.]*\.)','' }
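Files with no period in their base name would be "renamed" to the same name, which can trigger the "source and destination path must be different" error, so exclude them first: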
Get-ChildItem -File -Recurse | Where-Object { $_.BaseName -like '*.*' } |
Rename-Item -NewName {$_.BaseName.Replace('.', '') + $_.Extension}
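Both approaches can be compared on the question's sample name without renaming anything. A minimal sketch; it uses the .NET [System.IO.Path] helpers to split the name so that it runs on a plain string rather than on a FileInfo object:

```powershell
$name = 'hello. this. is a file.name.doc'

# Regex: remove every "." that has another "." somewhere after it,
# i.e. every period except the last one.
$viaRegex = $name -replace '\.(?=[^.]*\.)', ''

# BaseName/Extension style: strip all periods from the part before the extension.
$base = [System.IO.Path]::GetFileNameWithoutExtension($name)   # hello. this. is a file.name
$ext  = [System.IO.Path]::GetExtension($name)                  # .doc
$viaSplit = $base.Replace('.', '') + $ext

$viaRegex   # hello this is a filename.doc
$viaSplit   # hello this is a filename.doc
```

Against real files, adding -WhatIf to Rename-Item previews each rename first, e.g. Get-ChildItem -File -Recurse | Where-Object { $_.BaseName -like '*.*' } | Rename-Item -NewName { $_.Name -replace '\.(?=[^.]*\.)', '' } -WhatIf.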

Remove lines from a file if they do not match a regular expression

For every file in a directory, I wish to remove the lines that match a regular expression (beginning with |B, for example) using PowerShell.
I think I can do this with Get-ChildItem on the directory, ForEach-Object, Get-Content, and some sort of if -match, but I'm really struggling to fit it all together.
Any help would be massively appreciated. This is the first time I've ever written a PowerShell script.
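Something like the following should get you going in the right direction:
$files = Get-ChildItem "C:\your\dir" -File
foreach ($file in $files) {
    # Keep only the lines that do NOT start with "|B" ("|" is escaped because it means "or" in a regex)
    $c = Get-Content $file.FullName | Where-Object { $_ -notmatch "^\|B" }
    $c | Set-Content $file.FullName
}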
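The filtering step can be tried on an in-memory array before pointing it at real files. A short sketch with made-up sample lines; it relies on the fact that -notmatch, applied to an array, returns the elements that do not match:

```powershell
# Made-up sample lines standing in for one file's content.
$lines = @(
    'header'
    '|B this line should go'
    'data row'
    'a |B in the middle is kept'
)

# On an array, -notmatch returns only the elements that do NOT match,
# so this keeps every line that does not begin with "|B".
$kept = $lines -notmatch '^\|B'
$kept

# The backslash matters: unescaped, "|" is regex alternation, so '^|B' means
# "start of string, or a B" and matches every line (nothing would be kept).
$unescaped = @($lines -notmatch '^|B')
$unescaped.Count   # 0
```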

Powershell regex group replacing

I want to replace some text in every script file in a folder, and I'm trying to use this PowerShell code:
$pattern = '(FROM [a-zA-Z0-9_.]{1,100})(?<replacement_place>[a-zA-Z0-9_.]{1,7})'
Get-ChildItem -Path 'D:\Scripts' -Recurse -Include *.sql | ForEach-Object { (Get-Content $_.fullname) -replace $pattern, 'replace text' | Set-Content $_.fullname }
But I have no idea how to keep the first part of the expression and replace only the second one. Any idea how I can do this? Thanks.
I'm not sure the provided regex for table names is correct, but in any case you can refer to captures in the replacement string as $1, $2, and so on, with the following syntax: 'Doe, John' -ireplace '(\w+), (\w+)', '$2 $1'
Note that the replacement pattern either needs to be in single quotes ('') or have the $ signs of the replacement group specifiers escaped ("`$2 `$1").
# possibly a better pattern: $pattern = '(FROM) (?<replacement_place>[a-zA-Z0-9_.]{1,7})'
$pattern = '(FROM [a-zA-Z0-9_.]{1,100})(?<replacement_place>[a-zA-Z0-9_.]{1,7})'
Get-ChildItem -Path 'D:\Scripts' -Recurse -Include *.sql | ForEach-Object {
    # Read the whole file first (parentheses), then write the edited lines back
    (Get-Content $_.FullName) |
        ForEach-Object { $_ -replace $pattern, '$1 replace text' } |
        Set-Content $_.FullName -Force
}
If you need to reference other variables in your replacement expression (as you may), you can use a double-quoted string and escape the capture-group dollar signs with a backtick:
{ $_ -replace $pattern, "`$1 replacement text with $somePoshVariable" } |
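The capture references can be tried on a single made-up line. A sketch with a simplified, hypothetical pattern (not the question's), showing a numbered group, a named group referenced as ${name}, and the backtick-escaped double-quoted form combined with a variable:

```powershell
# Made-up sample line and a simplified pattern with one numbered and one named group.
$line    = 'SELECT * FROM dbo.Orders_2019'
$pattern = '(FROM )(?<table>[A-Za-z0-9_.]+)'

# Keep group 1 ("FROM ") and replace the table name. ${1} marks clearly where the
# group number ends when the following text starts with a digit.
$line -replace $pattern, '${1}2020_orders'            # SELECT * FROM 2020_orders

# A named group is referenced as ${name}.
$line -replace $pattern, '$1archive.${table}'         # SELECT * FROM archive.dbo.Orders_2019

# Double quotes: PowerShell expands $schema, while the backtick-escaped `$ signs
# are passed through to the regex engine as group references.
$schema = 'archive'
$line -replace $pattern, "`$1$schema.`${table}"      # SELECT * FROM archive.dbo.Orders_2019
```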