Regex to exclude something and bulk rename files

I'm trying to rename all the files in a directory, keeping one part of each file name and removing the rest.
For example:
Before --> After
file1: Something S01E01 Hello There Guys.srt --> S01E01.srt
file2: Something_else S03E22 Good.bye.srt --> S03E22.srt
etc.
I tried the following code in PowerShell:
Get-ChildItem | rename-item -NewName {$_.name -replace "Something",""}
Get-ChildItem | rename-item -NewName {$_.name -replace "Good.bye",""}
Get-ChildItem | rename-item -NewName {$_.name -replace "Something_else",""}
Get-ChildItem | rename-item -NewName {$_.name -replace " Hello(.*?)\.srt",".srt"}
Get-ChildItem | rename-item -NewName {$_.name -replace " ",""}
Is there a regex that keeps the "SxxExx.srt" part of the file name and removes everything else, instead of hardcoding each replacement?

You'll want to use a pattern like this:
(S\d{2}E\d{2})
to match and capture the S01E01 part.
Get-ChildItem | Rename-Item -NewName {($_.BaseName -replace '^.*(S\d{2}E\d{2}).*$','$1') + $_.Extension}
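To check the pattern before touching any files, the same replacement can be tried on plain strings. This is a minimal sketch using the two sample names from the question; the variable names ($names, $renamed) are only illustrative, and the [System.IO.Path] calls stand in for the BaseName and Extension properties of a real file object.

```powershell
# Sample names taken from the question; no files are touched here.
$names = 'Something S01E01 Hello There Guys.srt',
         'Something_else S03E22 Good.bye.srt'

$renamed = foreach ($name in $names) {
    # Split the name the same way BaseName/Extension would on a FileInfo object.
    $base = [System.IO.Path]::GetFileNameWithoutExtension($name)
    $ext  = [System.IO.Path]::GetExtension($name)

    # Replace the whole base name with the captured SxxExx group.
    ($base -replace '^.*(S\d{2}E\d{2}).*$', '$1') + $ext
}

$renamed
```

On real files, adding -WhatIf to Rename-Item previews the renames without performing them.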

Maybe,
\b[A-Z][0-9]+[A-Z][0-9]+\b\s*|\.srt
or,
\b[A-Z][0-9]+[A-Z][0-9]+\b\s*|\.srt[^\r\n]*
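or a similar expression might be close. Note that these patterns match the SxxExx token and the .srt extension, i.e. the parts to keep, so they are suited to extracting those parts rather than to replacing them with an empty string.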

Related

Renaming files with a plus sign in PowerShell

When I've downloaded a bunch of files from dropbox, all Swedish character ä becomes +ñ. I'd like to replace this +ñ to ä.
My command is the following:
Get-ChildItem -Filter "*+ñ*" -Recurse | Rename-Item -NewName {$_.name -replace '"+ñ"','ä'}
But running this gives the following error message:
Rename-Item : The input to the script block for parameter 'NewName' failed. Invalid regular expression pattern: +ñ.
At line:1 char:60
+ Get-ChildItem -Filter "*+ñ*" -Recurse| Rename-Item -NewName <<<< {$_.name -replace $str1,"ä"}
+ CategoryInfo : InvalidArgument: (S+ñker.txt:PSObject) [Rename-Item], ParameterBindingException
+ FullyQualifiedErrorId : ScriptBlockArgumentInvocationFailed,Microsoft.PowerShell.Commands.RenameItemCommand
So I've boiled it down to the + character being the problem. How do I handle + and other characters that aren't automatically handled in PowerShell?
The -replace operator does a regex search. Since + is a quantifier you have to escape it using a backslash:
Get-ChildItem -Filter "*+ñ*" -Recurse | Rename-Item -NewName {$_.Name -replace '\+ñ','ä'}
You could also use the non-regex String.Replace() method:
Get-ChildItem -Filter "*+ñ*" -Recurse | Rename-Item -NewName {$_.Name.Replace('+ñ','ä')}
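For the "other types of characters" part of the question, one option is to let .NET do the escaping with [regex]::Escape, which backslash-escapes every regex metacharacter in a literal string. A small sketch on an in-memory name (the file name comes from the error message above; the variable names are illustrative):

```powershell
# Literal text to search for; + is a regex quantifier and must be escaped.
$literal = '+ñ'

# [regex]::Escape returns the pattern with metacharacters escaped.
$escaped = [regex]::Escape($literal)

# Try the replacement on a plain string instead of a real file.
$newName = 'S+ñker.txt' -replace $escaped, 'ä'
$newName
```

The escaped pattern can then be used in the Rename-Item script block in place of a hand-escaped one.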

match multi-line string

I am using a PowerShell command to find all *.vue files (it's a simple text format) in a directory, where I need to match this:
7,Id
6,Default
So, these are 2 consecutive lines. With Notepad++ I see CRLF at the end of the line. Following Google searches, this must be close:
Get-ChildItem "D:\Wim\TM1\TI processes" -Filter *.vue -Recurse |
Select-String -Pattern "7,Id\r\n6,Default" -CaseSensitive |
Out-File C:\test.txt
But it does not find the files. I checked that I can find the first part (7,Id) correctly, and also the second part (6,Default), but the combination with the newline is not working.
Any ideas please? Maybe an alternative?
I can have a workaround but it's inefficient and a lot of coding. For example, I could use PowerShell to provide a list of only the first sentence, then process these files to see if it matches the second sentence as well. I want to avoid that.
You need to pass the content of the file as a single string, otherwise Select-String will apply the pattern to each line separately.
Get-ChildItem "D:\Wim\TM1\TI processes" -Filter *.vue -Recurse | ForEach-Object {
    Get-Content $_.FullName | Out-String |
        Select-String -Pattern "7,Id\r\n6,Default" -CaseSensitive |
        Select-Object -Expand Matches |
        Select-Object -Expand Groups |
        Select-Object -Expand Value
} | Out-File C:\test.txt
On PowerShell v3 and newer you can use Get-Content -Raw instead of Get-Content | Out-String.
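The difference between line-by-line and whole-text matching can be seen without any files. In this sketch the array $lines plays the role of what Get-Content returns by default, and $text plays the role of Get-Content -Raw; the sample lines other than 7,Id and 6,Default are made up for illustration.

```powershell
# Simulated file content: one array element per line.
$lines   = '5,Name', '7,Id', '6,Default', '8,Other'
$pattern = '7,Id\r\n6,Default'

# Line by line: no single element contains a CRLF, so nothing matches.
$perLine = $lines | Select-String -Pattern $pattern -CaseSensitive
$perLineFound = [bool]$perLine

# One string with CRLF line endings: the pattern can span both lines.
$text  = $lines -join "`r`n"
$whole = $text | Select-String -Pattern $pattern -CaseSensitive
$wholeFound = [bool]$whole

"Per line: $perLineFound; whole text: $wholeFound"
```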
As an alternative to Select-String you could use the -cmatch operator in a Where-Object filter:
Get-ChildItem "D:\Wim\TM1\TI processes" -Filter *.vue -Recurse | ForEach-Object {
    Get-Content $_.FullName | Out-String | Where-Object {
        $_ -cmatch "7,Id\r\n6,Default"
    } | ForEach-Object {
        $matches[0]
    }
} | Out-File C:\test.txt
With Select-String, the -Pattern parameter is regex capable, so try this:
Get-ChildItem "D:\Wim\TM1\TI processes" -Filter *.vue -Recurse |
Select-String -Pattern "7,Id|6,Default" -CaseSensitive |
Out-File C:\test.txt
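The vertical bar (|) is the alternation operator, in other words an "or", so the pattern matches any line containing either string. Note that this reports each matching line on its own; it does not require the two lines to be consecutive.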

Replace all but last instance of a character

I am writing a quick PowerShell script to replace all periods except the last instance.
EG:
hello. this. is a file.name.doc → hello this is a filename.doc
So far from another post I was able to get this regexp, but it does not work with PowerShell:
\.(?=[^.]*\.)
As per https://www.regex101.com/, it only matches the first occurrence of a period.
EDIT:
Basically I need to apply this match and replace to a directory with sub directories. So far I have this:
Get-ChildItem -Filter "*.*" | ForEach {
$_.BaseName.Replace('.','') + $_.Extension
}
But it does not actually replace the items, and I do not think it is recursive.
I tried a few variations:
Get-Item -Filter "*.*" -Recurse |
Rename-Item -NewName {$_.BaseName.Replace(".","")}
but I get the error message
source and destination path must be different
I had the PowerShell side of things working but was stuck on the RegEx part. I was able to match either all the "." or only the last "." which was part of the file extension. Then I found this post with the missing link: \.(?=[^.]*\.)
I added that to the rest of the PowerShell command and it worked perfectly.
Get-ChildItem -Recurse | Rename-Item -NewName {$_.Name -replace '\.(?=[^.]*\.)','' }
Exclude files that don't have a period in their basename from being renamed:
Get-ChildItem -File -Recurse | Where-Object { $_.BaseName -like '*.*' } |
Rename-Item -NewName {$_.BaseName.Replace('.', '') + $_.Extension}
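The lookahead pattern from the question does work with PowerShell's -replace operator: it matches a period only when another period follows later in the string, so the last one survives. A quick sketch on plain strings; the first sample is the one from the question, the other two are hypothetical names added for illustration.

```powershell
# Matches a period only if another period appears later in the string.
$pattern = '\.(?=[^.]*\.)'

$samples = 'hello. this. is a file.name.doc',   # from the question
           'report.final.v2.txt',               # hypothetical
           'plain.txt'                          # hypothetical, only one period

# Remove every matched period; the final one is never matched.
$results = $samples | ForEach-Object { $_ -replace $pattern, '' }
$results
```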

Specific search in powershell

I need to search file names in a directory for characters at certain positions. I am looking for files with parentheses within parentheses.
like this:
# 2262281102-03_Cutting_Plate_Lower_Stop_(Anschlag_Cutting_Frame_(Schnittgestell)_unten)_400kN
GET-CHILDITEM C:\BU\p -recurse | WHERE-OBJECT {$_.nAME -MATCH "(?!)((?!)((!?))(!?))(!?)"}
I also need to match any file with 4+ letters and no parentheses, i.e.:
# 2277131504-03_Haltebolzen_platte
GET-CHILDITEM C:\BU\p -EXCLUDE "*)*" -recurse | WHERE-OBJECT {$_.nAME -MATCH "\W\.[^\W]"}
I've got this:
$tests = @(
'2262281102-03_Cutting_Plate_Lower_Stop_(Anschlag_Cutting_Frame_(Schnittgestell)_unten)_400kN',
'2277131504-03_Haltebolzen_platte'
)
$regex = '^.*\(.*\(.*\).*\).*$|^[^()]*[a-z]{4}[^()]*$'
$tests -match $regex
2262281102-03_Cutting_Plate_Lower_Stop_(Anschlag_Cutting_Frame_(Schnittgestell)_unten)_400kN
2277131504-03_Haltebolzen_platte
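The pattern has two alternatives: the first requires two opening parentheses followed by two closing ones, the second requires four consecutive letters and no parentheses at all. A sketch that also checks two names that should be rejected; those two names are hypothetical and not from the question. Note that -match is case-insensitive by default, so [a-z] also accepts uppercase letters.

```powershell
$regex = '^.*\(.*\(.*\).*\).*$|^[^()]*[a-z]{4}[^()]*$'

$candidates = @(
    '2262281102-03_Cutting_Plate_Lower_Stop_(Anschlag_Cutting_Frame_(Schnittgestell)_unten)_400kN',
    '2277131504-03_Haltebolzen_platte',
    '1234-56_(abc)',   # hypothetical: only one pair of parentheses
    '12-34_ab'         # hypothetical: no run of four letters
)

# Applied to an array, -match returns the elements that match.
$hits = @($candidates -match $regex)
$hits
```

To apply it to files, the same pattern could be used as Where-Object { $_.Name -match $regex } after Get-ChildItem.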

Get regex working in powershell script

I'm trying to rename several files using a regex expression.
ck1823000-23.dat
ck1293834-67.dat
lo1230324-99.dat
pk1232131-34.dat
...
I want to remove -XX
So the result would be like this:
ck1823000.dat
ck1293834.dat
lo1230324.dat
pk1232131.dat
...
I came up with this regex:
(?:.*?)([-\\s].*?).dat
But I get this error:
Rename-Item : The input to the script block for parameter 'NewName'
failed. The regular expression pattern is not valid
When I run this command:
Get-ChildItem . -file -Filter "*.dat" | Rename-Item -newname { $_.name -replace "\(?:.*?)([-\\s].*?).dat\", ""}
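Use the regex below and replace the matched characters with an empty string. In PowerShell the dot needs only a single backslash; a doubled backslash would look for a literal backslash in the name.
-[^.-]*(?=\.dat)
Get-ChildItem . -File -Filter "*.dat" | Rename-Item -NewName { $_.Name -replace '-[^.-]*(?=\.dat)', '' }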
Another option is to work on the BaseName property instead of Name and append the extension again:
Get-ChildItem . -File -Filter "*.dat" |
    Rename-Item -NewName { ($_.BaseName -replace '-.*') + $_.Extension }
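Both approaches can be checked on the sample names from the question before renaming anything. In this sketch the lookahead uses a single backslash before the dot, and the [System.IO.Path] calls stand in for the BaseName and Extension properties of a real file object; the variable names are illustrative.

```powershell
# Sample names from the question; no files are touched.
$names = 'ck1823000-23.dat', 'ck1293834-67.dat', 'lo1230324-99.dat', 'pk1232131-34.dat'

# Approach 1: remove "-XX" when it is directly followed by ".dat".
$viaLookahead = $names | ForEach-Object { $_ -replace '-[^.-]*(?=\.dat)', '' }

# Approach 2: strip everything from the hyphen in the base name, then re-add the extension.
$viaBaseName = $names | ForEach-Object {
    $base = [System.IO.Path]::GetFileNameWithoutExtension($_)
    ($base -replace '-.*') + [System.IO.Path]::GetExtension($_)
}

$viaLookahead
$viaBaseName
```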