Rename files using substring extraction - regex

I am using a script to rename files with the extension .xml. I need to remove the first 6 characters and truncate the remaining file name to 11 characters. So far I have this script:
Get-ChildItem -Filter "*.xml" | Rename-Item -newname { $_.name.substring(6)}
How can I also have this script remove any characters that come after the 11th position of the remaining string?
For example a file with name: export12345678910112.xml will result in 12345678910.xml.
The length of the file name is always required to be only 11 characters excluding the extension.

Using the -replace operator with a regular expression is probably the best choice:
Get-ChildItem -Filter "*.xml" |
Rename-Item -newname { $_.name -replace '^.{6}(.{11}).*(\.xml)$', '$1$2' } -WhatIf
-WhatIf previews the renaming operation; remove it to perform actual renaming.
To demonstrate the solution with your sample filename:
> 'export12345678910112.xml' -replace '^.{6}(.{11}).*(\.xml)$', '$1$2'
12345678910.xml
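Note that when a name has fewer than 11 characters after the 6-character prefix, the pattern does not match and -replace returns the name unchanged. If that matters, here is a minimal sketch of a Substring-based alternative. Get-TrimmedName is a hypothetical helper name of my own, not a built-in cmdlet, and the defaults of 6 and 11 are taken from the question:

```powershell
# Hypothetical helper: drop the first $Skip characters of a base name,
# then keep at most $Keep characters of what remains.
function Get-TrimmedName {
    param(
        [string]$BaseName,
        [int]$Skip = 6,
        [int]$Keep = 11
    )
    # Names no longer than $Skip have nothing left to keep
    if ($BaseName.Length -le $Skip) { return '' }
    $rest = $BaseName.Substring($Skip)
    # Clamp the length so Substring never reads past the end of the string
    $rest.Substring(0, [Math]::Min($Keep, $rest.Length))
}

Get-TrimmedName 'export12345678910112'   # 12345678910
Get-TrimmedName 'export123'              # 123
```

In a rename this could be used as Rename-Item -NewName { (Get-TrimmedName $_.BaseName) + $_.Extension } -WhatIf, after filtering out files for which the helper returns an empty string.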

Related

Convert date format in file names to yyyy-mm-dd (ISO Format) - Currently in multiple other formats, including yyyy.mm.dd, mm.dd.yyyy, and mm-dd-yyyy

I have a file system that has files at various folder levels with the date in the filename in the wrong date format.
I'm looking to use powershell to go through and update them to the correct date format (ISO Date).
This is what I have figured out so far (for switching yyyy.mm.dd to yyyy-mm-dd), but it's not quite right:
Get-ChildItem -File -Recurse | % { Rename-Item -Path $_.PSPath -NewName $_.Name.replace("201[0-9]\.[0-9][0-9]\.[ -~].","201[0-9]-[0-9][0-9]-[ -~].")}
The following script is successful in changing dots to dashes, but it changes all dots to dashes, and I want to be careful not to change any dots after the 10th character, and only on files that match the formatting mentioned (e.g. 201[0-9].[0-9][0-9].[0-9][0-9]).
Get-ChildItem -Path $_.PSPath -Filter "*.pdf" | Rename-Item -NewName { $_.BaseName.Replace(".","-") + $_.Extension }
I know I'm getting close with that first one, but I'm not quite there. Does anyone have any suggestions for changes to make to it?
Thank you for the help.
Another way to do it with a regex. When you are confident that the files will be renamed correctly, remove the -WhatIf from the Rename-Item cmdlet.
Get-ChildItem -Path . -File |
ForEach-Object {
if ($_.Name -match '(.*)(\d{4})\.(\d{2})\.(\d{2})(.*)') {
Rename-Item -Path $_.FullName -NewName $($_.Name -replace '(.*)(\d{4})\.(\d{2})\.(\d{2})(.*)','$1$2-$3-$4$5') -Whatif
}
}
When working with dates I suggest parsing them into datetime objects so you can take advantage of the built in features.
Since you are working with a number of formats where the days and months are easily transposed, I suggest individual finely tuned filters. You could also use a switch and match on basename. Edit: I rewrote using a switch instead.
You can also use -WhatIf to confirm that the rename-item works properly.
Get-ChildItem -Path $path -Recurse -Include *.pdf | Where-Object BaseName -match '[\d.\-]{10}$' | ForEach-Object {
    $null = $PSItem.BaseName -match '^(.*)([\d.\-]{10})$'
    # Copy the captures now: switch -Regex overwrites $matches
    $prefix, $text = $matches[1], $matches[2]
    $date = $null
    switch -Regex ($text) {
        '^\d{2}-\d{2}-\d{4}$'   { $date = [datetime]::ParseExact($text, 'MM-dd-yyyy', $null) }
        '^\d{2}\.\d{2}\.\d{4}$' { $date = [datetime]::ParseExact($text, 'MM.dd.yyyy', $null) }
        '^\d{4}\.\d{2}\.\d{2}$' { $date = [datetime]::ParseExact($text, 'yyyy.MM.dd', $null) }
    }
    # Rename only when one of the known formats matched; remove -WhatIf to apply
    if ($null -ne $date) {
        $PSItem | Rename-Item -NewName "$prefix$($date.ToString('yyyy-MM-dd'))$($PSItem.Extension)" -WhatIf
    }
}
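The date-parsing idea can also be tried on its own, without touching any files. This is only a sketch: ConvertTo-IsoDate is a hypothetical helper name, and the three formats are the ones listed in the question title. It relies on the ParseExact overload that accepts an array of formats and returns the result of the first one that fits:

```powershell
# Hypothetical helper: convert a 10-character date written in one of the
# known layouts to ISO yyyy-MM-dd.
function ConvertTo-IsoDate {
    param([string]$Text)
    # Formats assumed from the question: yyyy.mm.dd, mm.dd.yyyy, mm-dd-yyyy
    $formats = [string[]]@('yyyy.MM.dd', 'MM.dd.yyyy', 'MM-dd-yyyy')
    $culture = [System.Globalization.CultureInfo]::InvariantCulture
    $style   = [System.Globalization.DateTimeStyles]::None
    # Throws a FormatException when none of the formats fits
    $date = [datetime]::ParseExact($Text, $formats, $culture, $style)
    $date.ToString('yyyy-MM-dd', $culture)
}

ConvertTo-IsoDate '2019.03.07'   # 2019-03-07
ConvertTo-IsoDate '03.07.2019'   # 2019-03-07
ConvertTo-IsoDate '03-07-2019'   # 2019-03-07
```

Because the year position and the separator differ between the three layouts, each input can only fit one of them. A layout such as dd.MM.yyyy could not be added safely, since it is indistinguishable from MM.dd.yyyy for many dates.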

How to match first characters with a powershell script

I am trying to move files to a certain folder if they start with a letter and delete them if they start with anything other than a letter.
My code:
Function moveOrDelete($source, $dest)
{
$aToZ = '^[a-zA-Z].*'
$notALetter = '^[^a-zA-Z].*'
Get-ChildItem -Path $source\$aToZ -Recurse | Move-Item -Destination $dest
Get-ChildItem -Path $source\$notALetter -Recurse | Remove-Item
}
As I understand it, the caret anchors the match to the start of the string when it's outside the brackets. In other words, the regex in the $aToZ variable will match anything that begins with a letter. The .* part allows the rest of the file name to be anything. The caret inside the brackets negates the character class, so a file name that begins with anything other than a letter will match. I can't get it to work, and I'm not getting any errors, which leads me to believe that my regex is wrong.
I have checked this with online tools including this one: https://regex101.com/ and they check out.
I have also used variations of the regex like ^[a-zA-Z] that don't work. Some patterns like [a-zA-Z]* move the files but it's not the pattern that I want.
Here is how I'm calling the function:
moveOrDelete ".\source" ".\dest"
And here are the sample file names I'm using:
a.txt
z.txt
1.txt
.txt
The -Path parameter doesn't understand regular expressions. It takes a string and supports wildcards, but it cannot do regex matching.
So, you need to check the name of each file against the regex with the -match operator. The following should help:
Function moveOrDelete($source, $dest)
{
$aToZ = '^[a-zA-Z].*'
$notALetter = '^[^a-zA-Z].*'
# -File keeps directories out of the move and delete steps
Get-ChildItem -Path $source -Recurse -File | Where-Object { $_.Name -match $aToZ } | Move-Item -Destination $dest
Get-ChildItem -Path $source -Recurse -File | Where-Object { $_.Name -match $notALetter } | Remove-Item
}
Here, you need to filter the file names with the Where-Object cmdlet, then pipe to the move or remove.
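To see the difference between the two kinds of pattern without moving or deleting anything, here is a small sketch that runs them against the sample names from the question as plain strings:

```powershell
$names = 'a.txt', 'z.txt', '1.txt', '.txt'

# -match applies a regular expression to each element and returns the matches
$names -match '^[a-zA-Z]'     # a.txt, z.txt
$names -match '^[^a-zA-Z]'    # 1.txt, .txt

# -like uses wildcard syntax, the same kind of pattern that -Path understands:
# one letter followed by anything
$names -like '[a-z]*'         # a.txt, z.txt
```

This is also why a pattern like [a-zA-Z]* appeared to work with -Path: read as a wildcard it means "one letter, then anything", whereas ^ and .* have no regex meaning there.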

Powershell Remove Special Character(s) from Filenames

I am looking for a way to remove several special characters from filenames via a powershell script.
My filenames look like this:
[Report]_first_day_of_month_01_(generated_by_powershell)_[repnbr1].txt
I have been puzzling over removing the [] and everything between them, the () and everything between those, and removing all the _'s as well, with the desired result being a filename that looks like this:
first day of month 01.txt
Thus far, I have tried the below solution to no avail. I have run each of these from the directory in which the files reside.
Get-ChildItem -Path .\ -Filter *.mkv | %{
$Name = $_.Name
$NewName = $Name -Replace "(\s*)\(.*\)",''
$NewName2 = $NewName -Replace "[\s*]\[.*\]",''
$NewName3 = $NewName2 -Replace "_",' '
Rename-Item -Path $_ -NewName $NewName3
}
Since it does not work even if I try and do one set at a time like this:
Get-ChildItem -Path .\ -Filter *.mkv | %{
$Name = $_.Name
$NewName = $Name -Replace "(\s*)\(.*\)",''
Rename-Item -Path $_ -NewName $NewName
}
I assume there is an inherent flaw in the way I am trying to accomplish this task. That being said, I would prefer to use the Rename-Item cmdlet rather than using a move-item solution.
gci *.txt | Rename-Item -NewName {$_.Name -replace '_*(\[.*?\]|\(.*?\))_*' -replace '_+', ' '}
The first -replace uses a regex that matches [text] or (text) blocks and replaces them with nothing. Parentheses and brackets need escaping in a regex to match them literally. The pattern also consumes optional leading or trailing underscores, so that [Report]_ and _[repnbr1] are removed whole. Otherwise an underscore would be left at the start or end of the name and would become a leading or trailing space. The second -replace then turns the remaining underscores into spaces.
See the regex working here: Regex101
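The two replacements can be checked one step at a time on the sample name, as a plain string and without renaming anything:

```powershell
$name = '[Report]_first_day_of_month_01_(generated_by_powershell)_[repnbr1].txt'

# Step 1: drop each [..] or (..) block together with adjacent underscores
$step1 = $name -replace '_*(\[.*?\]|\(.*?\))_*'
$step1    # first_day_of_month_01.txt

# Step 2: turn each remaining run of underscores into a single space
$step2 = $step1 -replace '_+', ' '
$step2    # first day of month 01.txt
```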

Replace all but last instance of a character

I am writing a quick PowerShell script to replace all periods except the last instance.
EG:
hello. this. is a file.name.doc → hello this is a filename.doc
So far from another post I was able to get this regexp, but it does not work with PowerShell:
\.(?=[^.]*\.)
As per https://www.regex101.com/, it only matches the first occurrence of a period.
EDIT:
Basically I need to apply this match and replace to a directory with sub directories. So far I have this:
Get-ChildItem -Filter "*.*" | ForEach {
$_.BaseName.Replace('.','') + $_.Extension
}
But it does not actually replace the items, and I do not think it is recursive.
I tried a few variations:
Get-Item -Filter "*.*" -Recurse |
Rename-Item -NewName {$_.BaseName.Replace(".","")}
but I get the error message
source and destination path must be different
I had the PowerShell side of things working but was stuck on the RegEx part. I was able to match either all the "." or only the last "." which was part of the file extension. Then I found this post with the missing link: \.(?=[^.]*\.)
I added that to the rest of the PowerShell command and it worked perfectly.
Get-ChildItem -Recurse | Rename-Item -NewName {$_.Name -replace '\.(?=[^.]*\.)',' ' }
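As a quick check on the sample name from the question: the -replace operator replaces every match, not just the first one, which is why the pattern behaves differently here than in a tester that runs without a global flag. The sketch below uses an empty replacement string to get the output asked for in the question:

```powershell
$name = 'hello. this. is a file.name.doc'

# A period matches only when another period follows later in the string,
# so the last one (before the extension) is left alone.
$count  = [regex]::Matches($name, '\.(?=[^.]*\.)').Count
$result = $name -replace '\.(?=[^.]*\.)', ''

$count     # 3
$result    # hello this is a filename.doc
```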
Exclude files that don't have a period in their basename from being renamed:
Get-ChildItem -File -Recurse | Where-Object { $_.BaseName -like '*.*' } |
Rename-Item -NewName {$_.BaseName.Replace('.', '') + $_.Extension}

Get regex working in powershell script

I'm trying to rename several files using a regex expression.
ck1823000-23.dat
ck1293834-67.dat
lo1230324-99.dat
pk1232131-34.dat
...
I want to remove -XX
So the result would be like this:
ck1823000.dat
ck1293834.dat
lo1230324.dat
pk1232131.dat
...
I came up with this regex:
(?:.*?)([-\\s].*?).dat
But I get this error:
Rename-Item : The input to the script block for parameter 'NewName'
failed. The regular expression pattern is not valid
When I run this command:
Get-ChildItem . -file -Filter "*.dat" | Rename-Item -newname { $_.name -replace "\(?:.*?)([-\\s].*?).dat\", ""}
Use the regex below and replace the matched characters with an empty string. PowerShell strings do not treat the backslash as an escape character, so the dot needs only a single backslash:
-[^.-]*(?=\.dat$)
Get-ChildItem . -File -Filter "*.dat" | Rename-Item -NewName { $_.Name -replace '-[^.-]*(?=\.dat$)', '' }
Another option is to use the BaseName property instead of Name and append the extension again:
Get-ChildItem . -File -Filter "*.dat" |
Rename-Item -NewName { ($_.BaseName -replace '-.*') + $_.Extension }
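A quick check against the sample names as plain strings, outside of any rename. This sketch assumes the pattern is written with a single backslash before the dot, because PowerShell passes \. to the regex engine unchanged, whereas \\. would look for a literal backslash:

```powershell
$names = 'ck1823000-23.dat', 'ck1293834-67.dat', 'lo1230324-99.dat', 'pk1232131-34.dat'

# Remove the hyphen and everything after it, up to but not including ".dat"
$result = $names -replace '-[^.-]*(?=\.dat$)', ''
$result    # ck1823000.dat, ck1293834.dat, lo1230324.dat, pk1232131.dat
```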