remove date from filename programmatically - regex

I'm trying to find a solution to strip some dates out of filenames programmatically. My files have the following format:
net_20110909_servercleanup.pdf
or
net_servercleanup_20110909.pdf
I've used the solution posted below (also found on Stack Overflow) to update some of the filenames, but ideally I'd have one solution that could update all the files in my directories. I'd like to strip out the date and one of the underscores so the final filename looks like this:
net_servercleanup.pdf
I'd like to do this from a batch file or PowerShell. I've seen some solutions that accomplish something like this using RegEx but I don't know enough about them to create something that will work.
Any suggestions on how to accomplish this?
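$filelist = (Get-ChildItem c:\folder | Where-Object {$_.Mode -match "a"} | ForEach-Object {$_.Name})
foreach ($file in $filelist)
{
    # Only works for names ending in "_YYYYMMDD.ext": cut the last 13 characters ("_20110909.pdf")
    $len = $file.Length
    $newname = $file.Substring(0, $len - 13)
    # Re-append an extension (hard-coded here, so it must match the files being renamed)
    $newname = $newname + '.txt'
    Rename-Item C:\folder\$file $newname
    Clear-Variable newname, len
}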

PowerShell, untested but should work:
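$filelist = Get-ChildItem C:\folder | Where-Object {$_.Mode -match "a"} |
    ForEach-Object {$_.FullName}
foreach ($fullpath in $filelist)
{
    # Apply the regex to the file name only, so a date-like folder name in the path is left alone
    $newname = (Split-Path -Path $fullpath -Leaf) -replace "_(19|20)[0-9]{6}"
    # -WhatIf only reports what would be renamed
    Rename-Item -LiteralPath $fullpath -NewName $newname -WhatIf
}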
The _(19|20)[0-9]{6} regular expression matches a leading "_" followed by "19" or "20" and then any six digits. If some of your file names contain dates that don't strictly follow the format in your example, you may need to adjust the regex to catch them all.
The -WhatIf switch gives you a "dry run": cmdlets such as Rename-Item or Remove-Item report what they would do without actually performing any file operations. Remove it once everything looks OK and you are ready to do the actual renaming.
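The regex can also be checked against the two sample names from the question without touching any files. This is just a quick sketch that applies the same -replace expression to plain strings:

```powershell
# Apply the date-stripping regex to the sample names only (no files are renamed)
$samples = 'net_20110909_servercleanup.pdf', 'net_servercleanup_20110909.pdf'
$results = $samples | ForEach-Object { $_ -replace '_(19|20)[0-9]{6}' }
$results   # net_servercleanup.pdf (twice)
```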

I don't know what that language is, but in C++ I'd split the name into pieces on the separator (an underscore, in this case). I'd take the substring from the start up to the first underscore, then the substring between the first and second underscores, and so on, storing each piece in a stream (a stringstream, to be exact). Then I'd read the pieces back one by one: if a piece is an integer I'd discard it; otherwise I'd append it to the result string, adding a separator (an underscore) first whenever the string is not empty.
I could write the code in C++, but I'm not sure that would help.
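The split, discard, rejoin idea described above can be sketched in PowerShell instead of C++. This is a minimal sketch; the function name Remove-NumericToken is hypothetical, and "integer" is taken to mean a piece made up entirely of digits:

```powershell
# Hypothetical helper: split the base name on "_", drop all-digit pieces, rejoin with "_"
function Remove-NumericToken {
    param([string]$FileName)
    $ext  = [System.IO.Path]::GetExtension($FileName)                 # e.g. ".pdf"
    $base = [System.IO.Path]::GetFileNameWithoutExtension($FileName)
    # Keep only the pieces that are not made up entirely of digits
    $kept = $base -split '_' | Where-Object { $_ -notmatch '^\d+$' }
    ($kept -join '_') + $ext
}

Remove-NumericToken 'net_20110909_servercleanup.pdf'   # net_servercleanup.pdf
Remove-NumericToken 'net_servercleanup_20110909.pdf'   # net_servercleanup.pdf
```

Unlike a fixed-position substring, this works wherever the date sits, but it also drops any other all-digit piece of the name.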

If you know that your filenames will always be in the form you mentioned, you can just remove the underscore and the 8 digits. Try this:
get-childitem c:\folder | Where-Object {$_.mode -match "a"} | foreach-object {
    rename-item -LiteralPath $_.FullName -NewName ($_.Name -replace '_\d{8}') -WhatIf
}
Remove -WhatIf to actually perform the rename. The -replace operator takes a regex, here one that matches an underscore followed by 8 digits. Since you do not specify what to replace the match with, it is replaced with an empty string.
Note that both of your example names reduce to the same filename, so if both exist in one folder, Rename-Item will report an error for the second one because the target file already exists. If the files are in nested subfolders and you want to process them all, add the -Recurse parameter to get-childitem.
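Collisions like the one described above can be spotted before anything is renamed. The sketch below uses made-up sample names; in real use they could come from (Get-ChildItem C:\folder -File).Name. It pairs each old name with its new name and groups by the new name:

```powershell
# Sample names (hypothetical); the first two reduce to the same new name
$names = 'net_20110909_servercleanup.pdf', 'net_servercleanup_20110909.pdf', 'net_20110910_backup.pdf'

# Pair each old name with its new name, then keep only new names shared by several files
$collisions = $names |
    ForEach-Object { [pscustomobject]@{ Old = $_; New = $_ -replace '_\d{8}' } } |
    Group-Object -Property New |
    Where-Object { $_.Count -gt 1 }

# Report each colliding new name with the old names that would produce it
$collisions | ForEach-Object { '{0}: {1}' -f $_.Name, ($_.Group.Old -join ', ') }
```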

Try this regex:
_\d{8}
and replace the match with an empty string. It matches _20110909 in both
net_20110909_servercleanup.pdf and net_servercleanup_20110909.pdf,
and the result in each case is net_servercleanup.pdf.

As this is also tagged as batch, here is a batch-file version.
This code uses a for /r command to find the files and a for /f command that splits each file name using underscores and digits as delimiters, so the date disappears. The first two remaining elements are joined with an underscore and the file is renamed. Because digits are delimiters, this assumes the other parts of the name contain no digits.
@echo off
setlocal enableextensions disabledelayedexpansion
rem For each net_*.pdf under c:\some\folder, tokenize the base name on "_" and digits,
rem then rebuild it as token1_token2 plus the original extension
for /r "c:\some\folder" %%f in ("net_*.pdf"
) do for /f "tokens=1,2 delims=_0123456789" %%a in ("%%~nf"
) do echo ren "%%~ff" "%%a_%%b%%~xf"
For testing, the ren command is prefixed with echo. If the output is correct, remove the echo.
Of course, if more than one matching file in a folder reduces to the same name, the rename will fail for the second and later files, because a folder cannot contain two files with the same name.

Related

Powershell Rename dynamic filenames containing square brackets, from the filetype scans in the directory

I don't know the details of PowerShell's frustrating issues with square brackets in path strings (apparently because it unescapes strings more than once internally), in a case where I have to use a regex with an asterisk (*) to match the patterns.
After a lot of Googling I found the [WildcardPattern]::Escape($Filename) method, which I thought would let me use Rename-Item on such dynamic file paths (the results of a file-type scan of the current folder). Disappointingly, the code below doesn't work:
Set-Location "$PSScriptRoot"
$MkvFiles = Get-ChildItem -Filter *.mkv -Path $Path
Foreach ($MkvFile in $MkvFiles) {
$MkvOrigName = [WildcardPattern]::Escape($MkvFile.Name)
$MkvOrigFullname = [WildcardPattern]::Escape($MkvFile.FullName)
If ($MkvOrigName -Match '.*(S[0-9]{2}E[0-9]{2}).*') {
$NewNameNoExt = $MkvOrigFullname -Replace '.*(S[0-9]{2}E[0-9]{2}).*', '$1'
$NewName = "$NewNameNoExt.mkv"
Rename-Item $MkvOrigFullname -NewName $NewName
}
}
I get the following error from the Rename-Item command when I run the above script on the folder containing files like those listed at the end of the question:
Rename-Item : An object at the specified path C:\Users\Username\Downloads\WebseriesName Season
4\WebSeriesName.2016.S04E13.iNTERNAL.480p.x264-mSD`[eztv`].mkv does not exist.
At C:\Users\Username\Downloads\WebseriesName Season 4\BulkFileRenamerFinalv1.ps1:12 char:9
+ Rename-Item $MkvOrigFullname -NewName $NewName
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The web series files in the current folder that I am dealing with are named like these:
WebSeriesName.2016.S04E01.HDTV.x264-SVA[eztv].mkv
WebSeriesName.2016.S04E02.HDTV.x264-SVA[eztv].mkv
....
....
WebSeriesName.2016.S04E12.iNTERNAL.480p.x264-mSD[eztv].mkv
WebSeriesName.2016.S04E13.iNTERNAL.480p.x264-mSD[eztv].mkv
Could someone help me solve this generically, without having to worry about what the filename strings contain, as long as they contain a string like S04E01, S04E02, etc. and (as they always do) square brackets? That is, how can I escape the square brackets and rename the files, as attempted in the code above, to the names below?
S04E01.mkv
S04E02.mkv
....
....
S04E12.mkv
S04E13.mkv
If you use the pipeline, you don't need to worry about escaping paths. This is because the PSPath property of each piped item automatically binds to the -LiteralPath parameter of Rename-Item.
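Set-Location "$PSScriptRoot"
$MkvFiles = Get-ChildItem -Filter *.mkv -Path $Path
Foreach ($MkvFile in $MkvFiles) {
    # Test the unescaped name; on success $matches.1 holds e.g. S04E13
    If ($MkvFile.Name -Match '.*(S[0-9]{2}E[0-9]{2}).*') {
        # Piping binds PSPath to -LiteralPath, so [ ] in the name need no escaping
        $MkvFile | Rename-Item -NewName {"{0}{1}" -f $matches.1,$_.Extension}
    }
}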
Explanation:
The -NewName parameter supports delay-bind script blocks, so we can use a script block to do the property/string manipulation for each piped item.
If wildcards are not needed for the path query, using -LiteralPath is the best approach: its value is used exactly as typed (a literal, verbatim string). -Path for Get-ChildItem accepts wildcards, while -Path for Rename-Item is documented as not supporting them, yet PowerShell still seems to treat brackets in it specially. If you must escape wildcard characters in a -Path parameter that accepts wildcards, double-quoted paths require 4 backticks and single-quoted paths require 2 backticks, because two levels of escaping are involved.
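A small end-to-end check of the -LiteralPath point: the sketch creates a throwaway temp folder with one empty file whose (sample) name contains [ ], renames it with -LiteralPath, and cleans up afterwards:

```powershell
# Throwaway folder and sample file name; nothing outside the temp folder is touched
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$null = New-Item -ItemType Directory -Path $dir
$old = Join-Path $dir 'WebSeriesName.2016.S04E01.HDTV.x264-SVA[eztv].mkv'
[System.IO.File]::WriteAllText($old, '')   # create an empty test file

# With -LiteralPath the brackets are taken verbatim, not as a wildcard character set
Rename-Item -LiteralPath $old -NewName 'S04E01.mkv'

$renamed = Test-Path -LiteralPath (Join-Path $dir 'S04E01.mkv')
$oldGone = -not (Test-Path -LiteralPath $old)
Remove-Item -LiteralPath $dir -Recurse
```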
When -match is applied to a single string (including inside an if condition), the $matches automatic variable is updated whenever the match succeeds. Capture group matches are accessed with the syntax $matches.capturegroupname or $matches[capturegroupname]. Since you did not name the capture group, it was automatically numbered 1. A second capture group (a second pair of parentheses) would have been numbered 2. It is important to remember that when -match returns False, $matches keeps its previous value.
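The $matches behaviour described above can be seen with a few plain strings (sample names only). Note in particular that the failed match in the middle does not clear the earlier result:

```powershell
# A successful scalar -match fills $Matches; unnamed groups are numbered from 1
$null = 'WebSeriesName.2016.S04E13.iNTERNAL.480p.x264-mSD[eztv].mkv' -match '(S[0-9]{2}E[0-9]{2})'
$first = $Matches[1]    # S04E13

# A failed -match leaves $Matches untouched, so the old value is still there
$null = 'no episode tag here.mkv' -match '(S[0-9]{2}E[0-9]{2})'
$stale = $Matches[1]    # still S04E13

# A named group is read by its name
$null = 'Show.S01E02.mkv' -match '(?<ep>S[0-9]{2}E[0-9]{2})'
$named = $Matches.ep    # S01E02
```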
Examples of handling wildcard characters in -Path parameters that support wildcards:
# Using double quotes in the path
$Path = "WebSeriesName.2016.S04E01.HDTV.x264-SVA````[eztv].mkv"
Get-ChildItem -Path $Path
# Using single quotes in the path
$Path = 'WebSeriesName.2016.S04E01.HDTV.x264-SVA``[eztv].mkv'
Get-ChildItem -Path $Path
# Using LiteralPath
$Path = "WebSeriesName.2016.S04E01.HDTV.x264-SVA[eztv].mkv"
Get-ChildItem -LiteralPath $Path
Rename-Item -LiteralPath $Path -NewName 'MyNewName.mkv'
# Using WildcardPattern Escape method
$Path = 'WebSeriesName.2016.S04E01.HDTV.x264-SVA[eztv].mkv'
$EscapedPath = ([WildcardPattern]::Escape([WildcardPattern]::escape($path)))
Get-ChildItem -Path $EscapedPath

PowerShell Replace Syntax: Regex Substitution and Environmental Variable

I am trying to use both regex capture-group substitutions and an environment variable in a replacement string, and can't find the correct syntax (because of the mismatch of single and double quotes). The short script I am developing will rename files. Here is my setup and a few of the ways I tried:
# Original File Name: (BRP-01-001-06K48b-SC-CC-01).tif
# Desired File Name: (BRP-12-345-06K48b-SC-CC-01).tif
# Variables defined by user:
PS ..\user> $pic,$change="01-001","12-345"
# The problem is with the "-replace" near the end of the command
PS ..\user> get-childitem .\* -recurse | where-object {$_.BaseName -match ".*\(BRP-$pic-.{15}\).*"} | foreach-object {$orig=($_.FullName); $new=($orig -replace "(\(BRP-)($pic)(-.{15}\))", '$1$change$3'); echo $new}
PS ..\user> (BRP-$change-06K48b-SC-CC-01).tif
# Also tried:
PS ..\user> get-childitem .\* -recurse | where-object {$_.BaseName -match ".*\(BRP-$pic-.{15}\).*"} | foreach-object {$orig=($_.FullName); $new=($orig -replace "(\(BRP-)($pic)(-.{15}\))", "`$1$change`$3"); echo $new}
PS ..\user> $112-345-06K48b-SC-CC-01).tif
# If I put a space before $change:
PS ..\user> get-childitem .\* -recurse | where-object {$_.BaseName -match ".*\(BRP-$pic-.{15}\).*"} | foreach-object {$orig=($_.FullName); $new=($orig -replace "(\(BRP-)($pic)(-.{15}\))", "`$1 $change`$3"); echo $new}
PS ..\user> (BRP- 12-345-06K48b-SC-CC-01).tif
In the last example it "works" if I add a space before $change, but I do not want the space. I realize I could do another replace operation to remove the space, but I would like to do this all in one command if possible.
What syntax do I need to replace using both environment variables and regex substitutions?
Also, out of curiosity, once it works, will this replace all occurrences within a file name or just the first? For instance, will the file:
"Text (BRP-01-001-06K48b-SC-CC-01) Text (BRP-01-001-06K48b-SC-OR-01)"
change to:
"Text (BRP-12-345-06K48b-SC-CC-01) Text (BRP-12-345-06K48b-SC-OR-01)"
or only the first match, like:
"Text (BRP-12-345-06K48b-SC-CC-01) Text (BRP-01-001-06K48b-SC-OR-01)"
Best practice is to surround the capture group number or name with {} (for example ${1}), or to use named capture groups, in your substitution string. Using {} with your second example should work nicely:
"Text (BRP-01-001-06K48b-SC-CC-01) Text (BRP-01-001-06K48b-SC-OR-01)" -replace "(\(BRP-)($pic)(-.{15}\))", "`${1}$change`${3}"
When PowerShell variables, capture groups, and string literals are in the replacement string, you can't use surrounding single quotes. Using surrounding double quotes allows inner expansion to happen. As a result, you will need to backtick escape $ used to identify capture groups.
Your second example uses the right syntax in general, but because $change begins with digits, it has unintended consequences. You are escaping $ in the substitution string to refer to capture groups 1 and 3. Since $change evaluates to 12-345, the reference you intended as capture group 1 is read as capture group 112, which doesn't exist. Here is an illustration of your second attempt:
"(\(BRP-)($pic)(-.{15}\))":
Capture Group 1: (BRP-
Capture Group 2: 01-001
Capture Group 3: -06K48b-SC-CC-01)
"`$1$change`$3" becomes $112-345$3 at runtime, and then $112-345-06K48b-SC-CC-01). Notice that $change has been interpolated before the capture groups are substituted, which creates the reference $112. The regex engine then looks for capture group 112; since it does not exist, $112 is inserted as literal text.
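Both replacement strings can be compared side by side on the sample text from the question. The -replace operator replaces every match, not just the first, so with the braced form both (BRP-...) occurrences change:

```powershell
$pic, $change = '01-001', '12-345'
$text = 'Text (BRP-01-001-06K48b-SC-CC-01) Text (BRP-01-001-06K48b-SC-OR-01)'
$pattern = "(\(BRP-)($pic)(-.{15}\))"

# Reaches the regex engine as '$112-345$3': there is no group 112
$broken = $text -replace $pattern, "`$1$change`$3"
# Reaches it as '${1}12-345${3}': the braces end each group number
$fixed  = $text -replace $pattern, "`${1}$change`${3}"
$fixed   # Text (BRP-12-345-06K48b-SC-CC-01) Text (BRP-12-345-06K48b-SC-OR-01)
```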
The following might be what you are after:
$pic = "01-001"
$change = "12-345"
$RenameFiles_FilterStr = "*BRP-$pic*.tif"
gci $RenameFiles_FilterStr -recurse | % { $_.Name -replace $pic,$change }
# The above only outputs the new names (no files are renamed yet). If they match what you expect, uncomment the line below and run it to rename the files
# gci $RenameFiles_FilterStr -recurse | Rename-Item -NewName { $_.Name -replace $pic,$change }

How to move first 7 characters of a file name to the end using Powershell

My company has millions of old reports in PDF form. They are typically named in the format: 2018-09-18 - ReportName.pdf
The organization we need to submit these to now requires that we name the files in this format: Report Name - 2018-09.pdf
I need to move the first 7 characters of the file name to the end. I think there is probably a simple way to do this, but I cannot figure it out. Can anyone help me?
Thanks!
Caveat:
As jazzdelightsme points out, the desired renaming operation can result in name collisions, given that you're removing the day component from your dates; e.g., 2018-09-18 - ReportName.pdf and 2018-09-19 - ReportName.pdf would both result in the same filename, ReportName - 2018-09.pdf.
I'm therefore assuming that the renaming operation is performed on copies of the original files. Alternatively, you can create copies with new names elsewhere with Copy-Item while enumerating the originals, but the advantage of Rename-Item is that it reports an error in case of a name collision.
Get-ChildItem -Filter *.pdf | Rename-Item -NewName {
$_.Name -replace '^(\d{4}-\d{2})-\d{2} - (.*?)\.pdf$', '$2 - $1.pdf'
} -WhatIf
-WhatIf previews the renaming operation; remove it to perform actual renaming.
Add -Recurse to the Get-ChildItem call to process an entire directory subtree.
The use of -Filter is optional, but it speeds up processing.
A script block ({ ... }) is passed to Rename-Item's -NewName parameter, which enables dynamic renaming of each input file ($_) received from Get-ChildItem using a string-transformation (replacement) expression.
The -replace operator uses a regex (regular expression) as its first operand to perform string replacements based on patterns; here, the regex breaks down as follows:
^(\d{4}-\d{2}) matches something like 2018-09 at the start (^) of the name and - by virtue of being enclosed in (...) - captures that match in a so-called capture group, which can be referenced in the replacement string by its index, namely $1, because it is the first capture group.
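-\d{2} - matches the day component (e.g. -18) and the " - " separator; since they are not captured, they are dropped.
(.*?) captures the rest of the filename excluding the extension in capture group $2.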
The ? after .* makes the sub-expression non-greedy, meaning that it will give subsequent sub-expressions a chance to match too, as opposed to trying to match as many characters as possible (which is the default behavior, termed greedy).
\.pdf$ matches the filename extension (.pdf) at the end ($); note that case doesn't matter, because -replace is case-insensitive by default. . is escaped as \., because it is meant to be matched literally here (without escaping, . matches any single character in a single-line string).
$2 - $1.pdf is the replacement string, which arranges what the capture groups captured in the desired form.
Note that any file whose name doesn't match the regex is quietly left alone, because the -replace operator passes the input string through if there is no match, and Rename-Item does nothing if the new name is the same as the old one.
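The replacement expression can be previewed on plain strings before running it against real files. The names below are samples: one uppercase extension and one name that does not fit the pattern and is passed through unchanged:

```powershell
# Preview the -replace expression on sample strings (no files involved)
$names = '2018-09-18 - ReportName.pdf', '2018-09-19 - Other Report.PDF', 'notes.pdf'
$renamed = $names | ForEach-Object {
    $_ -replace '^(\d{4}-\d{2})-\d{2} - (.*?)\.pdf$', '$2 - $1.pdf'
}
$renamed
```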
Get-ChildItem with some RegEx and Rename-Item can do it:
Get-ChildItem -Path "C:\temp" | foreach {
$newName = $_.Name -replace '(^.{7}).*?-\s(.*?)\.(.*$)','$2 - $1.$3'
$_ | Rename-Item -NewName $newName
}
The RegEx
'(^.{7}).*?-\s(.*?)\.(.*$)' / $2 - $1.$3
(^.{7}) matches the first 7 characters
.*?-\s matches any characters up to and including the first "-" followed by whitespace (the space before the dash in " - " is consumed by .*?)
(.*?)\. matches anything until the first found dot ( . )
(.*$) matches the file extension in this case
$2 - $1.$3 puts it all together in the changed order
This won't work properly for filenames that contain more than one dot (.).
This should work (added some test data):
$test = '2018-09-18 - ReportName.pdf','2018-09-18 - Other name.pdf','other pattern.pdf','2018-09-18 - double.extension.pdf'
$test | % {
$match = [Regex]::Match($_, '(?<Date>\d{4}-\d\d)-\d\d - (?<Name>.+)\.pdf')
if ($match.Success) {
"$($match.Groups['Name'].Value) - $($match.Groups['Date'].Value).pdf"
} else {
$_
}
}
Something like this -
Get-ChildItem -path $path | Rename-Item -NewName {($_.BaseName -split ' - ', 2)[-1] + ' - ' + $_.BaseName.SubString(0,7) + $_.Extension} -WhatIf
Explanation -
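-split ' - ', 2 splits the file's base name at the first ' - ' into at most two parts, and [-1] tells PowerShell to select the last part (the report name). The -split operator is used rather than the .Split() method because in Windows PowerShell .Split(' - ') splits on each of the characters ' ' and '-' separately, which breaks names that contain spaces.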
SubString(0,7) will select 7 characters starting from the first character of the BaseName of the file.
Remove -WhatIf to apply the rename.
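The split-and-substring idea can be checked on plain strings first. This sketch wraps it in a hypothetical helper, Get-ReorderedName, and uses the -split operator with a limit of 2 so that report names containing spaces stay intact:

```powershell
# Hypothetical helper: build "<title> - <yyyy-MM><ext>" from a base name and extension
function Get-ReorderedName {
    param([string]$BaseName, [string]$Extension)
    $title = ($BaseName -split ' - ', 2)[-1]            # everything after the first " - "
    '{0} - {1}{2}' -f $title, $BaseName.Substring(0, 7), $Extension
}

Get-ReorderedName '2018-09-18 - ReportName' '.pdf'   # ReportName - 2018-09.pdf
Get-ReorderedName '2018-09-18 - Other name' '.pdf'   # Other name - 2018-09.pdf
```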

Trying to create a power shell script that removes text in a filename between two brackets using regex

I am trying to write a script to take a file name and remove any pair of brackets and the text between them from the string
get-childItem *.* -recurse |
foreach-object {$_ -replace '\(([^\)]+)\)', ''}
This outputs, for every file in the folder, the new name as it should look, but I can't find a way to actually apply the new values as the filenames. The plan is to do this for multiple files in a folder with the format "name(Randomnumbers).ext".
Any help is appreciated
From my understanding of your question, you want to rename each file to its name with the parenthesized part removed. To accomplish that, you can use the $Matches variable that is populated by the -match operator. I'm also assuming you want to keep the file extension.
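Get-ChildItem -Recurse -File | ForEach-Object {
    # "name" is the text before the "(...)", "ext" whatever follows it;
    # the greedy .* means only the last (...) pair in a name is removed
    if ($_.Name -match '(?<name>.*)(?:\([^\)]+\))(?<ext>.*)') {
        Rename-Item -LiteralPath $_.FullName -NewName "$($matches['name'])$($matches['ext'])"
    }
}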
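Another option is to feed the question's own -replace expression (minus its unused capture group) to Rename-Item as a delay-bind script block, which removes every (...) pair from a name. The sketch below runs in a throwaway temp folder with made-up file names:

```powershell
# Throwaway folder with sample files
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$null = New-Item -ItemType Directory -Path $dir
foreach ($n in 'report(12345).txt', 'notes.txt') {
    [System.IO.File]::WriteAllText((Join-Path $dir $n), '')
}

# Only names containing "(...)" are passed on; -NewName is evaluated once per piped file ($_)
Get-ChildItem -LiteralPath $dir -File |
    Where-Object { $_.Name -match '\([^\)]+\)' } |
    Rename-Item -NewName { $_.Name -replace '\([^\)]+\)', '' }

$after = @((Get-ChildItem -LiteralPath $dir -File).Name | Sort-Object)
$after   # notes.txt, report.txt
Remove-Item -LiteralPath $dir -Recurse
```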

Keep first regex match and discard others

Yep, another regex question... I am using PowerShell to extract a simple number from a filename when looping through a folder, like so:
# sample string "ABCD - (123) Sample Text Here"
Get-ChildItem $processingFolder -filter *.xls | Where-Object {
$name = $_.Name
$pattern = '(\d{2,3})'
$metric = ([regex]$pattern).Matches($name) | { $_.Groups[1].Value }
}
All I am looking for is the number surrounded by brackets. This is successful, but it appears the $_.Name actually grabs more than just the name of the file, and the regex ends up picking up some other bits I don't want.
I understand why, as it's going through each regex match as an object and taking the value out of each and putting in $metric. I need some help editing the code so it only bothers with the first object.
I would just use -match etc if I wasn't bothered with the actual contents of the match, but it needs to be kept.
There is no cmdlet before { $_.Groups[1].Value } in your pipeline (it should be ForEach-Object), but that is a minor thing. We also need a small improvement to your regex pattern so that it accounts for the brackets without including them in the result.
$processingFolder = "C:\temp"
$pattern = '\((\d+)\)'
Get-ChildItem $processingFolder -filter "*.xls" | ForEach-Object{
$details = ""
if($_.Name -match $pattern){$details = $matches[1]}
$_ | Add-Member -MemberType NoteProperty -Name Details -Value $details -PassThru
} | select name, details
This loops through all the files and tries to match numbers in brackets. If there is more than one match, only the first is used, because -match stops at the first match. We use a capture group in order to leave the brackets out of the result. Next we use Add-Member to create a new property called Details, which contains the matched value.
Currently this returns all files in $processingFolder, but adding a simple Where-Object{$_.Details} would return just the ones that have the property populated. If you need to create other properties, you can chain the Add-Member calls together. Just don't forget -PassThru.
You could also create your own new object if you need to go that route with multiple custom properties; that would certainly be more concise. The last question I answered has an example of that.
After doing some research into the data being returned (System.Text.RegularExpressions.MatchCollection), I found the Item method, so I called that on $metric like so:
$name = '(111) 123 456 789 Name of Report Here 123'
$pattern = '(\d{2,3})'
$metric = ([regex]$pattern).Matches($name)
# MatchCollection is zero-based: Item(0) is the first match ("111"), Item(1) the second
Write-Host $metric.Item(0)
Whilst probably not the best approach, it returns what I'm expecting for now.
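When only the first match is wanted, [regex]::Match returns just that one match, so there is no collection to index into. A short sketch, using the bracketed-number pattern '\((\d+)\)' from the first answer on the same sample string:

```powershell
$name = '(111) 123 456 789 Name of Report Here 123'
# [regex]::Match stops at the first match; group 1 holds the digits inside ( )
$m = [regex]::Match($name, '\((\d+)\)')
$metric = if ($m.Success) { $m.Groups[1].Value } else { $null }
$metric   # 111
```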