Move directories that match a given regex

I am trying to move directories together with their contents. The directory names are letters followed by digits,
for example a2, a2321, sdadsa2321321, so the regex would be [a-zA-Z]+\d+. However, it doesn't work.
$SourceDirectoryPath = "C:/directory/[a-zA-Z]+\d+"
$TargetFilePath = "C:/directory/target"
New-Item -ItemType "directory" -Path $TargetFilePath
Move-Item -Path $SourceDirectoryPath -Destination $TargetFilePath -Force
If I replace [a-zA-Z]+\d+ with a simple wildcard like a*, it moves multiple directories, which suggests that [a-zA-Z]+\d+ is the only incorrect part of the script.
Question: What is the correct form of the regex [a-zA-Z]+\d+ in PowerShell? This regex is valid in Java, but for some reason it doesn't work here.

Maybe this is what you want:
$sourceDir = 'D:\source'
$destDir = 'D:\destination'
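# Anchor at both ends so the whole name must be letters followed by digits
# (a leading '^.*' would let any prefix through, e.g. '123abc1').
$pattern = '^[a-zA-Z]+\d+$'
$baseDir = Get-ChildItem -Path $sourceDir -Recurse -Directory
foreach( $directory in $baseDir ) {
    if( $directory.Name -match $pattern ) {
        Move-Item -Path $directory.FullName -Destination $destDir -Force
    }
}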

To use regex matching on files and folders with Get-ChildItem, you will need to use the Where-Object clause.
This should do it:
$SourceDirectoryPath = 'C:\directory'
$TargetFilePath = 'C:\directory\target'
# create the target path if it does not exist yet
if (!(Test-Path -Path $TargetFilePath -PathType Container)) {
    $null = New-Item -Path $TargetFilePath -ItemType Directory
}
Get-ChildItem -Path $SourceDirectoryPath -Directory |
Where-Object { $_.Name -match '^[a-z]+\d+$' } |
ForEach-Object { $_ | Move-Item -Destination $TargetFilePath -Force }

"If I replace [a-zA-Z]+\d+ with simple wildcards like a* it moves multiple directories, this proves that [a-zA-Z]+\d+ is the only incorrect part of the script."
Indeed: The -Path parameter of file-related cmdlets can only accept wildcard expressions (see about_Wildcards), not regular expressions (regexes) (see about_Regular_Expressions).
While distantly related, the two types of expressions are syntactically different: wildcard expressions are conceptually and syntactically simpler, but far less powerful - and not powerful enough for your use case. See AdminOfThings' comment on the question for a quick intro.
Also note that many PowerShell cmdlets conveniently also accept wildcards in other types of arguments (unrelated to the filesystem), such as Get-Command *job* allowing you to find all available commands whose name contains the word job.
By contrast, use of regexes always requires a separate, explicit operation (unless a command is explicitly designed to accept regexes as arguments), via operators such as -match and -replace, cmdlets such as Select-String, or the switch statement with the -Regex option.
In your case, you need to filter the directories of interest from among all subdirectories, by combining the Where-Object cmdlet with -match, the regular-expression matching operator; the syntactically simplest form is to use an operation statement (a cleaner alternative to passing a script block { ... } in which $_ must be used to refer to the input object at hand), as shown in the following command:
# Define the *parent* path of the dirs. to move.
# The actual dirs. must be filtered by regex below.
$SourceDirectoryParentPath = 'C:/directory'
$TargetFilePath = 'C:/directory/target'
# Note: If you add -Force, no error occurs if the directory already exists.
# New-Item produces output, a System.IO.DirectoryInfo in this case.
# To suppress the output, use: $null = New-Item ...
New-Item -ItemType "directory" -Path $TargetFilePath
# Enumerate the child directories of the parent path,
# and filter by whether each child directory's name matches the regex.
Get-ChildItem -Directory $SourceDirectoryParentPath |
Where-Object Name -match '^[a-z]+\d+$' |
Move-Item -Destination $TargetFilePath -Force
Note that I've changed regex [a-zA-Z]+\d+ to ^[a-z]+\d+$, because:
PowerShell's regex matching is case-insensitive by default, so [a-z] covers both upper- and lowercase (English) letters.
The -match operator performs substring matching, so you need to anchor the regex with ^ (match at the start) and $ (match at the end) in order to ensure that the entire input string matches your expression.
Also note that I've used a single-quoted string ('...') rather than a double-quoted one ("..."), which is preferable for regex literals, so that no confusion arises between what characters are seen by the regex engine, and which characters PowerShell itself may interpolate, beforehand, notably $ and `.
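The three points above (case-insensitivity, substring matching, anchoring) can be checked without touching the file system. The following is only a minimal sketch; the sample names are made up for illustration and are not taken from the question:

```powershell
# Hypothetical directory names, used only to illustrate the matching rules.
$names = 'a2', 'a2321', 'sdadsa2321321', 'target', '1abc2', 'A7'

# Anchored regex: the whole name must be letters followed by digits.
$anchored = $names | Where-Object { $_ -match '^[a-z]+\d+$' }

# Without anchors, -match succeeds on any substring, so '1abc2' slips through.
$unanchoredHit = '1abc2' -match '[a-z]+\d+'

# -match ignores case by default; -cmatch is the case-sensitive variant.
$caseInsensitiveHit = 'A7' -match  '^[a-z]+\d+$'
$caseSensitiveHit   = 'A7' -cmatch '^[a-z]+\d+$'

$anchored -join ', '
```

Here $anchored should hold a2, a2321, sdadsa2321321 and A7, while 'target' and '1abc2' are filtered out.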

Related

Powershell Rename dynamic filenames containing square brackets, from the filetype scans in the directory

I don't know the details of PowerShell's frustrating handling of square brackets in path strings (apparently it processes escapes in the string more than once internally), and I have to use a regex with an asterisk (*) to match the patterns.
After a lot of searching I found the method [WildcardPattern]::Escape($Filename), which I thought would let me use Rename-Item on such dynamic file paths. I expected the code below to work with paths that come from a file-type scan of the current folder, but disappointingly it doesn't:
Set-Location "$PSScriptRoot"
$MkvFiles = Get-ChildItem -Filter *.mkv -Path $Path
Foreach ($MkvFile in $MkvFiles) {
    $MkvOrigName = [WildcardPattern]::Escape($MkvFile.Name)
    $MkvOrigFullname = [WildcardPattern]::Escape($MkvFile.FullName)
    If ($MkvOrigName -Match '.*(S[0-9]{2}E[0-9]{2}).*') {
        $NewNameNoExt = $MkvOrigFullname -Replace '.*(S[0-9]{2}E[0-9]{2}).*', '$1'
        $NewName = "$NewNameNoExt.mkv"
        Rename-Item $MkvOrigFullname -NewName $NewName
    }
}
I am getting the following error with Rename-Item command when I run the above script on the folder that contains the files such as given at the end of question:
Rename-Item : An object at the specified path C:\Users\Username\Downloads\WebseriesName Season
4\WebSeriesName.2016.S04E13.iNTERNAL.480p.x264-mSD`[eztv`].mkv does not exist.
At C:\Users\Username\Downloads\WebseriesName Season 4\BulkFileRenamerFinalv1.ps1:12 char:9
+ Rename-Item $MkvOrigFullname -NewName $NewName
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Webseries file paths in the current folder, that I am dealing with are like these:
WebSeriesName.2016.S04E01.HDTV.x264-SVA[eztv].mkv
WebSeriesName.2016.S04E02.HDTV.x264-SVA[eztv].mkv
....
....
WebSeriesName.2016.S04E12.iNTERNAL.480p.x264-mSD[eztv].mkv
WebSeriesName.2016.S04E13.iNTERNAL.480p.x264-mSD[eztv].mkv
Could someone help me solve this generically, without having to care what else the filenames contain, as long as they contain a string like S04E01, S04E02, etc. and also contain square brackets? That is, how can I escape the square brackets and rename the files, as attempted in the code above, to the names given below?
S04E01.mkv
S04E02.mkv
....
....
S04E12.mkv
S04E13.mkv
If you use the pipeline, you don't need to worry about escaping paths. This is because PSPath property will automatically bind to the -LiteralPath parameter on Rename-Item.
Set-Location "$PSScriptRoot"
$MkvFiles = Get-ChildItem -Filter *.mkv -Path $Path
Foreach ($MkvFile in $MkvFiles) {
    If ($MkvFile.Name -Match '.*(S[0-9]{2}E[0-9]{2}).*') {
        $MkvFile | Rename-Item -NewName {"{0}{1}" -f $matches.1,$_.Extension}
    }
}
Explanation:
The -NewName parameter supports delay-bind scripting. So we can use a script block to do the property/string manipulation.
If wildcards are not needed for the path query, then using -LiteralPath is the best approach. The -LiteralPath value is bound exactly as typed (literal/verbatim string). -Path for Get-ChildItem accepts wildcards, but -Path for Rename-Item does not support wildcards. Yet it seems like PowerShell still cares when parsing the command. If you must escape some wildcard characters in a -Path parameter that accepts wildcards, then double quoted paths require 4 backticks and single quoted paths require 2 backticks. This is because two levels of escape are required.
When using -match against a single string even if in a conditional statement, the $matches automatic variable is updated when a match is successful. Capture group matches are accessed using syntax $matches.capturegroupname or $matches[capturegroupname]. Since you did not name the capture group, it was automatically named 1 by the system. A second set of () around a capturing group, would have been 2. It is important to remember that when -match is False, $matches is not updated from its previous value.
Examples of handling wildcard characters in -Path parameters that support wildcards:
# Using double quotes in the path
$Path = "WebSeriesName.2016.S04E01.HDTV.x264-SVA````[eztv].mkv"
Get-ChildItem -Path $Path
# Using single quotes in the path (two backticks, passed through verbatim)
$Path = 'WebSeriesName.2016.S04E01.HDTV.x264-SVA``[eztv].mkv'
Get-ChildItem -Path $Path
# Using LiteralPath
$Path = "WebSeriesName.2016.S04E01.HDTV.x264-SVA[eztv].mkv"
Get-ChildItem -LiteralPath $Path
Rename-Item -LiteralPath $Path -NewName 'MyNewName.mkv'
# Using WildcardPattern Escape method
$Path = 'WebSeriesName.2016.S04E01.HDTV.x264-SVA[eztv].mkv'
$EscapedPath = ([WildcardPattern]::Escape([WildcardPattern]::escape($path)))
Get-ChildItem -Path $EscapedPath
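The remark above about $matches not being reset after a failed -match is easy to overlook, so here is a small sketch of it that works on plain strings only (no files are renamed). The second input string is a made-up example of a name without an episode tag:

```powershell
$pattern = '(S[0-9]{2}E[0-9]{2})'

# Successful match: $Matches is populated and group 1 holds the episode tag.
$ok1   = 'WebSeriesName.2016.S04E01.HDTV.x264-SVA[eztv].mkv' -match $pattern
$first = $Matches[1]

# Failed match: -match returns $false, but $Matches still holds the old value.
$ok2   = 'no-episode-tag-here.mkv' -match $pattern
$stale = $Matches[1]

# So always guard on the result of -match before building the new name.
$newName = if ($ok1) { '{0}{1}' -f $first, '.mkv' } else { $null }
$newName
```

This is why the If ($MkvFile.Name -Match ...) guard in the loop above matters: without it, a file that does not match would silently reuse the capture from the previous file.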

Powershell: unexpected behavior of negated -like and -match conditionals

I have 2 folders in my windows folder, software, and softwaretest.
So: the main folder "software" exists, so the if statement should be skipped and execution should move on to the elseif; the backup folder exists as well, so it should move on to the else.
My problem is that I'm getting the Write-Host output from the elseif even though I have a backup folder called softwaretest, so I can't see why it gives me that output instead of running the else.
I hope someone can guide/help me :-)
If ($SoftwarePathBackup = Get-ChildItem -Path "$Env:SystemRoot" | Where-Object { (!$_.Name -like 'software') }) {
    Write-Host ( 'There are no folder named \software\ on this machine - You cant clean/clear/empty the folder!' ) -ForegroundColor Red;
} elseif ($SoftwarePathBackup = Get-ChildItem -Path "$Env:SystemRoot" | Where-Object { ($_.Name -match '.+software$|^software.+') } | Sort-Object) {
    Write-Host ( 'There are none folder-backups of \software\ on this machine - You need to make a folder-backup of \software\ before you can clean/clear/empty the folder!' ) -ForegroundColor Red;
} else {
    Remove-Item
}
I find it very confusing to have the negation on the right or even in the regex. I think it would be more obvious to negate at the beginning with ! or -not.
To test whether a folder exists, you can use Test-Path. Test-Path also has a -Filter parameter, which you can use instead of Where-Object. But I think you don't even have to filter.
$SoftwarePath = "$($Env:SystemRoot)\Software", "$($Env:SystemRoot)\SoftwareBackup"
foreach ($Path in $SoftwarePath) {
    if (Test-Path -Path $Path) {
        Remove-Item -Path $Path -Force -Verbose
    }
    else {
        Write-Output "$Path not found."
    }
}
Would that work for you?
Your primary problem is one of operator precedence:
!$_.Name -like 'software' should be ! ($_.Name -like 'software') or, preferably,
$_.Name -notlike 'software' - using PowerShell's not-prefixed operators for negation.
Similarly, you probably meant to negate $_.Name -match '.+software$|^software.+' which is most easily achieved with $_.Name -notmatch '.+software$|^software.+'
As stated in Get-Help about_Operator_Precedence, ! (a.k.a. -not) has higher precedence than -like, so !$_.Name -like 'software' is evaluated as (!$_.Name) -like 'software', which means that the result of !$_.Name - a Boolean - is (string-)compared to wildcard pattern 'software', which always returns $False, so the If branch is never entered.
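The precedence issue can be reproduced on a plain string. This is just an illustrative sketch; $name is a hypothetical folder name standing in for $_.Name:

```powershell
# Hypothetical folder name that is NOT 'software'.
$name = 'temp'

# Parsed as (!$name) -like 'software': !$name is $false, which is compared
# as the string 'False' against the pattern, so the result is always $false.
$wrong = !$name -like 'software'

# Either of these expresses the intended negation.
$withParens  = !($name -like 'software')
$withNotLike = $name -notlike 'software'

"$wrong / $withParens / $withNotLike"
```

For a name other than 'software', the first expression yields False while the other two yield True, which is why the If branch in the question is never entered.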
That said, you can make do without -like and -match altogether and use the implicit wildcard matching supported by Get-Item's -Include parameter (snippet requires PSv3+):
# Get folders whose name either starts with or ends with 'software', including
# just 'software' itself.
$folders = Get-Item -Path $env:SystemRoot\* -Include 'software*', '*software' |
Where-Object PSIsContainer
# See if a folder named exactly 'software' is among the matches.
$haveOriginal = $folders.Name -contains 'software'
# See if there are backup folders among the matches (too).
# Note that [int] $haveOriginal evaluates to 1 if $haveOriginal is $True,
# and to 0 otherwise.
$haveBackups = ($folders.Count - [int] $haveOriginal) -gt 0
# Now act on $folders as desired, based on flags $haveOriginal and $haveBackups.
Note how Get-Item -Path $env:SystemRoot\* is used to explicitly preselect all items (add -Force if hidden items should be included too), which are then filtered down via -Include.
Since Get-Item - unlike Get-ChildItem- doesn't support -Directory, | Where-Object PSIsContainer is used to further limit the matches to directories (folders).
Note: Get-ChildItem was not used, because -Include only takes effect on child (descendant) items when -Recurse is also specified; while -Recurse can be combined with -Depth 0 (PSv5+) in order to limit matching to immediate child directories, Get-ChildItem apparently still tries to read the entries of all child directories as well, which can result in unwanted access-denied errors from directories that aren't even of interest.
In other words: Get-ChildItem -Recurse -Depth 0 -Directory $env:SystemRoot -include 'software*', '*software' is only equivalent if you have (at least) read access to all child directories of $env:SystemRoot.

Keep first regex match and discard others

Yep another regex question... I am using PowerShell to extract a simple number from a filename when looping through a folder like so:
# sample string "ABCD - (123) Sample Text Here"
Get-ChildItem $processingFolder -filter *.xls | Where-Object {
    $name = $_.Name
    $pattern = '(\d{2,3})'
    $metric = ([regex]$pattern).Matches($name) | { $_.Groups[1].Value }
}
All I am looking for is the number surrounded by brackets. This is successful, but it appears the $_.Name actually grabs more than just the name of the file, and the regex ends up picking up some other bits I don't want.
I understand why, as it's going through each regex match as an object and taking the value out of each and putting in $metric. I need some help editing the code so it only bothers with the first object.
I would just use -match etc if I wasn't bothered with the actual contents of the match, but it needs to be kept.
I don't see a cmdlet call before $_.Groups[1].Value which should be ForEach-Object but that is a minor thing. We need to make a small improvement on your regex pattern as well to account for the brackets but not include them in the return.
$processingFolder = "C:\temp"
$pattern = '\((\d+)\)'
Get-ChildItem $processingFolder -filter "*.xls" | ForEach-Object{
    $details = ""
    if($_.Name -match $pattern){$details = $matches[1]}
    $_ | Add-Member -MemberType NoteProperty -Name Details -Value $details -PassThru
} | select name, details
This will loop all the files and try and match numbers in brackets. If there is more than one match it should only take the first one. We use a capture group in order to ignore the brackets in the results. Next we use Add-Member to make a new property called Details which will contain the matched value.
Currently this will return all files in the $processingFolder but a simple Where-Object{$_.Details} would return just the ones that have the property populated. If you have other properties that you need to make you can chain the Add-Members together. Just don't forget the -passthru.
You could also just make your own new object if you need to go that route with multiple custom parameters. It certainly would be more terse. That last question I answered has an example of that.
After doing some research in to the data being returned itself (System.Text.RegularExpressions.MatchCollection) I found the Item method, so called that on $metric like so:
$name = '(111) 123 456 789 Name of Report Here 123'
$pattern = '(\d{2,3})'
$metric = ([regex]$pattern).Matches($name)
Write-Host $metric.Item(1)
Whilst probably not the best approach, it returns what I'm expecting for now.
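As a possible alternative when only the first match is wanted, the .NET [regex]::Match method (singular) returns just the first match, so there is no collection to index into. A small sketch using the sample string from above; note that the Matches collection is zero-based, so Item(0) is the first match and Item(1) is the second:

```powershell
$name = '(111) 123 456 789 Name of Report Here 123'

# Match (singular) stops at the first hit; group 1 excludes the brackets.
$m = [regex]::Match($name, '\((\d+)\)')
$metric = if ($m.Success) { $m.Groups[1].Value } else { $null }

# For comparison: indexing into the full collection of 2-3 digit runs.
$all    = [regex]::Matches($name, '\d{2,3}')
$first  = $all[0].Value   # first match
$second = $all[1].Value   # second match, same as $all.Item(1)

"$metric / $first / $second"
```

With this input, $metric and $first are both 111, and $second is 123.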

Powershell regex to match vhd or vhdx at the end of a string

I'm brand new to powershell and I'm trying to write a script to copy files ending in vhd or vhdx
I can enumerate a directory of files like so:
$NEWEST_VHD = Get-ChildItem -Path $vhdSourceDir | Where-Object Name -match ".vhdx?"
This will match
foo.vhd
foo.vhdx
However this will also match
foo.vhdxxxx
How can I write a match that will only match files ending in exactly vhd or vhdx ?
Unsuccessful attempts
Where-Object Name -match ".vhdx?"
Where-Object Name -like ".vhdx?"
Where-Object Name -match ".[vhd]x?"
Where-Object Name -match ".[vhd]\^x?"
Resources I've investigated
http://ss64.com/ps/syntax-regex.html
https://technet.microsoft.com/en-us/library/ff730947.aspx
http://www.regexr.com/
Put a $ at the end of your pattern:
-match ".vhdx?$"
$ in a Regex pattern represents the end of the string. So, the above will only match .vhdx? if it is at the end. See a demonstration below:
PS > 'foo.vhd' -match ".vhdx?$"
True
PS > 'foo.vhdx' -match ".vhdx?$"
True
PS > 'foo.vhdxxxx' -match ".vhdx?$"
False
PS >
Also, the . character has a special meaning in a Regex pattern: it tells PowerShell to match any character except a newline. So, you could experience behavior such as:
PS > 'foo.xvhd' -match ".vhdx?$"
True
PS >
If this is undesirable, you can add a \ before the .
PS > 'foo.xvhd' -match "\.vhdx?$"
False
PS >
This tells PowerShell to match a literal period instead.
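Putting both fixes together (escaped dot plus end anchor), the pattern can be tried against a list of names in one go. This is only a sketch; the last name is an extra made-up example to show that -match ignores case:

```powershell
$names = 'foo.vhd', 'foo.vhdx', 'foo.vhdxxxx', 'foo.xvhd', 'FOO.VHDX'

# Single quotes keep PowerShell from touching the $ before the regex sees it.
$hits = $names | Where-Object { $_ -match '\.vhdx?$' }

$hits -join ', '
```

Only foo.vhd, foo.vhdx and FOO.VHDX should remain.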
If you only want to check extension, then you can just use Extension property instead of Name:
$NEWEST_VHD = Get-ChildItem -Path $vhdSourceDir | Where-Object Extension -in '.vhd','.vhdx'
Mostly just an FYI but there is no need for a regex solution for this particular issue. You could just use a simple filter.
$NEWEST_VHD = Get-ChildItem -Path $vhdSourceDir -Filter "*.vhd?"
Not perfect, but if you don't have files with extensions like ".vhdz" then you would be safe. Again, this is not meant as an answer, just useful to know. Reminder that ? in this case optionally matches a single character, but it is not regex, just a basic file system wildcard.
Depending on how many files you have here you could argue that this would be more efficient since you will get all the files you need off the get go instead of filtering after the fact.

Powershell 'where' statement -notcontains

I have a simple excerpt from a larger script. Basically I'm trying to do a recursive file search that includes sub-directories but skips an excluded directory (and any child of the excluded directory).
clear
$Exclude = "T:\temp\Archive\cst"
$list = Get-ChildItem -Path T:\temp\Archive -Recurse -Directory
$list | where {$_.fullname -notlike $Exclude} | ForEach-Object {
    Write-Host "--------------------------------------"
    $_.fullname
    Write-Host "--------------------------------------"
    $files = Get-ChildItem -Path $_.fullname -File
    $files.count
}
At the moment this script will exclude the T:\temp\Archive\cst directory, but not the T:\temp\Archive\cst\artwork directory. I'm struggling to overcome this simple thing.
I've tried the -notlike (which I didn't really expect to work) but also the -notcontains which I was hopeful of.
Can anyone offer any advice, I'm thinking it would require a regex match which I'm reading up on now, but not very familiar with.
In the future the $exclude variable will be an array of strings (directories) but at the moment just trying to get it to work with a straight string.
Try:
where {$_.fullname -notlike "$Exclude*"}
You could also try
where {$_.fullname -notmatch [regex]::Escape($Exclude) }
but the -notlike approach is easier.
When used without wildcards the -like operator does the same as the -eq operator. If you want to exclude a folder T:\temp\Archive\cst and everything below it, you need something like this:
$Exclude = 'T:\temp\Archive\cst'
Get-ChildItem -Path T:\temp\Archive -Recurse -Directory | ? {
    $_.FullName -ne $Exclude -and
    $_.FullName -notlike "$Exclude\*"
} | ...
-notlike "$Exclude\*" would only exclude subfolders of $Exclude, not the folder itself, and -notlike "$Exclude*" would also exclude folders like T:\temp\Archive\cstring, which may be undesired.
The -contains operator is used to check if a list of values contains a particular value. It doesn't check if a string contains a particular substring.
See Get-Help about_Comparison_Operators for further information.
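The difference between the three filters can be seen on a plain list of path strings, without a T: drive. A minimal sketch; the cstring and other paths are hypothetical siblings added for illustration:

```powershell
$Exclude = 'T:\temp\Archive\cst'
$paths = @(
    'T:\temp\Archive\cst'
    'T:\temp\Archive\cst\artwork'
    'T:\temp\Archive\cstring'      # hypothetical sibling that merely shares the prefix
    'T:\temp\Archive\other'        # hypothetical unrelated folder
)

# Excludes the folder itself and everything below it, but keeps 'cstring'.
$exact = $paths | Where-Object { $_ -ne $Exclude -and $_ -notlike "$Exclude\*" }

# Simpler, but also drops 'cstring' because it shares the prefix.
$prefix = $paths | Where-Object { $_ -notlike "$Exclude*" }

$exact -join '; '
```

Here $exact keeps cstring and other, whereas $prefix keeps only other.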
Try changing
$Exclude = "T:\temp\Archive\cst"
To:
$Exclude = "T:\temp\Archive\cst\*"
This will still return the folder CST as it is a child item of Archive, but will exclude anything under cst.
Or:
$Exclude = "T:\temp\Archive\cst*"
But that will also exclude any files that start with "cst" under Archive. The same goes for Graimer's answer; just be aware of the trailing \ and whether it's important to what you are doing.
For those looking for a similar answer, what I ended up going with (to check an array of paths for a wildcard match):
# Declare variables
[string]$rootdir = "T:\temp\Archive"
[String[]]$Exclude = "T:\temp\Archive\cst", "T:\temp\archive\as"
[int]$days = 90
# Create Directory list minus excluded directories and their children
$list = Get-ChildItem -Path $rootdir -Recurse -Directory | where {$path = $_.fullname; -not @($exclude | ? {$path -like $_ -or $path -like "$_\*" }) }
Provides what I needed.
Thought I would add to this as I recently had a similar problem answered. You can use the -notcontains condition, but the thing that is counter intuitive is that the $exclude array needs to be at the start of the expression.
Here is an example.
If I perform the following no items are excluded and it returns "a","b","c","d"
$result = @()
$ItemArray = @("a","b","c","d")
$exclusionArray = @("b","c")
$ItemArray | Where-Object { $_ -notcontains $exclusionArray }
If I switch the variables around in the expression then it works and returns "a","d".
$result = @()
$ItemArray = @("a","b","c","d")
$exclusionArray = @("b","c")
$ItemArray | Where-Object { $exclusionArray -notcontains $_ }
I am not sure why the arrays have to be this way around to work. If anyone else can explain that would be great.
EDITED 12/12/20 - I now know that the other operation to use is "-in" as in
$_ -notin $exclusionArray
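On the question of why the order matters: as another answer in this thread notes, -contains checks whether a list of values contains a particular value, so -contains and -notcontains expect the collection on the left and the single value on the right, while -in and -notin take them the other way around. The sketch below compares the three forms side by side, reusing the arrays from the example above:

```powershell
$ItemArray      = @('a', 'b', 'c', 'd')
$exclusionArray = @('b', 'c')

# Collection on the left, single value on the right: filters as intended.
$viaNotContains = $ItemArray | Where-Object { $exclusionArray -notcontains $_ }

# Same test with the operands swapped via -notin.
$viaNotIn = $ItemArray | Where-Object { $_ -notin $exclusionArray }

# Reversed -notcontains: asks whether the single item "contains" the whole
# exclusion array, which is never the case, so nothing is filtered out.
$reversed = $ItemArray | Where-Object { $_ -notcontains $exclusionArray }

"$($viaNotContains -join ',') | $($viaNotIn -join ',') | $($reversed -join ',')"
```

The first two forms should both give a and d; the reversed form returns all four items.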