PowerShell matching strings with Regex

I'm working on a script to move TV shows into their corresponding folders on my drive. I'm having issues matching shows to their folders. This is the snippet of code I'm having a problem with:
#Remove all non-alphanumeric characters from the name
$newname = $Episode.Name -replace '[^0-9a-zA-Z ]', ' '
#Split the name at S01E01 and store the showname in a variable (Text before S01E01)
$ShowName = [regex]::Split($newname, 'S*(\d{1,2})(x|E)')[0]
#Match and get the destination folder where the names are similar
################## THIS IS WHERE THE ISSUE IS #######################
$DestDir = Gci -Path $DestinationRoot | Where { $ShowName -like "*$($_.Name)*" } | foreach {$_.Name }
For example, a show named "Doctor Who 2005 S02E02 Tooth and Claw.mp4" is not returning a similar folder, which is named "DoctorWho".
Question(s):
How can I modify the $DestDir so that I can match the names? Is there a better way of doing this?
Working Code:
# Extract the name of the show (text before SxxExx)
$ShowName = [regex]::Split($Episode.Basename, '.(\d{1,3})(X|x|E|e)(\d{1,3})')[0]
# Assumption: There is a folder in TV shows directory that is named correctly, and the input file is named correctly
# Try to match by stripping all non-Alphabet characters from both names and check if the folder name contains the file name
$Folder = gci -Path $DestinationRoot |
Where {$_.PSisContainer -and `
(($_.Name -replace '[^A-Za-z]','') -match ($ShowName -replace '[^A-Za-z]','')) } |
select -ExpandProperty fullname
Some sample output from testing:
Input file name: Arrow S01E02.mp4
Show name: Arrow
Matching folder: C:\Users\Public\Videos\TV Shows\Arrow
-----------------------------------------------------------------------
Input file name: Big Bang Theory S3E03.avi
Show name: Big Bang Theory
Matching folder: C:\Users\Public\Videos\TV Shows\The Big Bang Theory
-----------------------------------------------------------------------
Input file name: Doctor Who S08E03.mp4
Show name: Doctor Who
Matching folder: C:\Users\Public\Videos\TV Shows\Doctor Who (2005)
-----------------------------------------------------------------------
Input file name: GameOfThronesS01E01.mp4
Show name: GameOfThrones
Matching folder: C:\Users\Public\Videos\TV Shows\Game Of Thrones
-----------------------------------------------------------------------

Using the same method as you to figure out the show name, with Doctor Who 2005 S02E02 Tooth and Claw.mp4 as the example:
$showName = $Episode -replace '[^0-9a-zA-Z ]'
$showName = ($showName -split ('S*(\d{1,2})(x|E)'))[0]
$showName = $showName -replace "\d"
I added the line $showName = $showName -replace "\d" to account for the year in the name. The caveat is that this also strips digits from shows that have a number in their title, but it should work for most. On to the $DestDir determination: part of the issue is that your Where comparison is backwards. You want to see whether the show name is part of the potential folder name, not the other way around. Also, since the potential folder name could contain spaces, the comparison should account for that.
Get-ChildItem -Path $DestinationRoot -Directory | Where-Object { ($_.name -replace " ") -like "*$($showName)*"}
I would go on to use a Choice selection to have the user confirm the folder since it is possible to have multiple matches. I would like to point out that it might be hard to account for all naming conventions and variances but what you have is a good start.
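One way to make the comparison tolerant of spaces, years, and punctuation is to reduce both the file name and each folder name to letters only before comparing, much as the "Working Code" in the question does. The following is a minimal sketch under assumptions: Get-ShowKey is a hypothetical helper name (not from the original posts), and the folder list is sample data standing in for Get-ChildItem -Directory output. Because digits are discarded, titles that differ only by a number would collide.

```powershell
# Hypothetical helper: reduce a file or folder name to a letters-only key.
function Get-ShowKey {
    param([string]$Name)
    # Keep only the text before an SxxExx or 1x02 style episode marker, if present.
    $title = ($Name -split 'S?\d{1,2}(?:x|E)\d{1,2}')[0]
    # Drop everything that is not a letter (spaces, years, punctuation).
    $title -replace '[^A-Za-z]', ''
}

$fileKey = Get-ShowKey 'Doctor Who 2005 S02E02 Tooth and Claw.mp4'

# Sample folder names; in a real script these would come from
# Get-ChildItem -Path $DestinationRoot -Directory (using $_.Name).
$folders = 'Arrow', 'DoctorWho', 'The Big Bang Theory'
$matching = @($folders | Where-Object { (Get-ShowKey $_) -like "*$fileKey*" })
$matching
```

A substring comparison like this can still return more than one folder (for example, a short title that is contained in a longer one), so confirming the choice with the user remains a sensible safeguard.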

Related

I'm trying to clean up a script I have by trying to make it build the folder structure based on the file name

I currently have a PowerShell script that moves files to specific folders based on the file names. The top of the script starts by setting variables for the destination paths where certain groups of files should go:
$FileName = "path to where files with that name go"
Then I read in the contents of the entire directory of files recursively into a variable:
$Files = Get-ChildItem $FileFolder -File -Recurse
Then I have a bunch of lines of the same command for matching and moving:
$Files | Where-Object { $_.Name -match 'some name' } | Move-Item -Destination "$Variable-set-above" -Force
It was fine when it was 10 or 20 matches, but with more and more files being added and needing to be organized, I want to see if I can clean up the script by having it build the destination folder structure based on the file name instead of having a line for every match case, and a line for every move.
I was looking into Split-Path, regex -split, String.split(), and some other options, and I think I'm close, but I can't find an example anywhere of where someone takes the first portion of the file name, up to a certain couple of characters, keeping the first part, and excluding the rest. Kind of like a Split-Ignoresecond or something like that.
I'm testing doing this first before modifying my main script, I have this so far:
3 files in a folder named Test.One.File.D0001.txt, Test.Two.File.D0001.txt, and Test.Three.File.D0001.txt.
My test script:
$Testfiles = Get-ChildItem -Name *.txt
$Testfiles.replace('.',' ') -split "D0"
Which gives me an output of:
Test One File
001 txt
Test Three File
001 txt
Test Two File
001 txt
It's weird that it's not in the right order, but I envision that I'd be just dealing with 1 file at a time anyway so that won't matter.
What I'd like to do is read in a file name, ignore the "001 txt" part, use the first part of the filename to build the last part of a destination path for the file move, and then move the file to that destination. I could use Split-Path -Leafbase but I can't figure out the syntax for it to not give me an error, and I'd still be left with part of the filename I don't want.
Say I have a file called One.Two.ThreeD0001 that needs to go to D:\Files\Onestwosthrees. I want my script to read in the files from a folder, and then process the file One.Two.ThreeD0001.txt so that all that's left is "One Two Three", stick it in a variable like $SplitFile, then move the file to a folder built from the filename like D:\Files\Onestwosthrees\$SplitFile.
There's further parsing I want to do, but if I can get this part down I can figure out the sub parsing I need.
Some sources I've looked at so far for clues are:
https://superuser.com/questions/817955
and
https://kevinmarquette.github.io/2017-07-31-Powershell-regex-regular-expression/
Think you were pretty much there:
$root = "C:\Users\users\Downloads\StackTesting"
# -Include is only reliable with -Recurse or a wildcard path; -Filter works either way
$testFiles = Get-ChildItem -Path $root -Filter *.txt -File
foreach ($item in $testFiles) {
    # Everything before "D0", with the dots removed, becomes the folder name
    $directory = ($item.Name.Replace('.', '') -split "D0")[0]
    $target = Join-Path $root $directory
    ## check if folder exists, if not create
    if (!(Test-Path $target)) {
        New-Item -Type Directory $target
    }
    else {
        Write-Host "Folder exists"
    }
    ## Move item to folder
    Move-Item $item.FullName -Destination $target
}
This is how I got your directory names:
$directory = ($item.name.replace('.', '') -split "D0")[0]
Changed from a space to no space as your examples at the bottom didn't have spaces.
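If the spaced form from the question ("One Two Three") is wanted instead, the name can be derived with a case-sensitive split on the trailing counter. This is a rough sketch, assuming the counter always looks like a capital D followed by digits at the end of the base name; Get-TargetFolderName is a hypothetical helper name introduced here for illustration.

```powershell
# Hypothetical helper: derive the folder name from a file name such as
# Test.One.File.D0001.txt (everything before the trailing D<digits> counter).
function Get-TargetFolderName {
    param([string]$FileName)
    $base = [System.IO.Path]::GetFileNameWithoutExtension($FileName)
    # -csplit is case-sensitive, so a lowercase "d" inside a word is not treated as the marker.
    $prefix = ($base -csplit 'D\d+$')[0]
    # Turn the dots into spaces and drop the space left by a trailing dot.
    $prefix.Replace('.', ' ').Trim()
}

Get-TargetFolderName 'Test.One.File.D0001.txt'
Get-TargetFolderName 'One.Two.ThreeD0001.txt'
```

The returned name can then be appended to the destination root (for example with Join-Path) before calling Move-Item.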

Powershell For Loop, Replace, Split, Regex Matches

$i=0;$pnp = pnputil -e;
$matched = [regex]::matches($pnp, ".......................................Lexmark International");
$split = $matched -split (".........inf");
$replace = $split -replace " Driver package provider : Lexmark International","";
$replace1 = $replace -replace " ","`n";write-output $replace1;
foreach ($i in $replace1){;$pnpdel = pnputil -f -d $i;$pnpdel;};
Reg delete "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Print\Environments\Windows x64\Drivers\Version-3\Lexmark Universal v2 XL" /f;
net stop spooler;net start spooler;start \\officechicprt5\111W-22E-CPRN-01
As you can hopefully see, my script tries to pull oem*.inf values from a pnputil -e command. I am trying to get each oem##.inf file to be its own variable in a For loop. My script is a bit of a mess with all the replaces and splits, but that part seems to get the part of the command that I need. I am having issues with the data in $i. It appears that the script will sometimes work, and sometimes not. I want pnputil -d oem99.inf for each oem# it finds in the pnputil enumeration. What am I doing wrong in my For loop? There has to be a better way... I'm still pretty new to this as you can tell.
Thanks again.
Brent
Leveraging the power in PowerShell we can turn the output of pnputil into an object array that will make it much easier to parse the data you are looking for (since it appears you are looking for something specific).
Each entry is a group of properties with a blank line in between them. Using that, let's turn this data into custom objects.
$rawdata = pnputil -e | Select-Object -Skip 1
$rawdata = $rawdata -join "`r`n" -split "`r`n`r`n"
$entries = $rawdata | ForEach-Object{
    $props = $_ -replace ":","=" | ConvertFrom-StringData
    new-object -TypeName PSCustomObject -Property $props
}
$rawdata initially contains the text from pnputil -e. We use Select-Object -Skip 1 to remove the "Microsoft PnP Utility" line. Since $rawdata is an array and this approach requires one long string, we -join it with "`r`n". Immediately after, we split it into separate array elements, one per property group, with -split "`r`n`r`n", which splits on the blank lines you see in the cmd output.
The magic comes from ConvertFrom-StringData, which creates a hashtable from key-value pairs in strings. It needs = to work, so we convert the colons accordingly. Each hashtable that is created is converted to an object and saved to the variable $entries. $entries will always be an array, since it is safe to expect more than one entry.
Sample Entry once converted:
Class : Printers
Driver date and version : 12/03/2014 1.5.0.0
Signer name : Microsoft Windows Hardware Compatibility Publisher
Published name : oem27.inf
Driver package provider : Ricoh
Now we can use PowerShell to filter out exactly what you are looking for!
$entries | Where-Object{$_."Driver package provider" -match "Ricoh"} | Select-Object -ExpandProperty "Published name"
Note that this can also return an array but for me there was only one entry. The output for this was oem27.inf
Then using the information you are actually looking for you can run your other commands.
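To try the parsing without touching real drivers, the same steps can be run against a pasted sample. This is only a sketch: the two entries below are made-up sample data in the layout shown above (banner line already removed), and the blank-line split uses a regex so that it works with either Windows or Unix line endings.

```powershell
# Hypothetical sample in the layout printed by "pnputil -e".
$sample = @'
Published name :            oem27.inf
Driver package provider :   Ricoh
Class :                     Printers
Driver date and version :   12/03/2014 1.5.0.0
Signer name :               Microsoft Windows Hardware Compatibility Publisher

Published name :            oem12.inf
Driver package provider :   Lexmark International
Class :                     Printers
Driver date and version :   01/15/2015 2.7.1.0
Signer name :               Microsoft Windows Hardware Compatibility Publisher
'@

# Split on blank lines, then turn each "key : value" group into an object.
$entries = $sample -split '(?:\r?\n){2,}' | ForEach-Object {
    $props = $_ -replace ':', '=' | ConvertFrom-StringData
    New-Object -TypeName PSObject -Property $props
}

# Collect the .inf names of every Lexmark package; @() keeps the result an array.
$lexmarkInfs = @($entries |
    Where-Object { $_.'Driver package provider' -match 'Lexmark' } |
    Select-Object -ExpandProperty 'Published name')
$lexmarkInfs
```

Each name in $lexmarkInfs could then be passed to pnputil -f -d inside a foreach loop, as the question intended.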

Batch rename files with regex

I have a number of files with the following format:
name_name<number><number>[TIF<11 numbers>].jpg
e.g. john_sam01 [TIF 15355474840].jpg
And I would like to remove the [TIF 15355474840] from all of these files
This includes a leading space before the '[TIF...' and a different combination of 11 numbers each time.
So the previous example would become:
john_sam01.jpg
In short, using powershell (or cmd.exe) with regex I would like to turn this filename:
john_sam01 [TIF 15355474840].jpg
Into this:
john_sam01.jpg
With variables being: 'john' 'sam' two numbers and the numbers after TIF.
Something like, with added newlines for clarity:
dir ‹parameters to select the set of files› |
% {
$newName = $_.Name -replace '\s\[TIF \d+\]',''
rename-item -newname $newName -literalPath $_.Fullname
}
I would almost certainly add -WhatIf to the Rename-Item call until I was sure I had the file selection and rename correct.
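The pattern can also be checked on plain strings before any file is touched. A small sketch, assuming sample names in the format from the question (the second and third names are invented for illustration):

```powershell
# Sample names; the last one has no [TIF ...] part and should pass through unchanged.
$names = 'john_sam01 [TIF 15355474840].jpg',
         'mary_ann12 [TIF 00000000001].jpg',
         'plain_name03.jpg'

# Apply the same -replace that the rename would use.
$preview = foreach ($name in $names) {
    $name -replace '\s\[TIF \d+\]', ''
}
$preview
```

Names that do not contain the bracketed part come back unchanged, so it is worth filtering those out before renaming.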

How to get a value from Select-String

I have several files in a folder, those are .xml files.
I want to get a value from those files.
A line in the file, could look like this:
<drives name="Virtual HD ATA Device" deviceid="\\.\PHYSICALDRIVE0" interface="IDE" totaldisksize="49,99">
What I'm trying to do is get the value 49,99 in this case.
I am able to get the line out of the file with:
$Strings = Select-String -Path "XML\*.xml" -Pattern totaldisksize
foreach ($String in $Strings) {
Write-Host "Line is" $String
}
But I don't see how to get just the value inside the quotes. I've also played around with
$Strings.totaldisksize
But no dice.
Thanks in advance.
You can do this in one line as follows:
$(select-string totaldisksize .\XML\*.xml).line -replace '.*totaldisksize="(\d+,\d+)".*','$1'
The Select-String will give you a collection of objects that contains information about the match. The line property is the one you're interested in, so you can pull that directly.
Because .line yields one string per matching line, the -replace operator runs the regex against each of them. The $1 replacement will grab the group in the regex, the group being the part in parentheses (\d+,\d+) which will match one or more digits, followed by a comma, followed by one or more digits.
This will print to screen because by default powershell will print an object to the screen. Because you're only accessing the .line property, that's the only bit that's printed and also only after the replacement has been run.
If you wanted to explicitly use a Write-Host to see the results, or do anything else with them, you could store to a variable as follows:
$sizes = $(select-string totaldisksize .\XML\*.xml).line -replace '.*totaldisksize="(\d+,\d+)".*','$1'
$sizes | % { Write-Host $_ }
The above stores the results to an array, $sizes, and you iterate over it by piping it to the Foreach-Object or %. You can then access the array elements with $_ inside the block.
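An alternative to -replace is to put the capture group in the Select-String pattern itself and read it from the Matches property of the result. A minimal sketch on a single sample line (the line is the one from the question; the variable names are illustrative):

```powershell
$line = '<drives name="Virtual HD ATA Device" deviceid="\\.\PHYSICALDRIVE0" interface="IDE" totaldisksize="49,99">'

# The parentheses capture whatever sits between the quotes after totaldisksize=.
$found = $line | Select-String -Pattern 'totaldisksize="([^"]+)"'

# Groups[0] is the whole match; Groups[1] is the first capture group.
$size = $found.Matches[0].Groups[1].Value
$size
```

With files, the same property access works per result, for example by piping Select-String -Path "XML\*.xml" into ForEach-Object and reading $_.Matches[0].Groups[1].Value.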
But.. but.. PowerShell knows XML.
$XMLfile = '<drives name="Virtual HD ATA Device" deviceid="\\.\PHYSICALDRIVE0" interface="IDE" totaldisksize="49,99"></drives>'
$XMLobject = [xml]$XMLfile
$XMLobject.drives.totaldisksize
Output
49,99
Or walk the tree and return the content of "drives":
$XMLfile = @"
<some>
  <nested>
    <tags>
      <drives someOther="stuff" totaldisksize="49,99" freespace="22,33">
      </drives>
    </tags>
  </nested>
</some>
"@
$drives = [xml]$XMLfile | Select-Xml -XPath "//drives" | select -ExpandProperty node
Output
PS> $drives
someOther totaldisksize freespace
--------- ------------- ---------
stuff     49,99         22,33
PS> $drives.freespace
22,33
XPath query of "//drives" = Find all nodes named "drives" anywhere in the XML tree.
Reference: Windows PowerShell Cookbook 3rd Edition (Lee Holmes). Page 930.
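XPath can also address the attribute directly, which avoids walking the node afterwards. A short sketch, assuming the XML is available as a string (the sample document below is a trimmed-down version of the one above):

```powershell
$xmlText = '<some><nested><drives totaldisksize="49,99" freespace="22,33"></drives></nested></some>'

# "//drives/@totaldisksize" selects the totaldisksize attribute of every drives node.
$attr = Select-Xml -Content $xmlText -XPath '//drives/@totaldisksize'
$total = $attr.Node.Value
$total
```

For files on disk, Select-Xml also accepts -Path in place of -Content.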
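I am not sure about PowerShell, but if you prefer using Python, below is a way of doing it.
import re
# 'file' is a placeholder for the path of the XML file
with open('file') as f:
    data = f.read()
# Raw string, so \d reaches the regex engine unchanged
item = re.findall(r'.*totaldisksize="([\d,]+)">', data)
print(item[0])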
Output
49,99

Remove a substring from a filename with Powershell

In a deployment scenario, I need to rename config files. There are config files for every environment (Dev.Test, Dev.Prod, Integration, Prod). For example, a web.config would be called web.Dev.Test.config if it was for the Dev.Test environment. On the target machine, I need to rename the files back to their original name (i.e. from web.Dev.Test.config to web.config) with PowerShell.
$test = "web.Dev.Prod.config"
$environment = $test | Select-String -Pattern ".*\.(?<environment>(Dev.Test|Dev.Prod|Prod|Integration))\.config" | select -expand Matches | foreach {$_.groups["environment"].value}
if ($test -match "Dev.Prod")
{
$environment = "Dev.Prod"
}
$environment
$newFileName = $test.Remove($test.IndexOf($environment),$environment.Length + 1)
$newFileName
The problem I have with this is that the regex does not find the Dev.Prod environment, but returns Prod instead. This is why I introduced the if statement. I was wondering if there was a more elegant way of renaming the files with PowerShell.
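Watch out for greedy matching. The leading .* consumes as much as it can (here "web.Dev"), which leaves only Prod for the named group. Make it lazy by changing the part of your regex that starts ".*\.(?" to ".*?\.(?".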
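The difference is easy to see side by side. The sketch below is illustrative only: it escapes the dots inside the environment names (the original pattern leaves them unescaped), and the final -replace is one possible way to drop the environment segment without index arithmetic, not the only one.

```powershell
$name = 'web.Dev.Prod.config'
$environments = 'Dev\.Test|Dev\.Prod|Prod|Integration'

# Greedy: .* swallows "web.Dev", so only "Prod" is left for the group.
$null = $name -match ('.*\.(?<environment>' + $environments + ')\.config')
$greedy = $Matches['environment']

# Lazy: .*? stops at the first dot that lets the rest of the pattern match.
$null = $name -match ('.*?\.(?<environment>' + $environments + ')\.config')
$lazy = $Matches['environment']

# Remove the environment segment that sits directly in front of ".config".
$newFileName = $name -replace ('\.(?:' + $environments + ')(?=\.config$)'), ''

"$greedy / $lazy / $newFileName"
```

With the lazy quantifier the if statement from the question is no longer needed.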