Comparing two text files is not working in Powershell - list

I am trying to compare the contents of two text files and have only the differences be outputted to the console.
The first text file is based on the file names in a folder.
$AsyFolder = Get-ChildItem -Path .\asy-data -Name
I then remove the prefix of the file name that is set up by the user and is the same for every file and is separated from the relevant info with a dash.
$AsyFolder| ForEach-Object{$_.Split("-").Replace("$Prefix", "")} | Where-Object {$_}|Set-Content -Path .\templog.txt
The output looks like this ($AsyFolder output):
bpm.art
gbr.pdf
asy.pdf
fab.pdf
as1.art
odb.tgz
ccam.cad
read_me_asy.txt
There is another file that is the reference and contains the suffixes of files that should be there.
It looks like this (reference file):
tpm.art
bpm.art
gbr.pdf
asy.pdf
fab.pdf
as1.art
as2.art
odb.tgz
xyp.txt
ccam.cad
And its contents are received with $AsyTemplate = Get-Content -Path C:\Users\asy_files.txt
The logic is as follows
$AsyTemplate |
ForEach-Object{
If(Select-String -Path .\templog.txt -Pattern $_ -NotMatch -Quiet){
Write-Host "$($_)"
}
}
I have tried various ways of setting up the templog.txt with -InputObject: using Get-Content, Get-Content -Raw, a variable, writing an array manually. I have also tried removing -NotMatch and using -eq $False for the output of select string.
Every time, though, the output is just the contents of asy_files.txt (the reference file). It doesn't seem to care what is in templog.txt (the $AsyFolder output).
I have tried using compare-object/where-object method as well and it just says that both files are completely different.

Thank you @Lee_Dailey for your help in figuring out how to properly ask a question...
It ended up being additional whitespace (3 tabs) after the characters in the reference file asy_files.txt.
It was an artifact of where I copied the list from, and PowerShell was seeing "as2.art" and "as2.art " as different strings. I am not 100% sure why that matters, but I found that searching for any whitespace (\s) that appears after a word character (\w) and removing it made the comparison logic work. The Compare-Object | Where-Object approach worked as well after removing the whitespace.
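For anyone hitting the same symptom: whitespace is part of the string, so "as2.art" followed by tabs is not equal to "as2.art". A minimal sketch of the fix is to trim every line before comparing. The two arrays below are stand-ins for the files (shortened sample data, not the real contents); with real files the lines would come from Get-Content instead.

```powershell
# Stand-in data: the reference entries carry three trailing tabs,
# as described for asy_files.txt.
$reference = "tpm.art`t`t`t", "bpm.art`t`t`t", "as2.art`t`t`t", "gbr.pdf`t`t`t"
$actual    = 'bpm.art', 'gbr.pdf'

# Trim() strips leading and trailing whitespace from each line.
$cleanReference = $reference | ForEach-Object { $_.Trim() }

# Entries that are in the reference list but missing from the folder listing.
$missing = $cleanReference | Where-Object { $_ -notin $actual }
$missing
```

Without the Trim() step, every reference entry would be reported as missing, which matches the behaviour described in the question.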

Related

Add leading zero to file names using PowerShell

I have a folder that contains several hundred .mp3 files, all with the same name and an ascending number. The filenames look like this:
Test 01.mp3
Test 02.mp3
Test 03.mp3
Test 100.mp3
Test 101.mp3
Test 102.mp3
As you can see, the number of leading zeros in the first files is wrong, as they should have one more. I'd like to use PowerShell to solve the problem as I am currently learning to operate this quite helpful tool.
I tried to count the digits in the file names using the Replace Operator to filter out any non-digit characters. I assumed that the first 99 files would have three digits while the other files would have more (counting the '3' of the .mp3 file extension)
Get-Childitem | Where {($_.Name.Replace("\D","")).Length -le 3}
That should give me any files that have 3 or fewer digits in their file name, but it doesn't. In fact, it shows none. If I increase the number at the end to 11, I get the first three test files; increasing it to 12 shows all six of them. I assume that the Replace operator doesn't get applied to the file name before the filtering based on the Length property, although I used brackets around $_.Name.Replace("\D","")
What the hell am I doing wrong?
I figured it out: Get-ChildItem | Where {($_.Name -replace "\D","").Length -le 3} returns the files that I need to rename.
The whole command I used was
Get-ChildItem | Where {($_.Name -replace "\D","").Length -le 3} | Rename-Item -NewName { $_.Name -replace "Test ","Test 0"}
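The difference between the two attempts is that the .NET String.Replace() method treats its first argument as literal text, while the -replace operator treats it as a regular expression. A quick check on a plain string (no files needed) illustrates this:

```powershell
$name = 'Test 01.mp3'

# String.Replace() searches for the literal two characters \D,
# finds none, and returns the name unchanged.
$literal = $name.Replace('\D', '')

# -replace interprets \D as "any non-digit character" and removes those.
$regex = $name -replace '\D', ''

$literal.Length   # 11
$regex.Length     # 3
```

That is why the filter only started matching once the limit reached 11: it was measuring the length of the full, unmodified file name.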
It's also possible to rename all files to a number-only scheme and use the PadLeft method, as shown here
By replacing "Test " with "Test 0", you would still not achieve what you want on files that are numbered Test 1.mp3 (as this will become Test 01.mp3, which is one leading zero short).
You can make sure all files will have a 3-digit sequence number by doing this:
Get-Childitem -Path 'D:\Test' -Filter '*.mp3' -File |
Where-Object {$_.BaseName -match '(\D+)(\d+)$'} |
Rename-Item -NewName { '{0}{1:D3}{2}' -f $Matches[1], [int]$Matches[2], $_.Extension }
With this, wrongly named files like Test 00000003.mp3 will also be renamed, to Test 003.mp3
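To see what the format string does without touching any files, the same idea can be tried on a list of names. This is only a sketch: the names are sample data, and the pattern is anchored with ^ here, which the command above does not do.

```powershell
$names = 'Test 1.mp3', 'Test 02.mp3', 'Test 00000003.mp3', 'Test 100.mp3'

$renamed = foreach ($name in $names) {
    $base      = [System.IO.Path]::GetFileNameWithoutExtension($name)
    $extension = [System.IO.Path]::GetExtension($name)
    if ($base -match '^(\D+)(\d+)$') {
        # Casting to [int] drops existing leading zeros;
        # {1:D3} then pads the number back to at least three digits.
        '{0}{1:D3}{2}' -f $Matches[1], [int]$Matches[2], $extension
    }
}
$renamed
```

Numbers that already have three or more digits, such as 100, pass through unchanged.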

I'm trying to clean up a script I have by trying to make it build the folder structure based on the file name

I currently have a PowerShell script that moves files to specific folders based on the file names. The top of the script starts by setting a variable for the destination path where a certain group of files should go:
$FileName = "path to where files with that name go"
Then I read in the contents of the entire directory of files recursively into a variable:
$Files = Get-ChildItem $FileFolder -File -Recurse
Then I have a bunch of lines of the same command for matching and moving:
$Files | Where-Object { $_.Name -match 'some name' } | Move-Item -Destination "$Variable-set-above" -Force
It was fine when it was 10 or 20 matches, but with more and more files being added and needing to be organized, I want to see if I can clean up the script by having it build the destination folder structure based on the file name instead of having a line for every match case, and a line for every move.
I was looking into Split-Path, regex -split, String.split(), and some other options, and I think I'm close, but I can't find an example anywhere of where someone takes the first portion of the file name, up to a certain couple of characters, keeping the first part, and excluding the rest. Kind of like a Split-Ignoresecond or something like that.
I'm testing doing this first before modifying my main script, I have this so far:
3 files in a folder named Test.One.File.D0001.txt, Test.Two.File.D0001.txt, and Test.Three.File.D0001.txt.
My test script:
$Testfiles = Get-ChildItem -Name *.txt
$Testfiles.replace('.',' ') -split "D0"
Which gives me an output of:
Test One File
001 txt
Test Three File
001 txt
Test Two File
001 txt
It's weird that it's not in the right order, but I envision that I'd be just dealing with 1 file at a time anyway so that won't matter.
What I'd like to do is read in a file name, ignore the "001 txt" part, use the first part of the filename to build the last part of a destination path for the file move, and then move the file to that destination. I could use Split-Path -Leafbase but I can't figure out the syntax for it to not give me an error, and I'd still be left with part of the filename I don't want.
Say I have a file called One.Two.ThreeD0001 that needs to go to D:\Files\Onestwosthrees. I want my script to read in the files from a folder, and then process the file One.Two.ThreeD0001.txt so that all that's left is "One Two Three", stick it in a variable like $SplitFile, then move the file to a folder built from the filename like D:\Files\Onestwosthrees\$SplitFile.
There's further parsing I want to do, but if I can get this part down I can figure out the sub parsing I need.
Some sources I've looked at so far for clues are:
https://superuser.com/questions/817955
and
https://kevinmarquette.github.io/2017-07-31-Powershell-regex-regular-expression/
Think you were pretty much there:
$root = "C:\Users\users\Downloads\StackTesting"
cd $root
## -Filter is used because -Include only takes effect with -Recurse or a wildcard path
$testFiles = Get-ChildItem -Filter *.txt -File
foreach ( $item in $testFiles ) {
    $directory = ($item.Name.Replace('.', '') -split "D0")[0]
    ## check if folder exists, if not create
    if (!(Test-Path "$root\$directory"))
    {
        New-Item -Type Directory "$root\$directory"
    }
    else
    {
        Write-Host "Folder exists"
    }
    ## Move item to folder
    Move-Item $item.FullName -Destination "$root\$directory"
}
This is how i got your directory names:
$directory = ($item.name.replace('.', '') -split "D0")[0]
Changed from a space to no space as your examples at the bottom didn't have spaces.
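If you would rather keep the spaces (the "One Two Three" form mentioned in the question), the same split works with a different replacement. A small sketch on plain strings, using the file names from the question; the property names NoSpaces and WithSpaces are made up for the illustration:

```powershell
$names = 'Test.One.File.D0001.txt', 'Test.Two.File.D0001.txt', 'One.Two.ThreeD0001.txt'

$result = foreach ($name in $names) {
    # Keep only the part before the "D0" marker.
    # Note that -split is a case-insensitive regex match.
    $prefix = ($name -split 'D0')[0]
    [pscustomobject]@{
        Name       = $name
        NoSpaces   = $prefix.Replace('.', '')
        # Trim() removes the trailing space left by a dot just before "D0".
        WithSpaces = $prefix.Replace('.', ' ').Trim()
    }
}
$result
```

Either value can then be appended to the destination root to build the folder path.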

Batch rename files with regex

I have a number of files with the following format:
name_name<number><number>[TIF<11 numbers>].jpg
e.g. john_sam01 [TIF 15355474840].jpg
And I would like to remove the [TIF 15355474840] from all of these files
This includes a leading space before the '[TIF...' and a different combination of 11 numbers each time.
So the previous example would become:
john_sam01.jpg
In short, using powershell (or cmd.exe) with regex I would like to turn this filename:
josh_sam01 [TIF 15355474840].jpg
Into this:
josh_sam01.jpg
With variables being: 'john' 'sam' two numbers and the numbers after TIF.
Something like, with added newlines for clarity:
dir ‹parameters to select the set of files› |
% {
$newName = $_.Name -replace '\s\[TIF \d+\]',''
rename-item -newname $newName -literalPath $_.Fullname
}
I would almost certainly add -WhatIf to the Rename-Item call until I was sure I had the file selection and the new names correct.
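Another way to check the pattern before renaming anything is to run it against a few sample names as plain strings. The second name below is invented for the test:

```powershell
# \s matches the leading space, \[ and \] the literal brackets,
# and \d+ the run of digits after "TIF ".
$pattern = '\s\[TIF \d+\]'

$names = 'john_sam01 [TIF 15355474840].jpg', 'josh_sam02 [TIF 00000000001].jpg'

$newNames = $names | ForEach-Object { $_ -replace $pattern, '' }
$newNames
```

Names that do not contain the bracketed block are returned unchanged, so the pattern is safe to run over a mixed folder.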

How can I split and select from an array of filenames in Powershell?

I have a script I wrote in my company for clearing Citrix UPM profiles. Not very complicated, but it generates logs for every user it is run on. Along the format of:
UPMreset-e0155555-20150112-0733
UPMreset-n9978524-20150114-1128
UPMreset-jsmith-20150113-0840
etc.
So I want to grab the folder with all the .txt files, select only the username and count to see if one appears more than a certain number of times. To check for problem children. Putting them into an array is easy enough, but when doing a -split I can't seem to find a regex combination to select only the username. I thought I could just do a ('-')[1], but that doesn't appear to work. Do you have any suggestions?
$arrFiles = Get-Childitem "c:\logs"
$arrFiles | %{ $arrfile = $_ -split ('-'); Write-Host $arrfile[0]}
edit: Included test code for posterity's sake.
I'd try something like this:
$Path = 'N:\Folder\*.txt';
Get-ChildItem $Path |
ForEach-Object {
Write-Output $_.BaseName.Split('-')[1];
} |
Group-Object |
Where-Object { $_.Count -gt 1 } |
Sort-Object -Property Name |
Select-Object Name, Count;
To answer the question.
$_ is one of the objects returned by Get-ChildItem. Those objects are not strings. They're .Net objects of type System.IO.DirectoryInfo or System.IO.FileInfo. That means if we use $_, we're referencing the whole object. Worse, neither of those objects has a Split() method, so $_.Split('-') would refer to a function that didn't exist.
BaseName is a property of a FileInfo or DirectoryInfo object. That property contains the name of the file without the path or the extension. Critically, this property is also a String, which does have the Split() method. So using this property does two things: It removes the path name and the extension since we don't care about that and we don't want it to potentially break something (e.g., if someone put a dash in the parent folder's name), and it gives us a String object which we can manipulate with String methods and do things like call the Split function.
Try something like this at the command line:
$x = Get-ChildItem 'N:\Folder\UPMreset-e0155555-20150112-0733.txt';
$x | Get-Member;
You'll get a huge list of Methods (functions) that the object can do and Properties (attribute values) of the object. Name, FullName, BaseName, and Extension are all very common properties to use. You should also see NoteProperties and CodeProperties, which are added by the PowerShell provider to make using them easier (they wouldn't be available in a C# program). The definition tells you how to call the method or what the type of the property is and what you can do with it. You can usually Google and find MSDN documentation for how to use them, although it's not always the easiest way to do things.
Compare the above to this:
$x.BaseName | Get-Member;
You can see that it's a String, and that there are all kinds of methods like Split, Replace, IndexOf, etc.
Another helpful one is:
$x | Select-Object *;
This returns all the Property, NoteProperty, and CodeProperty values this object has.
This highlights one of the best ways to learn about what you can do with an object. Pipe it to Get-Member, and you learn the type and any methods or properties that you can access. That, combined with piping something to Select-Object *, can tell you a lot about what you're working with.
What problem were you having with .split('-')[1]?
$filenames = @(
'UPMreset-e0155555-20150112-0733',
'UPMreset-n9978524-20150114-1128',
'UPMreset-jsmith-20150113-0840'
)
$filenames |% {$_.split('-')[1]}
e0155555
n9978524
jsmith
It looks like the filenames are always UPMreset-, followed by the username. So use this:
UPMreset-(.+?)-
and the capture group will contain the username. It's using a lazy quantifier to get anything up to the next dash.
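A sketch of how that pattern could be used with -match and the automatic $Matches variable. The names are the ones from the question plus one invented extra entry for jsmith, so that there is something to count; the ^ anchor is an addition to the pattern above.

```powershell
$names = @(
    'UPMreset-e0155555-20150112-0733',
    'UPMreset-n9978524-20150114-1128',
    'UPMreset-jsmith-20150113-0840',
    'UPMreset-jsmith-20150115-0902'   # invented second run for the same user
)

$users = foreach ($name in $names) {
    # $Matches[1] holds the text captured by the first group.
    if ($name -match '^UPMreset-(.+?)-') { $Matches[1] }
}

# Users that appear more than once.
$repeat = $users | Group-Object | Where-Object Count -gt 1 | Select-Object Name, Count
$repeat
```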
You could also do the split in a calculated property with Group-Object:
$FileNames = Get-ChildItem -Path $LogDir -Filter "*.txt" -Name
$FileNames | Group-Object @{Expression={($_ -split "-")[1]}} | Where-Object {$_.Count -gt 1}

How to Find Replace Multiple strings in multiple text files using Powershell

I am new to scripting, and Powershell. I have been doing some study lately and trying to build a script to find/replace text in a bunch of text files (Each text file having code, not more than 4000 lines). However, I would like to keep the FindString and ReplaceString as variables, for there are multiple values, which can in turn be read from a separate csv file.
I have come up with this code, which is functional, but I would like to know if this is the optimal solution for the aforementioned requirement. I would like to keep the FindString and ReplaceString as regular expression compatible in the script, as I would also like to Find/Replace patterns. (I am yet to test it with Regular Expression Pattern)
Sample contents of Input.csv: (Number of objects in csv may vary from 50 to 500)
FindString ReplaceString
AA1A 171PIT9931A
BB1B 171PIT9931B
CC1C 171PIT9931E
DD1D 171PIT9932A
EE1E 171PIT9932B
FF1F 171PIT9932E
GG1G 171PIT9933A
The Code
$Iteration = 0
$FDPATH = 'D:\opt\HMI\Gfilefind_rep'
#& 'D:\usr\fox\wp\bin\tools\fdf_g.exe' $FDPATH\*.fdf
$GraphicsList = Get-ChildItem -Path $FDPATH\*.g | ForEach-Object FullName
$FindReplaceList = Import-Csv -Path $FDPATH\Input.csv
foreach($Graphic in $Graphicslist){
Write-Host "Processing Find Replace on : $Graphic"
foreach($item in $FindReplaceList){
Get-Content $Graphic | ForEach-Object { $_ -replace "$($item.FindString)", "$($item.ReplaceString)" } | Set-Content ($Graphic+".tmp")
Remove-Item $Graphic
Rename-Item ($Graphic+".tmp") $Graphic
$Iteration = $Iteration +1
Write-Host "String Replace Completed for $($item.ReplaceString)"
}
}
I have gone through other posts here in Stackoverflow, and gathered valuable inputs, based on which the code was built. This post from Ivo Bosticky came pretty close to my requirement, but I had to perform the same on a nested foreach loop with Find/Replace Strings as Variables reading from an external source.
To summarize,
I would like to know if the above code can be optimized for execution, since I feel it takes a long time to execute. (I prefer not using aliases for now, as I am just starting out, and am fine with a long and functional script rather than a concise one which is hard to understand)
I would like to add the number of Iterations being carried out in the loop. I was able to add the current Iteration number onto the console, but couldn't figure how to pipe the output of Measure-Command onto a variable, which could be used in Write-Host Command. I would also like to display the time taken for code execution, on completion.
Thanks for the time taken to read this Query. Much appreciate your support!
First of all, unless your replacement string is going to contain newlines (which would change the line boundaries), I would advise getting and setting each $Graphic file's contents only once, and doing all replacements in a single pass. This will also result in fewer file renames and deletions.
Second, it would be (probably marginally) faster to pass $item.FindString and $item.ReplaceString directly to the -replace operator rather than invoking the templating engine to inject the values into string literals.
Third, unless you truly need the output to go directly to the console instead of going to the normal output stream, I would avoid Write-Host. See Write-Host Considered Harmful.
And fourth, you might actually want to remove the Write-Host that gets called for every find and replace, as it may have a fair bit of effect on the overall execution time, depending on how many replacements there are.
You'd end up with something like this:
$timeTaken = (Measure-Command {
    $Iteration = 0
    $FDPATH = 'D:\opt\HMI\Gfilefind_rep'
    #& 'D:\usr\fox\wp\bin\tools\fdf_g.exe' $FDPATH\*.fdf
    $GraphicsList = Get-ChildItem -Path $FDPATH\*.g | ForEach-Object FullName
    $FindReplaceList = Import-Csv -Path $FDPATH\Input.csv
    foreach($Graphic in $GraphicsList){
        # Note: Measure-Command discards pipeline output, so this message is not displayed
        Write-Output "Processing Find Replace on : $Graphic"
        # Read the file once and apply every find/replace pair to each line
        Get-Content $Graphic | ForEach-Object {
            foreach($item in $FindReplaceList){
                $_ = $_ -replace $item.FindString, $item.ReplaceString
            }
            $Iteration += 1   # counts lines processed
            $_
        } | Set-Content ($Graphic+".tmp")
        Remove-Item $Graphic
        Rename-Item ($Graphic+".tmp") $Graphic
    }
}).TotalMilliseconds
I haven't tested it but it should run a fair bit faster, plus it will save the elapsed time to a variable.
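The single-pass idea can be tried without any files. The sketch below uses made-up in-memory stand-ins for Input.csv and for one graphic file, and it times the loop with a System.Diagnostics.Stopwatch rather than Measure-Command, so that normal output is not swallowed; treat it as an illustration of the approach, not a drop-in replacement.

```powershell
# Hypothetical stand-ins for the rows of Input.csv and the lines of one .g file.
$findReplaceList = @(
    [pscustomobject]@{ FindString = 'AA1A'; ReplaceString = '171PIT9931A' }
    [pscustomobject]@{ FindString = 'BB1B'; ReplaceString = '171PIT9931B' }
)
$lines = 'TAG AA1A.PV', 'TAG BB1B.PV', 'no tags here'

$stopwatch = [System.Diagnostics.Stopwatch]::StartNew()

# One pass over the lines; every find/replace pair is applied to each line.
$result = foreach ($line in $lines) {
    foreach ($item in $findReplaceList) {
        $line = $line -replace $item.FindString, $item.ReplaceString
    }
    $line
}

$stopwatch.Stop()

$result
"Processed $($lines.Count) lines in $($stopwatch.Elapsed.TotalMilliseconds) ms"
```

Because FindString is passed to -replace, it is treated as a regular expression, which is what the question asked for; literal strings that contain regex metacharacters would need [regex]::Escape().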