How to Find Replace Multiple strings in multiple text files using Powershell - regex

I am new to scripting, and Powershell. I have been doing some study lately and trying to build a script to find/replace text in a bunch of text files (Each text file having code, not more than 4000 lines). However, I would like to keep the FindString and ReplaceString as variables, for there are multiple values, which can in turn be read from a separate csv file.
I have come up with this code, which is functional, but I would like to know if this is the optimal solution for the aforementioned requirement. I would like to keep the FindString and ReplaceString as regular expression compatible in the script, as I would also like to Find/Replace patterns. (I am yet to test it with Regular Expression Pattern)
Sample contents of Input.csv: (Number of objects in csv may vary from 50 to 500)
FindString ReplaceString
AA1A 171PIT9931A
BB1B 171PIT9931B
CC1C 171PIT9931E
DD1D 171PIT9932A
EE1E 171PIT9932B
FF1F 171PIT9932E
GG1G 171PIT9933A
The Code
$Iteration = 0
$FDPATH = 'D:\opt\HMI\Gfilefind_rep'
#& 'D:\usr\fox\wp\bin\tools\fdf_g.exe' $FDPATH\*.fdf
$GraphicsList = Get-ChildItem -Path $FDPATH\*.g | ForEach-Object FullName
$FindReplaceList = Import-Csv -Path $FDPATH\Input.csv
foreach($Graphic in $Graphicslist){
Write-Host "Processing Find Replace on : $Graphic"
foreach($item in $FindReplaceList){
Get-Content $Graphic | ForEach-Object { $_ -replace "$($item.FindString)", "$($item.ReplaceString)" } | Set-Content ($Graphic+".tmp")
Remove-Item $Graphic
Rename-Item ($Graphic+".tmp") $Graphic
$Iteration = $Iteration +1
Write-Host "String Replace Completed for $($item.ReplaceString)"
}
}
I have gone through other posts here in Stackoverflow, and gathered valuable inputs, based on which the code was built. This post from Ivo Bosticky came pretty close to my requirement, but I had to perform the same on a nested foreach loop with Find/Replace Strings as Variables reading from an external source.
To summarize,
I would like to know if the above code can be optimized for
execution, since I feel it takes a long time to execute. (I prefer
not using aliases for now, as I am just starting out, and am fine
with a long and functional script rather than a concise one which is
hard to understand)
I would like to add the number of Iterations being carried out in
the loop. I was able to add the current Iteration number onto the
console, but couldn't figure how to pipe the output of
Measure-Command onto a variable, which could be used in Write-Host
Command. I would also like to display the time taken for code
execution, on completion.
Thanks for the time taken to read this Query. Much appreciate your support!

First of all, unless your replacement string is going to contain newlines (which would change the line boundaries), I would advise getting and setting each $Graphic file's contents only once, and doing all replacements in a single pass. This will also result in fewer file renames and deletions.
Second, it would be (probably marginally) faster to pass $item.FindString and $item.ReplaceString directly to the -replace operator rather than invoking the templating engine to inject the values into string literals.
Third, unless you truly need the output to go directly to the console instead of going to the normal output stream, I would avoid Write-Host. See Write-Host Considered Harmful.
And fourth, you might actually want to remove the Write-Host that gets called for every find and replace, as it may have a fair bit of effect on the overall execution time, depending on how many replacements there are.
You'd end up with something like this:
$timeTaken = (measure-command {
$Iteration = 0
$FDPATH = 'D:\opt\HMI\Gfilefind_rep'
#& 'D:\usr\fox\wp\bin\tools\fdf_g.exe' $FDPATH\*.fdf
$GraphicsList = Get-ChildItem -Path $FDPATH\*.g | ForEach-Object FullName
$FindReplaceList = Import-Csv -Path $FDPATH\Input.csv
foreach($Graphic in $Graphicslist){
Write-Output "Processing Find Replace on : $Graphic"
Get-Content $Graphic | ForEach-Object {
foreach($item in $FindReplaceList){
$_ = $_ -replace $item.FindString, $item.ReplaceString
}
$Iteration += 1
$_
} | Set-Content ($Graphic+".tmp")
Remove-Item $Graphic
Rename-Item ($Graphic+".tmp") $Graphic
}
}).TotalMilliseconds
I haven't tested it but it should run a fair bit faster, plus it will save the elapsed time to a variable.

Related

Comparing two text files is not working in Powershell

I am trying to compare the contents of two text files and have only the differences be outputted to the console.
The first text file is based on the file names in a folder.
$AsyFolder = Get-ChildItem -Path .\asy-data -Name
I then remove the prefix of the file name that is set up by the user and is the same for every file and is separated from the relevant info with a dash.
$AsyFolder| ForEach-Object{$_.Split("-").Replace("$Prefix", "")} | Where-Object {$_}|Set-Content -Path .\templog.txt
The output looks like $Asyfolder Output
bpm.art
gbr.pdf
asy.pdf
fab.pdf
as1.art
odb.tgz
ccam.cad
read_me_asy.txt
There is another file that is the reference and contains the suffixes of files that should be there.
It looks like this Reference File
tpm.art
bpm.art
gbr.pdf
asy.pdf
fab.pdf
as1.art
as2.art
odb.tgz
xyp.txt
ccam.cad
And its contents are received with $AsyTemplate = Get-Content -Path C:\Users\asy_files.txt
The logic is as follows
$AsyTemplate |
ForEach-Object{
If(Select-String -Path .\templog.txt -Pattern $_ -NotMatch -Quiet){
Write-Host "$($_)"
}
}
I have tried various ways of setting up the templog.txt with -InputObject: using Get-Content, Get-Content -Raw, a variable, writing an array manually. I have also tried removing -NotMatch and using -eq $False for the output of select string.
Everytime though the output is just the contents of asy_files.txt (Reference File). It doesn't seem to care what is in templog.txt ($AsyFolder Output).
I have tried using compare-object/where-object method as well and it just says that both files are completely different.
Thank you #Lee_Dailey for your help in figuring out how to properly ask a question...
It ended up being additional whitespace (3 tabs) after the characters in the reference file asy_files.txt.
It was an artifact from where I copied from, and powershell was seeing "as2.art" and "as2.art " I am not 100% as to why that matters, but I found that sorting for any whitespace with /S that appears after a word character /W and removing it made the comparison logic work. The Compare-Object|Where-Object worked as well after removing the whitespace.

How to Optimize extended event to JSON conversion

I have a small process for ingesting a .xel file, converting it to custom objects with a dba-tools module, and then turning them into single-line JSON and exporting them to a file that gets sent off to wherever it goes. Here:
$path = 'C:\temp\big_xe_file.xel'
#Read in file
$xes = Read-DbaXEFile -Path $path
#Output Variable
$file = ""
foreach ($xe in $xes) {
#format date column
$xe.timestamp = ($xe.timestamp.DateTime).ToString("yyyy-MM-ddThh:mm:ss.ffff")
# convert to JSON and change escaped unicode characters back
$xe = (($xe | ConvertTo-Json -compress)) | % { #| % { [System.Text.RegularExpressions.Regex]::Unescape($_) }
[Regex]::Replace($_,
"\\u(?<Value>[a-zA-Z0-9]{4})", {
param($m) ([char]([int]::Parse($m.Groups['Value'].Value,
[System.Globalization.NumberStyles]::HexNumber))).ToString() } )}
#Write line to file
Add-Content -Value "$($xe)`n" -Path 'C:\temp\myevents.json' -Encoding utf8 -NoNewline
}
This fits the bill and does exactly what I need it to. The nasty regex in the middle is because when you convertto-json, it HANDILY escapes all unicode characters, and the regex magically turns them all back to the characters we know and love.
However, it's all a bit too slow. We churn out lots of .xel files, usually 500mb in size, and we would like to have a shorter delay between the traces being written and being ingested. As it stands, it takes ~35 minutes to serially process a file this way. The delay would likely grow if we got behind, which seems likely at that speed.
I've already sped this up quite a bit. I've tried using [System.Text.RegularExpressions.Regex]::Unescape in place of the regex code I have, but it is only slightly faster and does not provide the correct formatting that we need anyway. My next step is to split the files into smaller pieces and process them in parallel, but that would be significantly more CPU intensive and I'd like to avoid that if possible.
Any help optimizing this is much appreciated!
It turns out there was a config issue and we were able to ditch that regex nonsense and leave the escape characters in the JSON. However, I did also find a solution for speeding it up, in case anyone ever sees this. The solution was changing the writer to use a .NET class instead of the powershell method
$stream = [System.IO.StreamWriter] $outfile
foreach ($xe in $xes) {
#format date column
$xe.timestamp = ($xe.timestamp.DateTime).ToString("yyyy-MM-ddThh:mm:ss.ffff")
$xe | Add-Member -MemberType NoteProperty -Name 'source_host_name' -Value $server_name
# convert to JSON and change escaped unicode characters back
$xe = (($xe | ConvertTo-Json -compress)) #| % { #| % { [System.Text.RegularExpressions.Regex]::Unescape($_) }
# [Regex]::Replace($_,
# "\\u(?<Value>[a-zA-Z0-9]{4})", {
# param($m) ([char]([int]::Parse($m.Groups['Value'].Value,
# [System.Globalization.NumberStyles]::HexNumber))).ToString() } )}
#Add-Content -Value "$($xe)`n" -Path 'C:\DBA Notes\Traces\Xel.json' -Encoding utf8 -NoNewline
$stream.WriteLine($xe)
}
$stream.close()
It takes 1/10 the amount of time. Cheers

Powershell For Loop, Replace, Split, Regex Matches

$i=0;$pnp = pnputil -e;
$matched = [regex]::matches($pnp, ".......................................Lexmark International");
$split = $matched -split (".........inf");
$replace = $split -replace " Driver package provider : Lexmark International","";
$replace1 = $replace -replace " ","`n";write-output $replace1;
foreach ($i in $replace1){;$pnpdel = pnputil -f -d $i;$pnpdel;};
Reg delete "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Print\Environments\Windows x64\Drivers\Version-3\Lexmark Universal v2 XL" /f;
net stop spooler;net start spooler;start \\officechicprt5\111W-22E-CPRN-01
As you can hopefully see, my script tries to pull oem*.inf values from a pnpitil -e command. I am trying to get each oem##.inf file to be it's own variable in a For loop. My script is a bit of a mess with all the replaces and splits, but that part seems to get the part of the command that I need. I am having issues with the data in $i. It appears that the script will sometimes work, and sometimes not. I want pnputil -d oem99.inf for each oem# it finds in the pnputil enumeration. What am I doing wrong in my For loop? There has to be a better way... I'm still pretty new to this as you can tell.
Thanks again.
Brent
Leveraging the power in PowerShell we can turn the output of pnputil into an object array that will make it much easier to parse the data you are looking for (since it appears you are looking for something specific).
Each entry is a group of variables with a blank line in-between them. Using that lets turn this data into custom objects.
$rawdata = pnputil -e | Select-Object -Skip 1
$rawdata = $rawdata -join "`r`n" -split "`r`n`r`n"
$entries = $rawdata | ForEach-Object{
$props = $_ -replace ":","=" | ConvertFrom-StringData
new-object -TypeName PSCustomObject -Property $props
}
$rawdata initially contains the text from pnputil -e. We use Select-Object -Skip 1 to remove the "Microsoft PnP Utility" line. Since $rawdata is an array this approach requires that is it one long string so -join "`r`n". Immediately after we split it up into separate array elements of each property group with -split "`r`n`r`n" which splits on the blank line you see in cmd output.
The magic comes from ConvertFrom-StringData which create a hashtable from key value pairs in stings. In needs = to work so we convert the colons as such. With each hashtable that is created we convert it to an object and save it to the variable $entries. $entries will always be an array since it is safe to expect more than one entry.
Sample Entry once converted:
Class : Printers
Driver date and version : 12/03/2014 1.5.0.0
Signer name : Microsoft Windows Hardware Compatibility Publisher
Published name : oem27.inf
Driver package provider : Ricoh
Now we can use PowerShell to filter out exactly what you are looking for!
$entries | Where-Object{$_."Driver package provider" -match "Ricoh"} | Select-Object -ExpandProperty "Published name"
Note that this can also return an array but for me there was only one entry. The output for this was oem27.inf
Then using the information you are actually looking for you can run your other commands.

How can I split and select from an array of filenames in Powershell?

I have a script I wrote in my company for clearing Citrix UPM profiles. Not very complicated, but it generates logs for every user it is run on. Along the format of:
UPMreset-e0155555-20150112-0733
UPMreset-n9978524-20150114-1128
UPMreset-jsmith-20150113-0840
etc.
So I want to grab the folder with all the .txt files, select only the username and count to see if one appears more than a certain number of times. To check for problem children. Putting them into an array is easy enough, but when doing a -split I can't seem to find a regex combination to select only the username. I thought I could just do a ('-')[1], but that doesn't appear to work. Do you have any suggestions?
$arrFiles = Get-Childitem "c:\logs"
$arrFiles | %{ $arrfile = $_ -split ('-'); Write-Host $arrfile[0]}
edit: Included test code for posterity sake.
I'd try something like this:
$Path = 'N:\Folder\*.txt';
Get-ChildItem $Path |
ForEach-Object {
Write-Output $_.BaseName.Split('-')[1];
} |
Group-Object |
Where-Object { $_.Count -gt 1 } |
Sort-Object -Property Name |
Select-Object Name, Count;
To answer the question.
$_ is one of the objects returned by Get-ChildItem. Those objects are not strings. They're .Net objects of type System.IO.DirectoryInfo or System.IO.FileInfo. That means if we use $_, we're referencing the whole object. Worse, neither of those objects has a Split() method, so $_.Split('-') would refer to a function that didn't exist.
BaseName is a property of a FileInfo or DirectoryInfo object. That property contains the name of the file without the path or the extension. Critically, this property is also a String, which does have the Split() method. So using this property does two things: It removes the path name and the extension since we don't care about that and we don't want it to potentially break something (e.g., if someone put a dash in the parent folder's name), and it gives us a String object which we can manipulate with String methods and do things like call the Split function.
Try something like this at the command line:
$x = Get-ChildItem 'N:\Folder\UPMreset-e0155555-20150112-0733.txt';
$x | Get-Member;
You'll get a huge list of Methods (functions) that the object can do and Properties (attribute values) of the object. Name, FullName, BaseName, and Extension are all very common properties to use. You should also see NoteProperties and CodeProperties, which are added by the PowerShell provider to make using them easier (they wouldn't be available in a C# program). The definition tells you how to call the method or what the type of the property is and what you can do with it. You can usually Google and find MSDN documentation for how to use them, although it's not always the easiest way to do things.
Compare the above to this:
$x.BaseName | Get-Member;
You can see that it's a String, that there all kinds of methods like Split, Replace, IndexOf, etc.
Another helpful one is:
$x | Select-Object *;
This returns all the Propety, NoteProperty, and CodeProperty values this object has.
This highlights one of the best ways to learn about what you can do with an object. Pipe it to Get-Member, and you learn the type and any methods or properties that you can access. That, combined with piping something to Select-Object *, can tell you a lot about what you're working with.
What problem were you having with .split('-')[1]?
$filenames = #(
'UPMreset-e0155555-20150112-0733',
'UPMreset-n9978524-20150114-1128',
'UPMreset-jsmith-20150113-0840'
)
$filenames |% {$_.split('-')[1]}
e0155555
n9978524
jsmith
It looks like the filenames are always UPMreset-, followed by the username. So use this:
UPMreset-(.+?)-
and the capture group will contain the username. It's using a lazy quantifier to get anything up to the next dash.
You could also do the split in a calculated property with Group-Object:
$FileNames = Get-ChildItem -Path $LogDir -Filter "*.txt" -Name
$FileNames | Group-Object #{Expression={($_ -split "-")[1]}} | Where-Object {$_.Count -gt 1}

Powershell v2. Delete or Remove lines of text. Regex Issue

I've been through so many posts today that offer Powershell examples of how to remove entire lines from a file by using line numbers. Unfortunately none of them do quite what I need, or they have some 'but' type clauses in them.
The closest example I found uses the -remove method. This managed to do the job, until I noticed that not all lines that I was trying to remove, were removed. After some more research I found that -remove is reliant on Regex and Regex does not like certain special characters which happen to be in some of the lines I wish to delete/remove.
The example I'm using is not my own work. user2233949 made it (cheers buddy) I found it very simple though, so I made it into a Function:
Function DeleteLineFromFile ($Path, $File, $LineNumber){
$Contents = Get-Content $Path\$File
$Contents -replace $Contents[$LineNumber -1],"" | Set-Content $Path\$File
}
A great example I reckon. Only regex won't work with the special chars.
The goal is this: To make a function that doesn't care about what is in the line. I wan't to feed it a path, file and line, then have the function delete that line and save the file.
This is fairly easy and does not need regex at all.
Read the file:
$lines = Get-Content $Path\$File
We then have an array that contains the lines in the file. When we have an array we can use indexes to get elements from the array back, e.g.
$lines[4]
would be the fifth line. You can also pass an array into the index to get multiple lines back:
$lines[0,1,5] # returns the first, second and 6th line
$lines[0..5] # returns the first 6 lines
We can make use of that with a little trick. PowerShell's comparison operators, e.g. -eq work differently with an array on the left side, in that they don't return $true or $false, but rather all elements from the array matching the comparison:
1..5 -ge 3 # returns 3,4,5
0..8 -ne 7 # returns 0 through 8, *except* 7
You probably can see where this is going ...
$filteredLines = $lines[0..($lines.Length-1) -ne $LineNumber - 1]
Technically you can ignore the - 1 after $lines.Length because indexing outside of an array simply does nothing. This does actually remove the line you want to remove, though. If you just want it replaced by an empty line (which your code seems to be doing, but it doesn't sound like that's what you want), then this approach won't work.
There are other options, though, e.g. with a ForEach-Object:
Get-Content $Path\$File |
ForEach-Object { $n = 1 } {
if ($n -ne $LineNumber) { $_ } else { '' }
}
A word of advice on writing functions: Usually you don't have separate $Path and $File parameters. They serve no real useful purpose. Every cmdlet uses only a $Path parameter that points to a file if needed. If you need the folder that file resides in, you can always use Split-Path -Parent $Path to get it.