I have a small process for ingesting a .xel file, converting it to custom objects with the dbatools module, turning them into single-line JSON, and exporting them to a file that gets sent off to wherever it goes. Here it is:
$path = 'C:\temp\big_xe_file.xel'
#Read in file
$xes = Read-DbaXEFile -Path $path
#Output Variable
$file = ""
foreach ($xe in $xes) {
#format date column (HH = 24-hour clock; lowercase hh would give 12-hour values)
$xe.timestamp = ($xe.timestamp.DateTime).ToString("yyyy-MM-ddTHH:mm:ss.ffff")
# convert to JSON, then turn the escaped \uXXXX sequences back into literal characters
$xe = ($xe | ConvertTo-Json -Compress) | ForEach-Object {
    [Regex]::Replace($_,
        '\\u(?<Value>[a-fA-F0-9]{4})', {
            param($m) ([char]([int]::Parse($m.Groups['Value'].Value,
                [System.Globalization.NumberStyles]::HexNumber))).ToString() })
}
#Write line to file
Add-Content -Value "$($xe)`n" -Path 'C:\temp\myevents.json' -Encoding utf8 -NoNewline
}
This fits the bill and does exactly what I need. The nasty regex in the middle is there because ConvertTo-Json HANDILY escapes all non-ASCII characters as \uXXXX sequences, and the regex magically turns them all back into the characters we know and love.
However, it's all a bit too slow. We churn out lots of .xel files, usually around 500 MB each, and we would like a shorter delay between the traces being written and being ingested. As it stands, it takes ~35 minutes to process one file this way serially, and the delay would only grow if we fell behind, which seems likely at that speed.
I've already sped this up quite a bit. I tried [System.Text.RegularExpressions.Regex]::Unescape in place of the regex code I have, but it is only slightly faster and doesn't produce the formatting we need anyway. My next step is to split the files into smaller pieces and process them in parallel, but that would be significantly more CPU intensive and I'd like to avoid it if possible (a rough sketch of the idea follows).
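For reference, the parallel idea would look something like this with PowerShell 7's ForEach-Object -Parallel (an untested sketch; the chunk folder and throttle value are hypothetical):
$chunks = Get-ChildItem 'C:\temp\chunks\*.xel'
$chunks | ForEach-Object -Parallel {
    # each runspace reads one chunk and writes its own output file
    $out = [System.IO.StreamWriter] "$($_.FullName).json"
    foreach ($xe in Read-DbaXEFile -Path $_.FullName) {
        $out.WriteLine(($xe | ConvertTo-Json -Compress))
    }
    $out.Close()
} -ThrottleLimit 4   # caps the CPU cost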
Any help optimizing this is much appreciated!
It turns out there was a config issue, and we were able to ditch that regex nonsense and leave the escape sequences in the JSON. However, I did also find a way to speed it up, in case anyone ever sees this: change the writer from the Add-Content cmdlet to a .NET StreamWriter.
# $outfile holds the full path to the output file
$stream = [System.IO.StreamWriter] $outfile
foreach ($xe in $xes) {
#format date column (HH = 24-hour clock)
$xe.timestamp = ($xe.timestamp.DateTime).ToString("yyyy-MM-ddTHH:mm:ss.ffff")
$xe | Add-Member -MemberType NoteProperty -Name 'source_host_name' -Value $server_name
# convert to JSON; with the config issue fixed, the unicode-unescape regex is no longer needed
$xe = $xe | ConvertTo-Json -Compress
$stream.WriteLine($xe)
}
$stream.Close()
It takes about 1/10 of the time. Cheers
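One caveat for anyone copying this: [System.IO.StreamWriter] resolves relative paths against the process working directory rather than the current PowerShell location, so pass a full path. Wrapping the loop in try/finally also guarantees the handle gets flushed and closed if something throws partway through. A minimal sketch:
$stream = [System.IO.StreamWriter] 'C:\temp\myevents.json'   # full path, not relative
try {
    foreach ($xe in $xes) {
        $stream.WriteLine(($xe | ConvertTo-Json -Compress))
    }
}
finally {
    $stream.Close()   # flush and release the file even on error
}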
Related
I have a small script that I can use to find and replace characters or strings in a file. It works, and I can use it to replace the non-UTF-8 characters.
What I need is to run the script once, replace all the invalid data in one shot, AND create another file that records the file name and the bad characters.
Right now I have to run the script over and over, for however many invalid characters I can ID by eyeball. Then I edit my tracking file by hand with the string the script replaced and the file I ran it against.
Not efficient at all. Just to be clear, I have almost no clue how to code the second part, keeping track of what is corrected.
Can anyone offer a better way of doing this?
Thank you,
-Ron
$old = 'BAD DATA'   # note: -replace treats this as a regex pattern
$new = ' '
$configFiles = Get-ChildItem . *.* -Recurse
foreach ($file in $configFiles)
{
    (Get-Content $file.PSPath) |
        Foreach-Object { $_ -replace $old, $new } |
        Set-Content $file.PSPath
}
Here is a sample of my DATA...
"PARTHENIA STREET °212 "," "," "," ","CAUGA PARK "
The data ' °' in hex is c2 b0, which is the UTF-8 encoding of the degree sign. The original file before FTP had a single byte here, hex 09 (a tab). Not only did it convert incorrectly, it added a byte to the file.
Here's an example translating EBCDIC to ASCII, based on ASCII-to-EBCDIC or EBCDIC-to-ASCII and Working with non-native PowerShell encoding (EBCDIC), though the file in question is completely unrecognizable as EBCDIC, and it doesn't have a BOM.
The file was downloaded with sftp, but it sounds like it was already corrupted.
"hi`tthere","how`tare" | set-content file.txt # tab 0x09 in the middle
# From ASCII to EBCDIC
$asciibytes = get-content file.txt -Encoding byte
$rawstring = [System.Text.Encoding]::ASCII.GetString($asciibytes)
$ebcdicbytes = [System.Text.Encoding]::GetEncoding('ebcdic-cp-us').getbytes($rawstring)
$ebcdicbytes | set-content ebcidic.txt -Encoding Byte
# From EBCDIC to ASCII
$ebcidicbytes = get-content ebcidic.txt -Encoding byte
$rawstring = [System.Text.Encoding]::getencoding('ebcdic-cp-us').GetString($ebcidicbytes)
$asciibytes = [system.text.encoding]::ASCII.GetBytes($rawstring)
$asciibytes | set-content ascii.txt -Encoding Byte
Here's a script called nonascii.ps1 that strips non-ASCII characters (anything outside the space-to-tilde range of the ASCII table, except tab) and writes back to the same filename.
(get-content $args[0]) -replace '[^ -~\t]' | set-content $args[0]
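Usage, assuming the one-liner is saved as nonascii.ps1 in the current directory:
.\nonascii.ps1 .\somefile.txt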
Note that PowerShell 5.1's Get-Content can't recognize UTF-8 files that have no BOM without the -Encoding UTF8 parameter:
Get-Content file -Encoding UTF8
Also note that PowerShell 6.2 and above can use any encoding known to .NET, although tab completion doesn't reflect this:
"hi`tthere" | Set-Content ebcdic.txt -Encoding ebcdic-cp-us
Get-Content ebcdic.txt -Encoding ebcdic-cp-us
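To cover the tracking part of the question (replace everything in one pass and log which file contained which bad string), here is a minimal sketch; the bad-string list and log path are assumptions to adapt:
$badStrings = 'BAD DATA', ([string][char]0xB0)   # example entries; 0xB0 is the degree sign
$log = 'C:\temp\replacements.csv'                # assumed path for the tracking file
foreach ($file in Get-ChildItem . -File -Recurse) {
    $text = Get-Content $file.FullName -Raw
    if ($null -eq $text) { continue }            # skip empty files
    $changed = $false
    foreach ($bad in $badStrings) {
        if ($text.Contains($bad)) {
            # record the file name and the offending string before fixing it
            [pscustomobject]@{ File = $file.FullName; BadString = $bad } |
                Export-Csv $log -Append -NoTypeInformation
            $text = $text.Replace($bad, ' ')     # literal replace, no regex surprises
            $changed = $true
        }
    }
    if ($changed) { Set-Content $file.FullName -Value $text }
}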
My issue is similar to this question:
Json file to powershell and back to json file
When importing and exporting ARM templates in PowerShell, using ConvertFrom-Json and ConvertTo-Json introduces Unicode escape sequences.
I used the code here to unescape again.
Some example code (multiline for clarity):
$armADF = Get-Content -Path $armFile -Raw | ConvertFrom-Json
$armADFString = $armADF | ConvertTo-Json -Depth 50
$armADFString |
ForEach-Object { [System.Text.RegularExpressions.Regex]::Unescape($_) } |
Out-File $outputFile
Here's the doco I've been reading for Unescape
This results in the output file being identical, except that all instances of literal \n (that were in the original JSON file) are turned into actual line breaks. Which breaks the ARM template.
If I don't include the Unescape code, the \n are preserved, but so are the Unicode escape sequences, which also break the ARM template.
It seems like I need to pre-escape the \n so that, when I call Unescape, they are turned back into nice little \n. I've tried a couple of things, like adding this before calling Unescape:
$armADFString = $armADFString -replace("\\n","\u000A")
Which does not give me the results I need.
Anyone come across this and solved it? Any accomplished escape artists?
I reread the Unescape doco and noticed that it also basically removes leading \ characters, so I tried this unlikely bit of code:
$armADF = Get-Content -Path $armFile -Raw | ConvertFrom-Json
$armADFString = $armADF | ConvertTo-Json -Depth 50
$armADFString = $armADFString -replace("\\n","\\n")
$armADFString |
ForEach-Object { [System.Text.RegularExpressions.Regex]::Unescape($_) } |
Out-File $outputFile
Of course - replacing \\n with \\n makes complete sense :| (It works because the two \\n mean different things: in the regex pattern, \\n matches the two literal characters \n, while in the replacement string backslashes are not special, so it inserts the three characters \\n; Unescape then collapses that back down to \n instead of producing a real line break.)
More than happy for anyone to pose a more elegant solution.
EDIT: I am deploying ADF ARM templates, which are themselves JSON based. To cut a long story short, I also found I needed to add this to stop it unescaping legitimately escaped quotes:
$armADFString = $armADFString -replace('\\"','\\"')
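For anyone landing here later, the whole round trip collected in one place (same approach as above, with both pre-escapes chained):
$armADF = Get-Content -Path $armFile -Raw | ConvertFrom-Json
$armADFString = $armADF | ConvertTo-Json -Depth 50
# double the backslashes on \n and \" so Unescape collapses them back
# instead of expanding them into real line breaks / bare quotes
$armADFString = $armADFString -replace '\\n', '\\n' -replace '\\"', '\\"'
$armADFString |
    ForEach-Object { [System.Text.RegularExpressions.Regex]::Unescape($_) } |
    Out-File $outputFile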
$i = 0
$pnp = pnputil -e
$matched = [regex]::Matches($pnp, ".......................................Lexmark International")
$split = $matched -split (".........inf")
$replace = $split -replace " Driver package provider : Lexmark International", ""
$replace1 = $replace -replace " ", "`n"
Write-Output $replace1
foreach ($i in $replace1) {
    $pnpdel = pnputil -f -d $i
    $pnpdel
}
Reg delete "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Print\Environments\Windows x64\Drivers\Version-3\Lexmark Universal v2 XL" /f
net stop spooler
net start spooler
start \\officechicprt5\111W-22E-CPRN-01
As you can hopefully see, my script tries to pull oem*.inf values from a pnputil -e command. I am trying to get each oem##.inf file into its own variable in a foreach loop. My script is a bit of a mess with all the replaces and splits, but that part seems to extract the piece of the output I need. The problem is the data in $i: the script sometimes works and sometimes doesn't. I want pnputil -f -d oem99.inf run for each oem##.inf it finds in the pnputil enumeration. What am I doing wrong in my foreach loop? There has to be a better way... I'm still pretty new to this, as you can tell.
Thanks again.
Brent
Leveraging the power of PowerShell, we can turn the output of pnputil into an object array, which makes it much easier to parse out the data you are looking for (since it appears you are after something specific).
Each entry is a group of properties with a blank line in between them. Using that, let's turn this data into custom objects.
$rawdata = pnputil -e | Select-Object -Skip 1
$rawdata = $rawdata -join "`r`n" -split "`r`n`r`n"
$entries = $rawdata | ForEach-Object {
    $props = $_ -replace ":", "=" | ConvertFrom-StringData
    New-Object -TypeName PSCustomObject -Property $props
}
$rawdata initially contains the text from pnputil -e. We use Select-Object -Skip 1 to remove the "Microsoft PnP Utility" header line. Since $rawdata is an array, this approach requires it to be one long string, hence the -join "`r`n". Immediately after, we split it back into separate array elements, one per property group, with -split "`r`n`r`n", which splits on the blank line you see in the cmd output.
The magic comes from ConvertFrom-StringData, which creates a hashtable from the key/value pairs in strings. It needs = to work, so we convert the colons accordingly. Each hashtable that is created gets converted to an object and collected in the variable $entries. $entries will always be an array, since it is safe to expect more than one entry.
Sample Entry once converted:
Class : Printers
Driver date and version : 12/03/2014 1.5.0.0
Signer name : Microsoft Windows Hardware Compatibility Publisher
Published name : oem27.inf
Driver package provider : Ricoh
Now we can use PowerShell to filter out exactly what you are looking for!
$entries | Where-Object{$_."Driver package provider" -match "Ricoh"} | Select-Object -ExpandProperty "Published name"
Note that this can also return an array but for me there was only one entry. The output for this was oem27.inf
Then using the information you are actually looking for you can run your other commands.
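For the Lexmark case in the question, that would look something like this (a sketch; I have not run it against a live print server):
$lexmarkInfs = $entries |
    Where-Object { $_."Driver package provider" -match "Lexmark" } |
    Select-Object -ExpandProperty "Published name"
foreach ($inf in $lexmarkInfs) {
    pnputil -f -d $inf   # force-delete each matching driver package
}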
I have several files in a folder; they are .xml files.
I want to get a value from those files.
A line in the file, could look like this:
<drives name="Virtual HD ATA Device" deviceid="\\.\PHYSICALDRIVE0" interface="IDE" totaldisksize="49,99">
What I'm trying to do is get the value, 49,99 in this case.
I am able to get the line out of the file with:
$Strings = Select-String -Path "XML\*.xml" -Pattern totaldisksize
foreach ($String in $Strings) {
Write-Host "Line is" $String
}
But I can't work out how to get just the value between the quotes. I've also played around with
$Strings.totaldisksize
But no dice.
Thanks in advance.
You can do this in one line as follows:
$(select-string totaldisksize .\XML\*.xml).line -replace '.*totaldisksize="(\d+,\d+)".*','$1'
The Select-String will give you a collection of objects that contains information about the match. The line property is the one you're interested in, so you can pull that directly.
Using the -replace operator, the regex runs against each .line value that matches totaldisksize. The $1 in the replacement refers to the capture group - the part of the pattern in parentheses, (\d+,\d+) - which matches one or more digits, followed by a comma, followed by one or more digits.
This prints to screen because, by default, PowerShell prints objects to the screen. Since you're only accessing the .line property, that's the only bit printed, and only after the replacement has run.
If you wanted to explicitly use a Write-Host to see the results, or do anything else with them, you could store to a variable as follows:
$sizes = $(select-string totaldisksize .\XML\*.xml).line -replace '.*totaldisksize="(\d+,\d+)".*','$1'
$sizes | % { Write-Host $_ }
The above stores the results in an array, $sizes, and you iterate over it by piping it to ForEach-Object (aliased %). You can then access each element as $_ inside the block.
But.. but.. PowerShell knows XML.
$XMLfile = '<drives name="Virtual HD ATA Device" deviceid="\\.\PHYSICALDRIVE0" interface="IDE" totaldisksize="49,99"></drives>'
$XMLobject = [xml]$XMLfile
$XMLobject.drives.totaldisksize
Output
49,99
Or walk the tree and return the content of "drives":
$XMLfile = #"
<some>
<nested>
<tags>
<drives someOther="stuff" totaldisksize="49,99" freespace="22,33">
</drives>
</tags>
</nested>
</some>
"#
$drives = [xml]$XMLfile | Select-Xml -XPath "//drives" | select -ExpandProperty node
Output
PS> $drives
someOther totaldisksize freespace
--------- ------------- ---------
stuff 49,99 22,33
PS> $drives.freespace
22,33
XPath query of "//drives" = Find all nodes named "drives" anywhere in the XML tree.
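Since the question actually has a folder of .xml files, the same XPath query can be pointed straight at the files on disk (assuming the files are well-formed XML, i.e. the drives tags are closed):
Select-Xml -Path .\XML\*.xml -XPath "//drives" |
    ForEach-Object { $_.Node.totaldisksize }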
Reference: Windows PowerShell Cookbook 3rd Edition (Lee Holmes). Page 930.
I am not sure about PowerShell, but if you prefer Python, below is one way of doing it.
import re

# read the file and extract the value of the totaldisksize attribute
with open('file') as f:
    data = f.read()
item = re.findall(r'.*totaldisksize="([\d,]+)">', data)
print(item[0])
Output
49,99
I am new to scripting and PowerShell. I have been doing some study lately, trying to build a script to find/replace text in a bunch of text files (each file holding code, no more than 4000 lines). However, I would like to keep the FindString and ReplaceString as variables, since there are multiple values, which can in turn be read from a separate CSV file.
I have come up with this code, which is functional, but I would like to know if it is the optimal solution for the requirement. I would like to keep the FindString and ReplaceString regular-expression compatible in the script, as I would also like to find/replace patterns. (I am yet to test it with regular expression patterns.)
Sample contents of Input.csv (the number of rows may vary from 50 to 500):
FindString ReplaceString
AA1A 171PIT9931A
BB1B 171PIT9931B
CC1C 171PIT9931E
DD1D 171PIT9932A
EE1E 171PIT9932B
FF1F 171PIT9932E
GG1G 171PIT9933A
The Code
$Iteration = 0
$FDPATH = 'D:\opt\HMI\Gfilefind_rep'
#& 'D:\usr\fox\wp\bin\tools\fdf_g.exe' $FDPATH\*.fdf
$GraphicsList = Get-ChildItem -Path $FDPATH\*.g | ForEach-Object FullName
$FindReplaceList = Import-Csv -Path $FDPATH\Input.csv
foreach ($Graphic in $GraphicsList) {
    Write-Host "Processing Find Replace on : $Graphic"
    foreach ($item in $FindReplaceList) {
        # the file is read and rewritten once per CSV row
        Get-Content $Graphic |
            ForEach-Object { $_ -replace "$($item.FindString)", "$($item.ReplaceString)" } |
            Set-Content ($Graphic + ".tmp")
        Remove-Item $Graphic
        Rename-Item ($Graphic + ".tmp") $Graphic
        $Iteration = $Iteration + 1
        Write-Host "String Replace Completed for $($item.ReplaceString)"
    }
}
I have gone through other posts here on Stack Overflow and gathered valuable inputs, based on which this code was built. This post from Ivo Bosticky came pretty close to my requirement, but I had to do the same thing in a nested foreach loop, with the find/replace strings as variables read from an external source.
To summarize:
1. I would like to know if the above code can be optimized for execution, since I feel it takes a long time to run. (I prefer not using aliases for now, as I am just starting out, and I'm fine with a long, functional script rather than a concise one that is hard to understand.)
2. I would like to count the iterations carried out in the loop. I was able to print the current iteration number to the console, but couldn't figure out how to pipe the output of Measure-Command into a variable that could be used in a Write-Host command. I would also like to display the time taken for code execution, on completion.
Thanks for taking the time to read this query. Your support is much appreciated!
First of all, unless your replacement string is going to contain newlines (which would change the line boundaries), I would advise getting and setting each $Graphic file's contents only once, and doing all replacements in a single pass. This will also result in fewer file renames and deletions.
Second, it would be (probably marginally) faster to pass $item.FindString and $item.ReplaceString directly to the -replace operator rather than invoking the templating engine to inject the values into string literals.
Third, unless you truly need the output to go directly to the console instead of going to the normal output stream, I would avoid Write-Host. See Write-Host Considered Harmful.
And fourth, you might actually want to remove the Write-Host that gets called for every find and replace, as it may have a fair bit of effect on the overall execution time, depending on how many replacements there are.
You'd end up with something like this:
$timeTaken = (Measure-Command {
    $Iteration = 0
    $FDPATH = 'D:\opt\HMI\Gfilefind_rep'
    #& 'D:\usr\fox\wp\bin\tools\fdf_g.exe' $FDPATH\*.fdf
    $GraphicsList = Get-ChildItem -Path $FDPATH\*.g | ForEach-Object FullName
    $FindReplaceList = Import-Csv -Path $FDPATH\Input.csv
    foreach ($Graphic in $GraphicsList) {
        # note: Measure-Command discards pipeline output, so this message will not
        # actually display; use Write-Verbose here if you need to watch progress
        Write-Output "Processing Find Replace on : $Graphic"
        Get-Content $Graphic | ForEach-Object {
            foreach ($item in $FindReplaceList) {
                $_ = $_ -replace $item.FindString, $item.ReplaceString
            }
            $Iteration += 1
            $_
        } | Set-Content ($Graphic + ".tmp")
        Remove-Item $Graphic
        Rename-Item ($Graphic + ".tmp") $Graphic
    }
}).TotalMilliseconds
I haven't tested it, but it should run a fair bit faster, plus it saves the elapsed time to a variable.
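To cover the second point in the question (displaying the time taken on completion), the captured value can simply be echoed once everything finishes:
Write-Output ("Find/replace completed in {0:N0} ms" -f $timeTaken)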