Parsing data in PowerShell in the format Label:Data (regex)

I am calling Invoke-WebRequest in PowerShell against a URL that returns no HTML, just plain text. I need to pick out a specific part of this data, which is in the format Label:Data, with each item on its own line. I'm looking for some ideas on how to accomplish this. Below is a sample of $Response.Content; I want to isolate the speed-over-ground:0.0 line:
rate-of-turn:0.0
course-over-ground:293.0
speed-over-ground:0.0
heading-true:243.0
hdop:1.0
active-waypoint-name:
bearing-to-waypoint:
distance-to-waypoint:
cross-track-error:0
cross-track-error-limit:
cross-track-error-scale:0
lateral-speed-bow:0.09
lateral-speed-stern:-0.05
longitudinal-speed:-0.05

Presumably the content is a single string rather than an array of lines, so first split it into lines:
$Response.Content -split "`r?`n"
Find the line that contains speed-over-ground:
$line = $Response.Content -split "`r?`n" | Where-Object { $_ -match 'speed-over-ground' }
Split the label from the number using the : separator, take the second item, and convert it from text to a number if appropriate:
[decimal]$speedOverGround = $line.Split(':')[1]
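Alternatively, I might turn all of them into an object in one bulk transform. The complexity varies with the exact inputs you can receive, but this tries to convert numbers to numbers and leave empty values as nulls:
$data = New-Object -TypeName PSCustomObject
# Split into lines and skip any line without a ':' (such as a trailing blank line)
$Response.Content -split "`r?`n" | Where-Object { $_ -match ':' } |
    ForEach-Object {
        # Split on the first ':' only, so a value that contains ':' stays intact
        $name, $value = ($_ -split ':', 2).Trim()
        $decimalValue = [decimal]0
        if ($value -eq '')
        {
            # An empty field (e.g. active-waypoint-name:) becomes a real $null
            $value = $null
        }
        elseif ([decimal]::TryParse($value, [Globalization.NumberStyles]::Number, [Globalization.CultureInfo]::InvariantCulture, [ref]$decimalValue))
        {
            # Numeric text becomes a [decimal]; InvariantCulture keeps '.' as the decimal separator
            $value = $decimalValue
        }
        $data | Add-Member -NotePropertyName $name -NotePropertyValue $value
    }
# Then you can do:
$data.'speed-over-ground'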
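Since the title asks about regex: a single regular expression can also pull one value straight out of the content without splitting it first. This is a sketch that assumes the content is one string with one Label:Data pair per line; Get-LabelValue is a hypothetical helper name, not a built-in cmdlet, and the sample string only stands in for $Response.Content.

```powershell
# Hypothetical helper: returns the text after "<Label>:" on the first matching line, or $null
function Get-LabelValue {
    param(
        [string]$Content,   # the whole response body as a single string
        [string]$Label      # e.g. 'speed-over-ground'
    )
    # (?m) makes ^ match at the start of every line; Escape() guards against regex metacharacters in the label
    $pattern = '(?m)^' + [regex]::Escape($Label) + ':(?<value>[^\r\n]*)'
    $m = [regex]::Match($Content, $pattern)
    if ($m.Success) { return $m.Groups['value'].Value }
    return $null
}

# Sample standing in for $Response.Content (CRLF line endings assumed; LF works too)
$content = "rate-of-turn:0.0`r`ncourse-over-ground:293.0`r`nspeed-over-ground:0.0`r`nactive-waypoint-name:`r`nlateral-speed-bow:0.09"

[decimal](Get-LabelValue -Content $content -Label 'speed-over-ground')
Get-LabelValue -Content $content -Label 'course-over-ground'
```

Anchoring with ^ and requiring the colon right after the label means a label such as course-over-ground cannot accidentally match a search for a different label that shares a suffix. An empty field returns an empty string, and a missing label returns $null.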

Related

PowerShell regex to grab data between second set of double quotes

I have a dataset like the one below, and I've been trying to use split to get the second column of data. Can this be done with a regex? I only need the data in the second column, the UNC paths.
"\app\DATA\8161ST1\201901\20190111\2562233\" "\\server\i\Run\client\AHFC\201901\app\DATA\8161ST1\201901\20190111\2562233\*.*"
"\app\DATA\8161ST1\201901\20190111\2562234\" "\\server\i\Run\client\AHFC\201901\app\DATA\8161ST1\201901\20190111\2562234\*.*"
"\app\DATA\8161ST1\201901\20190111\2562235\" "\\server\i\Run\client\AHFC\201901\app\DATA\8161ST1\201901\20190111\2562235\*.*"
A conceptually simple solution:
# Array of input lines, such as would be returned by Get-Content
$lines = @'
"\app\DATA\8161ST1\201901\20190111\2562233\" "\\server\i\Run\client\AHFC\201901\app\DATA\8161ST1\201901\20190111\2562233\*.*"
"\app\DATA\8161ST1\201901\20190111\2562234\" "\\server\i\Run\client\AHFC\201901\app\DATA\8161ST1\201901\20190111\2562234\*.*"
"\app\DATA\8161ST1\201901\20190111\2562235\" "\\server\i\Run\client\AHFC\201901\app\DATA\8161ST1\201901\20190111\2562235\*.*"
'@ -split '\r?\n'  # works whether the script file uses CRLF or LF line endings
# Extract the content of the last "..."-enclosed token from each line.
$lines | ForEach-Object { ($_ -split '"')[-2] }
The above yields:
\\server\i\Run\client\AHFC\201901\app\DATA\8161ST1\201901\20190111\2562233\*.*
\\server\i\Run\client\AHFC\201901\app\DATA\8161ST1\201901\20190111\2562234\*.*
\\server\i\Run\client\AHFC\201901\app\DATA\8161ST1\201901\20190111\2562235\*.*
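Since the question asks whether a regex can do it: a sketch using -match with a capture group, which returns the contents of the last "..."-quoted token on each line. The two sample lines are copied from the question; real input would come from Get-Content.

```powershell
# Two of the question's input lines
$lines = @(
    '"\app\DATA\8161ST1\201901\20190111\2562233\" "\\server\i\Run\client\AHFC\201901\app\DATA\8161ST1\201901\20190111\2562233\*.*"'
    '"\app\DATA\8161ST1\201901\20190111\2562234\" "\\server\i\Run\client\AHFC\201901\app\DATA\8161ST1\201901\20190111\2562234\*.*"'
)

# "([^"]*)" captures the text inside one pair of quotes; \s*$ pins it to the last quoted token on the line
$uncPaths = foreach ($line in $lines) {
    if ($line -match '"([^"]*)"\s*$') { $Matches[1] }
}
$uncPaths
```

Lines that do not end in a quoted token simply produce no output, rather than an error or an empty string.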

Exclude array items based on dynamic criteria

I have an array of strings like:
File1
File2
File1_s1
File2_s1
Print$
PSDrive
PSParentPath
I need to select all strings that do not conform to a dynamic set of rules. I don't need fancy regex; I just want to match a dynamic number of very simple regex rules. Basically:
$Arr | Where {($_.Name -notlike '_s1') -and ($_.Name -notlike 'Print$')}
But I need a dynamic number of -and conditions, specified by an input to the function. Is there an easy way to do this?
OK, so you can do this:
$omits = "_s1","Print$"
$regex = '({0})' -f (($omits | ForEach-Object{[regex]::Escape($_)}) -join "|")
$arr | Where-Object{$_ -notmatch $regex}
$omits contains the list of strings you want to -match/-notmatch. We run each member through [regex]::Escape(), because $ is a special regex character (the end-of-line anchor). Then we join the escaped strings into a single alternation group, so in the example above $regex would be:
(_s1|Print\$)
Add more entries to $omits as you see fit. This gives the filtered results:
File1
File2
PSDrive
PSParentPath
If you can be trusted to escape your own regex, your options open up further:
$omits = "_s1","Print\$","^PS"
$regex = '({0})' -f ($omits -join "|")
That way, PS has to be at the beginning of the string for ^PS to match.
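Applied to the question's sample strings, the hand-escaped version behaves as follows. Note that ^PS removes PSDrive and PSParentPath as well, so only two items survive:

```powershell
$arr = 'File1', 'File2', 'File1_s1', 'File2_s1', 'Print$', 'PSDrive', 'PSParentPath'

# Hand-escaped patterns: 'Print\$' matches a literal "Print$", '^PS' only matches PS at the start
$omits = '_s1', 'Print\$', '^PS'
$regex = '({0})' -f ($omits -join '|')   # yields (_s1|Print\$|^PS)

$kept = $arr | Where-Object { $_ -notmatch $regex }
$kept
```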
As @Matt suggests, without wildcards in your strings -NotLike is basically equivalent to -ne (and then you could just use -NotIn). I'm assuming your examples are missing wildcards but your actual patterns are not.
$patternArray = 'File1','File2','File1_s1','File2_s1','Print$','PSDrive','PSParentPath';
foreach ($pattern in $patternArray) {
$Arr = $Arr | Where-Object {$_.Name -notlike $pattern};
}
Or you could do some basic matching with:
$patternArray = 'File1','File2','File1_s1','File2_s1','Print$','PSDrive','PSParentPath';
foreach ($pattern in $patternArray) {
$Arr = $Arr | Where-Object {$_.Name -notlike "*$pattern*"};
}
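A sketch that takes the dynamic pattern list as a parameter and tests each item once against all patterns, stopping at the first hit, instead of re-filtering the whole array once per pattern as the loops above do. Select-NotLike is a hypothetical name, and the sketch works on plain strings like the question's sample (for objects, compare $item.Name instead):

```powershell
# Hypothetical helper: emits only the items that match none of the wildcard patterns
function Select-NotLike {
    param(
        [string[]]$InputItems,   # the strings to filter
        [string[]]$Pattern       # any number of -like patterns, chosen by the caller
    )
    foreach ($item in $InputItems) {
        $excluded = $false
        foreach ($p in $Pattern) {
            # Stop checking this item as soon as one pattern matches
            if ($item -like $p) { $excluded = $true; break }
        }
        if (-not $excluded) { $item }
    }
}

$arr = 'File1', 'File2', 'File1_s1', 'File2_s1', 'Print$', 'PSDrive', 'PSParentPath'
$kept = Select-NotLike -InputItems $arr -Pattern '*_s1', 'Print$'
$kept
```

Because $ is not a wildcard character, 'Print$' here matches only the literal string Print$.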

Is there a way to optimise my PowerShell function for removing pattern matches from a large file?

I've got a large text file (~20K lines, ~80 characters per line).
I've also got a largish array (~1500 items) of objects containing patterns I wish to remove from the large text file. Note, if the pattern from the array appears on a line in the input file, I wish to remove the entire line, not just the pattern.
The input file is CSVish with lines similar to:
A;AAA-BBB;XXX;XX000029;WORD;WORD-WORD-1;00001;STRING;2015-07-01;;010;
The patterns in the array, which I search for in each line of the input file, resemble the XX000029 part of the line above.
My somewhat naïve function to achieve this goal looks like this currently:
function Remove-IdsFromFile {
param(
[Parameter(Mandatory=$true,Position=0)]
[string]$BigFile,
[Parameter(Mandatory=$true,Position=1)]
[Object[]]$IgnorePatterns
)
try{
$FileContent = Get-Content $BigFile
}catch{
Write-Error $_
}
$IgnorePatterns | ForEach-Object {
$IgnoreId = $_.IgnoreId
$FileContent = $FileContent | Where-Object { $_ -notmatch $IgnoreId }
Write-Host $FileContent.count
}
$FileContent | Set-Content "CleansedBigFile.txt"
}
This works, but is slow.
How can I make it quicker?
function Remove-IdsFromFile {
param(
[Parameter(Mandatory=$true,Position=0)]
[string]$BigFile,
[Parameter(Mandatory=$true,Position=1)]
[Object[]]$IgnorePatterns
)
# Create the pattern matches
$regex = ($IgnorePatterns | ForEach-Object{[regex]::Escape($_)}) -join "|"
If(Test-Path $BigFile){
$reader = New-Object System.IO.StreamReader($BigFile)
$line=$reader.ReadLine()
while ($line -ne $null)
{
# Check if the line should be output to file
If($line -notmatch $regex){$line | Add-Content "CleansedBigFile.txt"}
# Attempt to read the next line.
$line=$reader.ReadLine()
}
$reader.close()
} Else {
Write-Error "Cannot locate: $BigFile"
}
}
StreamReader is one of the preferred ways to read large text files. We also build a single regex pattern string to match against, applying [regex]::Escape() to each pattern as a precaution in case regex control characters are present. I have to guess here, since we only see one sample pattern.
If $IgnorePatterns can easily be cast to strings, this should work as a drop-in replacement; if they are objects like the question's, escape $_.IgnoreId instead of $_. A small sample of what $regex looks like:
XX000029|XX000028|XX000027
If $IgnorePatterns is populated from a database you might have less control over this, but since we are using regex, you might be able to shrink the pattern set by using real regex features instead of one big alternation. The sample above, for instance, could be reduced to XX00002[7-9].
I don't know whether the regex itself will provide a performance boost with 1500 alternatives; the StreamReader is supposed to be the focus here. However, I did muddy the waters by using Add-Content for the output, which does not win any awards for speed either, since each call opens and closes the file (a StreamWriter could be used in its place).
Reader and Writer
I still have to test this to be sure it works, but this version uses only a StreamReader and a StreamWriter. If it does work better, I will replace the code above.
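function Remove-IdsFromFile {
    param(
        [Parameter(Mandatory=$true,Position=0)]
        [string]$BigFile,
        [Parameter(Mandatory=$true,Position=1)]
        [Object[]]$IgnorePatterns
    )
    # Create the pattern matches
    $regex = ($IgnorePatterns | ForEach-Object{[regex]::Escape($_)}) -join "|"
    If(Test-Path $BigFile){
        # .NET resolves relative paths against the process working directory, not
        # PowerShell's current location, so convert both paths to full paths first
        $inPath  = (Resolve-Path $BigFile).ProviderPath
        $outPath = Join-Path (Get-Location).ProviderPath "CleansedBigFile.txt"
        # Prepare the StreamReader
        $reader = New-Object System.IO.StreamReader($inPath)
        $writer = $null
        try {
            # Prepare the StreamWriter
            $writer = New-Object System.IO.StreamWriter($outPath)
            $line=$reader.ReadLine()
            while ($null -ne $line)
            {
                # Check if the line should be output to file
                If($line -notmatch $regex){$writer.WriteLine($line)}
                # Attempt to read the next line.
                $line=$reader.ReadLine()
            }
        } finally {
            # Don't cross the streams! Close both, even if an error occurred mid-file.
            $reader.Close()
            if ($null -ne $writer) { $writer.Close() }
        }
    } Else {
        Write-Error "Cannot locate: $BigFile"
    }
}
The paths are resolved first because .NET classes such as StreamReader resolve relative paths against the process working directory, which can differ from PowerShell's current location. The try/finally block makes sure both streams are closed even if an error occurs partway through the file. Beyond that, it does appear to work as a drop-in replacement.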
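If every ID sits in the same ';'-separated column, as in the sample line where XX000029 is the fourth field, a different technique avoids regex altogether: look that field up in a HashSet, which costs roughly the same per line whether there are 15 IDs or 1500. This is a sketch under that assumption only; Remove-IdsByField and its -FieldIndex parameter are hypothetical, and unlike the regex versions it ignores an ID that appears in any other column.

```powershell
# Hypothetical helper. Assumption: the ID to ignore always sits in one fixed ';'-separated field.
function Remove-IdsByField {
    param(
        [string[]]$Lines,          # input lines, e.g. from Get-Content
        [string[]]$IgnoreIds,      # plain IDs such as 'XX000029'
        [int]$FieldIndex = 3       # zero-based column holding the ID (4th field in the sample line)
    )
    # HashSet lookups take roughly constant time, however many IDs there are
    $ignore = [System.Collections.Generic.HashSet[string]]::new([string[]]$IgnoreIds)
    foreach ($line in $Lines) {
        $fields = $line.Split(';')
        # Keep lines too short to have the field, or whose field is not an ignored ID
        if ($fields.Count -le $FieldIndex -or -not $ignore.Contains($fields[$FieldIndex])) {
            $line
        }
    }
}

$sample = @(
    'A;AAA-BBB;XXX;XX000029;WORD;WORD-WORD-1;00001;STRING;2015-07-01;;010;'
    'A;AAA-BBB;XXX;XX000030;WORD;WORD-WORD-1;00001;STRING;2015-07-01;;010;'
)
$kept = @(Remove-IdsByField -Lines $sample -IgnoreIds 'XX000029', 'XX000028')
$kept
```

Against the real file, something like Remove-IdsByField -Lines (Get-Content $BigFile) -IgnoreIds $IgnorePatterns.IgnoreId | Set-Content CleansedBigFile.txt should work, assuming the pattern objects expose the ID through an IgnoreId property as in the question. The HashSet compares case-sensitively by default, unlike -match; pass [System.StringComparer]::OrdinalIgnoreCase as a second constructor argument if that matters.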

Expansion of a variable in a regex pattern doesn't work

As a novice in PowerShell coding, I have some difficulties with variable expansion in PowerShell regex patterns.
What I wanted to do is:
Scan for log files that have been changed between two points in time.
For each of the log files, I take the part of the name that indicates the date it refers to.
That date is stored in the variable $filedate.
Then I go through each line of the log files.
Whenever I find a line that looks like:
14:00:15 blablabla
In a file named blabla20130620.log
I want that the data line becomes
2013-06-20 14:00:15 blablabla
It should write the output in append mode to a text file (to concatenate different log files)
Here is what I got until now (I'm testing in a sandbox now, so no comments etc...)
$Logpath = "o:\Log"
$prevcheck="2013-06-24 19:27:14"
$currenttd="{0:yyyy-MM-dd HH:mm:ss}" -f (get-date)
$batch = 1000
[regex]$match_regex = '^([01]\d|2[0-3]):([0-5]\d):([0-5]\d)'
If (Test-Path "$Logpath\test.txt"){
Remove-Item "$Logpath\test.txt"
}
$files=Get-ChildItem $LogPath\*.log | Where-Object { $_.LastWriteTime -ge "$prevcheck" -and $_.LastWriteTime -le "$currenttd" -and !$_.PSIsContainer }
foreach ($file in $files)
{
$filedate=$file.Name.Substring(6,4) + "-" + $file.Name.Substring(10,2) + "-" + $file.Name.Substring(12,2)
## This doesn't seem to work fine
## results look like:
## "$filedate" 14:00:15 blablabla
$replace_regex = '"$filedate" $_'
## I tried this too, but without success
## The time seems to dissappear now
## results look like:
## 2013-06-20 blablabla
#$replace_regex = iex('$filedate' + $_)
(Get-Content $file.PSPath -ReadCount $batch) |
foreach-object {if ($_ -match $match_regex) { $_ -replace $match_regex, $replace_regex} else { $_ }}|
out-file -Append "o:\log\test.txt"
}
You're over-complicating things.
You're comparing dates in your Where-Object filter, so you don't need to transform your reference dates to strings. Just use dates:
$prevcheck = Get-Date "2013-06-24 19:27:14"
$currenttd = Get-Date
You can use a regular expression to extract the date from the file name and transform it into the desired format:
$filedate = $file.BaseName -replace '^.*(\d{4})(\d{2})(\d{2})$', '$1-$2-$3'
Your regular expression for matching the time is overly correct. Use ^(\d{2}:\d{2}:\d{2}) instead. It's a little sloppier, but it will most likely suffice and is a lot easier on the eye.
To prepend the date to the time match, use "$filedate `$1". The double quotes cause $filedate to be expanded to the date from the file name, while the escaped $ (`$1) reaches the regex engine as a reference to the first captured group, so the matched time is kept (see Richard's explanation).
While you can assign the results from each step to variables, it'd be simpler to just use a single pipeline.
Try this:
$Logpath = "o:\Log"
$Logfile = "$Logpath\test.txt"
$prevcheck = Get-Date "2013-06-24 19:27:14"
$currenttd = Get-Date
If (Test-Path -LiteralPath $Logfile) { Remove-Item $Logfile }
Get-ChildItem "$LogPath\*.log" | ? {
-not $_.PSIsContainer -and
$_.LastWriteTime -ge $prevcheck -and
$_.LastWriteTime -le $currenttd
} | % {
$filedate = $_.BaseName -replace '^.*(\d{4})(\d{2})(\d{2})$', '$1-$2-$3'
Get-Content -LiteralPath $_.FullName | % {
$_ -replace '^(\d{2}:\d{2}:\d{2})', "$filedate `$1"
} | Out-File -Append $Logfile
}
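The replacement step from this answer can be checked in isolation, without touching the file system. This sketch uses a made-up file name and sample lines in place of a real log file:

```powershell
# Stand-ins for one log file: its base name and a few of its lines (sample data, not real logs)
$fileBaseName = 'blabla20130620'
$logLines = @('14:00:15 blablabla', 'line without a leading time', '09:05:00 other entry')

# Turn the trailing yyyyMMdd of the name into yyyy-MM-dd
$filedate = $fileBaseName -replace '^.*(\d{4})(\d{2})(\d{2})$', '$1-$2-$3'

# Double quotes expand $filedate now; the back-quote keeps $1 literal for the regex engine
$replacement = "$filedate `$1"
$result = $logLines -replace '^(\d{2}:\d{2}:\d{2})', $replacement
$result
```

Lines that do not start with a time pass through -replace unchanged, so no if/else is needed around it.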
In PowerShell strings have to be in double quotes (") for variable substitution. Single quoted (') strings do not perform variable substitution.
In your script (in which I suggest you indent the content of code blocks to make the structure easier to follow):
$replace_regex = '"$filedate" $_'
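where the string is single-quoted, so no variable substitution happens. This can be fixed by using a double-quoted string and remembering that the back-quote (`) character can be used to escape double quotes embedded in it:
$replace_regex = "`"$filedate`" $_"
Note that $_ in that string is expanded once, when the assignment runs, not once per matched line, and the embedded quotes will end up in the output. To prepend the date to the matched time, refer to the match in the replacement instead, for example "$filedate `$0" ($0 is the whole match).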
But remember:
$ is a regex meta-character, so if you want to include a $ in a regex written in double quotes, it needs to be escaped with a back-quote (`$) so that PowerShell does not treat it as the start of a variable name.
Any regex meta-characters in the variable will have their regex meaning. Consider escaping the content of the variable before substitution ([regex]::Escape(string)).

PowerShell: Leave item alone if regex doesn't match

I have a list of pdf files (from daily processing), some with date stamps of various formatting, some without.
Example:
$f = @("testLtr06-09-02.pdf", "otherletter.pdf","WelcomeLtr043009.pdf")
I am trying to remove the datestamp by stripping out dashes, then replacing any consecutive group of numbers (4 or more, I may change this to 6) with the string "DATESTAMP".
So far I have this:
$d = $f | foreach {$_ -replace "-", ""} | foreach { $_ -replace ([regex]::Matches($_ , "\d{4,}")), "DATESTAMP"}
echo $d
The output:
testLtrDATESTAMP.pdf
DATESTAMPoDATESTAMPtDATESTAMPhDATESTAMPeDATESTAMPrDATESTAMPlDATESTAMPeDATESTAMPtDATESTAMPtDATESTAMPeDATESTAMPrDATESTAMP.DATESTAMPpDATESTAMPdDATESTAMPfDATESTAMP
WelcomeLtrDATESTAMP.pdf
It works fine if the file has a datestamp, but otherwise it seems to freak out the -replace, which inserts DATESTAMP around every character. Is there a way to fix this? I tried to change it to a foreach loop, but I couldn't figure out how to get true/false from the regex.
Thanks in advance.
You can simply do:
PS > $f -replace "(\d{2}-){2}\d{2}|\d{4,}","DATESTAMP"
testLtrDATESTAMP.pdf
otherletter.pdf
WelcomeLtrDATESTAMP.pdf
$_ -replace ([regex]::Matches($_ , "\d{4,}")), "DATESTAMP"
means: in $_, replace every occurrence of ([regex]::Matches($_ , "\d{4,}")) with "DATESTAMP". The pattern here is the result of Matches converted to a string, not the regex \d{4,} itself.
In a filename with no timestamp (no run of at least 4 consecutive digits) there is no match, so Matches returns an empty collection, which becomes "" (an empty string) when used as the pattern.
An empty pattern matches everywhere: at the start of the string, between every pair of characters, and at the end, so DATESTAMP is inserted at each of those positions.
That's why you get this long string with every character surrounded by DATESTAMP.
To check whether a \d{4,} exists in your string at all, you should be able to use
[regex]::IsMatch($_, "\d{4,}")
I'm not a PowerShell user, but this line alone should do the job. I'm not sure, though, whether the if can be used in a pipeline like this, or whether the assignment and the echo $d are needed:
$f | foreach-object {$_ -replace "-", ""} | foreach-object {if ($_ -match "\d{4,}") { $_ -replace "\d{4,}", "DATESTAMP"} else { $_ }}
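To tie the two answers together, here is a sketch that passes the pattern directly to -replace, plus [regex]::IsMatch for the true/false test the question asked about. The pattern is the one from the first answer:

```powershell
$f = @("testLtr06-09-02.pdf", "otherletter.pdf", "WelcomeLtr043009.pdf")
$stampPattern = '(\d{2}-){2}\d{2}|\d{4,}'

# -replace returns a string unchanged when the pattern does not match, so no if/else is needed;
# the key is to pass the pattern itself, not the result of [regex]::Matches
$d = $f -replace $stampPattern, 'DATESTAMP'

# If you do want true/false per name, [regex]::IsMatch returns a [bool]
$hasStamp = $f | ForEach-Object { [regex]::IsMatch($_, $stampPattern) }

$d
$hasStamp
```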