Skip Header Row in a High-Performance PowerShell Regex Script Block
I received some amazing help from Stack Overflow ... however ... it was so amazing I need a little more help to get closer to the finish line. I'm parsing multiple enormous 4 GB files twice a month. I need to be able to skip the header, count the total lines, the matched lines, and the unmatched lines. I'm sure this is super simple for a PowerShell superstar, but at my newbie PS level my skills are not yet strong. Perhaps a little help from you would save the week. :)
Data Sample:
ID FIRST_NAME LAST_NAME COLUMN_NM_TOO_LON5THCOLUMN
10000000001MINNIE MOUSE COLUMN VALUE LONGSTARTS
10000000002MICKLE ROONEY MOUSE COLUMN VALUE LONGSTARTS
Code Block (based on this answer):
# $match_regex matches each fixed-length field by length; each () stores the matched field in a capture group:
[regex]$match_regex = '^(.{10})(.{50})(.{50})(.{50})(.{50})(.{3})(.{8})(.{4})(.{50})(.{2})(.{30})(.{6})(.{3})(.{4})(.{25})(.{2})(.{10})(.{3})(.{8})(.{4})(.{50})(.{2})(.{30})(.{6})(.{3})(.{2})(.{25})(.{2})(.{10})(.{3})(.{10})(.{10})(.{10})(.{2})(.{10})(.{50})(.{50})(.{50})(.{50})(.{8})(.{4})(.{50})(.{2})(.{30})(.{6})(.{3})(.{2})(.{25})(.{2})(.{10})(.{3})(.{4})(.{2})(.{4})(.{10})(.{38})(.{38})(.{15})(.{1})(.{10})(.{2})(.{10})(.{10})(.{10})(.{10})(.{38})(.{38})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})$'
Measure-Command {
    & {
        switch -File $infile -Regex {
            $match_regex {
                # Join what all the capture groups matched with a tab char.
                $Matches[1..($Matches.Count - 1)].Trim() -join "`t"
            }
        }
    } | Out-File $outFile
}
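A cut-down version of the pattern shows what the capture-group join produces for a single record; the 11/7/14 widths and the sample line below are shortened stand-ins, not the real layout:

# shortened stand-in pattern: ID, first name, last name
[regex]$demo_regex = '^(.{11})(.{7})(.{14})'
$line = '10000000001MINNIE MOUSE         '
if ($line -match $demo_regex) {
    # $Matches[0] is the whole line; 1..N are the capture groups
    $Matches[1..($Matches.Count - 1)].Trim() -join "`t"
}
# -> 10000000001	MINNIE	MOUSE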
You only need to keep track of two counts (matched and unmatched lines), plus a Boolean to indicate whether you've skipped the first line:
$first = $false
$matched = 0
$unmatched = 0

. {
    switch -File $infile -Regex {
        $match_regex {
            if ($first) {
                # Join what all the capture groups matched with a tab char.
                $Matches[1..($Matches.Count - 1)].Trim() -join "`t"
                $matched++
            }
            $first = $true
        }
        default {
            $unmatched++
            # you can remove this if the pattern always matches the header
            $first = $true
        }
    }
} | Out-File $outFile

$total = $matched + $unmatched
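To print the three numbers the question asks for once the switch has finished, something like this works with the variables above (note that the header line is not included in either count when it matches the pattern):

Write-Host "Total lines     : $total"
Write-Host "Matched lines   : $matched"
Write-Host "Unmatched lines : $unmatched"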
Using System.IO.StreamReader reduced the processing time to about 20% of what it had been, which was essential for my requirement.
I added logic and counters without sacrificing much performance. The field counter and row-by-row comparison are particularly helpful in finding bad records.
This is a copy/paste of actual code, but I shortened some things and left parts of it as rough pseudocode, so you may have to adjust it to get everything working for yourself.
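Stripped of the counters and per-file bookkeeping, the core of the script below is just a StreamReader/StreamWriter loop. A minimal sketch (the paths are placeholders) looks like this:

$reader = [System.IO.StreamReader]::new('C:\data\input.txt')   # placeholder path
$writer = [System.IO.StreamWriter]::new('C:\data\parsed.txt')  # placeholder path
try {
    # compare against $null so a blank line doesn't end the loop early
    while ($null -ne ($line = $reader.ReadLine())) {
        if ($line -match $match_regex) {
            $writer.WriteLine(($Matches[1..($Matches.Count - 1)].Trim() -join "`t"))
        }
    }
} finally {
    $reader.Dispose()
    $writer.Dispose()
}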
Function Get-Regx-Data-Format() {
    Param ([String] $filename)
    if ($filename -eq 'FILE NAME') {
        [regex]$match_regex = '^(.{10})(.{10})(.{10})(.{30})(.{30})(.{30})(.{4})(.{1})'
    }
    return $match_regex
}
Foreach ($file in $cutoff_files) {
    $starttime_for_file = (Get-Date)

    $source_file = $file + '_' + $proc_yyyymm + $source_file_suffix
    $source_path = $source_dir + $source_file
    $parse_file = $file + '_' + $proc_yyyymm + '_load' + $parse_target_suffix
    $parse_file_path = $parse_target_dir + $parse_file
    $error_file = $file + '_err_' + $proc_yyyymm + $error_target_suffix
    $error_file_path = $error_target_dir + $error_file

    [regex]$match_data_regex = Get-Regx-Data-Format $file

    Remove-Item -path "$parse_file_path" -Force -ErrorAction SilentlyContinue
    Remove-Item -path "$error_file_path" -Force -ErrorAction SilentlyContinue

    [long]$matched_cnt = 0
    [long]$unmatched_cnt = 0
    [long]$loop_counter = 0
    [boolean]$has_header_row = $true
    [int]$field_cnt = 0
    [int]$previous_field_cnt = 0
    [int]$array_length = 0
    $parse_minutes = Measure-Command {
        try {
            $stream_log = [System.IO.StreamReader]::new($source_path)
            $stream_in = [System.IO.StreamReader]::new($source_path)
            $stream_out = [System.IO.StreamWriter]::new($parse_file_path)
            $stream_err = [System.IO.StreamWriter]::new($error_file_path)

            # compare against $null so the loop does not stop early on a blank line
            while ($null -ne ($line = $stream_in.ReadLine())) {
                if ($line -match $match_data_regex) {
                    # number of capture groups in the match; index 0 is the whole line
                    $array_length = $Matches.Count - 1

                    # if matched and it's the header, parse and write to the beginning of the output file
                    if (($loop_counter -eq 0) -and $has_header_row) {
                        $stream_out.WriteLine(($Matches[1..($array_length)].Trim() -join "`t"))
                    } else {
                        $previous_field_cnt = $field_cnt

                        # add year-month to the line start, trim and join every captured field with tabs
                        $stream_out.WriteLine("$proc_yyyymm`t" + `
                            ($Matches[1..($array_length)].Trim() -join "`t"))

                        $matched_cnt++
                        $field_cnt = $Matches.Count
                        if (($previous_field_cnt -ne $field_cnt) -and $loop_counter -gt 1) {
                            write-host "`nError on line $($loop_counter + 1). The field count does not match the previous correctly formatted (non-error) row."
                        }
                    }
                } else {
                    if (($loop_counter -eq 0) -and $has_header_row) {
                        # if the line did not match but it is the header, write it as-is to the beginning of the output file
                        $stream_out.WriteLine($line)
                    } else {
                        $stream_err.WriteLine($line)
                        $unmatched_cnt++
                    }
                }
                $loop_counter++
            }
        } finally {
            $stream_in.Dispose()
            $stream_out.Dispose()
            $stream_err.Dispose()
            $stream_log.Dispose()
        }
    } | Select-Object -Property TotalMinutes
    write-host "`n$file_list_idx. File $file parsing results....`nMatched Count = $matched_cnt UnMatched Count = $unmatched_cnt Parse Minutes = $parse_minutes`n"
    $file_list_idx++

    $endtime_for_file = (Get-Date)
    write-host "`nEnded processing file at $endtime_for_file"

    $TimeDiff_for_file = (New-TimeSpan $starttime_for_file $endtime_for_file)
    $Hrs_for_file = $TimeDiff_for_file.Hours
    $Mins_for_file = $TimeDiff_for_file.Minutes
    $Secs_for_file = $TimeDiff_for_file.Seconds
    write-host "`nElapsed Time for file $file processing: $Hrs_for_file`:$Mins_for_file`:$Secs_for_file"
}
$endtime = (Get-Date -format "HH:mm:ss")
$TimeDiff = (New-TimeSpan $starttime $endtime)
$Hrs = $TimeDiff.Hours
$Mins = $TimeDiff.Minutes
$Secs = $TimeDiff.Seconds
write-host "`nTotal Elapsed Time: $Hrs`:$Mins`:$Secs"
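As an aside, the elapsed-time strings above can be produced directly from the TimeSpan with the format operator instead of concatenating the Hours, Minutes, and Seconds properties:

$TimeDiff = New-TimeSpan $starttime (Get-Date)
write-host ("`nTotal Elapsed Time: {0:hh\:mm\:ss}" -f $TimeDiff)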
Related
How do I use the values (read from a .ps1 file) to update the values of another .ps1 file
I have a 4.ps1 file that looks like this #akabradabra $one = 'o' #bibi $two = 't' $three = 't' #ok thr #four $four = 'four' And a 3.ps1 file that looks like this #akabradabra $one = 'one' #biblibablibo $two = 'two' $three = 'three' #ok threer My goal is to read the key-value pair from 4.ps1 and update the values in 3.ps1 and if new key-value pairs are introduced in 4.ps1, simply append them to the end of 3.ps1. My idea is to use string functions such as .Split('=') and .Replace(' ', '') to extract the keys and if the keys match, replace the entire line in 3.ps1 with the one found in 4.ps1 I know that using Get-Variable might does the trick and also it will be a lot easier to work with the data if I convert all the key-value pairs into a .xml or a .json file but can anyone please show me how can I make it work in my own silly way? Here is my code to do so # Ignore this function, this is used to skip certain key-value pairs #---------------------------------------------------------------------------- Function NoChange($something) { switch ($something) { '$CurrentPath' {return $true} '$pathToAdmin' {return $true} '$hostsPathTocompare' {return $true} '$logs' {return $true} '$LogFile' {return $true} default {return $false} } } #---------------------------------------------------------------------------- $ReadFromVARS = Get-Content $PSScriptRoot\4.ps1 $WriteToVARS = Get-Content $PSScriptRoot\3.ps1 foreach ($oldVar in $ReadFromVARS) { if (('' -eq $oldVar) -or ($oldVar -match '\s*#+\w*')) { continue } elseif ((NoChange ($oldVar.Split('=').Replace(' ', '')[0]))) { continue } else { $var = 0 #$flag = $false while ($var -ne $WriteToVARS.Length) { if ($WriteToVARS[$var] -eq '') { $var += 1 continue } elseif ($WriteToVARS[$var] -match '\s*#+\w*') { $var += 1 continue } elseif ($oldVar.Split('=').Replace(' ', '')[0] -eq $WriteToVARS[$var].Split('=').Replace(' ', '')[0]<# -and !$flag#>) { $oldVar $WriteToVARS.replace($WriteToVARS[$var], $oldVar) | Set-Content -Path $PSScriptRoot\3.ps1 -Force break #$var += 1 #$flag = $true } elseif (<#!$flag -and #>($var -eq $WriteToVARS.Length)) { Add-Content -Path $PSScriptRoot\3.ps1 -Value $oldVar -Force $var += 1 } else { $var += 1 } } } } I did not ran into any errors but it only updated one key-value pair ($two = t) and it did not append new key-value pairs at the end. Here is the result I got #akabradabra $one = 'one' #biblibablibo $two = 't' $three = 'three' #ok threer
If I understand your question correctly, I think Dot-Sourcing is what you're after. The PowerShell dot-source operator brings script files into the current session scope. It is a way to reuse script. All script functions and variables defined in the script file become part of the script it is dot sourced into. It is like copying and pasting text from the script file directly into your script. To make it visible, use Dot-Sourcing to read in the variables from file 3.ps1, show the variables and their values. Next dot-source file 4.ps1 and show the variables again: . 'D:\3.ps1' Write-Host "Values taken from file 3.ps1" -ForegroundColor Yellow "`$one : $one" "`$two : $two" "`$three : $three" "`$four : $four" # does not exist yet . 'D:\4.ps1' Write-Host "Values after dot-sourcing file 4.ps1" -ForegroundColor Yellow "`$one : $one" "`$two : $two" "`$three : $three" "`$four : $four" The result is Values taken from file 3.ps1 $one : one $two : two $three : three $four : Values after dot-sourcing file 4.ps1 $one : o $two : t $three : t $four : four If you want to write these variables back to a ps1 script file you can: 'one','two','three','four' | Get-Variable | ForEach-Object { '${0} = "{1}"' -f $_.Name,$_.Value } | Set-Content 'D:\5.ps1' -Force
Theo's answer provides a easier way to do the same thing Also, converting your Config files to JSON or XML will make the job lot more easier too My original idea was to read both 4.ps1 and 3.ps1 ( these are my config files, I only store variables inside and switch statement to help choosing the correct variables ) then overwrite 3.ps1 with all the difference found but I could not get it working so I created a new 5.ps1 and just simply write everything I need to 5.ps1. Here is my code if you would like to use it for your own project :-) The obstacles for me were that I had switch statements and certain $variables that I wanted to ignore (in my actual project) so I used some Regex to avoided it. $ReadFromVARS = Get-Content $PSScriptRoot\4.ps1 $WriteToVARS = Get-Content $PSScriptRoot\3.ps1 New-Item -ItemType File -Path $PSScriptRoot\5.ps1 -Force Function NoChange($something) { switch ($something) { '$CurrentPath' {return $true} '$pathToAdmin' {return $true} '$hostsPathTocompare' {return $true} '$logs' {return $true} '$LogFile' {return $true} default {return $false} } } $listOfOldVars = #() $switchStatementStart = "^switch(\s)*\(\`$(\w)+\)(\s)*(\n)*\{" $switchStatementContent = "(\s)*(\n)*(\t)*\'\w+(\.\w+)+\'(\s)*\{(\s)*\`$\w+(\s)*=(\s)*\#\((\s)*\'\w+(\.\w+)+\'(\s)*(,(\s)*\'\w+(\.\w+)+\'(\s)*)*\)\}" $switchStatementDefault = "(\s)*(\n)*(\t)*Default(\s)*\{\`$\w+(\s)*=(\s)*\#\((\s)*\'\w+(\.\w+)+\'(\s)*(,(\s)*\'\w+(\.\w+)+\'(\s)*)*\)\}\}" $switchStatementEnd = "(\s)*(\n)*(\t)*\}" foreach ($oldVar in $ReadFromVARS) { if (('' -eq $oldVar) -or ($oldVar -match '^#+\w*')) { continue } elseif ((NoChange $oldVar.Split('=').Replace(' ', '')[0])) { continue } else { $var = 0 while ($var -ne $WriteToVARS.Length) { if ($WriteToVARS[$var] -eq '') { $var += 1 continue } elseif ($WriteToVARS[$var] -match '^#+\w*') { $var += 1 continue } elseif ($oldVar -match $switchStatementStart -or $oldVar -match $switchStatementContent -or $oldVar -match $switchStatementDefault -or $oldVar -match $switchStatementEnd) { Add-Content -Path "$PSScriptRoot\5.ps1" -Value $oldVar -Force $listOfOldVars += ($oldVar) break } elseif ($oldVar.Split('=').Replace(' ', '')[0] -eq $WriteToVARS[$var].Split('=').Replace(' ', '')[0]) { Add-Content -Path "$PSScriptRoot\5.ps1" -Value $oldVar -Force $listOfOldVars += ($oldVar.Remove(0,1).Split('=').Replace(' ', '')[0]) break } else { $var += 1 } } } } foreach ($newVar in $WriteToVARS) { if ($newVar.StartsWith('#') -or $newVar -eq '') { continue } elseif ($newVar -match $switchStatementStart -or $newVar -match $switchStatementContent -or $newVar -match $switchStatementDefault -or $newVar -match $switchStatementEnd) { } elseif (($newVar.Remove(0,1).Split('=').Replace(' ', '')[0]) -in $listOfOldVars) { continue } else { Add-Content -Path "$PSScriptRoot\5.ps1" -Value $newVar -Force } }
Loop through a text file and Extract a set of 100 IP's from a text file and output to separate text files
I have a text file that contains around 900 IP's. I need to create batch of 100 IP's from that file and output them into new files. That would create around 9 text files. Our API only allows to POST 100 IP's at a time. Could you please help me out here? Below is the format of the text file 10.86.50.55,10.190.206.20,10.190.49.31,10.190.50.117,10.86.50.57,10.190.49.216,10.190.50.120,10.190.200.27,10.86.50.58,10.86.50.94,10.190.38.181,10.190.50.119,10.86.50.53,10.190.50.167,10.190.49.30,10.190.49.89,10.190.50.115,10.86.50.54,10.86.50.56,10.86.50.59,10.190.50.210,10.190.49.20,10.190.50.172,10.190.49.21,10.86.49.18,10.190.50.173,10.86.49.49,10.190.50.171,10.190.50.174,10.86.49.63,10.190.50.175,10.13.12.200,10.190.49.27,10.190.49.19,10.86.49.29,10.13.12.201,10.86.49.28,10.190.49.62,10.86.50.147,10.86.49.24,10.86.50.146,10.190.50.182,10.190.50.25,10.190.38.252,10.190.50.57,10.190.50.54,10.86.50.78,10.190.50.23,10.190.49.8,10.86.50.80,10.190.50.53,10.190.49.229,10.190.50.58,10.190.50.130,10.190.50.22,10.86.52.22,10.19.68.61,10.41.43.130,10.190.50.56,10.190.50.123,10.190.49.55,10.190.49.66,10.190.49.68,10.190.50.86,10.86.49.113,10.86.49.114,10.86.49.101,10.190.50.150,10.190.49.184,10.190.50.152,10.190.50.151,10.86.49.43,10.190.192.25,10.190.192.23,10.190.49.115,10.86.49.44,10.190.38.149,10.190.38.151,10.190.38.150,10.190.38.152,10.190.38.145,10.190.38.141,10.190.38.148,10.190.38.142,10.190.38.144,10.190.38.147,10.190.38.143,10.190.38.146,10.190.192.26,10.190.38.251,10.190.49.105,10.190.49.110,10.190.49.137,10.190.49.242,10.190.50.221,10.86.50.72,10.86.49.16,10.86.49.15,10.190.49.112,10.86.49.32,10.86.49.11,10.190.49.150,10.190.49.159,10.190.49.206,10.86.52.28,10.190.49.151,10.190.49.207,10.86.49.19,10.190.38.103,10.190.38.101,10.190.38.116,10.190.38.120,10.190.38.102,10.190.38.123,10.190.38.140,10.190.198.50,10.190.38.109,10.190.38.108,10.190.38.111,10.190.38.112,10.190.38.113,10.190.38.114,10.190.49.152,10.190.50.43,10.86.49.23,10.86.49.205,10.86.49.220,10.190.50.230,10.190.192.238,10.190.192.237,10.190.192.239,10.190.50.7,10.190.50.10,10.86.50.86,10.190.38.125,10.190.38.127,10.190.38.126,10.190.50.227,10.190.50.149,10.86.49.59,10.190.49.158,10.190.49.157,10.190.44.11,10.190.38.124,10.190.50.153,10.190.49.40,10.190.192.235,10.190.192.236,10.190.50.241,10.190.50.240,10.86.46.8,10.190.38.234,10.190.38.233,10.86.50.163,10.86.50.180,10.86.50.164,10.190.49.245,10.190.49.244,10.190.192.244,10.190.38.130,10.86.49.142,10.86.49.102,10.86.49.141,10.86.49.67,10.190.50.206,10.190.192.243,10.190.192.241 I tried looking online to come up with a bit of working code but can't really think what would best work in this situation $IP = 'H:\IP.txt' $re = '\d*.\d*.\d*.\d*,' Select-String -Path $IP -Pattern $re -AllMatches | Select-Object -Expand Matches | ForEach-Object { $Out = 'C:\path\to\out.txt' -f | Set-Content $clientlog }
This will do what you are after:

$bulkIP = (get-content H:\IP.txt) -split ','
$i = 0

# Created loop
Do{
    # Completed an action every 100 counts (including 0)
    If(0 -eq $i % 100) {
        # If the array is a valid entry. Removing this will usually end up creating an empty junk file called -1 or something
        If($bulkIP[$i]) {
            # outputs 100 lines into a folder with the starting index as the name.
            # Eg. The first 1-100, the file would be called 1.txt. 501-600 would be called 501.txt etc
            $bulkIP[$($i)..$($i+99)] | Out-File "C:\path\to\$($bulkip.IndexOf($bulkip[$($i)+1])).txt"
        }
    }
    $i++
}While($i -le 1000)
what this does ... calculates the number of batches calcs the start & end index of each batch creates a range from the above creates a PSCustomObject to hold each batch creates an array slice from the range sends that out to the collection $Var shows what is in the collection & in the 1st batch from that collection here's the code ... # fake reading in a raw text file # in real life, use Get-Content -Raw $InStuff = #' 10.86.50.55,10.190.206.20,10.190.49.31,10.190.50.117,10.86.50.57,10.190.49.216,10.190.50.120,10.190.200.27,10.86.50.58,10.86.50.94,10.190.38.181,10.190.50.119,10.86.50.53,10.190.50.167,10.190.49.30,10.190.49.89,10.190.50.115,10.86.50.54,10.86.50.56,10.86.50.59,10.190.50.210,10.190.49.20,10.190.50.172,10.190.49.21,10.86.49.18,10.190.50.173,10.86.49.49,10.190.50.171,10.190.50.174,10.86.49.63,10.190.50.175,10.13.12.200,10.190.49.27,10.190.49.19,10.86.49.29,10.13.12.201,10.86.49.28,10.190.49.62,10.86.50.147,10.86.49.24,10.86.50.146,10.190.50.182,10.190.50.25,10.190.38.252,10.190.50.57,10.190.50.54,10.86.50.78,10.190.50.23,10.190.49.8,10.86.50.80,10.190.50.53,10.190.49.229,10.190.50.58,10.190.50.130,10.190.50.22,10.86.52.22,10.19.68.61,10.41.43.130,10.190.50.56,10.190.50.123,10.190.49.55,10.190.49.66,10.190.49.68,10.190.50.86,10.86.49.113,10.86.49.114,10.86.49.101,10.190.50.150,10.190.49.184,10.190.50.152,10.190.50.151,10.86.49.43,10.190.192.25,10.190.192.23,10.190.49.115,10.86.49.44,10.190.38.149,10.190.38.151,10.190.38.150,10.190.38.152,10.190.38.145,10.190.38.141,10.190.38.148,10.190.38.142,10.190.38.144,10.190.38.147,10.190.38.143,10.190.38.146,10.190.192.26,10.190.38.251,10.190.49.105,10.190.49.110,10.190.49.137,10.190.49.242,10.190.50.221,10.86.50.72,10.86.49.16,10.86.49.15,10.190.49.112,10.86.49.32,10.86.49.11,10.190.49.150,10.190.49.159,10.190.49.206,10.86.52.28,10.190.49.151,10.190.49.207,10.86.49.19,10.190.38.103,10.190.38.101,10.190.38.116,10.190.38.120,10.190.38.102,10.190.38.123,10.190.38.140,10.190.198.50,10.190.38.109,10.190.38.108,10.190.38.111,10.190.38.112,10.190.38.113,10.190.38.114,10.190.49.152,10.190.50.43,10.86.49.23,10.86.49.205,10.86.49.220,10.190.50.230,10.190.192.238,10.190.192.237,10.190.192.239,10.190.50.7,10.190.50.10,10.86.50.86,10.190.38.125,10.190.38.127,10.190.38.126,10.190.50.227,10.190.50.149,10.86.49.59,10.190.49.158,10.190.49.157,10.190.44.11,10.190.38.124,10.190.50.153,10.190.49.40,10.190.192.235,10.190.192.236,10.190.50.241,10.190.50.240,10.86.46.8,10.190.38.234,10.190.38.233,10.86.50.163,10.86.50.180,10.86.50.164,10.190.49.245,10.190.49.244,10.190.192.244,10.190.38.130,10.86.49.142,10.86.49.102,10.86.49.141,10.86.49.67,10.190.50.206,10.190.192.243,10.190.192.241 '# $SplitInStuff = $InStuff.Split(',') $BatchSize = 25 $BatchCount = [math]::Truncate($SplitInStuff.Count / $BatchSize) + 1 $Start = $End = 0 $Result = foreach ($BC_Item in 1..$BatchCount) { $Start = $End if ($BC_Item -eq 1) { $End = $Start + $BatchSize - 1 } else { $End = $Start + $BatchSize } $Range = $Start..$End [PSCustomObject]#{ IP_List = $SplitInStuff[$Range] } } $Result '=' * 20 $Result[0] '=' * 20 $Result[0].IP_List.Count '=' * 20 $Result[0].IP_List screen output ... 
IP_List ------- {10.86.50.55, 10.190.206.20, 10.190.49.31, 10.190.50.117...} {10.86.49.18, 10.190.50.173, 10.86.49.49, 10.190.50.171...} {10.86.50.80, 10.190.50.53, 10.190.49.229, 10.190.50.58...} {10.190.49.115, 10.86.49.44, 10.190.38.149, 10.190.38.151...} {10.86.49.32, 10.86.49.11, 10.190.49.150, 10.190.49.159...} {10.86.49.23, 10.86.49.205, 10.86.49.220, 10.190.50.230...} {10.190.50.240, 10.86.46.8, 10.190.38.234, 10.190.38.233...} ==================== {10.86.50.55, 10.190.206.20, 10.190.49.31, 10.190.50.117...} ==================== 25 ==================== 10.86.50.55 10.190.206.20 10.190.49.31 10.190.50.117 10.86.50.57 10.190.49.216 10.190.50.120 10.190.200.27 10.86.50.58 10.86.50.94 10.190.38.181 10.190.50.119 10.86.50.53 10.190.50.167 10.190.49.30 10.190.49.89 10.190.50.115 10.86.50.54 10.86.50.56 10.86.50.59 10.190.50.210 10.190.49.20 10.190.50.172 10.190.49.21 10.86.49.18
try this

$cpt = 0
$Rang = 1

#remove old files
Get-ChildItem "H:\FileIP_*.txt" -file | Remove-Item -Force

(Get-Content "H:\IP.txt") -split ',' | %{
    if (!($cpt++ % 100)) {$FileResult = "H:\FileIP_{0:D3}.txt" -f $Rang++} # build filename if cpt divisible by 100
    $_ | Out-File $FileResult -Append
}
Add capture group values in a PowerShell replace loop
Needing to replace a string in multiple text files with the same string, except with capture group 2 replaced by the sum of itself and capture group 4.

String: Total amount $11.39 | Change $0.21
Desired Result: Total amount $11.60 | Change $0.21

I have attempted several methods. Here is my last attempt, which seems to run without error but without any changes to the string.

$Originalfolder = "$ENV:userprofile\Documents\folder\"
$Originalfiles = Get-ChildItem -Path "$Originalfolder\*"
$RegexPattern = '\b(Total\s\amount\s\$)(\d?\d?\d?\d?\d\.?\d?\d?)(\s\|\sChange\s\$)(\d?\d?\d\.?\d?\d?)\b'

$Substitution = {
    Param($Match)
    $Result = $GP1 + $Sumtotal + $GP3 + $Change
    $GP1 = $Match.Groups[1].Value
    $Total = $Match.Groups[2].Value
    $GP3 = $Match.Groups[3].Value
    $Change = $Match.Groups[4].Value
    $Sumtotal = ($Total + $Change)
    return [string]$Result
}

foreach ($file in $Originalfiles) {
    $Lines = Get-Content $file.FullName
    $Lines | ForEach-Object {
        [Regex]::Replace($_, $RegexPattern, $Substitution)
    } | Set-Content $file.FullName
}
For one thing, your regular expression doesn't even match what you're trying to replace, because you escaped the a in amount: \b(Total\s\amount\s\$)(\d?\d?\d?... # ^^ \a is an escape sequence that matches the "alarm" or "bell" character \u0007. Also, if you want to calculate the sum of two captures you need to convert them to numeric values first, otherwise the + operator would just concatenate the two strings. $Total = $Match.Groups[2].Value $Change = $Match.Groups[4].Value $Sumtotal = $Total + $Change # gives 11.390.21 $Sumtotal = [double]$Total + [double]$Change # gives 11.6 And you need to build $Result after you defined the other variables, otherwise the replacement function would just return an empty string. Change this: $RegexPattern = '\b(Total\s\amount\s\$)(\d?\d?\d?\d?\d\.?\d?\d?)(\s\|\sChange\s\$)(\d?\d?\d\.?\d?\d?)\b' $Substitution = { param ($Match) $Result = $GP1 + $Sumtotal + $GP3 + $Change $GP1 = $Match.Groups[1].Value $Total = $Match.Groups[2].Value $GP3 = $Match.Groups[3].Value $Change = $Match.Groups[4].Value $Sumtotal = ($Total + $Change) return [string]$Result } into this: $RegexPattern = '\b(Total\samount\s\$)(\d?\d?\d?\d?\d\.?\d?\d?)(\s\|\sChange\s\$)(\d?\d?\d\.?\d?\d?)\b' $Substitution = { Param($Match) $GP1 = $Match.Groups[1].Value $Total = [double]$Match.Groups[2].Value $GP3 = $Match.Groups[3].Value $Change = [double]$Match.Groups[4].Value $Sumtotal = ($Total + $Change) $Result = $GP1 + $Sumtotal + $GP3 + $Change return [string]$Result } and the code will mostly do what you want. "Mostly", because it will not format the calculated number to double decimals. You need to do that yourself. Use the format operator (-f) and change your replacement function to something like this: $Substitution = { Param($Match) $GP1 = $Match.Groups[1].Value $Total = [double]$Match.Groups[2].Value $GP3 = $Match.Groups[3].Value $Change = [double]$Match.Groups[4].Value $Sumtotal = $Total + $Change return ('{0}{1:n2}{2}{3:n2}' -f $GP1, $Sumtotal, $GP3, $Change) } As a side note: the sub-expression \d?\d?\d?\d?\d\.?\d?\d? could be shortened to \d+(?:\.\d+)? (one or more digit, optionally followed by a period and one or more digits) or, more exactly, to \d{1,4}(?:\.\d{0,2})? (one to four digits, optionally followed by a period and up to 2 digits).
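A quick sanity check of the corrected pattern and scriptblock against the sample string from the question:

$line = 'Total amount $11.39 | Change $0.21'
[Regex]::Replace($line, $RegexPattern, $Substitution)
# -> Total amount $11.60 | Change $0.21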
here's how I'd do it: this is pulled out of a larger script that regularly scans a directory for files, then does a similar manipulation, and I've changed variables quickly to obfuscate, so shout if it doesn't work and I'll take a more detailed look tomorrow. It takes a backup of each file as well, and works on a temp copy before renaming. Note it also sends an email alert (code at the end) to say if any processing was done - this is because it's designed to run as as scheduled task in the original $backupDir = "$pwd\backup" $stringToReplace = "." $newString = "." $files = #(Get-ChildItem $directoryOfFiles) $intFiles = $files.count $tmpExt = ".tmpDataCorrection" $DataCorrectionAppend = ".DataprocessBackup" foreach ($file in $files) { $content = Get-Content -Path ( $directoryOfFiles + $file ) # Check whether there are any instances of the string If (!($content -match $stringToReplace)) { # Do nothing if we didn't match } Else { #Create another blank temporary file which the corrected file contents will be written to $tmpFileName_DataCorrection = $file.Name + $tmpExt_DataCorrection $tmpFile_DataCorrection = $directoryOfFiles + $tmpFileName_DataCorrection New-Item -ItemType File -Path $tmpFile_DataCorrection foreach ( $line in $content ) { If ( $line.Contains("#")) { Add-Content -Path $tmpFile_DataCorrection -Value $line.Replace($stringToReplace,$newString) #Counter to know whether any processing was done or not $processed++ } Else { Add-Content -Path $tmpFile_DataCorrection -Value $line } } #Backup (rename) the original file, and rename the temp file to be the same name as the original Rename-Item -Path $file.FullName -NewName ($file.FullName + $DataCorrectionAppend) -Force -Confirm:$false Move-Item -Path ( $file.FullName + $DataCorrectionAppend ) -Destination backupDir -Force -Confirm:$false Rename-Item -Path $tmpFile_DataCorrection -NewName $file.FullName -Force -Confirm:$false # Check to see if anything was done, then populate a variable to use in final email alert if there was If (!$processed) { #no message as did nothing } Else { New-Variable -Name ( "processed" + $file.Name) -Value $strProcessed } } # Out of If loop }
Powershell log tailer issues
I have written a log tailer with Powershell, the tailer loads in an xml file which contains configuration information regarding when to report on a word match in the log tail (basically if certain patterns occur X amount of times in the tail). At the moment the tailer is not returning matches for many of the lines that contain matches. For example we are retrieving a log file with many INFO lines, if I check for the word INFO nothing is detected, however if I look for the work shutdown it returns matches (the line with shutdown also contains INFO on the line). The really strange thing is that using the same log file and same Powershell script seems to produce perfectly accurate results on my own machine but behaves strangely on the server. I suspect that this might be an issue with the version of Powershell that is running on the server, so I was hoping someone here might know of issues that can come up with different versions. I have also noticed that when I print out the number of matches, if nothing is found the output is blank, perhaps this should be 0 and is causing some weird issue to trigger? function Main() { #### GLOBAL SETTINGS $DebugPreference = "Continue" $serverName = $env:COMPUTERNAME $scriptPath = Split-Path $script:MyInvocation.MyCommand.Path $logConfigPath = "$scriptPath/config.xml" #### VARIABLES RELATING TO THE LOG FILE #contains the log path and log file mask $logPaths = #() $logFileMasks = #() # the total number of lines grabbed from the end of the log file for evaluation $numLinesToTail = 1000 # key value pair for the strings to match and the max count of matches before they are considered an issue $keywords = #() $maxCounts = #() #### VARIABLES RELATING TO THE EMAIL SETTINGS $smtpServer = "mail server" $emailSubject = "$serverName log report" $toEmailAddress = "email accounts" $fromEmailAddress = "" # any initial content you want in the email body should go here (e.g. 
the name of the server that this is on) $htmlBodyContent = "<p><h3>SERVER $serverName : </h3></p><p>Items that appear in red have exceeded their match threshold and should be investigated.<br/>Tail Lines: $numLinesToTail</p>" #### FUNCTION CALLS LoadLogTailerConfig $logConfigPath ([ref]$logPaths) ([ref]$logFileMasks) ([ref]$keywords) ([ref]$maxCounts) for ($i = 0; $i -lt $logPaths.Count; $i++) { $tail = GetLogTail $numLinesToTail $logPaths[$i] $logFileMasks[$i] $tailIssueTable = CheckForKeywords $tail $keywords[$i] $maxCounts[$i] if ($tailIssueTable -ne "") { $htmlBodyContent += "<br/>Logs scanned: " + (GetLatestLogFileFullName $logPaths[$i] $logFileMasks[$i]) + "<br/><br/>" + $tailIssueTable SendIssueEmail $smtpServer $emailSubject $toEmailAddress $ccEmailAddress $fromEmailAddress $htmlBodyContent } } } # Loads in configuration data for the utility to use function LoadLogTailerConfig($logConfigPath, [ref]$logPaths, [ref]$logFileMasks, [ref]$keywords, [ref]$maxCounts) { Write-Debug "Loading config file data from $logConfigPath" [xml]$configData = Get-Content $logConfigPath foreach ($log in $configData.Logs.Log) { $logPaths.Value += $log.FilePath $logFileMasks.Value += $log.FileMask $kwp = #() $kwc = #() foreach ($keywordSet in $log.Keywords.Keyword) { $kwp += $keywordSet.Pattern $kwc += $keywordSet.MaxMatches } $keywords.Value += #(,$kwp) $maxCounts.Value += #(,$kwc) } } # Gets a string containing the last X lines of the most recent log file function GetLogTail($numLinesToTail, $logPath, $logFileMask) { $logFile = GetLatestLogFileFullName $logPath $logFileMask #Get-ChildItem $logPath -Filter $logFileMask | sort LastWriteTime | select -Last 1 Write-Debug "Getting $numLinesToTail line tail of $logFile" $tail = Get-Content "$logFile" | select -Last $numLinesToTail return $tail } function GetLatestLogFileFullName($logPath, $logFileMask) { $logFile = Get-ChildItem $logPath -Filter $logFileMask | sort LastWriteTime | select -Last 1 return "$logPath$logFile" } # Returns body text for email containing details on keywords in the log file and their frequency function CheckForKeywords($tail, $keywords, $maxCounts) { $issuesFound = 0 $htmlBodyContent += "<table><tr><th style=""text-align : left;"">Keyword</th><th>Max Count Value</th><th>Count Total<th></tr>" for ($i = 0; $i -lt $keywords.Count; $i++) { $keywordCount = ($tail | Select-String $keywords[$i] -AllMatches).Matches.Count Write-Debug (("Match count for {0} : {1}" -f $keywords[$i], $keywordCount)) if ($keywordCount -gt $maxCounts[$i]) { # style red if the count threshold has been exceeded $htmlBodyContent += "<tr style=""color : red;""><td>" + $keywords[$i] + "</td><td>" + $maxCounts[$i] + "</td><td>" + $keywordCount + "</td></tr>" $issuesFound = 1 } else { # style green if the count threshold has not been exceeded $htmlBodyContent += "<tr style=""color : green;""><td>" + $keywords[$i] + "</td><td>" + $maxCounts[$i] + "</td><td>" + $keywordCount + "</td></tr>" } } $htmlBodyContent += "</table>" if ($issuesFound -eq 1) { return $htmlBodyContent } return "" } # Sends out an email to the specified email address function SendIssueEmail($smtpServer, $subject, $toAddress, $ccAddress, $fromAddress, $bodyContent) { Write-Debug "Sending email with subject: $subject, To: $toAddress, via SMTP ($smtpServer)" Send-MailMessage -SmtpServer $smtpServer -Subject $subject -To $toAddress -From $fromAddress -BodyAsHtml $bodyContent } cls Main And a XML config example: <Logs> <Log> <FilePath>C:/Some/Path</FilePath> <FileMask>log.*</FileMask> <Keywords> <Keyword> 
<Pattern>NullReferenceException</Pattern> <MaxMatches>10</MaxMatches> </Keyword> <Keyword> <Pattern>Exception</Pattern> <MaxMatches>10</MaxMatches> </Keyword> </Keywords> </Log> <Log> <FilePath>C:/Some/Path</FilePath> <FileMask>test.*</FileMask> <Keywords> <Keyword> <Pattern>NullReferenceException</Pattern> <MaxMatches>100</MaxMatches> </Keyword> </Keywords> </Log> </Logs> EDIT : The server that is having the issues is running Powershell V 1.0, however the test servers are also running the same version perfectly fine...
Your function GetLatestLogFileFullName is one problem. It can and will generate invalid paths.

function GetLatestLogFileFullName($logPath, $logFileMask) {
    $logFile = Get-ChildItem $logPath -Filter $logFileMask | sort LastWriteTime | select -Last 1
    return "$logPath$logFile"
}

Use this instead:

return $logfile.FullName

And you should also check for cases where there is no valid log file:

if ($logfile) {
    return $logfile.FullName
} else {
    return $null
}

The second problem will be your Select-String usage.

$keywordCount = ($tail | Select-String $keywords[$i] -AllMatches).Matches.Count

In PowerShell v1, Select-String does not have an -AllMatches parameter:

PS> Get-Help Select-String

NAME
    Select-String

SYNOPSIS
    Identifies patterns in strings.

SYNTAX
    Select-String [-pattern] <string[]> -inputObject <psobject> [-include <string[]>] [-exclude <string[]>] [-simpleMatch] [-caseSensitive] [-quiet] [-list] [<CommonParameters>]

    Select-String [-pattern] <string[]> [-path] <string[]> [-include <string[]>] [-exclude <string[]>] [-simpleMatch] [-caseSensitive] [-quiet] [-list] [<CommonParameters>]

Check the PowerShell versions on your servers using the $PSVersionTable variable. Do not rely on the version displayed in the title bar! If the variable does not exist, you have Version 1.
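If the problem server really is on PowerShell v1, the -AllMatches counting can be replaced with the .NET regex class directly, which works on every version; a rough sketch:

# count every occurrence of the pattern in the tail without -AllMatches
$keywordCount = 0
foreach ($tailLine in $tail) {
    $keywordCount += [regex]::Matches($tailLine, $keywords[$i]).Count
}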
Use Powershell to print out line number of code matching a RegEx
I think we have a bunch of commented-out code in our source, and rather than delete it immediately, we've just left it. Now I would like to do some cleanup. So assuming that I have a good enough RegEx to find comments (the RegEx below is simple and I could expand on it based on our coding standards), how do I take the results of the file that I read up and output the following:

Filename
Line Number
The actual line of code

I think I have the basis of an answer here, but I don't know how to take the file that I've read up and parsed with RegEx and spit it out in this format. I'm not looking for the perfect solution; I just want to find big blocks of commented-out code. By looking at the result and seeing a bunch of files with the same name and sequential line numbers, I should be able to do this.

$Location = "c:\codeishere"
[regex]$Regex = "//.*;" #simple example - Will expand on this...
$Files = get-ChildItem $Location -include *cs -recurse

foreach ($File in $Files) {
    $contents = get-Content $File
    $Regex.Matches($contents) | WHAT GOES HERE?
}
You could do:

dir c:\codeishere -filter *.cs -recurse | select-string -Pattern '//.*;' | select Line,LineNumber,Filename
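If you want to keep that result for review, the same pipeline feeds Export-Csv naturally (the output path below is just an example):

dir c:\codeishere -Filter *.cs -Recurse |
    Select-String -Pattern '//.*;' |
    Select-Object Filename, LineNumber, Line |
    Export-Csv C:\temp\commented-code.csv -NoTypeInformation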
gci c:\codeishere *.cs -r | select-string "//.*;"

The select-string cmdlet already does exactly what you're asking for, though the filename displayed is a relative path.
I would go personally even further. I would like to compute number of consecutive following lines. Then print the file name, count of lines and the lines itself. You may sort the result by count of lines (candidates for delete?). Note that my code doesn't count with empty lines between commented lines, so this part is considered as two blocks of commented code: // int a = 10; // int b = 20; // DoSomething() // SomethingAgain() Here is my code. $Location = "c:\codeishere" $occurences = get-ChildItem $Location *cs -recurse | select-string '//.*;' $grouped = $occurences | group FileName function Compute([Microsoft.PowerShell.Commands.MatchInfo[]]$lines) { $local:lastLineNum = $null $local:lastLine = $null $local:blocks = #() $local:newBlock = $null $lines | % { if (!$lastLineNum) { # first line $lastLineNum = -2 # some number so that the following if is $true (-2 and lower) } if ($_.LineNumber - $lastLineNum -gt 1) { #new block of commented code if ($newBlock) { $blocks += $newBlock } $newBlock = $null } else { # two consecutive lines of commented code if (!$newBlock) { $newBlock = '' | select File,StartLine,CountOfLines,Lines $newBlock.File, $newBlock.StartLine, $newBlock.CountOfLines, $newBlock.Lines = $_.Filename,($_.LineNumber-1),2, #($lastLine,$_.Line) } else { $newBlock.CountOfLines += 1 $newBlock.Lines += $_.Line } } $lastLineNum=$_.LineNumber $lastLine = $_.Line } if ($newBlock) { $blocks += $newBlock } $blocks } # foreach GroupInfo objects from group cmdlet # get Group collection and compute $result = $grouped | % { Compute $_.Group } #how to print $result | % { write-host "`nFile $($_.File), line $($_.StartLine), count of lines: $($_.CountOfLines)" -foreground Green $_.Lines | % { write-host $_ } } # you may sort it by count of lines: $result2 = $result | sort CountOfLines -desc $result2 | % { write-host "`nFile $($_.File), line $($_.StartLine), count of lines: $($_.CountOfLines)" -foreground Green $_.Lines | % { write-host $_ } } If you have any idea how to improve the code, post it! I have a feeling that I could do it using some standard cmdlets and the code could be shorter..
I would look at doing something like:

dir $location -inc *.cs -rec | `
  %{ $file = $_; $n = 0; get-content $_ } | `
  %{ $_.FileName = $file; $_.Line = ++$n; $_ } | `
  ?{ $_ -match $regex } | `
  %{ "{0}:{1}: {2}" -f ($_.FileName, $_.Line, $_) }

I.e. add extra properties to the string to specify the filename and line number, which can be carried through the pipeline after the regex match. (Using ForEach-Object's -begin/-end script blocks should be able to simplify this.)