How to use the break statement in a PowerShell "switch -regex -file"?

I'm using the following function to parse an ini file.
I have one problem that I don't know how to solve:
when a match is found, I want to stop processing the current line and read the next one.
I know I can use break for this, but break exits the entire switch statement.
I suppose this is a special scenario when using switch with regex to parse a file.
How can I avoid parsing the same line in two code blocks?
See the output:
`Comment ; Parameters that are strings are specified parameter=string, like this:`
`Key ; Parameters that are strings are specified parameter=string, like this:`
and
`Comment ; Parameters that are numbers are specified parameter=number, like this:`
`Key ; Parameters that are numbers are specified parameter=number, like this:`
Thanks
function Get-IniContent ($filePath)
{
$ini = @{}
switch -regex -file $FilePath
{
"^\[(.+)\]$" # Section
{
$section = $matches[1]
$ini[$section] = @{}
$CommentCount = 0
Write-Host "Section $section"
}
"^(;.*)$" # Comment
{
if (!($section))
{
$section = "No-Section"
$ini[$section] = @{}
}
$value = $matches[1]
$CommentCount = $CommentCount + 1
$name = "Comment" + $CommentCount
$ini[$section][$name] = $value
Write-Host "Comment $value"
}
"(.+?)\s*=\s*(.*)" # Key
{
if (!($section))
{
$section = "No-Section"
$ini[$section] = @{}
}
$name,$value = $matches[1..2]
$ini[$section][$name] = $value
Write-Host "Key $name=$value"
}
default
{
# Next line causes NullArray error
#$line=$matches[1]
Write-Host "Strange line $line"
}
}
return $ini
}
$iniFile=Get-IniContent (Join-Path (Split-Path -parent $MyInvocation.MyCommand.Path) "test.config")
This is a sample ini file:
Independent Bad Line
timer=19
[Common]
; 4 - default value hard-coded in ntfs-hardlink-backup.ps1 script
; Blank lines also do no harm
; Parameters that are strings are specified parameter=string, like this:
backupDestination=X:\Backup
; Parameters that are numbers are specified parameter=number, like this:
backupsToKeep=20
[server-01.mycompany.example.org]
; Parameters that are specific to a particular server/computer go in a section for that computer.
; The section name is the fully-qualified domain name (FQDN) of the computer.
backupSources=D:\Shares\Admin,E:\Shares\Finance,E:\Shares\ICT,D:\Shares\Users
:Bad Line
=10 ; Also Bad Line
And this is the script output:
E:\Config\NtfsBackup>powershell -ExecutionPolicy unrestricted -file "E:\Config\NtfsBackup\nt1.ps1"
Strange line
Key timer=19
Section Common
Comment ; 4 - default value hard-coded in ntfs-hardlink-backup.ps1 script
Strange line
Comment ; Blank lines also do no harm
Strange line
Comment ; Parameters that are strings are specified parameter=string, like this:
Key ; Parameters that are strings are specified parameter=string, like this:
Key backupDestination=X:\Backup
Comment ; Parameters that are numbers are specified parameter=number, like this:
Key ; Parameters that are numbers are specified parameter=number, like this:
Key backupsToKeep=20
Strange line
Section server-01.mycompany.example.org
Comment ; Parameters that are specific to a particular server/computer go in a section for that computer.
Comment ; The section name is the fully-qualified domain name (FQDN) of the computer.
Key backupSources=D:\Shares\Admin,E:\Shares\Finance,E:\Shares\ICT,D:\Shares\Users
Strange line
Strange line

The answer is simple: use continue instead of break. Inside a switch statement, break exits the whole switch, whereas continue stops processing the current item (here, the current line) and moves on to the next one. Placing continue at the end of each of your script blocks (for example on a new line just after the Write-Host line) will do what you need.
function Get-IniContent ($filePath)
{
$ini = @{}
switch -regex -file $FilePath
{
"^\[(.+)\]$" # Section
{
$section = $matches[1]
$ini[$section] = @{}
$CommentCount = 0
Write-Host "Section $section"
Continue
}
"^(;.*)$" # Comment
{
if (!($section))
{
$section = "No-Section"
$ini[$section] = @{}
}
$value = $matches[1]
$CommentCount = $CommentCount + 1
$name = "Comment" + $CommentCount
$ini[$section][$name] = $value
Write-Host "Comment $value"
Continue
}
"(.+?)\s*=\s*(.*)" # Key
{
if (!($section))
{
$section = "No-Section"
$ini[$section] = @{}
}
$name,$value = $matches[1..2]
$ini[$section][$name] = $value
Write-Host "Key $name=$value"
Continue
}
default
{
# $matches is not populated for this line in the default block; $_ holds the current line
Write-Host "Strange line $_"
}
}
return $ini
}
$iniFile=Get-IniContent (Join-Path (Split-Path -parent $MyInvocation.MyCommand.Path) "test.config")
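The difference between the three behaviours can be seen in isolation with a minimal sketch. The sample lines and variable names below are made up for illustration; the switch runs over an array instead of a file, which behaves the same way with respect to break and continue.

```powershell
# Hypothetical sample input: the first line matches both patterns.
$lines = '; note: key=value', 'timer=19'

# No break/continue: every matching clause runs for the same line.
$withoutContinue = switch -Regex ($lines) {
    '^;' { "Comment: $_" }
    '='  { "Key: $_" }
}

# continue: the first matching clause wins, then the next line is processed.
$withContinue = switch -Regex ($lines) {
    '^;' { "Comment: $_"; continue }
    '='  { "Key: $_"; continue }
}

# break: the whole switch stops after the first match.
$withBreak = switch -Regex ($lines) {
    '^;' { "Comment: $_"; break }
    '='  { "Key: $_"; break }
}

"Without continue: $(@($withoutContinue).Count) results"
"With continue:    $(@($withContinue).Count) results"
"With break:       $(@($withBreak).Count) results"
```

Only the continue variant produces exactly one result per input line, which is what the ini parser needs.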

Related

Powershell script using RegEx to look for a pattern in one .txt and find line in a second .txt

I have a real "headsmasher" on my plate.
I have this piece of script:
$lines = Select-String -List -Path $sourceFile -Pattern $pattern -Context 20
foreach ($id in $lines) {
if (Select-String -Quiet -LiteralPath export.txt -Pattern "$($Matches[1]).+$($id.Pattern)") {
}
else {
Select-String -Path $sourceFile -Pattern $pattern -Context 20 >> $duplicateTransactionsFile
}
}
but it is not working for me as I wanted it to.
I have two .txt files: source.txt (referenced as $sourceFile) and export.txt.
The source.txt looks something like this:
Some text here ***********
------------------------------------------------
F I N A L C O U N T 1 9 , 9 9
**************
** [0000123456]
ID Number:0000123456
Complete!
****************!
***********
Some other text here*******
------------------------------------------------
F I N A L C O U N T 9 , 9 9
**********
** [0000789000]
ID Number:0000789000
Complete!
******************!
************
The export.txt is like this:
0000123456 19,99
0000555555 ,89
0000666666 3,05
0000777777 31,19
0000789000 9,99
What I am trying to do is look into source.txt and search for the number that I enter (spaced out in my case), e.g. "9,99", but only that exact number. As you can see, another number in source.txt is "19,99", which also contains "9,99", but I do not want it to be matched.
Once I find the number, I want to look for the next line in source.txt that contains the text "ID Number:" and get the digits right after the ":". Then I want to look into export.txt and check whether that ID is there and whether it has exactly "9,99" next to it on the same line, and nothing else like "19,99", "29,99", and so on.
Then the rest is easy:
if (*true*) {
do this
}
else {
do that
}
Could you guys give me some love here and help a brother out?
I very much appreciate any help or hint you could share.
Best of wishes!
You could approach this like below:
# read the export.txt file and convert to a Hashtable for fast lookup
$export = ((Get-Content -Path 'D:\Test\export.txt').Trim() -replace '\s+', '=') -join "`r`n" | ConvertFrom-StringData
# read the source file and split into multiline data blocks
$source = ((Get-Content -Path 'D:\Test\source.txt' -Raw) -split '-{2,}').Trim() | Where-Object { $_ -match '(?sm)^\s?F I N A L C O U N T' }
# make sure the number given is spaced-out
$search = (((Read-Host "Search for Final Count number") -replace '\s' -split '') -join ' ').Trim()
Write-Host "Looking for a matching item using Final Count '$search'"
# see if we can find a data block that matches the $search
$blocks = $source | Where-Object { $_ -match "(?sm)^F I N A L C O U N T\s+$search\s?$" }
if (!$blocks) {
Write-Host "No item in source.txt could be found with Final Count '$search'" -ForegroundColor Red
}
else {
# loop over the data block(s) and pick the one that matches the search count
$blocks | ForEach-Object {
# parse out the ID
$id = $_ -replace '(?sm).*ID Number:(\d+).*', '$1'
# check if the $export Hashtable contains a key with that ID number
if ($export.Contains($id)) {
# check if that item has a value of $search without the spaces
if ($export[$id] -eq ($search -replace '\s')) {
# found it; do something
Write-Host "Found a match in the export.txt" -ForegroundColor Green
}
else {
# found ID with different FinalCount
Write-Host "An item with ID '$id' was found, but with different Final Count ($($export[$id]))" -ForegroundColor Red
}
}
else {
# ID not found
Write-Host "No item with ID '$id' could be found in the export.txt" -ForegroundColor Red
}
}
}
If as per your comment, you would like the code to loop over the Final Count numbers found in the source.txt file instead of a user typing in a number to search for, you can shorten the above code to:
# read the export.txt file and convert to a Hashtable for fast lookup
$export = ((Get-Content -Path 'D:\Test\export.txt').Trim() -replace '\s+', '=') -join "`r`n" | ConvertFrom-StringData
# read the source file and split into multiline data blocks
$blocks = ((Get-Content -Path 'D:\Test\source.txt' -Raw) -split '-{2,}').Trim() |
Where-Object { $_ -match '(?sm)^\s?F I N A L C O U N T' }
if (!$blocks) {
Write-Host "No Final Count blocks could be found in source.txt" -ForegroundColor Red
}
else {
# loop over the data block(s)
$blocks | ForEach-Object {
# parse out the FINAL COUNT number to look for in the export.txt
$search = ([regex]'(?sm)^F I N A L C O U N T\s+([\d,\s]+)$').Match($_).Groups[1].Value
# remove the spaces, surrounding '0' and trailing comma (if any)
$search = ($search -replace '\s').Trim('0').TrimEnd(',')
Write-Host "Looking for a matching item using Final Count '$search'"
# parse out the ID
$id = $_ -replace '(?sm).*ID Number:(\d+).*', '$1'
# check if the $export Hashtable contains a key with that ID number
if ($export.Contains($id)) {
# check if that item has a value of $search without the spaces
if ($export[$id] -eq $search) {
# found it; do something
Write-Host "Found a match in the export.txt with ID: $id" -ForegroundColor Green
}
else {
# found ID with different FinalCount
Write-Host "An item with ID '$id' was found, but with different Final Count ($($export[$id]))" -ForegroundColor Red
}
}
else {
# ID not found
Write-Host "No item with ID '$id' could be found in the export.txt" -ForegroundColor Red
}
}
}
There are surely multiple valid ways to accomplish this. Here is my approach:
(See comments for explanations. Let me know if you have any questions)
param (
# You can provide this when calling the script using "-Search '9,99'"
# (quoted, because inside PowerShell an unquoted 9,99 is parsed as an array).
# If not provided, PowerShell will prompt for the value.
[Parameter(Mandatory)]
[string]$Search,
$Source = "source.txt",
$Export = "export.txt"
)
# insert spaces
$pattern = $Search.ToCharArray() -join " "
# Search for the value in the source file.
$found = $false
switch -Regex -File $Source {
# This regex looks for something that is not a number,
# followed by only whitespace, and then your (spaced) search value.
# This makes sure "19,99" is not matched with "9,99".
# You could use a more elaborate regex here, but for your example,
# this one should work fine.
"\D\s+$pattern" {
$found = $true
}
"ID Number:(\d+)" {
# Get the ID number from the match.
$id = $Matches[1]
# If the search value was found
# (that means this ID number is the first one following the search value)
# we can stop looking.
if ($found) {
break
}
}
}
# quick check if the value was actually found
if (-not $found) {
throw "Value $Search not found in $Source."
}
# Search for the id in the export file.
switch -Regex -File $Export {
"$id\s+(\S+)" {
# Get the amount value from the match
$value = $Matches[1]
# If the value matches your search...
if ($value -eq $search) {
# do this
}
else {
# otherwise do that
}
break
}
}
Note: You could additionally convert the values to decimal to account for different text representations when searching and comparing.
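As a rough sketch of that note: the amounts can be parsed into decimals before comparing, so that different spellings of the same number compare as equal. This assumes the files use a comma as the decimal separator, which is why the de-DE culture is chosen here; the helper name ConvertTo-Amount is hypothetical and not part of the script above.

```powershell
# Assumption: amounts use a comma as the decimal separator (as in de-DE).
$culture = [cultureinfo]::GetCultureInfo('de-DE')

# Hypothetical helper: turn an amount string into a [decimal].
function ConvertTo-Amount {
    param([string]$Text)
    [decimal]::Parse($Text.Trim(), $culture)
}

$a = ConvertTo-Amount '9,99'
$b = ConvertTo-Amount '09,990'   # same value, different text
$c = ConvertTo-Amount ',89'      # no leading zero, as in export.txt
$d = ConvertTo-Amount '19,99'

$a -eq $b   # numeric comparison ignores the formatting differences
$a -eq $d   # a different amount stays different
```

In the script above, the comparison `$value -eq $search` could then be replaced by a comparison of the two converted values.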

How to use a conditional statement with regex in PowerShell?

There are about ten lines of data. For each line of data I want to indicate whether that line contains numerals.
How can I print out "yes, this line has numerals" or "no, this line has no numerals" for each and every line, exactly once?
output:
thufir@dur:~/flwor/csv$
thufir@dur:~/flwor/csv$ pwsh import.ps1
no digits
Name
----
people…
thufir@dur:~/flwor/csv$
code:
$text = Get-Content -Raw ./people.csv
[array]::Reverse($text)
$tempAttributes = @()
$collectionOfPeople = @()
ForEach ($line in $text) {
if($line -notmatch '.*?[0-9].*?') {
$tempAttributes += $line
Write-Host "matches digits"
}
else {
Write-Host "no digits"
$newPerson = [PSCustomObject]@{
Name = $line
Attributes = $tempAttributes
}
$tempAttributes = @()
$collectionOfPeople += $newPerson
}
}
$collectionOfPeople
data:
people
joe
phone1
phone2
phone3
sue
cell4
home5
alice
atrib6
x7
y9
z10
The only reason I'm printing "digits" or "no digits" is as a marker to aid in building the object.
You can use the following:
switch -regex -file people.csv {
'\d' { "yes" ; $_ }
default { "no"; $_ }
}
\d is a regex character class matching a digit. A switch statement with -regex allows regular expressions to be used for matching text. The default condition is picked when no other condition is met. $_ is the current line being processed.
switch -file is generally faster than Get-Content for line-by-line processing. Since you want to perform certain actions per line, you likely don't want to use the -Raw parameter, because it reads the entire file as one single string.
# For Reverse Output
$output = switch -regex -file people.csv {
'\d' { "yes" ; $_ }
default { "no"; $_ }
}
$output[($output.GetUpperBound(0))..0]

Skip Header Row in a High Performance Powershell Regex Script Block

I received some amazing help from Stack Overflow; however, it was so amazing that I need a little more help to get closer to the finish line. I'm parsing multiple enormous 4 GB files twice per month. I need to be able to skip the header and count the total lines, the matched lines, and the unmatched lines. I'm sure this is super simple for a PowerShell superstar, but my PowerShell skills are still at a newbie level. Perhaps a little help from you would save the week. :)
Data Sample:
ID FIRST_NAME LAST_NAME COLUMN_NM_TOO_LON5THCOLUMN
10000000001MINNIE MOUSE COLUMN VALUE LONGSTARTS
10000000002MICKLE ROONEY MOUSE COLUMN VALUE LONGSTARTS
Code Block (based on this answer):
#$match_regex matches each fixed length field by length; the () specifies that each matched field be stored in a capture group:
[regex]$match_regex = '^(.{10})(.{50})(.{50})(.{50})(.{50})(.{3})(.{8})(.{4})(.{50})(.{2})(.{30})(.{6})(.{3})(.{4})(.{25})(.{2})(.{10})(.{3})(.{8})(.{4})(.{50})(.{2})(.{30})(.{6})(.{3})(.{2})(.{25})(.{2})(.{10})(.{3})(.{10})(.{10})(.{10})(.{2})(.{10})(.{50})(.{50})(.{50})(.{50})(.{8})(.{4})(.{50})(.{2})(.{30})(.{6})(.{3})(.{2})(.{25})(.{2})(.{10})(.{3})(.{4})(.{2})(.{4})(.{10})(.{38})(.{38})(.{15})(.{1})(.{10})(.{2})(.{10})(.{10})(.{10})(.{10})(.{38})(.{38})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})$'
Measure-Command {
& {
switch -File $infile -Regex {
$match_regex {
# Join what all the capture groups matched with a tab char.
$Matches[1..($Matches.Count-1)].Trim() -join "`t"
}
}
} | Out-File $outFile
}
You only need to keep track of two counts (matched and unmatched lines) and a Boolean that indicates whether the first line has already been skipped:
$first = $false
$matched = 0
$unmatched = 0
. {
switch -File $infile -Regex {
$match_regex {
if($first){
# Join what all the capture groups matched with a tab char.
$Matches[1..($Matches.Count-1)].Trim() -join "`t"
$matched++
}
$first = $true
}
default{
$unmatched++
# you can remove this, if the pattern always matches the header
$first = $true
}
}
} | Out-File $outFile
$total = $matched + $unmatched
Using System.IO.StreamReader reduced the processing time to about 20% of what it had been. This was absolutely needed for my requirement.
I added logic and counters without sacrificing much on performance. The field counter and row by row comparison is particularly helpful in finding bad records.
This is a copy/paste of actual code but I shortened some things, made some things slightly pseudo code, so you may have to play with it to get things working just so for yourself.
Function Get-Regx-Data-Format() {
Param ([String] $filename)
if ($filename -eq 'FILE NAME') {
[regex]$match_regex = '^(.{10})(.{10})(.{10})(.{30})(.{30})(.{30})(.{4})(.{1})'
}
return $match_regex
}
Foreach ($file in $cutoff_files) {
$starttime_for_file = (Get-Date)
$source_file = $file + '_' + $proc_yyyymm + $source_file_suffix
$source_path = $source_dir + $source_file
$parse_file = $file + '_' + $proc_yyyymm + '_load' +$parse_target_suffix
$parse_file_path = $parse_target_dir + $parse_file
$error_file = $file + '_err_' + $proc_yyyymm + $error_target_suffix
$error_file_path = $error_target_dir + $error_file
[regex]$match_data_regex = Get-Regx-Data-Format $file
Remove-Item -path "$parse_file_path" -Force -ErrorAction SilentlyContinue
Remove-Item -path "$error_file_path" -Force -ErrorAction SilentlyContinue
[long]$matched_cnt = 0
[long]$unmatched_cnt = 0
[long]$loop_counter = 0
[boolean]$has_header_row=$true
[int]$field_cnt=0
[int]$previous_field_cnt=0
[int]$array_length=0
$parse_minutes = Measure-Command {
try {
$stream_log = [System.IO.StreamReader]::new($source_path)
$stream_in = [System.IO.StreamReader]::new($source_path)
$stream_out = [System.IO.StreamWriter]::new($parse_file_path)
$stream_err = [System.IO.StreamWriter]::new($error_file_path)
# compare against $null so that an empty line does not end the loop early
while ($null -ne ($line = $stream_in.ReadLine())) {
if ($line -match $match_data_regex) {
#if matched and it's the header, parse and write to the beg of output file
if (($loop_counter -eq 0) -and $has_header_row) {
$stream_out.WriteLine(($Matches[1..($array_length)].Trim() -join "`t"))
} else {
$previous_field_cnt = $field_cnt
#add year month to line start, trim and join every captured field w/tabs
$stream_out.WriteLine("$proc_yyyymm`t" + `
($Matches[1..($array_length)].Trim() -join "`t"))
$matched_cnt++
$field_cnt=$Matches.Count
if (($previous_field_cnt -ne $field_cnt) -and $loop_counter -gt 1) {
write-host "`nError on line $($loop_counter + 1). `
The field count does not match the previous correctly `
formatted (non-error) row."
}
}
} else {
if (($loop_counter -eq 0) -and $has_header_row) {
#if the header, write to the beginning of the output file
$stream_out.WriteLine($line)
} else {
$stream_err.WriteLine($line)
$unmatched_cnt++
}
}
$loop_counter++
}
} finally {
$stream_in.Dispose()
$stream_out.Dispose()
$stream_err.Dispose()
$stream_log.Dispose()
}
} | Select-Object -Property TotalMinutes
write-host "`n$file_list_idx. File $file parsing results....`nMatched Count =
$matched_cnt UnMatched Count = $unmatched_cnt Parse Minutes = $parse_minutes`n"
$file_list_idx++
$endtime_for_file = (Get-Date)
write-host "`nEnded processing file at $endtime_for_file"
$TimeDiff_for_file = (New-TimeSpan $starttime_for_file $endtime_for_file)
$Hrs_for_file = $TimeDiff_for_file.Hours
$Mins_for_file = $TimeDiff_for_file.Minutes
$Secs_for_file = $TimeDiff_for_file.Seconds
write-host "`nElapsed Time for file $file processing:
$Hrs_for_file`:$Mins_for_file`:$Secs_for_file"
}
$endtime = (Get-Date -format "HH:mm:ss")
$TimeDiff = (New-TimeSpan $starttime $endtime)
$Hrs = $TimeDiff.Hours
$Mins = $TimeDiff.Minutes
$Secs = $TimeDiff.Seconds
write-host "`nTotal Elapsed Time: $Hrs`:$Mins`:$Secs"

Use Powershell to comment out a 'codeblock' in a text file?

I'm trying to comment out some code in a large number of files.
The files all contain something along the lines of:
stage('inrichting'){
steps{
build job: 'SOMENAME', parameters: param
build job: 'SOMEOTHERNAME', parameters: param
echo 'TEXT'
}
}
The content within steps{ } varies, but always consists of 0..N 'echo' and 0..N 'build job' lines.
I need an output like:
//stage('inrichting'){
// steps{
// build job: 'SOMENAME', parameters: param
// build job: 'SOMEOTHERNAME', parameters: param
// echo 'TEXT'
// }
//}
Is there any good way to do this with PowerShell? I tried some stuff with pattern.replace but didn't get very far.
$list = Get-ChildItem -Path 'C:\Program Files (x86)\Jenkins\jobs' -Filter config.xml -Recurse -ErrorAction SilentlyContinue -Force | % { $_.fullname };
foreach ($item in $list) {
...
}
This is a bit tricky, as you're trying to find that whole section, and then add comment markers to all lines in it. I'd probably write an ad-hoc parser with switch -regex if your structure allows for it (counting braces may make things more robust, but is also a bit harder to get right for all cases). If the code is regular enough you can perhaps reduce it to the following:
stage('inrichting'){
steps{
... some amount of lines that don't contain braces
}
}
and we can then check for occurrence of the two fixed lines at the start and eventually two lines with closing braces:
foreach ($file in $list) {
# lines of the file
$lines = Get-Content $file
# line numbers to comment out
$linesToComment = @()
# line number of the current block to comment
$currentStart = -1
# the number of closing braces on single lines we've encountered for the current block
$closingBraces = 0
for ($l = 0; $l -lt $lines.Count; $l++) {
switch -regex ($lines[$l]) {
'^\s*stage\(''inrichting''\)\{' {
# found the first line we're looking for
$currentStart = $l
}
'^\s*steps\{' {
# found the second line, it may not belong to the same block, so reset if needed
if ($l -ne $currentStart + 1) { $currentStart = -1 }
}
'^\s*}' {
# only count braces if we're at the correct point
if ($currentStart -ne -1) { $closingBraces++ }
if ($closingBraces -eq 2) {
# we've reached the end, add the range to the lines to comment out
$linesToComment += $currentStart..$l
$currentStart = -1
$closingBraces = 0
}
}
}
}
$commentedLines = 0..($lines.Count-1) | % {
if ($linesToComment -contains $_) {
'//' + $lines[$_]
} else {
$lines[$_]
}
} | Set-Content $file
}
Untested, but the general idea might work.
Update: fixed and tested
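The brace-counting alternative mentioned above could look roughly like the following. This is only a sketch under stated assumptions: the sample lines are made up, the variable names are hypothetical, and braces inside string literals or comments are not handled, so they would throw the count off.

```powershell
# Hypothetical sample input; in practice this would come from Get-Content.
$lines = @(
    "pipeline {"
    "  stage('inrichting'){"
    "    steps{"
    "      echo 'TEXT'"
    "    }"
    "  }"
    "  stage('other'){"
    "  }"
    "}"
)

$inBlock = $false   # are we inside the block that gets commented out?
$depth   = 0        # brace depth relative to the start of that block

$result = foreach ($line in $lines) {
    if (-not $inBlock -and $line -match "^\s*stage\('inrichting'\)\s*\{") {
        $inBlock = $true
        $depth   = 0
    }
    if ($inBlock) {
        # Adjust the depth by the braces that occur on this line.
        $depth += [regex]::Matches($line, '\{').Count
        $depth -= [regex]::Matches($line, '\}').Count
        "//$line"
        # Depth back at zero means the stage's closing brace was reached.
        if ($depth -le 0) { $inBlock = $false }
    }
    else {
        $line
    }
}

$result
```

Compared with looking for two fixed opening lines and two closing lines, this version does not care how deeply the content of the stage is nested.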

How do I match ini file keys that may not contain an equal sign?

I am using the following Powershell code (modified version of https://gallery.technet.microsoft.com/scriptcenter/ea40c1ef-c856-434b-b8fb-ebd7a76e8d91) to parse an ini file:
$ini = @{}
$lastSection = ""
switch -regex -file $FilePath
{
"^\[(.+)\]$" # Section
{
$section = $matches[1]
$ini[$section] = @{}
$CommentCount = 0
$lastSection = $section
Continue
}
"^(;.*)$" # Comment
{
$section = "Comments"
if ($ini[$section] -eq $null)
{
$ini[$section] = @{}
}
$value = $matches[1]
$CommentCount = $CommentCount + 1
$name = "Comment" + $CommentCount
$ini[$section][$name] = $value
$section = $lastSection
Continue
}
"(.+?)\s*=\s*(.*)" # Key
{
if (!($section))
{
$section = "No-Section"
$ini[$section] = @{}
}
$name,$value = $matches[1..2]
$ini[$section][$name] = $value
Continue
}
"([A-Z])\w+\s+" # Key
{
if (!($section))
{
$section = "No-Section"
$ini[$section] = @{}
}
$value = $matches[1]
$ini[$section][$value] = $value
}
}
Ini files that I deal with can contain keys that have an equal sign, and some that do not. For example:
[Cipher]
OpenSSL
[SSL]
CertFile=file.crt
The switch statement correctly matches the CertFile=file.crt line and I was hoping that the last "([A-Z])\w+\s+" condition would catch the OpenSSL line. However it does not, and I have not been able to figure out what regex I can use to catch those lines where the key does not contain an equal sign.
The problem is that \s+ requires at least one whitespace character after the key, and the OpenSSL line presumably has none. In addition, ([A-Z]) captures only the first letter of the key.
You could reuse part of the regex you already have for matching the lines with =:
"(.+?)\s*"
Consider anchoring the pattern too, in order to match the full line, so
it becomes "^(.+?)\s*$"
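A minimal sketch of how the anchored pattern could slot into the switch, using the sample ini content from the question as an in-memory array. The variable names and the choice to store $null as the value of a key without an equal sign are assumptions made for illustration.

```powershell
# Sample lines from the question, as an array instead of a file.
$lines = '[Cipher]', 'OpenSSL', '[SSL]', 'CertFile=file.crt'

$ini     = [ordered]@{}
$section = 'No-Section'

switch -Regex ($lines) {
    '^\[(.+)\]$' {                       # Section
        $section = $Matches[1]
        $ini[$section] = [ordered]@{}
        continue
    }
    '^(.+?)\s*=\s*(.*)$' {               # Key with a value
        if (-not $ini.Contains($section)) { $ini[$section] = [ordered]@{} }
        $ini[$section][$Matches[1]] = $Matches[2]
        continue
    }
    '^(.+?)\s*$' {                       # Key without an equal sign
        if (-not $ini.Contains($section)) { $ini[$section] = [ordered]@{} }
        $ini[$section][$Matches[1]] = $null
    }
}

$ini['Cipher'].Keys    # keys found in the [Cipher] section
$ini['SSL'].Keys       # keys found in the [SSL] section
```

Because the bare-key pattern would also match section and key=value lines, it has to come last, and the earlier clauses need their continue statements.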