Approach guidance: grepping a string between multiple, slightly different delimiters - regex

Assume a 3Kb file that looks like this:
PdId1 Unit 1
Model 3244
Status: OK
Advanced Status OK
-----------------------
No errors found
Statistics...
...<arbitrary length values here>...
PdId2 Unit 1
Model 3222
Status: OK
Advanced Status OK
-----------------------
Error Log is as follows <arbitrary values here>
PdId3 Unit 1
Model 3243
Status: OK
Advanced Status OK
-----------------------
No errors found
So we can be certain that PdIdn can reliably used as a delimiter, that it's always at the start of a line and that it's always trailing a numebr. I want to parse the text between the delimiter for "No errors found" and if the string is missing, grab the delimiter and the next four lines (grep -A4), glue on an error message and echo the result.
I've been wracking my brain about how to approach this. I'm most comfortable in Bash with grep, but I don't think grep's going to cut it here. I've looked at using split to break the file into pieces, but this seems messy and hard to clean up after processing is done. I started to try to write something in awk / sed, but I don't understand how to split on the delimiters, then go back and parse each result, then break off the next piece and parse that.
I apologise for the general nature of this question, but I'm stumped and could use some guidance.
Edit: Technically, PdId isn't a delimiter as much as it's the start of the next record. The number of records is arbitrary.
Edit: We've now got real world data to work with:
-------------------------------------------------------------------------------
PdId: 1
Model Number: WD 1000
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
SCT Status Version: 3
SCT Version (vendor specific): 256 (0x0100)
SCT Support Level: 1
Device State: SMART Off-line Data Collection executing in background (4)
Current Temperature: 31 Celsius
Power Cycle Min/Max Temperature: 27/31 Celsius
Lifetime Min/Max Temperature: 16/41 Celsius
Under/Over Temperature Limit Count: 0/0
Self-test execution status: ( 0) The previous self-test routine
completed without error or no self-test
has ever been run.
has ever been run.
Error logging capability: (0x01) Error logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 251) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Self-test log structure revision number: 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Error Log Version: 1
No Errors Logged
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
==============================================================================
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED
WHEN_FAILED RAW_VALUE
==============================================================================
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always
- 0
2 Throughput_Performance 0x0005 139 139 054 Pre-fail Offline
- 71
3 Spin_Up_Time 0x0007 169 169 024 Pre-fail Always
- 245 (Average 204)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always
- 746
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always
- 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always
- 0
8 Seek_Time_Performance 0x0005 124 124 020 Pre-fail Offline
- 33
9 Power_On_Hours 0x0012 100 100 000 Old_age Always
- 1181
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always
- 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always
- 529
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always
- 751
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always
- 751
194 Temperature_Celsius 0x0002 193 193 000 Old_age Always
- 31 (Lifetime Min/Max 16/41)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always
- 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always
- 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline
- 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always
- 0
-------------------------------------------------------------------------------
PdId: 2
Model Number: WD 1000
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
SCT Status Version: 3
SCT Version (vendor specific): 256 (0x0100)
SCT Support Level: 1
Device State: SMART Off-line Data Collection executing in background (4)
Current Temperature: 31 Celsius
Power Cycle Min/Max Temperature: 27/31 Celsius
Lifetime Min/Max Temperature: 16/40 Celsius
Under/Over Temperature Limit Count: 0/0
Self-test execution status: ( 0) The previous self-test routine
completed without error or no self-test
has ever been run.
has ever been run.
Error logging capability: (0x01) Error logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 246) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Self-test log structure revision number: 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Error Log Version: 1
No Errors Logged
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
==============================================================================
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED
WHEN_FAILED RAW_VALUE
==============================================================================
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always
- 0
2 Throughput_Performance 0x0005 139 139 054 Pre-fail Offline
- 72
3 Spin_Up_Time 0x0007 171 171 024 Pre-fail Always
- 243 (Average 201)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always
- 746
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always
- 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always
- 0
8 Seek_Time_Performance 0x0005 124 124 020 Pre-fail Offline
- 33
9 Power_On_Hours 0x0012 100 100 000 Old_age Always
- 1181
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always
- 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always
- 529
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always
- 749
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always
- 749
194 Temperature_Celsius 0x0002 193 193 000 Old_age Always
- 31 (Lifetime Min/Max 16/40)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always
- 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always
- 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline
- 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always
- 0
-------------------------------------------------------------------------------
PdId: 3
Model Number: WD 1000
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
SCT Status Version: 3
SCT Version (vendor specific): 256 (0x0100)
SCT Support Level: 1
Device State: SMART Off-line Data Collection executing in background (4)
Current Temperature: 31 Celsius
Power Cycle Min/Max Temperature: 27/31 Celsius
Lifetime Min/Max Temperature: 17/40 Celsius
Under/Over Temperature Limit Count: 0/0
Self-test execution status: ( 0) The previous self-test routine
completed without error or no self-test
has ever been run.
has ever been run.
Error logging capability: (0x01) Error logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 241) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Self-test log structure revision number: 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Error Log Version: 1
No Errors Logged
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
==============================================================================
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED
WHEN_FAILED RAW_VALUE
==============================================================================
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always
- 0
2 Throughput_Performance 0x0005 140 140 054 Pre-fail Offline
- 67
3 Spin_Up_Time 0x0007 170 170 024 Pre-fail Always
- 234 (Average 213)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always
- 748
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always
- 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always
- 0
8 Seek_Time_Performance 0x0005 124 124 020 Pre-fail Offline
- 33
9 Power_On_Hours 0x0012 100 100 000 Old_age Always
- 1188
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always
- 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always
- 529
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always
- 750
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always
- 750
194 Temperature_Celsius 0x0002 193 193 000 Old_age Always
- 31 (Lifetime Min/Max 17/40)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always
- 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always
- 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline
- 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always
- 0
-------------------------------------------------------------------------------
PdId: 4
Model Number: WD 1000
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
SCT Status Version: 3
SCT Version (vendor specific): 256 (0x0100)
SCT Support Level: 1
Device State: SMART Off-line Data Collection executing in background (4)
Current Temperature: 31 Celsius
Power Cycle Min/Max Temperature: 27/31 Celsius
Lifetime Min/Max Temperature: 15/40 Celsius
Under/Over Temperature Limit Count: 0/0
Self-test execution status: ( 0) The previous self-test routine
completed without error or no self-test
has ever been run.
has ever been run.
Error logging capability: (0x01) Error logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 254) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Self-test log structure revision number: 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Error Log Version: 1
ATA Error Count: 165 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 165 occurred at disk power-on lifetime: 1176 hours (49 days + 0 hours)
When the command that caused the error occurred,
the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 50 b0 ee 81 0d
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 80 a8 80 ee 81 40 00 18:38:48.276 WRITE FPDMA QUEUED
61 80 a0 00 ee 81 40 00 18:38:48.276 WRITE FPDMA QUEUED
61 80 98 80 ed 81 40 00 18:38:48.276 WRITE FPDMA QUEUED
61 80 90 00 ed 81 40 00 18:38:48.276 WRITE FPDMA QUEUED
61 80 88 80 ec 81 40 00 18:38:48.275 WRITE FPDMA QUEUED
Error 164 occurred at disk power-on lifetime: 1175 hours (48 days + 23 hours)
When the command that caused the error occurred,
the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 10 f0 ad 6b 0d Error: ICRC, ABRT 16 sectors at LBA = 0x0d6badf0 = 225160688
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
35 00 80 80 ad 6b 40 00 18:36:07.145 WRITE DMA EXT
35 00 80 00 ae 6b 40 00 18:36:07.144 WRITE DMA EXT
35 00 80 00 ad 6b 40 00 18:36:07.144 WRITE DMA EXT
35 00 80 80 ab 6b 40 00 18:36:07.139 WRITE DMA EXT
35 00 80 00 ab 6b 40 00 18:36:07.139 WRITE DMA EXT
Error 163 occurred at disk power-on lifetime: 1175 hours (48 days + 23 hours)
When the command that caused the error occurred,
the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 f0 10 5e 5d 0d Error: ICRC, ABRT 240 sectors at LBA = 0x0d5d5e10 = 224222736
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
35 00 80 80 5b 5d 40 00 18:35:47.982 WRITE DMA EXT
35 00 80 80 5a 5d 40 00 18:35:47.982 WRITE DMA EXT
35 00 80 00 59 5d 40 00 18:35:47.981 WRITE DMA EXT
35 00 00 00 58 5d 40 00 18:35:47.979 WRITE DMA EXT
35 00 30 00 36 5d 40 00 18:35:47.960 WRITE DMA EXT
Error 162 occurred at disk power-on lifetime: 1175 hours (48 days + 23 hours)
When the command that caused the error occurred,
the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 20 e0 33 19 0d
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 80 30 00 33 19 40 00 18:34:50.672 WRITE FPDMA QUEUED
61 80 28 80 33 19 40 00 18:34:50.671 WRITE FPDMA QUEUED
61 80 20 00 34 19 40 00 18:34:50.671 WRITE FPDMA QUEUED
61 00 18 80 34 19 40 00 18:34:50.671 WRITE FPDMA QUEUED
61 80 10 80 36 19 40 00 18:34:50.670 WRITE FPDMA QUEUED
Error 161 occurred at disk power-on lifetime: 1133 hours (47 days + 5 hours)
When the command that caused the error occurred,
the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 d0 30 dd 3b 0a
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 80 38 80 dc 3b 40 00 06:26:51.414 WRITE FPDMA QUEUED
61 80 30 00 df 3b 40 00 06:26:51.413 WRITE FPDMA QUEUED
61 80 28 80 df 3b 40 00 06:26:51.413 WRITE FPDMA QUEUED
61 80 20 00 da 3b 40 00 06:26:51.402 WRITE FPDMA QUEUED
61 80 18 80 da 3b 40 00 06:26:51.402 WRITE FPDMA QUEUED
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
==============================================================================
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED
WHEN_FAILED RAW_VALUE
==============================================================================
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always
- 0
2 Throughput_Performance 0x0005 139 139 054 Pre-fail Offline
- 73
3 Spin_Up_Time 0x0007 170 170 024 Pre-fail Always
- 234 (Average 212)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always
- 747
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always
- 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always
- 0
8 Seek_Time_Performance 0x0005 124 124 020 Pre-fail Offline
- 33
9 Power_On_Hours 0x0012 100 100 000 Old_age Always
- 1187
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always
- 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always
- 529
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always
- 748
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always
- 748
194 Temperature_Celsius 0x0002 200 200 000 Old_age Always
- 30 (Lifetime Min/Max 15/40)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always
- 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always
- 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline
- 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always
- 165
-------------------------------------------------------------------------------
PdId: 5
Model Number: WD 1000
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
SCT Status Version: 3
SCT Version (vendor specific): 256 (0x0100)
SCT Support Level: 1
Device State: SMART Off-line Data Collection executing in background (4)
Current Temperature: 31 Celsius
Power Cycle Min/Max Temperature: 27/31 Celsius
Lifetime Min/Max Temperature: 17/40 Celsius
Under/Over Temperature Limit Count: 0/0
Self-test execution status: ( 0) The previous self-test routine
completed without error or no self-test
has ever been run.
has ever been run.
Error logging capability: (0x01) Error logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 251) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Self-test log structure revision number: 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Error Log Version: 1
No Errors Logged
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
==============================================================================
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED
WHEN_FAILED RAW_VALUE
==============================================================================
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always
- 0
2 Throughput_Performance 0x0005 140 140 054 Pre-fail Offline
- 68
3 Spin_Up_Time 0x0007 133 133 024 Pre-fail Always
- 289 (Average 282)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always
- 748
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always
- 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always
- 0
8 Seek_Time_Performance 0x0005 124 124 020 Pre-fail Offline
- 33
9 Power_On_Hours 0x0012 100 100 000 Old_age Always
- 1186
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always
- 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always
- 529
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always
- 750
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always
- 750
194 Temperature_Celsius 0x0002 193 193 000 Old_age Always
- 31 (Lifetime Min/Max 17/40)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always
- 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always
- 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline
- 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always
- 0
-------------------------------------------------------------------------------
PdId: 6
Model Number: WD 1000
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
SCT Status Version: 3
SCT Version (vendor specific): 256 (0x0100)
SCT Support Level: 1
Device State: SMART Off-line Data Collection executing in background (4)
Current Temperature: 30 Celsius
Power Cycle Min/Max Temperature: 27/30 Celsius
Lifetime Min/Max Temperature: 17/40 Celsius
Under/Over Temperature Limit Count: 0/0
Self-test execution status: ( 0) The previous self-test routine
completed without error or no self-test
has ever been run.
has ever been run.
Error logging capability: (0x01) Error logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 243) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Self-test log structure revision number: 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Error Log Version: 1
No Errors Logged
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
==============================================================================
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED
WHEN_FAILED RAW_VALUE
==============================================================================
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always
- 0
2 Throughput_Performance 0x0005 139 139 054 Pre-fail Offline
- 72
3 Spin_Up_Time 0x0007 130 130 024 Pre-fail Always
- 294 (Average 287)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always
- 748
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always
- 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always
- 0
8 Seek_Time_Performance 0x0005 124 124 020 Pre-fail Offline
- 33
9 Power_On_Hours 0x0012 100 100 000 Old_age Always
- 1186
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always
- 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always
- 529
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always
- 751
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always
- 751
194 Temperature_Celsius 0x0002 200 200 000 Old_age Always
- 30 (Lifetime Min/Max 17/40)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always
- 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always
- 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline
- 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always
- 0

Read line by line, collect into an accumulator the lines you want, print the accumulated lines when you see the trigger message (otherwise just start over and overwrite the accumulator when you see the start of the next record).
We use a as the accumulator and the helper variable n to keep track of how many lines to accumulate:
awk '/^PdId: [1-9][0-9]*/ { a=$0; n=4; next }
n { --n; a=a "\n" $ 0; next }
/No Errors Logged/ { print a }' file

Put the following into an executable awk file:
#!/usr/bin/awk -f
BEGIN {no_errs=1}
c > 0 {a[c++]=$0}
/^----------/ {
logAnyErrors()
ata_err=""
no_errs=0
c=1
delete a
}
/^No Errors Logged/ {no_errs=1}
/^ATA Error Count:/ {ata_err=$0}
function logAnyErrors() {
if( ata_err!="" || !no_errs) {
for(i=1;i<=5;i++) print a[i]
if( ata_err!="" ) print ata_err
print "--" # separator
}
}
END { logAnyErrors() }
Your data actually has a delimiter of "^------------------"... before each PdId.
The breakdown:
Start off assuming no errors in the BEGIN block
add each record line to an array called a with a line counter c
Whenever a new record section occurs call logAnyErrors() and reset counters
In logAnyErrors(), if there are ATA or other errors, print the first 5 lines of the record and a delimiter similar to what I think grep -A4 would output.
At the end, log any errors in the final record.
When I put this into an executable file called awko and run like awko data I get the following output:
PdId: 4
Model Number: WD 1000
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
ATA Error Count: 165 (device log contains only the most recent five errors)
----------------
It's possible that the delete a line is non-conforming for some awks. Works on my mac. It's not necessary unless you want to print out more information in each block when errors occur(since the first 5 lines will always be overwritten).

Related

SAS warning: complete separation of data points. The maximum likelihood estimate does not exist

I did loggistic regression in SAS using the database shown below but I got several warnings. I tried to identify the outliers and exclude them then test for multicolinearity but still I am getting warnings.
Any advice will be greatly appreciated.
**********************************;
************** database **********;
***********************************;
data D_BP;
input BP Age Weight BSA Dur Pulse Stress;
datalines;
0 47 85.4 1.75 5.1 63 33
0 51 89.4 1.89 7 72 95
0 47 90.9 1.9 6.2 66 8
0 49 89.2 1.83 7.1 69 62
0 48 92.7 2.07 5.6 64 35
0 47 94.4 2.07 5.3 74 90
0 50 95 2.05 10.2 68 47
0 45 87.1 1.92 5.6 67 80
0 46 94.5 1.98 7.4 69 95
0 46 87 1.87 3.6 62 18
0 46 94.5 1.9 4.3 70 12
0 48 90.5 1.88 9 71 99
1 49 94.2 2.1 3.8 70 14
1 49 95.3 1.98 8.2 72 10
1 50 94.7 2.01 5.8 73 99
1 48 99.5 2.25 9.3 71 10
1 49 99.8 2.25 2.5 69 42
1 49 94.1 1.98 5.6 71 21
1 52 101.3 2.19 10 76 98
1 56 95.7 2.09 7 75 99
;
run;
****** do logistic regression **********;
Proc logistic data=work.D_bp;
Model BP=Age Weight BSA Dur Pulse Stress;
Run;
**** identify outlier *********;
proc reg data=work.D_bp plots(only
label)=(RStudentByLeverage CooksD);
model BP=Age Weight BSA Dur Pulse Stress ;
run;
**** After removing outliers ==> assess multicollinearity*********;
**** assessing multicollinearity by 2 ways *********;
proc corr data=work.D_bp ;
Var Age Weight BSA Dur Pulse Stress;
run;
proc reg data=work.D_bp plots;
Model BP=Age Weight BSA Dur Pulse Stress/Collin vif tol;
run;
****** repeat logistic regression after excluding weight **********;
Proc logistic data=work.D_bp;
Model BP=Age BSA Dur Pulse Stress;
Run;
WARNING: There is a complete separation of data points. The maximum likelihood estimate does
not exist.
WARNING: The LOGISTIC procedure continues in spite of the above warning. Results shown are
based on the last maximum likelihood iteration. Validity of the model fit is
questionable.

Looping in SAS to bring the latest value

I am trying to find days matching to a reference number of days given or else to find the number of days close to the reference days.
I coded till here, however not sure how to go forward.
ID Date ref_days lags total_days
1 2017-02-02 224 . 0
1 2017-02-02 224 84 84
1 2017-02-02 224 84 168
2 2015-01-21 213 300 388
3 2016-02-12 560 95 .
3 2016-02-12 560 86 181
3 2016-02-12 560 82 263
3 2016-02-12 560 69 332
3 2016-02-12 560 77 409
So now I want to bring out the last value close to the reference days.
and the next total_days should start from ZERO again to find the next window. How can I do this?
Here is a code that I wrote
data want;
do until (totaldays <= ref_days);
set have;
by ID ref_days notsorted;
if first.id then totaldays=0;
else totaldays+lags;
end;
run;
Required Output:
ID Date ref_days lags total_days
1 2017-02-02 224 . 0
1 2017-02-02 224 84 84
1 2017-02-02 224 84 168
2 2015-01-21 213 300 388
3 2016-02-12 560 95 .
3 2016-02-12 300 86 181
3 2016-02-12 300 82 263
3 2016-02-12 300 69 .
3 2016-02-12 300 77 146
A while ago I did similar to this via Proc sql. It calculates all the distances and takes the closest one. It works with moderate size dataset. Hopefully it is of some use.
proc sql;
select * from
(
select *,
abs(t1.link-t2.link) as dist /*In your case these would be dateVars*/
from test1 t1
left join test2 t2
on 1=1) group by system1 having dist=min(dist);
;
quit;
There was some talk that the left join on 1=1 is a bit silly (as full outter join would suffice, or something.) However this worked for the problem in question.

AWS intermittent slow ssh and GET requests

we have few EC2 server on AWS but one one particular server we are noticing odd issues. First of all, SSH takes tens of seconds but only happens intermittently. Secondly, get request to the server also takes tens of seconds, but again it happens intermittently. We checked the stats on the server and everything looks fine. CPU average is around 7% and 25GB of free ram.
Here is the response from ss -s:
Total: 2553 (kernel 0)
TCP: 12015 (estab 2342, closed 9189, orphaned 478, synrecv 0, timewait 9188/0), ports 0
Transport Total IP IPv6
* 0 - -
RAW 0 0 0
UDP 4 4 0
TCP 2826 2765 61
INET 2830 2769 61
FRAG 0 0 0
And breakdown of current connection status:
1 established)
1 Foreign
6 LISTEN
62 LAST_ACK
136 SYN_RECV
155 CLOSING
251 FIN_WAIT1
1078 FIN_WAIT2
2197 ESTABLISHED
8229 TIME_WAIT
We have ruled out DNS issue, as it happens when we try to access it via it's hostname or the ip address. There are no load balancer in front of that server. We do use Route53 for routing purposes, but I don't see any issue with that.
Using AB to do get requests:
Run #1
Server Software: nginx/1.6.1
Server Hostname: my.hidden.com
Server Port: 443
SSL/TLS Protocol: TLSv1,DHE-RSA-AES256-SHA,2048,256
Document Path: /debug
Document Length: 43 bytes
Concurrency Level: 1
Time taken for tests: 104.415 seconds
Complete requests: 500
Failed requests: 0
Total transferred: 162304 bytes
HTML transferred: 21500 bytes
Requests per second: 4.79 [#/sec] (mean)
Time per request: 208.829 [ms] (mean)
Time per request: 208.829 [ms] (mean, across all concurrent requests)
Transfer rate: 1.52 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 134 160 35.7 154 709
Processing: 39 49 8.2 48 127
Waiting: 39 49 8.2 48 127
Total: 178 209 38.6 203 809
Percentage of the requests served within a certain time (ms)
50% 203
66% 209
75% 212
80% 215
90% 225
95% 244
98% 273
99% 302
100% 809 (longest request)
Run #2
Server Software: nginx/1.6.1
Server Hostname: my.hidden.com
Server Port: 443
SSL/TLS Protocol: TLSv1,DHE-RSA-AES256-SHA,2048,256
Document Path: /debug
Document Length: 43 bytes
Concurrency Level: 1
Time taken for tests: 515.608 seconds
Complete requests: 500
Failed requests: 0
Total transferred: 162284 bytes
HTML transferred: 21500 bytes
Requests per second: 0.97 [#/sec] (mean)
Time per request: 1031.216 [ms] (mean)
Time per request: 1031.216 [ms] (mean, across all concurrent requests)
Transfer rate: 0.31 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 136 980 1251.8 164 5575
Processing: 39 51 33.0 48 730
Waiting: 39 51 33.0 48 730
Total: 180 1031 1258.7 216 6306
Percentage of the requests served within a certain time (ms)
50% 216
66% 243
75% 2839
80% 2850
90% 2874
95% 2903
98% 2935
99% 2992
100% 6306 (longest request)
Run #3
Server Software: nginx/1.6.1
Server Hostname: my.hidden.com
Server Port: 443
SSL/TLS Protocol: TLSv1,DHE-RSA-AES256-SHA,2048,256
Document Path: /debug
Document Length: 43 bytes
Concurrency Level: 1
Time taken for tests: 417.639 seconds
Complete requests: 500
Failed requests: 0
Total transferred: 162320 bytes
HTML transferred: 21500 bytes
Requests per second: 1.20 [#/sec] (mean)
Time per request: 835.279 [ms] (mean)
Time per request: 835.279 [ms] (mean, across all concurrent requests)
Transfer rate: 0.38 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 137 781 1140.0 162 5053
Processing: 38 54 56.6 49 1281
Waiting: 38 54 56.6 49 1281
Total: 179 835 1141.2 213 5104
Percentage of the requests served within a certain time (ms)
50% 213
66% 233
75% 334
80% 2835
90% 2872
95% 2915
98% 3030
99% 3137
100% 5104 (longest request)
Run #4
Server Software: nginx/1.6.1
Server Hostname: my.hidden.com
Server Port: 443
SSL/TLS Protocol: TLSv1,DHE-RSA-AES256-SHA,2048,256
Document Path: /debug
Document Length: 43 bytes
Concurrency Level: 1
Time taken for tests: 104.806 seconds
Complete requests: 500
Failed requests: 0
Total transferred: 162250 bytes
HTML transferred: 21500 bytes
Requests per second: 4.77 [#/sec] (mean)
Time per request: 209.611 [ms] (mean)
Time per request: 209.611 [ms] (mean, across all concurrent requests)
Transfer rate: 1.51 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 135 160 13.8 157 227
Processing: 39 50 8.8 48 159
Waiting: 39 50 8.8 48 159
Total: 179 209 17.7 206 331
Percentage of the requests served within a certain time (ms)
50% 206
66% 212
75% 216
80% 219
90% 230
95% 240
98% 266
99% 275
100% 331 (longest request)
Run #5
Server Software: nginx/1.6.1
Server Hostname: my.hidden.com
Server Port: 443
SSL/TLS Protocol: TLSv1,DHE-RSA-AES256-SHA,2048,256
Document Path: /debug
Document Length: 43 bytes
Concurrency Level: 1
Time taken for tests: 110.983 seconds
Complete requests: 500
Failed requests: 0
Total transferred: 162282 bytes
HTML transferred: 21500 bytes
Requests per second: 4.51 [#/sec] (mean)
Time per request: 221.967 [ms] (mean)
Time per request: 221.967 [ms] (mean, across all concurrent requests)
Transfer rate: 1.43 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 132 170 149.3 159 3460
Processing: 38 52 25.8 49 589
Waiting: 38 52 25.8 49 589
Total: 177 222 152.6 208 3532
Percentage of the requests served within a certain time (ms)
50% 208
66% 217
75% 222
80% 227
90% 240
95% 257
98% 282
99% 336
100% 3532 (longest request)

Why is _beginthreadex failing with ERROR_INVALID_ACCESS?

I have a bunch of QWebViews rendered onto a widget. There comes a point at which I start getting the error QThread::start: Failed to create thread (The access code is invalid.). Looking at the Qt source, it appears that _beginthreadex is returning a null handle and errno is ERROR_INVALID_ACCESS, but I have no idea why.
Here is the backtrace at the printing of the error:
0 qErrnoWarning qglobal.cpp 2451 0x69ccdd3c
1 QThread::start qthread_win.cpp 469 0x69cd5831
2 QThreadPoolPrivate::tryStart qthreadpool.cpp 203 0x69ccc3f5
3 QThreadPool::start qthreadpool.cpp 474 0x69cccdf4
4 QHostInfoLookupManager::work qhostinfo.cpp 633 0x6cb9b071
5 QHostInfoLookupManager::scheduleLookup qhostinfo.cpp 652 0x6cb9b143
6 QHostInfo::lookupHost qhostinfo.cpp 202 0x6cb9a220
7 qt_qhostinfo_lookup qhostinfo.cpp 722 0x6cb9b4b6
8 QAbstractSocket::connectToHostImplementation qabstractsocket.cpp 1427 0x6cbb17f5
9 QAbstractSocket::qt_static_metacall moc_qabstractsocket.cpp 166 0x6cbb4925
10 QMetaMethod::invoke qmetaobject.cpp 1664 0x69dc784f
11 QMetaObject::invokeMethod qmetaobject.cpp 1179 0x69dc6d6b
12 QMetaObject::invokeMethod qobjectdefs.h 418 0x6cd361dd
13 QAbstractSocket::connectToHost qabstractsocket.cpp 1342 0x6cbb13b3
14 QSslSocket::connectToHostImplementation qsslsocket.cpp 1744 0x6cbc7340
15 QSslSocket::qt_static_metacall moc_qsslsocket.cpp 91 0x6cbc93cf
16 QMetaMethod::invoke qmetaobject.cpp 1664 0x69dc784f
17 QMetaObject::invokeMethod qmetaobject.cpp 1179 0x69dc6d6b
18 QMetaObject::invokeMethod qobjectdefs.h 418 0x6cd361dd
19 QAbstractSocket::connectToHost qabstractsocket.cpp 1342 0x6cbb13b3
20 QSslSocket::connectToHostEncrypted qsslsocket.cpp 422 0x6cbc55e1
21 QHttpNetworkConnectionChannel::ensureConnection qhttpnetworkconnectionchannel.cpp 607 0x6cb6191f
22 QHttpNetworkConnectionPrivate::_q_startNextRequest qhttpnetworkconnection.cpp 862 0x6cb5e92c
23 QHttpNetworkConnectionPrivate::queueRequest qhttpnetworkconnection.cpp 501 0x6cb5c57d
24 QHttpNetworkConnection::sendRequest qhttpnetworkconnection.cpp 931 0x6cb5edf2
25 QHttpThreadDelegate::startRequest qhttpthreaddelegate.cpp 291 0x6cb8912a
26 QHttpThreadDelegate::qt_static_metacall moc_qhttpthreaddelegate_p.cpp 113 0x6cbd147c
27 QMetaCallEvent::placeMetaCall qobject.cpp 525 0x69dcf91c
28 QObject::event qobject.cpp 1195 0x69dd08db
29 QApplicationPrivate::notify_helper qapplication.cpp 4551 0x2582f44
30 QApplication::notify qapplication.cpp 3933 0x25808b7
31 QCoreApplication::notifyInternal qcoreapplication.cpp 915 0x69dc0dc6
32 QCoreApplication::sendEvent qcoreapplication.h 231 0x69e35185
33 QCoreApplicationPrivate::sendPostedEvents qcoreapplication.cpp 1539 0x69dc1d2a
34 qt_internal_proc qeventdispatcher_win.cpp 496 0x69de2590
35 USER32!OffsetRect C:\Windows\syswow64\user32.dll 0 0x74cc62fa
36 ?? 0 0x152404
37 ?? 0 0x401
38 ?? 0
The code at the call looks like:
d->handle = (Qt::HANDLE) _beginthreadex(NULL, d->stackSize, QThreadPrivate::start, //d->stackSize is 0
this, CREATE_SUSPENDED, &(d->id));
if (!d->handle) {
qErrnoWarning(errno, "QThread::start: Failed to create thread");
d->running = false;
d->finished = true;
return;
}
Why is this happening and how do I fix it?
EDIT: also of note, there are exactly 500 threads at the point in which this breaks.
There's a good chance you've run out of free address space in your process (for thread stacks) after creating 500 threads. On 32-bit Windows, processes only get 2GB of address space by default (the upper half of the address space being reserved for the kernel). 500 1MB thread stacks (the default size, Qt may go higher or lower) plus all the other allocations your process makes could easily be using that up.
See this Old New Thing article for more.
Possible fixes:
If you know your QThreads don't need very big stacks, you can call QThread::setStackSize() to set a smaller size before starting the thread.
Consider using a thread pool and/or just reducing the number of concurrent threads you start. It's unlikely that you have enough CPU cores to make 500+ threads productive.
Use the Windows /3GB switch and make your application LARGE ADDRESS AWARE to get 3GB of user-mode address space.
Go 64-bit (for 63 bits of user-mode address space).

Qt application killed because Out Of Memory (OOM)

I am running a Qt application on embedded Linux platform. The system has 128 MB RAM, 512MB NAND, no swap. The application uses a custom library for the peripherals, the rest are all Qt and c/c++ libs. The application uses SQLITE3 as well.
After 2-3 hours, the machine starts running very slow, shell commands take 10 or so seconds to respond. Eventually the machine hangs, and finally OOM killer kills the application, and the system starts behaving at normal speed.
After some system memory observations using top command reveals that while application is running, the system free memory is decreasing, while slab keeps on increasing. These are the snaps of top given below. The application is named xyz.
At Application start :
Mem total:126164 anon:3308 map:8436 free:32456
slab:60936 buf:0 cache:27528 dirty:0 write:0
Swap total:0 free:0
PID VSZ VSZRW^ RSS (SHR) DIRTY (SHR) STACK COMMAND
776 29080 9228 8036 528 968 0 84 ./xyz -qws
781 3960 736 1976 1456 520 0 84 sshd: root#notty
786 3676 680 1208 764 416 0 88 /usr/libexec/sftp-server
770 3792 568 1948 1472 464 0 84 {sshd} sshd: root#pts/0
766 3792 568 956 688 252 0 84 /usr/sbin/sshd
388 1864 284 552 332 188 0 84 udevd --daemon
789 2832 272 688 584 84 0 84 top
774 2828 268 668 560 84 0 84 -sh
709 2896 268 556 464 80 0 84 /usr/sbin/inetd
747 2828 268 596 516 68 0 84 /sbin/getty -L ttymxc0 115200 vt100
777 2824 264 444 368 68 0 84 tee out.log
785 2824 264 484 416 68 0 84 sh -c /usr/libexec/sftp-server
1 2824 264 556 488 64 0 84 init
After some time :
Mem total:126164 anon:3312 map:8440 free:9244
slab:83976 buf:0 cache:27584 dirty:0 write:0
Swap total:0 free:0
PID VSZ VSZRW^ RSS (SHR) DIRTY (SHR) STACK COMMAND
776 29080 9228 8044 528 972 0 84 ./xyz -qws
781 3960 736 1976 1456 520 0 84 sshd: root#notty
786 3676 680 1208 764 416 0 88 /usr/libexec/sftp-server
770 3792 568 1948 1472 464 0 84 {sshd} sshd: root#pts/0
766 3792 568 956 688 252 0 84 /usr/sbin/sshd
388 1864 284 552 332 188 0 84 udevd --daemon
789 2832 272 688 584 84 0 84 top
774 2828 268 668 560 84 0 84 -sh
709 2896 268 556 464 80 0 84 /usr/sbin/inetd
747 2828 268 596 516 68 0 84 /sbin/getty -L ttymxc0 115200 vt100
777 2824 264 444 368 68 0 84 tee out.log
785 2824 264 484 416 68 0 84 sh -c /usr/libexec/sftp-server
1 2824 264 556 488 64 0 84 init
Funnily though, I can not see any major changes in the output of top involving the application itself. Eventually the application is killed, top output after that :
Mem total:126164 anon:2356 map:916 free:2368
slab:117944 buf:0 cache:1580 dirty:0 write:0
Swap total:0 free:0
PID VSZ VSZRW^ RSS (SHR) DIRTY (SHR) STACK COMMAND
781 3960 736 708 184 520 0 84 sshd: root#notty
786 3724 728 736 172 484 0 88 /usr/libexec/sftp-server
770 3792 568 648 188 460 0 84 {sshd} sshd: root#pts/0
766 3792 568 252 0 252 0 84 /usr/sbin/sshd
388 1864 284 188 0 188 0 84 udevd --daemon
819 2832 272 676 348 84 0 84 top
774 2828 268 512 324 96 0 84 -sh
709 2896 268 80 0 80 0 84 /usr/sbin/inetd
747 2828 268 68 0 68 0 84 /sbin/getty -L ttymxc0 115200 vt100
785 2824 264 68 0 68 0 84 sh -c /usr/libexec/sftp-server
1 2824 264 64 0 64 0 84 init
The dmesg shows :
sh invoked oom-killer: gfp_mask=0xd0, order=2, oomkilladj=0
[<c002d4c4>] (unwind_backtrace+0x0/0xd4) from [<c0073ac0>] (oom_kill_process+0x54/0x1b8)
[<c0073ac0>] (oom_kill_process+0x54/0x1b8) from [<c0073f14>] (__out_of_memory+0x154/0x178)
[<c0073f14>] (__out_of_memory+0x154/0x178) from [<c0073fa0>] (out_of_memory+0x68/0x9c)
[<c0073fa0>] (out_of_memory+0x68/0x9c) from [<c007649c>] (__alloc_pages_nodemask+0x3e0/0x4c8)
[<c007649c>] (__alloc_pages_nodemask+0x3e0/0x4c8) from [<c0076598>] (__get_free_pages+0x14/0x4c)
[<c0076598>] (__get_free_pages+0x14/0x4c) from [<c002f528>] (get_pgd_slow+0x14/0xdc)
[<c002f528>] (get_pgd_slow+0x14/0xdc) from [<c0043890>] (mm_init+0x84/0xc4)
[<c0043890>] (mm_init+0x84/0xc4) from [<c0097b94>] (bprm_mm_init+0x10/0x138)
[<c0097b94>] (bprm_mm_init+0x10/0x138) from [<c00980a8>] (do_execve+0xf4/0x2a8)
[<c00980a8>] (do_execve+0xf4/0x2a8) from [<c002afc4>] (sys_execve+0x38/0x5c)
[<c002afc4>] (sys_execve+0x38/0x5c) from [<c0027d20>] (ret_fast_syscall+0x0/0x2c)
Mem-info:
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
Normal per-cpu:
CPU 0: hi: 42, btch: 7 usd: 0
Active_anon:424 active_file:11 inactive_anon:428
inactive_file:3 unevictable:0 dirty:0 writeback:0 unstable:0
free:608 slab:29498 mapped:14 pagetables:59 bounce:0
DMA free:692kB min:268kB low:332kB high:400kB active_anon:0kB inactive_anon:0kB active_file:4kB inactive_file:0kB unevictable:0kB present:24384kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 103 103
Normal free:1740kB min:1168kB low:1460kB high:1752kB active_anon:1696kB inactive_anon:1712kB active_file:40kB inactive_file:12kB unevictable:0kB present:105664kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 3*4kB 3*8kB 5*16kB 2*32kB 4*64kB 2*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 692kB
Normal: 377*4kB 1*8kB 4*16kB 1*32kB 2*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1740kB
30 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 0kB
Total swap = 0kB
32768 pages of RAM
687 free pages
1306 reserved pages
29498 slab pages
59 pages shared
0 pages swap cached
Out of memory: kill process 774 (sh) score 339 or a child
Killed process 776 (xyz)
So it's obvious that there is a memory leak, it must be my app since my app is killed. But I am not doing any malloc s from the program. I have taken care as to limit the scope of variables so that they are deallocated after they are used. So I am at a complete loss as to why is slab increasing in the top output. I have tried http://valgrind.org/docs/manual/faq.html#faq.reports but didn't work.
Currently trying to use Valgrind on desktop (since I have read it only works for arm-cortex) to check my business logic.
Addittional info :
root#freescale ~/Application/app$ uname -a
Linux freescale 2.6.31-207-g7286c01 #2053 Fri Jun 22 10:29:11 IST 2012 armv5tejl GNU/Linux
Compiler : arm-none-linux-gnueabi-4.1.2 glibc2.5
cpp libs : libstdc++.so.6.0.8
Qt : 4.7.3 libs
Any pointers would be greatly appreciated...
I don't think the problem is directly in your code.
The reason is obvious: your application space does not increase (both RSS and VSW do not increase).
However, you do see the number of slabs increasing. You cannot use or increase the number of slabs from your application - it's a kernel-only thingie.
Some obvious causes of slab size increase from the top of my head:
you never really close network sockets
you read many files, but never close them
you use many ioctls
I would run strace and look at its output for a while. strace intercepts interactions with the kernel. If you have memory issues, I'd expect repeated calls to brk(). If you have other issues, you'll see repeated calls to open without close.
If you have some data structure allocation, check for the correctness of adding children and etc.. I had similar bug in my code. Also if you make big and large queries to the database it may use more ram memory. Try to find some memory leak detector to find if there is any leak.