The throughput of ip_reassembly is low - DPDK

The throughput of IP reassembly is low. I tested it with the ip_reassembly example: a second machine, PC2, runs iperf -c to send fragmented IP packets, and the maximum throughput is about 900 Mbps.
Test setup:
PC1 (DPDK) --- PC2 (iperf -c)
PC1: 10 Gb/s NIC
PC2: 1 Gb/s NIC, MTU 1500
#./build/ip_reassembly -l 2 -- -p 0x1 &
EAL: Detected 4 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:00:1f.6 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 8086:15b7 net_e1000_em
EAL: PCI device 0000:04:00.0 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 8086:10fb net_ixgbe
EAL: PCI device 0000:04:00.1 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 8086:10fb net_ixgbe
IP_RSMBL: Creating LPM table on socket 0
IP_RSMBL: Creating LPM6 table on socket 0
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 0
Initializing port 0 ... Port 0 modified RSS hash function based on hardware support,requested:0xa38c configured:0x8104
Address:00:1B:21:C1:E9:C6
txq=2,0,0
IP_RSMBL: Socket 0: adding route 100.10.0.0/16 (port 0)
IP_RSMBL: Socket 0: adding route 100.20.0.0/16 (port 1)
IP_RSMBL: Socket 0: adding route 100.30.0.0/16 (port 2)
IP_RSMBL: Socket 0: adding route 100.40.0.0/16 (port 3)
IP_RSMBL: Socket 0: adding route 100.50.0.0/16 (port 4)
IP_RSMBL: Socket 0: adding route 100.60.0.0/16 (port 5)
IP_RSMBL: Socket 0: adding route 100.70.0.0/16 (port 6)
IP_RSMBL: Socket 0: adding route 100.80.0.0/16 (port 7)
IP_RSMBL: Socket 0: adding route 0101:0101:0101:0101:0101:0101:0101:0101/48 (port 0)
IP_RSMBL: Socket 0: adding route 0201:0101:0101:0101:0101:0101:0101:0101/48 (port 1)
IP_RSMBL: Socket 0: adding route 0301:0101:0101:0101:0101:0101:0101:0101/48 (port 2)
IP_RSMBL: Socket 0: adding route 0401:0101:0101:0101:0101:0101:0101:0101/48 (port 3)
IP_RSMBL: Socket 0: adding route 0501:0101:0101:0101:0101:0101:0101:0101/48 (port 4)
IP_RSMBL: Socket 0: adding route 0601:0101:0101:0101:0101:0101:0101:0101/48 (port 5)
IP_RSMBL: Socket 0: adding route 0701:0101:0101:0101:0101:0101:0101:0101/48 (port 6)
IP_RSMBL: Socket 0: adding route 0801:0101:0101:0101:0101:0101:0101:0101/48 (port 7)
Checking link status
done
Port0 Link Up. Speed 10000 Mbps - full-duplex
IP_RSMBL: entering main loop on lcore 2
IP_RSMBL: -- lcoreid=2 portid=0
Run iperf -c on PC2:
# iperf -c 192.168.10.157 -i 1 -u -t 30 -p 2152 -b 900M -l 1600
------------------------------------------------------------
Client connecting to 192.168.10.157, UDP port 2152
Sending 1600 byte datagrams, IPG target: 13.56 us (kalman adjust)
UDP buffer size: 958 MByte (default)
------------------------------------------------------------
[ 3] local 192.168.10.100 port 37771 connected with 192.168.10.157 port 2152
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 113 MBytes 944 Mbits/sec
[ 3] 1.0- 2.0 sec 112 MBytes 944 Mbits/sec
[ 3] 2.0- 3.0 sec 112 MBytes 944 Mbits/sec
[ 3] 3.0- 4.0 sec 113 MBytes 944 Mbits/sec
[ 3] 4.0- 5.0 sec 112 MBytes 944 Mbits/sec
[ 3] 5.0- 6.0 sec 112 MBytes 944 Mbits/sec
[ 3] 6.0- 7.0 sec 113 MBytes 944 Mbits/sec
[ 3] 7.0- 8.0 sec 112 MBytes 944 Mbits/sec
[ 3] 8.0- 9.0 sec 113 MBytes 944 Mbits/sec
[ 3] 9.0-10.0 sec 112 MBytes 944 Mbits/sec
[ 3] 10.0-11.0 sec 112 MBytes 944 Mbits/sec
[ 3] 11.0-12.0 sec 112 MBytes 944 Mbits/sec
[ 3] 12.0-13.0 sec 112 MBytes 944 Mbits/sec
[ 3] 13.0-14.0 sec 112 MBytes 944 Mbits/sec
[ 3] 14.0-15.0 sec 113 MBytes 944 Mbits/sec
[ 3] 15.0-16.0 sec 112 MBytes 944 Mbits/sec
[ 3] 16.0-17.0 sec 113 MBytes 944 Mbits/sec
[ 3] 17.0-18.0 sec 112 MBytes 944 Mbits/sec
[ 3] 18.0-19.0 sec 112 MBytes 944 Mbits/sec
[ 3] 19.0-20.0 sec 112 MBytes 944 Mbits/sec
[ 3] 20.0-21.0 sec 113 MBytes 944 Mbits/sec
[ 3] 21.0-22.0 sec 112 MBytes 944 Mbits/sec
[ 3] 22.0-23.0 sec 112 MBytes 944 Mbits/sec
[ 3] 23.0-24.0 sec 112 MBytes 944 Mbits/sec
[ 3] 24.0-25.0 sec 112 MBytes 944 Mbits/sec
[ 3] 25.0-26.0 sec 113 MBytes 944 Mbits/sec
[ 3] 26.0-27.0 sec 112 MBytes 944 Mbits/sec
[ 3] 27.0-28.0 sec 112 MBytes 944 Mbits/sec
[ 3] 28.0-29.0 sec 113 MBytes 944 Mbits/sec
[ 3] 29.0-30.0 sec 113 MBytes 944 Mbits/sec
[ 3] WARNING: did not receive ack of last datagram after 10 tries.
[ 3] 0.0-30.0 sec 3.30 GBytes 944 Mbits/sec
[ 3] Sent 2211842 datagrams
The reassembly statistics on PC1:
# ps
PID TTY TIME CMD
335 pts/9 00:02:35 ip_reassembly
535 pts/9 00:00:00 ps
25304 pts/9 00:00:00 su
25306 pts/9 00:00:00 zsh
# kill -SIGUSR1 335
-- lcoreid=2 portid=0 frag tbl stat:
max entries: 4096;
entries in use: 4088;
finds/inserts: 4344078;
entries added: 883521;
entries deleted by timeout: 837;
entries reused by timeout: 0;
total add failures: 2581961;
add no-space failures: 2581961;
add hash-collisions failures: 0;
TX bursts: 0
TX packets _queued: 0
TX packets dropped: 0
TX packets send: 0
RX gtpu packets: 872727
I added statistics for UDP port 2152 to the ip_reassembly example to count the successfully reassembled packets (the "RX gtpu packets" line). According to the result, PC2 sent 2211842 datagrams, while ip_reassembly reassembled only 872727 packets. When I lower the iperf sending rate to 800 Mbps, no drops are reported.
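A quick back-of-the-envelope check helps interpret these numbers. The sketch below uses a hypothetical `frag_budget` helper, which is not part of DPDK or iperf. It estimates how many fragments each datagram produces and how much link bandwidth the test needs once fragmentation and Ethernet framing are counted. It assumes:

- a 1500-byte MTU;
- a 20-byte IPv4 header without options;
- untagged Ethernet (14-byte header, 4-byte FCS, 8-byte preamble/SFD, 12-byte inter-frame gap);
- that iperf's Mbits/sec counts UDP payload only.

```shell
#!/usr/bin/env bash
# Estimate fragments per datagram and the resulting on-wire bit rate for a
# UDP test whose datagrams exceed the MTU. Assumptions (not taken from DPDK):
#   IPv4 header without options (20 B), UDP header (8 B),
#   untagged Ethernet: 14 B header + 4 B FCS, plus 8 B preamble/SFD + 12 B IFG,
#   and the Mbit/s figure counts UDP payload only (as iperf reports it).
frag_budget() {
  local len=$1 mbps=$2 mtu=${3:-1500}
  awk -v len="$len" -v mbps="$mbps" -v mtu="$mtu" 'BEGIN {
    iph = 20; udph = 8; l2 = 14 + 4; gap = 8 + 12
    per_frag = int((mtu - iph) / 8) * 8   # non-final fragment payload: multiple of 8 bytes
    ippay = len + udph                    # UDP header + payload is what gets fragmented
    nfrag = int((ippay + per_frag - 1) / per_frag)
    last = iph + ippay - (nfrag - 1) * per_frag   # IP size of the final fragment
    if (last < 46) last = 46                      # padded to the 64-byte minimum frame
    wire = (nfrag - 1) * (iph + per_frag + l2 + gap) + (last + l2 + gap)
    printf "fragments per datagram: %d\n", nfrag
    printf "wire bytes per datagram: %d\n", wire
    printf "wire rate: %.1f Mbit/s\n", mbps * wire / len
  }'
}

frag_budget 1600 944   # the iperf run above (-l 1600, ~944 Mbits/sec)
frag_budget 1600 800   # the rate at which no drops were reported
```

Under these assumptions, each 1600-byte datagram becomes two fragments (1724 bytes on the wire). So 944 Mbit/s of payload needs roughly 1017 Mbit/s of link capacity, slightly more than 1 GbE provides. By contrast, 800 Mbit/s needs only about 862 Mbit/s.

This is consistent with the drops disappearing at 800 Mbps, though it does not prove the cause. If some fragments never reach PC1, each affected datagram leaves a half-filled entry in the fragment table until its TTL expires. Similarly, if each received fragment triggers one table lookup, finds/inserts (4344078) is about 1.8% below the 4423684 fragments that 2211842 datagrams would produce.

The ip_reassembly sample application's --maxflows option (default 4096, which appears to match "max entries: 4096") and --flowttl option (default 1 s) control the table size and how long incomplete flows are kept. Both may be worth experimenting with.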
I couldn't find any throughput figures in the DPDK programmer's guide: https://doc.dpdk.org/guides-22.07/prog_guide/ip_fragment_reassembly_lib.html
Has anyone run into the same problem?

Related

How to match a regular expression to a text file?

I read a file as text with data=fileread('channelresult') and then use the data for regular-expression matching. The channelresult file contains:
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-0.10 sec 9.24 MBytes 774 Mbits/sec
[ 4] 0.10-0.20 sec 14.8 MBytes 1.24 Gbits/sec
[ 4] 0.20-0.30 sec 15.0 MBytes 1.27 Gbits/sec
[ 4] 0.30-0.40 sec 17.6 MBytes 1.48 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 1.74 GBytes 1.49 Gbits/sec 1005 sender
[ 4] 0.00-10.00 sec 1.74 GBytes 1.49 Gbits/sec receiver
The regular expression I use is
pattern=number_str+'\s+sec\s+'+number_str+'\s+\w+\s+'+number_str+'\s+(\w)\w+/\w+\s+(\d+)\s+'+number_str+'\s(\w)'
Here number_str='(\d*\.\d+|\d+)'. When I run out = regexp(data,pattern,'match'), out is empty: a 0-by-0 cell array.

AWS: intermittently slow SSH and GET requests

We have a few EC2 servers on AWS, and on one particular server we are seeing odd issues. First, SSH takes tens of seconds to connect, but only intermittently. Second, GET requests to the server also take tens of seconds, again only intermittently. We checked the server's stats and everything looks fine: average CPU usage is around 7% and there is 25 GB of free RAM.
Here is the output of ss -s:
Total: 2553 (kernel 0)
TCP: 12015 (estab 2342, closed 9189, orphaned 478, synrecv 0, timewait 9188/0), ports 0
Transport Total IP IPv6
* 0 - -
RAW 0 0 0
UDP 4 4 0
TCP 2826 2765 61
INET 2830 2769 61
FRAG 0 0 0
And a breakdown of the current connection states:
1 established)
1 Foreign
6 LISTEN
62 LAST_ACK
136 SYN_RECV
155 CLOSING
251 FIN_WAIT1
1078 FIN_WAIT2
2197 ESTABLISHED
8229 TIME_WAIT
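The state breakdown above looks like the output of a pipeline along the lines of `netstat -ant | awk '{print $6}' | sort | uniq -c | sort -n`. That is an assumption, since the post does not say which command was used. In such a pipeline, netstat's two header lines are counted too, which would explain the odd "established)" and "Foreign" rows. A minimal sketch that skips them (`count_states` is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Count TCP connections per state from `netstat -ant` output read on stdin.
# NR > 2 skips netstat's two header lines; $6 is the State column.
count_states() {
  awk 'NR > 2 { print $6 }' | sort | uniq -c | sort -n
}

# usage (on the server, assuming net-tools' netstat is installed):
#   netstat -ant | count_states
```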
We have ruled out DNS, because the problem occurs whether we access the server by its hostname or by its IP address. There is no load balancer in front of the server. We do use Route 53 for routing, but I don't see any issue there.
Results of using ab (ApacheBench) to send GET requests:
Run #1
Server Software: nginx/1.6.1
Server Hostname: my.hidden.com
Server Port: 443
SSL/TLS Protocol: TLSv1,DHE-RSA-AES256-SHA,2048,256
Document Path: /debug
Document Length: 43 bytes
Concurrency Level: 1
Time taken for tests: 104.415 seconds
Complete requests: 500
Failed requests: 0
Total transferred: 162304 bytes
HTML transferred: 21500 bytes
Requests per second: 4.79 [#/sec] (mean)
Time per request: 208.829 [ms] (mean)
Time per request: 208.829 [ms] (mean, across all concurrent requests)
Transfer rate: 1.52 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 134 160 35.7 154 709
Processing: 39 49 8.2 48 127
Waiting: 39 49 8.2 48 127
Total: 178 209 38.6 203 809
Percentage of the requests served within a certain time (ms)
50% 203
66% 209
75% 212
80% 215
90% 225
95% 244
98% 273
99% 302
100% 809 (longest request)
Run #2
Server Software: nginx/1.6.1
Server Hostname: my.hidden.com
Server Port: 443
SSL/TLS Protocol: TLSv1,DHE-RSA-AES256-SHA,2048,256
Document Path: /debug
Document Length: 43 bytes
Concurrency Level: 1
Time taken for tests: 515.608 seconds
Complete requests: 500
Failed requests: 0
Total transferred: 162284 bytes
HTML transferred: 21500 bytes
Requests per second: 0.97 [#/sec] (mean)
Time per request: 1031.216 [ms] (mean)
Time per request: 1031.216 [ms] (mean, across all concurrent requests)
Transfer rate: 0.31 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 136 980 1251.8 164 5575
Processing: 39 51 33.0 48 730
Waiting: 39 51 33.0 48 730
Total: 180 1031 1258.7 216 6306
Percentage of the requests served within a certain time (ms)
50% 216
66% 243
75% 2839
80% 2850
90% 2874
95% 2903
98% 2935
99% 2992
100% 6306 (longest request)
Run #3
Server Software: nginx/1.6.1
Server Hostname: my.hidden.com
Server Port: 443
SSL/TLS Protocol: TLSv1,DHE-RSA-AES256-SHA,2048,256
Document Path: /debug
Document Length: 43 bytes
Concurrency Level: 1
Time taken for tests: 417.639 seconds
Complete requests: 500
Failed requests: 0
Total transferred: 162320 bytes
HTML transferred: 21500 bytes
Requests per second: 1.20 [#/sec] (mean)
Time per request: 835.279 [ms] (mean)
Time per request: 835.279 [ms] (mean, across all concurrent requests)
Transfer rate: 0.38 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 137 781 1140.0 162 5053
Processing: 38 54 56.6 49 1281
Waiting: 38 54 56.6 49 1281
Total: 179 835 1141.2 213 5104
Percentage of the requests served within a certain time (ms)
50% 213
66% 233
75% 334
80% 2835
90% 2872
95% 2915
98% 3030
99% 3137
100% 5104 (longest request)
Run #4
Server Software: nginx/1.6.1
Server Hostname: my.hidden.com
Server Port: 443
SSL/TLS Protocol: TLSv1,DHE-RSA-AES256-SHA,2048,256
Document Path: /debug
Document Length: 43 bytes
Concurrency Level: 1
Time taken for tests: 104.806 seconds
Complete requests: 500
Failed requests: 0
Total transferred: 162250 bytes
HTML transferred: 21500 bytes
Requests per second: 4.77 [#/sec] (mean)
Time per request: 209.611 [ms] (mean)
Time per request: 209.611 [ms] (mean, across all concurrent requests)
Transfer rate: 1.51 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 135 160 13.8 157 227
Processing: 39 50 8.8 48 159
Waiting: 39 50 8.8 48 159
Total: 179 209 17.7 206 331
Percentage of the requests served within a certain time (ms)
50% 206
66% 212
75% 216
80% 219
90% 230
95% 240
98% 266
99% 275
100% 331 (longest request)
Run #5
Server Software: nginx/1.6.1
Server Hostname: my.hidden.com
Server Port: 443
SSL/TLS Protocol: TLSv1,DHE-RSA-AES256-SHA,2048,256
Document Path: /debug
Document Length: 43 bytes
Concurrency Level: 1
Time taken for tests: 110.983 seconds
Complete requests: 500
Failed requests: 0
Total transferred: 162282 bytes
HTML transferred: 21500 bytes
Requests per second: 4.51 [#/sec] (mean)
Time per request: 221.967 [ms] (mean)
Time per request: 221.967 [ms] (mean, across all concurrent requests)
Transfer rate: 1.43 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 132 170 149.3 159 3460
Processing: 38 52 25.8 49 589
Waiting: 38 52 25.8 49 589
Total: 177 222 152.6 208 3532
Percentage of the requests served within a certain time (ms)
50% 208
66% 217
75% 222
80% 227
90% 240
95% 257
98% 282
99% 336
100% 3532 (longest request)

grep values from a column in a txt file

I have a txt file with 1200 entries like these (iperf output, by the way):
1 [ 4] 0.0- 1.0 sec 10.6 MBytes 89.1 Mbits/sec
2 [ 4] 1.0- 2.0 sec 13.5 MBytes 113 Mbits/sec
3 [ 4] 2.0- 3.0 sec 9.50 MBytes 79.7 Mbits/sec
4 [ 4] 3.0- 4.0 sec 9.00 MBytes 75.5 Mbits/sec
How can I get ONLY the second value on each line (the one in Mbits/sec) using grep?
Desired output:
89.1
113
79.7
75.5
awk '{print $9}' your-file.txt
will do it for you. For example:
$ cat ~/test.txt
1 [ 4] 0.0- 1.0 sec 10.6 MBytes 89.1 Mbits/sec
2 [ 4] 1.0- 2.0 sec 13.5 MBytes 113 Mbits/sec
3 [ 4] 2.0- 3.0 sec 9.50 MBytes 79.7 Mbits/sec
4 [ 4] 3.0- 4.0 sec 9.00 MBytes 75.5 Mbits/sec
$ awk '{print $9}' ~/test.txt
89.1
113
79.7
75.5
Another way to tackle this is:
awk -F 'MBytes' '{print $2}' test.txt | awk -F 'Mbits' '{print $1}' | tr -d " "
In the above method we are:
Splitting each line on MBytes.
That gives us two parts: $1 is everything before MBytes and $2 is everything after it.
Taking everything after MBytes and splitting it further on Mbits.
That again gives two parts, and we take everything before Mbits.
Using tr to remove any whitespace around the numbers.
So we get
$ cat test.txt
1 [ 4] 0.0- 1.0 sec 10.6 MBytes 89.1 Mbits/sec
2 [ 4] 1.0- 2.0 sec 13.5 MBytes 113 Mbits/sec
3 [ 4] 2.0- 3.0 sec 9.50 MBytes 79.7 Mbits/sec
4 [ 4] 3.0- 4.0 sec 9.00 MBytes 75.5 Mbits/sec
awk -F 'MBytes' '{print $2}' test.txt | awk -F 'Mbits' '{print $1}' | tr -d " "
Result:
89.1
113
79.7
75.5
If your data is in a fixed-width format, you can always use cut:
cut -c38-41 data
This works only if you know the values always occupy the same 4 columns (here 38-41).
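The awk '{print $9}' answer depends on the field layout. iperf prints later intervals without the inner space (for example 9.0-10.0 or 10.0-11.0), so in a 1200-line file $9 would likely print "Mbits/sec" from the 9.0-10.0 interval onward. The sketch below anchors on the unit itself instead. The helper names `mbits_grep` and `mbits_awk` are hypothetical, and the first one uses grep as the question asks.

```shell
#!/usr/bin/env bash
# Print the value in front of "Mbits/sec" on each iperf line, regardless of
# how many whitespace-separated fields come before it.

mbits_grep() {
  # -o prints only the matched text; the second grep keeps just the number.
  grep -oE '[0-9]+(\.[0-9]+)? +Mbits/sec' "$@" | grep -oE '^[0-9.]+'
}

mbits_awk() {
  # Same idea in awk: print whichever field precedes the unit.
  awk '{ for (i = 2; i <= NF; i++) if ($i == "Mbits/sec") print $(i - 1) }' "$@"
}

# usage: mbits_grep your-file.txt   or   mbits_awk your-file.txt
```

Both helpers only pick up lines reported in Mbits/sec. If iperf switches a line to Kbits/sec or Gbits/sec, that line is skipped rather than converted.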

Approach guidance: grepping a string between multiple, slightly different delimiters

Assume a 3 KB file that looks like this:
PdId1 Unit 1
Model 3244
Status: OK
Advanced Status OK
-----------------------
No errors found
Statistics...
...<arbitrary length values here>...
PdId2 Unit 1
Model 3222
Status: OK
Advanced Status OK
-----------------------
Error Log is as follows <arbitrary values here>
PdId3 Unit 1
Model 3243
Status: OK
Advanced Status OK
-----------------------
No errors found
So we can be certain that PdIdn can reliably be used as a delimiter: it is always at the start of a line and is always followed by a number. I want to search the text between delimiters for "No errors found" and, if the string is missing, grab the delimiter and the next four lines (grep -A4), append an error message, and echo the result.
I've been racking my brain about how to approach this. I'm most comfortable in Bash with grep, but I don't think grep is going to cut it here. I've looked at using split to break the file into pieces, but that seems messy and hard to clean up after processing is done. I started writing something in awk/sed, but I don't understand how to split on the delimiters, parse each resulting piece, and then move on to the next one.
I apologise for the general nature of this question, but I'm stumped and could use some guidance.
Edit: Technically, PdId isn't a delimiter as much as it's the start of the next record. The number of records is arbitrary.
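Awk fits this record-oriented task without split or temporary files. It can start a new record whenever a line begins with PdId and keep the first five lines of each record. It can also note whether the "all clear" string appears, and decide what to print when the next record (or the end of input) arrives.

A minimal sketch follows. `check_records` is a hypothetical wrapper, and `OK_STRING` is an assumed environment variable that defaults to the real data's "No Errors Logged" (the first sample used "No errors found").

```shell
#!/usr/bin/env bash
# Flag every record that lacks the "all clear" string.
# A record starts at any line beginning with "PdId" and runs until the next one.
check_records() {
  local ok="${OK_STRING:-No Errors Logged}"
  awk -v ok="$ok" '
    # At the end of a record: if it never contained ok, print its first
    # five lines (the PdId line plus four more, like grep -A4) and a message.
    function flush(   i) {
      if (inrec && !found) {
        for (i = 1; i <= n; i++) print head[i]
        print "ERROR: \"" ok "\" missing from this record"
      }
    }
    /^PdId/ { flush(); inrec = 1; n = 0; found = 0 }  # new record begins
    inrec && n < 5 { head[++n] = $0 }                  # keep the first five lines
    inrec && index($0, ok) { found = 1 }               # literal substring match
    END { flush() }                                    # handle the last record
  ' "$@"
}

# usage: check_records smart.txt   (or pipe the report in on stdin)
```

Among the records shown in the real data, this should flag only PdId: 4, printing its first five lines followed by the error message.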
Edit: We've now got real world data to work with:
-------------------------------------------------------------------------------
PdId: 1
Model Number: WD 1000
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
SCT Status Version: 3
SCT Version (vendor specific): 256 (0x0100)
SCT Support Level: 1
Device State: SMART Off-line Data Collection executing in background (4)
Current Temperature: 31 Celsius
Power Cycle Min/Max Temperature: 27/31 Celsius
Lifetime Min/Max Temperature: 16/41 Celsius
Under/Over Temperature Limit Count: 0/0
Self-test execution status: ( 0) The previous self-test routine
completed without error or no self-test
has ever been run.
Error logging capability: (0x01) Error logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 251) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Self-test log structure revision number: 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Error Log Version: 1
No Errors Logged
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
==============================================================================
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
==============================================================================
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
2 Throughput_Performance 0x0005 139 139 054 Pre-fail Offline - 71
3 Spin_Up_Time 0x0007 169 169 024 Pre-fail Always - 245 (Average 204)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 746
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 124 124 020 Pre-fail Offline - 33
9 Power_On_Hours 0x0012 100 100 000 Old_age Always - 1181
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 529
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 751
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 751
194 Temperature_Celsius 0x0002 193 193 000 Old_age Always - 31 (Lifetime Min/Max 16/41)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
-------------------------------------------------------------------------------
PdId: 2
Model Number: WD 1000
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
SCT Status Version: 3
SCT Version (vendor specific): 256 (0x0100)
SCT Support Level: 1
Device State: SMART Off-line Data Collection executing in background (4)
Current Temperature: 31 Celsius
Power Cycle Min/Max Temperature: 27/31 Celsius
Lifetime Min/Max Temperature: 16/40 Celsius
Under/Over Temperature Limit Count: 0/0
Self-test execution status: ( 0) The previous self-test routine
completed without error or no self-test
has ever been run.
Error logging capability: (0x01) Error logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 246) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Self-test log structure revision number: 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Error Log Version: 1
No Errors Logged
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
==============================================================================
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
==============================================================================
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
2 Throughput_Performance 0x0005 139 139 054 Pre-fail Offline - 72
3 Spin_Up_Time 0x0007 171 171 024 Pre-fail Always - 243 (Average 201)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 746
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 124 124 020 Pre-fail Offline - 33
9 Power_On_Hours 0x0012 100 100 000 Old_age Always - 1181
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 529
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 749
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 749
194 Temperature_Celsius 0x0002 193 193 000 Old_age Always - 31 (Lifetime Min/Max 16/40)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
-------------------------------------------------------------------------------
PdId: 3
Model Number: WD 1000
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
SCT Status Version: 3
SCT Version (vendor specific): 256 (0x0100)
SCT Support Level: 1
Device State: SMART Off-line Data Collection executing in background (4)
Current Temperature: 31 Celsius
Power Cycle Min/Max Temperature: 27/31 Celsius
Lifetime Min/Max Temperature: 17/40 Celsius
Under/Over Temperature Limit Count: 0/0
Self-test execution status: ( 0) The previous self-test routine
completed without error or no self-test
has ever been run.
Error logging capability: (0x01) Error logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 241) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Self-test log structure revision number: 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Error Log Version: 1
No Errors Logged
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
==============================================================================
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
==============================================================================
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
2 Throughput_Performance 0x0005 140 140 054 Pre-fail Offline - 67
3 Spin_Up_Time 0x0007 170 170 024 Pre-fail Always - 234 (Average 213)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 748
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 124 124 020 Pre-fail Offline - 33
9 Power_On_Hours 0x0012 100 100 000 Old_age Always - 1188
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 529
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 750
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 750
194 Temperature_Celsius 0x0002 193 193 000 Old_age Always - 31 (Lifetime Min/Max 17/40)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
-------------------------------------------------------------------------------
PdId: 4
Model Number: WD 1000
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
SCT Status Version: 3
SCT Version (vendor specific): 256 (0x0100)
SCT Support Level: 1
Device State: SMART Off-line Data Collection executing in background (4)
Current Temperature: 31 Celsius
Power Cycle Min/Max Temperature: 27/31 Celsius
Lifetime Min/Max Temperature: 15/40 Celsius
Under/Over Temperature Limit Count: 0/0
Self-test execution status: ( 0) The previous self-test routine
completed without error or no self-test
has ever been run.
Error logging capability: (0x01) Error logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 254) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Self-test log structure revision number: 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Error Log Version: 1
ATA Error Count: 165 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 165 occurred at disk power-on lifetime: 1176 hours (49 days + 0 hours)
When the command that caused the error occurred,
the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 50 b0 ee 81 0d
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 80 a8 80 ee 81 40 00 18:38:48.276 WRITE FPDMA QUEUED
61 80 a0 00 ee 81 40 00 18:38:48.276 WRITE FPDMA QUEUED
61 80 98 80 ed 81 40 00 18:38:48.276 WRITE FPDMA QUEUED
61 80 90 00 ed 81 40 00 18:38:48.276 WRITE FPDMA QUEUED
61 80 88 80 ec 81 40 00 18:38:48.275 WRITE FPDMA QUEUED
Error 164 occurred at disk power-on lifetime: 1175 hours (48 days + 23 hours)
When the command that caused the error occurred,
the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 10 f0 ad 6b 0d Error: ICRC, ABRT 16 sectors at LBA = 0x0d6badf0 = 225160688
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
35 00 80 80 ad 6b 40 00 18:36:07.145 WRITE DMA EXT
35 00 80 00 ae 6b 40 00 18:36:07.144 WRITE DMA EXT
35 00 80 00 ad 6b 40 00 18:36:07.144 WRITE DMA EXT
35 00 80 80 ab 6b 40 00 18:36:07.139 WRITE DMA EXT
35 00 80 00 ab 6b 40 00 18:36:07.139 WRITE DMA EXT
Error 163 occurred at disk power-on lifetime: 1175 hours (48 days + 23 hours)
When the command that caused the error occurred,
the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 f0 10 5e 5d 0d Error: ICRC, ABRT 240 sectors at LBA = 0x0d5d5e10 = 224222736
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
35 00 80 80 5b 5d 40 00 18:35:47.982 WRITE DMA EXT
35 00 80 80 5a 5d 40 00 18:35:47.982 WRITE DMA EXT
35 00 80 00 59 5d 40 00 18:35:47.981 WRITE DMA EXT
35 00 00 00 58 5d 40 00 18:35:47.979 WRITE DMA EXT
35 00 30 00 36 5d 40 00 18:35:47.960 WRITE DMA EXT
Error 162 occurred at disk power-on lifetime: 1175 hours (48 days + 23 hours)
When the command that caused the error occurred,
the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 20 e0 33 19 0d
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 80 30 00 33 19 40 00 18:34:50.672 WRITE FPDMA QUEUED
61 80 28 80 33 19 40 00 18:34:50.671 WRITE FPDMA QUEUED
61 80 20 00 34 19 40 00 18:34:50.671 WRITE FPDMA QUEUED
61 00 18 80 34 19 40 00 18:34:50.671 WRITE FPDMA QUEUED
61 80 10 80 36 19 40 00 18:34:50.670 WRITE FPDMA QUEUED
Error 161 occurred at disk power-on lifetime: 1133 hours (47 days + 5 hours)
When the command that caused the error occurred,
the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 d0 30 dd 3b 0a
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 80 38 80 dc 3b 40 00 06:26:51.414 WRITE FPDMA QUEUED
61 80 30 00 df 3b 40 00 06:26:51.413 WRITE FPDMA QUEUED
61 80 28 80 df 3b 40 00 06:26:51.413 WRITE FPDMA QUEUED
61 80 20 00 da 3b 40 00 06:26:51.402 WRITE FPDMA QUEUED
61 80 18 80 da 3b 40 00 06:26:51.402 WRITE FPDMA QUEUED
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
==============================================================================
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
==============================================================================
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
2 Throughput_Performance 0x0005 139 139 054 Pre-fail Offline - 73
3 Spin_Up_Time 0x0007 170 170 024 Pre-fail Always - 234 (Average 212)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 747
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 124 124 020 Pre-fail Offline - 33
9 Power_On_Hours 0x0012 100 100 000 Old_age Always - 1187
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 529
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 748
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 748
194 Temperature_Celsius 0x0002 200 200 000 Old_age Always - 30 (Lifetime Min/Max 15/40)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 165
-------------------------------------------------------------------------------
PdId: 5
Model Number: WD 1000
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
SCT Status Version: 3
SCT Version (vendor specific): 256 (0x0100)
SCT Support Level: 1
Device State: SMART Off-line Data Collection executing in background (4)
Current Temperature: 31 Celsius
Power Cycle Min/Max Temperature: 27/31 Celsius
Lifetime Min/Max Temperature: 17/40 Celsius
Under/Over Temperature Limit Count: 0/0
Self-test execution status: ( 0) The previous self-test routine
completed without error or no self-test
has ever been run.
Error logging capability: (0x01) Error logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 251) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Self-test log structure revision number: 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Error Log Version: 1
No Errors Logged
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
==============================================================================
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
==============================================================================
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
2 Throughput_Performance 0x0005 140 140 054 Pre-fail Offline - 68
3 Spin_Up_Time 0x0007 133 133 024 Pre-fail Always - 289 (Average 282)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 748
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 124 124 020 Pre-fail Offline - 33
9 Power_On_Hours 0x0012 100 100 000 Old_age Always - 1186
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 529
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 750
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 750
194 Temperature_Celsius 0x0002 193 193 000 Old_age Always - 31 (Lifetime Min/Max 17/40)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
-------------------------------------------------------------------------------
PdId: 6
Model Number: WD 1000
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
SCT Status Version: 3
SCT Version (vendor specific): 256 (0x0100)
SCT Support Level: 1
Device State: SMART Off-line Data Collection executing in background (4)
Current Temperature: 30 Celsius
Power Cycle Min/Max Temperature: 27/30 Celsius
Lifetime Min/Max Temperature: 17/40 Celsius
Under/Over Temperature Limit Count: 0/0
Self-test execution status: ( 0) The previous self-test routine
completed without error or no self-test
has ever been run.
Error logging capability: (0x01) Error logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 243) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Self-test log structure revision number: 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Error Log Version: 1
No Errors Logged
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
==============================================================================
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
==============================================================================
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
2 Throughput_Performance 0x0005 139 139 054 Pre-fail Offline - 72
3 Spin_Up_Time 0x0007 130 130 024 Pre-fail Always - 294 (Average 287)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 748
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 124 124 020 Pre-fail Offline - 33
9 Power_On_Hours 0x0012 100 100 000 Old_age Always - 1186
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 529
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 751
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 751
194 Temperature_Celsius 0x0002 200 200 000 Old_age Always - 30 (Lifetime Min/Max 17/40)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
Read line by line, collect into an accumulator the lines you want, print the accumulated lines when you see the trigger message (otherwise just start over and overwrite the accumulator when you see the start of the next record).
We use the variable a as the accumulator and the helper variable n to keep track of how many more lines to accumulate:
awk '/^PdId: [1-9][0-9]*/ { a = $0; n = 4; next }
     n { --n; a = a "\n" $0; next }
     /No Errors Logged/ { print a }' file
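The same accumulator logic can be sketched in Python for comparison. This is an illustrative translation, not part of the original answer: the helper name collect_clean_headers is hypothetical, and it assumes the record layout shown above (a PdId: line followed by four header lines).

```python
import re
from typing import Iterable, List

# Same pattern as the awk rule /^PdId: [1-9][0-9]*/
PDID_RE = re.compile(r"^PdId: [1-9][0-9]*")


def collect_clean_headers(lines: Iterable[str]) -> List[str]:
    """Return the header (PdId line plus the next 4 lines) of each record
    in which a 'No Errors Logged' line appears. Hypothetical helper."""
    results: List[str] = []
    acc: List[str] = []  # the accumulator, "a" in the awk version
    remaining = 0        # lines still to collect, "n" in the awk version
    for raw in lines:
        line = raw.rstrip("\n")
        if PDID_RE.match(line):
            # Start of a new record: overwrite the accumulator.
            acc = [line]
            remaining = 4
        elif remaining:
            # Still inside the header: append and count down.
            acc.append(line)
            remaining -= 1
        elif "No Errors Logged" in line:
            # Trigger message: emit the accumulated header.
            results.append("\n".join(acc))
    return results
```

As in the awk version, a header is emitted only when the trigger line follows it; a new PdId: line simply overwrites the accumulator.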
Put the following into an executable awk file:
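#!/usr/bin/awk -f
# Start off assuming no errors, so nothing is reported before the first record.
BEGIN { no_errs = 1 }
# While inside a record, store each line in a[] (a[1] is the PdId line).
c > 0 { a[c++] = $0 }
# A dashed delimiter line starts a new record: report the previous one, then reset.
/^----------/ {
    logAnyErrors()
    ata_err = ""
    no_errs = 0
    c = 1
    delete a
}
/^No Errors Logged/ { no_errs = 1 }
/^ATA Error Count:/ { ata_err = $0 }
# Print the first 5 lines of a record that has an ATA error count
# or that lacks a "No Errors Logged" line.
function logAnyErrors() {
    if (ata_err != "" || !no_errs) {
        for (i = 1; i <= 5; i++) print a[i]
        if (ata_err != "") print ata_err
        print "--"   # separator
    }
}
# The last record has no delimiter after it, so check it here.
END { logAnyErrors() }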
Your data actually has a delimiter line starting with "------------------" before each PdId.
The breakdown:
Start off assuming no errors in the BEGIN block
Add each record line to an array called a, indexed by the line counter c
Whenever a new record section occurs call logAnyErrors() and reset counters
In logAnyErrors(), if there are ATA or other errors, print the first 5 lines of the record and a delimiter similar to what I think grep -A4 would output.
At the end, log any errors in the final record.
When I put this into an executable file called awko and run it as awko data, I get the following output:
PdId: 4
Model Number: WD 1000
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
ATA Error Count: 165 (device log contains only the most recent five errors)
--
The delete a statement (deleting a whole array at once) may not be supported by some awks; it works on my Mac. It is not necessary unless you want to print more information from each block when errors occur, since the first 5 lines are always overwritten anyway.
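A rough Python equivalent of the awk script may help if the checks need to be extended. It is a sketch under the same assumptions as the script (records separated by lines starting with ten dashes, the first five lines of a record forming the header); report_records_with_errors is a hypothetical name. One deliberate simplification: text before the first delimiter is never reported.

```python
from typing import Iterable, List, Optional

DELIM_PREFIX = "----------"  # the awk script matches /^----------/


def report_records_with_errors(lines: Iterable[str], header_lines: int = 5) -> List[str]:
    """Collect report lines for records that contain 'ATA Error Count:' or
    lack 'No Errors Logged'. Hypothetical helper mirroring the awk script."""
    report: List[str] = []
    record: Optional[List[str]] = None  # None until the first delimiter

    def flush(rec: Optional[List[str]]) -> None:
        if rec is None:
            return  # text before the first delimiter is ignored
        ata_err = next((ln for ln in rec if ln.startswith("ATA Error Count:")), "")
        no_errs = any(ln.startswith("No Errors Logged") for ln in rec)
        if ata_err or not no_errs:
            report.extend(rec[:header_lines])  # PdId ... SMART Health Status
            if ata_err:
                report.append(ata_err)
            report.append("--")  # separator, as in the awk version

    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(DELIM_PREFIX):
            flush(record)  # a new record starts: check the previous one
            record = []
        elif record is not None:
            record.append(line)
    flush(record)  # like the awk END block: check the final record
    return report
```

Collecting each record into a list, instead of a fixed five-slot array, means the whole record is available if more fields need to be printed later.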

Qt application killed because Out Of Memory (OOM)

I am running a Qt application on an embedded Linux platform. The system has 128 MB RAM, 512 MB NAND and no swap. The application uses a custom library for the peripherals; everything else is Qt and C/C++ libraries. The application also uses SQLite3.
After 2-3 hours, the machine starts running very slowly, and shell commands take 10 or so seconds to respond. Eventually the machine hangs, the OOM killer kills the application, and the system returns to normal speed.
Observing system memory with top shows that while the application is running, free system memory keeps decreasing while slab keeps increasing. Snapshots of top are given below; the application is named xyz.
At application start:
Mem total:126164 anon:3308 map:8436 free:32456
slab:60936 buf:0 cache:27528 dirty:0 write:0
Swap total:0 free:0
PID VSZ VSZRW^ RSS (SHR) DIRTY (SHR) STACK COMMAND
776 29080 9228 8036 528 968 0 84 ./xyz -qws
781 3960 736 1976 1456 520 0 84 sshd: root#notty
786 3676 680 1208 764 416 0 88 /usr/libexec/sftp-server
770 3792 568 1948 1472 464 0 84 {sshd} sshd: root#pts/0
766 3792 568 956 688 252 0 84 /usr/sbin/sshd
388 1864 284 552 332 188 0 84 udevd --daemon
789 2832 272 688 584 84 0 84 top
774 2828 268 668 560 84 0 84 -sh
709 2896 268 556 464 80 0 84 /usr/sbin/inetd
747 2828 268 596 516 68 0 84 /sbin/getty -L ttymxc0 115200 vt100
777 2824 264 444 368 68 0 84 tee out.log
785 2824 264 484 416 68 0 84 sh -c /usr/libexec/sftp-server
1 2824 264 556 488 64 0 84 init
After some time:
Mem total:126164 anon:3312 map:8440 free:9244
slab:83976 buf:0 cache:27584 dirty:0 write:0
Swap total:0 free:0
PID VSZ VSZRW^ RSS (SHR) DIRTY (SHR) STACK COMMAND
776 29080 9228 8044 528 972 0 84 ./xyz -qws
781 3960 736 1976 1456 520 0 84 sshd: root#notty
786 3676 680 1208 764 416 0 88 /usr/libexec/sftp-server
770 3792 568 1948 1472 464 0 84 {sshd} sshd: root#pts/0
766 3792 568 956 688 252 0 84 /usr/sbin/sshd
388 1864 284 552 332 188 0 84 udevd --daemon
789 2832 272 688 584 84 0 84 top
774 2828 268 668 560 84 0 84 -sh
709 2896 268 556 464 80 0 84 /usr/sbin/inetd
747 2828 268 596 516 68 0 84 /sbin/getty -L ttymxc0 115200 vt100
777 2824 264 444 368 68 0 84 tee out.log
785 2824 264 484 416 68 0 84 sh -c /usr/libexec/sftp-server
1 2824 264 556 488 64 0 84 init
Oddly, though, I cannot see any major change in top's output for the application itself. Eventually the application is killed; here is the top output after that:
Mem total:126164 anon:2356 map:916 free:2368
slab:117944 buf:0 cache:1580 dirty:0 write:0
Swap total:0 free:0
PID VSZ VSZRW^ RSS (SHR) DIRTY (SHR) STACK COMMAND
781 3960 736 708 184 520 0 84 sshd: root#notty
786 3724 728 736 172 484 0 88 /usr/libexec/sftp-server
770 3792 568 648 188 460 0 84 {sshd} sshd: root#pts/0
766 3792 568 252 0 252 0 84 /usr/sbin/sshd
388 1864 284 188 0 188 0 84 udevd --daemon
819 2832 272 676 348 84 0 84 top
774 2828 268 512 324 96 0 84 -sh
709 2896 268 80 0 80 0 84 /usr/sbin/inetd
747 2828 268 68 0 68 0 84 /sbin/getty -L ttymxc0 115200 vt100
785 2824 264 68 0 68 0 84 sh -c /usr/libexec/sftp-server
1 2824 264 64 0 64 0 84 init
dmesg shows:
sh invoked oom-killer: gfp_mask=0xd0, order=2, oomkilladj=0
[<c002d4c4>] (unwind_backtrace+0x0/0xd4) from [<c0073ac0>] (oom_kill_process+0x54/0x1b8)
[<c0073ac0>] (oom_kill_process+0x54/0x1b8) from [<c0073f14>] (__out_of_memory+0x154/0x178)
[<c0073f14>] (__out_of_memory+0x154/0x178) from [<c0073fa0>] (out_of_memory+0x68/0x9c)
[<c0073fa0>] (out_of_memory+0x68/0x9c) from [<c007649c>] (__alloc_pages_nodemask+0x3e0/0x4c8)
[<c007649c>] (__alloc_pages_nodemask+0x3e0/0x4c8) from [<c0076598>] (__get_free_pages+0x14/0x4c)
[<c0076598>] (__get_free_pages+0x14/0x4c) from [<c002f528>] (get_pgd_slow+0x14/0xdc)
[<c002f528>] (get_pgd_slow+0x14/0xdc) from [<c0043890>] (mm_init+0x84/0xc4)
[<c0043890>] (mm_init+0x84/0xc4) from [<c0097b94>] (bprm_mm_init+0x10/0x138)
[<c0097b94>] (bprm_mm_init+0x10/0x138) from [<c00980a8>] (do_execve+0xf4/0x2a8)
[<c00980a8>] (do_execve+0xf4/0x2a8) from [<c002afc4>] (sys_execve+0x38/0x5c)
[<c002afc4>] (sys_execve+0x38/0x5c) from [<c0027d20>] (ret_fast_syscall+0x0/0x2c)
Mem-info:
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
Normal per-cpu:
CPU 0: hi: 42, btch: 7 usd: 0
Active_anon:424 active_file:11 inactive_anon:428
inactive_file:3 unevictable:0 dirty:0 writeback:0 unstable:0
free:608 slab:29498 mapped:14 pagetables:59 bounce:0
DMA free:692kB min:268kB low:332kB high:400kB active_anon:0kB inactive_anon:0kB active_file:4kB inactive_file:0kB unevictable:0kB present:24384kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 103 103
Normal free:1740kB min:1168kB low:1460kB high:1752kB active_anon:1696kB inactive_anon:1712kB active_file:40kB inactive_file:12kB unevictable:0kB present:105664kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 3*4kB 3*8kB 5*16kB 2*32kB 4*64kB 2*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 692kB
Normal: 377*4kB 1*8kB 4*16kB 1*32kB 2*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1740kB
30 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 0kB
Total swap = 0kB
32768 pages of RAM
687 free pages
1306 reserved pages
29498 slab pages
59 pages shared
0 pages swap cached
Out of memory: kill process 774 (sh) score 339 or a child
Killed process 776 (xyz)
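The numbers above can be cross-checked with a little arithmetic: slab grows from 60936 kB to 117944 kB (about 56 MB) while the RSS of xyz stays around 8 MB (8036 kB to 8044 kB), and dmesg's 29498 slab pages come to 117992 kB at 4 kB per page. The Python sketch just redoes that arithmetic from the quoted snapshots; it assumes 4 kB pages and the top header format shown here (which looks like BusyBox top), and the helper names are hypothetical.

```python
import re
from typing import Dict

PAGE_KB = 4  # assumption: 4 kB pages, the smallest block size in the dmesg buddy lists


def parse_top_mem(mem_lines: str) -> Dict[str, int]:
    """Parse the 'key:value' pairs (in kB) from top's two memory header lines.

    Pass only the 'Mem total:...' and 'slab:...' lines; the 'Swap' line
    reuses the keys 'total' and 'free' and would overwrite them.
    """
    return {key: int(val) for key, val in re.findall(r"(\w+):(\d+)", mem_lines)}


def slab_percent(mem_lines: str) -> float:
    """Slab memory as a percentage of total RAM."""
    mem = parse_top_mem(mem_lines)
    return 100.0 * mem["slab"] / mem["total"]


# Header lines copied from the three top snapshots above.
snapshots = {
    "start": "Mem total:126164 anon:3308 map:8436 free:32456\n"
             "slab:60936 buf:0 cache:27528 dirty:0 write:0",
    "later": "Mem total:126164 anon:3312 map:8440 free:9244\n"
             "slab:83976 buf:0 cache:27584 dirty:0 write:0",
    "after kill": "Mem total:126164 anon:2356 map:916 free:2368\n"
                  "slab:117944 buf:0 cache:1580 dirty:0 write:0",
}

if __name__ == "__main__":
    for label, lines in snapshots.items():
        print(f"{label}: slab = {slab_percent(lines):.1f}% of RAM")
    # dmesg cross-check: 29498 slab pages out of 32768 pages of RAM
    print(f"dmesg: {29498 * PAGE_KB} kB in slab, {100 * 29498 / 32768:.1f}% of all pages")
```

This prints 48.3%, 66.6% and 93.5% for the three snapshots, and 117992 kB (90.0% of all pages) for the dmesg figures: by the time of the OOM kill, almost all RAM is held by slab rather than by any process.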
So there is obviously a memory leak, and it must be in my app, since my app is the one that gets killed. But I am not doing any mallocs in the program, and I have taken care to limit the scope of variables so that they are deallocated after use. So I am at a complete loss as to why slab keeps increasing in the top output. I tried http://valgrind.org/docs/manual/faq.html#faq.reports, but it didn't help.
I am currently trying to use Valgrind on the desktop to check my business logic, since I have read that it only works on ARM Cortex.
Additional info:
root#freescale ~/Application/app$ uname -a
Linux freescale 2.6.31-207-g7286c01 #2053 Fri Jun 22 10:29:11 IST 2012 armv5tejl GNU/Linux
Compiler : arm-none-linux-gnueabi-4.1.2 glibc2.5
cpp libs : libstdc++.so.6.0.8
Qt : 4.7.3 libs
Any pointers would be greatly appreciated...
I don't think the problem is directly in your code.
The reason is obvious: your application's own memory does not grow (neither RSS nor VSZ increases).
However, you do see the slab size increasing. Your application cannot allocate slab memory directly; the slab allocator is kernel-only. Slab usage grows only indirectly, through kernel objects that the kernel creates on your application's behalf.
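One way to see which kind of kernel object is piling up is to diff two snapshots of /proc/slabinfo. The Python sketch is a hypothetical helper, not part of the original answer; it assumes the kernel exposes /proc/slabinfo in the "slabinfo - version: 2.x" text format (availability depends on the kernel configuration, and reading it usually requires root) and approximates each cache's size as num_objs * objsize, ignoring per-slab overhead.

```python
from typing import Dict, List, Tuple


def parse_slabinfo(text: str) -> Dict[str, int]:
    """Map each slab cache name to approximately the bytes it holds.

    Assumed column layout (slabinfo 2.x):
    name active_objs num_objs objsize objperslab pagesperslab : ...
    """
    caches: Dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue  # skip the version line and the column header
        fields = line.split()
        name, num_objs, objsize = fields[0], int(fields[2]), int(fields[3])
        caches[name] = num_objs * objsize
    return caches


def slab_growth(before: str, after: str, top: int = 5) -> List[Tuple[str, int]]:
    """Return up to `top` caches whose size grew the most between two snapshots."""
    old, new = parse_slabinfo(before), parse_slabinfo(after)
    growth = [(name, size - old.get(name, 0)) for name, size in new.items()]
    growth = [item for item in growth if item[1] > 0]  # only caches that grew
    growth.sort(key=lambda item: item[1], reverse=True)
    return growth[:top]
```

On the board, one would read /proc/slabinfo twice, a few minutes apart while xyz runs, and pass both texts to slab_growth. A steadily growing cache such as filp (open files) or sock_inode_cache (sockets) would point at one of the causes listed in this answer; exact cache names vary between kernels.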
Some obvious causes of slab growth, off the top of my head:
you never really close network sockets
you read many files, but never close them
you use many ioctls
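To check the first two causes on a running system, one can watch the process's open file descriptors under /proc/&lt;pid&gt;/fd: a count that keeps climbing suggests files or sockets that are never closed. A minimal Python sketch, assuming Linux /proc and permission to read the target's fd directory; fd_summary is a hypothetical helper.

```python
import os
from collections import Counter
from typing import Dict


def fd_summary(pid: int) -> Dict[str, int]:
    """Count a process's open file descriptors by kind (Linux /proc only)."""
    counts: Counter = Counter()
    fd_dir = f"/proc/{pid}/fd"
    for name in os.listdir(fd_dir):
        try:
            # Each entry is a symlink such as "/var/log/x", "socket:[1234]" or "pipe:[5678]".
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The fd was closed between listdir() and readlink()
            # (for example the directory handle listdir() itself used).
            continue
        if target.startswith("socket:"):
            counts["socket"] += 1
        elif target.startswith("pipe:"):
            counts["pipe"] += 1
        elif target.startswith("/"):
            counts["file"] += 1
        else:
            counts["other"] += 1
    return dict(counts)
```

For example, calling fd_summary(776) (776 is the PID of xyz in the top output above) every few minutes would show whether the socket or file count grows. If Python is not available on the board, ls -l /proc/776/fd shows the same symlinks.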
I would run strace and watch its output for a while. strace intercepts your application's system calls, i.e. its interactions with the kernel. If you have memory issues, I'd expect repeated calls to brk(). If you have other issues, you'll see repeated calls to open() without a matching close().
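The "open without close" pattern can also be found mechanically in a saved strace log. This is a hypothetical helper that assumes a trace of a single process in strace's default output format (optionally with the pid prefix that -f adds); it records fds returned by open, openat, creat, socket, accept and accept4 and forgets them on a successful close, so whatever remains was not closed during the trace.

```python
import re
from typing import Dict, Iterable

SYSCALL_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"  # optional pid prefix added by strace -f
    r"(\w+)\((.*)\)\s+=\s+(-?\d+)"    # name(args) = return value
)
FD_CREATORS = {"open", "openat", "creat", "socket", "accept", "accept4"}


def unclosed_fds(strace_lines: Iterable[str]) -> Dict[int, str]:
    """Map each fd still open at the end of the trace to the line that created it."""
    open_fds: Dict[int, str] = {}
    for raw in strace_lines:
        line = raw.strip()
        m = SYSCALL_RE.match(line)
        if not m:
            continue  # signals, <unfinished ...> / resumed lines, etc.
        name, args, ret = m.group(1), m.group(2), int(m.group(3))
        if name in FD_CREATORS and ret >= 0:
            open_fds[ret] = line  # a failed call returns -1 and is ignored
        elif name == "close" and ret == 0:
            fd = re.match(r"\s*(\d+)", args)
            if fd:
                open_fds.pop(int(fd.group(1)), None)
    return open_fds
```

A log can be captured with something like strace -f -o /tmp/xyz.trace -p 776 and fed to unclosed_fds line by line. Descriptors created by other calls (dup, pipe and so on) are not tracked, so treat the result as a hint rather than a complete list.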
If you allocate data structures, check that adding children and similar operations are implemented correctly; I had a similar bug in my code. Also, large database queries may use more RAM. Try a memory leak detector to find out whether there is a leak.