I have a bunch of QWebViews rendered onto a widget. There comes a point at which I start getting the error QThread::start: Failed to create thread (The access code is invalid.). Looking at the Qt source, it appears that _beginthreadex is returning a null handle and errno is ERROR_INVALID_ACCESS, but I have no idea why.
Here is the backtrace at the printing of the error:
0 qErrnoWarning qglobal.cpp 2451 0x69ccdd3c
1 QThread::start qthread_win.cpp 469 0x69cd5831
2 QThreadPoolPrivate::tryStart qthreadpool.cpp 203 0x69ccc3f5
3 QThreadPool::start qthreadpool.cpp 474 0x69cccdf4
4 QHostInfoLookupManager::work qhostinfo.cpp 633 0x6cb9b071
5 QHostInfoLookupManager::scheduleLookup qhostinfo.cpp 652 0x6cb9b143
6 QHostInfo::lookupHost qhostinfo.cpp 202 0x6cb9a220
7 qt_qhostinfo_lookup qhostinfo.cpp 722 0x6cb9b4b6
8 QAbstractSocket::connectToHostImplementation qabstractsocket.cpp 1427 0x6cbb17f5
9 QAbstractSocket::qt_static_metacall moc_qabstractsocket.cpp 166 0x6cbb4925
10 QMetaMethod::invoke qmetaobject.cpp 1664 0x69dc784f
11 QMetaObject::invokeMethod qmetaobject.cpp 1179 0x69dc6d6b
12 QMetaObject::invokeMethod qobjectdefs.h 418 0x6cd361dd
13 QAbstractSocket::connectToHost qabstractsocket.cpp 1342 0x6cbb13b3
14 QSslSocket::connectToHostImplementation qsslsocket.cpp 1744 0x6cbc7340
15 QSslSocket::qt_static_metacall moc_qsslsocket.cpp 91 0x6cbc93cf
16 QMetaMethod::invoke qmetaobject.cpp 1664 0x69dc784f
17 QMetaObject::invokeMethod qmetaobject.cpp 1179 0x69dc6d6b
18 QMetaObject::invokeMethod qobjectdefs.h 418 0x6cd361dd
19 QAbstractSocket::connectToHost qabstractsocket.cpp 1342 0x6cbb13b3
20 QSslSocket::connectToHostEncrypted qsslsocket.cpp 422 0x6cbc55e1
21 QHttpNetworkConnectionChannel::ensureConnection qhttpnetworkconnectionchannel.cpp 607 0x6cb6191f
22 QHttpNetworkConnectionPrivate::_q_startNextRequest qhttpnetworkconnection.cpp 862 0x6cb5e92c
23 QHttpNetworkConnectionPrivate::queueRequest qhttpnetworkconnection.cpp 501 0x6cb5c57d
24 QHttpNetworkConnection::sendRequest qhttpnetworkconnection.cpp 931 0x6cb5edf2
25 QHttpThreadDelegate::startRequest qhttpthreaddelegate.cpp 291 0x6cb8912a
26 QHttpThreadDelegate::qt_static_metacall moc_qhttpthreaddelegate_p.cpp 113 0x6cbd147c
27 QMetaCallEvent::placeMetaCall qobject.cpp 525 0x69dcf91c
28 QObject::event qobject.cpp 1195 0x69dd08db
29 QApplicationPrivate::notify_helper qapplication.cpp 4551 0x2582f44
30 QApplication::notify qapplication.cpp 3933 0x25808b7
31 QCoreApplication::notifyInternal qcoreapplication.cpp 915 0x69dc0dc6
32 QCoreApplication::sendEvent qcoreapplication.h 231 0x69e35185
33 QCoreApplicationPrivate::sendPostedEvents qcoreapplication.cpp 1539 0x69dc1d2a
34 qt_internal_proc qeventdispatcher_win.cpp 496 0x69de2590
35 USER32!OffsetRect C:\Windows\syswow64\user32.dll 0 0x74cc62fa
36 ?? 0 0x152404
37 ?? 0 0x401
38 ?? 0
The code at the call looks like:
d->handle = (Qt::HANDLE) _beginthreadex(NULL, d->stackSize, QThreadPrivate::start, // d->stackSize is 0
                                        this, CREATE_SUSPENDED, &(d->id));
if (!d->handle) {
    qErrnoWarning(errno, "QThread::start: Failed to create thread");
    d->running = false;
    d->finished = true;
    return;
}
Why is this happening and how do I fix it?
EDIT: Also of note: there are exactly 500 threads at the point at which this breaks.
There's a good chance you've run out of free address space in your process (for thread stacks) after creating 500 threads. On 32-bit Windows, processes only get 2GB of address space by default (the upper half of the address space being reserved for the kernel). 500 1MB thread stacks (the default size, Qt may go higher or lower) plus all the other allocations your process makes could easily be using that up.
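To put rough numbers on that, here is a small sketch of the arithmetic. It assumes each thread reserves exactly 1 MiB of stack and that the process has 2 GiB of user-mode address space; the function names are made up for this illustration and the real reservation in your process may differ.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kMiB = 1024ull * 1024ull;
constexpr std::uint64_t kGiB = 1024ull * kMiB;

// Address space reserved by thread stacks alone, in bytes.
constexpr std::uint64_t stackReservationBytes(std::uint64_t threadCount,
                                              std::uint64_t stackBytesPerThread) {
    return threadCount * stackBytesPerThread;
}

// Fraction of the user-mode address space taken up by those stacks.
inline double stackShare(std::uint64_t threadCount,
                         std::uint64_t stackBytesPerThread,
                         std::uint64_t userSpaceBytes) {
    assert(userSpaceBytes != 0);
    return static_cast<double>(stackReservationBytes(threadCount, stackBytesPerThread)) /
           static_cast<double>(userSpaceBytes);
}

// 500 threads x 1 MiB = 500 MiB of reserved stack space.
static_assert(stackReservationBytes(500, kMiB) == 500 * kMiB, "500 MiB of stacks");
```

Under these assumptions, stackShare(500, kMiB, 2 * kGiB) comes out at a little under a quarter of the address space for stacks alone. Each stack also needs one contiguous free region, so with the heap, loaded DLLs and a pile of QWebViews fragmenting the rest, thread creation can plausibly fail well before the full 2 GiB is in use.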
See this Old New Thing article for more.
Possible fixes:
If you know your QThreads don't need very big stacks, you can call QThread::setStackSize() to set a smaller size before starting the thread.
Consider using a thread pool and/or just reducing the number of concurrent threads you start. It's unlikely that you have enough CPU cores to make 500+ threads productive.
Use the Windows /3GB boot switch and mark your application as large-address-aware (the /LARGEADDRESSAWARE linker option) to get 3GB of user-mode address space.
Go 64-bit, which gives the process terabytes of user-mode address space.
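As an illustration of the thread pool suggestion, here is a minimal sketch that uses only the C++ standard library rather than Qt. The function name runBounded is made up for this example; in a Qt application the equivalent knobs would be QThreadPool::setMaxThreadCount() and QThread::setStackSize(). The point is just that 500 units of work do not need 500 threads.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs every task using at most workerCount threads instead of one thread
// per task. Returns the number of tasks that were executed.
inline std::size_t runBounded(const std::vector<std::function<void()>>& tasks,
                              std::size_t workerCount) {
    assert(workerCount > 0);
    std::atomic<std::size_t> next{0};  // index of the next unclaimed task
    std::atomic<std::size_t> done{0};  // number of completed tasks

    auto worker = [&]() {
        for (;;) {
            const std::size_t i = next.fetch_add(1);
            if (i >= tasks.size()) return;  // nothing left to claim
            tasks[i]();
            done.fetch_add(1);
        }
    };

    std::vector<std::thread> workers;
    workers.reserve(workerCount);
    for (std::size_t w = 0; w < workerCount; ++w) workers.emplace_back(worker);
    for (auto& t : workers) t.join();
    return done.load();
}
```

Calling runBounded(tasks, 4) with 500 tasks only ever has four worker stacks alive at once, so the address space cost stays flat no matter how much work is queued.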
I'm using v2.0 of the CausalFS C++ package by Kui Yu (link to CausalFS GitHub).
After running the structure learning algorithms, my DAG and MB (Markov blanket) outputs do not match.
I'm trying to generate a DAG from the data given in CDD/data/data.txt and CDD/data.txt using some of the local-to-global structure learning algorithms mentioned in the manual (PCMB-CSL, STMB-CSL, etc.), running the commands as given in the manual (pg. 18 of 26).
But my resulting DAG is filled with zeros for the most part. Given that this is an example dataset, that looks suspicious. On checking CDD/mb/mb.out, I find that the Markov blankets for the variables do not agree with the DAG output.
For example, running ./main ./data/data.txt ./data/net.txt 0.01 PCMB-CSL "" "" -1 gives a 1 only at position (1,22) (one-indexed); relaxing alpha from the example's 0.01 to 0.1 adds just one more 1. However, this doesn't agree with the output MB for each variable, which looks like this (from running IAMB as ./main ./data/data.txt ./net/net.txt 0.01 IAMB all "" ""):
0 21
1 22 26 28
2 29
3 14 21
4 5 12
5 4 12
6 8 12
7 8 12
8 6 7 12
9 11 15
10 35
11 9 12 15 33
12 4 6 7 8 11 13
13 8 12 14 15 17 30 34
14 3 13 20
15 8 9 11 13 14 17 30
16 15
17 13 15 18 27 30
18 17 19 20 27
19 18 20
20 14 18 21 28
21 0 3 20 26
22 1 21 23 24 28
23 1 22 24
24 5 22 23 25
25 24
26 1 21 22
27 17 18 28 29
28 1 18 21 22 27 29
29 2 27
30 13 14 15 17
31 34
32 15 18 34
33 11 12 32 35 36
34 30 31 32 35 36
35 10 33 34
36 33 34 35
Such an MB profile suggests the DAG to be much more connected.
I would love to hear suggestions from people who've managed to get the package to behave appropriately. I just do not understand the error here from my side. (I'm running on PopOS 20.04)
Thanks a bunch <3
P.S- The files just continue to write upon rerunning the code, so make sure to appropriately delete them.
I'm using Qt 5.3 with MinGW on Windows and writing an application with Qt/QML.
Sometimes a crash happens at start-up when I run the project in Debug mode and try to debug the code:
the inferior stopped because it received a signal from operating system
Is there a problem with QML, or is it something else?
The stack looks like this when the error happens:
0 QScopedPointer<QObjectData, QScopedPointerDeleter<QObjectData> >::data 143 0x9d2250c
1 qGetPtrHelper<QScopedPointer<QObjectData> > 941 0x99e1cc7
2 QOpenGLContext::d_func 148 0x9d21f9b
3 QOpenGLContext::isValid 596 0x99e05d1
4 GLAcquireContext::GLAcquireContext 75 0x1af7ec21
5 QQuickContext2DTexture::paint 247 0x1aef2b79
6 QQuickContext2DTexture::event 366 0x1aef34a1
7 QApplicationPrivate::notify_helper 3500 0x217eded3
8 QApplication::notify 2953 0x217eb985
9 QCoreApplication::notifyInternal 935 0x6b929f96
10 QCoreApplication::sendEvent 237 0x6b9cf2db
11 QCoreApplicationPrivate::sendPostedEvents 1539 0x6b92b14e
12 QEventDispatcherWin32::sendPostedEvents 1143 0x6b97b006
13 qt_internal_proc(HWND__*, unsigned int, unsigned int, long)@16 421 0x6b978708
14 gapfnScSendMessage C:\Windows\syswow64\user32.dll 0x768362fa
15 ?? 0x2e07d2
16 USER32!GetThreadDesktop C:\Windows\syswow64\user32.dll 0x76836d3a
17 __lambda0::operator() 364 0x6b978443
18 ?? 0x2e07d2
19 USER32!CharPrevW C:\Windows\syswow64\user32.dll 0x768377c4
20 USER32!DispatchMessageW C:\Windows\syswow64\user32.dll 0x7683788a
21 QEventDispatcherWin32::processEvents 756 0x6b979a0b
22 QEventLoop::processEvents 136 0x6b92803c
23 QEventLoop::exec 212 0x6b9282d7
24 QThread::exec 511 0x6b795f49
25 QThread::run 578 0x6b7960b1
26 QThreadPrivate::start(void*)@4 407 0x6b798b3e
27 msvcrt!_itow_s C:\Windows\syswow64\msvcrt.dll 0x76231287
28 msvcrt!_endthreadex C:\Windows\syswow64\msvcrt.dll 0x76231328
29 KERNEL32!BaseThreadInitThunk C:\Windows\syswow64\kernel32.dll 0x76a233aa
30 ntdll!RtlInitializeExceptionChain C:\Windows\system32\ntdll.dll 0x77079ef2
31 ntdll!RtlInitializeExceptionChain C:\Windows\system32\ntdll.dll 0x77079ec5
32 ??
Any suggestion will be appreciated.
Thanks in advance.
Assume a 3 KB file that looks like this:
PdId1 Unit 1
Model 3244
Status: OK
Advanced Status OK
-----------------------
No errors found
Statistics...
...<arbitrary length values here>...
PdId2 Unit 1
Model 3222
Status: OK
Advanced Status OK
-----------------------
Error Log is as follows <arbitrary values here>
PdId3 Unit 1
Model 3243
Status: OK
Advanced Status OK
-----------------------
No errors found
So we can be certain that PdIdn can reliably be used as a delimiter, that it's always at the start of a line, and that it always has a trailing number. I want to search the text between delimiters for "No errors found" and, if the string is missing, grab the delimiter and the next four lines (grep -A4), glue on an error message and echo the result.
I've been wracking my brain about how to approach this. I'm most comfortable in Bash with grep, but I don't think grep's going to cut it here. I've looked at using split to break the file into pieces, but this seems messy and hard to clean up after processing is done. I started to try to write something in awk / sed, but I don't understand how to split on the delimiters, then go back and parse each result, then break off the next piece and parse that.
I apologise for the general nature of this question, but I'm stumped and could use some guidance.
Edit: Technically, PdId isn't a delimiter as much as it's the start of the next record. The number of records is arbitrary.
Edit: We've now got real world data to work with:
-------------------------------------------------------------------------------
PdId: 1
Model Number: WD 1000
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
SCT Status Version: 3
SCT Version (vendor specific): 256 (0x0100)
SCT Support Level: 1
Device State: SMART Off-line Data Collection executing in background (4)
Current Temperature: 31 Celsius
Power Cycle Min/Max Temperature: 27/31 Celsius
Lifetime Min/Max Temperature: 16/41 Celsius
Under/Over Temperature Limit Count: 0/0
Self-test execution status: ( 0) The previous self-test routine
completed without error or no self-test
has ever been run.
has ever been run.
Error logging capability: (0x01) Error logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 251) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Self-test log structure revision number: 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Error Log Version: 1
No Errors Logged
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
==============================================================================
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED
WHEN_FAILED RAW_VALUE
==============================================================================
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always
- 0
2 Throughput_Performance 0x0005 139 139 054 Pre-fail Offline
- 71
3 Spin_Up_Time 0x0007 169 169 024 Pre-fail Always
- 245 (Average 204)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always
- 746
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always
- 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always
- 0
8 Seek_Time_Performance 0x0005 124 124 020 Pre-fail Offline
- 33
9 Power_On_Hours 0x0012 100 100 000 Old_age Always
- 1181
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always
- 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always
- 529
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always
- 751
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always
- 751
194 Temperature_Celsius 0x0002 193 193 000 Old_age Always
- 31 (Lifetime Min/Max 16/41)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always
- 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always
- 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline
- 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always
- 0
-------------------------------------------------------------------------------
PdId: 2
Model Number: WD 1000
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
SCT Status Version: 3
SCT Version (vendor specific): 256 (0x0100)
SCT Support Level: 1
Device State: SMART Off-line Data Collection executing in background (4)
Current Temperature: 31 Celsius
Power Cycle Min/Max Temperature: 27/31 Celsius
Lifetime Min/Max Temperature: 16/40 Celsius
Under/Over Temperature Limit Count: 0/0
Self-test execution status: ( 0) The previous self-test routine
completed without error or no self-test
has ever been run.
has ever been run.
Error logging capability: (0x01) Error logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 246) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Self-test log structure revision number: 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Error Log Version: 1
No Errors Logged
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
==============================================================================
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED
WHEN_FAILED RAW_VALUE
==============================================================================
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always
- 0
2 Throughput_Performance 0x0005 139 139 054 Pre-fail Offline
- 72
3 Spin_Up_Time 0x0007 171 171 024 Pre-fail Always
- 243 (Average 201)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always
- 746
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always
- 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always
- 0
8 Seek_Time_Performance 0x0005 124 124 020 Pre-fail Offline
- 33
9 Power_On_Hours 0x0012 100 100 000 Old_age Always
- 1181
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always
- 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always
- 529
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always
- 749
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always
- 749
194 Temperature_Celsius 0x0002 193 193 000 Old_age Always
- 31 (Lifetime Min/Max 16/40)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always
- 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always
- 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline
- 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always
- 0
-------------------------------------------------------------------------------
PdId: 3
Model Number: WD 1000
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
SCT Status Version: 3
SCT Version (vendor specific): 256 (0x0100)
SCT Support Level: 1
Device State: SMART Off-line Data Collection executing in background (4)
Current Temperature: 31 Celsius
Power Cycle Min/Max Temperature: 27/31 Celsius
Lifetime Min/Max Temperature: 17/40 Celsius
Under/Over Temperature Limit Count: 0/0
Self-test execution status: ( 0) The previous self-test routine
completed without error or no self-test
has ever been run.
has ever been run.
Error logging capability: (0x01) Error logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 241) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Self-test log structure revision number: 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Error Log Version: 1
No Errors Logged
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
==============================================================================
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED
WHEN_FAILED RAW_VALUE
==============================================================================
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always
- 0
2 Throughput_Performance 0x0005 140 140 054 Pre-fail Offline
- 67
3 Spin_Up_Time 0x0007 170 170 024 Pre-fail Always
- 234 (Average 213)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always
- 748
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always
- 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always
- 0
8 Seek_Time_Performance 0x0005 124 124 020 Pre-fail Offline
- 33
9 Power_On_Hours 0x0012 100 100 000 Old_age Always
- 1188
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always
- 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always
- 529
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always
- 750
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always
- 750
194 Temperature_Celsius 0x0002 193 193 000 Old_age Always
- 31 (Lifetime Min/Max 17/40)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always
- 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always
- 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline
- 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always
- 0
-------------------------------------------------------------------------------
PdId: 4
Model Number: WD 1000
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
SCT Status Version: 3
SCT Version (vendor specific): 256 (0x0100)
SCT Support Level: 1
Device State: SMART Off-line Data Collection executing in background (4)
Current Temperature: 31 Celsius
Power Cycle Min/Max Temperature: 27/31 Celsius
Lifetime Min/Max Temperature: 15/40 Celsius
Under/Over Temperature Limit Count: 0/0
Self-test execution status: ( 0) The previous self-test routine
completed without error or no self-test
has ever been run.
has ever been run.
Error logging capability: (0x01) Error logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 254) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Self-test log structure revision number: 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Error Log Version: 1
ATA Error Count: 165 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 165 occurred at disk power-on lifetime: 1176 hours (49 days + 0 hours)
When the command that caused the error occurred,
the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 50 b0 ee 81 0d
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 80 a8 80 ee 81 40 00 18:38:48.276 WRITE FPDMA QUEUED
61 80 a0 00 ee 81 40 00 18:38:48.276 WRITE FPDMA QUEUED
61 80 98 80 ed 81 40 00 18:38:48.276 WRITE FPDMA QUEUED
61 80 90 00 ed 81 40 00 18:38:48.276 WRITE FPDMA QUEUED
61 80 88 80 ec 81 40 00 18:38:48.275 WRITE FPDMA QUEUED
Error 164 occurred at disk power-on lifetime: 1175 hours (48 days + 23 hours)
When the command that caused the error occurred,
the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 10 f0 ad 6b 0d Error: ICRC, ABRT 16 sectors at LBA = 0x0d6badf0 = 225160688
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
35 00 80 80 ad 6b 40 00 18:36:07.145 WRITE DMA EXT
35 00 80 00 ae 6b 40 00 18:36:07.144 WRITE DMA EXT
35 00 80 00 ad 6b 40 00 18:36:07.144 WRITE DMA EXT
35 00 80 80 ab 6b 40 00 18:36:07.139 WRITE DMA EXT
35 00 80 00 ab 6b 40 00 18:36:07.139 WRITE DMA EXT
Error 163 occurred at disk power-on lifetime: 1175 hours (48 days + 23 hours)
When the command that caused the error occurred,
the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 f0 10 5e 5d 0d Error: ICRC, ABRT 240 sectors at LBA = 0x0d5d5e10 = 224222736
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
35 00 80 80 5b 5d 40 00 18:35:47.982 WRITE DMA EXT
35 00 80 80 5a 5d 40 00 18:35:47.982 WRITE DMA EXT
35 00 80 00 59 5d 40 00 18:35:47.981 WRITE DMA EXT
35 00 00 00 58 5d 40 00 18:35:47.979 WRITE DMA EXT
35 00 30 00 36 5d 40 00 18:35:47.960 WRITE DMA EXT
Error 162 occurred at disk power-on lifetime: 1175 hours (48 days + 23 hours)
When the command that caused the error occurred,
the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 20 e0 33 19 0d
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 80 30 00 33 19 40 00 18:34:50.672 WRITE FPDMA QUEUED
61 80 28 80 33 19 40 00 18:34:50.671 WRITE FPDMA QUEUED
61 80 20 00 34 19 40 00 18:34:50.671 WRITE FPDMA QUEUED
61 00 18 80 34 19 40 00 18:34:50.671 WRITE FPDMA QUEUED
61 80 10 80 36 19 40 00 18:34:50.670 WRITE FPDMA QUEUED
Error 161 occurred at disk power-on lifetime: 1133 hours (47 days + 5 hours)
When the command that caused the error occurred,
the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 d0 30 dd 3b 0a
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 80 38 80 dc 3b 40 00 06:26:51.414 WRITE FPDMA QUEUED
61 80 30 00 df 3b 40 00 06:26:51.413 WRITE FPDMA QUEUED
61 80 28 80 df 3b 40 00 06:26:51.413 WRITE FPDMA QUEUED
61 80 20 00 da 3b 40 00 06:26:51.402 WRITE FPDMA QUEUED
61 80 18 80 da 3b 40 00 06:26:51.402 WRITE FPDMA QUEUED
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
==============================================================================
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED
WHEN_FAILED RAW_VALUE
==============================================================================
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always
- 0
2 Throughput_Performance 0x0005 139 139 054 Pre-fail Offline
- 73
3 Spin_Up_Time 0x0007 170 170 024 Pre-fail Always
- 234 (Average 212)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always
- 747
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always
- 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always
- 0
8 Seek_Time_Performance 0x0005 124 124 020 Pre-fail Offline
- 33
9 Power_On_Hours 0x0012 100 100 000 Old_age Always
- 1187
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always
- 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always
- 529
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always
- 748
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always
- 748
194 Temperature_Celsius 0x0002 200 200 000 Old_age Always
- 30 (Lifetime Min/Max 15/40)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always
- 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always
- 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline
- 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always
- 165
-------------------------------------------------------------------------------
PdId: 5
Model Number: WD 1000
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
SCT Status Version: 3
SCT Version (vendor specific): 256 (0x0100)
SCT Support Level: 1
Device State: SMART Off-line Data Collection executing in background (4)
Current Temperature: 31 Celsius
Power Cycle Min/Max Temperature: 27/31 Celsius
Lifetime Min/Max Temperature: 17/40 Celsius
Under/Over Temperature Limit Count: 0/0
Self-test execution status: ( 0) The previous self-test routine
completed without error or no self-test
has ever been run.
has ever been run.
Error logging capability: (0x01) Error logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 251) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Self-test log structure revision number: 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Error Log Version: 1
No Errors Logged
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
==============================================================================
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED
WHEN_FAILED RAW_VALUE
==============================================================================
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always
- 0
2 Throughput_Performance 0x0005 140 140 054 Pre-fail Offline
- 68
3 Spin_Up_Time 0x0007 133 133 024 Pre-fail Always
- 289 (Average 282)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always
- 748
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always
- 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always
- 0
8 Seek_Time_Performance 0x0005 124 124 020 Pre-fail Offline
- 33
9 Power_On_Hours 0x0012 100 100 000 Old_age Always
- 1186
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always
- 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always
- 529
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always
- 750
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always
- 750
194 Temperature_Celsius 0x0002 193 193 000 Old_age Always
- 31 (Lifetime Min/Max 17/40)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always
- 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always
- 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline
- 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always
- 0
-------------------------------------------------------------------------------
PdId: 6
Model Number: WD 1000
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
SCT Status Version: 3
SCT Version (vendor specific): 256 (0x0100)
SCT Support Level: 1
Device State: SMART Off-line Data Collection executing in background (4)
Current Temperature: 30 Celsius
Power Cycle Min/Max Temperature: 27/30 Celsius
Lifetime Min/Max Temperature: 17/40 Celsius
Under/Over Temperature Limit Count: 0/0
Self-test execution status: ( 0) The previous self-test routine
completed without error or no self-test
has ever been run.
has ever been run.
Error logging capability: (0x01) Error logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 243) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Self-test log structure revision number: 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Error Log Version: 1
No Errors Logged
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
==============================================================================
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED
WHEN_FAILED RAW_VALUE
==============================================================================
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always
- 0
2 Throughput_Performance 0x0005 139 139 054 Pre-fail Offline
- 72
3 Spin_Up_Time 0x0007 130 130 024 Pre-fail Always
- 294 (Average 287)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always
- 748
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always
- 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always
- 0
8 Seek_Time_Performance 0x0005 124 124 020 Pre-fail Offline
- 33
9 Power_On_Hours 0x0012 100 100 000 Old_age Always
- 1186
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always
- 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always
- 529
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always
- 751
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always
- 751
194 Temperature_Celsius 0x0002 200 200 000 Old_age Always
- 30 (Lifetime Min/Max 17/40)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always
- 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always
- 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline
- 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always
- 0
Read line by line, collect into an accumulator the lines you want, print the accumulated lines when you see the trigger message (otherwise just start over and overwrite the accumulator when you see the start of the next record).
We use a as the accumulator and the helper variable n to keep track of how many lines to accumulate:
awk '/^PdId: [1-9][0-9]*/ { a = $0; n = 4; next }
     n { --n; a = a "\n" $0; next }
     /No Errors Logged/ { print a }' file
Put the following into an executable awk file:
#!/usr/bin/awk -f
BEGIN { no_errs = 1 }
c > 0 { a[c++] = $0 }
/^----------/ {
    logAnyErrors()
    ata_err = ""
    no_errs = 0
    c = 1
    delete a
}
/^No Errors Logged/ { no_errs = 1 }
/^ATA Error Count:/ { ata_err = $0 }
function logAnyErrors() {
    if (ata_err != "" || !no_errs) {
        for (i = 1; i <= 5; i++) print a[i]
        if (ata_err != "") print ata_err
        print "--" # separator
    }
}
END { logAnyErrors() }
Your data actually has a delimiter of "^------------------"... before each PdId.
The breakdown:
Start off assuming no errors in the BEGIN block.
Add each record line to an array called a, indexed by the line counter c.
Whenever a new record section starts, call logAnyErrors() and reset the counters.
In logAnyErrors(), if there are ATA or other errors, print the first 5 lines of the record and a separator similar to what I think grep -A4 would output.
At the end, log any errors in the final record.
When I put this into an executable file called awko and run like awko data I get the following output:
PdId: 4
Model Number: WD 1000
Drive Type: SATA
SMART Status: Enable
SMART Health Status: OK
ATA Error Count: 165 (device log contains only the most recent five errors)
----------------
It's possible that the delete a line is non-conforming for some awks; it works on my Mac. It's not necessary unless you want to print more information from each block when errors occur (since the first 5 lines will always be overwritten).
I am following a tutorial for a word processor for my Qt module at uni.
It has asked me to set this attribute:
MainWindow::setAttribute(Qt::WA_DeleteOnClose);
The problem comes when I run the application: it causes an error saying that the application has closed unexpectedly.
It also asked me to create an actionExit action and add it to the File toolbar, but it doesn't show. I am guessing this is because I am writing it on OS X, where exit/quit is taken care of for you with the Cmd+Q shortcut.
I was wondering if anyone could shed some light on this problem so that I know for future reference. If needed, I can post the tutorial and source code.
Thanks
Edit: backtrace from the debugger (hope this is correct):
0 __pthread_kill 0 0x7fff8eaff212
1 pthread_kill 0 0x7fff86f7eaf4
2 abort 0 0x7fff86fc2dce
3 free 0 0x7fff86f96959
4 MainWindow::~MainWindow mainwindow.cpp 22 0x100002cff
5 QObject::event 0 0x100e48906
6 QWidget::event 0 0x1000ecd5e
7 QMainWindow::event 0 0x10049cadb
8 QApplicationPrivate::notify_helper 0 0x10009593d
9 QApplication::notify 0 0x10009bdc4
10 QCoreApplication::notifyInternal 0 0x100e3417c
11 QCoreApplicationPrivate::sendPostedEvents 0 0x100e355a0
12 __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ 0 0x7fff90925101
13 __CFRunLoopDoSources0 0 0x7fff90924a25
14 __CFRunLoopRun 0 0x7fff90947dc5
15 CFRunLoopRunSpecific 0 0x7fff909476b2
16 RunCurrentEventLoopInMode 0 0x7fff8d0f60a4
17 ReceiveNextEventCommon 0 0x7fff8d0f5d84
18 BlockUntilNextEventMatchingListInMode 0 0x7fff8d0f5cd3
19 _DPSNextEvent 0 0x7fff91a00613
20 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] 0 0x7fff919ffed2
... <More>
Is your MainWindow object declared on the stack, by any chance? If so, then DeleteOnClose is not a good idea, simply because deleting an object that is on the stack is an error.
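To make the stack-versus-heap point concrete, here is a minimal Qt-free sketch. The Window struct and both helper functions are hypothetical stand-ins, not Qt classes; close() only models what a widget with Qt::WA_DeleteOnClose does when it is closed. One simplification to keep in mind: real Qt defers the deletion through the event loop (which matches the sendPostedEvents frames in the backtrace above), while this model deletes immediately.

```cpp
#include <cassert>

// Hypothetical stand-in for a widget. This is NOT Qt code; it only models
// the ownership rule behind Qt::WA_DeleteOnClose.
struct Window {
    bool deleteOnClose = false;   // models the WA_DeleteOnClose attribute
    static int destroyed;         // counts destructor calls for the demo

    ~Window() { ++destroyed; }

    // Models closing the widget: with the flag set, the object frees itself.
    // That is only valid if the object was created with new.
    void close() {
        if (deleteOnClose)
            delete this;
    }
};
int Window::destroyed = 0;

// Safe: heap-allocated window with the flag set; close() releases it.
int closeHeapWindow() {
    Window::destroyed = 0;
    Window *w = new Window;
    w->deleteOnClose = true;
    w->close();                   // w must not be used after this line
    return Window::destroyed;
}

// Also safe: stack-allocated window WITHOUT the flag; scope exit destroys it.
int closeStackWindow() {
    Window::destroyed = 0;
    {
        Window w;
        w.close();                // flag unset, so no delete happens here
    }
    return Window::destroyed;
}
```

Both helpers return 1, meaning the window is destroyed exactly once. The combination to avoid is the third one: a stack-allocated window with the flag set, where close() would call delete on memory that was never allocated with new. That is the kind of failure the free/abort frames in the backtrace point to. In the tutorial's main(), the usual way out is either to allocate the MainWindow with new or to leave the attribute unset.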
I am running a Qt application on an embedded Linux platform. The system has 128 MB RAM, 512 MB NAND, and no swap. The application uses a custom library for the peripherals; the rest are all Qt and C/C++ libraries. The application uses SQLite3 as well.
After 2-3 hours, the machine starts running very slowly; shell commands take 10 seconds or so to respond. Eventually the machine hangs, the OOM killer finally kills the application, and the system goes back to normal speed.
Watching system memory with top shows that, while the application is running, free memory keeps decreasing and slab keeps increasing. Snapshots of the top output are given below. The application is named xyz.
At Application start :
Mem total:126164 anon:3308 map:8436 free:32456
slab:60936 buf:0 cache:27528 dirty:0 write:0
Swap total:0 free:0
PID VSZ VSZRW^ RSS (SHR) DIRTY (SHR) STACK COMMAND
776 29080 9228 8036 528 968 0 84 ./xyz -qws
781 3960 736 1976 1456 520 0 84 sshd: root@notty
786 3676 680 1208 764 416 0 88 /usr/libexec/sftp-server
770 3792 568 1948 1472 464 0 84 {sshd} sshd: root@pts/0
766 3792 568 956 688 252 0 84 /usr/sbin/sshd
388 1864 284 552 332 188 0 84 udevd --daemon
789 2832 272 688 584 84 0 84 top
774 2828 268 668 560 84 0 84 -sh
709 2896 268 556 464 80 0 84 /usr/sbin/inetd
747 2828 268 596 516 68 0 84 /sbin/getty -L ttymxc0 115200 vt100
777 2824 264 444 368 68 0 84 tee out.log
785 2824 264 484 416 68 0 84 sh -c /usr/libexec/sftp-server
1 2824 264 556 488 64 0 84 init
After some time :
Mem total:126164 anon:3312 map:8440 free:9244
slab:83976 buf:0 cache:27584 dirty:0 write:0
Swap total:0 free:0
PID VSZ VSZRW^ RSS (SHR) DIRTY (SHR) STACK COMMAND
776 29080 9228 8044 528 972 0 84 ./xyz -qws
781 3960 736 1976 1456 520 0 84 sshd: root@notty
786 3676 680 1208 764 416 0 88 /usr/libexec/sftp-server
770 3792 568 1948 1472 464 0 84 {sshd} sshd: root@pts/0
766 3792 568 956 688 252 0 84 /usr/sbin/sshd
388 1864 284 552 332 188 0 84 udevd --daemon
789 2832 272 688 584 84 0 84 top
774 2828 268 668 560 84 0 84 -sh
709 2896 268 556 464 80 0 84 /usr/sbin/inetd
747 2828 268 596 516 68 0 84 /sbin/getty -L ttymxc0 115200 vt100
777 2824 264 444 368 68 0 84 tee out.log
785 2824 264 484 416 68 0 84 sh -c /usr/libexec/sftp-server
1 2824 264 556 488 64 0 84 init
Oddly, though, I cannot see any major changes in the top output for the application itself. Eventually the application is killed; the top output after that:
Mem total:126164 anon:2356 map:916 free:2368
slab:117944 buf:0 cache:1580 dirty:0 write:0
Swap total:0 free:0
PID VSZ VSZRW^ RSS (SHR) DIRTY (SHR) STACK COMMAND
781 3960 736 708 184 520 0 84 sshd: root@notty
786 3724 728 736 172 484 0 88 /usr/libexec/sftp-server
770 3792 568 648 188 460 0 84 {sshd} sshd: root@pts/0
766 3792 568 252 0 252 0 84 /usr/sbin/sshd
388 1864 284 188 0 188 0 84 udevd --daemon
819 2832 272 676 348 84 0 84 top
774 2828 268 512 324 96 0 84 -sh
709 2896 268 80 0 80 0 84 /usr/sbin/inetd
747 2828 268 68 0 68 0 84 /sbin/getty -L ttymxc0 115200 vt100
785 2824 264 68 0 68 0 84 sh -c /usr/libexec/sftp-server
1 2824 264 64 0 64 0 84 init
The dmesg shows :
sh invoked oom-killer: gfp_mask=0xd0, order=2, oomkilladj=0
[<c002d4c4>] (unwind_backtrace+0x0/0xd4) from [<c0073ac0>] (oom_kill_process+0x54/0x1b8)
[<c0073ac0>] (oom_kill_process+0x54/0x1b8) from [<c0073f14>] (__out_of_memory+0x154/0x178)
[<c0073f14>] (__out_of_memory+0x154/0x178) from [<c0073fa0>] (out_of_memory+0x68/0x9c)
[<c0073fa0>] (out_of_memory+0x68/0x9c) from [<c007649c>] (__alloc_pages_nodemask+0x3e0/0x4c8)
[<c007649c>] (__alloc_pages_nodemask+0x3e0/0x4c8) from [<c0076598>] (__get_free_pages+0x14/0x4c)
[<c0076598>] (__get_free_pages+0x14/0x4c) from [<c002f528>] (get_pgd_slow+0x14/0xdc)
[<c002f528>] (get_pgd_slow+0x14/0xdc) from [<c0043890>] (mm_init+0x84/0xc4)
[<c0043890>] (mm_init+0x84/0xc4) from [<c0097b94>] (bprm_mm_init+0x10/0x138)
[<c0097b94>] (bprm_mm_init+0x10/0x138) from [<c00980a8>] (do_execve+0xf4/0x2a8)
[<c00980a8>] (do_execve+0xf4/0x2a8) from [<c002afc4>] (sys_execve+0x38/0x5c)
[<c002afc4>] (sys_execve+0x38/0x5c) from [<c0027d20>] (ret_fast_syscall+0x0/0x2c)
Mem-info:
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
Normal per-cpu:
CPU 0: hi: 42, btch: 7 usd: 0
Active_anon:424 active_file:11 inactive_anon:428
inactive_file:3 unevictable:0 dirty:0 writeback:0 unstable:0
free:608 slab:29498 mapped:14 pagetables:59 bounce:0
DMA free:692kB min:268kB low:332kB high:400kB active_anon:0kB inactive_anon:0kB active_file:4kB inactive_file:0kB unevictable:0kB present:24384kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 103 103
Normal free:1740kB min:1168kB low:1460kB high:1752kB active_anon:1696kB inactive_anon:1712kB active_file:40kB inactive_file:12kB unevictable:0kB present:105664kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 3*4kB 3*8kB 5*16kB 2*32kB 4*64kB 2*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 692kB
Normal: 377*4kB 1*8kB 4*16kB 1*32kB 2*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1740kB
30 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 0kB
Total swap = 0kB
32768 pages of RAM
687 free pages
1306 reserved pages
29498 slab pages
59 pages shared
0 pages swap cached
Out of memory: kill process 774 (sh) score 339 or a child
Killed process 776 (xyz)
So there is obviously a memory leak, and it must be my app, since my app is what gets killed. But I am not calling malloc anywhere in the program, and I have taken care to limit the scope of variables so that they are deallocated after use. So I am at a complete loss as to why slab is increasing in the top output. I have tried http://valgrind.org/docs/manual/faq.html#faq.reports, but it didn't work.
I am currently trying to use Valgrind on the desktop (since I have read that, on ARM, it only works for Cortex cores) to check my business logic.
Additional info:
root@freescale ~/Application/app$ uname -a
Linux freescale 2.6.31-207-g7286c01 #2053 Fri Jun 22 10:29:11 IST 2012 armv5tejl GNU/Linux
Compiler : arm-none-linux-gnueabi-4.1.2 glibc2.5
cpp libs : libstdc++.so.6.0.8
Qt : 4.7.3 libs
Any pointers would be greatly appreciated...
I don't think the problem is directly in your code.
The reason is obvious: your application's memory footprint does not increase (neither RSS nor VSZ grows).
However, you do see the slab figure increasing. Your application cannot allocate slab memory directly; the slab allocator is kernel-only, so the growth comes from kernel objects created on your application's behalf.
Some obvious causes of slab growth, off the top of my head:
you never really close network sockets
you read many files, but never close them
you use many ioctls
I would run strace and look at its output for a while. strace intercepts interactions with the kernel. If you have memory issues, I'd expect repeated calls to brk(). If you have other issues, you'll see repeated calls to open without close.
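As an illustration of the "open without close" pattern that strace would reveal, here is a minimal sketch of the usual C++ remedy: a small RAII wrapper that closes the descriptor when it goes out of scope. The ScopedFd class and the descriptorsAreReused helper are hypothetical names made up for this example, and /dev/null in the usage note is just a convenient path that exists on Linux. The check relies on the POSIX rule that open() returns the lowest unused descriptor number, so in a single-threaded program a properly closed descriptor gets reused on the next open.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical RAII wrapper: opens a file read-only and closes it on scope exit.
class ScopedFd {
public:
    explicit ScopedFd(const char *path) : fd_(::open(path, O_RDONLY)) {}
    ~ScopedFd() { if (fd_ >= 0) ::close(fd_); }

    // Non-copyable, so the descriptor is never closed twice.
    ScopedFd(const ScopedFd &) = delete;
    ScopedFd &operator=(const ScopedFd &) = delete;

    int get() const { return fd_; }
    bool valid() const { return fd_ >= 0; }

private:
    int fd_;
};

// Opens `path` `times` times in a row. Returns true only if every iteration
// received the same descriptor number, i.e. the previous descriptor had been
// closed before the next open(). Assumes no other thread opens files meanwhile.
bool descriptorsAreReused(const char *path, int times) {
    assert(times > 0);
    int first = -1;
    for (int i = 0; i < times; ++i) {
        ScopedFd fd(path);            // closed automatically at end of iteration
        if (!fd.valid())
            return false;
        if (first < 0)
            first = fd.get();
        else if (fd.get() != first)
            return false;             // number changed: something is not closed
    }
    return true;
}
```

Calling descriptorsAreReused("/dev/null", 100) should return true. If the same loop used a bare open() without a matching close(), the descriptor number would climb on every iteration, and strace would show the corresponding run of open calls with no close in between.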
If you allocate data structures yourself, check that adding children and the like is done correctly; I had a similar bug in my code. Also, if you run large queries against the database, it may use more RAM. Try a memory leak detector to find out whether there is any leak.