What are the possible causes of "BUG: scheduling while atomic?"

There is another process continuously creating files that need processing by this code.
This code constantly scans the file-system for new files that need processing by comparing the contents of the file-system against a sqlite database that contains the processing results - one record for each file. This process is running at nice -n 19 so as not to interfere with the creation of new files by the other process.
It all works perfectly for a large number (>1k) of files, but then blows up with BUG: scheduling while atomic.
According to this:
"Scheduling while atomic" indicates that you've tried to sleep somewhere that you shouldn't
But the only sleep in the code is this:
void doFiles(void) {
    for (...) { // for each file in the file-system
        ... // check database - do processing if needed
    }
    sleep(1);
}
int main(int argc, char *argv[], char *envp[]) {
    while (true) doFiles();
    return -1;
}
The code will hit this sleep after it has checked every file in the file-system against the database. The process needs to be repeated since new files will be added from time to time. There is no multi-threading in this code. Are there other possible causes for "BUG: scheduling while atomic" besides a misplaced sleep?
Edit: additional error output:
note: mirlin[1083] exited with preempt_count 1
BUG: scheduling while atomic: mirlin/1083/0x40000002
Modules linked in: g_cdc_ms musb_hdrc nop_usb_xceiv irqk edmak dm365mmap cmemk
Backtrace:
[<c002a5a0>] (dump_backtrace+0x0/0x110) from [<c028e56c>] (dump_stack+0x18/0x1c)
r6:c1099460 r5:c04ea000 r4:00000000 r3:20000013
[<c028e554>] (dump_stack+0x0/0x1c) from [<c00337b8>] (__schedule_bug+0x58/0x64)
[<c0033760>] (__schedule_bug+0x0/0x64) from [<c028e864>] (schedule+0x84/0x378)
r4:c10992c0 r3:00000000
[<c028e7e0>] (schedule+0x0/0x378) from [<c0033a80>] (__cond_resched+0x28/0x38)
[<c0033a58>] (__cond_resched+0x0/0x38) from [<c028ec6c>] (_cond_resched+0x34/0x44)
r4:00013000 r3:00000001
[<c028ec38>] (_cond_resched+0x0/0x44) from [<c0082f64>] (unmap_vmas+0x570/0x620)
[<c00829f4>] (unmap_vmas+0x0/0x620) from [<c0085c10>] (exit_mmap+0xc0/0x1ec)
[<c0085b50>] (exit_mmap+0x0/0x1ec) from [<c0037610>] (mmput+0x40/0xfc)
r9:00000001 r8:80000005 r6:c04ea000 r5:00000000 r4:c0427300
[<c00375d0>] (mmput+0x0/0xfc) from [<c003b5e4>] (exit_mm+0x150/0x158)
r5:c10992c0 r4:c0427300
[<c003b494>] (exit_mm+0x0/0x158) from [<c003cd44>] (do_exit+0x198/0x67c)
r7:c03120d1 r6:c10992c0 r5:0000000b r4:c10992c0
...

As others have said, you can sleep() anytime you want to in user code.
This looks like a problem with a driver on your platform. The driver may not call sleep() or schedule() directly, but it will often call a kernel function which, in turn, calls one of these.
This also looks like it is using memory mapped file I/O on an embedded TI ARM processor.

This error was caused by a bad build.
A clean build by itself did not help.
A fresh checkout and build was required to resolve this issue.

Related

Deleted file still reported as existing (Windows only)

(Note that this is not primarily a Qt question)
It seems to me that the return value of QFile::exists() is sometimes incorrect.
Consider the following two unit-test-like snippets (each of which I have executed a few thousand times in a loop)
// create file
QFile file("test.tmp");
QVERIFY(file.open(QIODevice::WriteOnly));
QVERIFY(file.write("some data") != -1);
file.close();
// delete file
QVERIFY(file.remove());
// assert file is gone
QVERIFY(!file.exists()); // <-- 5..10 % chance of failure
and
// create file
QFile file("test.tmp");
QVERIFY(file.open(QIODevice::WriteOnly));
QVERIFY(file.write("some data") != -1);
file.close();
// delete file
QVERIFY(file.remove());
// retry until file is gone (or until timeout)
for (auto i = 0; i < 10; i++)
{
    if (!file.exists()) // <-- note that only the check is retried, not the actual delete
        return;
    QThread::yieldCurrentThread();
}
QFAIL("file is still reported as existing"); // <-- never reached in my tests
The first unit test fails about 8 out of 100 times. Always on the last line of code (indicating that the file still exists). The second unit test never fails.
This behavior was observed on a Windows 10 system using NTFS (with Qt 5.2.1). It could not be reproduced on Ubuntu 16.04 LTS using ext4 in a VM (with Qt 5.8.0).
Not sure if this helps:
Process Monitor (when it succeeds)
Process Monitor (when it fails)
So my questions are:
what is happening?
what are implications that I might be interested in?
update:
For clarification: I am hoping for an answer like "this is caused by the NTFS feature 'bills-fancy-caching-magic'". From there I would like to find out, whether Qt does look over this feature intentionally.

According to the Windows API documentation, it is defined behaviour:
The DeleteFile function marks a file for deletion on close. Therefore, the file deletion does not occur until the last handle to the file is closed. Subsequent calls to CreateFile to open the file fail with ERROR_ACCESS_DENIED.
It seems to be a property of the Windows kernel and is therefore not limited to NTFS.
The behaviour seems to be unpredictable, as other services (think virus scanners) might open the file in question.
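If the deletion only becomes visible once the last handle is closed, one way to cope is to poll with a bound, as the second snippet in the question does. Here is a minimal sketch using C++17 std::filesystem instead of Qt. The names waitUntilGone and createDeleteAndWait are hypothetical, and the attempt count and pause are arbitrary values, not recommendations:

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>
#include <fstream>
#include <system_error>
#include <thread>

namespace fs = std::filesystem;

// Hypothetical helper: polls until `path` is no longer reported as existing,
// or until the attempts run out. Only the check is retried, not the delete.
bool waitUntilGone(const fs::path& path, int maxAttempts,
                   std::chrono::milliseconds pause)
{
    assert(maxAttempts > 0);
    for (int i = 0; i < maxAttempts; ++i) {
        std::error_code ec;
        if (!fs::exists(path, ec) && !ec)
            return true;                    // path is gone
        std::this_thread::sleep_for(pause); // give other handles time to close
    }
    return false;
}

// Hypothetical helper: creates a file, deletes it, then waits for the
// deletion to become visible.
bool createDeleteAndWait(const fs::path& path)
{
    {
        std::ofstream out(path);
        out << "some data";
    } // stream is closed at the end of this scope

    std::error_code ec;
    if (!fs::remove(path, ec))
        return false; // the delete request itself failed

    return waitUntilGone(path, 10, std::chrono::milliseconds(10));
}
```

A bounded wait like this only hides the race. If another process keeps its handle open longer than the total wait, the function still returns false and the caller has to decide what to do.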

SQLite C++ 'database is locked' when multiple processes access db in readonly mode

I have an sqlite database that doesn't change.
Multiple processes each open a database connection in SQLITE_OPEN_READONLY mode using sqlite3_open_v2. Each process is single-threaded.
The connections are made from an MSVC project using the official C/C++ Interface's single amalgamated C source file.
According to the SQLite FAQ multiple processes running SELECTs is fine
Each process after opening the database creates 4 prepared SELECT statements each with 2 bindable values.
Over the course of the execution the statements (one at a time) have the following called on them repeatedly as required
sqlite3_bind_int
sqlite3_bind_int
sqlite3_step (while SQLITE_ROW is returned)
sqlite3_column_int (while there was a row)
sqlite3_reset
The prepared statements are reused so finalize isn't called on each of them until near the end of the program. Finally the database is closed at the very end of execution.
The problem is any of these operations can fail with error code = 5: 'database is locked'
Error code 5 is SQLITE_BUSY and the website states that
"indicates a conflict with a separate database connection, probably in a separate process"
The rest of the internet seems to agree that multiple READONLY connections are fine. I've gone over and over the source and can't see anything wrong (sadly I can't post it here, which I know is not helpful).
So I'm turning it to you guys, what could I possibly be missing?
EDIT 1:
Database is on a local drive, File system is NTFS, OS is Windows 7.
EDIT 2:
Wrapping all sqlite3 calls in infinite loops that check if SQLITE_BUSY was returned and then remake the call alleviates the problem. I don't consider this a fix but if that truly is the right thing to do then I'll do that.

So the working answer I have used is to wrap all the calls to sqlite in functions that loop that function while SQLITE_BUSY is returned. There doesn't seem to be a simple alternative.
int bindInt(sqlite3_stmt* stmt, int parameterIndex, int value)
{
    int ret;
    // Repeat the call for as long as SQLite reports the database as busy.
    do
        ret = sqlite3_bind_int(stmt, parameterIndex, value);
    while (ret == SQLITE_BUSY);
    return ret;
}
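The loop above spins forever if the lock never clears. A variant that gives up after a fixed number of attempts and pauses between them could look like the sketch below. It is not tied to the SQLite headers: retryWhileBusy is a hypothetical helper, and kBusy is defined locally with the same numeric value as SQLITE_BUSY (5) so the example compiles on its own.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Same numeric value as SQLITE_BUSY; defined here so the sketch does not
// need sqlite3.h. In real code, use SQLITE_BUSY directly.
constexpr int kBusy = 5;

// Hypothetical helper: calls `op` until it returns something other than
// kBusy or `maxAttempts` calls have been made. Returns the last result code.
template <typename Op>
int retryWhileBusy(Op op, int maxAttempts, std::chrono::milliseconds pause)
{
    assert(maxAttempts > 0);
    int ret = kBusy;
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        ret = op();
        if (ret != kBusy || attempt == maxAttempts)
            break;
        std::this_thread::sleep_for(pause); // back off before the next try
    }
    return ret;
}
```

With SQLite this would presumably be called as retryWhileBusy([&] { return sqlite3_step(stmt); }, 50, std::chrono::milliseconds(10)); the attempt count and pause are arbitrary. SQLite also provides sqlite3_busy_timeout(), which sets a per-connection wait for locks and may make a hand-written wrapper unnecessary in some cases.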

QSettings - Sync issue between two process

I am using QSettings in non-GUI products to store their settings in XML files. This is written as a library that is used from C and C++ programs. There is one XML file for each product. Each product might have more than one subproduct, and the settings are written into the XML grouped by subproduct, as follows:
File: "product1.xml"
<product1>
<subproduct1>
<settings1>..</settings1>
....
<settingsn>..</settingsn>
</subproduct1>
...
<subproductn>
<settings1>..</settings1>
....
<settingsn>..</settingsn>
</subproductn>
</product1>
File: productn.xml
<productn>
<subproduct1>
<settings1>..</settings1>
....
<settingsn>..</settingsn>
</subproduct1>
...
<subproductn>
<settings1>..</settings1>
....
<settingsn>..</settingsn>
</subproductn>
</productn>
The code in one process does the following -
settings = new QSettings("product1.xml", XmlFormat);
settings->setValue("settings1", <value>);
sleep(20);
settings->setValue("settings2", <value2>);
settings->sync();
When the first process goes to sleep, I start another process which does the following -
settings = new QSettings("product1.xml", XmlFormat);
settings->remove("settings1");
settings->setValue("settings3", <value3>);
settings->sync();
I would expect settings1 to disappear from product1.xml, but it still persists in the file after both processes have finished. I am not using QCoreApplication(..) in my settings library. Please point out anything that is wrong with the above design.

This is kind of an odd thing to be doing, but one thing to note is that the sync() call is what actually writes the file to disk. If you want your second process to see the changes you've made, you'll need to call sync() before the second process accesses the file. I would therefore try putting a settings.sync() call right before your sleep(20).

Maybe you have to do delete settings; after the sync() to make sure it is not open, then do the writing in the other process?

Does this compile? What implementation of XmlFormat are you using and which OS? There must be some special code in your project for storing / reading to and from Xml - there must be something in this code which works differently from what you expect.

Gdb process record/replay execution log

Could somebody tell me where would the execution log be stored when using the process record/replay feature in gdb?
Thanks
Raj
Update
#include <stdio.h>
int main (int argc, char const *argv[])
{
    printf("Hello World\n");
    printf("How are you?\n");
    char *c = NULL;
    printf("%c\n", *c);
    return 0;
}
The code above seg faults when I dereference c. I want to use this example to figure out how I can use reverse-next/reverse-continue to go back after a segfault. I am able to do reverse-next and reach the first printf statement at which I put a break point when recording the execution. After this, when I try the "next" command in gdb, I see that the cursor moves through the printf statements but I don't see any output printed on the terminal. In summary, I want to know if the record/replay feature can be used to go through the execution history even after a segfault?

I thought you had to manually specify that with
record save filename
The default filename is gdb_record.process_id, where process_id is the process ID of the debugged process. That means that if you don't specify a filename, you should look in the CWD of the debugger.
Update
With respect to your extra question on insn-number-max:
info record
Show various statistics about the state of process record and its in-memory
execution log buffer, including:
Whether in record mode or replay mode.
Lowest recorded instruction number (counting from when the current execution log started recording instructions).
Highest recorded instruction number.
Current instruction about to be replayed (if in replay mode).
Number of instructions contained in the execution log.
Maximum number of instructions that may be contained in the execution
log.
I'm not too sure, but this might indicate that the whole log is kept in memory after all. Of course, a 64-bit system with plenty of swap (and ulimit unlimited) will make this a 'virtual' limitation.

Memory leak checking using Instruments on Mac

I've just been pulling my hair out trying to make Instruments cough up my deliberately constructed memory leaks. My test example looks like this:
class Leaker
{
public:
    char *_array;
    Leaker()
    {
        _array=new char[1000];
    }
    ~Leaker()
    {
    }
};
void *leaker()
{
    void *p=malloc(1000);
    int *pa=new int[2000];
    {
        Leaker l;
        Leaker *pl=new Leaker();
    }
    return p;
}
int main (int argc, char **argv)
{
    for (int i=0; i<1000; ++i) {
        leaker();
    }
    sleep(2); // Needed to give Instruments a chance to poll memory
    return 0;
}
Basically, Instruments never found the obvious leaks. I was going nuts as to why, but then discovered "sec Between Auto Detections" in the "Leaks Configuration" panel under the Leaks panel. I dialed it back as low as it would go, which was 1 second, placed the sleep(2) in my code, and voila: leaks found!
As far as I'm concerned, a leak is a leak, regardless of whether it happens 30 minutes into an app or 30 milliseconds. In my case, I stripped the test case back to the above code, but my real application is a command-line application with no UI or anything and it runs very quickly; certainly less than the default 10 second sample interval.
Ok, so I can live with a couple of seconds upon exit of my app in instrumentation mode, but what I REALLY want, is to simply have Instruments snapshot memory on exit, then do whatever it needs over time while the app is running.
So... the question is: Is there a way to make Instruments snapshot memory on exit of an application, regardless of the sampling interval?
Cheers,
Shane

Instruments in Leaks mode can be really powerful for leak tracing, but I've found that it's more biased towards event-based GUI apps than command-line programs (particularly those which exit after a short time). There used to be a CHUD API where you could programmatically control aspects of the instrumentation, but the last time I tried it the frameworks were no longer provided as part of the SDK. Perhaps some of this has now been replaced with DTrace.
Also, ensure you're up to date with Xcode as there were some recent improvements in this area which might make it easier to do what you need. You could also keep the short delay before exit but make it conditional on the presence of an environment variable, and then set that environment variable in the Instruments launch properties for your app, so that running outside Instruments doesn't have the delay.
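The environment-variable idea could be sketched roughly as follows. The variable name LEAK_CHECK_DELAY_SECONDS and both function names are made up for illustration; anything unset, malformed, or outside 0 to 3600 seconds is treated as "no delay":

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <thread>

// Hypothetical helper: reads a delay in whole seconds from the environment
// variable `name`. Returns 0 if the variable is unset, not a plain decimal
// number, negative, or larger than one hour.
int exitDelaySeconds(const char* name)
{
    assert(name != nullptr);
    const char* value = std::getenv(name);
    if (value == nullptr)
        return 0;

    char* end = nullptr;
    long seconds = std::strtol(value, &end, 10);
    if (end == value || *end != '\0' || seconds < 0 || seconds > 3600)
        return 0;
    return static_cast<int>(seconds);
}

// Call this just before returning from main(). Outside Instruments the
// variable is not set, so the program exits without any delay.
void delayBeforeExitIfRequested()
{
    int seconds = exitDelaySeconds("LEAK_CHECK_DELAY_SECONDS");
    if (seconds > 0)
        std::this_thread::sleep_for(std::chrono::seconds(seconds));
}
```

In the test program from the question, the unconditional sleep(2) would then become a call to delayBeforeExitIfRequested(), with the variable set to 2 in the launch settings used for profiling.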

Most unit testing code executes the desired code paths and exits. Although this is perfectly normal for unit testing, it creates a problem for the leaks tool, which needs time to analyze the process memory space. To fix this problem, you should make sure your unit-testing code does not exit immediately upon completing its tests. You can do this by putting the process to sleep indefinitely instead of exiting normally.
https://developer.apple.com/library/ios/documentation/Performance/Conceptual/ManagingMemory/Articles/FindingLeaks.html

I've just decided to leave the 2 second delay during my debug+leaking build.
