I am currently developing a small C++ program that uses a database connection.
It connects to a MySQL database through the CPPCONN connector.
Cause
I am using multiple threads and therefore I have created the following methods:
void Database::startThread()
{
fDriver->threadInit();
}
void Database::stopThread()
{
fDriver->threadEnd();
}
void Database::connect(const string & host, const string & user, const string & password, const string & database)
{
fDriver = sql::mysql::get_driver_instance();
fConnection.reset(fDriver->connect((SQLString)host,(SQLString)user,(SQLString)password));
fConnection->setSchema((SQLString) database);
fStatement.reset(fConnection->createStatement());
fConnection->setClientOption("multi-queries","true");
fConnection->setClientOption("multi-statements","true");
}
The problem is that I encounter a segmentation fault at the fDriver->threadInit() call.
I can assure you that fDriver is properly instantiated at that point through the connect function.
(fDriver is not null either)
The crash
Unfortunately I cannot give much more useful information but this is GDB's backtrace:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff4d66700 (LWP 16786)]
0x0000000000414547 in Database::startThread (this=Unhandled dwarf expression opcode 0xf3
#0 0x0000000000414547 in Database::startThread (this=Unhandled dwarf expression opcode 0xf3) at src/core/database.cpp:73
#1 0x0000000000405443 in Parser::Parser (this=0x7ffff4d659b8) at src/core/sv_parse.cpp:11
#2 0x000000000041e76d in MessageProcessor::MessageProcessor (this=0x7ffff4d659b0, serverStartTime=...) at src/server/messageProcessor.cpp:12
#3 0x000000000041bae8 in Server::__lambda1::operator() (__closure=0x62c740) at src/server/server.cpp:89
#4 0x00007ffff763f550 in execute_native_thread_routine () at ../../../../../libstdc++-v3/src/c++11/thread.cc:84
#5 0x00007ffff6edb851 in start_thread () from /lib64/libpthread.so.0
#6 0x00007ffff6c2994d in clone () from /lib64/libc.so.6
Remark
Now the weird part: this crash does not occur every time!
Sometimes it works perfectly.
But it is of course extremely annoying when it doesn't.
CPPCONN version is 1.1.3 and we are using g++ version 4.8.1.
I hope someone can shed some light on this mystery!
Giriel
I struggled for hours with the same mysterious segmentation faults.
I found that adding a mutex lock around get_driver_instance() solves the problem.
Here is a basic skeleton for a threaded function. This works for selecting from the database; it might not work for inserting or updating.
#include <mutex>
std::mutex mtx;
void test()
{
sql::Driver *driver;
sql::Connection *con;
try {
mtx.lock();
driver = get_driver_instance();
mtx.unlock();
driver->threadInit();
con = driver->connect(HOST, USER, PASS);
...
con->close();
driver->threadEnd();
} catch(...) { ... }
}
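The same idea can be written with std::lock_guard, which releases the mutex even if the call inside the critical section throws. The following is only a sketch: FakeDriver and unsafeGetDriver() are hypothetical stand-ins for sql::Driver and get_driver_instance(), so that the example compiles without the connector, and countDistinctDrivers() is an illustrative helper.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

// Hypothetical stand-in for sql::Driver; it is not part of Connector/C++.
struct FakeDriver {};

// Hypothetical stand-in for get_driver_instance(): lazy initialisation
// that is not safe to call from several threads at once.
FakeDriver* unsafeGetDriver() {
    static std::unique_ptr<FakeDriver> instance;
    if (!instance) {
        instance.reset(new FakeDriver());
    }
    return instance.get();
}

std::mutex driverMutex;

// The guard unlocks in its destructor, so the mutex is released even if
// the call inside the critical section throws.
FakeDriver* getDriverLocked() {
    std::lock_guard<std::mutex> guard(driverMutex);
    return unsafeGetDriver();
}

// Starts `threadCount` threads that each fetch the driver and returns
// how many distinct pointers they observed.
std::size_t countDistinctDrivers(int threadCount) {
    std::vector<FakeDriver*> seen(threadCount, nullptr);
    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i) {
        threads.emplace_back([&seen, i] { seen[i] = getDriverLocked(); });
    }
    for (auto& t : threads) {
        t.join();
    }
    return std::set<FakeDriver*>(seen.begin(), seen.end()).size();
}
```

With the lock in place every thread should see the same driver object, so countDistinctDrivers(8) returns 1.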
I operate on an STL map in the following functions, both of which are protected by a mutex:
static std::mutex track_active_lock_mtx;
typedef intrusive_ptr<WatchCtxInternal> WatchCtxInternal_h;
static std::map<WatchCtxInternal*, WatchCtxInternal_h> actives;
void* get_ptr(WatchCtxInternal_h ctx)
{
unique_lock<mutex> trackActiveLock(track_active_lock_mtx);
if(actives.find(ctx.get()) == actives.end()) {
actives.insert(make_pair(ctx.get(), ctx));
}
trackActiveLock.unlock();
return ctx.get();
}
void genericWatcher(void *watcherCtx)
{
unique_lock<mutex> trackActiveLock(track_active_lock_mtx);
auto it = actives.find((WatchCtxInternal*)watcherCtx);
if (it == actives.end()) {
return;
}
//do unrelated stuff
actives.erase(it);
}
I got a segmentation fault in the first function:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 _M_lower_bound (this=<optimized out>, __k=<optimized out>, __y=0xf31256e8, __x=0x65687465) at /volume/evo/files/opt/poky/1.8.2-4/sysroots/i586-poky-linux/usr/include/c++/4.9.2/bits/stl_tree.h:1261
1261 if (!_M_impl._M_key_compare(_S_key(__x), __k))
(gdb) bt
#0 _M_lower_bound (this=<optimized out>, __k=<optimized out>, __y=0xf31256e8, __x=0x65687465) at /volume/evo/files/opt/poky/1.8.2-4/sysroots/i586-poky-linux/usr/include/c++/4.9.2/bits/stl_tree.h:1261
#1 find (__k=<optimized out>, this=0xf6ac8e2c <actives>) at /volume/evo/files/opt/poky/1.8.2-4/sysroots/i586-poky-linux/usr/include/c++/4.9.2/bits/stl_tree.h:1913
#2 find (__x=<optimized out>, this=0xf6ac8e2c <actives>) at /volume/evo/files/opt/poky/1.8.2-4/sysroots/i586-poky-linux/usr/include/c++/4.9.2/bits/stl_map.h:860
#3 get_ptr (ctx=...)
(gdb)fr 3
(gdb) p ctx
$4 = {px = 0xf3124d30}
EDIT: I managed to get a stack trace using the Memcheck tool. What happens is that the static map gets destroyed as part of process exit, but a callback to genericWatcher still arrives on the other thread before the process has completely exited:
main.cpp
static void* thread1(void *arg) {
//call genericWatcher repeatedly
return NULL;
}
int main() {
if(fork() == 0) {
pthread_create(..., thread1,..);
//call get_ptr() repeatedly
}
return 0;
}
Is there any way to prevent this? I could allocate a singleton that holds the actives map, but I try to avoid using singletons.
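One way to avoid the race without a singleton is to stop and join the worker before main() returns, so that no thread can touch the static map while it is being destroyed. The sketch below assumes you control the thread that issues the callbacks. The names watcherLoop, stopRequested and runAndShutDown are hypothetical, and a std::map<int, int> stands in for the map of watch contexts.

```cpp
#include <atomic>
#include <cstddef>
#include <map>
#include <mutex>
#include <thread>

static std::mutex activesMutex;
static std::map<int, int> actives;              // stand-in for the real map
static std::atomic<bool> stopRequested(false);

// Hypothetical worker: touches the shared map until asked to stop.
// It inserts at least once before it checks the flag.
void watcherLoop() {
    int key = 0;
    do {
        std::lock_guard<std::mutex> guard(activesMutex);
        actives[key] = key;
        ++key;
    } while (!stopRequested.load());
}

// Starts the worker, requests a stop and joins before returning, so the
// caller can leave main() knowing that no thread still uses `actives`.
std::size_t runAndShutDown() {
    std::thread worker(watcherLoop);
    stopRequested.store(true);
    worker.join();
    std::lock_guard<std::mutex> guard(activesMutex);
    return actives.size();
}
```

Static destructors run only after main() returns, so once the join has completed, the destruction of the map can no longer overlap with a callback.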
The most likely point of failure is the erase call in your release callback because it's the only access point to your map that hasn't got any guarding mechanism. Are you sure at that point that your WatchCtx is part of the map's keys? If not, it sounds possible that the insert is already letting go.
But, like Velkan already said, valgrind (or your debugger of choice) will give you certainty.
The code below throws a segmentation fault inside the .join() of the std::thread class. However, this happens only when I use cv::fastMalloc to allocate the data array. If I use the 'new' keyword or the std::malloc function, no error occurs.
I need to understand why this error happens, because in fact I need a cv::Mat, which uses this function.
int main() {
uchar* data = (uchar*) cv::fastMalloc(640);
std::atomic<bool> running(true);
std::thread thread([&] () {
while(running) {
// I'll perform some process with data here
// for now, just to illustrate, I put thread to sleep
std::this_thread::sleep_for(std::chrono::milliseconds(100));
}
});
std::this_thread::sleep_for(std::chrono::seconds(1));
running = false;
// segfault is thrown here
thread.join();
cv::fastFree(data);
return 0;
}
The GDB callstack follows below
#0 00429B26 _pthread_cleanup_dest () (??:??)
#1 003E32A0 ?? () (??:??)
Does anyone know what might be happening? I really think it is too crazy :S.
Thanks.
I solved this issue by reinstalling OpenCV. Apparently the problem was a mismatch between the compiler version I had built OpenCV with and the one I am using for this example.
For the record, I had compiled OpenCV some time ago with a MinGW version that does not support std::thread (4.7.x, I think).
I'm communicating with a hardware device using QSerialPort. New data does not emit the "readyRead"-Signal, so I decided to write a read thread using QThread.
This is the code:
void ReadThread::run()
{
while(true){
readData();
if (buffer.size() > 0) parseData();
}
}
and
void ReadThread::readData()
{
buffer.append(device->readAll());
}
with buffer being a private QByteArray and device being a pointer to the QSerialPort. parseData will parse the data and emit some signals. The buffer is cleared when parseData returns.
This works, however after some time (sometimes 10 seconds, sometimes 1 hour) the program crashes with SIGSEGV with the following trace:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff3498700 (LWP 24870)]
malloc_consolidate (av=av@entry=0x7fffec000020) at malloc.c:4151
(gdb) bt
#0 malloc_consolidate (av=av@entry=0x7fffec000020) at malloc.c:4151
#1 0x00007ffff62c2ee8 in _int_malloc (av=av@entry=0x7fffec000020, bytes=bytes@entry=32769) at malloc.c:3423
#2 0x00007ffff62c4661 in _int_realloc (av=av@entry=0x7fffec000020, oldp=oldp@entry=0x7fffec0013b0, oldsize=oldsize@entry=64, nb=nb@entry=32784) at malloc.c:4286
#3 0x00007ffff62c57b9 in __GI___libc_realloc (oldmem=0x7fffec0013c0, bytes=32768) at malloc.c:3029
#4 0x00007ffff70d1cdd in QByteArray::reallocData(unsigned int, QFlags<QArrayData::AllocationOption>) () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#5 0x00007ffff70d1f07 in QByteArray::resize(int) () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#6 0x00007ffff799f9fc in free (bytes=<optimized out>, this=0x609458)
at ../../include/QtSerialPort/5.3.2/QtSerialPort/private/../../../../../src/serialport/qt4support/include/private/qringbuffer_p.h:140
#7 read (maxLength=<optimized out>, data=<optimized out>, this=0x609458)
at ../../include/QtSerialPort/5.3.2/QtSerialPort/private/../../../../../src/serialport/qt4support/include/private/qringbuffer_p.h:326
#8 QSerialPort::readData (this=<optimized out>, data=<optimized out>, maxSize=<optimized out>) at qserialport.cpp:1341
#9 0x00007ffff722bdf0 in QIODevice::read(char*, long long) () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#10 0x00007ffff722cbaf in QIODevice::readAll() () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#11 0x00007ffff7bd0741 in readThread::readData (this=0x6066c0) at ../reader.cpp:212
#12 0x00007ffff7bc80d0 in readThread::run (this=0x6066c0) at ../reader.cpp:16
#13 0x00007ffff70cdd2e in ?? () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#14 0x00007ffff6e1c0a4 in start_thread (arg=0x7ffff3498700) at pthread_create.c:309
#15 0x00007ffff632f04d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
I'm not sure how to reproduce the problem correctly, since it appears randomly. If I comment out the "readData()" in my while loop, the crashes do not appear anymore (of course no data can be parsed, then).
Does anyone have a clue what this could be?
What is the buffer? Could it be that another thread reads the data from the buffer and clears it afterwards?
Try to lock it (and all other data shared between threads), e.g. with a mutex:
QMutex mx; // could be also member of the ReadThread class
void ReadThread::readData()
{
mx.lock();
buffer.append(device->readAll());
mx.unlock();
}
Do the same in the code that reads and clears the buffer from another thread (I am not assuming that this is parseData()).
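To illustrate locking on both sides, here is a minimal sketch in plain C++. std::string and std::mutex stand in for QByteArray and QMutex so that it compiles without Qt, and the SharedBuffer class and its method names are hypothetical. The point is that the consumer reads and clears the buffer under the same lock the producer uses for appending.

```cpp
#include <mutex>
#include <string>

// Stand-in for the QByteArray buffer plus QMutex; std::string and
// std::mutex are used so that the sketch needs no Qt.
class SharedBuffer {
public:
    // Called by the reading thread.
    void append(const std::string& chunk) {
        std::lock_guard<std::mutex> guard(mutex_);
        data_ += chunk;
    }

    // Called by the consuming thread: returns everything buffered so far
    // and leaves the buffer empty, all under one lock.
    std::string takeAll() {
        std::lock_guard<std::mutex> guard(mutex_);
        std::string out;
        out.swap(data_);
        return out;
    }

private:
    std::mutex mutex_;
    std::string data_;
};
```

Taking the whole content in one call avoids a window between "read" and "clear" in which the other thread could append data that would then be lost.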
Another possibility is that parseData() calls some code that runs in the GUI thread. This doesn't work in Qt4 and probably not in Qt5 either.
You're using the instance of a QObject from multiple threads at once. This generally speaking leads to undefined behavior, as you've just seen. QSerialPort will work just fine on the GUI thread. Only once you get it to work there, you can move it to a worker thread.
Note that if the event loop (app.exec() call in main() or QThread::run()) isn't executing, the signals won't be happening. It looks as if you tried to write pseudo synchronous code and have (predictably) failed. Don't do that.
Something like this is supposed to work:
#include <QtCore>
#include <QtSerialPort>
int main(int argc, char ** argv) {
QCoreApplication app(argc, argv);
QSerialPort port;
port.setPortName(...);
port.setBaudRate(...);
... // etc
if (! port.open(QIODevice::ReadWrite)) {
qWarning() << "can't open the port";
return 1;
}
... // set the port
QObject::connect(&port, &QIODevice::readyRead, [&]{
qDebug() << "got" << port.readAll().size() << "bytes";
});
return app.exec(); // the signals will be emitted from here
}
Ensure that all serial port related objects are initialized and used only in the separate thread. Send received data or parsed events to the UI thread by using signal/slot-mechanism.
Note also that if you inherit QThread in readThread, the constructor may be executed in the UI thread and other functions in the readThread. In that case, start the readThread and run separate initialization function before other functions (for example, by sending proper signal from the UI thread).
What possible reasons do you know for the situation, described in the title? Here's what my bt looks like:
#0 0x00a40089 in ?? ()
#1 0x09e3fac0 in ?? ()
#2 0x09e34f30 in ?? ()
#3 0xb7ef9074 in ?? ()
#4 0xb7ef9200 in ?? ()
#5 0xb7ef9028 in ?? ()
#6 0x081d45a0 in LogFile::Flush ()
#7 0x081d45a0 in LogFile::Flush ()
#8 0x081d46e0 in LogFile::Close ()
#9 0x081d4dbf in LogFile::OpenLogFile ()
#10 0x081d4eb9 in LogFile::PerformPeriodicalFlush ()
#11 0x081d4fca in LogFile::StoreRecord ()
#12 0x081d50c2 in LogFile::StoreRecord ()
and it gives me Program terminated with signal 11, Segmentation fault.
The wrapper around fflush() is simple: it just calls fflush and checks for errors (a return code < 0). So I guess the segfault is caused by fflush. Or could it be somewhere else, given the ?? frames at the top of the stack?
OS: RHEL5; gcc version 3.4.6 20060404 (Red Hat 3.4.6-3); debugged with gdb, with the original exe with max debug information in it.
I know about seg fault on no space on the disk, but this is not this case (as I have a watch-dog for the application, that restarts the program again and everything keeps working just fine).
Any ideas would be helpful.
Thanks.
EDIT
void LogFile::PerformPeriodicalFlush( const utils::dt::TimeStamp& tsNow )
throw( LibCException )
{
m_tsLastPeriodicalCheck = tsNow;
struct stat LogFileStat;
int nResult = stat( m_sCurrentFullFileName.c_str(), &LogFileStat );
if ( 0 == nResult && S_ISREG( LogFileStat.st_mode ) )
{
//we successfuly stated the file, so it exists. We can safely perform
//a flush.
try
{
Flush();
return;
}
catch ( LibCException& )
{
OpenLogFile( tsNow );
return;
}
}
else
{
OpenLogFile( tsNow );
}
}
void RotatingLogFile::Flush() throw( object::LibCException )
{
if ( m_pFile != NULL )
{
if ( fflush( m_pFile ) < 0 )
{
throw object::LibCException();
}
}
}
NOTE: I can't paste the whole code, as it is part of 10+ thousand lines of code. It has also been working for years in different applications, on real-time systems. Such crashes are very, very rare, about twice a year. So I don't think this is a problem in the code. I know that no one can help me much with this kind of thing, which is why I'm just asking for ideas about why fflush may cause a segfault.
My guess: you have memory corruption somewhere and LogFile's "this" points to a memory area that you can't access.
Anyway, it's difficult to tell without code.
It turned out that, for some reason, something was wrong with the file permissions (we are not sure what exactly). It happened at an hour change, since a different file is written for each hour. Somehow the file was created, but the process had no permission to write to it, or something like that. Nobody actually understood what happened, or why and how (after the crash the application was restarted and everything was perfectly fine). So the flush crashed because it had no permission to write.
It's still a mystery, but solved.
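For reference, a permission problem by itself is normally reported through return values: fopen returns NULL, and fflush returns EOF with errno set. A crash is more likely to come from using a stream pointer that is null or already closed. The helper below is only a sketch of checking both cases. flushChecked is a hypothetical name and is not part of the original code base.

```cpp
#include <cerrno>
#include <cstdio>

// Hypothetical helper, not taken from the original code base.
// Returns 0 on success, otherwise an errno-style error code.
int flushChecked(std::FILE* file) {
    if (file == nullptr) {
        return EBADF;                      // the stream was never opened
    }
    errno = 0;
    if (std::fflush(file) == EOF) {
        return errno != 0 ? errno : EIO;   // report the failure to the caller
    }
    return 0;
}
```

A caller could log the returned code and reopen the file instead of continuing to use a stream that has failed.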
You don't provide the code for Flush(), but sounds strange to me that it is called twice. In fact it seems that it calls itself. This may cause some resource leak, depending on the implementation of Flush().
Run your program under valgrind, it will help you find the source of where your application's memory is corrupted.
I am struggling with calling a clutter function from an extra thread.
I use boost::thread for threading and the clutter library 1.0.
To be specific, the thread contains a looped function that emits boost::signals2::signal with parameters of x and y coordinates every once in a while.
That signal is connected to a function that hands those variables to clutter, i.e. x,y in
clutter_stage_get_actor_at_pos(CLUTTER_STAGE(actor),
CLUTTER_PICK_ALL, x, y);
And that is where I get a segfault.
Apparently clutter has some thread-handling routines. I tried calling
g_thread_init(NULL);
clutter_threads_init();
before starting clutter_main(). I also tried enclosing the clutter function in
clutter_threads_enter();
clutter_stage_get_actor_at_pos(CLUTTER_STAGE(actor),
CLUTTER_PICK_ALL, x, y);
clutter_threads_leave();
but that does not do the trick either.
Every hint is appreciated, thank you in advance!
Addendum
I just put together a minimal sample of what I am trying to do. I already 'protected' the clutter_main() routine as suggested. Some Clutter functions seem to work from the separate thread (e.g. setting the stage color or setting an actor's position). Is there still something wrong with my code?
#include <clutter/clutter.h>
#include <boost/thread.hpp>
ClutterActor *stage;
ClutterActor* rect = NULL;
void receive_loop()
{
while(1)
{
sleep(1);
clutter_threads_enter();
ClutterActor* clicked = clutter_stage_get_actor_at_pos(CLUTTER_STAGE(stage), CLUTTER_PICK_ALL,300, 500);
clutter_threads_leave();
}
}
int main(int argc, char *argv[])
{
clutter_init(&argc, &argv);
g_thread_init(NULL);
clutter_threads_init();
stage = clutter_stage_get_default();
clutter_actor_set_size(stage, 800, 600);
rect = clutter_rectangle_new();
clutter_actor_set_size(rect, 256, 128);
clutter_actor_set_position(rect, 300, 500);
clutter_group_add (CLUTTER_GROUP (stage), rect);
clutter_actor_show(stage);
boost::thread thread = boost::thread(&receive_loop);
clutter_threads_enter();
clutter_main();
clutter_threads_leave();
return 0;
}
Well, I think I found the answer.
Clutter Docs General
It says in section "threading model":
The only safe and portable way to use the Clutter API in a multi-threaded environment is to never access the API from a thread that did not call clutter_init() and clutter_main().
The common pattern for using threads with Clutter is to use worker threads to perform blocking operations and then install idle or timeout sources with the result when the thread finished.
Clutter provides thread-aware variants of g_idle_add() and g_timeout_add() that acquire the Clutter lock before invoking the provided callback: clutter_threads_add_idle() and clutter_threads_add_timeout().
So my correction to the minimal sample code would be to alter the receive_loop() to
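void receive_loop()
{
while(1)
{
sleep(1);
// Heap-allocate the coordinates: the callback runs later on the main
// thread, so it must not be handed a pointer into this loop's stack.
int* pos = g_new(int, 2);
pos[0] = 400;
pos[1] = 200;
// g_free is the destroy notify; it releases pos once the idle source
// is removed (get_actor returns FALSE, so it runs only once).
clutter_threads_add_idle_full (G_PRIORITY_HIGH_IDLE,
get_actor,
pos,
g_free);
}
}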
and to add the get_actor function (as in the example code on the mentioned doc page)
static gboolean
get_actor (gpointer data)
{
int* pos = (int*) data;
ClutterActor* clicked = clutter_stage_get_actor_at_pos(CLUTTER_STAGE(stage), CLUTTER_PICK_ALL, pos[0], pos[1]);
return FALSE;
}
clutter_threads_add_idle_full takes care of acquiring the Clutter lock before the callback is invoked.
I struggled with a very similar situation in the Python bindings for clutter. I was never able to make the Clutter thread support work the way I wanted.
What finally did the trick was using an idle proc (gobject.idle_add in python) to push the work I needed done into the main clutter thread. That way I have only 1 thread making clutter calls and everything is fine.
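The pattern of pushing work into the main thread can be sketched without any toolkit. In the example below, MainThreadQueue is a hypothetical stand-in for the toolkit's idle queue: post() plays the role that clutter_threads_add_idle() or gobject.idle_add plays in the answers above, and drain() stands for the main loop running the pending callbacks. tasksRanOnCallingThread() is an illustrative demo function.

```cpp
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for a toolkit's idle queue.
class MainThreadQueue {
public:
    // Safe to call from any thread.
    void post(std::function<void()> task) {
        std::lock_guard<std::mutex> guard(mutex_);
        tasks_.push(std::move(task));
    }

    // Called only by the thread that owns the UI. Runs the pending tasks
    // on that thread and returns how many were executed.
    int drain() {
        int executed = 0;
        for (;;) {
            std::function<void()> task;
            {
                std::lock_guard<std::mutex> guard(mutex_);
                if (tasks_.empty()) {
                    break;
                }
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();      // run outside the lock so tasks may post more work
            ++executed;
        }
        return executed;
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> tasks_;
};

// Demo: a worker thread posts three tasks and the calling thread runs them.
bool tasksRanOnCallingThread() {
    MainThreadQueue queue;
    const std::thread::id owner = std::this_thread::get_id();
    int ranOnOwner = 0;
    std::thread worker([&] {
        for (int i = 0; i < 3; ++i) {
            queue.post([&] {
                if (std::this_thread::get_id() == owner) {
                    ++ranOnOwner;
                }
            });
        }
    });
    worker.join();
    return queue.drain() == 3 && ranOnOwner == 3;
}
```

The worker never touches UI state itself. It only describes the work, and the owning thread executes it, which is why only one thread ends up making toolkit calls.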
I played with your code and it seems you are doing everything ok, though I'm no expert in Clutter. I also ran your program under gdb and some interesting things showed up:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb799db70 (LWP 3023)]
0x002d97c6 in glDisable () from /usr/lib/nvidia-current/libGL.so.1
(gdb) thread apply all bt
Thread 2 (Thread 0xb799db70 (LWP 3023)):
#0 0x002d97c6 in glDisable () from /usr/lib/nvidia-current/libGL.so.1
#1 0x001b3ec3 in cogl_disable_fog () from /usr/lib/libclutter-glx-1.0.so.0
#2 0x0018b00a in ?? () from /usr/lib/libclutter-glx-1.0.so.0
#3 0x0019dc82 in clutter_stage_get_actor_at_pos () from /usr/lib/libclutter-glx-1.0.so.0
#4 0x080498de in receive_loop () at seg.cpp:19
Apparently the crash happened on glDisable () from /usr/lib/nvidia-current/libGL.so.1. Notice that I use NVIDIA's OpenGL driver on my GeForce 8600 GT.
Can you confirm that your application also crashes on computers with other video cards (not NVIDIA)? I doubt the crash is due to a bug on NVIDIA's OpenGL implementation.
To me it seems that clutter_threads_enter()/clutter_threads_leave() is not protecting clutter_stage_get_actor_at_pos(), since I also tested receive_loop() being called as a callback:
g_signal_connect(stage, "button-press-event", G_CALLBACK(receive_loop), NULL);
so we know that your code seems to be ok.
I encourage you to send your question to Clutter discussion and help mailing list: clutter-app-devel-list
a mailing list for application developers using Clutter, its integration libraries or toolkits based on Clutter.
You can either use clutter_threads_add_idle to update ClutterActor or you need to fix the clutter_threads_enter/leave to switch OpenGL context as well so that you can use it inside a thread.
The crash
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb799db70 (LWP 3023)]
0x002d97c6 in glDisable () from /usr/lib/nvidia-current/libGL.so.1
(gdb) thread apply all bt
Thread 2 (Thread 0xb799db70 (LWP 3023)):
#0 0x002d97c6 in glDisable () from /usr/lib/nvidia-current/libGL.so.1
#1 0x001b3ec3 in cogl_disable_fog () from /usr/lib/libclutter-glx-1.0.so.0
#2 0x0018b00a in ?? () from /usr/lib/libclutter-glx-1.0.so.0
#3 0x0019dc82 in clutter_stage_get_actor_at_pos () from /usr/lib/libclutter-glx-1.0.so.0
#4 0x080498de in receive_loop () at seg.cpp:19
happens because the calling thread had not acquired an OpenGL context.