My program crashes on fflush because of seg fault, ... but not always? - c++

What could cause the situation described in the title? Here is my backtrace:
#0 0x00a40089 in ?? ()
#1 0x09e3fac0 in ?? ()
#2 0x09e34f30 in ?? ()
#3 0xb7ef9074 in ?? ()
#4 0xb7ef9200 in ?? ()
#5 0xb7ef9028 in ?? ()
#6 0x081d45a0 in LogFile::Flush ()
#7 0x081d45a0 in LogFile::Flush ()
#8 0x081d46e0 in LogFile::Close ()
#9 0x081d4dbf in LogFile::OpenLogFile ()
#10 0x081d4eb9 in LogFile::PerformPeriodicalFlush ()
#11 0x081d4fca in LogFile::StoreRecord ()
#12 0x081d50c2 in LogFile::StoreRecord ()
and it gives me Program terminated with signal 11, Segmentation fault.
The wrapper around fflush() is simple: it just calls fflush and checks for errors (a return value < 0). So I guess the seg fault is caused by fflush. Or could it be somewhere else, given the ?? frames at the top of the stack?
OS: RHEL5; gcc version 3.4.6 20060404 (Red Hat 3.4.6-3); debugged with gdb, with the original exe with max debug information in it.
I know a seg fault can occur when there is no space left on the disk, but that is not the case here (I have a watchdog for the application that restarts the program, and everything then keeps working just fine).
Any ideas would be helpful.
Thanks.
EDIT
void LogFile::PerformPeriodicalFlush( const utils::dt::TimeStamp& tsNow )
throw( LibCException )
{
m_tsLastPeriodicalCheck = tsNow;
struct stat LogFileStat;
int nResult = stat( m_sCurrentFullFileName.c_str(), &LogFileStat );
if ( 0 == nResult && S_ISREG( LogFileStat.st_mode ) )
{
//we successfully stat'ed the file, so it exists. We can safely perform
//a flush.
try
{
Flush();
return;
}
catch ( LibCException& )
{
OpenLogFile( tsNow );
return;
}
}
else
{
OpenLogFile( tsNow );
}
}
void RotatingLogFile::Flush() throw( object::LibCException )
{
if ( m_pFile != NULL )
{
if ( fflush( m_pFile ) < 0 )
{
throw object::LibCException();
}
}
}
**NOTE** I can't paste the whole code; it is part of a code base of more than ten thousand lines. It has also been running for years in different applications, on real-time systems. Such crashes are very rare, about twice a year, so I don't think the problem is in this code. I know that no one can diagnose this kind of thing from so little information, which is why I'm only asking for ideas about why fflush might cause a seg fault.

My guess: you have memory corruption somewhere and LogFile's "this" points to a memory area that you can't access.
Anyway, it's difficult to tell without code.

It turned out that, for some reason, something was wrong with the file permissions (I'm not sure what exactly). It happened at an hour change, since a different file is written for each hour. Somehow the file was created but could not be written to, or something like that. No one actually understood what happened, or why and how, because after the crash the application was restarted and everything was perfectly fine. So flush crashed because it had no permission to write.
It's still a mystery... but solved xD
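For reference, fflush is specified to report a failed write by returning EOF (and, on POSIX systems, setting errno), so a refused write on its own would normally surface as an error code rather than a crash. A seg fault inside it more often points at a FILE* that is stale or corrupted, though that is only a guess for this particular case. A minimal sketch of a checked flush, where flush_checked is a hypothetical name and not part of the original code:

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>

// Hypothetical helper (not from the original code base): flush a stream and
// report failure through the return value instead of assuming success.
bool flush_checked(std::FILE* file)
{
    if (file == nullptr) {
        // fflush(NULL) would flush every open output stream, which is not
        // what a per-file wrapper wants, so treat a null stream as an error.
        return false;
    }
    errno = 0;
    if (std::fflush(file) == EOF) {
        // On POSIX systems errno describes the cause, e.g. ENOSPC, EIO or EBADF.
        std::fprintf(stderr, "fflush failed: %s\n", std::strerror(errno));
        return false;
    }
    return true;
}
```

Logging the errno text at the point of failure would at least leave a trace of what went wrong at the hour change, even if the process is restarted afterwards.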

You don't provide the code for Flush(), but sounds strange to me that it is called twice. In fact it seems that it calls itself. This may cause some resource leak, depending on the implementation of Flush().

Run your program under valgrind, it will help you find the source of where your application's memory is corrupted.

Related

C++: Custom formatting for exceptions uncaught by main()

I am creating a library that contains functions that can throw exceptions. To debug programs that use my library, I would like to provide a custom format-method that will give the programmer more information about these exceptions if they are uncaught by main().
Generally, my library can be called from a main function() written by an end user. The end user does not put a try..catch block in main() because the end user does not expect these exceptions (they should actually be avoided and/or caught by other, buggy libraries, between my library and main(), but they're not, and that's what we need to debug).
// The following example would actually be multiple files,
// but to keep this example simple, put it in "<filename>"
// and compile the following with "g++ <filename>".
// library file
class My_Exception
{
public:
char const* msg;
My_Exception(char const* msg) : msg(msg) {}
};
void Library_Function(bool rarely_true = false)
{
if (rarely_true)
throw My_Exception("some exceptional thing");
}
// note to discerning user: if you use the "rarely_true" feature,
// be sure to remember to catch "My_Exception"!!!!
// intermediate, buggy, library (written by someone else)
void Meta_Function()
{
Library_Function(true); // hahahaha not my problem!
}
// main program (written by yet someone else, no "try..catch"
// allowed here)
int main()
{
Meta_Function();
}
When I run the above program, I get:
terminate called after throwing an instance of 'My_Exception'
Abort (core dumped)
I like the way there is an error message telling me about the uncaught exception. I would like to know the best way to add a hook to My_Exception so that the msg string would also be printed in this situation.
I am willing to register callbacks with the runtime system, or add methods to My_Exception, but I don't want to mess with main() itself. (I know this problem could be solved by telling the linker to use a different entry point having a try..catch, and wrapping main() in that, but it will be hard to get the end-user to adopt something like that).
Clearly there is already some exception-checking code after main(), as the above message was printed. The stack trace is:
#0 0x0000155554c0d428 in __GI_raise (sig=sig#entry=6)
at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x0000155554c0f02a in __GI_abort () at abort.c:89
#2 0x000015555502e8f7 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x0000155555034a46 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x0000155555034a81 in std::terminate() ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x0000155555034cb4 in __cxa_throw ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00000000004006eb in Library_Function() ()
#7 0x00000000004006f4 in main ()
(gdb)
Aside: I don't at all understand why gdb says the program is aborting in Library_Function. That sounds wrong; it should at least have exited from main() after main() failed to catch the exception. Must be some language detail, like it preserves the stack until the exception is handled? In any case, I digress.
Maybe we can extend std::terminate() or __cxa_throw() or some other runtime component to print msg in this case?
How this question is different
How come I don't can't print out error from my throw exception? 2 answers -- similar, but 1. my question involves an exception object (not a string) and therefore the point about custom formatting (in the question title) is relevant. 2. missing keyword "uncaught" from the title, so hard to find
Custom error message of re-thrown exception not printed by what() 1 answer
-- 1. already contains an answer to my question in their question, so cannot be the same question. Unless you consider "what tool pounds a nail" to be the same question as "why isn't my hammer working". 2. missing keyword "uncaught" from the title
looser throw specifier for ‘virtual const char* ro_err::StdErr::what() const’ 1 answer
-- 1. already contains an answer to my question in their question, so cannot be the same question. Unless you consider "what tool pounds a nail" to be the same question as "why isn't my hammer working". 2. missing keyword "uncaught" from the title
As suggested by πάντα ῥεῖ, you can try this
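#include <exception>
#include <iostream>
class myexception : public std::exception
{
public:
    const char* what() const noexcept override
    {
        // A string literal is const, so return it directly rather than
        // through a non-const char*.
        return "some exceptional thing";
    }
};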
void Library_Function(bool rarely_true = false)
{
if (rarely_true)
throw myexception();
}
int main()
{
try
{
Library_Function(true);
}
catch (const myexception& err)
{
std::cout << err.what();
}
return 0;
}
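Since the question rules out a try..catch in main(), a terminate handler may be closer to what was asked. With libstdc++, the default handler already prints what() for types derived from std::exception, so deriving My_Exception from a standard exception class may be enough on its own. If more control over the format is wanted, something like the following sketch could work. The names describe_current_exception and verbose_terminate are hypothetical, and installing the handler from a static initializer is an assumption about how the library would be set up:

```cpp
#include <cstdlib>
#include <exception>
#include <iostream>
#include <stdexcept>
#include <string>

// Deriving from std::runtime_error gives the exception a what() message.
class My_Exception : public std::runtime_error
{
public:
    explicit My_Exception(const char* msg) : std::runtime_error(msg) {}
};

// Hypothetical helper: describe the exception currently being handled.
std::string describe_current_exception()
{
    std::exception_ptr current = std::current_exception();
    if (!current) {
        return "no active exception";
    }
    try {
        std::rethrow_exception(current);
    } catch (const std::exception& e) {
        return std::string("uncaught exception: ") + e.what();
    } catch (...) {
        return "uncaught exception of unknown type";
    }
}

// Terminate handler: print the custom description, then abort as usual.
void verbose_terminate()
{
    std::cerr << describe_current_exception() << std::endl;
    std::abort();
}

// Installed during static initialization of the library, so the end user's
// main() does not have to change.
namespace {
const std::terminate_handler previous_handler = std::set_terminate(verbose_terminate);
}
```

The handler still ends in std::abort(), so a core dump is produced as before; only the message changes.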

OpenCV allocation causes segfault in std::thread::join

The code below causes a segmentation fault inside the .join() of the std::thread class. However, that happens only if I use cv::fastMalloc to allocate the data array. If I use the 'new' keyword or the std::malloc function, no error occurs.
I need to understand why this error happens, because I actually need a cv::Mat, which uses this function.
int main() {
uchar* data = (uchar*) cv::fastMalloc(640);
std::atomic<bool> running(true);
std::thread thread([&] () {
while(running) {
// I'll perform some process with data here
// for now, just to illustrate, I put thread to sleep
std::this_thread::sleep_for(std::chrono::milliseconds(100));
}
});
std::this_thread::sleep_for(std::chrono::seconds(1));
running = false;
// segfault is thrown here
thread.join();
cv::fastFree(data);
return 0;
}
The GDB callstack follows below
#0 00429B26 _pthread_cleanup_dest () (??:??)
#1 003E32A0 ?? () (??:??)
Does anyone know what might be happening? I really think it is too crazy :S.
Thanks.
I solved this issue by reinstalling OpenCV. Apparently the problem was a mismatch between the compiler version I had built OpenCV with and the one I'm using for this example.
For the record, I had compiled OpenCV some time ago with a MinGW version that did not support std::thread (I think 4.7.x).

MySQL cppconn threads segmentation fault

I am currently developing a small C++ program that uses a database connection.
It is a connection with a MySQL database through CPPCONN connector.
Cause
I am using multiple threads and therefore I have created the following methods:
void Database::startThread()
{
fDriver->threadInit();
}
void Database::stopThread()
{
fDriver->threadEnd();
}
void Database::connect(const string & host, const string & user, const string & password, const string & database)
{
fDriver = sql::mysql::get_driver_instance();
fConnection.reset(fDriver->connect((SQLString)host,(SQLString)user,(SQLString)password));
fConnection->setSchema((SQLString) database);
fStatement.reset(fConnection->createStatement());
fConnection->setClientOption("multi-queries","true");
fConnection->setClientOption("multi-statements","true");
}
The problem is that I encounter a segmentation fault at the fDriver->threadInit() call.
I can assure you that fDriver is properly instantiated at that point through the connect function.
(fDriver is not null either)
The crash
Unfortunately I cannot give much more useful information but this is GDB's backtrace:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff4d66700 (LWP 16786)]
0x0000000000414547 in Database::startThread (this=Unhandled dwarf expression opcode 0xf3
#0 0x0000000000414547 in Database::startThread (this=Unhandled dwarf expression opcode 0xf3) at src/core/database.cpp:73
#1 0x0000000000405443 in Parser::Parser (this=0x7ffff4d659b8) at src/core/sv_parse.cpp:11
#2 0x000000000041e76d in MessageProcessor::MessageProcessor (this=0x7ffff4d659b0, serverStartTime=...) at src/server/messageProcessor.cpp:12
#3 0x000000000041bae8 in Server::__lambda1::operator() (__closure=0x62c740) at src/server/server.cpp:89
#4 0x00007ffff763f550 in execute_native_thread_routine () at ../../../../../libstdc++-v3/src/c++11/thread.cc:84
#5 0x00007ffff6edb851 in start_thread () from /lib64/libpthread.so.0
#6 0x00007ffff6c2994d in clone () from /lib64/libc.so.6
Remark
Now the weird part: this crash does not occur all the time !
Sometimes it works perfectly.
But it is of course extremely annoying if it doesn't.
CPPCONN version is 1.1.3 and we are using g++ version 4.8.1.
I hope someone can shed some light on this mystery !
Giriel
I struggled for hours with the same mysterious segmentation faults.
I found that adding a mutex lock around get_driver_instance() solves the problem.
Here is a basic skeleton for a threaded function. This works for selecting from database, might not work for inserting or updating.
#include <mutex>
std::mutex mtx;
void test()
{
    sql::Driver *driver;
    sql::Connection *con;
    try {
        {
            // Hold the lock only while obtaining the driver; lock_guard
            // releases it even if get_driver_instance() throws.
            std::lock_guard<std::mutex> lock(mtx);
            driver = get_driver_instance();
        }
        driver->threadInit();
        con = driver->connect(HOST, USER, PASS);
        ...
        con->close();
        driver->threadEnd();
    } catch(...) { ... }
}
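The locking pattern from this answer can be tried in isolation, without the connector installed. The sketch below uses a hypothetical FakeDriver and a deliberately unsynchronized factory as stand-ins; whether the real get_driver_instance() behaves like this is an assumption based on the answer above, not something confirmed here:

```cpp
#include <mutex>

// Hypothetical stand-in for the connector's driver type.
struct FakeDriver {};

// Stand-in for a factory that lazily creates a singleton without any
// synchronization, so concurrent first calls would race.
FakeDriver* unsafe_get_driver_instance()
{
    static FakeDriver* instance = nullptr;
    if (instance == nullptr) {
        instance = new FakeDriver();  // intentionally lives until process exit
    }
    return instance;
}

std::mutex driver_mutex;

// Every thread goes through this wrapper, so only one thread at a time
// can be inside the unsynchronized factory.
FakeDriver* get_driver_serialized()
{
    std::lock_guard<std::mutex> lock(driver_mutex);
    return unsafe_get_driver_instance();
}
```

Each worker thread would call get_driver_serialized() once at start-up and then use the returned pointer without holding the lock.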

Segmentation fault in std function std::_Rb_tree_rebalance_for_erase ()

(Note to any future readers: The error, unsurprisingly, is in my code and not std::_Rb_tree_rebalance_for_erase () )
I'm somewhat new to programming and am unsure how to deal with a segmentation fault that appears to be coming from a std function. I hope I'm doing something stupid (i.e., misusing a container), because I have no idea how to fix it.
The precise error is
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x000000000000000c
0x00007fff8062b144 in std::_Rb_tree_rebalance_for_erase ()
(gdb) backtrace
#0 0x00007fff8062b144 in std::_Rb_tree_rebalance_for_erase ()
#1 0x000000010000e593 in Simulation::runEpidSim (this=0x7fff5fbfcb20) at stl_tree.h:1263
#2 0x0000000100016078 in main () at main.cpp:43
The function that exits successfully just before the segmentation fault updates the contents of two containers. One is a boost::unordered_multimap called carriage; it contains one or more struct Infection objects. The other container is of type std::multiset< Event, std::less< Event > > EventPQ called ce.
void Host::recover( int s, double recoverTime, EventPQ & ce ) {
// Clearing all serotypes in carriage
// and their associated recovery events in ce
// and then updating susceptibility to each serotype
double oldRecTime;
int z;
for ( InfectionMap::iterator itr = carriage.begin(); itr != carriage.end(); itr++ ) {
z = itr->first;
oldRecTime = (itr->second).recT;
EventPQ::iterator epqItr = ce.find( Event(oldRecTime) );
assert( epqItr != ce.end() );
ce.erase( epqItr );
immune[ z ]++;
}
carriage.clear();
calcSusc(); // a function that edits an array
cout << "Done with sync_recovery event." << endl;
}
The last cout << line appears immediately before the seg fault.
My idea so far is that the rebalancing is being attempted on ce immediately after this function, but I am unsure why the rebalancing would be failing.
Update
I've confirmed the seg fault goes away (though the program then immediately crashes for other reasons) when I remove ce.erase( epqItr );. I am able to remove events successfully in another place in the code; the code I use there to erase items in ce is identical to what's here.
Backtracing without optimization (thanks, bdk) reveals much more information:
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x000000000000000c
0x00007fff8062b144 in std::_Rb_tree_rebalance_for_erase ()
(gdb) backtrace
#0 0x00007fff8062b144 in std::_Rb_tree_rebalance_for_erase ()
#1 0x00000001000053d2 in std::_Rb_tree, std::less, > std::allocator >::erase (this=0x7fff5fbfdfe8, __position={_M_node = 0x10107cb50}) at > stl_tree.h:1263
#2 0x0000000100005417 in std::multiset, std::allocator >::erase (this=0x7fff5fbfdfe8, __position={_M_node = 0x10107cb50}) at stl_multiset.h:346
#3 0x000000010000ba71 in Simulation::runEpidSim (this=0x7fff5fbfcb40) at Simulation.cpp:426
#4 0x000000010001fb31 in main () at main.cpp:43
Unless Xcode is reading line numbers wrong, the only stl_tree.h on my hard drive is blank on line 1263.
A few people asked to see the function that calls recover. It's a bit complicated:
struct updateRecovery{
updateRecovery( int s, double t, EventPQ & ce ) : s_(s), t_(t), ce_(ce) {}
void operator() (boost::shared_ptr<Host> ptr ) {
ptr->recover( s_, t_, ce_ );
}
private:
int s_;
double t_;
EventPQ & ce_;
};
// allHosts is a boost::multiindex container of boost::shared_ptr< Host >
// currentEvents is the EventPQ container
// it is an iterator to a specific member of allHosts
allHosts.modify( it, updateRecovery( s, t, currentEvents ) );
cout << "done with recovery" << endl;
The last cout prints. The code worked before without this particular version of the recovery function.
Noah Roberts correctly pointed out that the problem is at Simulation.cpp, line 426. Jump below for embarrassing solution.
Possibly you're holding onto an iterator into ce across the call to recover. If recover happens to remove that item the iterator will be invalidated and any future use (say an attempt to erase it) could result in a seg fault.
It would help if we could see more context of how ce is used before and after the call to recover.
The problem was that on line 426 of Simulation.cpp, I tried to delete an event in the EventPQ currentEvents (a.k.a. ce) container that my recover() function had just deleted. The iterator had obviously been invalidated. Dumb.
Lessons:
Debug on code that has not been optimized
Pay close attention to what the non-std related frames imply
And for the future: Trace memory in valgrind
I'm still stumped why the debugger referred me to an apparently blank line in stl_tree.h.
I've massive appreciation here for the people who have helped me work through this. I'm going to revise my question so it's more concise for any future readers.
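One way to avoid this class of bug is to look the element up at the point of erasure instead of keeping an iterator across a call that may already have erased it. A minimal sketch, using a std::multiset<double> as a simplified stand-in for EventPQ (the real Event type is not shown in full above) and a hypothetical helper erase_one:

```cpp
#include <set>

typedef std::multiset<double> EventTimes;  // simplified stand-in for EventPQ

// Erase one event with the given time, if present. Looking the element up
// at the point of use avoids holding an iterator across calls that may
// already have erased it.
bool erase_one(EventTimes& events, double time)
{
    EventTimes::iterator pos = events.find(time);
    if (pos == events.end()) {
        return false;  // already removed elsewhere; nothing to do
    }
    events.erase(pos);  // erases only this element, not every equal key
    return true;
}
```

Calling erase_one a second time for an event that was already removed simply returns false instead of touching an invalidated iterator.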
Perhaps the call to assert is compiled out in your build configuration (for example, when NDEBUG is defined). Assertions in production code are usually a Bad Idea[TM].
You could also be exceeding immune's boundaries.
Try:
if (epqItr != ce.end())
{
ce.erase(epqItr);
if (z is within immune's bounds)
{
++immune[z];
}
}
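The bounds check in the pseudocode above can be made concrete if the container's size is known. The sketch below assumes immune is a std::vector<int>, which is a guess, since its real type is not shown in the question; increment_if_in_range is a hypothetical helper name:

```cpp
#include <cstddef>
#include <vector>

// Increment counts[index] only when index is a valid position.
// Returns false (and changes nothing) for negative or too-large indices.
bool increment_if_in_range(std::vector<int>& counts, int index)
{
    if (index < 0 || static_cast<std::size_t>(index) >= counts.size()) {
        return false;
    }
    ++counts[index];
    return true;
}
```

For a plain array the same check works with the array's element count in place of counts.size().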

What could cause a dynamic_cast to crash?

I have a piece of code looking like this :
TAxis *axis = 0;
if (dynamic_cast<MonitorObjectH1C*>(obj))
axis = (dynamic_cast<MonitorObjectH1C*>(obj))->GetXaxis();
Sometimes it crashes :
Thread 1 (Thread -1208658240 (LWP 11400)):
#0 0x0019e7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x048c67fb in __waitpid_nocancel () from /lib/tls/libc.so.6
#2 0x04870649 in do_system () from /lib/tls/libc.so.6
#3 0x048709c1 in system () from /lib/tls/libc.so.6
#4 0x001848bd in system () from /lib/tls/libpthread.so.0
#5 0x0117a5bb in TUnixSystem::Exec () from /opt/root/lib/libCore.so.5.21
#6 0x01180045 in TUnixSystem::StackTrace () from /opt/root/lib/libCore.so.5.21
#7 0x0117cc8a in TUnixSystem::DispatchSignals ()
from /opt/root/lib/libCore.so.5.21
#8 0x0117cd18 in SigHandler () from /opt/root/lib/libCore.so.5.21
#9 0x0117bf5d in sighandler () from /opt/root/lib/libCore.so.5.21
#10 <signal handler called>
#11 0x0533ddf4 in __dynamic_cast () from /usr/lib/libstdc++.so.6
I have no clue why it crashes. obj is not null (and if it was it would not be a problem, would it ?).
What could be the reason for a dynamic cast to crash ?
If it can't cast, it should just return NULL no ?
Some possible reasons for the crash:
obj points to an object with a non-polymorphic type (a class or struct with no virtual methods, or a fundamental type).
obj points to an object that has been freed.
obj points to unmapped memory, or memory that has been mapped in such a way as to generate an exception when accessed (such as a guard page or inaccessible page).
obj points to an object with a polymorphic type, but that type was defined in an external library that was compiled with RTTI disabled.
Not all of these problems necessarily cause a crash in all situations.
I suggest using a different syntax for this code snippet.
if (MonitorObjectH1C* monitorObject = dynamic_cast<MonitorObjectH1C*>(obj))
{
axis = monitorObject->GetXaxis();
}
You can still crash if some other thread is deleting what monitorObject points to or if obj is crazy garbage, but at least your problem isn't casting related anymore and you're not doing the dynamic_cast twice.
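To illustrate the well-defined behaviour, here is a small self-contained sketch with hypothetical stand-in types (MonitorObject, HistogramObject, TextObject and Axis are not the real ROOT classes). With a valid pointer to a live polymorphic object, a failed cast yields a null pointer, and a null input is passed through as null:

```cpp
struct Axis { int bins; };

// The virtual destructor makes the hierarchy polymorphic, which
// dynamic_cast requires for downcasts.
struct MonitorObject {
    virtual ~MonitorObject() {}
};

struct HistogramObject : MonitorObject {
    Axis axis;
    HistogramObject() { axis.bins = 10; }
    Axis* GetXaxis() { return &axis; }
};

struct TextObject : MonitorObject {};

// Cast once, test the result, and use it inside the same scope.
Axis* x_axis_or_null(MonitorObject* obj)
{
    if (HistogramObject* histogram = dynamic_cast<HistogramObject*>(obj)) {
        return histogram->GetXaxis();
    }
    return nullptr;
}
```

If code of this shape still crashes inside __dynamic_cast, the pointer itself is the likely suspect (freed object, garbage value, or a race with another thread) rather than the cast.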
As it crashes only sometimes, I bet it's a threading issue. Check all references to 'obj':
grep -R 'obj.*=' .
dynamic_cast will return 0 if the cast fails and you are casting to a pointer, which is your case. The problem is that you have either corrupted the heap earlier in your code, or rtti wasn't enabled.
Are you sure that the value of 'obj' has been correctly defined?
If, for example, it is uninitialised (i.e. random), then I could see it causing a crash.
Can the value of obj be changed by a different thread?