(Note to any future readers: the error, unsurprisingly, is in my code and not in std::_Rb_tree_rebalance_for_erase().)
I'm somewhat new to programming and am unsure how to deal with a segmentation fault that appears to be coming from a std function. I hope I'm doing something stupid (i.e., misusing a container), because I have no idea how to fix it.
The precise error is:
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x000000000000000c
0x00007fff8062b144 in std::_Rb_tree_rebalance_for_erase ()
(gdb) backtrace
#0 0x00007fff8062b144 in std::_Rb_tree_rebalance_for_erase ()
#1 0x000000010000e593 in Simulation::runEpidSim (this=0x7fff5fbfcb20) at stl_tree.h:1263
#2 0x0000000100016078 in main () at main.cpp:43
The function that exits successfully just before the segmentation fault updates the contents of two containers. One is a boost::unordered_multimap called carriage; it contains one or more struct Infection objects. The other is ce, of type EventPQ, where EventPQ is a typedef for std::multiset< Event, std::less< Event > >.
void Host::recover( int s, double recoverTime, EventPQ & ce ) {
// Clearing all serotypes in carriage
// and their associated recovery events in ce
// and then updating susceptibility to each serotype
double oldRecTime;
int z;
for ( InfectionMap::iterator itr = carriage.begin(); itr != carriage.end(); itr++ ) {
z = itr->first;
oldRecTime = (itr->second).recT;
EventPQ::iterator epqItr = ce.find( Event(oldRecTime) );
assert( epqItr != ce.end() );
ce.erase( epqItr );
immune[ z ]++;
}
carriage.clear();
calcSusc(); // a function that edits an array
cout << "Done with sync_recovery event." << endl;
}
The last cout << line appears immediately before the seg fault.
My idea so far is that the rebalancing is being attempted on ce immediately after this function, but I am unsure why the rebalancing would be failing.
Update
I've confirmed the seg fault goes away (though the program then immediately crashes for other reasons) when I remove ce.erase( epqItr );. I am able to remove events successfully in another place in the code; the code I use there to erase items in ce is identical to what's here.
Backtracing without optimization (thanks, bdk) reveals much more information:
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x000000000000000c
0x00007fff8062b144 in std::_Rb_tree_rebalance_for_erase ()
(gdb) backtrace
#0 0x00007fff8062b144 in std::_Rb_tree_rebalance_for_erase ()
#1 0x00000001000053d2 in std::_Rb_tree, std::less, std::allocator >::erase (this=0x7fff5fbfdfe8, __position={_M_node = 0x10107cb50}) at stl_tree.h:1263
#2 0x0000000100005417 in std::multiset, std::allocator >::erase (this=0x7fff5fbfdfe8, __position={_M_node = 0x10107cb50}) at stl_multiset.h:346
#3 0x000000010000ba71 in Simulation::runEpidSim (this=0x7fff5fbfcb40) at Simulation.cpp:426
#4 0x000000010001fb31 in main () at main.cpp:43
Unless Xcode is reading line numbers wrong, the only stl_tree.h on my hard drive is blank at line 1263.
A few people asked to see the function that calls recover. It's a bit complicated:
struct updateRecovery{
updateRecovery( int s, double t, EventPQ & ce ) : s_(s), t_(t), ce_(ce) {}
void operator() (boost::shared_ptr<Host> ptr ) {
ptr->recover( s_, t_, ce_ );
}
private:
int s_;
double t_;
EventPQ & ce_;
};
// allHosts is a boost::multiindex container of boost::shared_ptr< Host >
// currentEvents is the EventPQ container
// it is an iterator to a specific member of allHosts
allHosts.modify( it, updateRecovery( s, t, currentEvents ) );
cout << "done with recovery" << endl;
The last cout prints. The code worked before without this particular version of the recovery function.
Noah Roberts correctly pointed out that the problem is at Simulation.cpp, line 426. Jump below for the embarrassing solution.
Possibly you're holding onto an iterator into ce across the call to recover. If recover happens to remove that item the iterator will be invalidated and any future use (say an attempt to erase it) could result in a seg fault.
It would help if we could see more context of how ce is used before and after the call to recover.
The problem was that on line 426 of Simulation.cpp, I tried to delete an event in the EventPQ currentEvents (a.k.a. ce) container that my recover() function had just deleted. The iterator had obviously been invalidated. Dumb.
Lessons:
Debug on code that has not been optimized
Pay close attention to what the non-std related frames imply
And for the future: trace memory errors with Valgrind
I'm still stumped why the debugger referred me to an apparently blank line in stl_tree.h.
I have massive appreciation for the people who helped me work through this. I'm going to revise my question so it's more concise for any future readers.
Perhaps the assert is not compiled into your build configuration. Assertions in production code are usually a Bad Idea[TM].
You could also be exceeding immune's boundaries.
Try:
if (epqItr != ce.end())
{
ce.erase(epqItr);
if (z >= 0 && z < IMMUNE_SIZE) // IMMUNE_SIZE: hypothetical name for the number of elements in immune
{
++immune[z];
}
}
Related
My program crashes before the main() function runs. I determined this using cerr:
int main(int argc, char **argv)
{
cerr << " MAAIN " << endl;
// ... rest of main ...
}
The message from gdb:
Reading symbols for shared libraries ...........+++............................ done
CA(34652) malloc: *** error for object 0x7fff76694860: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Program received signal SIGABRT, Aborted.
0x00007fff88e1782a in __kill ()
(gdb) bt
#0 0x00007fff88e1782a in __kill ()
#1 0x00007fff8c4d2a9c in abort ()
#2 0x00007fff8c53184c in free ()
#3 0x00000001026a1db0 in std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::overflow ()
(gdb)
I have checked the code for delete and free calls. All deletes are done with a check like this:
if (x) delete x;
Please help me answer two questions:
1. What could the problem be?
2. How do I find it? (I have a large codebase with many files, built with CMake.)
P.S. I read "Is there any way a C/C++ program can crash before main()?", but looking at the gdb message, I suppose the libraries are OK.
cout is not a good way to check where your program is crashing, because cout does not immediately flush its buffer, and your program may crash after the cout statement but before the buffer is flushed. It's better to check with cerr instead of cout.
Also, before main() runs, the constructors of global variables are called, so take a look at them if you think the program crashes before it starts.
Another possibility is huge arrays declared as local variables in main(): their stack space is reserved when main() is entered, so the program can crash before any statement in main() appears to run. If the arrays are huge, allocate them with new instead.
std::basic_stringbuf<char, std::char_traits<char>... tells me that it's std::string that is going wrong. One quite possible scenario is that something is trying to free a string that hasn't been constructed correctly or that has been overwritten by careless use of arrays.
Or you are relying on some global variables in different source files, so you have something like this:
// main.cpp:
...
extern string str; // str lives in another .cpp file
....
myclass x(str); // Construction using str.
// otherfile.cpp
string str("foobar");
In this case, str may not be constructed by the time x is being constructed, and the string is "invalid".
There are oodles of other possibilities along similar lines.
The address here: 0x7fff76694860 is on the stack. If it's always the same value, you could try to track down where it is.
Sorry if I couldn't come up with a better title for my question.
I was debugging my program when I noticed something very interesting. The code is very straightforward; please follow my comments inline:
//my session class
class Session
{
public:
/// Constructor.
Session(boost::asio::io_service &io_service)
: socket_(io_service)
{
}
boost::asio::ip::tcp::socket& socket()
{
return socket_;
}
void async_read(/*...*/);
void async_write(/*...*/);
//blah blah
private:
std::vector<char> inbound_data_;//<---note this variable, but don't mind it until i tell you
std::string outbound_data_;
boost::asio::ip::tcp::socket socket_;
};
typedef boost::shared_ptr<Session> session_ptr; //just for easy reading
//and this is my connection server class
class ConnectionServer {
public:
void CreatSocketAndAccept() {
session_ptr new_sess(new Session(io_service_));//<--I created a scope limited shared_ptr
Print()<< "new_sess.use_count()= " << new_sess.use_count() << std::endl;//prints 1
acceptor_.async_accept(new_sess->socket(),//<-used it for async connection acceptance
boost::bind(&ConnectionServer::handle_accept, this,
boost::asio::placeholders::error, new_sess));
Print()<< "new_sess.use_count()= " << new_sess.use_count() << std::endl;//prints 2
}//<-- Scope is ending. what happens to my new_sess? who keeps a copy of my session?
//and now the strangest thing:
void handle_accept(const boost::system::error_code& e, session_ptr sess) {
if (!e) {
Print()<< "sess.use_count()= " << sess.use_count() << std::endl;//prints 4 !!!! while I have never copied the session anywhere else in between
Print() << "Connection Accepted" << std::endl;
handleNewClient(sess);
}
else
{
std::cout << "Connection Refused" << std::endl;
}
CreatSocketAndAccept();
}
// ... acceptor_, io_service_ and other members ...
};
I don't know who (in boost::asio) copies my shared_ptr internally, or when all of those copies are going to be released.
In fact, I noticed this situation when:
my application runs to completion, and while containers full of nested shared_ptr-managed objects are being cleaned up (automatically, not by me),
I get a seg fault after ~Session() is called, where the program is trying to deal with a std::vector<char> (the variable I asked you to remember at the beginning).
I could see this in the Eclipse debugger.
I am not good at reading seg faults, but I guess the program is trying to clear a vector that doesn't exist.
Sorry for the long question.
I value your time and appreciate your kind comments.
EDIT-1:
I just modified my application to use raw pointers for creating new Session objects rather than shared_ptr. The seg fault is gone if I don't delete the Session, so at least I am sure the cause of the seg fault is in Session.
EDIT-2:
As I mentioned in my previous update, the problem occurs when I try to delete the session but every time the trace leading to the seg fault is different.
Sometimes this:
Basic Debug [C/C++ Application]
SimMobility_Short [10350] [cores: 0]
Thread [1] 10350 [core: 0] (Suspended : Signal : SIGSEGV:Segmentation fault)
malloc_consolidate() at malloc.c:4,246 0x7ffff5870e20
malloc_consolidate() at malloc.c:4,215 0x7ffff5871b19
_int_free() at malloc.c:4,146 0x7ffff5871b19
__gnu_cxx::new_allocator<char>::deallocate() at new_allocator.h:100 0xa4ab4a
std::_Vector_base<char, std::allocator<char> >::_M_deallocate() at stl_vector.h:175 0xab9508
std::_Vector_base<char, std::allocator<char> >::~_Vector_base() at stl_vector.h:161 0xabf8c7
std::vector<char, std::allocator<char> >::~vector() at stl_vector.h:404 0xabeca4
sim_mob::Session::~Session() at Session.hpp:35 0xabea8d
safe_delete_item<sim_mob::Session>() at LangHelpers.hpp:136 0xabef31
sim_mob::ConnectionHandler::~ConnectionHandler() at ConnectionHandler.cpp:40 0xabd7e6
<...more frames...>
gdb
and sometimes this:
Basic Debug [C/C++ Application]
SimMobility_Short [10498] [cores: 1]
Thread [1] 10498 [core: 1] (Suspended : Signal : SIGSEGV:Segmentation fault)
_int_free() at malloc.c:4,076 0x7ffff5871674
std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() at 0x7ffff639d540
sim_mob::ConnectionHandler::~ConnectionHandler() at ConnectionHandler.cpp:30 0xabd806
boost::checked_delete<sim_mob::ConnectionHandler>() at checked_delete.hpp:34 0xadd482
boost::detail::sp_counted_impl_p<sim_mob::ConnectionHandler>::dispose() at sp_counted_impl.hpp:78 0xadd6a2
boost::detail::sp_counted_base::release() at sp_counted_base_gcc_x86.hpp:145 0x849d5e
boost::detail::shared_count::~shared_count() at shared_count.hpp:305 0x849dd7
boost::shared_ptr<sim_mob::ConnectionHandler>::~shared_ptr() at shared_ptr.hpp:164 0x84a668
sim_mob::ClientHandler::~ClientHandler() at ClientHandler.cpp:42 0xac726d
sim_mob::ClientHandler::~ClientHandler() at ClientHandler.cpp:45 0xac72da
<...more frames...>
gdb
Does it mean my memory is already corrupted? How can I do more checks? Thank you.
This line is where the magic lives:
acceptor_.async_accept(new_sess->socket(),//<-used it for async connection acceptance
boost::bind(&ConnectionServer::handle_accept, this,
boost::asio::placeholders::error, new_sess));
The second parameter of async_accept is a completion handler, which you are supplying here. You are using boost::bind to create a functor that matches the completion handler's signature, and you are passing the new_sess smart pointer to that handler (this is why the smart pointer is not deleted when you leave the scope).
In other words: async_accept takes a functor that it will call with an error code. You could write a class that overloads operator() with that signature; instead, you use boost::bind. boost::bind lets you provide each parameter of the inner function either when the functor is called (via placeholders such as boost::asio::placeholders::error) or when the functor is constructed by calling boost::bind. You provided one parameter at construction time: the smart pointer to the session.
This is a common pattern with boost::asio. You pass your context to the asynchronous function. When this function detects an error all you need to do is to leave the function. The context then leaves the scope and will be deleted. When no error is detected you pass the context (via boost::bind) to the next async function and the context will be kept alive.
You should be able to use shared_ptr in that way, I use it in the same manner without issue.
Internally, asio keeps a copy of your shared_ptr (via boost::bind) until it calls handle_accept. This is what allows you to pass the shared_ptr in the first place. If you did not add it as one of the arguments, the object would be cleaned up as soon as new_sess went out of scope in the function where you created it.
I suspect that you have some other undefined behavior that the raw-pointer version simply does not expose.
To (try to) answer your second question: it seems like you are issuing a double delete on the session. This can happen if you create a second shared_ptr from a raw pointer that is already owned by a shared_ptr, which is something you shouldn't do. Are you passing a raw pointer to the session to any function that in turn creates a shared_ptr from it?
You could try letting Session inherit from enable_shared_from_this. Code that currently builds a new shared_ptr from the raw pointer could then call shared_from_this() instead, which returns a shared_ptr that shares the existing reference count (constructing a shared_ptr directly from the raw pointer would still create a separate count). But you should not see this as the real fix. The real fix is to eliminate the multiple independent shared_ptr instantiations.
Edit: Added another debug possibility
Something else you could try would be to set a breakpoint in the destructor of the session and see the backtrace of the first/second delete.
As covered in this answer, it is fine to use shared pointers with Boost.Asio's async_* functions.
Based on the call stacks and behavior, it looks as though at least one resource is being deleted twice. Is it possible that Session is being managed through both a raw pointer and a shared_ptr?
Managing with boost::shared_ptr:
void ConnectionServer::CreatSocketAndAccept() {
session_ptr new_sess(new Session(io_service_)); // shared pointer
...
}
Managing with raw-pointer:
sim_mob::Session::~Session()
safe_delete_item<sim_mob::Session>() // raw pointer
sim_mob::ConnectionHandler::~ConnectionHandler()
If ConnectionHandler was managing Session with boost::shared_ptr, then the call stack should show boost::shared_ptr<sim_mob::Session>::~shared_ptr(). Also, be careful not to create a shared_ptr from a raw pointer that is already being managed by a shared_ptr, as it will result in the shared_ptrs managing the resource as two distinct resources, resulting in a double deletion:
// p1 and p2 use the same reference count to manage the int.
boost::shared_ptr<int> p1(new int(42));
boost::shared_ptr<int> p2(p1); // good
// p3 uses a different reference count, causing int to be managed
// as if it was a different resource.
boost::shared_ptr<int> p3(p1.get()); // bad
As a side note, one common idiom is to have Session inherit from enable_shared_from_this. It allows for Session to remain alive throughout the duration of its asynchronous call chains by passing the shared pointer as a handle to the instance in place of this. For example, it would allow for Session to remain alive while an asynchronous read operation is outstanding, as long as the result of shared_from_this() is bound as the instance handle to the Session::async_read callback.
I have done some searches on here, MSDN, and through some other forums via Google trying to find any sort of solution to this, but so far am stuck.
I have been looking for a week, trying to track down an access violation error in my C++ program. I can't really post code here as it is under some IP restrictions, but basically, it is a loop that runs roughly every 100 ms, reading bytes from a TCP connection and placing them onto the back of a std::queue.
After I notice a particular byte sequence come through, I then remove x bytes from the queue and handle them as a message defined in an internal protocol.
What happens is that, somewhere inside my application, the queue becomes corrupted and crashes the application. Paired with the fact that it is an access violation, it must be a dodgy pointer somewhere.
I have tried to use the VS2005 debugger and WinDbg to find it. I had call stacks to look at, but they weren't much help. All I could work out from them is that the cause is corruption of my internal queue. The crash happens because the header of the message gets sent to be parsed, but because it is corrupted, everything falls over.
Then I tried Intel Thread Checker, but that is far too slow to use in this application, as my program is part of a synchronous multi-threaded system.
Sometimes it will run for 300 reads... sometimes it can do 5000 reads... sometimes it can do 10000 reads before it crashes.
What are some other routes of diagnosis I can try? Am I missing something simple that I should have checked already? From what I can see, anything being newed has a matching delete, and I am using Boost libraries for shared pointers and auto pointers on long-lived objects.
Use SEH (structured exception handling) to find out which part raises the AV (access violation).
SEH in C++ example code from MSDN.
#include <stdio.h>
#include <windows.h>
#include <eh.h>
void SEFunc();
void trans_func( unsigned int, EXCEPTION_POINTERS* );
class SE_Exception
{
private:
unsigned int nSE;
public:
SE_Exception() : nSE( 0 ) {}
SE_Exception( unsigned int n ) : nSE( n ) {}
~SE_Exception() {}
unsigned int getSeNumber() { return nSE; }
};
// Compile with /EHa: _set_se_translator requires asynchronous exception handling.
int main( void )
{
try
{
_set_se_translator( trans_func );
SEFunc();
}
catch( SE_Exception& e )
{
printf( "Caught a __try exception with SE_Exception, code 0x%x.\n", e.getSeNumber() );
}
}
void SEFunc()
{
__try
{
int x, y=0;
x = 5 / y;
}
__finally
{
printf( "In finally\n" );
}
}
void trans_func( unsigned int u, EXCEPTION_POINTERS* pExp )
{
printf( "In trans_func.\n" );
throw SE_Exception( u ); // pass the SEH exception code along
}
Random crashes are usually caused by heap corruption, which is hard to find. Over the past years I have dealt with several heap corruption problems; as I remember, one of them took me a whole weekend to track down. Here are some suggestions:
1. Try Application Verifier first. Details are at:
http://msdn.microsoft.com/en-us/library/windows/desktop/dd371695(v=vs.85).aspx
2. GFlags:
http://msdn.microsoft.com/en-us/library/windows/hardware/ff549557(v=vs.85).aspx
Use it to enable page heap verification.
Solutions 1 and 2 both apply heap verification to your whole program, so you may get many exceptions and a slower program, and some of those exceptions may be unrelated to your problem. If you know which part of the code has errors, you can use the debug CRT function _CrtSetDbgFlag to enable heap verification only around that code, something like this:
#include <crtdbg.h> // debug CRT; these calls only take effect in _DEBUG builds
int oldFlag = _CrtSetDbgFlag( _CRTDBG_REPORT_FLAG ); // query the current flags
_CrtSetDbgFlag( oldFlag | _CRTDBG_CHECK_ALWAYS_DF ); // verify the heap on every alloc and dealloc
// your code here; if the heap is corrupt, the debug CRT reports it at the next allocation or deallocation
_CrtSetDbgFlag( oldFlag ); // restore the previous flags: stop verifying the heap
I am debugging a multithreaded (pthreads) C++ program on Linux.
It works well when the number of threads is small, such as 1, 2, or 3.
When the number of threads is increased, I get SIGSEGV (segmentation fault, UNIX signal 11).
But the error sometimes appears and sometimes disappears when I increase the number of threads above 4.
I ran it under Valgrind and got:
==29655== Process terminating with default action of signal 11 (SIGSEGV)
==29655== Access not within mapped region at address 0xFFFFFFFFFFFFFFF8
==29655== at 0x3AEB69CA3E: std::string::assign(std::string const&) (in /usr/lib64/libstdc++.so.6.0.8)
==29655== by 0x42A93C: bufferType::getSenderID(std::string&) const (boundedBuffer.hpp:29)
It seems that my code tried to read memory that is not allocated.
But I cannot find any bugs in the function getSenderID(). It only returns a string data member of class bufferType, which has been initialized.
I used GDB and DDD (a GDB GUI) to find the bug, and they also point there, but because the error sometimes disappears, I cannot capture it with a breakpoint in GDB.
Moreover, I also printed out values in the function Valgrind pointed to, but that is not helpful because multiple threads print their results in different orders and interleave with each other. The output is different every time I run the code.
The bufferType objects are stored in a map, which may have multiple entries. Each entry can be written by one thread and read by another thread at the same time. I have used a pthread read/write lock (pthread_rwlock_t) to protect them. Now there is no SIGSEGV, but the program stops at some point without making progress. I think this is a deadlock. But one map entry can only be written by one thread at any point in time, so why is there still a deadlock?
Could you recommend some methods to capture the bug so that I can find it no matter how many threads I use to run the code?
Thanks.
The code of boundedBuffer.hpp is as follows:
class bufferType
{
private:
string senderID;// who write the buffer
string recvID; // who should read the buffer
string arcID; // which arc is updated
double price; // write node's price
double arcValue; // this arc flow value
bool updateFlag ;
double arcCost;
int arcFlowUpBound;
//boost::mutex senderIDMutex;
//pthread_mutex_t senderIDMutex;
pthread_rwlock_t senderIDrwlock;
pthread_rwlock_t setUpdateFlaglock;
public:
//typedef boost::mutex::scoped_lock lock; // synchronous read / write
bufferType(){}
void getPrice(double& myPrice ) const {myPrice = price;}
void getArcValue(double& myArcValue ) const {myArcValue = arcValue;}
void setPrice(double& myPrice){price = myPrice;}
void setArcValue(double& myValue ){arcValue = myValue;}
void readBuffer(double& myPrice, double& myArcValue );
void writeBuffer(double& myPrice, double& myArcValue );
void getSenderID(string& myID)
{
//boost::mutex::scoped_lock lock(senderIDMutex);
//pthread_rwlock_rdlock(&senderIDrwlock);
cout << "senderID is " << senderID << endl ;
myID = senderID;
//pthread_rwlock_unlock(&senderIDrwlock);
}
//void setSenderID(string& myID){ senderID = myID ;}
void setSenderID(string& myID)
{
pthread_rwlock_wrlock(&senderIDrwlock);
senderID = myID ;
pthread_rwlock_unlock(&senderIDrwlock);
}
void getRecvID(string& myID) const {myID = recvID;}
void setRecvID(string& myID){ recvID = myID ;}
void getArcID(string& myID) const {myID = arcID ;}
void setArcID(string& myID){arcID = myID ;}
void getUpdateFlag(bool& myFlag)
{
myFlag = updateFlag ;
if (updateFlag)
updateFlag = false;
}
//void setUpdateFlag(bool myFlag){ updateFlag = myFlag ;}
void setUpdateFlag(bool myFlag)
{
pthread_rwlock_wrlock(&setUpdateFlaglock);
updateFlag = myFlag ;
pthread_rwlock_unlock(&setUpdateFlaglock);
}
void getArcCost(double& myc) const {myc = arcCost; }
void setArcCost(double& myc){ arcCost = myc ;}
void setArcFlowUpBound(int& myu){ arcFlowUpBound = myu ;}
int getArcFlowUpBound(){ return arcFlowUpBound ;}
//double getLastPrice() const {return price; }
} ;
From the code, you can see that I have tried to use read/write locks to maintain invariants.
Each entry in the map has a buffer like the one above. Now I get a deadlock.
Access not within mapped region at address 0xFFFFFFFFFFFFFFF8
at 0x3AEB69CA3E: std::string::assign(std::string const&)
This would normally mean that you are assigning to a string* that was NULL, and then got decremented. Example:
#include <string>
int main()
{
std::string *s = NULL;
--s;
s->assign("abc");
}
g++ -g t.cc && valgrind -q ./a.out
...
==20980== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==20980== Access not within mapped region at address 0xFFFFFFFFFFFFFFF8
==20980== at 0x4EDCBE6: std::string::assign(char const*, unsigned long)
==20980== by 0x400659: main (/tmp/t.cc:8)
...
So show us the code in boundedBuffer.hpp (with line numbers), and think how that code could end up with a string pointer that points at -8.
Would you please recommend some methods to capture the bug so that I can find it no matter how many threads I use to run the code.
When thinking about multi-threaded programs, you must think about invariants. You should put assertions to confirm that your invariants do hold. You should think how they might be violated, and what violations would cause the post-mortem state you have observed.
Do you have any cases where an object (such as a string) is accessed in one thread while another thread is, or might be, modifying it? That's the usual cause of a problem like this.
Look at your instance of bufferType.
When was it instantiated?
If it was instantiated before threads were spawned, and then one of the threads modified it, you have a race condition without a lock.
Also, watch out for any static variables anywhere near or inside that bufferType.
From the looks of it, one of the threads has probably modified the member returned by getSenderID().
If none of these problems is causing your error, try using Valgrind's DRD tool.
What are the possible reasons for the situation described in the title? Here's what my bt looks like:
#0 0x00a40089 in ?? ()
#1 0x09e3fac0 in ?? ()
#2 0x09e34f30 in ?? ()
#3 0xb7ef9074 in ?? ()
#4 0xb7ef9200 in ?? ()
#5 0xb7ef9028 in ?? ()
#6 0x081d45a0 in LogFile::Flush ()
#7 0x081d45a0 in LogFile::Flush ()
#8 0x081d46e0 in LogFile::Close ()
#9 0x081d4dbf in LogFile::OpenLogFile ()
#10 0x081d4eb9 in LogFile::PerformPeriodicalFlush ()
#11 0x081d4fca in LogFile::StoreRecord ()
#12 0x081d50c2 in LogFile::StoreRecord ()
and it gives me Program terminated with signal 11, Segmentation fault.
The wrapper around fflush() is simple: it does nothing except call fflush and check for errors (whether the return code is < 0). So I guess the seg fault is caused by fflush. Or could it be somewhere else, given the ?? frames at the top of the stack?
OS: RHEL5; gcc version 3.4.6 20060404 (Red Hat 3.4.6-3); debugged with gdb, with the original exe with max debug information in it.
I know about seg faults when there is no space left on the disk, but that is not the case here (I have a watchdog for the application that restarts the program, and everything keeps working just fine).
Any ideas would be helpful.
Thanks.
EDIT
void LogFile::PerformPeriodicalFlush( const utils::dt::TimeStamp& tsNow )
throw( LibCException )
{
m_tsLastPeriodicalCheck = tsNow;
struct stat LogFileStat;
int nResult = stat( m_sCurrentFullFileName.c_str(), &LogFileStat );
if ( 0 == nResult && S_ISREG( LogFileStat.st_mode ) )
{
//we successfuly stated the file, so it exists. We can safely perform
//a flush.
try
{
Flush();
return;
}
catch ( LibCException& )
{
OpenLogFile( tsNow );
return;
}
}
else
{
OpenLogFile( tsNow );
}
}
void RotatingLogFile::Flush() throw( object::LibCException )
{
if ( m_pFile != NULL )
{
if ( fflush( m_pFile ) < 0 )
{
throw object::LibCException();
}
}
}
NOTE: I can't paste the whole code; it is part of more than 10,000 lines. Also, this code has been working for years in different applications, on real-time systems. Such crashes are very, very rare, roughly twice a year, so I don't think this is a problem in the code. I know that no one can really help me with this kind of thing, which is why I'm just asking for any ideas about why fflush may cause a seg fault.
My guess: you have memory corruption somewhere and LogFile's "this" points to a memory area that you can't access.
Anyway, it's difficult to tell without code.
It turned out that, for some reason, something strange had happened with the permissions (not sure what exactly). This happened on an hour change, because a different file is written for each hour. So somehow the file was created, but there were no permissions to write to it, or something like that. No one actually understood what happened, why, or how (because after the crash, the application was restarted and everything was perfectly fine). So flush crashed because it had no permission to write.
It's still a mystery... but solved xD
You don't provide the code for Flush(), but it sounds strange to me that it is called twice. In fact, it seems that it calls itself. This may cause a resource leak, depending on the implementation of Flush().
Run your program under valgrind, it will help you find the source of where your application's memory is corrupted.