With different gcc optimization levels, my program dies from different OS signals, and I wonder whether the cause is the same or not.
I was getting a core dump due to an abort() in a multithreaded C++ program compiled with -O2.
Program terminated with signal 6, Aborted.
#0 0x00007ff2572d28a5 in raise () from /lib64/libc.so.6
I was not able to find the cause, as the abort seems to happen in the destructor of a local std::vector, which made no
sense to me.
(gdb) thread 1
[Switching to thread 1 (Thread 0x7ff248d6c700 (LWP 16767))]#0 0x00007ff2572d28a5 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007ff2572d28a5 in raise () from /lib64/libc.so.6
#1 0x00007ff2572d4085 in abort () from /lib64/libc.so.6
#2 0x00007ff25730fa37 in __libc_message () from /lib64/libc.so.6
#3 0x00007ff257315366 in malloc_printerr () from /lib64/libc.so.6
#4 0x00007ff257317e93 in _int_free () from /lib64/libc.so.6
#5 0x000000000044dd45 in deallocate (this=0x7ff250389610) at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/ext/new_allocator.h:95
#6 _M_deallocate (this=0x7ff250389610) at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/bits/stl_vector.h:146
#7 ~_Vector_base (this=0x7ff250389610) at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/bits/stl_vector.h:132
#8 ~vector (this=0x7ff250389610) at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/bits/stl_vector.h:313
#9 ...
Studying the code more closely, I realized that the vector was initialized from another vector coming from another thread and,
here is the point, no mutex was used to do that. To simplify,
I wrote this code that reproduces the problem (please ignore that stopThread is not protected).
void* doWork(void*)
{
while(!stopThread)
{
double min = std::numeric_limits<int>::max();
double max = std::numeric_limits<int>::min();
pthread_mutex_lock(&_mutex);
std::vector<double> localVector = (sharedVector);
sharedVector.clear();
pthread_mutex_unlock(&_mutex);
for(unsigned int index = 0; index < localVector.size(); ++index)
{
std::cout << "Thread 2 " << localVector[index] << ", " << std::endl;
if(min > localVector[index])
{
min = localVector[index];
}
if(max < localVector[index])
{
max = localVector[index];
}
}
}
return NULL;
}
int main()
{
pthread_mutex_init(&_mutex, NULL);
stopThread = false;
pthread_create(&_thread, NULL, doWork, NULL);
for(int i = 0; i < 10000; i++)
{
sharedVector.push_back(i);
std::cout << "Thread 1 " << i << std::endl;
usleep(5000);
}
stopThread = true;
pthread_join(_thread, NULL);
pthread_cancel(_thread);
std::cout << "Finished! " << std::endl;
}
I fixed that, but I cannot say that I solved the problem (I know I fixed a problem, but not necessarily the one I was looking for), as the core dump happens about once per month.
So I decided to compile with -O0 to see if I could get more details in the core file, and then I forced the program to crash. Now what I get is a segfault where I expected it.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f4598f70cd7 in memmove () from /lib64/libc.so.6
(gdb) bt
#0 0x00007f4598f70cd7 in memmove () from /lib64/libc.so.6
#1 0x000000000045fb84 in std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<double> (__first=0x7f4580977ba0, __last=0x7f4580977ba8, __result=0x0)
at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/bits/stl_algobase.h:378
#2 0x0000000000465f01 in std::__copy_move_a<false, double const*, double*> (__first=0x7f4580977ba0, __last=0x7f4580977ba8, __result=0x0) at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/bits/stl_algobase.h:397
#3 0x0000000000465e66 in std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*> (__first=4.3559999999999999, __last=3.1560000000000001, __result=0x0)
at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/bits/stl_algobase.h:436
#4 0x0000000000465d6d in std::copy<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*> (__first=4.3559999999999999, __last=3.1560000000000001, __result=0x0)
at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/bits/stl_algobase.h:468
#5 0x0000000000465c84 in std::__uninitialized_copy<true>::uninitialized_copy<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*> (__first=4.3559999999999999, __last=3.1560000000000001,
__result=0x0) at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/bits/stl_uninitialized.h:93
#6 0x0000000000465ad9 in std::uninitialized_copy<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*> (__first=4.3559999999999999, __last=3.1560000000000001, __result=0x0)
at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/bits/stl_uninitialized.h:117
#7 0x0000000000465718 in std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*, double> (__first=4.3559999999999999, __last=3.1560000000000001, __result=0x0)
at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/bits/stl_uninitialized.h:257
#8 0x00000000004650f9 in std::vector<double, std::allocator<double> >::vector (this=0x7f4594d90d70, __x=std::vector of length 1, capacity 4 = {...})
at /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/bits/stl_vector.h:243
#9 ...
I looked for documentation, but I found nothing saying that the type of error can change with the optimization level.
However, when I run the code above, which reproduces the problem, a segmentation fault happens when it is compiled with -O0, but when it is compiled with -O2
it finishes fine.
Thanks for your time
You're locking the mutex when the worker thread accesses the shared vector, but not when the main thread modifies it. You need to guard all accesses to shared mutable data.
for(int i = 0; i < 10000; i++)
{
pthread_mutex_lock(&_mutex); // Add this
sharedVector.push_back(i);
pthread_mutex_unlock(&_mutex); // Add this
std::cout << "Thread 1 " << i << std::endl;
usleep(5000);
}
You might also consider using a condition variable to notify the worker thread when the vector changes, so that the worker doesn't consume resources busy-waiting.
Related
I'm trying to get the fps from multiple cameras. This is the main:
int main(int argc, char* argv[])
{
//Load all cameras IP
map<string, string> camerasIp; //camera ID and Streaming URL
LoadConfig(&camerasIp);
//Get FPS of all cameras
map<string,string>::iterator it_cam;
while(true)
{
for(it_cam = camerasIp.begin(); it_cam != camerasIp.end(); ++it_cam)
{
GetCamFps(it_cam->second);
}
cout << "" << endl;
}
//Send FPS to server
...
return 0;
}
This is the GetCamFps method:
void GetCamFps(string url)
{
cout << "VideoCapture" << endl;
VideoCapture video(url);
cout << "Get frames" << endl;
double fps = video.get(CAP_PROP_FPS);
cout <<"Frames: " << fps << endl;
video.release();
}
And this is the output:
VideoCapture
Get frames
Frames: 30
VideoCapture
corrupted double-linked list
Aborted (core dumped)
I tried adding a sleep to leave some time between opening one URL and the next, but it didn't work. I checked the map and it's correct.
When I comment out the first two cout statements in the method it works, but after a while it fails again.
Output of gdb with backtrace:
Thread 1 "Fps_Monitoring" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) backtrace
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ffff51cc801 in __GI_abort () at abort.c:79
#2 0x00007ffff5215897 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff5342b9a "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3 0x00007ffff521c90a in malloc_printerr (str=str@entry=0x7ffff5340cba "corrupted double-linked list") at malloc.c:5350
#4 0x00007ffff521cac4 in malloc_consolidate (av=av@entry=0x7ffff5577c40 <main_arena>) at malloc.c:4456
#5 0x00007ffff52207d8 in _int_malloc (av=av@entry=0x7ffff5577c40 <main_arena>, bytes=bytes@entry=1600) at malloc.c:3703
#6 0x00007ffff52214eb in _int_memalign (av=0x7ffff5577c40 <main_arena>, alignment=64, bytes=<optimized out>) at malloc.c:4694
#7 0x00007ffff5226fba in _mid_memalign (address=<optimized out>, bytes=1496, alignment=<optimized out>) at malloc.c:3314
#8 __posix_memalign (memptr=0x7fffffffdad0, alignment=<optimized out>, size=1496) at malloc.c:5369
#9 0x00007ffff028e7e3 in av_malloc () from /usr/local/lib/libavutil.so.56
#10 0x00007ffff06b253b in avformat_alloc_context () from /usr/local/lib/libavformat.so.58
#11 0x00007ffff5f8b9a3 in CvCapture_FFMPEG::open(char const*) () from /usr/local/lib/libopencv_videoio.so.4.2
#12 0x00007ffff5f8e9ff in cv::cvCreateFileCapture_FFMPEG_proxy(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
from /usr/local/lib/libopencv_videoio.so.4.2
#13 0x00007ffff5f71566 in cv::StaticBackend::createCapture(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const ()
from /usr/local/lib/libopencv_videoio.so.4.2
#14 0x00007ffff5f4cc17 in cv::VideoCapture::open(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int) ()
from /usr/local/lib/libopencv_videoio.so.4.2
#15 0x00007ffff5f4f595 in cv::VideoCapture::VideoCapture(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int) ()
from /usr/local/lib/libopencv_videoio.so.4.2
#16 0x000055555555f9b3 in GetCamFps(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) ()
#17 0x000055555555d5df in main ()
Any idea why the second VideoCapture doesn't work?
Thanks.
Short: a (network) Client has a std::shared_ptr<Session>, when a client is destroyed, from ~Client the function queueSessionRemoval(const std::shared_ptr<Session> &session) is called, which adds it to std::map<std::chrono::seconds, std::vector<std::weak_ptr<Session>>> queuedSessionRemovals. This crashes with "double free or corruption".
Simplified code:
class Store
{
std::map<std::chrono::seconds, std::vector<std::weak_ptr<Session>>> queuedSessionRemovals;
std::mutex queuedSessionRemovalsMutex;
};
Client::~Client()
{
// removed irrelevant stuff.
// session is std::shared_ptr<Session>
store->queueSessionRemoval(session);
}
void Store::expireSessions()
{
std::lock_guard<std::mutex>(this->queuedSessionRemovalsMutex);
// Iterate over queuedSessionRemovals, etc, etc
// Removed; not important
}
void Store::queueSessionRemoval(const std::shared_ptr<Session> &session)
{
if (!session)
return;
auto removeAt = std::chrono::steady_clock::now() + std::chrono::seconds(session->getSessionExpiryInterval());
std::chrono::seconds secondsSinceEpoch = std::chrono::duration_cast<std::chrono::seconds>(removeAt.time_since_epoch());
std::lock_guard<std::mutex>(this->queuedSessionRemovalsMutex);
queuedSessionRemovals[secondsSinceEpoch].push_back(session);
}
The line that crashes is queuedSessionRemovals[secondsSinceEpoch].push_back(session);
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007fc09ab93859 in __GI_abort () at abort.c:79
#2 0x00007fc09abfe29e in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fc09ad28298 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3 0x00007fc09ac0632c in malloc_printerr (str=str@entry=0x7fc09ad2a628 "double free or corruption (fasttop)") at malloc.c:5347
#4 0x00007fc09ac07c95 in _int_free (av=0x7fc084000020, p=0x7fc0842197b0, have_lock=0) at malloc.c:4266
#5 0x00005590242cfb82 in __gnu_cxx::new_allocator<std::weak_ptr<Session> >::deallocate (this=0x7fc08021eaf8, __p=0x7fc0842197c0) at /usr/include/c++/9/ext/new_allocator.h:128
#6 0x00005590242cfae2 in std::allocator_traits<std::allocator<std::weak_ptr<Session> > >::deallocate (__a=..., __p=0x7fc0842197c0, __n=4115453) at /usr/include/c++/9/bits/alloc_traits.h:469
#7 0x00005590242cfa3a in std::_Vector_base<std::weak_ptr<Session>, std::allocator<std::weak_ptr<Session> > >::_M_deallocate (this=0x7fc08021eaf8, __p=0x7fc0842197c0, __n=4115453) at /usr/include/c++/9/bits/stl_vector.h:351
#8 0x000055902430fe13 in std::vector<std::weak_ptr<Session>, std::allocator<std::weak_ptr<Session> > >::_M_realloc_insert<std::weak_ptr<Session> > (this=0x7fc08021eaf8,
__position=<error reading variable: Cannot access memory at address 0x119>) at /usr/include/c++/9/bits/vector.tcc:500
#9 0x000055902430c384 in std::vector<std::weak_ptr<Session>, std::allocator<std::weak_ptr<Session> > >::emplace_back<std::weak_ptr<Session> > (this=0x7fc08021eaf8) at /usr/include/c++/9/bits/vector.tcc:121
#10 0x00005590243091da in std::vector<std::weak_ptr<Session>, std::allocator<std::weak_ptr<Session> > >::push_back (this=0x7fc08021eaf8, __x=...) at /usr/include/c++/9/bits/stl_vector.h:1201
#11 0x0000559024304a3a in Store::queueSessionRemoval (this=0x559024648200, session=std::shared_ptr<class Session> (use count 2, weak count 2) = {...}) at /bla/store.cpp:726
#12 0x00005590242e41f7 in Client::~Client (this=0x55902482ca00, __in_chrg=<optimized out>) at /bla/client.cpp:91
There is nothing going on like putting a raw pointer in a shared or weak one. It's all properly managed and the session is created with std::make_shared.
So, __gnu_cxx::new_allocator<std::weak_ptr<Session> >::deallocate crashes. Probably related to re-balancing of the map.
Changing the std::weak_ptr to a std::shared_ptr in queuedSessionRemovals doesn't help. Using other containers doesn't help. Making the argument to queueSessionRemoval not a reference doesn't help.
When I use a std::list instead of a std::map and keep it sorted with std::upper_bound to find the position, very funky results happen, where an empty list has a size() > 0.
The only thing I can imagine is that it has something to do with calling this from a destructor, but at that point in time, the client::session is still a valid shared pointer, so I wouldn't know why.
Any thoughts?
There is a bug in the Intel compiler with user-defined reductions in OpenMP, which was discussed here (including the workaround). Now I want to pass the vector to a function and do the same thing, but I get this error:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted
This is the example:
#include <iostream>
#include <vector>
#include <algorithm>
#include "omp.h"
#pragma omp declare reduction(vec_double_plus : std::vector<double> : \
std::transform(omp_out.begin(), omp_out.end(), omp_in.begin(), omp_out.begin(), std::plus<double>())) \
initializer(omp_priv = omp_orig)
int foo(std::vector<double> &w){
#pragma omp parallel reduction(vec_double_plus:w)
{
#pragma omp for
for (int i = 0; i < 2; ++i)
for (int j = 0; j < w.size(); ++j)
w[j] += 1;
};
return 0;
}
int main() {
omp_set_num_threads(2);
std::vector<double> w(10,0);
foo(w);
for(auto i:w)
if(i != 2)
std::cout << i << std::endl;
return 0;
}
Again it works fine with GNU/6.4.0 but fails with intel/2018.1.163. Any ideas?
Update: I changed the values to make it easier to debug. I work on a remote node, so I am using terminal. I used gdb to debug the code that was compiled with intel/2018.1.163. I'm not sure if it is the right thing to do, or if there is a better way to debug the code. This is the error from gdb:
[New Thread 0x2aaaac68a780 (LWP 15573)]
terminate called recursively
terminate called after throwing an instance of 'std::bad_alloc
Program received signal SIGABRT, Aborted.
0x00002aaaabaf91f7 in raise () from /lib64/libc.so.6
And, this is the cmake configuration:
cmake_minimum_required(VERSION 3.2)
project(openmp_reduction001)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_BUILD_TYPE Debug)
find_package(OpenMP)
if(OPENMP_FOUND)
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
endif()
add_executable(openmp_reduction001 main.cpp)
Update2: The gdb backtrace is added below. The computing node that I used has the Intel compiler module loaded as the default compiler, but it looks into /usr/include/c++/4.8.5 for the header files. Is that normal? I looked into /usr/include/c++/. It only contains the 4.4.7, 4.8.2, and 4.8.5 folders. Another issue is at frame #12, in which the length of the vector is -15, which probably causes the std::allocator to set its n parameter to a very large number.
#0 0x00002aaaabaf91f7 in raise () from /lib64/libc.so.6
#1 0x00002aaaabafa8e8 in abort () from /lib64/libc.so.6
#2 0x00002aaaaad2fa55 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#3 0x00002aaaaad2da36 in ?? () from /lib64/libstdc++.so.6
#4 0x00002aaaaad2da63 in std::terminate() () from /lib64/libstdc++.so.6
#5 0x00002aaaaad2dc83 in __cxa_throw () from /lib64/libstdc++.so.6
#6 0x00002aaaaad826d2 in std::__throw_bad_alloc() () from /lib64/libstdc++.so.6
#7 0x0000000000404022 in __gnu_cxx::new_allocator<double>::allocate (this=0x2aaaac689af0, __n=18446744073709551602)
at /usr/include/c++/4.8.5/ext/new_allocator.h:102
#8 0x0000000000403856 in std::_Vector_base<double, std::allocator<double> >::_M_allocate (this=0x2aaaac689af0, __n=18446744073709551602)
at /usr/include/c++/4.8.5/bits/stl_vector.h:168
#9 0x000000000040394b in std::_Vector_base<double, std::allocator<double> >::_M_create_storage (this=0x7fffffffa370, __n=18446744073709551601)
at /usr/include/c++/4.8.5/bits/stl_vector.h:181
#10 0x00000000004037a6 in std::_Vector_base<double, std::allocator<double> >::_Vector_base (this=0x7fffffffa370, __n=18446744073709551601, __a=...)
at /usr/include/c++/4.8.5/bits/stl_vector.h:136
#11 0x00000000004037fa in std::_Vector_base<double, std::allocator<double> >::_Vector_base (this=0x7fffffffa370)
at /usr/include/c++/4.8.5/bits/stl_vector.h:134
#12 0x0000000000403b15 in std::vector<double, std::allocator<double> >::vector (this=0x7fffffffa370,
__x=std::vector of length -15, capacity -17592185515333 = {...}) at /usr/include/c++/4.8.5/bits/stl_vector.h:312
#13 0x0000000000402cd3 in __udr_i_0x914e698 (__omp_priv=0x7fffffffa370, __omp_orig=0x7fffffffa838)
at /uufs/chpc.utah.edu/common/home/u1013493/openmp_reduction001/main.cpp:8
#14 0x0000000000402e7d in L__Z3fooRSt6vectorIdSaIdEE_14__par_region0_2_4 () at /uufs/chpc.utah.edu/common/home/u1013493/openmp_reduction001/main.cpp:14
#15 0x00002aaaab39e7a3 in __kmp_invoke_microtask ()
from /uufs/chpc.utah.edu/sys/installdir/intel/compilers_and_libraries_2018.1.163/linux/compiler/lib/intel64/libiomp5.so
I am receiving a segmentation fault whenever I attempt to push a templated object into a vector. I have run gdb, and am still unable to understand why I get the segmentation fault error.
Program received signal SIGSEGV, Segmentation fault. 0x0000003f5869d4f3 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::assign(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib64/libstdc++.so.6
This is where I am inserting the object into the vector:
void ReadyDelivery::LoadTruck(){
string name = "";
int capacity = 0;
ifstream inputStream;
inputStream.open(m_truckFile.c_str());
while(inputStream >> name >> capacity ){
Truck<Item, MAX_CAPACITY> t(name,capacity);
cout<<name<<" "<<capacity<<endl;
m_truck.push_back(t);
}
cout<<"Trucks loaded: "<<m_truck.size()<<endl;
inputStream.close();
}
If I comment out where I push_back the object into the vector, there is no segmentation fault.
I also have a function that returns the vector. I am not sure if this could be causing it though...
vector<Truck<Item,MAX_CAPACITY> > & ReadyDelivery:: GetTruck(){
return m_truck;
}
Thanks for the help!
Here is the definition of m_truck, which is a private member variable:
private:
vector< Truck<Item, MAX_CAPACITY> > m_truck; //Vector of templated trucks
Here is where I construct the truck object in my template:
template <class T, int N>
Truck<T,N>::Truck(string inName, int capacity){
m_name = inName;
m_capacity = capacity;
}
If I run gdb and use the where command... I get this:
#0 0x000000346ce9d4f3 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::assign(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib64/libstdc++.so.6
#1 0x0000000000402bd9 in Item::operator= (this=0x60eb00) at Item.h:11
#2 0x0000000000402c66 in Tqueue<Item, 200>::~Tqueue (this=0x60e478,
__in_chrg=<value optimized out>) at Tqueue.h:96
#3 0x0000000000402b43 in Truck<Item, 200>::~Truck (this=0x60e450,
__in_chrg=<value optimized out>) at Truck.h:114
#4 0x00000000004027d5 in std::_Destroy<Truck<Item, 200> > (__pointer=0x60e450)
at /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/stl_construct.h:90
#5 0x000000000040249c in std::_Destroy_aux<false>::__destroy<Truck<Item, 200>*>
(__first=0x60e450, __last=0x60e488)
at /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/stl_construct.h:100
#6 0x00000000004021d3 in std::_Destroy<Truck<Item, 200>*> (__first=0x60e450,
__last=0x60e488)
at /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/stl_construct.h:123
#7 0x0000000000401cdf in std::_Destroy<Truck<Item, 200>*, Truck<Item, 200> > (
__first=0x60e450, __last=0x60e488)
at /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/stl_construct.h:149
#8 0x000000000040197c in std::vector<Truck<Item, 200>, std::allocator<Truck<Item, 200> > >::~vector (this=0x7fffffffe1b0, __in_chrg=<value optimized out>)
at /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/stl_vector.h:313
#9 0x0000000000401593 in main (argc=1, argv=0x7fffffffe2c8) at driver.cpp:29
I have a C++ application running on Ubuntu Lucid 10.04.3 LTS that crashed, and the reason really escapes me.
The method which exhibits failure is this one:
void
IoLogikCommunicator::processPacket(char const* data, WORD wSize)
{
std::string message(data, wSize);
std::stringstream ss(message);
std::string token;
std::vector<std::string> tokens;
while (std::getline(ss, token, '#')) // <- crash
tokens.push_back(token);
if (tokens[0] == "SENSORS")
processSensorsPacket(tokens);
else if (tokens[0] == "SELECTOR")
processSelectorPacket(tokens);
}
According to the core dump, data content is valid and it is:
p data
$1 = 0xb7520214 "SENSORS#192.168.107.62#DI:00#ON#DI:01#ON#DI:02#ON#DI:03#OFF#DI:04#OFF#DI:05#OFF"
p wSize
$2 = 79
The content of tokens, at crash time, is ["SENSORS"], so the first element was parsed correctly.
What happens then is:
Program terminated with signal 6, Aborted.
#0 0x009de422 in __kernel_vsyscall ()
(gdb) bt
#0 0x009de422 in __kernel_vsyscall ()
#1 0x0766a651 in raise () from /lib/tls/i686/cmov/libc.so.6
#2 0x0766da82 in abort () from /lib/tls/i686/cmov/libc.so.6
#3 0x076a149d in ?? () from /lib/tls/i686/cmov/libc.so.6
#4 0x076ab591 in ?? () from /lib/tls/i686/cmov/libc.so.6
#5 0x076ae710 in ?? () from /lib/tls/i686/cmov/libc.so.6
#6 0x076aff9c in malloc () from /lib/tls/i686/cmov/libc.so.6
#7 0x0070dc07 in operator new(unsigned int) () from /usr/lib/libstdc++.so.6
#8 0x006e7d06 in std::string::_Rep::_S_create(unsigned int, unsigned int, std::allocator<char> const&) () from /usr/lib/libstdc++.so.6
#9 0x006e9f70 in std::string::_M_mutate(unsigned int, unsigned int, unsigned int) () from /usr/lib/libstdc++.so.6
#10 0x006c4274 in std::basic_istream<char, std::char_traits<char> >& std::getline<char, std::char_traits<char>, std::allocator<char> >(std::basic_istream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, char) () from /usr/lib/libstdc++.so.6
Given the SIGABRT, it seems that an assert() fails inside the malloc invocation, but what could be the reason? Of course, it was impossible for me to reproduce the bug: this method is invoked several times per second, and the application crashed after 30 or more days of continuous running.
The very same data, then, is processed by another identical application which is hosted on another machine: that one didn't crash.
Do you have any suggestions, hints, tips, or pointers?