I was trying to use std::async in a heavy workload application to improve the performance but I encountered deadlock from time to time. I debugged for a very long time and I am almost certain that my code was fine and it seemed something wrong with std library.
So I wrote a simple test program to testify:
#include <iostream>
#include <vector>
#include <algorithm>
#include <numeric>
#include <future>
#include <string>
#include <mutex>
#include <unistd.h>
#include <atomic>
#include <iomanip>
std::atomic_long numbers[6];
void add(std::atomic_long& n)
{
++n;
}
void func2(std::atomic_long& n)
{
for (auto i = 0L; i < 1000000000000L; ++i)
{
std::async(std::launch::async, [&] {add(n);}); // Small task, I want to run them simultaneously
}
}
int main()
{
std::vector<std::future<void>> results;
for (int i = 0; i < 6; ++i)
{
auto& n = numbers[i];
results.push_back(std::async(std::launch::async, [&n] {func2(n);}));
}
while (true)
{
sleep(1);
for (int i = 0; i < 6; ++i)
std::cout << std::setw(20) << numbers[i] << " ";
std::cout << std::endl;
}
for (auto& r : results)
{
r.wait();
}
return 0;
}
This program will produce output like this:
763700 779819 754005 763287 767713 748994
768822 785172 759678 769393 772956 754469
773529 789382 763524 772704 776398 757864
778560 794419 768580 777507 781542 762991
782056 795578 771704 780554 784865 766162
801633 812610 788111 802617 803661 784894
After a time (minutes or hours), if there was a deadlock, the output will be like this:
4435337 4452421 4507907 4501378 2549550 4462899
4441213 4457648 4514424 4506626 2549550 4468019
4446301 4462675 4519272 4511889 2549550 4473266
4453940 4470304 4526382 4519513 2549550 4480872
4461095 4477708 4533272 4526901 2549550 4488313
4470974 4488287 4543442 4537286 2549550 4498733
The fifth column was frozen.
After one day, it became this:
23934912 23967635 24007250 23931203 2549550 3249788689
23934912 23967635 24007250 23931203 2549550 3249816818
23934912 23967635 24007250 23931203 2549550 3249835009
23934912 23967635 24007250 23931203 2549550 3249860262
23934912 23967635 24007250 23931203 2549550 3249894331
Almost all columns froze except last column. It look really odd.
I ran it on Linux, macOS, FreeBSD, and the result was:
macOS:10.15.2, Clang:11.0.0, no deadlock
FreeBSD:12.0, Clang:6.0.1, deadlock
Linux: ubuntu 5.0.0-37, g++:7.4.0, no deadlock
Linux: ubuntu 4.4.0-21, Clang:3.8.0, deadlock
In the gdb, the call stack was:
(gdb) thread apply all bt
Thread 10 (LWP 100467 of process 37763):
#0 0x000000080025c630 in ?? () from /lib/libthr.so.3
#1 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fff4ad57000
Thread 9 (LWP 100464 of process 37763):
#0 0x000000080046fafa in _umtx_op () from /lib/libc.so.7
#1 0x0000000800264912 in ?? () from /lib/libthr.so.3
#2 0x000000080031f9f9 in std::__1::mutex::unlock() () from /usr/lib/libc++.so.1
#3 0x00000008002e8f55 in std::__1::__assoc_sub_state::set_value() () from /usr/lib/libc++.so.1
#4 0x00000000002053e1 in std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >::__execute() ()
#5 0x0000000000205763 in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >::*)(), std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >*> >(void*) ()
#6 0x000000080025c776 in ?? () from /lib/libthr.so.3
#7 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fff6944a000
Thread 8 (LWP 100431 of process 37763):
#0 0x000000080046fafa in _umtx_op () from /lib/libc.so.7
#1 0x0000000800264912 in ?? () from /lib/libthr.so.3
#2 0x000000080031f9f9 in std::__1::mutex::unlock() () from /usr/lib/libc++.so.1
#3 0x00000008002e8f55 in std::__1::__assoc_sub_state::set_value() () from /usr/lib/libc++.so.1
#4 0x00000000002053e1 in std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >::__execute() ()
#5 0x0000000000205763 in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >::*)(), std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >*> >(void*) ()
#6 0x000000080025c776 in ?? () from /lib/libthr.so.3
#7 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffc371a000
Thread 7 (LWP 100657 of process 37763):
#0 0x000000080026a66c in ?? () from /lib/libthr.so.3
#1 0x000000080025e731 in ?? () from /lib/libthr.so.3
#2 0x0000000800268388 in ?? () from /lib/libthr.so.3
#3 0x000000080032de72 in std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) () from /usr/lib/libc++.so.1
#4 0x00000008002e971b in std::__1::__assoc_sub_state::wait() () from /usr/lib/libc++.so.1
#5 0x0000000000205389 in std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >::__on_zero_shared() ()
#6 0x000000000020346b in func2(std::__1::atomic<long>&) ()
#7 0x0000000000206f18 in main::$_1::operator()() const ()
#8 0x0000000000206eed in void std::__1::__async_func<main::$_1>::__execute<>(std::__1::__tuple_indices<>) ()
#9 0x0000000000206ea5 in std::__1::__async_func<main::$_1>::operator()() ()
#10 0x0000000000206df3 in std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >::__execute() ()
#11 0x0000000000207183 in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >::*)(), std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >*> >(void*) ()
#12 0x000000080025c776 in ?? () from /lib/libthr.so.3
#13 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdf5f9000
Thread 6 (LWP 100656 of process 37763):
#0 0x000000080026a66c in ?? () from /lib/libthr.so.3
#1 0x000000080025e731 in ?? () from /lib/libthr.so.3
#2 0x0000000800268388 in ?? () from /lib/libthr.so.3
#3 0x000000080032de72 in std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) () from /usr/lib/libc++.so.1
#4 0x00000008002e971b in std::__1::__assoc_sub_state::wait() () from /usr/lib/libc++.so.1
#5 0x0000000000205389 in std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >::__on_zero_shared() ()
#6 0x0000000000207a22 in std::__1::__release_shared_count::operator()(std::__1::__shared_count*) ()
#7 0x00000000002044f4 in std::__1::future<void> std::__1::__make_async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >(std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0>&&) ()
#8 0x00000000002035ea in std::__1::future<std::__1::__invoke_of<std::__1::decay<func2(std::__1::atomic<long>&)::$_0>::type>::type> std::__1::async<func2(std::__1::atomic<long>&)::$_0>(std::__1::launch, func2(std::__1::atomic<long>&)::$_0&&) ()
#9 0x0000000000203462 in func2(std::__1::atomic<long>&) ()
#10 0x0000000000206f18 in main::$_1::operator()() const ()
#11 0x0000000000206eed in void std::__1::__async_func<main::$_1>::__execute<>(std::__1::__tuple_indices<>) ()
#12 0x0000000000206ea5 in std::__1::__async_func<main::$_1>::operator()() ()
#13 0x0000000000206df3 in std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >::__execute() ()
#14 0x0000000000207183 in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >::*)(), std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >*> >(void*) ()
#15 0x000000080025c776 in ?? () from /lib/libthr.so.3
#16 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdf7fa000
Thread 5 (LWP 100655 of process 37763):
#0 0x000000080026a66c in ?? () from /lib/libthr.so.3
#1 0x000000080025e731 in ?? () from /lib/libthr.so.3
#2 0x0000000800268388 in ?? () from /lib/libthr.so.3
#3 0x000000080032de72 in std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) () from /usr/lib/libc++.so.1
#4 0x00000008002e971b in std::__1::__assoc_sub_state::wait() () from /usr/lib/libc++.so.1
#5 0x0000000000205389 in std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >::__on_zero_shared() ()
#6 0x0000000000207a22 in std::__1::__release_shared_count::operator()(std::__1::__shared_count*) ()
#7 0x00000000002044f4 in std::__1::future<void> std::__1::__make_async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >(std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0>&&) ()
#8 0x00000000002035ea in std::__1::future<std::__1::__invoke_of<std::__1::decay<func2(std::__1::atomic<long>&)::$_0>::type>::type> std::__1::async<func2(std::__1::atomic<long>&)::$_0>(std::__1::launch, func2(std::__1::atomic<long>&)::$_0&&) ()
#9 0x0000000000203462 in func2(std::__1::atomic<long>&) ()
#10 0x0000000000206f18 in main::$_1::operator()() const ()
#11 0x0000000000206eed in void std::__1::__async_func<main::$_1>::__execute<>(std::__1::__tuple_indices<>) ()
#12 0x0000000000206ea5 in std::__1::__async_func<main::$_1>::operator()() ()
#13 0x0000000000206df3 in std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >::__execute() ()
#14 0x0000000000207183 in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >::*)(), std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >*> >(void*) ()
#15 0x000000080025c776 in ?? () from /lib/libthr.so.3
#16 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdf9fb000
Thread 4 (LWP 100654 of process 37763):
#0 0x000000080026a66c in ?? () from /lib/libthr.so.3
#1 0x000000080025e731 in ?? () from /lib/libthr.so.3
#2 0x0000000800268388 in ?? () from /lib/libthr.so.3
#3 0x000000080032de72 in std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) () from /usr/lib/libc++.so.1
#4 0x00000008002e971b in std::__1::__assoc_sub_state::wait() () from /usr/lib/libc++.so.1
#5 0x0000000000205389 in std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >::__on_zero_shared() ()
#6 0x0000000000207a22 in std::__1::__release_shared_count::operator()(std::__1::__shared_count*) ()
#7 0x00000000002044f4 in std::__1::future<void> std::__1::__make_async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >(std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0>&&) ()
#8 0x00000000002035ea in std::__1::future<std::__1::__invoke_of<std::__1::decay<func2(std::__1::atomic<long>&)::$_0>::type>::type> std::__1::async<func2(std::__1::atomic<long>&)::$_0>(std::__1::launch, func2(std::__1::atomic<long>&)::$_0&&) ()
#9 0x0000000000203462 in func2(std::__1::atomic<long>&) ()
#10 0x0000000000206f18 in main::$_1::operator()() const ()
#11 0x0000000000206eed in void std::__1::__async_func<main::$_1>::__execute<>(std::__1::__tuple_indices<>) ()
#12 0x0000000000206ea5 in std::__1::__async_func<main::$_1>::operator()() ()
#13 0x0000000000206df3 in std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >::__execute() ()
#14 0x0000000000207183 in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >::*)(), std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >*> >(void*) ()
#15 0x000000080025c776 in ?? () from /lib/libthr.so.3
#16 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdfbfc000
Thread 3 (LWP 100653 of process 37763):
#0 0x000000080026a66c in ?? () from /lib/libthr.so.3
#1 0x000000080025e731 in ?? () from /lib/libthr.so.3
#2 0x0000000800268388 in ?? () from /lib/libthr.so.3
#3 0x000000080032de72 in std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) () from /usr/lib/libc++.so.1
#4 0x00000008002e971b in std::__1::__assoc_sub_state::wait() () from /usr/lib/libc++.so.1
#5 0x0000000000205389 in std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >::__on_zero_shared() ()
#6 0x0000000000207a22 in std::__1::__release_shared_count::operator()(std::__1::__shared_count*) ()
#7 0x00000000002044f4 in std::__1::future<void> std::__1::__make_async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >(std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0>&&) ()
#8 0x00000000002035ea in std::__1::future<std::__1::__invoke_of<std::__1::decay<func2(std::__1::atomic<long>&)::$_0>::type>::type> std::__1::async<func2(std::__1::atomic<long>&)::$_0>(std::__1::launch, func2(std::__1::atomic<long>&)::$_0&&) ()
#9 0x0000000000203462 in func2(std::__1::atomic<long>&) ()
#10 0x0000000000206f18 in main::$_1::operator()() const ()
#11 0x0000000000206eed in void std::__1::__async_func<main::$_1>::__execute<>(std::__1::__tuple_indices<>) ()
#12 0x0000000000206ea5 in std::__1::__async_func<main::$_1>::operator()() ()
#13 0x0000000000206df3 in std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >::__execute() ()
#14 0x0000000000207183 in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >::*)(), std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >*> >(void*) ()
#15 0x000000080025c776 in ?? () from /lib/libthr.so.3
#16 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdfdfd000
Thread 2 (LWP 100652 of process 37763):
#0 0x000000080026a66c in ?? () from /lib/libthr.so.3
#1 0x000000080025e731 in ?? () from /lib/libthr.so.3
#2 0x0000000800268388 in ?? () from /lib/libthr.so.3
#3 0x000000080032de72 in std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) () from /usr/lib/libc++.so.1
#4 0x00000008002e971b in std::__1::__assoc_sub_state::wait() () from /usr/lib/libc++.so.1
#5 0x0000000000205389 in std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >::__on_zero_shared() ()
#6 0x0000000000207a22 in std::__1::__release_shared_count::operator()(std::__1::__shared_count*) ()
#7 0x00000000002044f4 in std::__1::future<void> std::__1::__make_async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >(std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0>&&) ()
#8 0x00000000002035ea in std::__1::future<std::__1::__invoke_of<std::__1::decay<func2(std::__1::atomic<long>&)::$_0>::type>::type> std::__1::async<func2(std::__1::atomic<long>&)::$_0>(std::__1::launch, func2(std::__1::atomic<long>&)::$_0&&) ()
#9 0x0000000000203462 in func2(std::__1::atomic<long>&) ()
#10 0x0000000000206f18 in main::$_1::operator()() const ()
#11 0x0000000000206eed in void std::__1::__async_func<main::$_1>::__execute<>(std::__1::__tuple_indices<>) ()
#12 0x0000000000206ea5 in std::__1::__async_func<main::$_1>::operator()() ()
#13 0x0000000000206df3 in std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >::__execute() ()
#14 0x0000000000207183 in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >::*)(), std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >*> >(void*) ()
#15 0x000000080025c776 in ?? () from /lib/libthr.so.3
#16 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdfffe000
Thread 1 (LWP 100148 of process 37763):
#0 0x00000008004f984a in _nanosleep () from /lib/libc.so.7
#1 0x000000080025f17c in ?? () from /lib/libthr.so.3
#2 0x000000080045fe0b in sleep () from /lib/libc.so.7
#3 0x0000000000203b7b in main ()
It seems lots of threads got stuck on std::__1::condition_variable::wait, which is unreasonable, in the test code, there is no use of any condition at all.
Can somebody tell me, am I doing it wrong or there is a bug in the std library?
Thanks. This example didn't fully mimic the actual behavior of my program. I simplified it too much.
Now I add vector of future, this is more like it:
void func2(std::atomic_long& n)
{
std::vector<std::future<void>> rs;
for (auto i = 0L; i < 1000000000000L; ++i)
{
rs.push_back(std::async(std::launch::async, [&] {add(n);}));
}
for (auto& r : rs)
{
r.wait();
}
}
But it still got the same result:
On macOS, it was ok.
29693311 29904143 29994992 29856976 30020535 29832796
29709344 29917687 30005488 29875611 30039727 29848932
29725334 29930826 30019428 29892350 30056678 29866293
29737403 29948258 30036760 29904964 30074102 29883648
29746597 29965134 30050115 29914459 30086189 29900767
29761543 29977363 30066833 29929475 30101723 29915059
29777678 29993381 30084101 29949095 30117847 29926040
29794253 30007301 30102985 29972819 30129613 29939935
On freebsd, it froze again:
34079 29595 38239 508788 30194 41242
34079 29595 38239 509103 30194 41242
34079 29595 38239 509583 30194 41242
34079 29595 38239 509808 30194 41242
34079 29595 38239 510187 30194 41242
34079 29595 38239 510543 30194 41242
34079 29595 38239 510932 30194 41242
34079 29595 38239 511616 30194 41242
34079 29595 38239 512111 30194 41242
34079 29595 38239 512952 30194 41242
34079 29595 38239 514032 30194 41242
34079 29595 38239 514205 30194 41242
34079 29595 38239 514577 30194 41242
You are not taking into account the return value of std::async, the future returned will block any execution until the end of the task you started with std::async. This program as written, it's not performing what you expect.
In addition you are calling std::async with recursion and it's not granted it will spawn a new thread, it can manage a pool so if the pool is busy you program can freeze apparently since the loop you are doing is really long. If you want more control you can use std::thread with std::packaged_task
Related
The output of backtrace of GDB is pretty messy, especially for template.
For instance:
Thread 2 (LWP 100146 of process 1245):
#0 thr_new () at thr_new.S:3
#1 0x000000080025c3da in _pthread_create (thread=0x7fffdfffd880, attr=<optimized out>, start_routine=0x205500 <void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >::*)(), std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >*> >(void*)>, arg=0x8007fa8e0) at /usr/src/lib/libthr/thread/thr_create.c:188
#2 0x0000000000204e40 in std::__1::thread::thread<void (std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >::*)(), std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >*, void>(void (std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >::*&&)(), std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >*&&) ()
#3 0x0000000000204309 in std::__1::future<void> std::__1::__make_async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >(std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0>&&) ()
#4 0x00000000002035ea in std::__1::future<std::__1::__invoke_of<std::__1::decay<func2(std::__1::atomic<long>&)::$_0>::type>::type> std::__1::async<func2(std::__1::atomic<long>&)::$_0>(std::__1::launch, func2(std::__1::atomic<long>&)::$_0&&) ()
#5 0x0000000000203462 in func2(std::__1::atomic<long>&) ()
#6 0x0000000000206f18 in main::$_1::operator()() const ()
#7 0x0000000000206eed in void std::__1::__async_func<main::$_1>::__execute<>(std::__1::__tuple_indices<>) ()
#8 0x0000000000206ea5 in std::__1::__async_func<main::$_1>::operator()() ()
#9 0x0000000000206df3 in std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >::__execute() ()
#10 0x0000000000207183 in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >::*)(), std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >*> >(void*) ()
#11 0x000000080025c776 in thread_start (curthread=0x8007de500) at /usr/src/lib/libthr/thread/thr_create.c:292
#12 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdfffe000
In frame #8, there are three pairs of parentheses, std::__1::__async_func<main::$_1>::operator()() () what do they mean exactly?
Frame #8 doesn't have debug info, so GDB can't accurately describe it.
Consider this test case:
struct Foo {
int operator()(void) {
return 1; // line 3
}
};
int main()
{
return Foo()();
}
When compiled with g++ -g t.cc and at breakpoint on line 3, this is what GDB displays:
Breakpoint 1, Foo::operator() (this=0x7fffffffdcff) at t.cc:3
3 return 1;
(gdb) bt
#0 Foo::operator() (this=0x7fffffffdcff) at t.cc:3
#1 0x0000555555555139 in main () at t.cc:10
But compile the same source without -g, set a breakpoint on _ZN3FooclEv, and this is what you will see:
Breakpoint 1, 0x0000555555555140 in Foo::operator()() ()
(gdb) bt
#0 0x0000555555555140 in Foo::operator()() ()
#1 0x0000555555555139 in main ()
The first two sets of parenthesis come from demangling the symbol:
c++filt _ZN3FooclEv
Foo::operator()()
The third set is added by GDB because the symbol being displayed is in the .text section and is assumed to be a function.
#0 0x0000003d7e432925 in raise () from /lib64/libc.so.6
#1 0x0000003d7e43408d in abort () from /lib64/libc.so.6
#2 0x00007ff601e3ba55 in os::abort(bool) () from /usr/java/jdk1.7.0_67-cloudera/jre/lib/amd64/server/libjvm.so
#3 0x00007ff601fbbf87 in VMError::report_and_die() () from /usr/java/jdk1.7.0_67-cloudera/jre/lib/amd64/server/libjvm.so
#4 0x00007ff601e4096f in JVM_handle_linux_signal () from /usr/java/jdk1.7.0_67-cloudera/jre/lib/amd64/server/libjvm.so
#5 <signal handler called>
#6 0x00007ff5fe8f218e in LdiInterFromArray () from /home/zhaojuan/project/DataType/thirdparty/occi-11.2/lib/libclntsh.so.11.1
#7 0x00007ff5ff85a1eb in kpcceiyd2iyd () from /home/zhaojuan/project/DataType/thirdparty/occi-11.2/lib/libclntsh.so.11.1
#8 0x00007ff600138c1d in ttccfpg () from /home/zhaojuan/project/DataType/thirdparty/occi-11.2/lib/libclntsh.so.11.1
#9 0x00007ff600136e90 in ttcfour () from /home/zhaojuan/project/DataType/thirdparty/occi-11.2/lib/libclntsh.so.11.1
#10 0x00007ff5fe5c45f3 in kpufcpf () from /home/zhaojuan/project/DataType/thirdparty/occi-11.2/lib/libclntsh.so.11.1
#11 0x00007ff5fe5c2872 in kpufch0 () from /home/zhaojuan/project/DataType/thirdparty/occi-11.2/lib/libclntsh.so.11.1
#12 0x00007ff5fe5c110f in kpufch () from /home/zhaojuan/project/DataType/thirdparty/occi-11.2/lib/libclntsh.so.11.1
#13 0x00007ff5fe556a03 in OCIStmtFetch () from /home/zhaojuan/project/DataType/thirdparty/occi-11.2/lib/libclntsh.so.11.1
#14 0x00007ff600a29b33 in oracle::occi::ResultSetImpl::next(unsigned int) () from /home/zhaojuan/project/DataType/thirdparty/occi-11.2/lib/libocci.so.11.1
#15 0x0000000000c6f481 in xcloud::xos::OracleLoader::RunLoadMain (this=0x78e6680) at /home/zhaojuan/project/DataType/be/src/exec_xos/OracleLoader.cpp:366
#16 0x0000000000c70f49 in xcloud::xos::OracleLoaderThread (This=<value optimized out>)
at /home/zhaojuan/project/DataType/be/src/exec_xos/OracleLoader.cpp:43
#17 0x0000003d7e8079d1 in start_thread () from /lib64/libpthread.so.0
#18 0x0000003d7e4e8b6d in clone () from /lib64/libc.so.6
hi , I met a occi problem, I want to get interval value from oracle, but it was core dump ,why?
the simplest code is
if (ttype == oracle::occi::OCCI_SQLT_INTERVAL_YM) {
LOG(INFO) << "set YM buffer";
m_res->setDataBuffer(i + 1, m_colBuf[i], ttype, 5, &(m_resultInfos[i].fieldLens[0]),
&(m_resultInfos[i].fieldFlags[0]), &(m_resultInfos[i].fieldRCs[0]));
} else if (ttype == oracle::occi::OCCI_SQLT_INTERVAL_DS) {
m_res->setDataBuffer(i + 1, m_colBuf[i], ttype, 11, &(m_resultInfos[i].fieldLens[0]),
&(m_resultInfos[i].fieldFlags[0]), &(m_resultInfos[i].fieldRCs[0]));
}
m_res is a type of oracle::occi::ResultSet*
m_colBuf[i] is alloc(colWidth[i] * fetchsize)
and it was crashed at:
if (0 == m_res->next(m_fetchSize)) {
LOG(INFO) << "Oracle Fetch finished!";
break;
}
I am trying to join the thread using boost and the program seems to wait indefinitely when I do that.
Thread->join();
On the console I see the following output when I break.
Program received signal SIGINT, Interrupt.
pthread_cond_wait##GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185 ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.
Here is the GDB trace.
#0 pthread_cond_wait##GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00000000004900a3 in boost::condition_variable::wait(boost::unique_lock<boost::mutex>&) () at /usr/local/boost/include/boost/thread/pthread/condition_variable.hpp:73
#2 0x00007ffff724e333 in boost::thread::join_noexcept() () at libs/thread/src/pthread/thread.cpp:316
#3 0x000000000048e135 in LIN::waitForThreads() () at /usr/local/boost/include/boost/thread/detail/thread.hpp:767
#4 0x00000000004882fd in fa::testinit::test_method() () at /home/ubuntu/UnitTest01.cpp:205
#5 0x0000000000488ef1 in fa::testinit_invoker() () at /home/ubuntu/UnitTest01.cpp:173
#6 0x000000000047296b in boost::detail::function::function_obj_invoker0<boost::detail::forward, int>::invoke(boost::detail::function::function_buffer&) ()
at /usr/local/boost/include/boost/function/function_template.hpp:771
#7 0x000000000044aabd in boost::execution_monitor::catch_signals(boost::function<int ()> const&) () at /usr/local/boost/include/boost/function/function_template.hpp:771
#8 0x000000000044ab91 in boost::execution_monitor::execute(boost::function<int ()> const&) () at /usr/local/boost/include/boost/test/impl/execution_monitor.ipp:1207
#9 0x000000000044b2d5 in boost::execution_monitor::vexecute(boost::function<void ()> const&) () at /usr/local/boost/include/boost/test/impl/execution_monitor.ipp:1313
#10 0x0000000000452bb2 in boost::unit_test::unit_test_monitor_t::execute_and_translate(boost::function<void ()> const&, unsigned int) () at /usr/local/boost/include/boost/test/impl/unit_test_monitor.ipp:46
#11 0x0000000000484760 in boost::unit_test::framework::state::execute_test_tree(unsigned long, unsigned int) () at /usr/local/boost/include/boost/test/impl/framework.ipp:685
#12 0x00000000004849ac in boost::unit_test::framework::state::execute_test_tree(unsigned long, unsigned int) () at /usr/local/boost/include/boost/test/impl/framework.ipp:636
#13 0x0000000000460329 in boost::unit_test::framework::run(unsigned long, bool) () at /usr/local/boost/include/boost/test/impl/framework.ipp:636
#14 0x0000000000460957 in boost::unit_test::unit_test_main(boost::unit_test::test_suite* (*)(int, char**), int, char**) () at usr/local/boost/include/boost/test/impl/unit_test_main.ipp:228
#15 0x00007ffff6673aed in __libc_start_main (main=0x4419c0 <main>, argc=1, argv=0x7fffffffe6c8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe6b8) at libc-start.c:269
#16 0x0000000000442ec9 in _start () at ../sysdeps/x86_64/start.S:122
waitForThreads code :
{
stopthread = true;
//wait for Thread 2
boost::mutex::scoped_lock lock(thread2_lock);
lock.unlock();
m_ConditionVariable.notify_one();
m_Thread->join();
}
Thread 2 code:
while(!stopthread){
boost::mutex::scoped_lock lock(thread2_lock);
while (m_Queue.empty())
{
m_ConditionVariable.wait(lock);
}
if (!m_Queue.empty()){
struct queue_output out;
out = m_Queue.front();
m_Queue.pop_front();
boost::mutex::scoped_lock new_lock(another_lock);
if (write(fd, &out, sizeof(struct queue_output)) == -1)
close(out.fd);
}
}
code to update queue:
{
boost::mutex::scoped_lock lock(thread2_lock);
m_Queue.push_back(input);
lock.unlock();
m_ConditionVariable.notify_one();
}
In UnitTest01.cpp, testinit first I am creating the thread, then updating the queue, then calling join.
Can someone please help me understand why it is hanging while thread joining?
Thanks in advance.
I use Poco-libraries parallel socket acceptor in my application and it sometimes crashes. Here is the backtrace of my application:
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f9ed30ee107 in __GI_raise (sig=sig#entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007f9ed30ee107 in __GI_raise (sig=sig#entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f9ed30ef4e8 in __GI_abort () at abort.c:89
#2 0x00007f9ed312c044 in __libc_message (do_abort=do_abort#entry=1,
fmt=fmt#entry=0x7f9ed321ec60 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#3 0x00007f9ed313181e in malloc_printerr (action=1, str=0x7f9ed321f000 "malloc(): memory corruption (fast)",
ptr=<optimized out>) at malloc.c:4996
#4 0x00007f9ed3133bbb in _int_malloc (av=0x7f9ecc000020, bytes=32) at malloc.c:3359
#5 0x00007f9ed3134eb0 in __GI___libc_malloc (bytes=32) at malloc.c:2891
#6 0x00007f9ed39d82e8 in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7 0x0000000000471058 in Poco::Net::ParallelSocketAcceptor<BFSTcpServiceHandler, Poco::Net::SocketReactor>::createServiceHandler (this=0x7f9ed0e17d70, socket=...) at /usr/local/include/Poco/Net/ParallelSocketAcceptor.h:172
#8 0x00000000004709d2 in Poco::Net::ParallelSocketAcceptor<BFSTcpServiceHandler, Poco::Net::SocketReactor>::onAccept
(this=0x7f9ed0e17d70, pNotification=0x7f9ecc0009c0) at /usr/local/include/Poco/Net/ParallelSocketAcceptor.h:160
#9 0x0000000000472bfe in Poco::Observer<Poco::Net::ParallelSocketAcceptor<BFSTcpServiceHandler, Poco::Net::SocketReactor>, Poco::Net::ReadableNotification>::notify (this=0x7f9ecc001d20, pNf=0x7f9ecc0009c0)
at /usr/local/include/Poco/Observer.h:86
#10 0x00007f9ed4709c4b in Poco::NotificationCenter::postNotification(Poco::AutoPtr<Poco::Notification>) ()
from /usr/local/lib/libPocoFoundation.so.30
#11 0x00007f9ed43c6630 in Poco::Net::SocketNotifier::dispatch(Poco::Net::SocketNotification*) ()
from /usr/local/lib/libPocoNet.so.30
#12 0x00007f9ed43c38a4 in Poco::Net::SocketReactor::dispatch(Poco::AutoPtr<Poco::Net::SocketNotifier>&, Poco::Net::SocketNotification*) () from /usr/local/lib/libPocoNet.so.30
#13 0x00007f9ed43c3d1b in Poco::Net::SocketReactor::dispatch(Poco::Net::Socket const&, Poco::Net::SocketNotification*)
() from /usr/local/lib/libPocoNet.so.30
#14 0x00007f9ed43c4910 in Poco::Net::SocketReactor::run() () from /usr/local/lib/libPocoNet.so.30
#15 0x000000000046a8dc in BFSTcpServer::run () at src/BFSTcpServer.cpp:69
#16 0x0000000000459c1b in std::_Bind_simple<void (*())()>::_M_invoke<>(std::_Index_tuple<>) (this=0x1ee8d38)
at /usr/include/c++/4.9/functional:1700
---Type <return> to continue, or q <return> to quit---
#17 0x0000000000459b63 in std::_Bind_simple<void (*())()>::operator()() (this=0x1ee8d38)
at /usr/include/c++/4.9/functional:1688
#18 0x0000000000459ae0 in std::thread::_Impl<std::_Bind_simple<void (*())()> >::_M_run() (this=0x1ee8d20)
at /usr/include/c++/4.9/thread:115
#19 0x00007f9ed3a2f970 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#20 0x00007f9ed4eaa0a4 in start_thread (arg=0x7f9ed0e18700) at pthread_create.c:309
#21 0x00007f9ed319eccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
I use the socket acceptor like this:
...
ServerSocket serverSocket(port);
reactor = new SocketReactor();
ParallelSocketAcceptor<BFSTcpServiceHandler,SocketReactor> acceptor(serverSocket, *reactor);
reactor->run();
...
And my servicehandler class is as follows:
class BFSTcpServiceHandler {
Poco::Net::StreamSocket socket;
Poco::Net::SocketReactor& reactor;
...
void onReadable(const Poco::AutoPtr<Poco::Net::ReadableNotification>& pNf);
...
public:
BFSTcpServiceHandler(Poco::Net::StreamSocket& _socket,
Poco::Net::SocketReactor& _reactor);
virtual ~BFSTcpServiceHandler();
};
//And implementation:
BFSTcpServiceHandler::BFSTcpServiceHandler(StreamSocket& _socket,
SocketReactor& _reactor): socket(_socket),reactor(_reactor) {
//Register Callbacks
reactor.addEventHandler(socket, NObserver<BFSTcpServiceHandler,
ReadableNotification>(*this, &BFSTcpServiceHandler::onReadable));
}
BFSTcpServiceHandler::~BFSTcpServiceHandler() {
//Unregister Callbacks
reactor.removeEventHandler(socket, NObserver<BFSTcpServiceHandler,
ReadableNotification>(*this, &BFSTcpServiceHandler::onReadable));
//Close socket
try {
socket.shutdown();
socket.close();
}catch(Exception &e){
LOG(ERROR)<<"ERROR IN CLOSING CONNECTION";
}
}
void BFSTcpServiceHandler::onReadable(
const Poco::AutoPtr<Poco::Net::ReadableNotification>& pNf) {
...
int read = socket.receiveBytes(_packet,sizeof(reqPacket.opCode));
...
//connection is served just close it!
delete this;
}
I am not sure if this is a bug in poco or my program. Either way, I will highly appreciate any comment or help. You can take a look at the full source of mine here:
https://github.com/bshafiee/BFS/blob/master/src/BFSTcpServer.cpp#L57
https://github.com/bshafiee/BFS/blob/master/src/BFSTcpServiceHandler.cpp#L65
Compiler Info:
gcc (Debian 4.9.1-19) 4.9.1
Poco Verision:
1.6.0 (2014-12-22)
My linux(ubuntu 12.04) process crash when I use ACE_5.7.1. My code:
ACE_INET_Addr remote_addr(server_addr.c_str());
ACE_SOCK_Stream stream;
ACE_SOCK_Connector connector;
ACE_Time_Value to(1, 0), to2(2,0);
ret = connector.connect(stream, remote_addr, &to);
Stack info:
Program terminated with signal 6, Aborted
#0 0x00002b47daf8d425 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00002b47daf8d425 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00002b47daf90b8b in __GI_abort () at abort.c:91
#2 0x00002b47dafcb39e in __libc_message (do_abort=2, fmt=0x2b47db0d2e3f "*** %s ***: %s terminated\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:201
#3 0x00002b47db061817 in __GI___fortify_fail (msg=0x2b47db0d2dd6 "buffer overflow detected") at fortify_fail.c:32
#4 0x00002b47db060710 in __GI___chk_fail () at chk_fail.c:29
#5 0x00002b47db0617ce in __fdelt_chk (d=<optimized out>) at fdelt_chk.c:26
#6 0x00002b47d9ee9c3b in is_set (handle=1537, this=0x2b48883e9d90) at /home/cfcheng/MSP4.0/source/source/engine/ivs/../../share/ACE-5.7.1/ace/Handle_Set.inl:84
#7 set_bit (handle=1537, this=0x2b48883e9d90) at /home/cfcheng/MSP4.0/source/source/engine/ivs/../../share/ACE-5.7.1/ace/Handle_Set.inl:103
#8 ACE::handle_timed_complete (h=1537, timeout=0x2b48883ea110, is_tli=0) at ACE.cpp:2547
#9 0x00002b47d9f4e5c7 in ACE_SOCK_Connector::complete (this=<optimized out>, new_stream=..., remote_sap=0x0, tv=<optimized out>) at SOCK_Connector.cpp:262
#10 0x00002b47d9f4e737 in ACE_SOCK_Connector::shared_connect_finish (this=0x2b48883ea17f, new_stream=..., timeout=0x2b48883ea110, result=-1) at SOCK_Connector.cpp:155
#11 0x000000000044f4a9 in res_update_work::send_notify(std::string const&, std::string const&) ()
#12 0x000000000044fe7e in res_update_work::batch_send(std::string const&, ACE_Time_Value const&, bool) ()
#13 0x00000000004506c5 in res_update_work::update_all_res() ()
#14 0x0000000000450ada in res_update_work::svc() ()
#15 0x00002b47d9f56427 in ACE_Task_Base::svc_run (args=0x3e5a490) at Task.cpp:275
#16 0x00002b47d9f578c4 in ACE_Thread_Adapter::invoke (this=0x3e5ae90) at Thread_Adapter.cpp:98
#17 0x00002b47dad41e9a in start_thread (arg=0x2b4888401700) at pthread_create.c:308
#18 0x00002b47db04accd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#19 0x0000000000000000 in ?? ()
Who know why these code cause exception.
Thanks very much.
You are passing automatic objects to connect , I suspect this is causing the problem. Try to allocate the objects passed to connect dynamically, or make them attributes of an object which is still alive after the function where you call connect terminates. The same with the connector object.