I wrote a multi-thread program and tested OK in Linux:g++12.2.0,clang++15.0.2-1 and Windows:Visual Studio 2022 17.4.2, but caused a deadlock in Windows:MingW-w64.
After a lot of debugging, I found a simple kind of code would cause a deadlock at sometimes when compiled by MingW-w64, no matter with or without optimize options, usually less than 10000 loops would enough to block the progress.
I'm not sure if there is something unsafe in this code or just MingW-w64 has a bug with semaphore.
While in Linux or compiling by Visual Studio would run forever.
And, if replace acquire() to try_acquire_for(chrono::milliseconds(1000)), the program would also run forever under MingW-w64 (without pause).
This is the code:
#include <iostream>
#include <thread>
#include <semaphore>
using namespace std;
std::counting_semaphore<3> cs1(0), cs2(0);
int main(int argc, char const *argv[])
{
thread th(
[]()
{
for (int j = 0;; j--)
{
cs1.release();
printf("%d\n", j);
cs2.acquire();
}
});
for (int i = 0;; i++)
{
cs2.release();
printf("%d\n", i);
cs1.acquire();
}
th.join();
return 0;
}
this is last rows of output in one run:
...
-804
805
806
-805
-806
-807
-808
807
808
809
810
-809
-810
-811
811
812
(blocked)
Seems like the release operation before print(i=812) had not wake the thread waiting at acquire after print(j=-811).
Both MingW-w64-g++ and MingW-w64-clang++ have this problem, this is the info of MingW-w64 (the thread and exception model should be POSIX-seh):
winlibs personal build version gcc-12.2.0-llvm-14.0.6-mingw-w64ucrt-10.0.0-r2
This is the winlibs 64-bit standalone build of:
- GCC 12.2.0
- GDB 12.1
- LLVM/Clang/LLD/LLDB 14.0.6
- MinGW-w64 10.0.0 (linked with ucrt)
- GNU Binutils 2.39
- GNU Make 4.3
- PExports 0.47
- dos2unix 7.4.3
- Yasm 1.3.0
- NASM 2.15.05
- JWasm 2.12pre
- ninja 1.11.0
- doxygen 1.9.5
This build was compiled with GCC 12.2.0 and packaged on 2022-08-28.
Please check out http://winlibs.com/ for the latest personal build.
Related
I was trying to play around with the new parallel library features proposed in the C++17 standard, but I couldn't get it to work. I tried compiling with the up-to-date versions of g++ 8.1.1 and clang++-6.0 and -std=c++17, but neither seemed to support #include <execution>, std::execution::par or anything similar.
When looking at the cppreference for parallel algorithms there is a long list of algorithms, claiming
Technical specification provides parallelized versions of the following 69 algorithms from algorithm, numeric and memory: ( ... long list ...)
which sounds like the algorithms are ready 'on paper', but not ready to use yet?
In this SO question from over a year ago the answers claim these features hadn't been implemented yet. But by now I would have expected to see some kind of implementation. Is there anything we can use already?
GCC 9 has them but you have to install TBB separately
In Ubuntu 19.10, all components have finally aligned:
GCC 9 is the default one, and the minimum required version for TBB
TBB (Intel Thread Building Blocks) is at 2019~U8-1, so it meets the minimum 2018 requirement
so you can simply do:
sudo apt install gcc libtbb-dev
g++ -ggdb3 -O3 -std=c++17 -Wall -Wextra -pedantic -o main.out main.cpp -ltbb
./main.out
and use as:
#include <execution>
#include <algorithm>
std::sort(std::execution::par_unseq, input.begin(), input.end());
see also the full runnable benchmark below.
GCC 9 and TBB 2018 are the first ones to work as mentioned in the release notes: https://gcc.gnu.org/gcc-9/changes.html
Parallel algorithms and <execution> (requires Thread Building Blocks 2018 or newer).
Related threads:
How to install TBB from source on Linux and make it work
trouble linking INTEL tbb library
Ubuntu 18.04 installation
Ubuntu 18.04 is a bit more involved:
GCC 9 can be obtained from a trustworthy PPA, so it is not so bad
TBB is at version 2017, which does not work, and I could not find a trustworthy PPA for it. Compiling from source is easy, but there is no install target which is annoying...
Here are fully automated tested commands for Ubuntu 18.04:
# Install GCC 9
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install gcc-9 g++-9
# Compile libtbb from source.
sudo apt-get build-dep libtbb-dev
git clone https://github.com/intel/tbb
cd tbb
git checkout 2019_U9
make -j `nproc`
TBB="$(pwd)"
TBB_RELEASE="${TBB}/build/linux_intel64_gcc_cc7.4.0_libc2.27_kernel4.15.0_release"
# Use them to compile our test program.
g++-9 -ggdb3 -O3 -std=c++17 -Wall -Wextra -pedantic -I "${TBB}/include" -L
"${TBB_RELEASE}" -Wl,-rpath,"${TBB_RELEASE}" -o main.out main.cpp -ltbb
./main.out
Test program analysis
I have tested with this program that compares the parallel and serial sorting speed.
main.cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <execution>
#include <random>
#include <iostream>
#include <vector>
int main(int argc, char **argv) {
using clk = std::chrono::high_resolution_clock;
decltype(clk::now()) start, end;
std::vector<unsigned long long> input_parallel, input_serial;
unsigned int seed;
unsigned long long n;
// CLI arguments;
std::uniform_int_distribution<uint64_t> zero_ull_max(0);
if (argc > 1) {
n = std::strtoll(argv[1], NULL, 0);
} else {
n = 10;
}
if (argc > 2) {
seed = std::stoi(argv[2]);
} else {
seed = std::random_device()();
}
std::mt19937 prng(seed);
for (unsigned long long i = 0; i < n; ++i) {
input_parallel.push_back(zero_ull_max(prng));
}
input_serial = input_parallel;
// Sort and time parallel.
start = clk::now();
std::sort(std::execution::par_unseq, input_parallel.begin(), input_parallel.end());
end = clk::now();
std::cout << "parallel " << std::chrono::duration<float>(end - start).count() << " s" << std::endl;
// Sort and time serial.
start = clk::now();
std::sort(std::execution::seq, input_serial.begin(), input_serial.end());
end = clk::now();
std::cout << "serial " << std::chrono::duration<float>(end - start).count() << " s" << std::endl;
assert(input_parallel == input_serial);
}
On Ubuntu 19.10, Lenovo ThinkPad P51 laptop with CPU: Intel Core i7-7820HQ CPU (4 cores / 8 threads, 2.90 GHz base, 8 MB cache), RAM: 2x Samsung M471A2K43BB1-CRC (2x 16GiB, 2400 Mbps) a typical output for an input with 100 million numbers to be sorted:
./main.out 100000000
was:
parallel 2.00886 s
serial 9.37583 s
so the parallel version was about 4.5 times faster! See also: What do the terms "CPU bound" and "I/O bound" mean?
We can confirm that the process is spawning threads with strace:
strace -f -s999 -v ./main.out 100000000 |& grep -E 'clone'
which shows several lines of type:
[pid 25774] clone(strace: Process 25788 attached
[pid 25774] <... clone resumed> child_stack=0x7fd8c57f4fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fd8c57f59d0, tls=0x7fd8c57f5700, child_tidptr=0x7fd8c57f59d0) = 25788
Also, if I comment out the serial version and run with:
time ./main.out 100000000
I get:
real 0m5.135s
user 0m17.824s
sys 0m0.902s
which confirms again that the algorithm was parallelized since real < user, and gives an idea of how effectively it can be parallelized in my system (about 3.5x for 8 cores).
Error messages
Hey, Google, index this please.
If you don't have tbb installed, the error is:
In file included from /usr/include/c++/9/pstl/parallel_backend.h:14,
from /usr/include/c++/9/pstl/algorithm_impl.h:25,
from /usr/include/c++/9/pstl/glue_execution_defs.h:52,
from /usr/include/c++/9/execution:32,
from parallel_sort.cpp:4:
/usr/include/c++/9/pstl/parallel_backend_tbb.h:19:10: fatal error: tbb/blocked_range.h: No such file or directory
19 | #include <tbb/blocked_range.h>
| ^~~~~~~~~~~~~~~~~~~~~
compilation terminated.
so we see that <execution> depends on an uninstalled TBB component.
If TBB is too old, e.g. the default Ubuntu 18.04 one, it fails with:
#error Intel(R) Threading Building Blocks 2018 is required; older versions are not supported.
You can refer https://en.cppreference.com/w/cpp/compiler_support to check all C++ feature implementation status. For your case, just search "Standardization of Parallelism TS", and you will find only MSVC and Intel C++ compilers support this feature now.
Intel has released a Parallel STL library which follows the C++17 standard:
https://github.com/intel/parallelstl
It is being merged into GCC.
Gcc does not yet implement the Parallelism TS (see https://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.2017)
However libstdc++ (with gcc) has an experimental mode for some equivalent parallel algorithms. See https://gcc.gnu.org/onlinedocs/libstdc++/manual/parallel_mode.html
Getting it to work:
Any use of parallel functionality requires additional compiler and
runtime support, in particular support for OpenMP. Adding this support
is not difficult: just compile your application with the compiler flag
-fopenmp. This will link in libgomp, the GNU Offloading and Multi Processing Runtime Library, whose presence is mandatory.
Code example
#include <vector>
#include <parallel/algorithm>
int main()
{
std::vector<int> v(100);
// ...
// Explicitly force a call to parallel sort.
__gnu_parallel::sort(v.begin(), v.end());
return 0;
}
Gcc now support execution header, but not standard clang build from https://apt.llvm.org
Eigen is a popular C++ library, but icpc seems to have a problem generating debugging info from code that uses Eigen. I'm using the compiler icpc version 13.1.1. I checked with both Eigen 3.2.8 and 3.1.3. It's going to be hard to recompile all the libraries I need with another compiler, so does anyone see a good solution to get Eigen to work with a debugger?
The problem is that variable values don't always get updated in the debugger. Here is main.cpp
#include "stdio.h"
#include "/home/mylogin/include/Eigen/Core"
using namespace std;
int main(int argc, char* argv[])
{
printf("Starting main\n");
double mytest = 3.0;
// If the next line is commented out, the debugger works
Eigen::Vector3d v(1,2,3);
printf("This is mytest %f \n",mytest);
return 0;
}
I compile with
icpc -O0 -debug -I/home/mylogin/include/ main.cpp
Then you can run the debugger
idbc ./a.out
Intel(R) Debugger for applications running on Intel(R) 64, Version 13.0, Build [80.215.23]
------------------
object file name: ./a.out
Reading symbols from /mnt/io1/home/mylogin/a.out...done.
(idb) break main
Breakpoint 1 at 0x4005fb: file /mnt/io1/home/mylogin/main.cpp, line 142.
(idb) run
Starting program: /mnt/io1/home/mylogin/a.out
[New Thread 18379 (LWP 18379)]
Breakpoint 1, main (argc=1, argv=0x7fff8b2e89b8) at /mnt/io1/home/mylogin/main.cpp:8
8 printf("Starting main\n");
(idb) next
Starting main
11 Eigen::Vector3d v(1,2,3);
(idb) next
12 printf("This is mytest %f \n",mytest);
(idb) next
This is mytest 3.000000
13 return 0;
(idb) print mytest
$1 = 5.9415882155426741e-313
You see in the last few lines that the executable prints "3.0" correctly. You also see that the variable is not printed correctly by the debugger.
Both gdb and idbc show the problem. It doesn't seem to be because it's near the start or end of the function main(). The CPU is
Intel(R) Xeon(R) CPU E5-2650 0 # 2.00GHz
Linux version is
Description: Scientific Linux release 6.4 (Carbon)
Thanks for ideas!
My application is crashing when built on a new Apple laptop and then launched on a much older Apple laptop.
The application is built using Xcode 6.4, on OSX 10.9 and 10.10, when using llvm 6.1 and C++11. The SDK is 10.10, the target OSX is 10.7. Optimizations are off.
The crash is very very early on when the C runtime is loading my application binary and initializing the modules.
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 com.MyCompany.MyApplication 0x000000010cd10e7a _GLOBAL__I_a + 10
1 dyld 0x00007fff61fd3ceb ImageLoaderMachO::doModInitFunctions(ImageLoader::LinkContext const&) + 265
2 dyld 0x00007fff61fd3e78 ImageLoaderMachO::doInitialization(ImageLoader::LinkContext const&) + 40
3 dyld 0x00007fff61fd0871 ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int,
This is before any of my application code. The crash does not occur on the build machine (i7 CPU). Crashes occur on i5 and Core 2 Duo machines. I suspect that an extended (CPU specific) instruction is creating the crash on load.
When I use the same Xcode, same llvm, etc to build the application on the Core 2 Duo machine there is no crash.
I am also using homebrew: libmtp, libusb, libusb-compat, cryptopp, curl (with c-ares, openssl, nghttp2), boost. I have specified C++11 where necessary, and have specified --build-bottle. I am statically linking to these libraries.
I have tried to use otool -tV on all libraries, the final binary, etc to find SSE instructions.
I have tried to set the Xcode LLVM build setting "Enable Additional Vector Extensions" to "platform" and "SSE3" to no avail. This is probably because homebrew isn't passing the --universal flag from curl to the building of openssl and it's cryptlib.
I have taken static libraries libcurl.a (CURL), libssl.a (OpenSSL), libcrypto.a (OpenSSL), libz.a (zlib) from the older machine and added them to my repository. Using Xcode to link them into my application solves the problem.
Are there other tools I can should use to narrow down the offending instruction?
Are there other explanations for the crash?
Addendum:
In addition to building the libraries on an older machine, I have also created a proof of concept, minimal, instant crash program that reports a slightly different crash location, but demonstrates the issue:
On an i7 (new Apple computer with new Intel CPU), use homebrew to install:
brew install curl --with-c-ares --with-openssl
Then copy this source into file sse.cpp:
#define CURL_STATICLIB
#include <curl/curl.h>
int main(int argc, const char * argv[]) {
curl_global_init(CURL_GLOBAL_ALL);
return 0;
}
Compile it:
clang++ sse.cpp -c -arch x86_64 -I/usr/local/opt/curl/include
clang++ -o a.out sse.o /usr/local/opt/openssl/lib/libssl.a /usr/local/opt/openssl/lib/libcrypto.a /usr/local/opt/zlib/lib/libz.a /usr/local/opt/curl/lib/libcurl.a /usr/local/opt/c-ares/lib/libcares.a -stdlib=libc++ -framework LDAP
Now move to an older Apple computer with older Intel CPU, and crash it:
./a.out
Crash Report (compressed):
Process: a.out [569]
...
Code Type: X86-64 (Native)
Parent Process: bash [448]
Responsible: Terminal [339]
...
OS Version: Mac OS X 10.10.5 (14F27)
...
Crashed Thread: 0 Dispatch queue: com.apple.main-thread
Exception Type: EXC_BAD_INSTRUCTION (SIGILL)
Exception Codes: 0x0000000000000001, 0x0000000000000000
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 a.out 0x000000010dbdce3f ENGINE_new + 36
1 a.out 0x000000010dbe05e3 ENGINE_load_dynamic + 11
2 a.out 0x000000010dbdf04a ENGINE_load_builtin_engines + 24
3 a.out 0x000000010dc76b36 Curl_ossl_init + 14
4 a.out 0x000000010dc5c2a5 curl_global_init + 114
5 a.out 0x000000010db51d95 main + 37
6 libdyld.dylib 0x00007fff88b735c9 start + 1
Does your code work when you disable compiler optimizations? If not, how about trying an older version of Xcode? It could just be a compiler bug, though I'd hope not! If you can find a working compiler or set of compiler options to check against, you could use LLVM's bugpoint tool to isolate which file is being miscompiled.
The solution appears to involve using:
export HOMEBREW_BUILD_BOTTLE=1
export HOMEBREW_BOTTLE_ARCH=core2
When building the homebrew libraries. Using Intel XED I was able to check the emitted machine code for unsupported instructions:
xed_cmd="/usr/local/bin/xed"
ar -x libcurl.a
parts=(*.o)
for j in "${parts[#]}"; do
chipcheck=$(${xed_cmd} -i ${j} -chip-check ${chipToCheck})
chiperrors=$(echo "${chipcheck}" | grep "# Total Chip Check Errors")
if [[ "$chiperrors" != "# Total Chip Check Errors: 0" ]] ; then
echo ERROR ${libname} ${j} $chiperrors
fi
done
I'm using pthread_cond_timedwait on a thread loop to execute at every X ms (unless it is waked first).
When I'm using gdb to debug it sometimes it the function never returns.
This forum post also have the same problem, but there is no solution.
Here's some code that reproduces the problem:
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
static pthread_cond_t s_cond = PTHREAD_COND_INITIALIZER;
static pthread_mutex_t s_mutex = PTHREAD_MUTEX_INITIALIZER;
int main(int argc, char **argv)
{
int rc = 0;
struct timespec curts = { 0 }; /* transformed timeout value */
clock_gettime(CLOCK_REALTIME, &curts);
curts.tv_sec += 10; /* Add 10 seconds to current time*/
pthread_mutex_lock(&s_mutex);
printf("pthread_cond_timedwait\n");
rc = pthread_cond_timedwait(&s_cond, &s_mutex, &curts);
if (rc == ETIMEDOUT)
{
printf("Timer expired \n");
}
pthread_mutex_unlock(&s_mutex);
return 1;
}
If I run it, it will run OK, and if I run in gdb it will also run OK.
I've narrowed down to these steps (I've named the program timedTest ):
Run the program;
While it runs attach gdb to it;
Execute continue on gdb;
The timedTest program never returns...;
Then, if I hit Ctrl+C on the terminal running gdb and run continue again, then the program will return.
I can probably use some other method to achieve what I want in this case, but I assume that it should be a solution to this problem.
EDIT:
Looks like this only happens in some machines, so maybe there's something to do with gcc / glibc / gdb / kernel versions...
Versions where this happens almost always:
$ ldd --version
ldd (Ubuntu EGLIBC 2.13-0ubuntu13) 2.13
$ gcc --version
gcc (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2
$ gdb --version
GNU gdb (Ubuntu/Linaro 7.2-1ubuntu11) 7.2
$ uname -a
Linux geovani 2.6.38-8-generic-pae #42-Ubuntu SMP Mon Apr 11 05:17:09 UTC 2011 i686 i686 i386 GNU/Linux
According to this forum post, this is a bug in the 2.6.38 kernel. I've made some tests with a 2.6.39 kernel and problem does not happen. Rolling back to the 2.6.38 it appears again.
I have the following smal C++ program
#include <stdio.h>
#include <stdlib.h>
int main(void) {
puts("!!!Hello World!!!");
return EXIT_SUCCESS;
}
I compile in Mac OS X Leopard last release using:
g++ -g hello.cpp -o hello.exe
being g++:
host:bin macbook$ g++ --ver
Using built-in specs.
Target: i686-apple-darwin9
Configured with: /var/tmp/gcc/gcc-5493~1/src/configure --disable-checking -enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.0/ --with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib --build=i686-apple-darwin9 --with-arch=apple --with-tune=generic --host=i686-apple-darwin9 --target=i686-apple-darwin9
Thread model: posix
gcc version 4.0.1 (Apple Inc. build 5493)
then I try to debug this program using fsf-gdb 7.1:
fsf-gdb hello.exe
put a breakpoint in main:
(gdb) b main
Breakpoint 1 at 0x1f8f: file hello.cpp, line 5.
run the program:
(gdb) r
Starting program: /Users/horacio/work/software/gdb/gdb-7.2-inst/bin/hello.exe
Breakpoint 1, main () at hello.cpp:5
5 puts("!!!Hello World!!!");
and try to step, and this happens:
(gdb) n
0x00003045 in ?? ()
This is the output if I do the same under Ubuntu Linux:
(gdb) n
!!!Hello World!!!
6 return EXIT_SUCCESS;
where gdb=7.1 and g++=4.3.4
What is the problem ???? I honestly do not understand why this does not work in mac os x.
Maybe the problem is the gdb version used in mac or the gcc version in mac. Which other alternatives exist for gdb in mac?
Thanks in advance
PS: Apple Leopard's gdb does not produce this error. But I want to use Eclipse CDT, and it can not work with Apple's gdb, that is why I am trying to use a non-Apple gdb version.
This works fine if you use the gdb bundled with Mac. Apple's gcc includes some small Apple-specific extensions, so would not be surprised that it is not 100% compatible with some other version of gdb. You may also have built your custom gdb incorrectly.
You mention that your g++ is 4.3.4, but the one you show above is 4.0.1.
Propably not releted to the actual issue but do remember that when you compile for debugging purposes, disable compiler optimizations with -O0 flag. If you don't pass that for gcc when compiling, you will get "funky" results when you are doing step execution with gdb.
My first thought was that your fsf-gdb doesn't understand Mach-0 binaries.
A quick look at Google came back with: http://reverse.put.as/2009/01/14/how-to-compile-gdb-and-other-apple-open-source-packages-in-mac-os-x/ which reveals that building gdb is not quite as trivial as one might think.