Speed up extremely slow MinGW-w64 compilation/linking? - c++

How can I speed up MinGW-w64's extremely slow C++ compilation/linking?
Compiling a trivial "Hello World" program:
#include <iostream>
int main()
{
std::cout << "hello world" << std::endl;
}
...takes 3 minutes(!) on this otherwise-unloaded Windows 10 box (i7-6700, 32GB of RAM, decent SATA SSD):
> ptime.exe g++ main.cpp
ptime 1.0 for Win32, Freeware - http://www.pc-tools.net/
Copyright(C) 2002, Jem Berkes <jberkes@pc-tools.net>
=== g++ main.cpp ===
Execution time: 180.488 s
Process Explorer shows the g++ process tree bottoming out in ld.exe which doesn't use any appreciable CPU or I/O for the duration.
Running the g++ process tree through API Monitor shows there are three unusually long syscalls in ld.exe: two NtCreateFile()s and a NtOpenFile(), each operating on a.exe and taking 60 seconds apiece.
The slowness only happens when using the default a.exe output; g++ -o foo.exe main.cpp takes 2 seconds, tops.
"Well don't use a.exe as an output name then!" isn't really a solution since this behavior causes CMake to take ages doing compiler feature detection.
GCC toolchain versions:
>g++ --version
g++ (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 8.1.0
>ld --version
GNU ld (GNU Binutils) 2.30

The fact that I couldn't repro the problem in a clean Windows 10 VM, together with the dependence on the output filename, led me down the path of anti-virus/anti-malware interference.
fltmc instances listed several possible filesystem filter drivers; guess-n-check narrowed it down to two of Carbon Black's: carbonblackk & ParityDriver.
Using Regedit to disable them via setting Start to 0x4 ("Disabled", 0x2 == Automatic, 0x3 == Manual) in these two registry keys followed by a reboot fixed the slowness:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\carbonblackk
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ParityDriver
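One rough way to check whether a filename-dependent stall is still present after a change like this is to time plain file creation for both names. This is only a sketch: seconds_to_create is a hypothetical helper, and it is an assumption that a filter driver would delay an ordinary create/close from a test program the same way it delayed ld.exe.

```cpp
#include <chrono>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: returns the wall-clock seconds spent creating and
// closing an empty file at `path`, or -1.0 if the file could not be opened.
// The file is removed again afterwards.
double seconds_to_create(const std::string& path) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    {
        std::ofstream out(path, std::ios::binary | std::ios::trunc);
        if (!out) {
            return -1.0;
        }
    }  // The stream is closed here, before the clock is read again.
    const auto stop = clock::now();
    std::remove(path.c_str());
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling it from main() in a scratch directory with "a.exe" and then "foo.exe" and printing both results should show a large gap only while something is intercepting the first name.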

Related

64-bit version of GCC not compiling 64-bit exe

I am a beginner at gcc command-line compilation.
I need help with the -m64 flag.
I installed the gcc compiler using MinGW.
I checked the gcc version with the
gcc -v command, which shows Target: x86_64-w64-mingw32.
So I assume the 64-bit version of gcc is installed.
Objective: I wrote a small program to check whether main.exe is generated for 32 or 64 bit.
#include<stdio.h>
int main(void)
{
printf("The Size is: %lu\n", sizeof(long));
return 0;
}
I compiled it with the command gcc -o main main.c. When I execute main.exe, it outputs The Size is: 4.
But I expected the output to be `The Size is: 8'.
So I modified the command to gcc -m64 -o main main.c. When I executed main.exe again, it still output `The Size is: 4'.
How do I compile a 64-bit exe?
As others have said in the comments, the size of long can be 8 or 4 bytes on a 64bit system. You can try sizeof(size_t) or sizeof(void*). Even this might not be reliable on every system (but should work for Windows, Linux, macOS).
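As a minimal sketch of the sizeof(void*) suggestion (written in C++ here; pointer_bits and long_bits are names made up for illustration), the pointer width can be reported next to the width of long. On 64-bit Windows, long stays at 4 bytes while pointers are 8 bytes, which is why the original test printed 4.

```cpp
#include <climits>
#include <cstddef>

// Width in bits of a data pointer for the target this file was compiled for.
constexpr std::size_t pointer_bits() {
    return sizeof(void*) * CHAR_BIT;
}

// Width in bits of `long`; this is not a reliable 32/64-bit indicator,
// because 64-bit Windows keeps `long` at 32 bits.
constexpr std::size_t long_bits() {
    return sizeof(long) * CHAR_BIT;
}
```

Printing pointer_bits() from main() should give 64 for a 64-bit build regardless of what long_bits() reports.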
Here is a better way of doing it.
First download Sigcheck from Microsoft https://learn.microsoft.com/en-us/sysinternals/downloads/sigcheck then run it like below:
C:\Sigcheck>sigcheck64.exe -u -e "C:\Sublime C++ Projects\runtime_measure.exe"
Sigcheck v2.82 - File version and signature viewer
Copyright (C) 2004-2021 Mark Russinovich
Sysinternals - www.sysinternals.com
c:\sublime c++ projects\runtime_measure.exe:
Verified: Unsigned
Link date: 7:43 PM 12/8/2021
Publisher: n/a
Company: n/a
Description: n/a
Product: n/a
Prod version: n/a
File version: n/a
MachineType: 64-bit
As you can see, in this case, runtime_measure.exe is a 64-bit binary.
Don't forget to give the correct path so that the terminal can find and execute sigcheck64.exe in the directory where you placed it.
Also, notice the use of two parameters -u and -e in the command.
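If installing a tool is not an option, the same information can be read straight from the executable's PE header. The following is a rough sketch, not how Sigcheck itself works internally (that is not stated here); pe_machine and pe_machine_of_file are hypothetical names. It relies on the documented PE layout: "MZ" at offset 0, the PE header offset stored at 0x3C, the "PE\0\0" signature, then a 16-bit Machine field (0x8664 for x64, 0x014C for x86).

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>

// Reads `size` (at most 4) little-endian bytes at `offset`.
// Returns false if the range lies outside `bytes`.
static bool read_le(const std::string& bytes, std::size_t offset,
                    std::size_t size, std::uint32_t& value) {
    if (offset > bytes.size() || size > bytes.size() - offset) return false;
    value = 0;
    for (std::size_t i = size; i > 0; --i) {
        value = (value << 8) | static_cast<unsigned char>(bytes[offset + i - 1]);
    }
    return true;
}

// Classifies an in-memory executable image by its PE Machine field.
std::string pe_machine(const std::string& bytes) {
    std::uint32_t magic = 0, pe_offset = 0, signature = 0, machine = 0;
    if (!read_le(bytes, 0, 2, magic) || magic != 0x5A4D) return "not a PE file";  // "MZ"
    if (!read_le(bytes, 0x3C, 4, pe_offset)) return "not a PE file";
    const std::size_t pe = static_cast<std::size_t>(pe_offset);
    if (!read_le(bytes, pe, 4, signature) || signature != 0x00004550) {
        return "not a PE file";  // expected "PE\0\0"
    }
    if (!read_le(bytes, pe + 4, 2, machine)) return "not a PE file";
    if (machine == 0x8664) return "64-bit (x64)";
    if (machine == 0x014C) return "32-bit (x86)";
    return "other";
}

// Convenience wrapper that loads the whole file into memory first
// (simple, but wasteful for large executables).
std::string pe_machine_of_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    const std::string bytes((std::istreambuf_iterator<char>(in)),
                            std::istreambuf_iterator<char>());
    return pe_machine(bytes);
}
```

Calling pe_machine_of_file("main.exe") from main() and printing the result gives a quick 32/64-bit answer for the binary in question.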
x86_64-w64-mingw32:
The mingw32 compiler generates 32-bit executables.
The references to 64-bit in your package name indicate that this compiler runs in 64-bit mode.
If you want to generate 64-bit executables, you will need the mingw64 compiler:
https://www.mingw-w64.org/

Can't compile with mingw linking a library on Linux to create executable for Windows

I'm trying to compile C/C++ code from my Debian partition to generate some executable files for Windows.
Running $ uname -a on the command line gives Linux machine 5.14.0-2-amd64 #1 SMP Debian 5.14.9-2 (2021-10-03) x86_64 GNU/Linux. My processor is an Intel® Core™ i5-1035G4 CPU @ 1.10GHz × 8, with a Mesa Intel® Iris(R) Plus Graphics (ICL GT1.5) integrated GPU.
A minimal example to show my current situation includes the following code (called code.cpp):
#include <iostream>
#include <CL/opencl.hpp>
int main()
{
std::vector <cl::Platform> all_platforms; //Get all platforms
cl::Platform::get(&all_platforms);
if (all_platforms.size() == 0)
{
std::cout << "No platforms found. Check OpenCL installation." << std::endl;
exit(1);
}
int pz = all_platforms.size();
std::cout << "Platforms size: " << pz << std::endl;
for (int i = 0; i < pz; i++)
{
cl::Platform default_platform = all_platforms[i];
std::cout << "Using platform: " << default_platform.getInfo<CL_PLATFORM_NAME>() << std::endl;
}
return(0);
}
which uses OpenCL to print all recognized devices. I compile my code writing g++ code.cpp -o code.out -lOpenCL. The executable file code.out works fine, doing what you would expect it to do. I have another program which uses GSL (GNU Scientific Library) written in C which also works well, linking with -lgsl (therefore I think there's not a problem with my code or the regular compilation process). Both OpenCL and GSL were installed from the official repositories (~# apt install ...) with no problem at all. When I execute code.out the output is
Platforms size: 2
Using platform: Intel(R) OpenCL HD Graphics
Using platform: Portable Computing Language
I installed mingw (via ~# apt install mingw-w64) to create executable files to be run on Windows, and for basic programs (i.e. without "external" libraries) it works well (replacing gcc by x86_64-w64-mingw32-gcc or i686-w64-mingw32-gcc). However for the code written above (and for the one using GSL) it doesn't work. Most of the error outputs are very similar for both examples, and I will show the command line outputs for the code using OpenCL.
When I try x86_64-w64-mingw32-g++ code.cpp -o code.out -lOpenCL the output is
code.cpp:2:10: fatal error: CL/opencl.hpp: No such file or directory
2 | #include <CL/opencl.hpp>
| ^~~~~~~~~~~~~~~
compilation terminated.
I thought this meant that I needed to be more specific when linking and including, so I gave the explicit path where the headers are located (found them via dpkg -S opencl.hpp or dpkg -S gsl*.h), and the .so file for OpenCL was found via dpkg -S *OpenCL.so, while the one for GSL was found using dpkg -S *gsl.so. When I try x86_64-w64-mingw32-g++ code.cpp -o code.out -I/usr/include/ -L/usr/lib/x86_64-linux-gnu/libOpenCL.so the output is
In file included from /usr/lib/gcc/x86_64-w64-mingw32/10-win32/include/c++/cwchar:44,
from /usr/lib/gcc/x86_64-w64-mingw32/10-win32/include/c++/bits/postypes.h:40,
from /usr/lib/gcc/x86_64-w64-mingw32/10-win32/include/c++/iosfwd:40,
from /usr/lib/gcc/x86_64-w64-mingw32/10-win32/include/c++/ios:38,
from /usr/lib/gcc/x86_64-w64-mingw32/10-win32/include/c++/ostream:38,
from /usr/lib/gcc/x86_64-w64-mingw32/10-win32/include/c++/iostream:39,
from code.cpp:1:
/usr/include/wchar.h:27:10: fatal error: bits/libc-header-start.h: No such file or directory
27 | #include <bits/libc-header-start.h>
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Therefore it seems that MinGW needs additional instructions to properly find, include and/or link the libraries. I don't know how to solve this problem. Those are my attempts based on some answers I've found, and the documentation provided by MinGW says nothing about this. The exact same problem occurs no matter if I use x86_64-w64-mingw32-g++ or i686-w64-mingw32-g++, or their gcc counterparts.
When cross-compiling make sure you are only linking things targeting the same platform together. In other words, your dependencies (and their dependencies) must be for the same target platform. You can't link with those libraries for your build platform.
So if you have a Windows 64-bit application that depends on OpenCL, you will need to link it against a Windows 64-bit build of OpenCL.
The OpenCL sources can be found here:
https://github.com/KhronosGroup/OpenCL-Headers
https://github.com/KhronosGroup/OpenCL-ICD-Loader
so you would need to build those first.
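To see which platform a given compiler actually targets, a small translation unit can report the macros the compiler predefines for its target. This is only an illustrative sketch (target_platform is a made-up name); it does not check the libraries themselves, it just makes the build/target distinction visible.

```cpp
#include <string>

// Names the platform this translation unit was compiled *for*, based on
// macros the compiler predefines for its target (not for the build host).
std::string target_platform() {
#if defined(_WIN64)
    return "Windows 64-bit";
#elif defined(_WIN32)
    return "Windows 32-bit";
#elif defined(__linux__)
    return "Linux";
#elif defined(__APPLE__)
    return "Apple (Darwin)";
#else
    return "unknown";
#endif
}
```

Built with g++ on the Debian machine, a program printing target_platform() should report Linux; built with x86_64-w64-mingw32-g++, the same source should report Windows 64-bit. Every library passed to the linker has to have been built for that second target.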

Are C++17 Parallel Algorithms implemented already?

I was trying to play around with the new parallel library features proposed in the C++17 standard, but I couldn't get it to work. I tried compiling with the up-to-date versions of g++ 8.1.1 and clang++-6.0 and -std=c++17, but neither seemed to support #include <execution>, std::execution::par or anything similar.
When looking at the cppreference for parallel algorithms there is a long list of algorithms, claiming
Technical specification provides parallelized versions of the following 69 algorithms from algorithm, numeric and memory: ( ... long list ...)
which sounds like the algorithms are ready 'on paper', but not ready to use yet?
In this SO question from over a year ago the answers claim these features hadn't been implemented yet. But by now I would have expected to see some kind of implementation. Is there anything we can use already?
GCC 9 has them but you have to install TBB separately
In Ubuntu 19.10, all components have finally aligned:
GCC 9 is the default one, and the minimum required version for TBB
TBB (Intel Threading Building Blocks) is at 2019~U8-1, so it meets the minimum 2018 requirement
so you can simply do:
sudo apt install gcc libtbb-dev
g++ -ggdb3 -O3 -std=c++17 -Wall -Wextra -pedantic -o main.out main.cpp -ltbb
./main.out
and use as:
#include <execution>
#include <algorithm>
std::sort(std::execution::par_unseq, input.begin(), input.end());
see also the full runnable benchmark below.
GCC 9 and TBB 2018 are the first ones to work as mentioned in the release notes: https://gcc.gnu.org/gcc-9/changes.html
Parallel algorithms and <execution> (requires Thread Building Blocks 2018 or newer).
Related threads:
How to install TBB from source on Linux and make it work
trouble linking INTEL tbb library
Ubuntu 18.04 installation
Ubuntu 18.04 is a bit more involved:
GCC 9 can be obtained from a trustworthy PPA, so it is not so bad
TBB is at version 2017, which does not work, and I could not find a trustworthy PPA for it. Compiling from source is easy, but there is no install target which is annoying...
Here are fully automated tested commands for Ubuntu 18.04:
# Install GCC 9
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install gcc-9 g++-9
# Compile libtbb from source.
sudo apt-get build-dep libtbb-dev
git clone https://github.com/intel/tbb
cd tbb
git checkout 2019_U9
make -j `nproc`
TBB="$(pwd)"
TBB_RELEASE="${TBB}/build/linux_intel64_gcc_cc7.4.0_libc2.27_kernel4.15.0_release"
# Use them to compile our test program.
g++-9 -ggdb3 -O3 -std=c++17 -Wall -Wextra -pedantic -I "${TBB}/include" -L "${TBB_RELEASE}" -Wl,-rpath,"${TBB_RELEASE}" -o main.out main.cpp -ltbb
./main.out
Test program analysis
I have tested with this program that compares the parallel and serial sorting speed.
main.cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstdlib>
#include <execution>
#include <iostream>
#include <random>
#include <string>
#include <vector>
int main(int argc, char **argv) {
using clk = std::chrono::high_resolution_clock;
decltype(clk::now()) start, end;
std::vector<unsigned long long> input_parallel, input_serial;
unsigned int seed;
unsigned long long n;
// CLI arguments;
std::uniform_int_distribution<uint64_t> zero_ull_max(0);
if (argc > 1) {
n = std::strtoll(argv[1], NULL, 0);
} else {
n = 10;
}
if (argc > 2) {
seed = std::stoi(argv[2]);
} else {
seed = std::random_device()();
}
std::mt19937 prng(seed);
for (unsigned long long i = 0; i < n; ++i) {
input_parallel.push_back(zero_ull_max(prng));
}
input_serial = input_parallel;
// Sort and time parallel.
start = clk::now();
std::sort(std::execution::par_unseq, input_parallel.begin(), input_parallel.end());
end = clk::now();
std::cout << "parallel " << std::chrono::duration<float>(end - start).count() << " s" << std::endl;
// Sort and time serial.
start = clk::now();
std::sort(std::execution::seq, input_serial.begin(), input_serial.end());
end = clk::now();
std::cout << "serial " << std::chrono::duration<float>(end - start).count() << " s" << std::endl;
assert(input_parallel == input_serial);
}
On Ubuntu 19.10, Lenovo ThinkPad P51 laptop with CPU: Intel Core i7-7820HQ CPU (4 cores / 8 threads, 2.90 GHz base, 8 MB cache), RAM: 2x Samsung M471A2K43BB1-CRC (2x 16GiB, 2400 Mbps) a typical output for an input with 100 million numbers to be sorted:
./main.out 100000000
was:
parallel 2.00886 s
serial 9.37583 s
so the parallel version was about 4.7 times faster! See also: What do the terms "CPU bound" and "I/O bound" mean?
We can confirm that the process is spawning threads with strace:
strace -f -s999 -v ./main.out 100000000 |& grep -E 'clone'
which shows several lines of type:
[pid 25774] clone(strace: Process 25788 attached
[pid 25774] <... clone resumed> child_stack=0x7fd8c57f4fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fd8c57f59d0, tls=0x7fd8c57f5700, child_tidptr=0x7fd8c57f59d0) = 25788
Also, if I comment out the serial version and run with:
time ./main.out 100000000
I get:
real 0m5.135s
user 0m17.824s
sys 0m0.902s
which confirms again that the algorithm was parallelized since real < user, and gives an idea of how effectively it can be parallelized on my system (about 3.5x on 4 cores / 8 threads).
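For reference, the arithmetic behind these figures is a pair of plain ratios. The helper names below are hypothetical, and treating user/real as a speedup is only an estimate of how many threads were busy on average.

```cpp
// Ratio of two durations measured in the same unit, e.g. the serial and
// parallel sort times, or the `user` and `real` values printed by `time`.
double speedup(double baseline_seconds, double improved_seconds) {
    return baseline_seconds / improved_seconds;
}

// Fraction of ideal scaling achieved on `hardware_threads` threads.
double parallel_efficiency(double speedup_factor, unsigned hardware_threads) {
    return speedup_factor / hardware_threads;
}
```

With the numbers above, speedup(9.37583, 2.00886) is roughly 4.67 and speedup(17.824, 5.135) is roughly 3.47; the latter divided over 8 hardware threads gives an efficiency of about 0.43.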
Error messages
Hey, Google, index this please.
If you don't have tbb installed, the error is:
In file included from /usr/include/c++/9/pstl/parallel_backend.h:14,
from /usr/include/c++/9/pstl/algorithm_impl.h:25,
from /usr/include/c++/9/pstl/glue_execution_defs.h:52,
from /usr/include/c++/9/execution:32,
from parallel_sort.cpp:4:
/usr/include/c++/9/pstl/parallel_backend_tbb.h:19:10: fatal error: tbb/blocked_range.h: No such file or directory
19 | #include <tbb/blocked_range.h>
| ^~~~~~~~~~~~~~~~~~~~~
compilation terminated.
so we see that <execution> depends on an uninstalled TBB component.
If TBB is too old, e.g. the default Ubuntu 18.04 one, it fails with:
#error Intel(R) Threading Building Blocks 2018 is required; older versions are not supported.
You can refer to https://en.cppreference.com/w/cpp/compiler_support to check the implementation status of every C++ feature. For your case, just search for "Standardization of Parallelism TS", and you will find that only the MSVC and Intel C++ compilers support this feature now.
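Besides the support table, the installed standard library can be asked directly through the C++17 feature-test macro __cpp_lib_parallel_algorithm. A minimal sketch follows (parallel_algorithms_macro is a hypothetical name). Note that the macro only says the library claims support; a backend may still have to be installed separately.

```cpp
// <version> only exists in newer standard libraries, so include it
// conditionally; <algorithm> may also define the macro.
#if defined(__has_include)
#  if __has_include(<version>)
#    include <version>
#  endif
#endif
#include <algorithm>

// Value of the feature-test macro for parallel algorithms, or 0 when the
// standard library in use does not define it.
constexpr long parallel_algorithms_macro() {
#if defined(__cpp_lib_parallel_algorithm)
    return __cpp_lib_parallel_algorithm;
#else
    return 0;
#endif
}
```

Printing parallel_algorithms_macro() from main() with -std=c++17 gives 0 on a library without the feature and a date-like value such as 201603 otherwise.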
Intel has released a Parallel STL library which follows the C++17 standard:
https://github.com/intel/parallelstl
It is being merged into GCC.
Gcc does not yet implement the Parallelism TS (see https://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.2017)
However libstdc++ (with gcc) has an experimental mode for some equivalent parallel algorithms. See https://gcc.gnu.org/onlinedocs/libstdc++/manual/parallel_mode.html
Getting it to work:
Any use of parallel functionality requires additional compiler and
runtime support, in particular support for OpenMP. Adding this support
is not difficult: just compile your application with the compiler flag
-fopenmp. This will link in libgomp, the GNU Offloading and Multi Processing Runtime Library, whose presence is mandatory.
Code example
#include <vector>
#include <parallel/algorithm>
int main()
{
std::vector<int> v(100);
// ...
// Explicitly force a call to parallel sort.
__gnu_parallel::sort(v.begin(), v.end());
return 0;
}
GCC now supports the <execution> header, but the standard clang build from https://apt.llvm.org does not.

Xcode: how to build for older Intel processors (i5, Core 2 Duo) on i7

My application is crashing when built on a new Apple laptop and then launched on a much older Apple laptop.
The application is built using Xcode 6.4, on OSX 10.9 and 10.10, when using llvm 6.1 and C++11. The SDK is 10.10, the target OSX is 10.7. Optimizations are off.
The crash is very very early on when the C runtime is loading my application binary and initializing the modules.
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 com.MyCompany.MyApplication 0x000000010cd10e7a _GLOBAL__I_a + 10
1 dyld 0x00007fff61fd3ceb ImageLoaderMachO::doModInitFunctions(ImageLoader::LinkContext const&) + 265
2 dyld 0x00007fff61fd3e78 ImageLoaderMachO::doInitialization(ImageLoader::LinkContext const&) + 40
3 dyld 0x00007fff61fd0871 ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int,
This is before any of my application code. The crash does not occur on the build machine (i7 CPU). Crashes occur on i5 and Core 2 Duo machines. I suspect that an extended (CPU specific) instruction is creating the crash on load.
When I use the same Xcode, same llvm, etc to build the application on the Core 2 Duo machine there is no crash.
I am also using homebrew: libmtp, libusb, libusb-compat, cryptopp, curl (with c-ares, openssl, nghttp2), boost. I have specified C++11 where necessary, and have specified --build-bottle. I am statically linking to these libraries.
I have tried to use otool -tV on all libraries, the final binary, etc to find SSE instructions.
I have tried to set the Xcode LLVM build setting "Enable Additional Vector Extensions" to "platform" and "SSE3" to no avail. This is probably because homebrew isn't passing the --universal flag from curl to the building of openssl and its cryptlib.
I have taken static libraries libcurl.a (CURL), libssl.a (OpenSSL), libcrypto.a (OpenSSL), libz.a (zlib) from the older machine and added them to my repository. Using Xcode to link them into my application solves the problem.
Are there other tools I should use to narrow down the offending instruction?
Are there other explanations for the crash?
Addendum:
In addition to building the libraries on an older machine, I have also created a proof of concept, minimal, instant crash program that reports a slightly different crash location, but demonstrates the issue:
On an i7 (new Apple computer with new Intel CPU), use homebrew to install:
brew install curl --with-c-ares --with-openssl
Then copy this source into file sse.cpp:
#define CURL_STATICLIB
#include <curl/curl.h>
int main(int argc, const char * argv[]) {
curl_global_init(CURL_GLOBAL_ALL);
return 0;
}
Compile it:
clang++ sse.cpp -c -arch x86_64 -I/usr/local/opt/curl/include
clang++ -o a.out sse.o /usr/local/opt/openssl/lib/libssl.a /usr/local/opt/openssl/lib/libcrypto.a /usr/local/opt/zlib/lib/libz.a /usr/local/opt/curl/lib/libcurl.a /usr/local/opt/c-ares/lib/libcares.a -stdlib=libc++ -framework LDAP
Now move to an older Apple computer with older Intel CPU, and crash it:
./a.out
Crash Report (compressed):
Process: a.out [569]
...
Code Type: X86-64 (Native)
Parent Process: bash [448]
Responsible: Terminal [339]
...
OS Version: Mac OS X 10.10.5 (14F27)
...
Crashed Thread: 0 Dispatch queue: com.apple.main-thread
Exception Type: EXC_BAD_INSTRUCTION (SIGILL)
Exception Codes: 0x0000000000000001, 0x0000000000000000
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 a.out 0x000000010dbdce3f ENGINE_new + 36
1 a.out 0x000000010dbe05e3 ENGINE_load_dynamic + 11
2 a.out 0x000000010dbdf04a ENGINE_load_builtin_engines + 24
3 a.out 0x000000010dc76b36 Curl_ossl_init + 14
4 a.out 0x000000010dc5c2a5 curl_global_init + 114
5 a.out 0x000000010db51d95 main + 37
6 libdyld.dylib 0x00007fff88b735c9 start + 1
Does your code work when you disable compiler optimizations? If not, how about trying an older version of Xcode? It could just be a compiler bug, though I'd hope not! If you can find a working compiler or set of compiler options to check against, you could use LLVM's bugpoint tool to isolate which file is being miscompiled.
The solution appears to involve using:
export HOMEBREW_BUILD_BOTTLE=1
export HOMEBREW_BOTTLE_ARCH=core2
when building the Homebrew libraries. Using Intel XED, I was able to check the emitted machine code for unsupported instructions:
xed_cmd="/usr/local/bin/xed"
ar -x libcurl.a
parts=(*.o)
for j in "${parts[@]}"; do
chipcheck=$(${xed_cmd} -i ${j} -chip-check ${chipToCheck})
chiperrors=$(echo "${chipcheck}" | grep "# Total Chip Check Errors")
if [[ "$chiperrors" != "# Total Chip Check Errors: 0" ]] ; then
echo ERROR ${libname} ${j} $chiperrors
fi
done
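A complementary check is to ask each machine which instruction-set extensions its CPU reports, so the build machine and the older machines can be compared. The sketch below assumes GCC or a recent clang on x86, where the __builtin_cpu_supports builtin is available; the function names are made up, and which extension actually caused the crash is not known from the report above.

```cpp
// Each function asks the CPU the program is running on (at run time) whether
// it supports one extension. The builtin needs a string literal, hence one
// function per feature. On other compilers or targets the sketch answers
// "no" (an assumption chosen for illustration).
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
bool cpu_has_sse42() { return __builtin_cpu_supports("sse4.2") != 0; }
bool cpu_has_avx()   { return __builtin_cpu_supports("avx") != 0; }
bool cpu_has_avx2()  { return __builtin_cpu_supports("avx2") != 0; }
#else
bool cpu_has_sse42() { return false; }
bool cpu_has_avx()   { return false; }
bool cpu_has_avx2()  { return false; }
#endif
```

Printing the three results from main() on the build machine and on an older machine shows which extensions the older CPU lacks; any library compiled to use one of those unconditionally would fail there with an illegal-instruction error.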

gmon.out isn't created when I compile with -pg flag with g++

I'm running on Mac OSX, version 10.8.5 (Mountain Lion). I have the following simple C++ code.
main.cpp:
#include <iostream>
int main ()
{
std::cout << "Hello world!"<<std::endl;
std::cout << "Goodbye world!"<<std::endl;
return 0;
}
I'm trying to get gprof to work on my computer. As the manual suggests, I enter the following two lines into my terminal:
g++ -g -pg main.cpp -o a.out
./a.out
However this does not generate a gmon.out file as it is supposed to. When I try typing gprof in the terminal, it says:
gprof: can't open: gmon.out (No such file or directory)
which is to be expected since gmon.out isn't there...
Any ideas on what I'm doing wrong?
EDIT: Some other things that may help:
My friend, who has a similar OS X version (I can ask him later to confirm), and the exact same versions of g++ and gprof, was able to
use gprof successfully as I have outlined.
I'm using an older version of g++ but I have read online that updating to a newer version didn't help.
a.out works perfectly, it prints out Hello world! and Goodbye world!. I also tried this with a more complex C++ program with
several classes and it still has the same problem. Everything
compiles and runs normally but no gmon.out file is produced.
You have to realize that OS X/MacOS does not provide GNU GCC on the system by default.
Note the output of this command:
ls -la /usr/bin/g++ /usr/bin/clang++
These executables look identical. (Actually! It looks like they are different, but somehow the filesize is identical!)
As far as I can tell, clang doesn't support the production of gprof output. As confusing as it may be, the gcc program will run clang.
I would recommend trying to use homebrew to install GCC on OS X/MacOS. You do want to be careful about how it gets installed, etc., so that you know which command corresponds to which compiler.
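To find out which compiler a given command really invokes, the compiler's own predefined macros can be inspected. This is a small sketch with a made-up function name; __clang__ is tested first because clang also defines __GNUC__.

```cpp
#include <string>

// Reports which compiler built this translation unit.
std::string compiler_identity() {
#if defined(__clang__)
    return std::string("clang: ") + __VERSION__;
#elif defined(__GNUC__)
    return std::string("GNU GCC: ") + __VERSION__;
#else
    return "other compiler";
#endif
}
```

If a program printing compiler_identity() is built with the g++ command and reports clang, then that g++ is not GNU GCC.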