perf-events not showing StackTraces on debian 8 jessie - profiling

I am trying to profile a simple C program with perf-events on debian 8 jessie. I can see symbols but I am unable to get stacktraces. The same procedure generates good stacktraces on ubuntu 16.04.
I have installed linux-image-amd64-dbg and libc6-dbg.
I have confirmed that kernel config parameters include CONFIG_KALLSYMS=y
I have compiled the program with gcc -g3 -O0 hello.c to enable debug symbols.
I start profiling with the following command.
sudo perf record -g ./a.out
I generate a flame graph Flame Graph with the following command
sudo perf script | ~/code/FlameGraph/stackcollapse-perf.pl | \
~/code/FlameGraph/flamegraph.pl > perf-kernel.svg
This is the listing for hello.c which I am trying to profile
#include <stdio.h>
#include <unistd.h>
void do2() {
FILE* f = fopen("/dev/zero", "r");
int fd = fileno(f);
char buf[100];
while(1) {
read(fd, buf, sizeof(buf)/sizeof(buf[0]));
}
}
int main(void)
{
do2();
return 0;
}
This is the flame graph with debian jessie
This is the flame graph with ubuntu
Why are stack traces missing in debian jessie?
Thanks
Sharath

Managed to find the issue.
I had to enable CONFIG_FRAME_POINTER=y and recompile the kernel as per Brendan Gregg's perf site
It is unfortunate that the kernel shipping with Debian 8 does not have this enabled, which breaks perf

Related

Can't compile with mingw linking a library on Linux to create executable for Windows

I'm trying to compile C/C++ code from my Debian partition to generate some executable files for Windows.
Running $ uname -a on the command line gives Linux machine 5.14.0-2-amd64 #1 SMP Debian 5.14.9-2 (2021-10-03) x86_64 GNU/Linux. My processor is an Intel® Core™ i5-1035G4 CPU # 1.10GHz × 8, with a Mesa Intel® Iris(R) Plus Graphics (ICL GT1.5) integrated GPU.
A minimal example to show my current situation includes the following code (called code.cpp):
#include <iostream>
#include <CL/opencl.hpp>
int main()
{
std::vector <cl::Platform> all_platforms; //Get all platforms
cl::Platform::get(&all_platforms);
if (all_platforms.size() == 0)
{
std::cout << "No platforms found. Check OpenCL installation." << std::endl;
exit(1);
}
int pz = all_platforms.size();
std::cout << "Platforms size: " << pz << std::endl;
for (int i = 0; i < pz; i++)
{
cl::Platform default_platform = all_platforms[i];
std::cout << "Using platform: " << default_platform.getInfo<CL_PLATFORM_NAME>() << std::endl;
}
return(0);
}
which uses OpenCL to print all recognized devices. I compile my code writing g++ code.cpp -o code.out -lOpenCL. The executable file code.out works fine, doing what you would expect it to do. I have another program which uses GSL (GNU Scientific Library) written in C which also works well, linking with -lgsl (therefore I think there's not a problem with my code or the regular compilation process). Both OpenCL and GSL were installed from the official repositories (~# apt install ...) with no problem at all. When I execute code.out the output is
Platforms size: 2
Using platform: Intel(R) OpenCL HD Graphics
Using platform: Portable Computing Language
I installed mingw (via ~# apt install mingw-w64) to create executable files to be run on Windows, and for basic programs (i.e. without "external" libraries) it works well (replacing gcc by x86_64-w64-mingw32-gcc or i686-w64-mingw32-gcc). However for the code written above (and for the one using GSL) it doesn't work. Most of the error outputs are very similar for both examples, and I will show the command line outputs for the code using OpenCL.
When I try x86_64-w64-mingw32-g++ code.cpp -o code.out -lOpenCL the output is
code.cpp:2:10: fatal error: CL/opencl.hpp: No such file or directory
2 | #include <CL/opencl.hpp>
| ^~~~~~~~~~~~~~~
compilation terminated.
I thought this meant that I needed to be more specific when linking and including, so I gave the explicit path where the headers are located (found them via dpkg -S opencl.hpp or dpkg -S gsl*.h), and the .so file for OpenCL was found via dpkg -S *OpenCL.so, while the one for GSL was found using dpkg -S *gsl.so. When I try x86_64-w64-mingw32-g++ code.cpp -o code.out -I/usr/include/ -L/usr/lib/x86_64-linux-gnu/libOpenCL.so the output is
In file included from /usr/lib/gcc/x86_64-w64-mingw32/10-win32/include/c++/cwchar:44,
from /usr/lib/gcc/x86_64-w64-mingw32/10-win32/include/c++/bits/postypes.h:40,
from /usr/lib/gcc/x86_64-w64-mingw32/10-win32/include/c++/iosfwd:40,
from /usr/lib/gcc/x86_64-w64-mingw32/10-win32/include/c++/ios:38,
from /usr/lib/gcc/x86_64-w64-mingw32/10-win32/include/c++/ostream:38,
from /usr/lib/gcc/x86_64-w64-mingw32/10-win32/include/c++/iostream:39,
from code.cpp:1:
/usr/include/wchar.h:27:10: fatal error: bits/libc-header-start.h: No such file or directory
27 | #include <bits/libc-header-start.h>
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Therefore it seems that MinGW needs additional instructions to properly find, include and/or link the libraries. I don't know how to solve this problem. Those are my attempts based on some answers I've found, and the documentation provided by MinGW says nothing about this. The exact same problem occurs no matter if I use x86_64-w64-mingw32-g++ or i686-w64-mingw32-g++, or their gcc counterparts.
When cross-compiling make sure you are only linking things targeting the same platform together. In other words, your dependencies (and their dependencies) must be for the same target platform. You can't link with those libraries for your build platform.
So if you have a Windows 64-bit application that depends on OpenCL, you will need to link it against a Windows 64-bit build of OpenCL.
The OpenCL the sources can be found here:
https://github.com/KhronosGroup/OpenCL-Headers
https://github.com/KhronosGroup/OpenCL-ICD-Loader
so you would need to build those first.

Speed up extremely slow MinGW-w64 compilation/linking?

How can I speed up MinGW-w64's extremely slow C++ compilation/linking?
Compiling a trivial "Hello World" program:
#include <iostream>
int main()
{
std::cout << "hello world" << std::endl;
}
...takes 3 minutes(!) on this otherwise-unloaded Windows 10 box (i7-6700, 32GB of RAM, decent SATA SSD):
> ptime.exe g++ main.cpp
ptime 1.0 for Win32, Freeware - http://www.pc-tools.net/
Copyright(C) 2002, Jem Berkes <jberkes#pc-tools.net>
=== g++ main.cpp ===
Execution time: 180.488 s
Process Explorer shows the g++ process tree bottoming out in ld.exe which doesn't use any appreciable CPU or I/O for the duration.
Running the g++ process tree through API Monitor shows there are three unusually long syscalls in ld.exe: two NtCreateFile()s and a NtOpenFile(), each operating on a.exe and taking 60 seconds apiece.
The slowness only happens when using the default a.exe output; g++ -o foo.exe main.cpp takes 2 seconds, tops.
"Well don't use a.exe as an output name then!" isn't really a solution since this behavior causes CMake to take ages doing compiler feature detection.
GCC toolchain versions:
>g++ --version
g++ (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 8.1.0
>ld --version
GNU ld (GNU Binutils) 2.30
Given that I couldn't repro the problem in a clean Windows 10 VM and the dependence on the output filename led me down the path of anti-virus/anti-malware interference.
fltmc instances listed several possible filesystem filter drivers; guess-n-check narrowed it down to two of Carbon Black's: carbonblackk & ParityDriver.
Using Regedit to disable them via setting Start to 0x4 ("Disabled", 0x2 == Automatic, 0x3 == Manual) in these two registry keys followed by a reboot fixed the slowness:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\carbonblackk
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ParityDriver

MySQL Connector/C++ crashes with g++8.1.0 and C++17

I just started programming in C++ using Visual Studio 2017 and connected to my database. Everything worked fine so far. Then I wanted to use a c++17 feature and realized I needed to upgrade my compiler.
After upgrading my compiler to g++8.1.0 using this blog with these commands (and a lot of trial and error):
git clone https://bitbucket.org/sol_prog/raspberry-pi-gcc-binary.git
cd raspberry-pi-gcc-binary
tar xf gcc-8.1.0.tar.bz2
mv gcc-8.1.0 /usr/bin
cd ..
rm -r raspberry-pi-gcc-binary
rm /usr/bin/gcc
rm /usr/bin/g++
ln -s /usr/bin/gcc-8.1.0/bin/gcc-8.1.0 /usr/bin/gcc
ln -s /usr/bin/gcc-8.1.0/bin/g++-8.1.0 /usr/bin/g++
# After getting the error "GLIBCXX_3.4.21 not found" I did the following:
rm /usr/lib/arm-linux-gnueabihf/libstdc++.so.6
# I don't know if this (next line) was necessary
rm /usr/lib/arm-linux-gnueabihf/libstdc++.so.6.0.22
ln -s /usr/bin/gcc-8.1.0/lib/libstdc++.so.6.0.25 /usr/lib/arm-linux-gnueabihf/libstdc++.so.6
Now the program compiles, but when I excecute it, it crashes. When I switch to the old compiler (g++-4.9) it works again. Maybe the new compiler isn't properly installed. Instructions on how to install GCC-8 on Raspbian would be great.
This is basically the program:
#include<string>
#include<stdlib.h>
#include<iostream>
#include<exception>
#include<mysql_driver.h> //OR #include<cppconn/driver.h>
#include<mysql_connection.h> //OR #include<cppconn/connection.h>
sql::mysql::MySQL_Driver* driver; //OR sql::Driver*
sql::Connection* conn;
int main(const int argc, const char* argv[]) {
try {
driver = sql::mysql::get_driver_instance(); //OR get_driver_instance();
conn = driver->connect("tcp://127.0.0.1:3306", "root", "MY_PW");
conn->setSchema("MY_DB");
} catch (std::exception &e) {
//err("Failed to connect to the database.", e);
}
}
And this is the error I get in Visual studio:
Aborted (gdb) 1055-var-create - * "__size" (gdb)
1066-stack-select-frame 16 (gdb) 1067-var-create - * "argc" The thread
'LSTGA.out' (0x5cfa) has exited with code 0 (0x0). The program '' has
exited with code 0 (0x0).
And this is the error I get when running it directly via SSH:
* Error in `./LSTGA.out': munmap_chunk(): invalid pointer: 0x7ec3c50c *
Aborted
The call stack:
Additional information:
Previous compiler: gcc-4.9 / g++-4.9 (still present)
New compiler: gcc-8.1.0 / g++-8.1.0
Previous LIBSTDC++: 6.0.22 (Removed)
New LIBSTDC++: 6.0.25
Remote machine: Raspberry Pi 2b
Remote OS: Raspbian
Local OS: Windows 10
Visual studio Community 2017 15.8.8
Thanks in advance. -Minding
I found the issue here:
Even a small change in the compiler version can cause problems. If you obtain error messages that you suspect are related to binary incompatibilities, build Connector/C++ from source, using the same compiler and linker that you use to build and link your application.
So the solution is to build the MySQL Connector/C++ from source as described here.
I hope this helps.

Are C++17 Parallel Algorithms implemented already?

I was trying to play around with the new parallel library features proposed in the C++17 standard, but I couldn't get it to work. I tried compiling with the up-to-date versions of g++ 8.1.1 and clang++-6.0 and -std=c++17, but neither seemed to support #include <execution>, std::execution::par or anything similar.
When looking at the cppreference for parallel algorithms there is a long list of algorithms, claiming
Technical specification provides parallelized versions of the following 69 algorithms from algorithm, numeric and memory: ( ... long list ...)
which sounds like the algorithms are ready 'on paper', but not ready to use yet?
In this SO question from over a year ago the answers claim these features hadn't been implemented yet. But by now I would have expected to see some kind of implementation. Is there anything we can use already?
GCC 9 has them but you have to install TBB separately
In Ubuntu 19.10, all components have finally aligned:
GCC 9 is the default one, and the minimum required version for TBB
TBB (Intel Thread Building Blocks) is at 2019~U8-1, so it meets the minimum 2018 requirement
so you can simply do:
sudo apt install gcc libtbb-dev
g++ -ggdb3 -O3 -std=c++17 -Wall -Wextra -pedantic -o main.out main.cpp -ltbb
./main.out
and use as:
#include <execution>
#include <algorithm>
std::sort(std::execution::par_unseq, input.begin(), input.end());
see also the full runnable benchmark below.
GCC 9 and TBB 2018 are the first ones to work as mentioned in the release notes: https://gcc.gnu.org/gcc-9/changes.html
Parallel algorithms and <execution> (requires Thread Building Blocks 2018 or newer).
Related threads:
How to install TBB from source on Linux and make it work
trouble linking INTEL tbb library
Ubuntu 18.04 installation
Ubuntu 18.04 is a bit more involved:
GCC 9 can be obtained from a trustworthy PPA, so it is not so bad
TBB is at version 2017, which does not work, and I could not find a trustworthy PPA for it. Compiling from source is easy, but there is no install target which is annoying...
Here are fully automated tested commands for Ubuntu 18.04:
# Install GCC 9
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install gcc-9 g++-9
# Compile libtbb from source.
sudo apt-get build-dep libtbb-dev
git clone https://github.com/intel/tbb
cd tbb
git checkout 2019_U9
make -j `nproc`
TBB="$(pwd)"
TBB_RELEASE="${TBB}/build/linux_intel64_gcc_cc7.4.0_libc2.27_kernel4.15.0_release"
# Use them to compile our test program.
g++-9 -ggdb3 -O3 -std=c++17 -Wall -Wextra -pedantic -I "${TBB}/include" -L
"${TBB_RELEASE}" -Wl,-rpath,"${TBB_RELEASE}" -o main.out main.cpp -ltbb
./main.out
Test program analysis
I have tested with this program that compares the parallel and serial sorting speed.
main.cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <execution>
#include <random>
#include <iostream>
#include <vector>
int main(int argc, char **argv) {
using clk = std::chrono::high_resolution_clock;
decltype(clk::now()) start, end;
std::vector<unsigned long long> input_parallel, input_serial;
unsigned int seed;
unsigned long long n;
// CLI arguments;
std::uniform_int_distribution<uint64_t> zero_ull_max(0);
if (argc > 1) {
n = std::strtoll(argv[1], NULL, 0);
} else {
n = 10;
}
if (argc > 2) {
seed = std::stoi(argv[2]);
} else {
seed = std::random_device()();
}
std::mt19937 prng(seed);
for (unsigned long long i = 0; i < n; ++i) {
input_parallel.push_back(zero_ull_max(prng));
}
input_serial = input_parallel;
// Sort and time parallel.
start = clk::now();
std::sort(std::execution::par_unseq, input_parallel.begin(), input_parallel.end());
end = clk::now();
std::cout << "parallel " << std::chrono::duration<float>(end - start).count() << " s" << std::endl;
// Sort and time serial.
start = clk::now();
std::sort(std::execution::seq, input_serial.begin(), input_serial.end());
end = clk::now();
std::cout << "serial " << std::chrono::duration<float>(end - start).count() << " s" << std::endl;
assert(input_parallel == input_serial);
}
On Ubuntu 19.10, Lenovo ThinkPad P51 laptop with CPU: Intel Core i7-7820HQ CPU (4 cores / 8 threads, 2.90 GHz base, 8 MB cache), RAM: 2x Samsung M471A2K43BB1-CRC (2x 16GiB, 2400 Mbps) a typical output for an input with 100 million numbers to be sorted:
./main.out 100000000
was:
parallel 2.00886 s
serial 9.37583 s
so the parallel version was about 4.5 times faster! See also: What do the terms "CPU bound" and "I/O bound" mean?
We can confirm that the process is spawning threads with strace:
strace -f -s999 -v ./main.out 100000000 |& grep -E 'clone'
which shows several lines of type:
[pid 25774] clone(strace: Process 25788 attached
[pid 25774] <... clone resumed> child_stack=0x7fd8c57f4fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fd8c57f59d0, tls=0x7fd8c57f5700, child_tidptr=0x7fd8c57f59d0) = 25788
Also, if I comment out the serial version and run with:
time ./main.out 100000000
I get:
real 0m5.135s
user 0m17.824s
sys 0m0.902s
which confirms again that the algorithm was parallelized since real < user, and gives an idea of how effectively it can be parallelized in my system (about 3.5x for 8 cores).
Error messages
Hey, Google, index this please.
If you don't have tbb installed, the error is:
In file included from /usr/include/c++/9/pstl/parallel_backend.h:14,
from /usr/include/c++/9/pstl/algorithm_impl.h:25,
from /usr/include/c++/9/pstl/glue_execution_defs.h:52,
from /usr/include/c++/9/execution:32,
from parallel_sort.cpp:4:
/usr/include/c++/9/pstl/parallel_backend_tbb.h:19:10: fatal error: tbb/blocked_range.h: No such file or directory
19 | #include <tbb/blocked_range.h>
| ^~~~~~~~~~~~~~~~~~~~~
compilation terminated.
so we see that <execution> depends on an uninstalled TBB component.
If TBB is too old, e.g. the default Ubuntu 18.04 one, it fails with:
#error Intel(R) Threading Building Blocks 2018 is required; older versions are not supported.
You can refer https://en.cppreference.com/w/cpp/compiler_support to check all C++ feature implementation status. For your case, just search "Standardization of Parallelism TS", and you will find only MSVC and Intel C++ compilers support this feature now.
Intel has released a Parallel STL library which follows the C++17 standard:
https://github.com/intel/parallelstl
It is being merged into GCC.
Gcc does not yet implement the Parallelism TS (see https://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.2017)
However libstdc++ (with gcc) has an experimental mode for some equivalent parallel algorithms. See https://gcc.gnu.org/onlinedocs/libstdc++/manual/parallel_mode.html
Getting it to work:
Any use of parallel functionality requires additional compiler and
runtime support, in particular support for OpenMP. Adding this support
is not difficult: just compile your application with the compiler flag
-fopenmp. This will link in libgomp, the GNU Offloading and Multi Processing Runtime Library, whose presence is mandatory.
Code example
#include <vector>
#include <parallel/algorithm>
int main()
{
std::vector<int> v(100);
// ...
// Explicitly force a call to parallel sort.
__gnu_parallel::sort(v.begin(), v.end());
return 0;
}
Gcc now support execution header, but not standard clang build from https://apt.llvm.org

CUDA nvlink warning : SM Arch ('sm_35') not found

yesterday installed cuda-6.5 to my ubunutu14.04. I followed the steps stated in cuda's getting started guide. Checked for System requirements and mine was OK. Did pre-installations, uninstalled previously installed cuda, and installed package manager installation. All these steps were successfully performed. I skipped steps runfile installation and croos-build environment fro arm. In [post-installation actions][2] step, added
export PATH=/usr/local/cuda-6.5/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64:$LD_LIBRARY_PATH
these lines to .profile file. I upgraded my driver to the latest available driver by running the command sudo apt-get install cuda-drivers. Also verified that I installed correct driver. Rebooted my computer and vaulla cuda-6.5 is build successfully. But when I compile my simpleCuda.cu file
#include <stdio.h>
#include <cuda.h>
#include<iostream>
#include <thrust/device_vector.h>
#include <thrust/logical.h>
#include <thrust/functional.h>
#include <cassert>
#include <cublas_v2.h>
using namespace std;
int main(){
float* dev,host;
cudaError_t stat = cudaMalloc((void**)&dev,10*sizeof(float));
cout << "stat " << stat << endl;
return 0;
}
with nvcc -arch=sm_35 -rdc=true -lcublas -lcublas_device -lcudadevrt -o my simpleCuda.cu compile parameters set I got a warning message
nvlink warning : SM Arch ('sm_35') not found in '/usr/local/cuda-6.5/bin/../targets/x86_64-linux/lib/libcublas_device.a:maxwell_sgemm.asm.o'
nvlink warning : SM Arch ('sm_35') not found in '/usr/local/cuda-6.5/bin/../targets/x86_64-linux/lib/libcublas_device.a:maxwell_sm50_sgemm.o'
`
. In this link I see that it can be ignored. But I don't want to ignore this message. I compiled this simpleCuda.cu with the same compilation parameters set on different computer with cuda-5.5 compilation tool. It does not give me any warning message about architecture linking (-arch=sm_35). I want to get rid of this warning message. These compile parameters are not necessary for this particular code I posted, but further I will need them. I appreciate all your help.
This was apparently a toolchain limitation which was rectified in the CUDA 7 production release.
[This answer was assembled from comments to get the question off the unanswered queue for the CUDA tag]