using address sanitizer with OpenCV - c++

I'm trying to use Google's Address Sanitizer with a CUDA project, more precisely with OpenCV cuda functions. However I got an 'out of memory' error on the first cuda call.
OpenCV Error: Gpu API call (out of memory) in getDevice, file opencv-2.4.11/src/opencv-2.4.11/modules/dynamicuda/include/opencv2/dynamicuda/dynamicuda.hpp, line 664
terminate called after throwing an instance of 'cv::Exception'
what(): opencv-2.4.11/src/opencv-2.4.11/modules/dynamicuda/include/opencv2/dynamicuda/dynamicuda.hpp:664: error: (-217) out of memory in function getDevice
It can be reproduced with
#include <opencv2/gpu/gpu.hpp>
int main()
{
cv::gpu::printCudaDeviceInfo(cv::gpu::getDevice());
return 0;
}
compiled with
clang++ -fsanitize=address -lstdc++ -lopencv_gpu -lopencv_core -o sanitizer sanitizer.cpp && LD_LIBRARY_PATH=/usr/local/lib ./sanitizer
I've got the same result with gcc.
I've also tried blacklisting cuda functions without result.
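For reference, a source-level counterpart to a blacklist entry is the no_sanitize attribute. The sketch below is only an illustration: NO_ASAN and sum_unchecked are names made up here, and the attribute only switches off instrumentation of code compiled inside that function. It does not touch precompiled CUDA libraries or what the ASan runtime does at start-up, so it may well not help with the out-of-memory error above.

```cpp
#include <cstddef>

// NO_ASAN is a hypothetical macro name for this sketch. Clang and GCC 8+
// accept no_sanitize("address"); other compilers get an empty macro.
#if defined(__clang__) || (defined(__GNUC__) && __GNUC__ >= 8)
#define NO_ASAN __attribute__((no_sanitize("address")))
#else
#define NO_ASAN
#endif

// Hypothetical helper: ASan emits no checks for the loads in this function.
NO_ASAN int sum_unchecked(const int* data, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

Functions called from such a function are still instrumented unless they carry the attribute themselves.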
Now using cuda without opencv:
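#include <cuda_runtime.h>
#include <iostream>
int main()
{
int device = -1;
// cudaGetDevice reports the index of the current device, not a device count.
cudaGetDevice(&device);
std::cout << "Current device: " << device << std::endl;
return 0;
}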
clang++ -O1 -g -fsanitize=address -fsanitize-blacklist=asan.blacklist -stdlib=libstdc++ -lstdc++ -I/opt/cuda/include -L/opt/cuda/lib64 -lcudart -o sanitizer sanitizer.cpp && ./sanitizer
The sanitizer stops on a memory leak:
=================================================================
==25344==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 136 byte(s) in 1 object(s) allocated from:
#0 0x4bc4a2 (/home/pluc/work/tests/sanitizer+0x4bc4a2)
#1 0x7f71f0fa69ba (<unknown module>)
SUMMARY: AddressSanitizer: 136 byte(s) leaked in 1 allocation(s).
My question is how can I use the address sanitizer to sanitize my software without getting stuck with this? How can I at least properly blacklist all cuda related calls?
I didn't find anything relevant on a famous web search engine. It's as if people don't use cuda, or asan, or both. Do people just have a build with cuda completely disabled?
I'm guessing asan is having a hard time with cuda memory management but I'm looking for a way to use this tool for the rest of my codebase at least.

For people looking here: the solution I found (based on the related GitHub issue mentioned by @BenC) is to run export ASAN_OPTIONS=protect_shadow_gap=0 before launching the program.
I have not succeeded in finding a way to change the behavior when compiling yet.

Related

How do you find out the cause of rare crashes that are caused by things that are not caught by try catch (access violation, divide by zero, etc.)?

I am a .NET programmer who is starting to dabble into C++. In C# I would put the root function in a try catch, this way I would catch all exceptions, save the stack trace, and this way I would know what caused the exception, significantly reducing the time spent debugging.
But in C++ some things (access violation, divide by zero, etc.) are not caught by try catch. How do you deal with them, and how do you know which line of code caused the error?
For example let's assume we have a program that has 1 million lines of code. It's running 24/7, has no user-interaction. Once in a month it crashes because of something that is not caught by try catch. How do you find out which line of code caused the crash?
Environment: Windows 10, MSVC.
C++ is meant to be a high performance language and checks are expensive. You can't run at C++ speeds and at the same time have all sorts of checks. It is by design.
Running .NET this way is akin to running C++ in debug mode with sanitizers on. So if you want to run your application with all the information you can get, turn on debug mode in your cmake build and add sanitizers, at least the undefined and address sanitizers.
For Windows/MSVC it seems that address sanitizers were just added in 2021. You can check the announcement here: https://devblogs.microsoft.com/cppblog/addresssanitizer-asan-for-windows-with-msvc/
For Windows/mingw or Linux/* you can use Gcc and Clang's builtin sanitizers that have largely the same usage/syntax.
To set your build to debug mode:
cd <builddir>
cmake -DCMAKE_BUILD_TYPE=debug <sourcedir>
To enable sanitizers, add this to your compiler command line: -fsanitize=address,undefined
One way to do that is to add it to your cmake build so altogether it becomes:
cmake -DCMAKE_BUILD_TYPE=debug \
-DCMAKE_CXX_FLAGS_DEBUG_INIT="-fsanitize=address,undefined" \
<sourcedir>
Then run your application binary normally as you do. When an issue is found a meaningful message will be printed along with a very informative stack trace.
Alternatively, you can make the sanitizer trap so that execution stops inside the debugger (gdb) and you can inspect the state live, but that only works with the undefined sanitizer. To do so, replace
-fsanitize=address,undefined
with
-fsanitize-undefined-trap-on-error -fsanitize-trap=undefined -fsanitize=address
For example, this code has a clear problem:
void doit( int* p ) {
*p = 10;
}
int main() {
int* ptr = nullptr;
doit(ptr);
}
Compile it with optimizations and you get:
$ g++ -O3 test.cpp -o test
$ ./test
Segmentation fault (core dumped)
Not very informative. You can try to run it inside the debugger but no symbols are there to see.
$ g++ -O3 test.cpp -o test
$ gdb ./test
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
...
Reading symbols from ./test...
(No debugging symbols found in ./test)
(gdb) r
Starting program: /tmp/test
Program received signal SIGSEGV, Segmentation fault.
0x0000555555555044 in main ()
(gdb)
That's useless so we can turn on debug symbols with
$ g++ -g3 test.cpp -o test
$ gdb ./test
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
...
Reading symbols from ./test...
(gdb) r
Starting program: /tmp/test
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
test.cpp:4:5: runtime error: store to null pointer of type 'int'
Program received signal SIGSEGV, Segmentation fault.
0x0000555555555259 in doit (p=0x0) at test.cpp:4
4 *p = 10;
Then you can inspect inside:
(gdb) p p
$1 = (int *) 0x0
Now, turn on sanitizers to get even more messages without the debugger:
$ g++ -O0 -g3 test.cpp -fsanitize=address,undefined -o test
$ ./test
test.cpp:4:5: runtime error: store to null pointer of type 'int'
AddressSanitizer:DEADLYSIGNAL
=================================================================
==931717==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x563b7b66c259 bp 0x7fffd167c240 sp 0x7fffd167c230 T0)
==931717==The signal is caused by a WRITE memory access.
==931717==Hint: address points to the zero page.
#0 0x563b7b66c258 in doit(int*) /tmp/test.cpp:4
#1 0x563b7b66c281 in main /tmp/test.cpp:9
#2 0x7f36164a9082 in __libc_start_main ../csu/libc-start.c:308
#3 0x563b7b66c12d in _start (/tmp/test+0x112d)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /tmp/test.cpp:4 in doit(int*)
==931717==ABORTING
That is much better!
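If you are unsure whether a given build really has the address sanitizer switched on, the compiler can tell you. The following is a small sketch: BUILT_WITH_ASAN and asan_status are names made up for it, while __has_feature(address_sanitizer) (Clang) and __SANITIZE_ADDRESS__ (GCC, MSVC) are the compiler-provided checks.

```cpp
#include <string>

// Clang reports sanitizers through __has_feature; GCC and MSVC define
// __SANITIZE_ADDRESS__ when the address sanitizer is enabled.
#if defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define BUILT_WITH_ASAN 1
#  endif
#endif
#if !defined(BUILT_WITH_ASAN) && defined(__SANITIZE_ADDRESS__)
#  define BUILT_WITH_ASAN 1
#endif
#ifndef BUILT_WITH_ASAN
#  define BUILT_WITH_ASAN 0
#endif

// Hypothetical helper that turns the compile-time answer into a log message.
std::string asan_status() {
    return BUILT_WITH_ASAN ? "ASan enabled" : "ASan not enabled";
}
```

Printing asan_status() once at start-up makes it obvious in the logs which kind of binary produced a crash report.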

lli is generating run-time error for clang++ generated IR while the generated executable is not generating run-time error for c++ source code

I am trying to generate LLVM IR from C++ source code and run it through the just-in-time compiler. When I compile with clang++ and generate a binary executable it runs perfectly, but when I generate the IR and run it through the JIT with the lli command I get a run-time error. Could you please help me understand what's going on?
For example, let example.cpp contain the following code:
#include <iostream>
int main(){
std::cout << "\nHello World!";
return 0;
}
I am using the following command to generate an executable, which runs perfectly fine.
clang++ example.cpp
I am using the following command to generate the LLVM IR (in textual .ll form):
clang++ -S -emit-llvm example.cpp
And then running through the JIT using the following command which generates run-time error:
lli example.ll
I am getting the following access violation error:
Stack dump:
0. Program arguments: lli example.ll
#0 0x00000000025fd9af llvm::sys::PrintStackTrace(llvm::raw_ostream&) /home/xpc/llvm/llvm-project1/llvm-project/llvm/lib/Support/Unix/Signals.inc:564:0
#1 0x00000000025fda42 PrintStackTraceSignalHandler(void*) /home/xpc/llvm/llvm-project1/llvm-project/llvm/lib/Support/Unix/Signals.inc:625:0
#2 0x00000000025fb7ca llvm::sys::RunSignalHandlers() /home/xpc/llvm/llvm-project1/llvm-project/llvm/lib/Support/Signals.cpp:68:0
#3 0x00000000025fd329 SignalHandler(int) /home/xpc/llvm/llvm-project1/llvm-project/llvm/lib/Support/Unix/Signals.inc:406:0
#4 0x00007fa75dbdc390 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x11390)
Segmentation fault (core dumped)
Try compiling with clang++ -S -emit-llvm -fno-use-cxa-atexit example.cpp.
I think it's probably because clang and gcc prefer __cxa_atexit to atexit by default (these are the functions used to register the cleanup of global objects when a program exits). That means you'll get a linker error if your libc implementation doesn't support the former, so disabling it with -fno-use-cxa-atexit should work.
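To make the cleanup registration concrete, here is a minimal sketch (Resource, global_resource, say_goodbye and register_goodbye are made-up names). The comment describes what the compiler typically emits for a global with a destructor; the exact call is an implementation detail of the Itanium C++ ABI, not something you write yourself.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for a library global such as the iostream initializer.
struct Resource {
    ~Resource() { std::puts("Resource destroyed at exit"); }
};

// A global with a non-trivial destructor. With the default -fuse-cxa-atexit
// the compiler emits a start-up call roughly like
//   __cxa_atexit(destructor, &global_resource, &__dso_handle);
// with -fno-use-cxa-atexit the cleanup is registered through atexit instead.
Resource global_resource;

void say_goodbye() { std::puts("atexit handler ran"); }

// Registers a handler by hand; std::atexit returns 0 on success.
bool register_goodbye() {
    return std::atexit(say_goodbye) == 0;
}
```

The reference to __dso_handle in the default scheme is the symbol that a JIT or interpreter has to resolve before such code can run.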
It seems to be a library linking issue. Try the following:
Try lli -force-interpreter example.ll instead of lli example.ll. When the JIT can't work properly, you should try the interpreter.
If you get the following error:
LLVM ERROR: Could not resolve external global address: __dso_handle
then add -fno-use-cxa-atexit to the clang flags; the reasons are given in [LLVMdev] MCJIT/interpreter and iostream.
If, after retrying lli -force-interpreter example.ll, you still get errors like LLVM ERROR: Tried to execute an unknown external function: ...., then you should recompile LLVM with the ffi library enabled. For the reasons, see Advice on Packaging LLVM and [LLVMdev] lli --force-interpreter does not find external function.

clang and clang++ with ASAN generate different output

I'm trying to add ASAN (Google's/Clang's address sanitizer) to our project and am stuck on this problem.
For example, we have this simple C++ code
#include <iostream>
int main() {
std::cout << "Started Program\n";
int* i = new int();
*i = 42;
std::cout << "Expected i: " << *i << std::endl;
}
Then, I build it with clang++
clang++-3.8 -o memory-leak++ memory_leak.cpp -fsanitize=address -fno-omit-frame-pointer -g
The program gives this output
Started Program
Expected i: 42
=================================================================
==14891==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 4 byte(s) in 1 object(s) allocated from:
#0 0x4f2040 in operator new(unsigned long) (memory-leak+++0x4f2040)
#1 0x4f4f00 in main memory_leak.cpp:4:11
#2 0x7fae13ce6f44 in __libc_start_main /build/eglibc-SvCtMH/eglibc-2.19/csu/libc-start.c:287
SUMMARY: AddressSanitizer: 4 byte(s) leaked in 1 allocation(s).
Cool, it works and symbolizer gives meaningful information too.
Now, I build this with clang
clang-3.8 -o memory-leak memory_leak.cpp -std=c++11 -fsanitize=address -fno-omit-frame-pointer -g -lstdc++
And the program gives this output
Started Program
Expected i: 42
=================================================================
==14922==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 4 byte(s) in 1 object(s) allocated from:
#0 0x4c3bc8 in malloc (memory-leak+0x4c3bc8)
#1 0x7f024a8e4dac in operator new(unsigned long) (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0x5edac)
#2 0x7f0249998f44 in __libc_start_main /build/eglibc-SvCtMH/eglibc-2.19/csu/libc-start.c:287
SUMMARY: AddressSanitizer: 4 byte(s) leaked in 1 allocation(s).
Ok, it detects the memory leak, but the stack trace looks strange and doesn't include the memory_leak.cpp:4:11 line.
I've spent quite a while trying to narrow down this problem in our codebase, and eventually the only difference turned out to be clang vs clang++.
Why is it even a problem? Can't we use clang++?
We use bazel, which uses the CC compiler instead of CXX for some reason. We cannot blindly force it to use CXX because we have CC dependencies which cannot be built by CXX. So...
Any idea how to get the same ASAN output with clang and clang++? Or how to make Bazel use clang++ for C++ targets and clang for C targets?
This seems to be a bug in Clang; could you file a bug report in their tracker? (EDIT: this was resolved as not-a-bug by the Asan developers, see https://github.com/google/sanitizers/issues/872, so it probably needs to be fixed by the Bazel developers instead.)
Some details: when you use plain clang, it decides not to link the C++ part of the Asan runtime, as can be seen in Tools.cpp:
if (SanArgs.linkCXXRuntimes())
StaticRuntimes.push_back("asan_cxx");
and SanitizerArgs.cpp:
LinkCXXRuntimes =
Args.hasArg(options::OPT_fsanitize_link_cxx_runtime) || D.CCCIsCXX();
(note the D.CCCIsCXX part: it checks for clang vs. clang++, whereas it should check the file type).
The C++ part of the runtime contains the interceptor for operator new, which would explain why it's missing when you link with clang instead of clang++. On the positive side, you should be able to work around this by adding -fsanitize-link-c++-runtime to your flags.
As for the borked stack: by default Asan unwinds the stack with a frame-pointer-based unwinder, which has problems unwinding through code that wasn't built with -fno-omit-frame-pointer (like libstdc++.so in your case). See e.g. this answer for another example of such behavior and available workarounds.
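To illustrate why the C++ part of the runtime matters, here is a rough sketch of what taking over operator new looks like from user code. It is not how the Asan runtime is implemented; g_new_calls and new_calls are made-up names, and the counter is not thread-safe.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Counter bumped by the replacement below (hypothetical name for this sketch).
static std::size_t g_new_calls = 0;

// Replacing the global operator new lets a tool see C++ allocations directly
// instead of only seeing the malloc call made underneath by libstdc++.
void* operator new(std::size_t size) {
    ++g_new_calls;
    if (void* p = std::malloc(size == 0 ? 1 : size)) {
        return p;
    }
    throw std::bad_alloc();
}

// Matching deallocation functions so memory from malloc goes back to free.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

std::size_t new_calls() { return g_new_calls; }
```

When nothing replaces operator new at this level, the allocation is only observed at malloc, called from inside libstdc++.so, which matches the second stack trace in the question.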

lli: LLVM ERROR: Cannot select: X86ISD::WrapperRIP TargetGlobalTLSAddress:i64

Running the following code with clang++ -S -emit-llvm main.cpp && lli main.ll on Linux(Debian)
#include <future>
int main () {
return std::async([]{return 1;}).get();
}
fails to run on lli due to the following error:
LLVM ERROR: Cannot select: 0xd012e0:
i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i8** @_ZSt15__once_callable> 0 [TF=10]
0xd020c0: i64 = TargetGlobalTLSAddress<i8** @_ZSt15__once_callable> 0 [TF=10]
In function: _ZSt9call_onceIMNSt13__future_base13_State_baseV2EFvPSt8functionIFSt10unique_ptrINS0_12_Result_baseENS4_8_DeleterEEvEEPbEJPS1_S9_SA_EEvRSt9once_flagOT_DpOT0_
Questions:
What does it mean?
Are there any compiler-flags that fix this problem?
using -stdlib=libc++ compiles and runs successfully*; what specific features is libstdc++ using that cause this issue ?
EDIT:
The motivation behind this question is to understand the differences between libc++ and libstdc++ that leads to this specific error message (on Linux) in llvm's orcjit.
On OSX gcc has been deprecated and clang uses libc++ by default.
To reproduce this error on OSX you probably have to install gcc and use -stdlib=libstdc++.
Here is the llvm-ir (it's unfortunately too big to embed here directly)
EDIT:
The error turned out to be caused by the lack of TLS support in the JITer. This answer describes another problem concerned with linking and lli.
If you have a look at the generated IR from clang++ -std=c++11 -S -emit-llvm test.cpp, you will find that many of the symbols, e.g. _ZNSt6futureIiE3getEv, are only declared, but never defined. The linker is never called, since -S "Only run[s] preprocess and compilation steps" (clang --help).
lli only executes the IR Module and does no "implicit" linking, how is it supposed to know which libraries to link in?
There are different solutions to this, depending on why you are using lli:
compile and link the IR Module: llc main.ll && clang++ main.s -lpthread (pthread is required, see What is the correct link options to use std::thread in GCC under linux?)
(unconfirmed) use LD_PRELOAD="x.so y.so" to force-load the libraries before running lli
JIT the module programmatically and use LoadLibraryPermanently(nullptr) (adds symbols of the program into the search space) and LoadLibraryPermanently(file, err) for additional libs (see http://llvm.org/docs/doxygen/html/classllvm_1_1sys_1_1DynamicLibrary.html)
I can only guess as to why libc++ works for you, since it fails on my machine, but presumably it's because it is already loaded into lli and lli calls sys::DynamicLibrary::LoadLibraryPermanently(nullptr) to add the program's symbols to its JIT search space (see https://github.com/llvm-mirror/llvm/blob/release_40/tools/lli/OrcLazyJIT.cpp#L110).
The LLVM-dev mailing list pointed out:
What does it mean?
The LLVM backend in orcjit does not currently support thread-local storage (TLS).
A minimal example is:
extern thread_local int tls;
int main() {
tls = 42;
return 0;
}
using -stdlib=libc++ compiles and runs successfully*; what specific features is libstdc++ using that cause this issue?
This works because the libc++ future::get implementation does not use the thread_local keyword.
Are there any compiler-flags that fix this problem?
Currently there is no solution.
Using lli -relocation-model=pic trades this problem for a relocation failure.
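For experimenting natively, a self-contained variant of the minimal thread_local example can be handy, since a version that only declares the variable with extern will not link on its own. In the sketch below tls_counter and bump are made-up names. Whether lli rejects it in the same way probably depends on the TLS access model the compiler picks, so treat it as a starting point rather than a guaranteed reproducer.

```cpp
// A definition rather than an extern declaration, so the program links alone.
thread_local int tls_counter = 0;

// Each thread that calls bump() increments its own copy of tls_counter.
int bump() {
    return ++tls_counter;
}
```

Calling bump() twice from the same thread yields 1 and then 2; a second thread would start again from its own zero-initialized copy.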

Using -fsanitize=memory with clang on linux with libstdc++

With the system-supplied libstdc++, the clang memory sanitizer is basically unusable due to false positives; e.g. the code below fails.
#include <iostream>
#include <fstream>
int main(int argc, char **argv)
{
double foo = 1.2;
std::ofstream out("/tmp/junk");
auto prev = out.flags(); //false positive here
out.setf(std::ios::scientific);
out << foo << std::endl;
out.setf(prev);
}
Building libstdc++ as described here:
https://code.google.com/p/memory-sanitizer/wiki/InstrumentingLibstdcxx
and running it like so:
LD_LIBRARY_PATH=~/goog-gcc-4_8/build/src/.libs ./msan_example
gives the following output
/usr/bin/llvm-symbolizer: symbol lookup error: /home/hal/goog-gcc-4_8/build/src/.libs/libstdc++.so.6: undefined symbol: __msan_init
Both centos7 (epel clang) and ubuntu fail in this manner.
Is there something I'm doing wrong?
Previous stack overflow question Using memory sanitizer with libstdc++
edit
Following @eugins's suggestion, the compile command line for this code is:
clang++ -std=c++11 -fPIC -O3 -g -fsanitize=memory -fsanitize-memory-track-origins -fPIE -fno-omit-frame-pointer -Wall -Werror -Xlinker --build-id -fsanitize=memory -fPIE -pie -Wl,-rpath,/home/hal/goog-gcc-4_8/build/src/.libs/libstdc++.so test_msan.cpp -o test_msan
$ ./test_msan
==19027== WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x7f66df377d4e in main /home/hal/tradingsystems/test_msan.cpp:9
#1 0x7f66ddefaaf4 in __libc_start_main (/lib64/libc.so.6+0x21af4)
#2 0x7f66df37780c in _start (/home/hal/tradingsystems/test_msan+0x7380c)
Uninitialized value was created by an allocation of 'out' in the stack frame of function 'main'
#0 0x7f66df377900 in main /home/hal/tradingsystems/test_msan.cpp:6
SUMMARY: MemorySanitizer: use-of-uninitialized-value /home/hal/tradingsystems/test_msan.cpp:9 main
Exiting
MSan spawns an llvm-symbolizer process to translate stack trace PCs into function names and file/line numbers. Because of the LD_LIBRARY_PATH setting, the instrumented libstdc++ is loaded into both the main MSan process (which is good) and the llvm-symbolizer process (which won't work).
The preferred way of dealing with it is through the RPATH setting (at link time):
-Wl,-rpath,/path/to/libstdcxx_msan
You could also check this msan/libc++ guide, which is more detailed and up to date:
https://code.google.com/p/memory-sanitizer/wiki/LibcxxHowTo