I have been attempting to cross-compile Mesa for Windows. I've been following roughly this tutorial, but using MSYS2 rather than a full Linux OS. I have successfully compiled a functioning opengl32.dll in all but one respect.
I'm getting an access violation on compilation of a shader when it contains one of any number of standard built-in functions. These include the following:
genType max(genType, genType);
genType min(genType, genType);
float dot(genType x, genType y);
genType normalize(genType v);
This isn't exhaustive; most (possibly all) built-in functions fail. Without any of these functions, the shaders compile and run fine. I can't get a compilation log or anything like that, as the access violation means no code after glCompileShader executes. Here is my vertex shader:
#version 130
in vec2 position;
void main()
{
    gl_Position = vec4(position, 0.0, 1.0);
    // Meaningless tests ...
    float a = 3.0, b = 4.0, c;
    c = max(a, b); // <-- COMPILES AND RUNS OK WITHOUT THIS LINE
}
The C++ code is a simple native Windows application which follows this tutorial. I can post the full code if people think it is relevant, but most of it is just window setup and similar. The actual shader compilation bit looks like this:
// Load the shader source code from a file
std::string filePath("Shaders/vertex");
std::ifstream stream(filePath, std::ios::in);
std::string code = "", line = "";
while(getline(stream, line)) code += "\n" + line;
stream.close();
const char* code_c_str = code.c_str();
GLint code_length = code.size();
// Create a shader and compile the loaded source code
GLuint vertexShaderID = glCreateShader(GL_VERTEX_SHADER);
glShaderSource(vertexShaderID, 1, &code_c_str, &code_length);
glCompileShader(vertexShaderID);
The program functions fine when linked to the hardware library (NVIDIA, in my case). The program also functions when linked to a Mesa build done by an ex-employee of my company. Unfortunately, the individual left no documentation on how that Mesa binary was built. I have tried building Mesa both with and without LLVM, with the same result each time.
Does anyone know why my build of Mesa might be failing to compile simple built-in GLSL functions so spectacularly?
Update
I did a debug build, as suggested, and the problem went away. It seems to be an optimiser issue. I've since ascertained that the release build works with -O0 or -O1 optimisation, and fails on -O2 or -O3. Still a bit of a pain, as performance is important for this application, but at least I now have a working DLL and an idea of where to go.
For reference, I'm using the default gcc/g++ cross-compiler on MSYS2, which appears to be 4.9.2. I haven't been able to get much out of the debugger, because the DLL is built with MinGW but the test solution is in Visual Studio. I did, however, get this from gdb:
#0 0x00000008 in ?? ()
#1 0x630e51a0 in (anonymous namespace)::builtin_builder::new_sig(glsl_type const*, bool (*)(_mesa_glsl_parse_state const*), int, ...) [clone .constprop.166]
() at src/glsl/list.h:440
#2 0x630e51a0 in (anonymous namespace)::builtin_builder::new_sig(glsl_type const*, bool (*)(_mesa_glsl_parse_state const*), int, ...) [clone .constprop.166]
() at src/glsl/list.h:440
#3 0x630e51a0 in (anonymous namespace)::builtin_builder::new_sig(glsl_type const*, bool (*)(_mesa_glsl_parse_state const*), int, ...) [clone .constprop.166]
() at src/glsl/list.h:440
#4 0x630e51a0 in (anonymous namespace)::builtin_builder::new_sig(glsl_type const*, bool (*)(_mesa_glsl_parse_state const*), int, ...) [clone .constprop.166]
() at src/glsl/list.h:440
#5 0x630e51a0 in (anonymous namespace)::builtin_builder::new_sig(glsl_type const*, bool (*)(_mesa_glsl_parse_state const*), int, ...) [clone .constprop.166]
() at src/glsl/list.h:440
#6 0x630e51a0 in (anonymous namespace)::builtin_builder::new_sig(glsl_type const*, bool (*)(_mesa_glsl_parse_state const*), int, ...) [clone .constprop.166]
() at src/glsl/list.h:440
#7 0x630e51a0 in (anonymous namespace)::builtin_builder::new_sig(glsl_type const*, bool (*)(_mesa_glsl_parse_state const*), int, ...) [clone .constprop.166]
() at src/glsl/list.h:440
#8 0x630e51a0 in (anonymous namespace)::builtin_builder::new_sig(glsl_type const*, bool (*)(_mesa_glsl_parse_state const*), int, ...) [clone .constprop.166]
() at src/glsl/list.h:440
#9 0x630e51a0 in (anonymous namespace)::builtin_builder::new_sig(glsl_type const*, bool (*)(_mesa_glsl_parse_state const*), int, ...) [clone .constprop.166]
() at src/glsl/list.h:440
#10 0x630e51a0 in (anonymous namespace)::builtin_builder::new_sig(glsl_type const*, bool (*)(_mesa_glsl_parse_state const*), int, ...) [clone .constprop.166]
() at src/glsl/list.h:440
#11 0x644395c0 in glsl_type::_struct_gl_DepthRangeParameters_type ()
from C:\Users\will\Documents\Visual Studio 2012\Projects\OpenGLMinimalTest\Debug\opengl32.dll
#12 0x048c0964 in ?? ()
#13 0x00000002 in ?? ()
#14 0x01040000 in ?? ()
#15 0x00000000 in ?? ()
There's no information before or after the Mesa code, as those bits are Windows DLLs and Visual Studio compiled respectively. The error is in the built-in function builder, which makes sense, given that it's built-in functions that trigger it. It's not immediately obvious (to me at least) what causes this error under optimisation.
It's not a complete answer, but I think it's enough...
The access violation problem was happening in MinGW builds only at higher levels of optimisation: -O2 and above didn't work. The problem is in the shader built-in functions, but we never worked out exactly what it was or how to fix it.
Eventually, in order to generate an efficient build of Mesa on Windows, I abandoned MinGW and built using Visual Studio. I used the following Python script as a starting point.
https://github.com/florianlink/MesaOnWindows
Hopefully that's sufficient information for anyone encountering the same problems.
Related
I have this very simple code snippet using OpenCV and LibTorch, which does not run for some reason.
#include <iostream>
#include <torch/script.h>
#include <opencv2/core/core.hpp>
int main() {
    cv::Mat imgMat = cv::Mat::zeros(640, 640, CV_8UC3);
    at::Tensor tensorImg = torch::from_blob(imgMat.data, {1, imgMat.rows, imgMat.cols, imgMat.channels()});
    std::cout << tensorImg << "\n"; // problem here
    return 0;
}
I have tried compiling it with Clang with UndefinedBehaviorSanitizer enabled, which gives the following errors:
UndefinedBehaviorSanitizer:DEADLYSIGNAL
==11549==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x7fffde2fa000 (pc 0x7fffe4039d08 bp 0x7fffdd7b4ed0 sp 0x7fffdd7b4e20 T11570)
==11549==The signal is caused by a READ memory access.
UndefinedBehaviorSanitizer:DEADLYSIGNAL
UndefinedBehaviorSanitizer:DEADLYSIGNAL
#0 0x7fffe4039d08 in void c10::function_ref<void (char**, long const*, long, long)>::callback_fn<auto at::TensorIteratorBase::loop_2d_from_1d<at::native::AVX2::copy_kernel(at::TensorIterator&, bool)::'lambda'()::operator()() const::'lambda10'()::operator()() const::'lambda'()::operator()() const::'lambda12'()::operator()() const::'lambda'(char**, long const*, long)>(at::native::AVX2::copy_kernel(at::TensorIterator&, bool)::'lambda'()::operator()() const::'lambda10'()::operator()() const::'lambda'()::operator()() const::'lambda12'()::operator()() const::'lambda'(char**, long const*, long) const&)::'lambda'(char**, long const*, long, long)>(long, char**, long const*, long, long) (/home/dani/Desktop/test/build/libtorch/lib/libtorch_cpu.so+0x54bed08) (BuildId: e03155c98263c3ef83236051d8610270872897af)
#1 0x7fffdfecf96f in at::TensorIteratorBase::serial_for_each(c10::function_ref<void (char**, long const*, long, long)>, at::Range) const (/home/dani/Desktop/test/build/libtorch/lib/libtorch_cpu.so+0x135496f) (BuildId: e03155c98263c3ef83236051d8610270872897af)
#2 0x7fffdfecfb2d in void at::internal::invoke_parallel<at::TensorIteratorBase::for_each(c10::function_ref<void (char**, long const*, long, long)>, long)::'lambda'(long, long)>(long, long, long, at::TensorIteratorBase::for_each(c10::function_ref<void (char**, long const*, long, long)>, long)::'lambda'(long, long) const&) (._omp_fn.0) (/home/dani/Desktop/test/build/libtorch/lib/libtorch_cpu.so+0x1354b2d) (BuildId: e03155c98263c3ef83236051d8610270872897af)
#3 0x7fffde41696d (/home/dani/Desktop/test/build/libtorch/lib/libgomp-52f2fd74.so.1+0x1696d) (BuildId: 9afb2d23e5127e68ba5ef6031eefc9d25b9b672b)
#4 0x7fffde79db42 in start_thread nptl/./nptl/pthread_create.c:442:8
#5 0x7fffde82f9ff misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
UndefinedBehaviorSanitizer can not provide additional info.
SUMMARY: UndefinedBehaviorSanitizer: SEGV (/home/dani/Desktop/test/build/libtorch/lib/libtorch_cpu.so+0x54bed08) (BuildId: e03155c98263c3ef83236051d8610270872897af) in void c10::function_ref<void (char**, long const*, long, long)>::callback_fn<auto at::TensorIteratorBase::loop_2d_from_1d<at::native::AVX2::copy_kernel(at::TensorIterator&, bool)::'lambda'()::operator()() const::'lambda10'()::operator()() const::'lambda'()::operator()() const::'lambda12'()::operator()() const::'lambda'(char**, long const*, long)>(at::native::AVX2::copy_kernel(at::TensorIterator&, bool)::'lambda'()::operator()() const::'lambda10'()::operator()() const::'lambda'()::operator()() const::'lambda12'()::operator()() const::'lambda'(char**, long const*, long) const&)::'lambda'(char**, long const*, long, long)>(long, char**, long const*, long, long)
==11549==ABORTING
Any idea what I am doing wrong?
As it turned out, I was missing the options parameter at::kByte in the call to torch::from_blob().
Edit:
Without this parameter, LibTorch could not interpret the tensor correctly and crashed with a fatal signal. See the actual reason in Dan Mašek's comment.
Based on the documentation
The TensorOptions specify additional configuration options for the
returned tensor, such as what type to interpret the data as.
The correct line is:
at::Tensor tensorImg = torch::from_blob(imgMat.data, {1, imgMat.rows, imgMat.cols, imgMat.channels()}, at::kByte);
I build with the Homebrew version of clang++ and use AddressSanitizer to look for memory leaks, but it reports leaks in every program, even programs without any leak.
clang++ -fsanitize=address main.cpp -g
int main() {
    auto *p = new int;
    delete p; // no leak
    return 0;
}
I have been using the following commands. I expect there shouldn't be any leaks; however, it reports leaks from system libraries that I can't make any sense of.
clang++ -fsanitize=address main.cpp -g
export ASAN_OPTIONS=detect_leaks=1
export MallocNanoZone=0
./a.out
=================================================================
==74341==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 1952 byte(s) in 61 object(s) allocated from:
#0 0x1066d25e5 in wrap_calloc+0xa5 (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x475e5) (BuildId: e487ca41363b3ac1b8e9e49fecb969fb2400000010000000000a0a0000000d00)
#1 0x7ff81bb972ee in realizeClassWithoutSwift(objc_class*, objc_class*)+0x85 (libobjc.A.dylib:x86_64h+0x52ee) (BuildId: aca7ef61285336998c1f1c0ab93ad6be32000000200000000100000000000d00)
#2 0x7ff81bb95646 in map_images_nolock+0x160e (libobjc.A.dylib:x86_64h+0x3646) (BuildId: aca7ef61285336998c1f1c0ab93ad6be32000000200000000100000000000d00)
#3 0x7ff81bb93fda in map_images+0x42 (libobjc.A.dylib:x86_64h+0x1fda) (BuildId: aca7ef61285336998c1f1c0ab93ad6be32000000200000000100000000000d00)
#4 0x7ff81bbe04c2 in invocation function for block in dyld4::RuntimeState::setObjCNotifiers(void (*)(unsigned int, char const* const*, mach_header const* const*), void (*)(char const*, mach_header const*), void (*)(char const*, mach_header const*), void (*)(mach_header const*, void*, mach_header const*, void const*), void (*)(unsigned int, _dyld_objc_notify_mapped_info const*))+0x27c (dyld:x86_64+0xfffffffffff7e4c2) (BuildId: 28fd207157f3387387bfe4f674a82de632000000200000000100000000000d00)
#5 0x7ff81bbdaffe in dyld4::RuntimeState::withLoadersReadLock(void () block_pointer)+0x2e
Direct leak of 1952 byte(s) in 61 object(s) allocated from:
#0 0x1066d25e5 in wrap_calloc+0xa5 (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x475e5) (BuildId: e487ca41363b3ac1b8e9e49fecb969fb2400000010000000000a0a0000000d00)
#1 0x7ff81bb972ee in realizeClassWithoutSwift(objc_class*, objc_class*)+0x85 (libobjc.A.dylib:x86_64h+0x52ee) (BuildId: aca7ef61285336998c1f1c0ab93ad6be32000000200000000100000000000d00)
#2 0x7ff81bb973ac in realizeClassWithoutSwift(objc_class*, objc_class*)+0x143 (libobjc.A.dylib:x86_64h+0x53ac) (BuildId: aca7ef61285336998c1f1c0ab93ad6be32000000200000000100000000000d00)
#3 0x7ff81bb95646 in map_images_nolock+0x160e (libobjc.A.dylib:x86_64h+0x3646) (BuildId: aca7ef61285336998c1f1c0ab93ad6be32000000200000000100000000000d00)
#4 0x7ff81bb93fda in map_images+0x42 (libobjc.A.dylib:x86_64h+0x1fda) (BuildId: aca7ef61285336998c1f1c0ab93ad6be32000000200000000100000000000d00)
#5 0x7ff81bbe04c2 in invocation function for block in dyld4::RuntimeState::setObjCNotifiers(void (*)(unsigned int, char const* const*, mach_header const* const*), void (*)(char const*, mach_header const*), void (*)(char const*, mach_header const*), void (*)(mach_header const*, void*, mach_header const*, void const*), void (*)(unsigned int, _dyld_objc_notify_mapped_info const*))+0x27c (dyld:x86_64+0xfffffffffff7e4c2) (BuildId: 28fd207157f3387387bfe4f674a82de632000000200000000100000000000d00)
#6 0x7ff81bbdaffe in dyld4::RuntimeState::withLoadersReadLock(void () block_pointer)+0x2e (dyld:x86_64+0xfffffffffff78ffe) (BuildId: 28fd207157f3387387bfe4f674a82de632000000200000000100000000000d00)
#7 0x7ff81bbe023f in dyld4::RuntimeState::setObjCNotifiers(void (*)(unsigned int, char const* const*, mach_header const* const*), void (*)(char const*, mach_header const*), void (*)(char const*, mach_header const*), void (*)(mach_header const*, void*, mach_header const*, void const*), void (*)(unsigned int, _dyld_objc_notify_mapped_info const*))+0x5f (dyld:x86_64+0xfffffffffff7e23f) (BuildId: 28fd207157f3387387bfe4f674a82de632000000200000000100000000000d00)
#8 0x7ff81bc045e3 in dyld4::APIs::_dyld_objc_register_callbacks(_dyld_objc_callbacks const*)+0x89 (dyld:x86_64+0xfffffffffffa25e3) (BuildId: 28fd207157f3387387bfe4f674a82de632000000200000000100000000000d00)
#9 0x7ff81bb93e3e in _objc_init+0x4f6 (libobjc.A.dylib:x86_64h+0x1e3e) (BuildId: aca7ef61285336998c1f1c0ab93ad6be32000000200000000100000000000d00)
#10 0x7ff81bd850bf in _os_object_init+0xc (libdispatch.dylib:x86_64+0x20bf) (BuildId: 817339a1d03e3e549c47acacf69f619332000000200000000100000000000d00)
#11 0x7ff81bd92d34 in libdispatch_init+0x16a (libdispatch.dylib:x86_64+0xfd34) (BuildId: 817339a1d03e3e549c47acacf69f619332000000200000000100000000000d00)
#12 0x7ff827b2d894 in libSystem_initializer+0xed (libSystem.B.dylib:x86_64+0x1894) (BuildId: 862b6758852e3e89a4fed564a7163e2532000000200000000100000000000d00)
#13 0x7ff81bbea617 in invocation function for block in dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const+0xab (dyld:x86_64+0xfffffffffff88617) (BuildId: 28fd207157f3387387bfe4f674a82de632000000200000000100000000000d00)
#14 0x7ff81bc29de8 in invocation function for block in dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void (unsigned int) block_pointer, void const*) const+0xf1 (dyld:x86_64+0xfffffffffffc7de8) (BuildId: 28fd207157f3387387bfe4f674a82de632000000200000000100000000000d00)
#15 0x7ff81bc1def6 in invocation function for block in dyld3::MachOFile::forEachSection(void (dyld3::MachOFile::SectionInfo const&, bool, bool&) block_pointer) const+0x22c (dyld:x86_64+0xfffffffffffbbef6) (BuildId: 28fd207157f3387387bfe4f674a82de632000000200000000100000000000d00)
...
#23 0x7ff81bbd5368 in dyld4::prepare(dyld4::APIs&, dyld3::MachOAnalyzer const*)+0xe9e (dyld:x86_64+0xfffffffffff73368) (BuildId: 28fd207157f3387387bfe4f674a82de632000000200000000100000000000d00)
#24 0x7ff81bbd4280 in start+0x8f0 (dyld:x86_64+0xfffffffffff72280) (BuildId: 28fd207157f3387387bfe4f674a82de632000000200000000100000000000d00)
SUMMARY: AddressSanitizer: 4288 byte(s) leaked in 134 allocation(s).
and my clang++ version:
Homebrew clang version 15.0.6
Target: x86_64-apple-darwin22.1.0
Thread model: posix
InstalledDir: /usr/local/opt/llvm/bin
on macOS 13.0.1
Same situation here as the OP. I don't know what the root cause is (false positive vs. true memory leak in realizeClassWithoutSwift()), but you can configure LeakSanitizer to suppress detected leaks in the memory leak report:
Create a file lsan.supp with
leak:realizeClassWithoutSwift
and then use it when running your binary:
ASAN_OPTIONS=detect_leaks=1 LSAN_OPTIONS=suppressions=lsan.supp my_binary
More details at https://clang.llvm.org/docs/AddressSanitizer.html#suppressing-memory-leaks.
Answer courtesy of GitHub user willmcpherson2's comment at google/sanitizers issue 1501.
gRPC v1.30.0
I created a gRPC service and tried to run it. The execution goes smoothly until the final return statement on the server side.
Status theService(ServerContext *context, const Request* req, Response* res)
{
    Status status = actualLogic(req, res);
    // execution goes fine till here
    return status;
}

Status actualLogic(const Request* req, Response* res)
{
    Response_NestedMsg msg;
    msg.set_something(...);
    res->mutable_nestedmsg()->CopyFrom(msg);
    return Status::OK;
}
// server startup code
ServerBuilder builder;
builder.AddListeningPort(address, grpc::InsecureServerCredentials());
builder.RegisterService(&serviceClassObj);
std::unique_ptr<Server> server(builder.BuildAndStart());
server->Wait();
Running this code, I get following runtime error
==14394==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x61b00000fcc8 in thread T5 (grpcpp_sync_ser)
#0 0x7fe9d35602c0 in operator delete(void*) (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xe12c0)
#1 0x55cb87299afd in __gnu_cxx::new_allocator<std::_List_node<grpc_impl::Server const*> >::deallocate(std::_List_node<grpc_impl::Server const*>*, unsigned long) (/home/john/Desktop/my_executable+0xd0afd)
#2 0x55cb87297ba1 in std::allocator_traits<std::allocator<std::_List_node<grpc_impl::Server const*> > >::deallocate(std::allocator<std::_List_node<grpc_impl::Server const*> >&, std::_List_node<grpc_impl::Server const*>*, unsigned long) (/home/john/Desktop/my_executable+0xceba1)
#3 0x55cb8729448d in std::__cxx11::_List_base<grpc_impl::Server const*, std::allocator<grpc_impl::Server const*> >::_M_put_node(std::_List_node<grpc_impl::Server const*>*) (/home/john/Desktop/my_executable+0xcb48d)
#4 0x55cb8728bb5a in std::__cxx11::_List_base<grpc_impl::Server const*, std::allocator<grpc_impl::Server const*> >::_M_clear() (/home/john/Desktop/my_executable+0xc2b5a)
#5 0x55cb87287307 in std::__cxx11::_List_base<grpc_impl::Server const*, std::allocator<grpc_impl::Server const*> >::~_List_base() (/home/john/Desktop/my_executable+0xbe307)
#6 0x55cb87278d29 in std::__cxx11::list<grpc_impl::Server const*, std::allocator<grpc_impl::Server const*> >::~list() (/home/john/Desktop/my_executable+0xafd29)
#7 0x55cb87278e2c in grpc_impl::CompletionQueue::~CompletionQueue() (/home/john/Desktop/my_executable+0xafe2c)
#8 0x7fe9d1826998 in grpc_impl::Server::SyncRequest::CallData::ContinueRunAfterInterception() (/usr/local/lib/libgrpc++.so.1+0x6f998)
#9 0x7fe9d18278ee in grpc_impl::Server::SyncRequestThreadManager::DoWork(void*, bool, bool) (/usr/local/lib/libgrpc++.so.1+0x708ee)
#10 0x7fe9d182c4ca in grpc::ThreadManager::MainWorkLoop() (/usr/local/lib/libgrpc++.so.1+0x754ca)
#11 0x7fe9d182c68b in grpc::ThreadManager::WorkerThread::Run() (/usr/local/lib/libgrpc++.so.1+0x7568b)
#12 0x7fe9cf5a78d2 in grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&)::{lambda(void*)#1}::_FUN(void*) (/usr/local/lib/libgpr.so.10+0x118d2)
#13 0x7fe9d1eef6da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
#14 0x7fe9d0ba8a3e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x121a3e)
0x61b00000fcc8 is located 72 bytes inside of 1448-byte region [0x61b00000fc80,0x61b000010228)
allocated by thread T5 (grpcpp_sync_ser) here:
#0 0x7fe9d355f448 in operator new(unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xe0448)
#1 0x7fe9d18274f2 in grpc_impl::Server::SyncRequestThreadManager::DoWork(void*, bool, bool) (/usr/local/lib/libgrpc++.so.1+0x704f2)
Thread T5 (grpcpp_sync_ser) created by T0 here:
#0 0x7fe9d34b6d2f in __interceptor_pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x37d2f)
#1 0x7fe9cf5a7a92 in grpc_core::Thread::Thread(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&) (/usr/local/lib/libgpr.so.10+0x11a92)
SUMMARY: AddressSanitizer: bad-free (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xe12c0) in operator delete(void*)
==14394==ABORTING
None of my code tries to free any pointer, and the error seems to come only from auto-generated files. Please let me know if more code or details are needed.
I briefly checked the error message and code, and it looks strange to me because both allocation and deallocation were done via C++ new and delete, which is consistent with your error message.
### Destruction (with operator delete)
#0 0x7fe9d35602c0 in operator delete(void*) (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xe12c0)
### Allocation (with operator new)
#0 0x7fe9d355f448 in operator new(unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xe0448)
This might be caused by other issues, such as a buggy ASan build or a custom memory allocator.
I'm getting a very weird bug when defining a test suite with boost like this:
BOOST_AUTO_TEST_SUITE(zerocoin_implementation_tests)
The error looks like this:
terminate called after throwing an instance of 'std::length_error'
what(): basic_string::_M_create
Here's the relevant backtrace:
#5 0x00007ffff5ce6fe8 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007ffff5ce2875 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7 0x00007ffff5d7c949 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long) ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8 0x00007ffff70afe15 in boost::unit_test::test_unit::test_unit(boost::unit_test::basic_cstring<char const>, boost::unit_test::basic_cstring<char const>, unsigned long, boost::unit_test::test_unit_type) () from /usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.65.1
#9 0x00007ffff70b0456 in boost::unit_test::test_suite::test_suite(boost::unit_test::basic_cstring<char const>, boost::unit_test::basic_cstring<char const>, unsigned long) () from /usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.65.1
#10 0x00007ffff70b0612 in boost::unit_test::ut_detail::auto_test_unit_registrar::auto_test_unit_registrar(boost::unit_test::basic_cstring<char const>, boost::unit_test::basic_cstring<char const>, unsigned long, boost::unit_test::decorator::collector&) ()
from /usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.65.1
From what I can tell, this has to do with Boost trying to create a maximum-length string. I'd like to see exactly what it is doing. What's the best way of expanding the Boost macros to see the preprocessed version?
Side Note
Weirdly, if I change the line very slightly to:
BOOST_AUTO_TEST_SUITE(zerocsoin_implementation_tests)
I get the following error:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
And backtrace:
#6 0x00007ffff5ce7594 in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7 0x00007ffff70afe15 in boost::unit_test::test_unit::test_unit(boost::unit_test::basic_cstring<char const>, boost::unit_test::basic_cstring<char const>, unsigned long, boost::unit_test::test_unit_type) () from /usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.65.1
#8 0x00007ffff70b0456 in boost::unit_test::test_suite::test_suite(boost::unit_test::basic_cstring<char const>, boost::unit_test::basic_cstring<char const>, unsigned long) () from /usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.65.1
#9 0x00007ffff70b0612 in boost::unit_test::ut_detail::auto_test_unit_registrar::auto_test_unit_registrar(boost::unit_test::basic_cstring<char const>, boost::unit_test::basic_cstring<char const>, unsigned long, boost::unit_test::decorator::collector&) ()
from /usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.65.1
The source code for the file (and the rest of the project) can be found here: https://github.com/phoreproject/Phore/blob/segwit/src/test/zerocoin_implementation_tests.cpp
Diff that probably caused the bug: https://github.com/phoreproject/phore/compare/master...segwit#diff-bb4f094cc636d668944ed6af9b72c0d9
Two approaches:
Exception Breakpoints
Just start the test in the debugger and catch the exception.
In gdb you could do
(gdb) catch throw
Catchpoint 2 (throw)
which acts like a general breakpoint. Visual Studio has a Manage Exceptions dialog.¹
Boost Test Breakpoints
For debugging Boost Test I like to set a break at the test_method member of the specific test case class I want to break at. E.g. with a test_runner that has a few nested suites like:
./test_runner --list_content
import*
utility*
xml*
xml_utilities*
child_text_test*
loggable_xml_path_test*
And we run these 3 tests like:
./test_runner -t import/utility/xml
Running 3 test cases...
*** No errors detected
To debug them with gdb I'd do
gdb ./test_runner
start -t import/utility/xml
Which stops at main, then I type:
break import::utility::xml
Auto completion helps, so to get the exact names, you can just pick from the completions:
xml
xml::as_element(xmlpp::Node const&)
xml::attr_value(xmlpp::Element const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
xml::attr_value(xmlpp::Node const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
xml::child_text[abi:cxx11](xmlpp::Element const&, char const*)
xml::child_text_test
xml::child_text_test::test_method()
xml::child_text_test_invoker()
xml::child_text_test_registrar62
xml::end_suite94_registrar94
xml::first_child(xmlpp::Element const&, char const*)
xml::get_content[abi:cxx11](xmlpp::Element const&)
xml::get_content[abi:cxx11](xmlpp::Node const*)
xml::is_attr_value(xmlpp::Node const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
xml::loggable_xml_path[abi:cxx11](xmlpp::Node const&)
xml::loggable_xml_path_test
xml::loggable_xml_path_test::test_method()
xml::loggable_xml_path_test_invoker()
xml::loggable_xml_path_test_registrar77
xml::trace_xml(xmlpp::Element const&, LogSource::LogTx)
xml::trace_xml_formatted(xmlpp::Element const&, LogSource::LogTx)
xml::xml_registrar20
xml::xml_utilities
xml::xml_utilities::test_method()
xml::xml_utilities_invoker()
xml::xml_utilities_registrar22
Pick the ones named test_method(), e.g.
break import::utility::xml::child_text_test::test_method()
Breakpoint 2 at 0x730762: file /path/src/import/utility/xml_tests.cpp, line 62.
Now you can continue execution and the debugger will automatically pause at the start of your unit test.
¹ see also
Make Visual Studio break on User (std::exception) Exceptions?
How do I make VC++'s debugger break on exceptions?
This question already has answers here:
Segfaults in malloc() and malloc_consolidate()
(2 answers)
Closed 7 years ago.
My program crashes with a segmentation fault, and I cannot find the cause. The worst part is that the function in question does not always lead to a segfault.
GDB confirms the bug and yields this backtrace:
Program received signal SIGSEGV, Segmentation fault.
0xb7da6d6e in malloc_consolidate (av=<value optimized out>) at malloc.c:5169
5169 malloc.c: No such file or directory.
in malloc.c
(gdb) bt
#0 0xb7da6d6e in malloc_consolidate (av=<value optimized out>) at malloc.c:5169
#1 0xb7da9035 in _int_malloc (av=<value optimized out>, bytes=<value optimized out>) at malloc.c:4373
#2 0xb7dab4ac in __libc_malloc (bytes=525) at malloc.c:3660
#3 0xb7f8dc15 in operator new(unsigned int) () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#4 0xb7f72db5 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Rep::_S_create(unsigned int, unsigned int, std::allocator<char> const&) ()
from /usr/lib/i386-linux-gnu/libstdc++.so.6
#5 0xb7f740bf in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Rep::_M_clone(std::allocator<char> const&, unsigned int) ()
from /usr/lib/i386-linux-gnu/libstdc++.so.6
#6 0xb7f741f1 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::reserve(unsigned int) () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#7 0xb7f6bfec in std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::overflow(int) () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#8 0xb7f70e1c in std::basic_streambuf<char, std::char_traits<char> >::xsputn(char const*, int) () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#9 0xb7f5b498 in std::ostreambuf_iterator<char, std::char_traits<char> > std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_int<unsigned long>(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, unsigned long) const () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#10 0xb7f5b753 in std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::do_put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, unsigned long) const () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#11 0xb7f676ac in std::basic_ostream<char, std::char_traits<char> >& std::basic_ostream<char, std::char_traits<char> >::_M_insert<unsigned long>(unsigned long) ()
from /usr/lib/i386-linux-gnu/libstdc++.so.6
#12 0xb7f67833 in std::basic_ostream<char, std::char_traits<char> >::operator<<(unsigned int) () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#13 0x08049c42 in sim::Address::GetS (this=0xbfffec40) at address.cc:27
#14 0x0806a499 in sim::UserGenerator::ProcessEvent (this=0x80a1af0, e=...) at user-generator.cc:59
#15 0x0806694b in sim::Simulator::CommunicateEvent (this=0x809f970, e=...) at simulator.cc:144
#16 0x0806685d in sim::Simulator::ProcessNextEvent (this=0x809f970) at simulator.cc:133
#17 0x08065d76 in sim::Simulator::Run (seed=0) at simulator.cc:53
#18 0x0807ce85 in main (argc=1, argv=0xbffff454) at main.cc:75
(gdb) f 13
#13 0x08049c42 in sim::Address::GetS (this=0xbfffec40) at address.cc:27
27 oss << m_address;
(gdb) p this->m_address
$1 = 1
Method GetS of class Address translates a number (uint32_t m_address) into a string and returns it. The code (very simple) is the following:
std::string
Address::GetS () const
{
std::ostringstream oss;
oss << m_address;
return oss.str ();
}
Besides, as can be seen in the backtrace, m_address is properly defined.
Now, I have tried to run my program using valgrind.
The program doesn't crash, likely due to the fact that valgrind replaces malloc () among other functions.
The error summary shows no memory leaking:
LEAK SUMMARY:
definitely lost: 0 bytes in 0 blocks
indirectly lost: 0 bytes in 0 blocks
possibly lost: 4,367 bytes in 196 blocks
still reachable: 9,160 bytes in 198 blocks
suppressed: 0 bytes in 0 blocks
All the "possibly lost" records refer to backtraces like this:
80 bytes in 5 blocks are possibly lost in loss record 3 of 26
at 0x4024B64: operator new(unsigned int) (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
by 0x40DBDB4: std::string::_Rep::_S_create(unsigned int, unsigned int, std::allocator<char> const&) (in /usr/lib/i386-linux-gnu/libstdc++.so.6.0.16)
by 0x40DE077: char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag) (in /usr/lib/i386-linux-gnu/libstdc++.so.6.0.16)
by 0x40DE1E5: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) (in /usr/lib/i386-linux-gnu/libstdc++.so.6.0.16)
by 0x806AF62: sim::UserGenerator::CreateUser(unsigned int) (user-generator.cc:152)
I don't think this is related to the bug. However, the code in question can be found following this link.
I am thinking of a bug in libstdc++. However, how likely would that be?
I have also upgraded such library. Here's the versions currently installed on my system.
$ dpkg -l | grep libstdc
ii libstdc++5 1:3.3.6-23 The GNU Standard C++ Library v3
ii libstdc++6 4.6.1-1 GNU Standard C++ Library v3
ii libstdc++6-4.1-dev 4.1.2-27 The GNU Standard C++ Library v3 (development files)
ii libstdc++6-4.3-dev 4.3.5-4 The GNU Standard C++ Library v3 (development files)
ii libstdc++6-4.4-dev 4.4.6-6 GNU Standard C++ Library v3 (development files)
ii libstdc++6-4.5-dev 4.5.3-3 The GNU Standard C++ Library v3 (development files)
ii libstdc++6-4.6-dev 4.6.1-1 GNU Standard C++ Library v3 (development files)
Now the thing is, I am not sure which version g++ uses, and whether there's some means to enforce the use of a particular version.
What I am pondering is modifying GetS, but this is the only approach I know. Can you suggest any alternative?
Eventually, I am even considering replacing std::string with plain char*.
Maybe a little drastic, but I wouldn't rule it out.
Any thoughts?
Thank you all in advance.
Best,
Jir
Ok. This is NOT the problem:
I am thinking of a bug in libstdc++
The problem is that you overwrote some memory buffer and corrupted one of the structures used by the memory manager. The hard part is going to be finding it. Doesn't valgrind give you information about writing past the end of an allocated piece of memory?
Don't do this:
Eventually, I am even considering to replace std::string with simpler char*. Maybe a little drastic, but I wouldn't set it aside.
You already have enough problems with memory management; this will just add more. There is absolutely NOTHING wrong with std::string or the memory management routines. They are heavily tested and widely used. If there were something wrong, people all over the world would be screaming (it would be big news).
Reading your code at http://mercurial.intuxication.org/hg/lte_sim/file/c2ef6e0b6d41/src/ it seems like you are still stuck in a C style of writing code (C with Classes). So you have the power of C++ to automate (the blowing up of) your code, but still have all the problems associated with C.
You need to re-look at your code in terms of ownership. You pass things around by pointer way too much. As a result it is hard to follow the ownership of the pointer (and thus who is responsible for deleting it).
I think your best bet at finding the bug is to write unit tests for each class, then run the unit tests through valgrind. I know it's a pain (but you should have done it to start with; now you have the pain all in one go).