Segfault from OpenCV Mat::create - c++

I am getting a segmentation fault from the following call to Mat::create:
void PoissonBlend::blend(Mat& src, Mat& dst, Mat& mask, Mat& out){
Mat outer(mask.rows, mask.cols, CV_8U);
When I run my program in gdb I can see that both rows and cols are valid, and I have tried several different data types, but no matter what I get a Segfault on this line.
My program defines several other Mats in main(), before the call to blend, and all of them work perfectly fine. Has anyone else ever run into this before? This error is driving me crazy. I can't find any difference between this call to create and any of the others in my program, yet this one fails every time.
My gdb output is:
Program received signal SIGSEGV, Segmentation fault.
_int_malloc (av=0x7ffff6fbe740 <main_arena>, bytes=307228) at malloc.c:3879
#0 _int_malloc (av=0x7ffff6fbe740 <main_arena>, bytes=307228) at malloc.c:3879
#1 0x00007ffff6c88fc5 in __GI___libc_malloc (bytes=307228) at malloc.c:2924
#2 0x00007ffff791594d in cv::fastMalloc(unsigned long) () from /usr/lib/libopencv_core.so.2.3
#3 0x00007ffff78884bc in cv::Mat::create(int, int const*, int) () from /usr/lib/libopencv_core.so.2.3
#4 0x00000000004243da in cv::Mat::create (this=0x7fffffffdab0, _rows=480, _cols=640, _type=0) at /usr/include/opencv2/core/mat.hpp:368
#5 0x0000000000427608 in cv::Mat::Mat (this=0x7fffffffdab0, _rows=480, _cols=640, _type=0) at /usr/include/opencv2/core/mat.hpp:68
#6 0x00000000004255a7 in PoissonBlend::blend (this=0x7fffffffdd13, src=..., dst=..., mask=..., out=...)
at /home/adam/WorkingCode/rasc/trunk/src/Poisson.cpp:95
#7 0x0000000000423eb2 in main () at /home/adam/WorkingCode/rasc/trunk/src/PoissonTest.cpp:45

Since the crash is within malloc.c, I suspect that you probably have memory corruption. Try running the program under Valgrind to detect this.

How does a unsigned char vector deallocation crash a program with a segfault...? [closed]

I'm just deallocating a vector of unsigned chars during normal object destruction, and it crashes with a segfault in free(), called from the _Vector_base deallocation:
[Switching to Thread 17648.0x3528]
0x00007ff9ba0a9606 in ntdll!RtlAllocateHeap () from C:\WINDOWS\SYSTEM32\ntdll.dll
(gdb) back
#0 0x00007ff9ba0a9606 in ntdll!RtlAllocateHeap () from C:\WINDOWS\SYSTEM32\ntdll.dll
#1 0x00007ff9ba0a5d21 in ntdll!RtlFreeHeap () from C:\WINDOWS\SYSTEM32\ntdll.dll
#2 0x00007ff9b9839c9c in msvcrt!free () from C:\WINDOWS\System32\msvcrt.dll
#3 0x00000000004bc540 in __gnu_cxx::new_allocator<unsigned char>::deallocate(unsigned char*, unsigned long long) ()
#4 0x00000000004ea87b in std::allocator_traits<std::allocator<unsigned char> >::deallocate(std::allocator<unsigned char>&, unsigned char*, unsigned long long) ()
#5 0x00000000004df392 in std::_Vector_base<unsigned char, std::allocator<unsigned char> >::_M_deallocate(unsigned char*, unsigned long long) ()
#6 0x00000000004df436 in std::_Vector_base<unsigned char, std::allocator<unsigned char> >::~_Vector_base() ()
#7 0x000000000050110d in std::vector<unsigned char, std::allocator<unsigned char> >::~vector() ()
#8 0x0000000000420dd4 in Text::~Text() ()
#9 0x000000000041c9f7 in Scene::clearOnScreenText() ()
#10 0x0000000000410a52 in Application::NextScene() ()
#11 0x0000000000412a41 in Application::update() ()
#12 0x00000000004119a8 in Application::Run()::{lambda()#2}::operator()() const ()
#13 0x000000000041501a in std::_Function_handler<void (), Application::Run()::{lambda()#2}>::_M_invoke(std::_Any_data const&) ()
#14 0x00000000004cad92 in std::function<void ()>::operator()() const ()
#15 0x000000000051fba9 in std::intervalThread(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::function<void ()>, std::function<void ()>, std::function<void ()>, std::function<bool ()>, long long, std::vector<long long, std::allocator<long long> >*, bool)::{lambda()#1}::operator()() const ()
Just three questions:
How is this even possible?
What did I monumentally do wrong...?
Most importantly, how does one fix this?
Side note:
I have had nothing but problems with deallocating memory recently (specifically in this program), so could there be something wrong with MinGW, or is GDB possibly not reading the stack correctly? All debugging symbols are off, and optimization is at 0.
How is this even possible?
With undefined behavior, anything is possible. :) More helpfully, what's probably happening is that your heap has been corrupted (e.g. by a bad memory write somewhere), and the new_allocator<unsigned char>::deallocate() method tried to dereference a bad pointer in the heap's metadata, which caused the crash... but the damage had been silently done sometime earlier in your program's execution.
Another possibility is that clearOnScreenText() tried to call delete on an invalid (but non-NULL) Text * pointer, so when Text::~Text() runs the destructor of the std::vector<unsigned char> member variable, it's trying to destroy a "vector object" that is really just arbitrary bytes that are not a valid state for a vector, with catastrophic consequences.
Most importantly, how does one fix this?
If you can run your code on Linux, valgrind is a valuable tool in situations like this. Under Windows, there are similar tools (I think one is called Electric Fence, but I forget what else there is out there). Short of that, you might have to just start playing "twenty questions" with the code: comment out various parts of the program until the crash goes away, then add them back in until the crash comes back, and repeat until you have a better understanding of which parts of the code must execute in order to reproduce the crash. Once you've figured out what code to look at, you can start trying to figure out what is wrong in the suspect code. Very tedious, but sometimes that is the only way.

Why does gmp crash with "invalid next size" to realloc here?

I have a simple function using the gmp C++ bindings:
#include <inttypes.h>
#include <memory>
#include <gmpxx.h>
mpz_class f(uint64_t n){
std::unique_ptr<mpz_class[]> m = std::make_unique<mpz_class[]>(n + 1);
m[0] = 0;
m[1] = 1;
for(uint64_t i = 2; i <= n; ++i){
m[i] = m[i-1] + m[i-2];
}
return m[n];
}
int main(){
mpz_class fn;
for(uint64_t n = 0;; n += 1){
fn = f(n);
}
}
Presumably make_unique should allocate a fresh array and free it when the function returns since the unique pointer owning it has its lifetime end. Presumably the mpz_class object returned should be a copy and not affected by this array getting deleted. The program crashes with the error:
realloc(): invalid next size
and if I look at the core dump in gdb I get the stack trace:
#0 raise()
#1 abort()
#2 __libc_message()
#3 malloc_printerr()
#4 _int_realloc()
#5 realloc()
#6 __gmp_default_reallocate()
#7 __gmpz_realloc()
#8 __gmpz_add()
#9 __gmp_binary_plus::eval(v, w, z)
#10 __gmp_expr<...>::eval(this, this, p)
#11 __gmp_set_expr<...>(expr, z)
#12 __gmp_expr<...>::operator=<...>(expr, this)
#13 f(n)
#14 main(argc, argv)
This isn't helpful to me, except that it suggests maybe the problem is coming from gmpxx using expression templates (stack frames 9-12 indicate this, valgrind and stack frame 12 put the last line of my code executed before the error at m[1] = 1;). Valgrind says there is an invalid read of size 8 at this line but lists stack entries corresponding to the rest of the trace after it, and then says there is an invalid write at the next instruction. The invalid read is 8 bytes after "a block of size 24 alloc'd [by make_unique]" while the invalid write is to null. Obviously this line should not cause either though as it should only be reading a pointer and then writing to part of the buffer it points to which definitely does not have address 0x0. I decided to use the C++ bindings even though I always use gmp from C because I thought it would be faster to write but this error ensured that was not the case. Is this a problem with gmp or am I allocating the array wrong? I get similar errors if I used new and delete directly or if I manually inline the function call. I feel like the problem may have to do with mpz_class actually storing an expression template and not a proper concretized value.
I'm using GCC 9.2.0 with g++ -std=c++17 -O2 -g -Wall ... and GMP 6.1.2-3.
Neither Clang nor GCC report any errors.
If we run under Valgrind, we see:
==1948514== Invalid read of size 8
==1948514== at 0x489B0F0: __gmpz_set_si (in /usr/lib/x86_64-linux-gnu/libgmp.so.10.3.2)
==1948514== by 0x10945E: __gmp_expr<__mpz_struct [1], __mpz_struct [1]>::assign_si(long) (gmpxx.h:1453)
==1948514== by 0x1094E3: __gmp_expr<__mpz_struct [1], __mpz_struct [1]>::operator=(int) (gmpxx.h:1538)
==1948514== by 0x109248: f(unsigned long) (59678712.cpp:8)
==1948514== by 0x109351: main (59678712.cpp:18)
==1948514== Address 0x4e08ca0 is 8 bytes after a block of size 24 alloc'd
==1948514== at 0x483650F: operator new[](unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1948514== by 0x10953F: std::_MakeUniq<__gmp_expr<__mpz_struct [1], __mpz_struct [1]> []>::__array std::make_unique<__gmp_expr<__mpz_struct [1], __mpz_struct [1]> []>(unsigned long) (unique_ptr.h:855)
==1948514== by 0x10920C: f(unsigned long) (59678712.cpp:6)
==1948514== by 0x109351: main (59678712.cpp:18)
This demonstrates that when we call f(0), we write to m[1], which is out of bounds. That's undefined behaviour, so anything could happen. Luckily you got a crash, rather than something more subtle.
Simple fix:
mpz_class f(uint64_t n) {
if (!n) return 0;
BTW, prefer <cstdint> to <inttypes.h>, and write as std::uint64_t etc.

C/C++ - Why is the heap so big when I'm allocating space for a single int?

I'm currently using gdb to see the effects of low level code. Right now I'm doing the following:
int* pointer = (int*)calloc(1, sizeof(int));
yet when I examine the memory using info proc mappings in gdb, I see the following after what I presume is the .text section (since Objfile shows the name of the binary I'm debugging):
...
Start Addr End Addr Size Offset Objfile
0x602000 0x623000 0x21000 0x0 [heap]
How come the heap is that big when all I did was allocating space for a single int?
The weirdest thing is, even when I'm doing calloc(1000, sizeof(int)) the size of the heap remains the same.
PS: I'm running Ubuntu 14.04 on an x86_64 machine. I'm compiling the source using g++ (yes, I know I shouldn't use calloc in C++, this is just a test).
How come the heap is that big when all I did was allocating space for a single int?
I did a simple test on Linux. When you call calloc, glibc at some point calls sbrk() to get memory from the OS:
(gdb) bt
#0 0x0000003a1d8e0a0a in brk () from /lib64/libc.so.6
#1 0x0000003a1d8e0ad7 in sbrk () from /lib64/libc.so.6
#2 0x0000003a1d87da49 in __default_morecore () from /lib64/libc.so.6
#3 0x0000003a1d87a0aa in _int_malloc () from /lib64/libc.so.6
#4 0x0000003a1d87a991 in malloc () from /lib64/libc.so.6
#5 0x0000003a1d87a89a in calloc () from /lib64/libc.so.6
#6 0x000000000040053a in main () at main.c:6
But glibc does not ask the OS for exactly the 4 bytes you requested; it calculates its own size. This is how it is done in glibc:
/* Request enough space for nb + pad + overhead */
size = nb + mp_.top_pad + MINSIZE;
mp_.top_pad is 128*1024 bytes by default, which is the main reason the system allocates 0x21000 bytes when you ask for 4.
You can adjust mp_.top_pad with a call to mallopt. This is from mallopt's documentation:
M_TOP_PAD
This parameter defines the amount of padding to employ when
calling sbrk(2) to modify the program break. (The measurement
unit for this parameter is bytes.) This parameter has an
effect in the following circumstances:
* When the program break is increased, then M_TOP_PAD bytes
are added to the sbrk(2) request.
In either case, the amount of padding is always rounded to a
system page boundary.
So I changed your program and added mallopt:
#include <stdlib.h>
#include <malloc.h>
int main()
{
mallopt(M_TOP_PAD, 1);
int* pointer = (int*)calloc(1, sizeof(int));
return 0;
}
I set the padding to 1 byte; according to the documentation, it is always rounded up to a system page boundary.
So this is what gdb tells me for my program:
Start Addr End Addr Size Offset objfile
0x601000 0x602000 0x1000 0x0 [heap]
So now the heap is 4096 bytes. Exactly the size of my page:
(gdb) !getconf PAGE_SIZE
4096
Useful links:
http://man7.org/linux/man-pages/man3/mallopt.3.html
Since you mentioned C/C++: in C++, prefer new over calloc:
int* pointer = new int();  // value-initialized to 0, like calloc

What does this std::ostream related stack trace mean?

I'm trying to build a C++ library for an integrated/mobile platform. The platform has a decent set of libs, including stdc++. The library I'm trying to build uses ofstream and whenever it attempts to use a class that depends on ofstream, I get a 'bad_cast' exception.
0 0xb082d9b1 in SignalKill ()
from /home/preet/bbndk-2.0.1/target/qnx6/x86/lib/libc.so.3
1 0xb081aa7e in raise ()
from /home/preet/bbndk-2.0.1/target/qnx6/x86/lib/libc.so.3
2 0xb0818cb8 in abort ()
from /home/preet/bbndk-2.0.1/target/qnx6/x86/lib/libc.so.3
3 0xb87c48bf in __gnu_cxx::__verbose_terminate_handler ()
at ../../../../../libstdc++-v3/libsupc++/vterminate.cc:93
4 0xb87c23d6 in __cxxabiv1::__terminate (
handler=0xb87c47c0 <__gnu_cxx::__verbose_terminate_handler()>)
at ../../../../../libstdc++-v3/libsupc++/eh_terminate.cc:38
5 0xb87c2421 in std::terminate ()
at ../../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
6 0xb87c2563 in __cxxabiv1::__cxa_throw (obj=0x859e710, tinfo=0xb87f4c24,
dest=0xb87c0670 <std::bad_cast::~bad_cast()>)
at ../../../../../libstdc++-v3/libsupc++/eh_throw.cc:83
7 0xb875e88c in std::__throw_bad_cast ()
at ../../../../../libstdc++-v3/src/functexcept.cc:52
8 0xb8798c0d in __check_facet<std::ctype<char> > (__f=<optimized out>)
at /home/builder/hudson/650-gcc-4.4/svn/linux-x86-o-ntox86/i486-pc-nto-qnx6.5.0/pic/libstdc++-v3/include/bits/basic_ios.h:49
9 widen (__c=<optimized out>, this=<optimized out>)
at /home/builder/hudson/650-gcc-4.4/svn/linux-x86-o-ntox86/i486-pc-nto-qnx6.5.0/pic/libstdc++-v3/include/bits/basic_ios.h:440
10 std::endl<char, std::char_traits<char> > (__os=...)
at /home/builder/hudson/650-gcc-4.4/svn/linux-x86-o-ntox86/i486-pc-nto-qnx6.5.0/pic/libstdc++-v3/include/ostream:539
11 0xb8793c2d in std::ostream::operator<< (this=0x84db220,
__pf=0x804f64c <_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_#plt>)
at /home/builder/hudson/650-gcc-4.4/svn/linux-x86-o-ntox86/i486-pc-nto-qnx6.5.0/pic/libstdc++-v3/include/ostream:113
12 0x0805240d in QDecViewport::QDecViewport (this=0x86da6c0, parent=0x0)
at ../qml_osg_viewport/qdecviewport.cpp:12
13 0x08051cca in QDeclarativePrivate::QDeclarativeElement<QDecViewport>::QDeclarativeElement (this=0x86da6c0)
at /usr/local/Trolltech/QtLighthouse-4.8.2-i386/include/QtDeclarative/qdeclarativeprivate.h:83
14 0x08051d3c in QDeclarativePrivate::createInto<QDecViewport> (
memory=0x86da6c0)
at /usr/local/Trolltech/QtLighthouse-4.8.2-i386/include/QtDeclarative/qdeclarativeprivate.h:91
15 0xb8ad5ec5 in ?? ()
16 0x086da6c0 in ?? ()
17 0x00000000 in ?? ()
Frames 7-11 are relevant and the ones I need help understanding. The line of code frame 12 is referring to is just:
OSG_INFO << "Hello OSG" << std::endl;
Where OSG_INFO is a stream redirector used for logging. I'm able to use std::cout in the same way without any issue. Unmangling frame 11 gives me:
__pf=0x804f64c <std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&)#plt>)
Which is still pretty cryptic... and I'd understand things going crazy if I was trying to pass something really strange to the ofstream output operator, but it's just text. Does anyone have any suggestions?
std::endl has the following behavior, citing C++11 §27.7.3.8/1:
Calls os.put(os.widen('\n')), then os.flush().
Frame 9 says that endl's call to widen is failing, i.e. that OSG_INFO.widen('\n') is failing. widen, in turn, has the following behavior (§27.5.5.3/12):
Returns: use_facet< ctype<char_type> >(getloc()).widen(c)
use_facet itself will throw bad_cast if the facet is not present in the imbued locale (§22.3.2/3), but your stack trace doesn't indicate that this is the case. (Then again, I haven't dug through the libstdc++ internals to verify that it's doing things by the book...)
I assume that __check_facet is called before use_facet (or use_facet was inlined and disappeared from the stack trace), with the same net effect; this implies that OSG_INFO has been imbued with some locale that does not have the std::ctype<char> facet present – bad times!
Alternatively, it may have been imbued with some locale with a facet present that simply doesn't handle widen('\n') gracefully. But there's no way to know for sure, and nothing else we can tell you, without knowing what OSG_INFO is and/or how it's implemented.

What does the second column in the gdb stack trace mean?

I have a stack trace that looks like this:
#3 0x00007fffde86c206 in GetMedia (p_ml=0xb91560, id=<value optimized out>, select=ML_MEDIA, reload=<value optimized out>) at ../../../modules/media_library/sql_media_library.c:1170
#4 0x00007fffde86a7d0 in GetInputItemFromMedia (p_ml=0xb91560, i_media=12276000) at ../../../modules/media_library/sql_media_library.c:1204
#5 0x00007ffff6765eab in ml_CreateInputItem (this=0x7784f0) at ../../../../include/vlc_media_library.h:887
#6 MLModel::popupInfo (this=0x7784f0) at ../../../../modules/gui/qt4/components/playlist/media_library/ml_model.cpp:528
#7 0x00007ffff67a7204 in MLModel::qt_metacall (this=0x7784f0, _c=<value optimized out>, _id=17710, _a=<value optimized out>) at components/playlist/media_library/ml_model.moc.cpp:79
#8 0x00007ffff4ec8e3f in QMetaObject::activate(QObject*, QMetaObject const*, int, void**) () from /usr/lib/libQtCore.so.4
I'm wondering what the second column signifies. Also, what does the lack of it signify? As can be seen, frame #6 does not have this address, and I believe my problem (a segfault) is being caused by something related.
That column contains the return address: the address in the caller (the function named on that line) to which the called function in the frame just above will return. Its absence probably means that the function in the frame above was inlined into this one.