I have the following code fragment, which accumulates BSON documents from a query cursor to be processed afterwards:
// Accumulate
std::vector<BSONObj> results;
while (cursor->more()) {
BSONObj r = cursor->nextSafe();
results.push_back(r);
}
...
// Process it (example)
for (unsigned int ix = 0; ix < results.size(); ix++) {
BSONElement be = results[ix].getField("_id");
// Do something with 'be'
...
}
This code has worked well for months, but we recently found that with large documents in the DB (around 1.1MB) the results[ix].getField("_id") statement crashes with a segfault. This is the top of the backtrace:
(gdb) bt
#0 readNative<int> (offset=0, t=<synthetic pointer>, this=<optimized out>) at src/mongo/base/data_view.h:46
#1 readNative<int> (offset=0, this=<optimized out>) at src/mongo/base/data_view.h:53
#2 readLE<int> (offset=0, this=<optimized out>) at src/mongo/base/data_view.h:59
#3 objsize (this=0x7f74340022e0) at src/mongo/bson/bsonobj.h:309
#4 BSONObjIterator (jso=..., this=<synthetic pointer>) at src/mongo/bson/bsonobjiterator.h:42
#5 mongo::BSONObj::getField (this=0x7f74340022e0, name=...) at src/mongo/bson/bsonobj.cpp:635
...
I have solved the problem by using results.push_back(r.copy()) instead of results.push_back(r). So the error was probably caused by the r object being destroyed at the end of the while block scope, leaving the copy pushed back into the vector in an unstable state. Pushing back a copy of r, created as a new object that is not tied to the block scope, seems to solve the problem.
So, I have the following questions:
What is the best way of storing the BSONObj objects obtained from a query result in a std::vector? I think I have found a reasonable solution, but I'm not sure if it is the best one.
Why does the code using push_back(r) work with small documents? If the right way is to use r.copy() to avoid problems when r is destroyed at the end of the while block scope, I would expect it to fail always, not only with objects of around 1.1MB.
I'm using MongoDB C++ driver legacy-1.0.7 (in case it helps, or in case the problem is related to specific versions of the MongoDB C++ driver).
The BSONObj objects returned by nextSafe do not own their data, and are invalidated by subsequent calls to nextSafe.
So, your vector becomes populated with invalid BSONObj objects.
Instead, call BSONObj::getOwned() on the cursor result before pushing back in the vector.
If you run your program under AddressSanitizer or valgrind, you will almost certainly see use-after-free type errors.
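To see why the unowned objects go stale, here is a minimal, self-contained model of the ownership difference. It does not use the MongoDB driver: FakeCursor, View, Owned and the free function getOwned are hypothetical stand-ins, and FakeCursor simply reuses one internal buffer, in the way a driver may reuse or release its reply buffer between fetches.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins, NOT the MongoDB driver API.
// View models an unowned BSONObj: it only points into the cursor's buffer.
struct View {
    const std::string* buf;  // borrowed; meaningful only until the next fetch
};

// Owned models the result of BSONObj::getOwned(): it carries its own bytes.
struct Owned {
    std::string bytes;
};

class FakeCursor {
public:
    explicit FakeCursor(std::vector<std::string> docs) : docs_(std::move(docs)) {}

    bool more() const { return pos_ < docs_.size(); }

    // Overwrites the single shared buffer, so earlier Views now see new data.
    View next() {
        assert(more());
        buffer_ = docs_[pos_++];
        return View{&buffer_};
    }

private:
    std::vector<std::string> docs_;
    std::size_t pos_ = 0;
    std::string buffer_;
};

// Copy the bytes out of the borrowed buffer, as getOwned() would.
Owned getOwned(const View& v) { return Owned{*v.buf}; }

// Accumulate owned copies; each element stays valid after the loop.
std::vector<Owned> collectOwned(FakeCursor& cursor) {
    std::vector<Owned> results;
    while (cursor.more()) {
        results.push_back(getOwned(cursor.next()));
    }
    return results;
}

// Accumulate borrowed views; every element ends up aliasing the last document.
std::vector<View> collectViews(FakeCursor& cursor) {
    std::vector<View> results;
    while (cursor.more()) {
        results.push_back(cursor.next());
    }
    return results;
}
```

In this model the stale views merely read the wrong document, because the buffer is overwritten rather than freed. In a real driver the old buffer may be released instead, which would turn the same mistake into a crash. That may also be why small documents appeared to work: if the whole result happens to sit in one buffer that is never replaced, the stale pointers still reference live memory.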
The C++ program I am working on is a service proxy. Its implementation uses gRPC's AsyncGenericService. The program crashes (segmentation fault) just after calling the RequestCall() method.
The relevant lines of code are as follows:
::grpc::AsyncGenericService service_; // a member variable
auto req = std::make_shared<Request>(); // Request is a struct with context and stream members
service_.RequestCall(&req->getContext(), &req->getStream(), cqueue_.get(), cqueue_.get(), tag); // the program is giving seg fault at this line
Attempt:
I have ruled out the possibility that a bug in the proxied service is causing this.
I am wondering how to proceed with debugging from here. Since RequestCall() is inside gRPC, what would be the next step for getting closer to the bug?
Update:
The stack trace is consistent with the observed crash. As you can see, frame #1 is the call to RequestCall(). The frames below frame #1 are the program's internal functions.
(gdb) bt
#0 0x00007ffff37c61c6 in grpc::ServerInterface::GenericAsyncRequest::GenericAsyncRequest(grpc::ServerInterface*, grpc::GenericServerContext*, grpc::internal::ServerAsyncStreamingInterface*, grpc_impl::CompletionQueue*, grpc_impl::ServerCompletionQueue*, void*, bool) () from /opt/third/grpc/1.28.1/lib/libgrpc++.so.1
#1 0x00007ffff37b58c5 in grpc::AsyncGenericService::RequestCall(grpc::GenericServerContext*, grpc_impl::ServerAsyncReaderWriter<grpc::ByteBuffer, grpc::ByteBuffer>*, grpc_impl::CompletionQueue*, grpc_impl::ServerCompletionQueue*, void*) ()
from /opt/third/grpc/1.28.1/lib/libgrpc++.so.1
Having a callstack for the crash would be helpful to see what's going wrong.
I'm programming on my Raspberry Pi 3B+, with a Raspbian OS installed.
I'm having trouble with this line of code:
clusterPoints.insert(clusterPoints.begin(), cloudPtPointer);
Where clusterPoints is a vector of the type cloudPoint pointer, that I created, and cloudPtPointer is the pointer to the cloudPoint that I want to insert. This is my cloudPoint struct:
struct cloudPoint {
double realHeight;
cv::Point3f point;
cloudPoint* nextPoint;
} ;
This code has been working for at least 2 weeks. Then I changed some stuff on another part of the project, and it started giving me this error
"malloc(): invalid next size (unsorted)"
This error doesn't occur in the first 8 times that line of code is executed. I checked my variables, memory address and vector, and everything is working as intended, but I can't figure out what's causing a memory corruption.
EDIT:
I cannot paste my code, but these are the lines that use or affect the vector. Sorry in advance for this, but it's the only way I can give you more information. The code has a while (loop #4) inside a for (loop #3) inside a for (loop #2) inside a for (loop #1). Loops #2 and #3 are used to go through a 2D map, and loop #4 is used to add all connected points to a vector:
std::vector<cloudPoint*> clusterPoints; --> At the beginning of the function
clusterPoints.clear(); --> At the beginning of loop #1
clusterPoints.insert(clusterPoints.begin(), cloudPtPointer); --> In loop #3
clusterPoints.insert(clusterPoints.begin(), connectedPointsSet.begin(), connectedPointsSet.end()); --> After loop #4 (inside loop #3)
That connectedPointsSet is a vector of cloudPoints. To construct that vector I push back a set of points, each pointing to the next point, until there is nothing left to point to.
EDIT 2:
The change that caused the error was in 2 functions. I had to change their headers so the data types would be compatible. These are the old headers:
std::vector<Segmentation::cloudPoint> build2DTo3D(cv::Mat&);
bool transform2GlobalRef(std::vector<cv::Point3f>&);
And these are the new ones:
std::vector<cv::Point3f> build2DTo3D(cv::Mat&);
std::vector<Segmentation::cloudPoint> transform2GlobalRef(std::vector<cv::Point3f>&);
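The question does not include enough code to pinpoint the corruption. One pattern worth ruling out, given that the changed functions now return vectors by value, is a cloudPoint* (including nextPoint) that points into a std::vector<cloudPoint> which is later reallocated or destroyed; writing through such a pointer can damage the heap long before malloc() notices. The sketch below is only an illustration of an alternative layout, not a diagnosis: Point3, CloudPointIdx, kNoNext and collectChain are hypothetical names, and Point3 stands in for cv::Point3f so that the example builds without OpenCV. It links points by index, which stays valid when the owning vector grows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for cv::Point3f so the sketch builds without OpenCV.
struct Point3 { float x, y, z; };

// Sentinel meaning "no next point".
constexpr std::size_t kNoNext = static_cast<std::size_t>(-1);

// Hypothetical variant of cloudPoint that links by index, not by pointer.
struct CloudPointIdx {
    double realHeight;
    Point3 point;
    std::size_t nextIndex;  // index into the owning vector, or kNoNext
};

// Follow the chain starting at 'start' and return the visited indices.
std::vector<std::size_t> collectChain(const std::vector<CloudPointIdx>& cloud,
                                      std::size_t start) {
    std::vector<std::size_t> chain;
    // at() throws std::out_of_range on a bad index instead of reading garbage.
    for (std::size_t i = start; i != kNoNext; i = cloud.at(i).nextIndex) {
        chain.push_back(i);
        // A chain longer than the cloud would mean the links form a cycle.
        assert(chain.size() <= cloud.size());
    }
    return chain;
}
```

Indices survive reallocation of the vector, whereas raw pointers to its elements do not. If pointers are required, reserving the final capacity before taking any addresses is another option.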
I'm running into a weird bus error when trying to create an object in C++. This is my gdb backtrace when the program crashes:
#0 0xff146ff4 in _malloc_unlocked () from /usr/lib/libc.so.1
#1 0xff146e40 in malloc () from /usr/lib/libc.so.1
#2 0x24430 in __builtin_new (sz=128) at /usr/local/src/gcc-2.95.1/gcc/cp/new1.cc:84
#3 0x1e71c in FileHeader::Allocate (this=0x3f5d8, freeMap=0x3eea0, fileSize=5719)
at ../filesys/filehdr.cc:63
#4 0x1f61c in FileSystem::Create (this=0x3d8b8, name=0xffbff8f3 "test", initialSize=5719)
at ../filesys/filesys.cc:200
#5 0x1ffac in Copy (from=0xffbff8e4 "assignment 2.c", to=0xffbff8f3 "test")
at ../filesys/fstest.cc:52
#6 0x15150 in main (argc=3, argv=0xffbff768) at ../threads/main.cc:116
The relevant line of code from filehdr.cc is:
IndirectHeader * s;
s = new IndirectHeader;
It crashes on the second line. I thought it might be because I wasn't explicitly using my own constructor, but adding one didn't seem to help. It seems to me that there's some other simple problem I'm not noticing, but I haven't been able to find it. Any advice would be appreciated.
What you're seeing in the backtrace is a crash while allocating the memory to back your IndirectHeader. It hasn't even started constructing the object yet, because it's still trying to allocate memory for it. Most likely there is a bug earlier in your program that has corrupted the heap.
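As a hedged illustration of how a bug earlier in the program can surface only at a later new: writing past the end of a heap buffer can overwrite the allocator's bookkeeping, and nothing fails until a later allocation reads that data. The sketch below is not taken from the asker's code; SectorTable, its size and writeIsRejected are hypothetical. It shows one way to make such an overrun fail at the faulty write instead, by using bounds-checked access.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical table of sector numbers; the name and layout are assumptions.
class SectorTable {
public:
    explicit SectorTable(std::size_t count) : sectors_(count, -1) {
        assert(count > 0);
    }

    // vector::at() throws std::out_of_range instead of writing past the end,
    // so an off-by-one is reported here rather than at a later malloc/new.
    void set(std::size_t index, int sector) { sectors_.at(index) = sector; }

    int get(std::size_t index) const { return sectors_.at(index); }

private:
    std::vector<int> sectors_;
};

// Returns true if the write was rejected as out of bounds.
bool writeIsRejected(SectorTable& table, std::size_t index, int sector) {
    try {
        table.set(index, sector);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

With a raw array the write to index 30 of a 30-element table would silently succeed and the damage would show up somewhere unrelated, as in the backtrace above.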
When analyzing a core dumped after a SIGABRT, gdb says that my last line of code executed (before entering library code) is a NULL assignment to a char pointer, as shown below:
gdb:
(gdb) bt full
#0 0x006337a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
No symbol table info available.
#1 0x00674815 in raise () from /lib/tls/libc.so.6
No symbol table info available.
#2 0x00676279 in abort () from /lib/tls/libc.so.6
No symbol table info available.
#3 0x006a8cca in __libc_message () from /lib/tls/libc.so.6
No symbol table info available.
#4 0x006af55f in _int_free () from /lib/tls/libc.so.6
No symbol table info available.
#5 0x006af93a in free () from /lib/tls/libc.so.6
No symbol table info available.
#6 0x00d0b14e in __builtin_delete () from /usr/lib/libstdc++-libc6.1-1.so.2
No symbol table info available.
#7 0x0808181c in MyObject::~MyObject (this=0x84f4db0, __in_chrg=3) at ./MyObject.cpp:16
this = (MyObject *) 0x84f4db0
MyObject.cpp:16 listing:
12: ...
13: MyObject::~MyObject() {
14: if (this->string != NULL) {
15: delete this->string;
16: this->string = NULL;
17: }
18: }
19: ...
First of all, I do not understand why the line 16 would result in that call stack. It would make more sense if it was a result of the execution of line 15, the one with the delete operator (unless "line 16" represents code executed after the destructor's code to free the memory allocated for that object; just guessing here).
Other than that, can anyone point the way to correctly debug that core?
What type does this->string have? Is it a char array? Then you should use delete [] this->string. Is it a pointer to an object? Then that object has either already been deleted without the pointer being nulled, or it was never created and the pointer was left uninitialized.
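A minimal sketch of these ownership issues, assuming string is a char* allocated with new[] (the question does not show the declaration, so both classes below are hypothetical). Array allocations must be released with delete[], and a class that owns a raw pointer needs its own copy constructor and copy assignment, otherwise two copies end up deleting the same buffer. Holding a std::string sidesteps both problems.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical class that owns a raw char buffer correctly.
class RawOwner {
public:
    explicit RawOwner(const char* text) : string_(nullptr) {
        assert(text != nullptr);
        string_ = new char[std::strlen(text) + 1];
        std::strcpy(string_, text);
    }
    // Deep copy: each object gets its own buffer, so no double free.
    RawOwner(const RawOwner& other) : RawOwner(other.string_) {}
    RawOwner& operator=(const RawOwner& other) {
        if (this != &other) {
            char* copy = new char[std::strlen(other.string_) + 1];
            std::strcpy(copy, other.string_);
            delete[] string_;
            string_ = copy;
        }
        return *this;
    }
    ~RawOwner() { delete[] string_; }  // new[] pairs with delete[]

    const char* c_str() const { return string_; }

private:
    char* string_;
};

// Simpler alternative: std::string manages the memory itself.
class StringOwner {
public:
    explicit StringOwner(const char* text) : string_(text) {}
    const std::string& str() const { return string_; }

private:
    std::string string_;
};
```

Without the user-defined copy operations, copying a RawOwner would copy only the pointer, and the second destructor to run would free memory that has already been freed.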
The actual crash happened on this line:
15: delete this->string;
The crash happened due to a call to abort inside __libc_message. That routine printed a message to your standard error, and the message looked something like
*** glibc detected: double free or heap corruption at ... ***
Use Valgrind or AddressSanitizer: they'll point you straight at the problem.
I do not understand why the line 16 would result in that call stack.
When you are looking at the call stack that led to the raise call, you need to understand that the CALL instruction puts the address of the next instruction to be executed on the stack before transferring control to the called procedure, and it is that next instruction that GDB shows you in the backtrace (all debuggers do that). That next instruction may be on the current line, the next line, or 20 lines down.
It points to the next line that is about to be executed, which is line 16 in your case. The last executed statement was line 15, and that is where it crashed.
Hard to tell from your posting what is wrong here though.
I'm working on an embedded platform (architecture is SH4), and my program crashed a few minutes ago with a SIGABRT.
Luckily, I was running under gdbserver, and the thread that was interrupted by this signal has this stack dump:
#0 0x2a7f1678 in raise () from /home/[user]/target/lib/libc.so.6
#1 0x2a7f2a4c in abort () from /home/[user]/target/lib/libc.so.6
#2 0x2a81ade0 in __libc_message () from /home/[user]/target/lib/libc.so.6
#3 0x2a81f3a8 in malloc_printerr () from /home/[user]/target/lib/libc.so.6
#4 0x2a8c3700 in _IO_wide_data_2 () from /home/[user]/target/lib/libc.so.6
Do you know what happened here? A bad free()? bad delete? bad malloc?
What's "_IO_wide_data_2" supposed to do?
I see the malloc_printerr() call that I don't understand either.
Google gives me 234 results on this, but all of them are simply because the guys have that "function" in their backtrace.
It is a stream to stderr for wide character support.
You can break it down into various parts:
_IO : Input/Output.
wide_data : Wide data
2 : stderr
You also have:
_IO_wide_data_0 : stdin
_IO_wide_data_1 : stdout
They are chained as 2->1->0.
malloc_printerr() is used to print various error messages when something bad is detected in dynamic memory management. But your trace looks truncated (have you removed anything?).
It could be a write to stderr where you try to write something not in memory, in corrupted memory, in …
Or it could be lower stack point causing write to stderr.
Or …
A bad free()? bad delete? bad malloc?
Yes I think it's one of these.
If the bug is easily reproducible, put a breakpoint on malloc_printerr in malloc.c. When the debugger stops there, you'll probably get the full call stack and find the buggy place in your code. I still don't know why the call stack gets broken after entering __libc_message.
Here is how I found this strange behaviour.
Simple app that deletes the same buffer twice:
int main()
{
char * buf = new char[4*1024];
delete[] buf;
delete[] buf;
}
Inside malloc_printerr the call stack looks like this:
#0 malloc_printerr (action=3, str=0x297d0b5c "double free or corruption (top)", ptr=<value optimized out>) at malloc.c:5887
#1 0x29750be8 in __libc_free (mem=0x411008) at malloc.c:3622
#2 0x29612c70 in operator delete (ptr=<value optimized out>) at ../../../../libstdc++-v3/libsupc++/del_op.cc:49
#3 0x29612cc2 in operator delete[] (ptr=<value optimized out>) at ../../../../libstdc++-v3/libsupc++/del_opv.cc:37
#4 0x0040068a in main (argc=1, argv=0x7bb26814) at double_free.cpp:47
After entering __libc_message:
#0 __libc_message (do_abort=2, fmt=0x297d09c8 "*** glibc detected *** %s: %s: 0x%s *** ") at ../sysdeps/unix/sysv/linux/libc_fatal.c:50
#1 0x2974f3a8 in malloc_printerr (action=3, str=0x297d0b5c "double free or corruption (top)", ptr=<value optimized out>) at malloc.c:5887
#2 0x297f3700 in _IO_wide_data_2 () from /cygdrive/c/STM/SH4-Linux-gcc/opt/STM/STLinux2.3/devkit/sh4/target/lib/libc.so.6
Backtrace stopped: frame did not save the PC
Maybe it has something to do with __attribute__((noreturn)) and compiler optimization?
Can you reproduce this error while running under GDB? You might get more stack trace information using the various "Stack" commands found here:
GDB Cheat Sheet
You might need to move up or down a few stack frames to determine what happened.