The C++ program I am working on is a service proxy. It uses gRPC's AsyncGenericService in its implementation. The program crashes (segmentation fault) as soon as it calls the RequestCall() method.
Here are the relevant lines of code:
::grpc::AsyncGenericService service_; // a member variable
auto req = std::make_shared<Request>(); // Request is a struct with context and stream members
service_.RequestCall(&req->getContext(), &req->getStream(), cqueue_.get(), cqueue_.get(), tag); // the program segfaults on this line
Attempt:
I have ruled out the possibility that a bug in the service being proxied is causing this.
I am not sure how to proceed with debugging from here. Since RequestCall() is inside gRPC, what would be the next step to get closer to the bug?
Update:
The stack trace matches the observed crash. As you can see, frame #1 is the call to RequestCall(). The frames below #1 are the program's own internal functions.
(gdb) bt
#0 0x00007ffff37c61c6 in grpc::ServerInterface::GenericAsyncRequest::GenericAsyncRequest(grpc::ServerInterface*, grpc::GenericServerContext*, grpc::internal::ServerAsyncStreamingInterface*, grpc_impl::CompletionQueue*, grpc_impl::ServerCompletionQueue*, void*, bool) () from /opt/third/grpc/1.28.1/lib/libgrpc++.so.1
#1 0x00007ffff37b58c5 in grpc::AsyncGenericService::RequestCall(grpc::GenericServerContext*, grpc_impl::ServerAsyncReaderWriter<grpc::ByteBuffer, grpc::ByteBuffer>*, grpc_impl::CompletionQueue*, grpc_impl::ServerCompletionQueue*, void*) ()
from /opt/third/grpc/1.28.1/lib/libgrpc++.so.1
Having a callstack for the crash would be helpful to see what's going wrong.
Related
I'm using assert from <cassert> to check invariants in my multithreaded C++11 program. When an assertion fails, I'd like to be able to inspect the state of the failing function, with an intact backtrace, variable state, etc., at the time of the failed assertion. The issue seems to be some interaction between SIGABRT and my threads, as my std::threads are pthread_killed, presumably by some default signal handler. How can I pause gdb right at the time of the failed assertion?
Here are some things I've tried:
Set a catchpoint on SIGABRT. The catch does occur, but it's too late (in __pthread_kill).
Defined __assert_fail, which is declared extern in <assert.h>, and set a gdb breakpoint on it. This breakpoint is never hit, so presumably the thread is killed before it is called (?).
What's the recommended approach here?
I did the following:
Example program:
#include <cassert>
void f2()
{
assert(0);
}
void f1()
{
f2();
}
int main()
{
f1();
}
Now I set a breakpoint to f2 in hope I can step down to the assert with stepi later:
gdb > break f2
gdb > run
Breakpoint 11, f2 () at main.cpp:5
gdb > stepi // repeated several times
0x080484b0 in __assert_fail@plt ()
Ahhh! stepi lands on a symbol, which tells us that there is a function with that name. So simply set a breakpoint on __assert_fail@plt:
gdb > break __assert_fail@plt
gdb > run
Breakpoint 11, f2 () at main.cpp:5
(gdb) bt
#0 0x080484b0 in __assert_fail@plt ()
#1 0x080485f7 in f2 () at main.cpp:5
#2 0x08048602 in f1 () at main.cpp:10
#3 0x0804861b in main () at main.cpp:15
Works for me!
If you need a breakpoint on assert for some reason, Klaus's answer to break on __assert_fail is absolutely correct.
However, it turns out that setting a breakpoint to see stack traces of multithreaded programs in gdb is not necessary at all, as gdb already stops on SIGABRT and switches to the aborting thread. In my case, a misconfigured set of libraries led to this red herring. If you are trying to see stack traces from aborted code (SIGABRT) in multithreaded programs in gdb, you do not need to do anything, assuming gdb's default signal handling is in place.
FYI, you can see gdb's default signal handling by running info signals, or just the SIGABRT entry by running info signals SIGABRT. On my machine I see the output below, which shows that the program will be stopped, the signal printed, and so on. If for some reason gdb is not set to stop on SIGABRT, change that setting (for example with handle SIGABRT stop print). More info at https://sourceware.org/gdb/onlinedocs/gdb/Signals.html.
(gdb) info signals SIGABRT
Signal Stop Print Pass to program Description
SIGABRT Yes Yes Yes Aborted
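To check the behavior described above on your own setup, here is a minimal sketch in which the failing assert runs on a std::thread rather than on the main thread. The names check_invariant and run_on_worker are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <thread>

// Hypothetical invariant check. On failure, assert prints a message and
// calls abort(), which raises SIGABRT in the thread that failed.
void check_invariant(int value) {
    assert(value >= 0 && "value must be non-negative");
}

// Runs the check on a separate std::thread and waits for it.
// Returns true only if the worker got past the assertion.
bool run_on_worker(int value) {
    bool finished = false;
    std::thread worker([&finished, value] {
        check_invariant(value);
        finished = true;  // not reached if the assertion aborts the process
    });
    worker.join();  // join() makes the worker's write to finished visible here
    return finished;
}
```

Calling run_on_worker(-1) from main in a build with -g -pthread (and without NDEBUG, which compiles assert out) and running it under gdb should stop the program with SIGABRT; bt in the thread gdb selects should then show check_invariant and the lambda from run_on_worker beneath libc frames such as abort and __assert_fail. If gdb does not stop, check info signals SIGABRT as described above.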
I get a segmentation fault when running a gtest that mocks a method taking a pointer to an object as its argument. I have identified the mock method that causes the trouble.
class NvmControllerMockApp : public NvmController_API
{
public:
MOCK_METHOD1(registerAccessor, bool(NVM_Accessor *accessor));
MOCK_METHOD0(update, void());
};
This is the output produced by gtest:
Running main() from gmock_main.cc
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from MeterTamperAppTest
[ RUN ] MeterTamperAppTest.NeutralDisturbanceCheck
Segmentation fault (core dumped)
The MOCK_METHOD1 line is what causes the segmentation fault: if that method is removed from the file under test, things seem to work fine. Note that the NVM_Accessor class deals with some pointers. I tried debugging the error with GDB, and this is the backtrace at the point of the segmentation fault:
Program received signal SIGSEGV, Segmentation fault.
0x00000000004168d3 in testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith (this=0x67f188, untyped_args=0x7fffffffdca0)
at ../src/gmock-spec-builders.cc:363
363 this->UntypedDescribeUninterestingCall(untyped_args, &ss);
(gdb) backtrace
#0 0x00000000004168d3 in testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith (this=0x67f188, untyped_args=0x7fffffffdca0)
at ../src/gmock-spec-builders.cc:363
#1 0x0000000000410fc9 in testing::internal::FunctionMockerBase<bool (NVM_Accessor*)>::InvokeWith(std::tr1::tuple<NVM_Accessor*> const&) (
this=0x67f188, args=...) at /home/sudeep/GramPower/gmock-1.7.0/include/gmock/gmock-spec-builders.h:1530
#2 0x0000000000410c56 in testing::internal::FunctionMocker<bool (NVM_Accessor*)>::Invoke(NVM_Accessor*) (this=0x67f188, a1=0x67f148)
at /home/sudeep/GramPower/gmock-1.7.0/include/gmock/gmock-generated-function-mockers.h:97
#3 0x000000000041076f in NvmControllerMockApp::registerAccessor (this=0x67f180, gmock_a1=0x67f148)
at /home/sudeep/GramPower/gpos_fw/gpos/apps/nvm_controller/mocks/nvm_controller_mock_app.h:26
#4 0x0000000000413470 in MeterTamperApp::MeterTamperApp (this=0x67f128, env_=0x67ee90) at apps/meter_tamper/meter_tamper_app.cpp:31
#5 0x0000000000410989 in MeterTamperAppMockEnvironment::MeterTamperAppMockEnvironment (this=0x67ee90)
at apps/meter_tamper/tests/../mocks/meter_tamper_app_mock_environment.h:23
#6 0x0000000000410a3e in MeterTamperAppTest::MeterTamperAppTest (this=0x67ee80) at apps/meter_tamper/tests/meter_tamper_app_dtest.cpp:30
#7 0x0000000000410b10 in MeterTamperAppTest_NeutralDisturbanceCheck_Test::MeterTamperAppTest_NeutralDisturbanceCheck_Test (this=0x67ee80)
at apps/meter_tamper/tests/meter_tamper_app_dtest.cpp:36
I had a similar issue: a segmentation fault on instantiation of mock classes. I built gmock and gtest as static libraries.
The problem has been solved by passing the -Dgtest_disable_pthreads=OFF option to cmake.
Hope this will help someone else.
The solution is quite easy: use the current git version.
Related comments, and an explanation of what was wrong with gmock 1.7.0, can be found here:
gcc 6.1.0 segmentation fault - gcc bug?
and the bug report for google test can be found here:
https://github.com/google/googletest/issues/705
The last link also provides a fix which can be merged into 1.7.0 without checking out the current git repo.
Probably your object files were generated incorrectly. Delete all object files and rebuild from scratch.
I faced the same issue.
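In my case this happened because EXPECT_EQ does not stop test execution when it fails:
std::vector<int> ret = some_call(); // here some_call() returns an empty vector into ret
EXPECT_EQ(ret.size(), 1); // this check fails...
EXPECT_EQ(ret[0], expectedResult); // ...and this line segfaults: I expected the test to stop at the failed check above
I'm going to dig deeper into the gtest docs. (The fatal variant, ASSERT_EQ, returns from the current function when it fails, so using it for the size check would have prevented the out-of-bounds access to ret[0].)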
EDIT: juan.facorro pointed me to the real issue: when the server isn't running, mytransport->open() calls GlobalOutput.perror("error code") in TSocket.cpp, but in my code mytransport->open() was called before GlobalOutput was initialized.
see this link for more info
I have a shared_ptr called mytransport, and I declare it like so:
shared_ptr<TTransport> mytransport(new TBufferedTransport(socket));
but when I call mytransport->open(), I get a segmentation fault, and the top of the stack trace says:
#0 0x00000000 in ?? ()
#1 0x08068281 in apache::thrift::TOutput::perror (this=0x807a44c, message=0x9dc0e14 "TSocket::open() connect() <Host: localhost Port: 9090>", errno_copy=111) at src/thrift/Thrift.cpp:65
#2 0x080670eb in perror (errno_copy=<optimized out>, message=..., this=<optimized out>) at ./src/thrift/Thrift.h:123
#3 apache::thrift::transport::TSocket::openConnection (this=0xbfe69ea0, res=0xbfe69e9c) at src/thrift/transport/TSocket.cpp:277
I don't quite understand the "->" operator, but it seems like mytransport is pointing to a NULL object. Any ideas?
EDIT: If I put the code into the main class, it runs normally and gives me the error I want:
TSocket::open() connect() <Host: localhost Port: 9090>Connection refused
(see #1 on the stack trace). However, when I put the code into a class inside a library (that the main class uses), that's when I get the segmentation fault. So it might be some sort of scope issue?
Based on the stack trace, and after some research in the TSocket.cpp code, line 182 shows the exact same error message in the openConnection() method. errno_copy gets its value from errno, which is 111. On Linux, that value corresponds to ECONNREFUSED, so I would check the connection on the other end.
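The frame #0 at 0x00000000 in the question's trace is consistent with the root cause named in the EDIT. Objects with static storage duration are zero-initialized before their constructors run, and the order of dynamic initialization across translation units (for example, a library and the code that uses it) is unspecified, so code that uses a global before its constructor has run can see a still-null function pointer and jump to address 0. A common remedy for this static initialization order problem is "construct on first use". The sketch below is an illustration under that assumption only; Output and global_output are hypothetical names, not Thrift's actual code.

```cpp
#include <cstdio>

// Hypothetical stand-in for a library-wide logger such as Thrift's GlobalOutput.
// Its state is a function pointer set by the constructor; before the constructor
// runs, static zero-initialization leaves that pointer null.
class Output {
public:
    using Sink = void (*)(const char*);

    Output() : sink_(&default_sink) {}

    bool ready() const { return sink_ != nullptr; }

    // Calls through the pointer; with a null sink_ this would jump to address 0.
    void report(const char* msg) const { sink_(msg); }

private:
    static void default_sink(const char* msg) { std::fprintf(stderr, "%s\n", msg); }

    Sink sink_;
};

// Construct on first use: the local static is initialized the first time this
// function is called (thread-safely since C++11), so callers in any translation
// unit, including a library's static initializers, get a constructed object.
Output& global_output() {
    static Output instance;
    return instance;
}
```

Code that needs the logger calls global_output().report(...) instead of touching a namespace-scope global directly, which removes the dependence on initialization order between translation units.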
I'm running into a weird bus error when trying to create an object in C++. This is my gdb backtrace when the program crashes:
#0 0xff146ff4 in _malloc_unlocked () from /usr/lib/libc.so.1
#1 0xff146e40 in malloc () from /usr/lib/libc.so.1
#2 0x24430 in __builtin_new (sz=128) at /usr/local/src/gcc-2.95.1/gcc/cp/new1.cc:84
#3 0x1e71c in FileHeader::Allocate (this=0x3f5d8, freeMap=0x3eea0, fileSize=5719)
at ../filesys/filehdr.cc:63
#4 0x1f61c in FileSystem::Create (this=0x3d8b8, name=0xffbff8f3 "test", initialSize=5719)
at ../filesys/filesys.cc:200
#5 0x1ffac in Copy (from=0xffbff8e4 "assignment 2.c", to=0xffbff8f3 "test")
at ../filesys/fstest.cc:52
#6 0x15150 in main (argc=3, argv=0xffbff768) at ../threads/main.cc:116
The relevant line of code from filehdr.cc is:
IndirectHeader * s;
s = new IndirectHeader;
It crashes on the second line. I thought it might be because I wasn't explicitly using my own constructor, but adding one didn't help. It seems like there's some other simple problem I'm not noticing, but I haven't been able to find it. Any advice would be appreciated.
What you're seeing in the backtrace is a crash while allocating the memory to back your IndirectHeader. It hasn't even started constructing the object yet, because it's still trying to allocate memory for it. Most likely a bug earlier in your program has corrupted the heap.
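The point of the answer is that the crash site (malloc inside new IndirectHeader) is usually not the bug site: an earlier out-of-bounds write into some heap block can damage the allocator's bookkeeping, and only a later allocation trips over it. One way to make such a write fail where it happens is a bounds-checked container. The sketch below uses std::vector::at, which throws std::out_of_range on a bad index; the SectorTable type is hypothetical, loosely modeled on a file header's sector map. With a modern GCC or Clang, building with -fsanitize=address (AddressSanitizer) or running under Valgrind is another way to get a report at the original bad write.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical sector map, loosely modeled on a file header that records
// which disk sectors hold a file's data.
struct SectorTable {
    std::vector<int> sectors;

    explicit SectorTable(std::size_t count) : sectors(count, -1) {}

    // vector::at() checks the index and throws std::out_of_range, so an
    // off-by-one write fails here instead of silently overwriting the heap
    // memory after the array (which a later new/malloc might trip over).
    void set(std::size_t index, int sector) { sectors.at(index) = sector; }
};
```

The trade-off is a small per-access cost for the check; it is most useful while hunting a corruption bug like this one.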
When I look through Linux kernel OOPS output, EIP and other code addresses have values in the range 0xC01-----. In my System.map and objdump -S vmlinux output, all the code addresses are at or above 0xC1------. My vmlinux has debug symbols included (CONFIG_DEBUG_INFO).
When I debug over a serial connection (kgdb) and load vmlinux with gdb ./vmlinux, I have the same problem: I cannot reconcile $eip with what is in System.map and the objdump output. When I run where in gdb, I get a jumbled mess on the stack:
#0 0xC01----- in ?? ()
#1 0xC01----- in ?? ()
#2 0xC01----- in ?? ()
...
Can anyone suggest how to resolve these issues? My main concern is how to actually map an EIP value from an OOPS to System.map or objdump -S vmlinux. I know the OOPS gives me the function name and offset into the object code, but I am more concerned about the issue above and why gdb cannot display a correct stack backtrace.
It looks like the OOPS happened because execution jumped to a place that is not a function.
This would easily cause a crash, and would also prevent the debugger from resolving the address as a symbol.
You can check this by disassembling the area around this EIP. If I'm correct, it won't make sense as machine code.
There are generally two causes for such things:
1. Function call using a corrupt function pointer. In this case, the frame just below the faulting one should show the caller. You don't have that frame, so the cause may be the other one.
2. Stack overrun. Your return address is corrupt, so you've returned to a bad location. If so, the data ESP points to should contain the address in EIP. Debugging stack overruns is hard because the most important source of information is missing. You can try to print the stack in "raw" format (x/xa addr) and try to make sense of it.
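For the question of mapping an EIP value to System.map, the core step is a nearest-symbol lookup: find the highest symbol address at or below EIP and report the offset from it. Below is a minimal sketch, assuming the usual three-column "address type name" System.map format with hexadecimal addresses; parse_system_map and resolve are hypothetical names. An address below every symbol, like the 0xC01----- values in the question when all symbols start at 0xC1------, resolves to nothing, which is consistent with the answer's point that execution jumped somewhere that is not a function.

```cpp
#include <algorithm>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

struct Symbol {
    std::uint64_t addr;
    std::string name;
};

// Parses System.map-style lines ("<hex address> <type> <name>") and returns
// the symbols sorted by address. Assumes the address field is valid hex.
std::vector<Symbol> parse_system_map(const std::string& text) {
    std::vector<Symbol> syms;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string addr_hex, type, name;
        if (!(fields >> addr_hex >> type >> name)) continue;  // skip malformed lines
        syms.push_back({std::stoull(addr_hex, nullptr, 16), name});
    }
    std::sort(syms.begin(), syms.end(),
              [](const Symbol& a, const Symbol& b) { return a.addr < b.addr; });
    return syms;
}

// Returns "name+0xoffset" for the closest symbol at or below addr,
// or "??" if addr lies below every symbol.
std::string resolve(const std::vector<Symbol>& syms, std::uint64_t addr) {
    auto it = std::upper_bound(syms.begin(), syms.end(), addr,
                               [](std::uint64_t a, const Symbol& s) { return a < s.addr; });
    if (it == syms.begin()) return "??";
    --it;  // last symbol whose address is <= addr
    std::ostringstream out;
    out << it->name << "+0x" << std::hex << (addr - it->addr);
    return out.str();
}
```

This ignores symbol sizes and types, so an address far past the end of a function (or inside data) is still attributed to the preceding symbol; a stricter check would need symbol sizes, for example from nm -S vmlinux.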