I am trying to debug my ROS code in gdb. However, when I start the node in gdb, it always prints:
Starting program: /home/uav/catkin_ws/devel/lib/my_package/my_node
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff13b9700 (LWP 28089)]
[New Thread 0x7ffff0bb8700 (LWP 28090)]
[New Thread 0x7fffebfff700 (LWP 28091)]
[New Thread 0x7fffeb7fe700 (LWP 28096)]
[New Thread 0x7fffeaffd700 (LWP 28098)]
[New Thread 0x7fffea7fc700 (LWP 28121)]
and hangs forever. I haven't seen this issue before and have no clue why it always starts in multi-threaded mode. My main function looks like this:
int main(int argc, char** argv)
{
    ros::init(argc, argv, "my_node");
    ros::NodeHandle nodeHandle("~");
    ros::Rate rate(10);
    while (ros::ok()) {
        // Do Something
        ros::spinOnce();
        rate.sleep();
    }
    return 0;
}
How can I fix this problem? I am thinking I should turn my node into a single-threaded version and debug that. How am I supposed to do that?
There are two places where multithreading comes up in ROS:
1. The whole node
2. The callback queue
In the first case, when you use:
ros::NodeHandle nodeHandle("~");
ros::ok()
...
roscpp makes requests to the ROS Master, and it handles that network communication in background threads. There is no way to change that. If single threading is very important to you, you could try another client library such as rospy or rosnodejs.
"roscpp does not try to specify a threading model for your application. This means that while roscpp may use threads behind the scenes to do network management, scheduling etc., it will never expose its threads to your application."
In the second case, we are talking about handling topics, services and actions with multiple threads. By default, callbacks are handled in a single thread. But if more data arrives than your node can process that way, you can use multithreading.
"roscpp does, however, allow your callbacks to be called from any number of threads if that's what you want."
For more details, see:
http://wiki.ros.org/roscpp/Overview/Callbacks%20and%20Spinning
So I fire up my c++ application in GDB, and when it quits, I basically get:
[Thread 0x7fff76e07700 (LWP 6170) exited]
[Thread 0x7fff76f08700 (LWP 6169) exited]
[Thread 0x7fff77009700 (LWP 6168) exited]
...
Program terminated with signal SIGKILL, Killed. The program no longer exists.
(gdb)
I literally have no idea why this is occurring. Why can't I do a backtrace to see how it exited? Does anyone have any ideas? It should never end :(
Thanks!
I literally have no idea why this is occurring,
This usually means that either
some other process executed a kill -9 <your-pid>, or
the kernel OOM killer decided that your process consumed too many resources, and terminated it (effectively the kernel executed kill -9 for it). You should look in /var/log/messages (/var/log/syslog on Ubuntu variants) for traces of that -- the kernel usually logs a message when it OOMs some process.
why can't I do a backtrace to see how it exited?
Because in order to show a backtrace, the process must exist. If it no longer exists, it has no stack, and so there is nothing to take a backtrace from.
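To see why the debugger never gets a chance to stop on SIGKILL, it may help to note that the kernel does not allow any process to install a handler for it. The following is only a minimal sketch, assuming a POSIX system; the helper name can_install_handler is made up for illustration and is not part of any library.

```cpp
#include <csignal>
#include <signal.h>

// No-op handler used only to probe whether a handler can be installed.
static void noop_handler(int) {}

// Hypothetical helper: tries to install a no-op handler for `signo` and
// reports whether the kernel accepted it. On success the previous
// disposition is restored, so the probe has no lasting effect.
bool can_install_handler(int signo) {
    struct sigaction probe {};
    struct sigaction previous {};
    probe.sa_handler = noop_handler;
    sigemptyset(&probe.sa_mask);
    if (sigaction(signo, &probe, &previous) != 0) {
        return false;  // SIGKILL and SIGSTOP are rejected with EINVAL
    }
    sigaction(signo, &previous, nullptr);
    return true;
}
```

Calling can_install_handler(SIGKILL) should return false while can_install_handler(SIGTERM) returns true, which is why a SIGKILL ends the process without running any of its code.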
If you are using Unix/Linux, you should also be able to run dmesg in your terminal and see why the process was terminated. In my case it was indeed the OOM killer; the kernel log showed it shortly after the termination.
It is possible that the process ran into the CPU time ulimit. Run ulimit -a in the environment where the process is actually started and check whether "cpu time" is set to anything other than "unlimited".
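The same check can also be made from inside the program, which avoids guessing which environment launched it. This is a minimal sketch, assuming a POSIX system that provides getrlimit; both helper names are hypothetical. On Linux, exceeding the soft CPU limit delivers SIGXCPU and reaching the hard limit delivers SIGKILL, which would match the symptom in the question.

```cpp
#include <string>
#include <sys/resource.h>

// Hypothetical helper: renders one limit value as seconds or "unlimited".
std::string describe_limit(rlim_t value) {
    if (value == RLIM_INFINITY) {
        return "unlimited";
    }
    return std::to_string(static_cast<unsigned long long>(value)) + " s";
}

// Hypothetical helper: returns "soft=<...> hard=<...>" for the CPU time
// limit of the calling process, or an empty string if getrlimit fails.
std::string cpu_time_limit() {
    struct rlimit lim {};
    if (getrlimit(RLIMIT_CPU, &lim) != 0) {
        return "";
    }
    return "soft=" + describe_limit(lim.rlim_cur) +
           " hard=" + describe_limit(lim.rlim_max);
}
```

Logging the result of cpu_time_limit() early in main shows the limit the process actually inherited.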
In my case it was a crash (an access violation). Even with GDB attached, I couldn't catch the violation.
Hope it helps
I need to run my client application (written in C++ with gRPC) on an operating system that only supports a single thread.
However, I noticed that grpc::InsecureChannelCredentials() tries to create multiple threads. Here is the debugger output after calling that gRPC function on my host machine:
[New Thread 0x7ffff524a700 (LWP 3709)]
[New Thread 0x7ffff524a700 (LWP 3710)]
[New Thread 0x7ffff524a700 (LWP 3711)]
This will cause the program to crash inside the single-threaded OS.
My question is: is there a way to configure gRPC to use only a single thread, or to make a C++ executable run with only a single thread? Thanks in advance.
By the way, here is a link to the OS mentioned above; the issue explains why it only supports a single thread:
https://github.com/lsds/sgx-lkl/issues/1
EDIT:
It turns out the OS actually disallows multi-process applications, not multi-threaded ones. gRPC seems to call fork inside its core library. I'm wondering if there is a way to configure gRPC to disable process forking.
I have the following problem: I want to regain control of gdb when a process enters a blocking situation, i.e. a blocking function or a polling loop.
Let's illustrate it with an example: I have process A, which forks process B. B does its work and then gets stuck waiting for an event from A. I want to switch GDB to A so I can run it separately until it generates the event. However, I cannot regain control of GDB from B. Of course I can press Ctrl+C in B, which generates a SIGINT signal, and then switch to A, but when I go back to B, even if I use handle SIGINT pass, B exits.
Log:
Program received signal SIGINT, Interrupt.
[Switching to Thread 0xb68feb40 (LWP 3177)]
0xb7fdeb0c in ?? ()
(gdb) handle SIGINT pass
SIGINT is used by the debugger.
Are you sure you want to change it? (y or n) y
Signal        Stop      Print   Pass to program Description
SIGINT        Yes       Yes     Yes             Interrupt
(gdb) c
Continuing.
[Thread 0xb7abcb40 (LWP 3178) exited]
[Thread 0xb68feb40 (LWP 3177) exited]
Couldn't get registers: No such process.
(gdb) info inferiors
  Num  Description
* 2    <null>
  1    process 3168
Is there a way to regain control of GDB and switch processes without killing the one I interrupted?
I'm experiencing an inconsistent behavior of a program that's parallelized using OpenMP.
When I run it, it prints out its current stage, so the expected output is: "2 3 4 5" etc.
Time between the first few stages is usually 1 to 2 seconds (when running in parallel on 4 cores).
However, without recompiling or altering anything, sometimes when I run the software it hangs right after printing 2 (which is printed before the first parallel code is executed).
It doesn't become slow; it literally stops computing. I've run this under gdb and confirmed that it hangs inside of OpenMP
(there are more than 4 threads because of hyperthreading):
[New Thread 0x7ffff6c78700 (LWP 25878)]
[New Thread 0x7ffff6477700 (LWP 25879)]
[New Thread 0x7ffff5c76700 (LWP 25880)]
[New Thread 0x7ffff5475700 (LWP 25881)]
[New Thread 0x7ffff4c74700 (LWP 25882)]
[New Thread 0x7ffff4473700 (LWP 25883)]
[New Thread 0x7ffff3c72700 (LWP 25884)]
^C
Program received signal SIGINT, Interrupt.
0x00007ffff7641fd4 in ?? () from /usr/lib/libgomp.so.1
(gdb) up
#1 0x00007ffff7640a9e in ?? () from /usr/lib/libgomp.so.1
(gdb)
#2 0x0000000000408ae8 in Redcraft::createStructures (this=0x7fffffffd8d0) at source/redcraft.cpp:512
512 #pragma omp parallel for private(node)
Originally the pragma specified schedule(dynamic), but keeping or removing that doesn't change how consistently the hang occurs.
Lastly, I tried enabling/disabling omp_set_dynamic() and that had no effect either.
Any suggestions for debugging?
This usually happens when there is a data race. You'll have to post the code block that is being parallelized; what needs to be found out is how the threads use the data. Rerunning without recompiling doesn't guarantee the same thread execution order, which is why this kind of problem shows up only some of the time. Are you working with files? You'll have to close them before rerunning.
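As an illustration of the kind of race meant here (this is not the asker's code, which was not posted), the sketch below sums values in a parallel loop. Letting every thread write to one shared accumulator without synchronization would be a race; the reduction clause gives each thread a private copy and combines them at the end. The function name is hypothetical, and the sketch assumes compilation with -fopenmp; without that flag the pragma is ignored and the loop simply runs serially.

```cpp
#include <vector>

// Hypothetical example: sum a vector in parallel without a data race.
long parallel_sum(const std::vector<int>& values) {
    long total = 0;
    const long count = static_cast<long>(values.size());
    // reduction(+:total) gives every thread its own private `total`
    // and adds the copies together when the loop ends. Dropping the
    // clause would make all threads update the same variable at once.
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}
```

When reviewing a loop such as the one at redcraft.cpp:512, the same question applies to every variable touched inside it: is it private to a thread, or is it shared and written by more than one thread?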