switching block focus in cuda-gdb - gdb

Pretty simple... I want to change focus in cuda-gdb. I can change to a different thread within the current block (block 0), but not to a different block. I'm using cuda/cuda-gdb 3.0.
Following the 3.0 manual:
(cuda-gdb) cuda block
Current CUDA focus: block (0,0).
(cuda-gdb) cuda block (9,0)
CUDA focus unchanged.
(cuda-gdb) cuda thread (9,0,0)
New CUDA focus: device 0, sm 1, warp 0, lane 9, grid 42672, block (0,0), thread (9,0,0).
or the other way (from the 3.2 manual):
(cuda-gdb) thread
[Current Thread 2 (Thread 140272898447104 (LWP 28681))]
[Current CUDA Thread <<<(0,0),(0,0,0)>>>]
(cuda-gdb) thread <<<(9),(10)>>>
Switching to <<<(9,0),(10,0,0)>>> 0x000000000246a5c8 in my_kernel
<<<(16,1),(128,1,1)>>> ...
(cuda-gdb) thread
[Current Thread 2 (Thread 140272898447104 (LWP 28681))]
[Current CUDA Thread <<<(0,0),(0,0,0)>>>]
(cuda-gdb) thread <<<20>>>
Switching to <<<(0,0),(20,0,0)>>> 0x000000000246a5c8 in my_kernel
<<<(16,1),(128,1,1)>>> ...
(cuda-gdb) thread
[Current Thread 2 (Thread 140272898447104 (LWP 28681))]
[Current CUDA Thread <<<(0,0),(20,0,0)>>>]
What am I doing wrong?
CUDA 3.0 | Ubuntu 9.04 | GTX 480

If you run info cuda sm (IIRC) you can see the currently active blocks. It's not possible to switch to a block (or a warp within a block) that has already completed execution.
If you want to look at a specific block then you should be able to break on the kernel function itself, then change focus, then continue the debugging session.

Related

Avoid reusing thread ids in C++

I noticed, based on this, that Linux reuses the thread ids of terminated threads instead of generating new ones. For some reason, I need to avoid this behavior. How can I make sure that newly created threads get a freshly generated thread id instead of reusing an old one?
(Update for interested people: I'm working on a DNN scheduler for GPU using PyTorch's C++ API, I need to create a new thread to call each layer/operation, and whenever the newly created thread shares the thread id with a terminated thread, I get CUDNN_STATUS_MAPPING_ERROR. I have reached this after a long time and if I can create threads with unique ids, I might be able to track down the main reason behind this.)
Update 2: POSIX threads avoid generating new thread ids (thread objects in the glibc implementation) as long as there are terminated threads to reuse. I want to avoid this behavior. Maybe deallocating the terminated threads somehow would solve the problem, but I don't know how.
Update 3: Based on lines 84-97 in link, Linux tends to reuse previously allocated but terminated threads. Is it somehow possible to deallocate these threads so that previous thread ids are not reused?
There is a way to keep the stack allocations of terminated threads from being reused: allocate the stack memory yourself. pthread_attr_setstack can help. Note that this adds complexity, because guarding against stack overflow becomes the responsibility of the API user.
Here are some tests I made while playing around with the POSIX thread library:
Created thread id: 139938008069888 in __pthread_create_2
Created thread id: 139937999677184 in __pthread_create_2
Created thread id: 139937999677184 in __pthread_create_2
Created thread id: 139938008069888 in __pthread_create_2
Thread 1 : db42f700
Thread 2 : dac2e700
Thread 3 : dac2e700
Thread 4 : db42f700
As a result, the stacks (and with them the thread ids) of threads 1 and 2 are reused for threads 3 and 4.
With self-allocated stacks:
Created thread id: 139891916830464 in __pthread_create_2
Set stackaddr to 139891916849184
Set stacksize to 32768
Created thread id: 139891916879616 in __pthread_create_2
Set stackaddr to 139891916898352
Set stacksize to 32768
Created thread id: 139891916928768 in __pthread_create_2
Set stackaddr to 139891916947520
Set stacksize to 32768
Created thread id: 139891916977984 in __pthread_create_2
Thread 1 : 139891916830464
Thread 2 : 139891916879616
Thread 3 : 139891916928768
Thread 4 : 139891916977984
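The answer does not show the test program, so here is a minimal sketch of what the self-allocated variant might look like. The names run_on_stack and ids_are_distinct, the empty worker body, and the 256 KiB stack size are assumptions for illustration, not taken from the original test. It relies on the implementation deriving the thread id from the caller-supplied stack (as the output above suggests for glibc), so the ids only stay unique while the old stacks remain allocated.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Assumed stack size; it must be at least PTHREAD_STACK_MIN. */
#define STACK_SIZE ((size_t)256 * 1024)

static void *worker(void *arg) {
    (void)arg;            /* placeholder for the real per-thread work */
    return NULL;
}

/* Run one short-lived thread on a caller-owned stack; store its id in *out.
 * Returns 0 on success, an errno-style code otherwise. */
static int run_on_stack(void *stack, size_t size, pthread_t *out) {
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_attr_setstack(&attr, stack, size);
    if (rc == 0) rc = pthread_create(out, &attr, worker, NULL);
    pthread_attr_destroy(&attr);
    if (rc != 0) return rc;
    return pthread_join(*out, NULL);
}

/* Create n threads one after another, each on its own stack that stays
 * allocated until the end. Returns 1 if all ids differ, 0 if any id
 * repeats, -1 on error. */
int ids_are_distinct(int n) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void **stacks = calloc((size_t)n, sizeof *stacks);
    uintptr_t *ids = calloc((size_t)n, sizeof *ids);
    int result = (stacks && ids) ? 1 : -1;

    for (int i = 0; result == 1 && i < n; i++) {
        pthread_t tid;
        if (posix_memalign(&stacks[i], page, STACK_SIZE) != 0) {
            stacks[i] = NULL;
            result = -1;
        } else if (run_on_stack(stacks[i], STACK_SIZE, &tid) != 0) {
            result = -1;
        } else {
            /* Assumes pthread_t is an integer or pointer type (true on Linux). */
            ids[i] = (uintptr_t)tid;
            for (int j = 0; j < i; j++)
                if (ids[j] == ids[i]) result = 0;
        }
    }
    if (stacks)
        for (int i = 0; i < n; i++) free(stacks[i]);
    free(stacks);
    free(ids);
    return result;
}
```

Calling ids_are_distinct(4) mirrors the four-thread test above. Freeing a stack before creating the next thread would let the allocator hand back the same memory, and the id could then repeat again.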

gdb stack trace shows no backtrace - Selected thread is running

I have a program with multiple threads. This program has been compiled with the -g option and has debug symbols.
(gdb) i threads
Id Target Id Frame
1 Thread 0x7fbb5256c9c0 (LWP 15799) "main" 0x00007fbb40dfdc93 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
2 Thread 0x7fbb149ff700 (LWP 15858) "IOService" (running)
3 Thread 0x7fbb135ff700 (LWP 15860) "EvoAftManBt-mai" (running)
4 Thread 0x7fbb0cfff700 (LWP 15868) "main" (running)
5 Thread 0x7fbaecfff700 (LWP 15873) "myTimer-0" (running)
* 6 Thread 0x7fbadffff700 (LWP 15882) "stats-root" (running)
What seems odd is the status at the end of each thread entry, which says running...
Now if I switch to the thread I am interested in:
(gdb) thread 6
[Switching to thread 6 (Thread 0x7fbadffff700 (LWP 15882))](running)
(gdb) bt
Selected thread is running
No backtrace is printed.
However, if I strip the symbols from the same binary, I can see the stack trace, which is baffling.
Any pointers?

using logical core in program

Consider this code for setting thread affinity on a specific processor core:
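#define _GNU_SOURCE   /* needed for cpu_set_t and pthread_attr_setaffinity_np */
#include <pthread.h>
#include <sched.h>

pthread_attr_t attr;
cpu_set_t cpu;
CPU_ZERO(&cpu);
CPU_SET(CoreNumber, &cpu);   /* logical CPU index the thread is pinned to */
pthread_attr_init(&attr);
pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &cpu);
/* Note: the policy is only applied if PTHREAD_EXPLICIT_SCHED is also set
   with pthread_attr_setinheritsched; otherwise the creator's policy is inherited. */
pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
pthread_create(Thread, &attr, func, param);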
My system has 4 physical cores, and each core has 2 logical cores. With this code, when my core-number is 4, every thread runs on a separate core. For instance, thread 1 runs on core 0, thread 2 runs on core 2, etc.
I want to change the affinity so that two threads run on each core. For instance, thread 1 and thread 2 run on core 1's two logical cores, and thread 3 and thread 4 run on core 2's two logical cores.
Is that possible? How should I change the above code?
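As a rough sketch of the pinning part only: CPU_SET takes a logical CPU index, so putting two threads on one physical core comes down to pinning them to the two logical CPU numbers that share that core. Which numbers are siblings is machine-specific; on Linux they are listed in /sys/devices/system/cpu/cpuN/topology/thread_siblings_list. The helper names run_pinned and first_allowed_cpu below are hypothetical, and the sketch leaves out the scheduling-policy calls from the question.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Thread body: record which logical CPU this thread is running on. */
static void *report_cpu(void *arg) {
    *(int *)arg = sched_getcpu();
    return NULL;
}

/* Start a thread pinned to logical CPU `cpu` and return the CPU it
 * actually ran on, or -1 on error. */
int run_pinned(int cpu) {
    cpu_set_t set;
    pthread_attr_t attr;
    pthread_t tid;
    int seen = -1;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (pthread_attr_init(&attr) != 0) return -1;
    int rc = pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
    if (rc == 0) rc = pthread_create(&tid, &attr, report_cpu, &seen);
    pthread_attr_destroy(&attr);
    if (rc != 0) return -1;
    pthread_join(tid, NULL);
    return seen;
}

/* Return the lowest logical CPU the calling thread may run on, or -1. */
int first_allowed_cpu(void) {
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set)) return cpu;
    return -1;
}
```

If, say, the sibling list for cpu0 reads 0,4 on a given machine, then run_pinned(0) and run_pinned(4) would place two threads on the same physical core. That pair is only an example; the real numbering has to be read from the system.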

GDB: recover control from blocked process

I have the following problem: I want to recover control of gdb when a process enters a blocking situation, i.e. a blocking function or a polling loop.
Let's illustrate it with an example: I have process A, which forks process B. B does its work and then gets stuck waiting for an event from A. I want to switch GDB to A so I can run it separately until the event is generated. However, I cannot recover control of GDB from B. Of course I can press ctrl+C in B, which generates a SIGINT signal, and then change to A, but when I go back to B, B finishes, even if I use handle SIGINT pass.
Log:
Program received signal SIGINT, Interrupt.
[Switching to Thread 0xb68feb40 (LWP 3177)]
0xb7fdeb0c in ?? ()
(gdb) handle SIGINT pass
SIGINT is used by the debugger.
Are you sure you want to change it? (y or n) y
Signal Stop Print Pass to program Description
SIGINT Yes Yes Yes Interrupt
(gdb) c
Continuing.
[Thread 0xb7abcb40 (LWP 3178) exited]
[Thread 0xb68feb40 (LWP 3177) exited]
Couldn't get registers: No such process.
(gdb) info inferiors
Num Description
* 2 <null>
1 process 3168
Is there a way to recover control of GDB and switch process without killing it?

OpenMP - Hanging during execution

I'm experiencing an inconsistent behavior of a program that's parallelized using OpenMP.
When I run it, it prints out its current stage, so the expected output is: "2 3 4 5" etc.
Time between the first few stages is usually 1 to 2 seconds (when running in parallel on 4 cores).
However, without recompiling or altering anything, sometimes when I run the software it hangs right after printing 2 (which is printed before the first parallel code is executed).
It doesn't become slow; it literally stops computing. I've run this under gdb and confirmed that it hangs inside OpenMP:
(there are more than 4 threads because of hyperthreading)
[New Thread 0x7ffff6c78700 (LWP 25878)]
[New Thread 0x7ffff6477700 (LWP 25879)]
[New Thread 0x7ffff5c76700 (LWP 25880)]
[New Thread 0x7ffff5475700 (LWP 25881)]
[New Thread 0x7ffff4c74700 (LWP 25882)]
[New Thread 0x7ffff4473700 (LWP 25883)]
[New Thread 0x7ffff3c72700 (LWP 25884)]
^C
Program received signal SIGINT, Interrupt.
0x00007ffff7641fd4 in ?? () from /usr/lib/libgomp.so.1
(gdb) up
#1 0x00007ffff7640a9e in ?? () from /usr/lib/libgomp.so.1
(gdb)
#2 0x0000000000408ae8 in Redcraft::createStructures (this=0x7fffffffd8d0) at source/redcraft.cpp:512
512 #pragma omp parallel for private(node)
Originally the pragma specified schedule(dynamic), but adding or removing it doesn't change how often the hang occurs.
Lastly, I tried enabling/disabling omp_set_dynamic() and that had no effect either.
Any suggestions for debugging?
This usually happens when there is a data race. You'll have to post the code block that is being parallelized; what needs to be found out is how the threads use the data. Rerunning without recompiling doesn't guarantee the same thread execution order, which is why this kind of problem shows up only on some runs. Are you working with files? You'll have to close them before rerunning.
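Since the parallelized loop was not posted, the following is only a generic sketch of what "how the threads use the data" means in a parallel for. The function sum_squares is made up for illustration and is not the asker's code. The temporary is declared inside the loop body, so each thread gets its own copy, and the shared accumulator is combined through a reduction instead of being written by all threads at once.

```c
/* Sum of the squares of 0..n-1.
 * `sq` lives inside the loop body, so it is private to each iteration;
 * `total` is shared, so it is combined with a reduction clause rather
 * than updated concurrently without synchronization. */
long sum_squares(int n) {
    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++) {
        long sq = (long)i * i;
        total += sq;
    }
    return total;
}
```

With GCC this needs -fopenmp to run in parallel; without that flag the pragma is ignored and the loop runs serially with the same result. Any variable written inside the loop that is neither private nor covered by a reduction, atomic, or critical section is a candidate for the kind of race the answer describes.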