Why arg0 could indicate it is a kernel stack when? - dtrace

What is the difference between arg0 and arg1 when using dtrace.
dtrace -n 'profile-997 /arg0/ { #[stack()] = count() }
dtrace -n 'profile-997 /arg1/ { #[ustack()] = count() }
For example, the two scripts above record number of each stack. I just wonder why arg0 indicate it is a kernel stack while arg1 a user space stack.

That's by design.
The profile provider receives two arguments, arg0 is set to the PC in case the CPU is running in kernel mode while arg1 has the PC in case the CPU is running in userland. The other argument is set to zero so that makes a very simple way to detect the CPU state.

Related

Shared memory "Too many open files" but ipcs doesn't show many allocations

I'm writing unit tests for code which creates shared memory.
I only have a couple of tests. I make 4 allocations of shared memory and then it fails on the fifth.
After calling shmat() perror() says Too many open files:
template <typename T>
bool Attach(T** ptr, const key_type& key)
{
// shmemId was 262151
int32_t shmemId = shmget( key.key( ), ( size_t )0, 0644 );
if (shmemId < 0)
{
perror("Error: ");
return false;
}
*ptr = ( T * ) shmat(shmemId, 0, 0 );
if ( ( int64_t ) * ptr < 0 )
{
// Problem is here. perror() says 'Too many open files'
perror( "Error: ");
return false;
}
return true;
}
However, when I check ipcs -m -p I only have a couple of shared memory allocations.
T ID KEY MODE OWNER GROUP CPID LPID
Shared Memory:
m 262151 0x0000a028 --rw-r--r-- 3229 0
m 262152 0x0000a029 --rw-r--r-- 3229 0
In addition, when I check my OS shared memory limits sysctl -A | grep shm I get:
kern.sysv.shmall: 1024
kern.sysv.shmmax: 4194304
kern.sysv.shmmin: 1
kern.sysv.shmmni: 32
kern.sysv.shmseg: 8
security.mac.posixshm_enforce: 1
security.mac.sysvshm_enforce: 1
Are these variables large enough/are they the cause/what values should I have?
I'm sure I edited the file to increase them and restarted machine but perhaps it hasn't accepted (this is on Mac/OSX).
Your problem may be elsewhere.
Edit: This may be a shmmni limit of macOS. See below.
When I run your [simplified] code on my system (linux), the shmget fails.
You didn't specify IPC_CREAT to the third argument. If another process has created the segment, this may be okay.
But, it doesn't/shouldn't like a size of 0. The [linux] man page states that it returns an error (errno set to EINVAL) if the size is less than SHMMIN (which is 1).
That is what happened on my system. So, I adjusted the code to use a size of 1.
This was done [as I mentioned] on linux.
macOS may allow a size of 0, even if that doesn't make practical sense. (e.g.) It may round it up to a page size.
For shmat, it returns (void *) -1.
But, some systems can have valid addresses that have the high bit set. (e.g.) 0xFFE0000000000000 is a valid address, but would fail your if test because casting that to int64_t will test negative.
Better to do:
if ((int64_t) *ptr == (int64_t) -1)
Or [possibly better]:
if ((void *) *ptr == (void *) -1)
Note that errno is not set/changed if the call succeeds.
To verify this, do: errno = 0 before the shmat call. If perror says "Success", then the shmat is okay. And, your current test needs to be adjusted as above--I'd do that change regardless.
You could also do (e.g):
printf("ptr=%p\n",*ptr);
Normally, errno starts as 0.
Note that there are some differences between macOS and linux.
So, if errno is ever set to "too many open files", this can be because the process has too many open files (EMFILE).
It might be because the system-wide limit is reached (ENFILE) but that is "file table overflow".
Note that under linux shmat can not generate EMFILE. However, it appears that under macOS it can.
However, if the number of calls to shmat is limited [as you mention], the shmat should succeed.
The macOS man page is a little vague as to what the limit is based on. However, I checked the FreeBSD man page for shmat and that says it is limited by the sysctl parameter: kern.ipc.shmseg. Your grep should have caught that [if applicable].
It is possible some other syscall elsewhere in the code is opening too many files. And, that syscall is not checking the error return.
Again, I realize you're running macOS.
But, if available, you may want to try your program under linux. For example, it has much larger limits from the sysctl:
kernel.shm_next_id = -1
kernel.shm_rmid_forced = 0
kernel.shmall = 18446744073692774399
kernel.shmmax = 18446744073692774399
kernel.shmmni = 4096
vm.hugetlb_shm_group = 0
Note that shmmni is the system-wide maximum number of shared memory segments.
Note that for macOS, shmmni is 32 (vs. 4096 for linux)!?!?
That means that the entire system can only have 32 open shared memory segments for any/all processes???
That seems very low. You can probably set this to a larger number and see if that helps.
Linux has the strace program and you could use it to monitor the syscalls.
But, macOS has dtruss: How to trace system calls of a program in Mac OS X?

Is it possible the argument for frame is passed as an argument to another frame in backtrace even when data types are different

In my application which is deployed on vxWorks following backtrace (generated on xrtc file) is generated:
d475c (Frame 1) process_host_invoke+274: d3bac (ffffff00, 3e8)
d3cac (Frame 2) stop_feature_monitor+904: receive_invoke_reply ()
d0318 (Frame 3) receive_invoke_reply+7c : wait_reply_msg (3e8, 7950818, ffffff00)
cc6a4 (Frame 4) wait_reply_msg+98 : semTake (ffffff00, 3e8)
859690 (Frame 5) semTake +28 : vxWorks_semTake (ffffff00, 3e8)
Following are the prototype of methods invoked above:
void process_host_invoke(X *param);
int wait_reply_msg(uchar **msg,int* size,int wait_tick );
STATUS semTake(
SEM_ID semId, /* semaphore ID to take */
int timeout /* timeout in ticks */
);
Is it possible that argument of type uchar **msg from frame 3 is passed on to frame 5 as an argument even when the data types are different.
I am new to vxWorks environment. Is it possible that function arguments are passed incorrectly on stack.
FFFFFF00 (H)--> 4,294,967,040‬ this value is passed as SEM_ID in semTake‬
I got S_objLib_OBJ_ID_ERROR when semTake is invoked.
PS: I don't have dump file
Is it possible that function arguments are passed incorrectly on stack.
Yes, definitely possible. I've seen such behavior for task trace when working with old version of VxWorks running on MIPS32 processors. MIPS calling convetions doesn't require putting function arguments on stack, they are passed through registers and may not appear on stack at all. Not sure if it was the reason, but function call parameters were never accurate in traces for me.

How to run record instruction-history and function-call-history in GDB?

(EDIT: per the first answer below the current "trick" seems to be using an Atom processor. But I hope some gdb guru can answer if this is a fundamental limitation, or whether there adding support for other processors is on the roadmap?)
Reverse execution seems to be working in my environment: I can reverse-continue, see a plausible record log, and move around within it:
(gdb) start
...Temporary breakpoint 5 at 0x8048460: file bang.cpp, line 13.
Starting program: /home/thomasg/temp/./bang
Temporary breakpoint 5, main () at bang.cpp:13
13 f(1000);
(gdb) record
(gdb) continue
Continuing.
Breakpoint 3, f (d=900) at bang.cpp:5
5 if(d) {
(gdb) info record
Active record target: record-full
Record mode:
Lowest recorded instruction number is 1.
Highest recorded instruction number is 1005.
Log contains 1005 instructions.
Max logged instructions is 200000.
(gdb) reverse-continue
Continuing.
Breakpoint 3, f (d=901) at bang.cpp:5
5 if(d) {
(gdb) record goto end
Go forward to insn number 1005
#0 f (d=900) at bang.cpp:5
5 if(d) {
However the instruction and function histories aren't available:
(gdb) record instruction-history
You can't do that when your target is `record-full'
(gdb) record function-call-history
You can't do that when your target is `record-full'
And the only target type available is full, the other documented type "btrace" fails with "Target does not support branch tracing."
So quite possibly it just isn't supported for this target, but as it's a mainstream modern one (gdb 7.6.1-ubuntu, on amd64 Linux Mint "Petra" running an "Intel(R) Core(TM) i5-3570") I'm hoping that I've overlooked a crucial step or config?
It seems that there is no other solution except a CPU that supports it.
More precisely, your kernel has to support Intel Processor Tracing (Intel PT). This can be checked in Linux with:
grep intel_pt /proc/cpuinfo
See also: https://unix.stackexchange.com/questions/43539/what-do-the-flags-in-proc-cpuinfo-mean
The commands only works in record btrace mode.
In the GDB source commit beab5d9, it is nat/linux-btrace.c:kernel_supports_pt that checks if we can enter btrace. The following checks are carried out:
check if /sys/bus/event_source/devices/intel_pt/type exists and read the type
do a syscall (SYS_perf_event_open, &attr, child, -1, -1, 0); with the read type, and see if it returns >=0. TODO: why not use the C wrapper?
The first check fails for me: the file does not exist.
Kernel side
cd into the kernel 4.1 source and:
git grep '"intel_pt"'
we find arch/x86/kernel/cpu/perf_event_intel_pt.c which sets up that file. In particular, it does:
if (!test_cpu_cap(&boot_cpu_data, X86_FEATURE_INTEL_PT))
goto fail;
so intel_pt is a pre-requisite.
How I've found kernel_supports_pt
First grep for:
git grep 'Target does not support branch tracing.'
which leads us to btrace.c:btrace_enable. After a quick debug with:
gdb -q -ex start -ex 'b btrace_enable' -ex c --args /home/ciro/git/binutils-gdb/install/bin/gdb --batch -ex start -ex 'record btrace' ./hello_world.out
Virtual box does not support it either: Extract execution log from gdb record in a VirtualBox VM
Intel SDE
Intel SDE 7.21 already has this CPU feature, checked with:
./sde64 -- cpuid | grep 'Intel processor trace'
But I'm not sure if the Linux kernel can be run on it: https://superuser.com/questions/950992/how-to-run-the-linux-kernel-on-intel-software-development-emulator-sde
Other GDB methods
More generic questions, with less efficient software solutions:
call graph: List of all function calls made in an application
instruction trace: Displaying each assembly instruction executed in gdb
At least a partial answer (for the "am I doing it wrong" aspect) - from gdb-7.6.50.20140108/gdb/NEWS
* A new record target "record-btrace" has been added. The new target
uses hardware support to record the control-flow of a process. It
does not support replaying the execution, but it implements the
below new commands for investigating the recorded execution log.
This new recording method can be enabled using:
record btrace
The "record-btrace" target is only available on Intel Atom processors
and requires a Linux kernel 2.6.32 or later.
* Two new commands have been added for record/replay to give information
about the recorded execution without having to replay the execution.
The commands are only supported by "record btrace".
record instruction-history prints the execution history at
instruction granularity
record function-call-history prints the execution history at
function granularity
It's not often that I envy the owner of an Atom processor ;-)
I'll edit the question to refocus upon the question of workarounds or plans for future support.

Linux - Detecting idleness

I need to detect when a computer is idle for a certain time period. My definition of idleness is:
No users logged in, either by remote methods or on the local machine
X server inactivity, with no movement of mouse or key presses
TTY keyboard inactivity (hopefully)
Since the majority of distros have now moved to logind, I should be able to use its DBUS interface to find out if users are logged in, and also to monitor logins/logouts. I have used xautolock to detect X idleness before, and I could continue using that, but xscreensaver is also available. Preferably however I want to move away from any specific dependencies like the screensaver due to different desktop environments using different components.
Ideally, I would also be able to base idleness on TTY keyboard inactivity, however this isn't my biggest concern. According to this answer, I should be able to directly query the /dev/input/* interfaces, however I have no clue how to go about this.
My previous attempts at making such a monitor have used Bash, due to the ease of changing a plain text script file, howver I am happy using C++ in case more advanced methods are required to accomplish this.
From a purely shell standpoint (since you tagged this bash), you can get really close to what you want.
#!/bin/sh
users_are_logged_in() {
who |grep -q .
return $?
}
x_is_blanked() {
local DISPLAY=:0
if xscreensaver-command -time |grep -q 'screen blanked'; then
return 0 # we found a blanked xscreensaver: return true
fi
# no blanked xscreensaver. Look for DPMS modes
xset -q |awk '
/DPMS is Enabled/ { dpms = 1 } # DPMS is enabled
/Monitor is On$/ { monitor = 1 } # The monitor is on
END { if(dpms && !monitor) { exit 0 } else { exit 1 } }'
return $? # true when DPMS is enabled and the monitor is not on
}
nobody_here() {
! users_are_logged_in && x_is_blanked
return $?
}
if nobody_here; then
sleep 2m
if nobody_here; then
# ...
fi
fi
This assumes that a user can log in in two minutes and that otherwise, there is no TTY keyboard activity.
You should verify that the who |grep works on your system (i.e. no headers). I had originally grepped for / but then it won't work on FreeBSD. If who has headers, maybe try [ $(who |grep -c .) -gt 1 ] which will tell you that the number of lines that who outputs is more than one.
I share your worry about the screensaver part; xscreensaver likely isn't running in the login manager (any other form of X would involve a user logged in, which who would detect), e.g. GDM uses gnome-screensaver, whose syntax would be slightly different. The DPMS part may be good enough, giving a far larger buffer for graphical logins than the two minutes for console login.
Using return $? in the last line of a function is redundant. I used it to clarify that we're actually using the return value from the previous line. nobody_here short circuits, so if no users are logged in, there is no need to run the more expensive check for the status of X.
Side note: Be careful about using the term "idle" as it more typically refers to resource (hardware, that is) consumption (e.g. CPU load). See the uptime command for load averages for the most common way of determining system (resource) idleness. (This is why I named my function nobody_here instead of e.g. is_idle)

Efficient variable watching in C/C++

I'm currently writing a multi-threaded, high efficient and scalable algorithm. Because I have to guess a parameter for the code and I'm not sure how the calculation performs on a specific data set, I would like to watch a variable. The test only works with a real world, huge data set. It is possible to analyze the collected data after profiling. Imagine the following, simple code example (real code can contain multiple watch points:
// function get's called by loops of multiple threads
void payload(data_t* data, double threshold) {
double value = calc(data);
// here I want to watch the value
if (value < threshold) {
doSomething(data);
} else {
doSomethingElse(data);
}
}
I thought about the following approaches:
Using cout or other system outputs
Use a binary output (file, network)
Set a breakpoint via gdb/lldb
Use variable watching + logging via gdb/lldb
I'm not happy with the results because: To use 1. and 2. I have to change the code, but this is a debugging/evaluating task. Furthermore 1. requires locking and 1.+2. requires I/O operations, which heavily slows down the entire code and makes testing with real data nearly impossible. 3. is also too slow. To use 4., I have to know the variable address because it's not a global variable, but because threads get created by a dynamic scheduler, this would require breaking + stepping for each thread.
So my conclusion is, that I need a profiler/debugger that works at machine code level and dumps/logs/watches the variable without double->string conversion and is highly efficient, or to sum up with other words: I would like to profile the internal state of my algorithm without heavy slow-down and without doing deep modification. Does anybody know a tool that is able to this?
OK, this took some time but now I'm able to present a solution for my problem. It's called tracepoints. Instead of breaking the program every time, it's more lightweight and (ideally) doesn't change performance/timing too much. It does not require code changes. Here is an explanation how to use them using gdb:
Make sure you compiled your program with debugging symbols (using the -g flag). Now, start the gdb server and provide a network port (e.g. 10000) and the program arguments:
gdbserver :10000 ./program --parameters you --want --to use
Now, switch to a second console and start gdb (program parameters are not required here):
gdb ./program
All following commands are entered in the gdb command line interface. So let's connect to the server:
target remote :10000
After you got the connection confirmation, use trace or ftrace to set a tracepoint to a specific source location (try ftrace first, it should be faster but doesn't work on all platforms):
trace source.c:127
This should create tracepoint #1. Now you can setup an action for this tracepoint. Here I want to collect the data from myVariable
action 1
collect myVariable
end
If expect much data or want to use the data later (after restart), you can set a binary trace file:
tsave trace.bin
Now, start tracing and run the program:
tstart
continue
You can wait for program exit or interrupt your program using CTRL-C (still on gdb console, not on server side). Continue by telling gdb that you want to stop tracing:
tstop
Now we come the tricky part and I'm not really happy with the following code because it's really slow:
set pagination off
set logging file trace.txt
tfind start
while ($trace_frame != -1)
set logging on
printf "%f\n", myVariable
set logging off
tfind
end
This dumps all variable data to a text file. You can add some filter or preparation here. Now you're done and you can exit gdb. This will also shutdown the server:
quit
For detailed documentation especially for explanation of filtering and more advanced tracepoint positions, you can visit the following document: http://sourceware.org/gdb/onlinedocs/gdb/Tracepoints.html
To isolate trace file writing from your program execution, you can use cgroups or another network connected computer. When using another computer, you have to add the host to the port information (e.g. 192.168.1.37:10000). To load a binary trace file later, just start gdb as shown above (forget the server) and change the target:
gdb ./program
target tfile trace.bin
you can set hardware watchpoint using gdb debugger, for example if you have
bool b;
variable and you want to be notified every time the value of it has chenged (by any thread)
you would declare a watchpoint like this:
(gdb) watch *(bool*)0x7fffffffe344
example:
root#comp:~# gdb prog
GNU gdb (GDB) 7.5-ubuntu
Copyright ...
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /dist/Debug/GNU-Linux-x86/cppapp_socket5_ipaddresses...done.
(gdb) watch *(bool*)0x7fffffffe344
Hardware watchpoint 1: *(bool*)0x7fffffffe344
(gdb) start
Temporary breakpoint 2 at 0x40079f: file main.cpp, line 26.
Starting program: /dist/Debug/GNU-Linux-x86/cppapp_socket5_ipaddresses
Hardware watchpoint 1: *(bool*)0x7fffffffe344
Old value = true
New value = false
main () at main.cpp:50
50 if (strcmp(mask, "255.0.0.0") != 0) {
(gdb) c
Continuing.
Hardware watchpoint 1: *(bool*)0x7fffffffe344
Old value = false
New value = true
main () at main.cpp:41
41 if (ifa ->ifa_addr->sa_family == AF_INET) { // check it is IP4
(gdb) c
Continuing.
mask:255.255.255.0
eth0 IP Address 192.168.1.5
[Inferior 1 (process 18146) exited normally]
(gdb) q