Calling C functions from DTrace scripts - dtrace

DTrace is an impressive, powerful tracing system originally from Solaris, and it has been ported to FreeBSD and Mac OS X.
DTrace uses a high-level language called D, not unlike AWK or C. Here is an example:
io:::start
/pid == $1/
{
printf("file %s offset %d size %d block %llu\n", args[2]->fi_pathname,
args[2]->fi_offset, args[0]->b_bcount, args[0]->b_blkno);
}
Running it with sudo dtrace -q -s <name>.d <pid> logs all I/O originating from that process.
My question is whether, and how, it is possible to call custom C functions from a DTrace script to do advanced operations on the tracing data during the tracing itself.

DTrace explicitly prevents you from doing anything like this, for the same reason that you cannot write a loop in D: if you screw it up in any way, shape, or form, you crash the entire system. When a D probe fires, you are in kernel mode, not userland. Let me quote from the "Linux Kernel Module Programming Guide":
So, you want to write a kernel module. You know C, you've written a number of
normal programs to run as processes, and now you want to get to where the real
action is, to where a single wild pointer can wipe out your file system and a
core dump means a reboot.
That's why you don't want to be playing cowboy in a D probe and why D's restrictions are good for you. =]

You should at least be able to filter the output of dtrace with a pipe, reacting each time a probe fires.
sudo dtrace -n 'proc:::exec-success { trace(curpsinfo->pr_psargs); }' | perl myscript.pl
myscript.pl:
#!/usr/bin/perl
while (<>) {
    print $_;
    print "another application launched, do something!\n";
}
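If the consumer at the end of the pipe should be C rather than Perl, the same filter might look roughly like this. It is only a sketch: react_to_lines is a name I made up, the 4096-byte buffer is an assumption about how long a line of dtrace output gets, and the notice text is simply copied from the Perl version.

```c
#include <stdio.h>
#include <string.h>

/* Copy every line from `in` to `out` and follow each complete line with a
 * notice, like the Perl loop above. Returns the number of complete lines.
 * A real consumer would act on the line here instead of printing a notice. */
long react_to_lines(FILE *in, FILE *out)
{
    char buf[4096];   /* assumed to be long enough for one line */
    long count = 0;

    while (fgets(buf, sizeof buf, in) != NULL) {
        size_t n = strlen(buf);

        fputs(buf, out);
        /* fgets may hand back a partial line; only react once the
         * terminating newline has arrived. */
        if (n > 0 && buf[n - 1] == '\n') {
            fputs("another application launched, do something!\n", out);
            fflush(out);
            count++;
        }
    }
    return count;
}
```

A main that just calls react_to_lines(stdin, stdout) could then stand in for perl myscript.pl in the pipeline above.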

It's not possible to call arbitrary C from inside of your probes for the reasons that @Sniggerfardimungus mentions, but presumably you just want to do some operations with the data that's being collected (store it in a database, make some calculations or visualizations with it, etc.), and that's totally possible from C (and via wrappers around C in a few other languages).
To do this, use libdtrace (the header is in /usr/include/dtrace.h on my Mac OS X box) or a wrapper for it such as node-libdtrace. The basic idea is that you can build your own consumer of DTrace data (in effect, replacing the dtrace(1m) command line tool), which receives output from whatever script is being run. Once you have the data, you can do whatever you want with it.

Too late to edit my original answer, but you can also use the system() action inside of a DTrace script to spawn a subprocess that runs arbitrary code when an event occurs in DTrace. This is a potentially destructive action, so you have to use the -w command line option or the #pragma D option destructive directive in the D script. Note that destructive actions can hang, send into an infinite loop, kill, and otherwise destroy the processes you're probing if you don't use them carefully. (And I would not recommend using the kernel destructive actions unless you really don't care if your system goes down when you mess it up accidentally.)
You could use the script run by system() to call your arbitrary C code (or send a signal to another process to invoke it, etc).

Related

C++ execute many commands in shell

I have a C++ program from which I want to execute multiple commands in a shell.
My current solution uses the system() function and looks like this:
return_value = system("SETUP_ENVIRONMENT; RUN_USEFUL_APP_1");
... do_something_else ...
return_value = system("SETUP_ENVIRONMENT; RUN_USEFUL_APP_2");
... do_something_else ...
return_value = system("SETUP_ENVIRONMENT; RUN_USEFUL_APP_3");
...
It works, but SETUP_ENVIRONMENT takes a few seconds, which makes the program really slow. I have to run it every time, though, since system() starts a new shell on each call.
I want to be able to set up my shell once and then run all commands in it.
execute_in_shell(SETUP_ENVIRONMENT);
return_value = execute_in_shell(RUN_USEFUL_APP_1);
... do_something_else ...
return_value = execute_in_shell(RUN_USEFUL_APP_2);
... do_something_else ...
return_value = execute_in_shell(RUN_USEFUL_APP_3);
...
How do I do that?
I'm on Linux.
As an alternative to answer 1, you could also have your program create a shell script which runs all your useful programs, and execute this script in one go. Then a new shell won't be started for each particular useful program.
You have three reasonable options for doing this, depending on your specific need.
If the various calls you make to external tools are part of a coherent routine, then you can, and probably should, follow @dmi's advice and write a short shell script that you can call from your C++ program.
If you instead need to start procedures here and there, you might be interested in running the shell as an inferior process and attaching your program to it, so that instead of talking to your terminal, the shell process talks to your C++ program.
This method is not very difficult but has a few gotchas (for instance, some programs like ssh, sudo or docker may expect to be attached to a tty). It is very well covered in most introductions to system programming (look for inter-process communication and subprocesses) for any Unix variant. Let me outline that procedure:
1. Use the pipe system call to create a pipe (stdin_r, stdin_w).
2. Use the pipe system call to create a pipe (stdout_r, stdout_w).
3. Use the pipe system call to create a pipe (stderr_r, stderr_w).
4. Use the fork system call to duplicate your program.
5. In the child, close stdin_w, stdout_r and stderr_r, use dup2 to install stdin_r, stdout_w and stderr_w as file descriptors 0, 1 and 2, and use the exec system call to run the shell.
6. In the parent, close stdin_r, stdout_w and stderr_w; you can now write commands to stdin_w and read command output from stdout_r and stderr_r.
(This is intentionally very sketchy; I included the outline only so that you can be sure you have found the right place in your favourite textbook.)
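For what it's worth, the outline could be sketched in C roughly as follows. Everything named shell_* is hypothetical, and stderr is left untouched to keep it short. To tell where one command's output stops, the sketch echoes a marker line after each command; that is my own choice rather than part of the outline, and it assumes that every command ends its output with a newline, never prints the marker itself, and does not read from stdin.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct shell {
    pid_t pid;
    FILE *to;     /* commands go here (the shell's stdin)      */
    FILE *from;   /* output comes back here (the shell's stdout) */
};

/* Start /bin/sh once, connected to us through two pipes. */
int shell_open(struct shell *sh)
{
    int in[2], out[2];   /* in: parent -> shell, out: shell -> parent */

    if (pipe(in) < 0) return -1;
    if (pipe(out) < 0) { close(in[0]); close(in[1]); return -1; }

    sh->pid = fork();
    if (sh->pid < 0) {
        close(in[0]); close(in[1]); close(out[0]); close(out[1]);
        return -1;
    }
    if (sh->pid == 0) {                    /* child: becomes the shell */
        dup2(in[0], STDIN_FILENO);
        dup2(out[1], STDOUT_FILENO);
        close(in[0]); close(in[1]); close(out[0]); close(out[1]);
        execl("/bin/sh", "sh", (char *)NULL);
        _exit(127);                        /* exec failed */
    }
    close(in[0]);                          /* parent keeps the other ends */
    close(out[1]);
    sh->to = fdopen(in[1], "w");
    sh->from = fdopen(out[0], "r");
    return (sh->to != NULL && sh->from != NULL) ? 0 : -1;
}

/* Run one command in the already-running shell. Its output is copied into
 * `out` (truncated to fit); the return value is the command's exit status,
 * or -1 if the shell went away. */
int shell_run(struct shell *sh, const char *cmd, char *out, size_t cap)
{
    static const char marker[] = "__END_OF_COMMAND__";  /* assumed unique */
    char line[1024];
    size_t len = 0;
    int status = -1;

    if (cap > 0) out[0] = '\0';
    fprintf(sh->to, "%s\necho \"%s $?\"\n", cmd, marker);
    fflush(sh->to);

    while (fgets(line, sizeof line, sh->from) != NULL) {
        size_t n = strlen(line);

        if (strncmp(line, marker, sizeof marker - 1) == 0) {
            sscanf(line + sizeof marker - 1, "%d", &status);
            break;
        }
        if (len + n < cap) {               /* keep room for the NUL */
            memcpy(out + len, line, n + 1);
            len += n;
        }
    }
    return status;
}

/* Closing the shell's stdin makes it exit; then reap it. */
int shell_close(struct shell *sh)
{
    int status;

    fclose(sh->to);
    fclose(sh->from);
    if (waitpid(sh->pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With something like this, SETUP_ENVIRONMENT would be passed to shell_run once, and every later shell_run call sees the environment it left behind, because it is still the same shell process.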
There are third-party libraries implementing all that low-level stuff for you. You can use boost::process (not an official part of Boost at the time of writing), whose usage is illustrated with a full tutorial. There are plenty of alternatives, such as pstreams.
The third option would be to avoid the shell altogether and directly execute the commands you use. This is the approach followed by Rashell, an OCaml library defining primitives that let you reliably compose sub-processes, which you can use for your own inspiration.

How can I determine whether the current process has a UI open?

I am writing some library code which will rerun the current process as an administrator/root. The problem is (for Linux at least) that if the calling code is a command line application, the best way would be to call sudo, whereas if it is a GUI application, gksudo is appropriate. For completeness' sake, though, solutions (or pointers to solutions) for other OSes are also welcome.
Also, this is useful so that for GUI apps, I can turn off printf statements.
I'd run gksu(do) if the environment variable DISPLAY is set. It doesn't really matter whether the application has a GUI or not: if there's an X server running and we can use it, why shouldn't we?
This doesn't let you determine whether you should disable stdout output, though. However, stderr output is generally captured in .xsession-errors even if no terminal is connected, so you might not want to disable that output after all.
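A minimal sketch of that check in C, assuming the only choice is between sudo and gksudo. The function names are made up. The isatty() test is my own addition, not something the answer proposes: it only tells you whether stdout is currently connected to a terminal, which may or may not be the right criterion for silencing printf.

```c
#include <stdlib.h>
#include <unistd.h>

/* gksudo when an X display is advertised through DISPLAY, sudo otherwise. */
const char *elevation_command(void)
{
    const char *display = getenv("DISPLAY");

    return (display != NULL && display[0] != '\0') ? "gksudo" : "sudo";
}

/* Non-zero when standard output is attached to a terminal. */
int stdout_is_terminal(void)
{
    return isatty(STDOUT_FILENO);
}
```

The library would then re-run the process through whatever elevation_command() returns.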

Access data from terminal

I have to write a program that intercepts data from the terminal and parses it. After the data has been processed, I have to parse the result again before it goes to stdout.
I can't use tee or commands like prog > file 2>&1 as the program is going to be interactive.
For example :
If the user types ls in the terminal, I have to parse it, then it should go to the operating system, and when I get the result after processing, I'll have to parse it again before it's displayed in the terminal.
I did my research and I think I can achieve it through pseudo-terminal interfaces (pty).
Please let me know if there is a better way to achieve it.
I am using C++ and bash, and the platform is *nix.
Update:
I can also use libexpect from expect.
I am not sure what you mean here: do you mean an interactive program as in "working in another terminal, communicating with the user", or even one displaying a GUI?
How does it specify the terminal? The program layout is probably important here (which program starts which).
If your application uses a GUI to communicate with the user, then I would simply do it this way:
start bash with stdin and stdout attached to pipes,
your program reads from and writes to its ends of those pipes, parses the data, and reads/writes on its own stdin and stdout, so it appears on its terminal.
If you mean controlling a different terminal than your application's, it gets tough, since the system generally does not expect a program to operate on multiple terminals. I don't think it's possible to filter communication between a terminal and an already running application attached to it. Starting another process that spawns another terminal might be an option, so that you basically have two terminals working in sync. But then you will have to synchronize both processes by some other means (named pipes, a network connection or some other IPC).
If you provide more detail on your program I might provide more directed help.
PS Don't tell me that you are writing some terminal keylogger ;)
EDIT:
Your program is probably GUI based then. What I would recommend is something similar to the answer linked by banuj.
The best option will probably be to create three pipes, then fork, and in the child process assign the corresponding ends of the pipes to stdin, stdout and stderr. The child process should then exec a shell, probably bash, although I am not sure if other shells would sound better if read out loud ;) The main process will be able to read and write the other ends of those pipes, parsing both the input to and the output from bash and the programs it runs.
You could also exec the commands the user specifies directly, but that forces you to take over the tedious job of a shell: managing the current directory, environment variables, job control and so on.
Using the above method might cause some trouble, however: some programs (usually in security-related contexts, e.g. su(do) asking for a password) will try to bypass stdin/stdout anyway and read directly from the terminal device. I am not sure what you can do in such a case. Programming your own terminal emulator would be an option, but I don't know if you want to go this deep into system programming for this.
If you want some code snippets, or if you don't know how to do the above, just ask ;)

Is stdout Ever Anything Other Than a Console Window?

From http://www.cplusplus.com/reference/iostream/cout/:
By default, most systems have their standard output set to the console, where text messages are shown, although this can generally be redirected.
I've never heard of a system where stdout is anything other than a console window, by default or otherwise. I can see how redirecting it might be beneficial in systems where printing is an expensive operation, but that shouldn't be an issue in modern computers, right?
Of course it could be. I may want to redirect standard out to a text file, another process, a socket, whatever.
By default it is the console, but there are a variety of reasons to redirect it, the most useful (in step with the Unix philosophy) being the redirection of the output of one program to the input of another program. This allows one to create many small, lightweight programs that feed into one another and work as discrete parts of a larger system.
Basically, it's just a simple yet powerful mechanism for sharing data. It is more popular on *nix systems for the reason I mention above, but it applies to Windows as well.
On most systems you can redirect the standard input/output/error to other file descriptors or locations.
For example (on Unix):
./appname > output
Redirects the stdout from appname to a file named output.
./appname 2> errors > output
Redirects stdout to a file named output, and all errors from stderr to a file named errors.
On unix systems you can also have a program open a file descriptor and point it at stdin, such as this:
echo "input" > input
cat input | ./appname
This will cause the program to read from the pipe for stdin.
This is how in unix you can "pipe" various different utilities together to create one larger tool.
find . -type f | ./appname | grep -iv "search"
This will run the find command and pipe its output into ./appname; appname's output is then sent to grep's input, which looks for the word "search" (ignoring case, because of -i) and, because of -v, displays only the lines that do not match.
It allows many small utilities to have a very powerful effect.
Think of the >, <, and | like plumbing.
> is like the drain in a sink, it accepts data and stores it where you want to put it. When a shell encounters the > it will open a file.
> file
When the shell sees the above, it will open the file using a standard system call, and remember that file descriptor. In the above case since there is no input it will create an empty file and allow you to type more commands.
banner Hello
This command writes Hello in really big letters to the console, and will cause it to scroll (I am using Unix here since it is what I know best). The output is simply written to standard out. Using a "sink" (>) we can control where the output goes, so
banner Hello > bannerout
will cause all of the data from banner's standard output to be redirected to the file descriptor the shell has opened and thus be written to a file named bannerout.
Pipes work similarly to >'s in that they help control the flow of where the data goes. Pipes however can't write to files, and can only be used to help the flow of data go from one point to another.
For example, here is water flowing through several substations and waste cleaning:
pump --from lake | treatment --cleanse-water | pump | reservoir | pump > glass
The water flows from the lake, through a pipe to the water treatment plant, from the plant back into a pump that moves it to a reservoir, then it is pumped once more into the municipal water pipes and through your sink into your glass.
Notice that the pipes simply connect all of the outputs together, ultimately it ends up in your glass.
It is the same way with commands and processing them in a shell on Linux. It also follows a path to get to an end result.
Now there is one final thing that I haven't discussed yet, and that is the < input character. What it does is read from a file and feed the contents to a program's stdin.
cat < bannerout
Will simply print what was stored in bannerout. This can be used if you have a file you want to process, but don't want to prepend cat <file> because of not wanting to run an extra command in the chain.
So try this:
echo "Hello" > bannerinput
banner < bannerinput
This will first put the string "Hello" in the file bannerinput, and then when you run banner it will read from the file bannerinput.
I hope this helps you understand how redirection and piping work on Unix (some if not most of it will apply to Windows as well).
So far all of the answers have been in the context of the thing (shell, whatever) that invokes the program. The program itself can make stdout something other than the terminal. The C standard library provides freopen, which lets the programmer redirect stdout in any compliant environment. POSIX provides a number of other mechanisms (popen, fdopen, ...) that give the programmer even more control. I suspect Windows provides analogous mechanisms.
Any number of things can happen to the three standard file descriptors 0, 1 and 2. Anyone can launch a new process with the file descriptors attached to anything they like.
For instance, GNU screen runs the program on a pseudo-terminal that it owns, which is what allows dynamic reattaching of a session. SSH takes the output and returns it to the other end. And of course all the numerous shell redirections work by manipulating the file descriptors.
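To see from inside a program what descriptor 1 is actually attached to, something along these lines might do on a POSIX system. It is a sketch: describe_fd is a made-up name, and the labels it returns are just my own wording.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>
#include <unistd.h>

/* Say in words what kind of object a file descriptor refers to. */
const char *describe_fd(int fd)
{
    struct stat st;

    if (fstat(fd, &st) < 0)    return "closed or invalid";
    if (isatty(fd))            return "terminal";
    if (S_ISFIFO(st.st_mode))  return "pipe or FIFO";
    if (S_ISSOCK(st.st_mode))  return "socket";
    if (S_ISREG(st.st_mode))   return "regular file";
    if (S_ISCHR(st.st_mode))   return "character device";  /* e.g. /dev/null */
    return "something else";
}
```

A program that prints describe_fd(STDOUT_FILENO) to stderr would be expected to report a terminal when started plainly from a console, a regular file when run as ./prog > out.txt, and a pipe when run as ./prog | cat.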
For a program to have stdout it must be running on a hosted implementation (one with an Operating System), or on a free-standing implementation with extras.
I'm having a hard time imagining such an implementation without a console of some kind, but let's suppose for a moment that the Mars Rover has a full OS, is programmed in C (or C++), and doesn't have that console. Then
/* 2001-07-15: JPL: stdout is the headquarters */
puts("Help. I'm stuck.");
might have sent the message to NASA headquarters.
Both Windows and Linux will redirect stdout to a file if you run the program like this:
my_program > some_file
This is the most common case, but many other types of redirection are possible. On Linux, you can redirect stdout to anything that supports the "file descriptor" interface, such as a pipe, socket, file, and various other things.
One simple example of a case where one might want to redirect stdout is when passing the information to another program. The Unix/Linux command ps generates a list of processes that belong to the current user. If this list was long and you wanted to search for a particular process, you could enter
ps | grep thing
which would redirect the stdout of ps to the stdin of grep thing.

Communication with a script from a C++ program

I have a C++ program (very complicated, and lengthy both in code and in execution time).
Once in a while this program stops and calls a user-specified shell script.
Before calling the script, my program creates a .out file with the current data. I call the script via the system() function. The script then reads the .out file, creates its own script.out file, and exits.
Then the system() function call ends, and my program reads and parses the script.out file.
Question: is there a better way to execute communication between my c++ program and a random shell script?
My intent is to have full communication between the two. The script could virtually "ask" the program "What data do you have right now?" and the program would reply following some strict convention. Then the script could say "Add this data...", or "Delete all your previous data", etc.
The reason I need this is because the shell script tells the program to modify its data. The exact data that was put in the original .out file. So after the modification is done -- the actual data held by the program does not correspond to the data written in the .out file.
Thanks!
P.S.
I swear I've searched around, but everyone suggests an intermediate file.
There are certainly ways to do that without intermediate files. The most common approach is to use command line arguments for input, and pipes for standard output; others also use pipes for input. The most straight-forward alternative to system then is to use popen.
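A sketch of the popen route, assuming the script takes its input from the command line and prints its answer on standard output. run_and_capture is a name I made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/wait.h>

/* Run `cmd` through the shell and copy its standard output into `out`
 * (at most cap-1 bytes, always NUL-terminated).
 * Returns the command's exit status, or -1 on failure. */
int run_and_capture(const char *cmd, char *out, size_t cap)
{
    char scratch[512];
    size_t len;
    int status;
    FILE *p;

    if (cap == 0) return -1;
    p = popen(cmd, "r");
    if (p == NULL) return -1;

    len = fread(out, 1, cap - 1, p);
    out[len] = '\0';

    /* Drain whatever did not fit, so the child is not left blocked
     * on a full pipe while we wait for it in pclose(). */
    while (fread(scratch, 1, sizeof scratch, p) > 0) {
    }

    status = pclose(p);
    if (status == -1 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

The program would build the command line (script name plus arguments) itself and parse the captured text instead of reading script.out. Note that this is still one-way per call: the script cannot ask the program anything while it runs.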
On a unix-like system? Perhaps pipe (2) will work for you?
From the man page (Mac OS X 10.5 version):
SYNOPSIS
#include <unistd.h>
int pipe(int fildes[2]);
DESCRIPTION
The pipe() function creates a pipe (an object that allows unidirectional
data flow) and allocates a pair of file descriptors. The first descriptor
connects to the read end of the pipe; the second connects to the write end.
You will, of course, have to follow the creation of the pipes with a fork and exec pair. Probably this has already been answered in detail, and now you know what to search on...
It's been a while since I did this, but:
In the main process, before forking the sub-process, you call pipe twice. Now you have two pipes and control both ends of both of them.
You fork.
The main process will read from one pipe and write to the other. It doesn't matter which is which, but you need to be clear about this.
The child process will call one of the exec family of functions to replace its image with that of the shell you want to run, but first you will use dup2 to replace its standard input and output with the ends of the two pipes (again, this is where you need to be clear about which pipe is which).
At this point you have two processes: the main process can send things into one pipe and they will be received on the standard input of the script, and anything the script writes to its standard output will be sent up the other pipe to the controlling process. So they take turns, just like interacting with the shell.
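Here is roughly what those steps might look like in C for a single turn: send one request, close the pipe so the child sees end-of-file, then read the reply. The name exchange and the down/up pipe naming are mine. The sketch assumes the request is small enough to fit in the pipe buffer (otherwise both sides can end up blocked on each other) and that the child really starts (otherwise the write may raise SIGPIPE).

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with `input` on its standard input and collect what it writes
 * to standard output in `reply` (NUL-terminated, truncated to fit).
 * Returns the number of bytes stored, or -1 on failure. */
ssize_t exchange(char *const argv[], const char *input, char *reply, size_t cap)
{
    int down[2];   /* parent writes down[1], child reads down[0] */
    int up[2];     /* child writes up[1],   parent reads up[0]   */
    size_t left = strlen(input), len = 0;
    ssize_t n;
    pid_t pid;

    if (cap == 0 || pipe(down) < 0) return -1;
    if (pipe(up) < 0) { close(down[0]); close(down[1]); return -1; }

    pid = fork();
    if (pid < 0) {
        close(down[0]); close(down[1]); close(up[0]); close(up[1]);
        return -1;
    }
    if (pid == 0) {                         /* child: becomes the script */
        dup2(down[0], STDIN_FILENO);
        dup2(up[1], STDOUT_FILENO);
        close(down[0]); close(down[1]); close(up[0]); close(up[1]);
        execvp(argv[0], argv);
        _exit(127);                         /* exec failed */
    }
    close(down[0]);                         /* parent keeps the other ends */
    close(up[1]);

    while (left > 0) {                      /* send the request */
        n = write(down[1], input, left);
        if (n <= 0) break;
        input += n;
        left -= (size_t)n;
    }
    close(down[1]);                         /* child sees EOF on stdin */

    while (len < cap - 1 &&
           (n = read(up[0], reply + len, cap - 1 - len)) > 0)
        len += (size_t)n;                   /* read the reply */
    reply[len] = '\0';
    close(up[0]);

    waitpid(pid, NULL, 0);
    return (ssize_t)len;
}
```

For a real back-and-forth conversation you would keep both pipes open and agree on a message framing (one request per line, say) instead of closing down[1] after the first write.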
You can use pipes or (maybe more convenient) sockets - for example frontends to gdb, or expect do that. It would require changes to your shell scripts, and switching from system() to more low-level fork() and exec().
It's rather complicated so please, be more specific about your environment and what you need to clarify.
You are asking a question about interprocess communication (IPC).
There are a lot of ways to do that. A simple search on the Internet will turn up most of them.
If I am not wrong, Google Chrome uses a technique called named pipes.
Anyway, I think the most "portable way" is probably a file. But if you know which operating system you are working on, you can definitely use most of the IPC techniques.