I'm thinking about adding code to my application that would gather diagnostic information for later examination. Is there any C++ library created for such a purpose? What I'm trying to do is similar to profiling, but it's not the same, because the gathered data will be used more for debugging than profiling.
EDIT:
Platform: Linux
Diagnostic information to gather: information resulting from application logic, various asserts and statistics.
You might also want to check out libcwd:
Libcwd is a thread-safe, full-featured debugging support library for C++
developers. It includes ostream-based debug output with custom debug
channels and devices, powerful memory allocation debugging support, as well
as run-time support for printing source file:line number information
and demangled type names.
List of features
Tutorial
Quick Reference
Reference Manual
Also, another interesting logging library is pantheios:
Pantheios is an Open Source C/C++ Logging API library, offering an
optimal combination of 100% type-safety, efficiency, genericity
and extensibility. It is simple to use and extend, highly-portable (platform
and compiler-independent) and, best of all, it upholds the C tradition of you
only pay for what you use.
I tend to use logging for this purpose. Log4cxx works like a charm.
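For what it's worth, here is a minimal sketch of the kind of setup I mean, assuming log4cxx is installed and linked; the logger name and messages are placeholders:

    #include <log4cxx/basicconfigurator.h>
    #include <log4cxx/logger.h>

    int main() {
        // Console-only configuration; a real deployment would normally load a
        // properties or XML configuration file instead.
        log4cxx::BasicConfigurator::configure();
        log4cxx::LoggerPtr logger = log4cxx::Logger::getLogger("diagnostics");

        LOG4CXX_INFO(logger, "cache warmed up, entries=" << 1024);
        LOG4CXX_WARN(logger, "request handler hit an unexpected state");
        return 0;
    }

The nice part is that the macros accept stream-style expressions, so application-logic diagnostics and statistics can be logged without building strings by hand.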
If debugging is what you're doing, perhaps use a debugger. GDB scripts are pretty easy to write up and use. Maintaining them in parallel to your code might be challenging.
Edit - Appending Anecdote:
The software I maintain includes a home-grown instrumentation system. Macros are used to queue log messages, and configuration options control what classes of messages are logged and the level of detail to be logged. A thread processes the logging queue, flushing messages to file and rotating files as they become too large (which they commonly do). The system provides a lot of detail, but all too often it provides huge files our support engineers must wade through for hours to find anything useful.
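As a very rough sketch of that shape of system (assuming C++11 threads; levels, file names and the rotation policy are all simplified away):

    #include <chrono>
    #include <condition_variable>
    #include <fstream>
    #include <mutex>
    #include <queue>
    #include <string>
    #include <thread>

    static int g_log_level = 2;                 // configuration option: detail level
    static std::queue<std::string> g_queue;     // pending log messages
    static std::mutex g_mutex;
    static std::condition_variable g_cv;

    // The instrumentation macro: cheap level check at the call site, then queue.
    #define LOG_MSG(level, text)                              \
        do {                                                  \
            if ((level) <= g_log_level) {                     \
                std::lock_guard<std::mutex> lock(g_mutex);    \
                g_queue.push(text);                           \
                g_cv.notify_one();                            \
            }                                                 \
        } while (0)

    // Background thread: drains the queue and flushes messages to the log file.
    // Rotating the file once it grows too large is omitted here.
    void logger_thread() {
        std::ofstream out("app.log", std::ios::app);
        for (;;) {
            std::unique_lock<std::mutex> lock(g_mutex);
            g_cv.wait(lock, [] { return !g_queue.empty(); });
            const std::string msg = g_queue.front();
            g_queue.pop();
            lock.unlock();
            out << msg << '\n' << std::flush;
        }
    }

    int main() {
        std::thread logger(logger_thread);
        LOG_MSG(1, "connection accepted");
        LOG_MSG(3, "verbose detail, dropped at level 2");
        std::this_thread::sleep_for(std::chrono::milliseconds(100));  // demo only
        logger.detach();  // a real system would signal shutdown and join instead
        return 0;
    }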
Now, I've only used GDB to diagnose bugs a few times, but for those issues it had a few nice advantages over the logging system. GDB scripting allowed me to gather new instrumentation data without adding new instrumentation lines and deploying a new build of my software to the client. GDB can generate messages from third-party libraries (I needed to debug into OpenSSL at one point). GDB adds no run-time overhead to the software when not in use. GDB does a pretty good job of printing the contents of objects; the code-level logging system requires new macros to be written whenever new objects need to have their state logged.
One of the drawbacks was that the gdb scripts I generated had no explicit relationship to the source code; the source file and the gdb script were developed independently. Ideally, changes to the source file should be reflected in the gdb script. One thought is to put specially formatted comments in the code and have a script make a pass over the source files to generate the debugger script file for each source file. Finally, have the makefile execute this script during the build cycle.
It's a fun exercise to think about the potential of using GDB for this purpose, but I must admit that there are probably better code-level solutions out there.
If you run your application on Linux, you can use "ulimit" (e.g., ulimit -c unlimited) to generate a core file when your application crashes (or hits assert(false), or is killed with kill -6). Later, you can debug with gdb (gdb -c core_file binary_file) and analyze the stack.
Regards.
P.S.: for profiling, use gprof.
I am writing a C++ server program that will be deployed to *nix systems (Linux/macOS). The program sometimes runs into a segfault in the production environment, and I would like to collect some information, like a core dump file, when that happens. But I have no idea what the best practice is for doing this:
I would like the program to perform as well as possible.
I would like to be able to analyze the core dump offline if a crash really does happen in production.
I have learned that there are some things I could try:
There are RelWithDebInfo and Release values for CMAKE_BUILD_TYPE, but they seem to use different optimization levels, so I assume a RelWithDebInfo build does not perform as well as a Release build. ("RelWithDebInfo uses -O2, but Release uses -O3", according to What are CMAKE_BUILD_TYPE: Debug, Release, RelWithDebInfo and MinSizeRel?)
Tools like objcopy/strip allow you to strip debug information from a binary (How to generate gcc debug symbol outside the build target?)
Printing a stack trace from a SIGSEGV handler (How to automatically generate a stacktrace when my program crashes); see the sketch after this list.
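For the last point, this is a minimal sketch of such a handler on Linux/glibc, using execinfo.h; strictly speaking only async-signal-safe functions should be called from a signal handler, which is why backtrace_symbols_fd is used rather than anything that allocates:

    #include <execinfo.h>
    #include <signal.h>
    #include <unistd.h>

    // Writes the raw call stack to stderr and exits. The addresses can be turned
    // into file:line offline with addr2line and an unstripped binary.
    static void segv_handler(int sig) {
        void* frames[64];
        int count = backtrace(frames, 64);
        backtrace_symbols_fd(frames, count, STDERR_FILENO);
        _exit(128 + sig);
    }

    int main() {
        struct sigaction sa = {};
        sa.sa_handler = segv_handler;
        sigaction(SIGSEGV, &sa, nullptr);

        // ... server code; a crash now prints a stack trace before exiting ...
        return 0;
    }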
I am new to deploying a production C++ server program, and I would like to know the answers to the following questions:
What is the recommended build type to use in this case, RelWithDebInfo or Release?
Compared to choosing a different build type, when do I need to use tools like strip?
If I create a Release build binary for production deployment, and that Release build generates a core dump in the production environment, can I later use the same revision of the source code to build a RelWithDebInfo binary and use (gdb + RelWithDebInfo binary + Release core dump) for core dump analysis?
Is it common to turn on core dumps in a production environment? If it is not good practice, what is the recommended approach for collecting information for troubleshooting, such as printing the stack trace when crashing?
In general, I would like to know how C++ programs are recommended to be built for production, so that they are as optimized as possible while I am still able to troubleshoot them. Thanks so much.
This will be a rather general answer.
If you are still having reliability issues, then go with RelWithDebInfo. Alternatively, you can override the -O2 optimization (e.g., via CMAKE_CXX_FLAGS_RELWITHDEBINFO) to get the compiler to optimize all the way.
You need to strip when debug info is present. This doesn't change any of the stuff that actually gets executed, but removes things that make debugging much easier. You can still debug stripped executables, but it is more difficult to understand what is going on.
No, due to different optimization levels. If the only difference between the two was that one was stripped and the other not, then yes. But with different optimization levels the resulting assembly will actually be different.
Enabling core dumps in production environments is usually advised against, mostly for security reasons. For example, a core dump may contain plain-text passwords, session tokens, etc. These are all things that others should not see. If you have total control of the machine where this is running, then this concern is somewhat smaller. Another concern is disk space usage. A core dump can be huge, depending on what your program is doing. If you have a fixed core file name, then at least there will never be more than one file, but if you have a name that includes a timestamp and/or PID, then you can have multiple files, each of them taking a lot (meaning several GB) of space. This can again lead to problems.
The general rule is (or should be) that you consider release environment as hostile. Sometimes that is true, sometimes it is not -- here generality can't apply because only you know your specific situation.
I always deploy my stuff fully optimized. I will only include debug info if a program is particularly problematic, because it makes it easier to either run it under gdb or attach gdb to it.
The downside of full optimization is that things sometimes look a bit different from the code you wrote. The order of things may change, some things may not happen at all, and you may observe that some functions don't really exist as proper standalone functions because the compiler decides they are better off being inlined. These are the changes I have observed, but there are probably others as well.
Recently I learned that there are tools like Google Breakpad which generate crash reports in minidump format that can be collected and analyzed from a production environment. I haven't given it a try yet, but it could be useful for this exact purpose.
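For reference, this is a rough sketch of how I understand the Breakpad Linux client gets wired in; since I haven't used it yet, treat the dump directory and exact arguments as illustrative:

    #include "client/linux/handler/exception_handler.h"
    #include <cstdio>

    // Called after a minidump has been written; a real server would queue the
    // file for upload to a crash-collection service instead of just printing it.
    static bool DumpCallback(const google_breakpad::MinidumpDescriptor& descriptor,
                             void* context, bool succeeded) {
        std::printf("minidump written to %s\n", descriptor.path());
        return succeeded;
    }

    int main() {
        google_breakpad::MinidumpDescriptor descriptor("/tmp");  // dump directory
        google_breakpad::ExceptionHandler handler(descriptor, nullptr, DumpCallback,
                                                  nullptr, true, -1);

        // ... run the fully optimized Release server; a crash now produces a
        // small .dmp file that can be symbolized offline with the debug symbols ...
        return 0;
    }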
I know this may be impossible but I really hope there's a way to pull it off. Please tell me if there's any way.
I want to write a sandbox application in C++ and allow other developers to write native plugins that can be loaded right into the application on the fly. I'd probably want to do this via DLLs on Windows, but I also want to support Linux and hopefully Mac.
My issue is that I want to be able to prevent the plugins from doing I/O access on their own. I want to require them to use my wrapped routines so that I can ensure none of the plugins contain malicious code that starts harming the user's files on disk or doing undesirable things on the network.
My best guess on how to pull off something like this would be to include a compiler with the application and require the source code for the plugins to be distributed and compiled right on the end user's platform. Then I'd need a code scanner that could search the plugins' uncompiled code for signatures that would show up in I/O operations on the hard disk, network, or other storage media.
My understanding is that standard library facilities like fstream wrap platform-specific functions, so I would think that simply scanning all the code that will be compiled for platform-specific functions would let me accomplish the task. Because ultimately, native C code can't do any I/O unless it talks to the OS through one of the OS's provided methods, right?
If my line of thinking is correct on this, does anyone have a book or resource recommendation on where I could find the nuts and bolts of this stuff for Windows, Linux, and Mac?
If my line of thinking is incorrect and it's impossible for me to really prevent native code (compiled or uncompiled) from doing I/O operations on its own, please tell me, so I don't create an application that I think is secure but really isn't.
In an absolutely ideal world, I don't want to require the plugins to be distributed as uncompiled code. I'd like to allow the developers to compile and keep their code to themselves. Perhaps I could scan the binaries for signatures that pertain to I/O access?
Sandboxing a program that executes foreign code is certainly harder than merely scanning that code for specific accesses! For example, the program could synthesize assembler statements that make system calls.
The original approach on UNIXes is to chroot() the program, but I think there are problems with that approach, too. Another approach is a secured environment like SELinux, possibly combined with chroot(). The modern approach for doing things like that seems to be to run the program in a virtual machine: upon start of the program, fire up a suitable snapshot of a VM; upon termination, just rewind to the snapshot. That merely requires that the allowed accesses are somehow channeled somewhere.
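A minimal sketch of the chroot() route (the jail path and IDs are placeholders; chroot() itself needs root, and privileges must be dropped afterwards or the process can break back out of the jail):

    #include <cstdio>
    #include <cstdlib>
    #include <unistd.h>

    int main() {
        // Confine the process to a jail directory before loading any plugin.
        if (chroot("/var/plugin-jail") != 0 || chdir("/") != 0) {
            std::perror("chroot");
            return EXIT_FAILURE;
        }
        // Drop root so the plugin cannot simply chroot() back out.
        if (setgid(65534) != 0 || setuid(65534) != 0) {  // e.g. nobody/nogroup
            std::perror("drop privileges");
            return EXIT_FAILURE;
        }
        // ... dlopen() the plugin and run it here ...
        return EXIT_SUCCESS;
    }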
Even a VM doesn't block I/O. It can block network traffic very easily though.
If you want to make sure the plugin doesn't do I/O, you can scan its DLL for all its imported functions and run the function list against a blacklist of I/O functions.
Windows has the dumpbin utility and Linux has nm. Both can be run via a system() function call, and the output of the tools can be redirected to files.
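On Linux, a rough sketch of that idea, assuming nm is on the PATH and the plugin is a shared object; the blacklist below is illustrative, not exhaustive:

    #include <cstdio>
    #include <iostream>
    #include <set>
    #include <sstream>
    #include <string>

    // Lists the undefined (imported) symbols of a plugin with nm and flags any
    // that appear on a blacklist of I/O-related functions.
    bool has_blacklisted_imports(const std::string& plugin_path) {
        const std::set<std::string> blacklist = {
            "open", "open64", "fopen", "creat", "unlink", "socket", "connect"};
        const std::string cmd = "nm -D --undefined-only " + plugin_path;
        FILE* pipe = popen(cmd.c_str(), "r");
        if (!pipe) return true;                       // fail closed if nm can't run
        bool found = false;
        char line[512];
        while (fgets(line, sizeof(line), pipe)) {
            std::istringstream iss(line);
            std::string type, symbol;
            if (!(iss >> type >> symbol)) continue;   // expect lines like "U open"
            symbol = symbol.substr(0, symbol.find('@'));  // drop "@GLIBC_x.y" suffix
            if (blacklist.count(symbol)) {
                std::cout << "blacklisted import: " << symbol << "\n";
                found = true;
            }
        }
        pclose(pipe);
        return found;
    }

    int main(int argc, char** argv) {
        return (argc > 1 && has_blacklisted_imports(argv[1])) ? 1 : 0;
    }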
Of course, you can write your own analyzer but it's much harder.
User code can't do I/O on its own; only the kernel can. If you're worried about the plugin gaining ring0/kernel privileges, then you need to scan the ASM of the DLL for I/O instructions.
I am working on a project where I need to track changes to a particular set of variables in any given application code to model memory access patterns.
I can think of two approaches mainly, please give your thoughts on them.
My initial thought is to do it the way many profilers like gprof would, where I add instrumentation code to the target application code before compilation and analyze the log generated by this instrumentation code to get the required information.
To accomplish this, I can only think of some sort of source-to-source compiler that parses the given code and injects instrumentation code (a same-language source-to-source transformation) into the application, which I can later compile and run to get the required logs.
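For concreteness, this is a minimal sketch of what the injected probes could boil down to (LOG_WRITE is a hypothetical macro the transformation would insert after each write to a tracked variable):

    #include <cstdio>

    // Hypothetical probe inserted by the source-to-source pass: it records the
    // source location, the variable name and its new value for later analysis.
    #define LOG_WRITE(var)                                                  \
        std::fprintf(stderr, "%s:%d write %s=%ld\n", __FILE__, __LINE__,    \
                     #var, static_cast<long>(var))

    int main() {
        int counter = 0;                      // a tracked variable
        counter = 42;   LOG_WRITE(counter);   // probe injected by the tool
        counter += 8;   LOG_WRITE(counter);   // probe injected by the tool
        return 0;
    }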
Does this seem right, or am I over-engineering? If not, are there tools that let me build a source-to-source compiler (relatively) easily?
I read about GDB's support for Python, so I am wondering if I can write a Python script that takes the set of variables from a config file, sets watchpoints, and logs every time there is a write to a watched variable. I tried to use this GDB feature, but on my Ubuntu machine it doesn't seem to be working for now.
http://sourceware.org/gdb/onlinedocs/gdb/Python.html#Python
Also, the language of the applications is going to be nesC (I believe nesC is converted to C in the process of compilation), and the applications are going to run on TOSSIM like native apps on my computer.
See my paper on instrumenting code using a program transformation system (PTS); a PTS is a very general kind of "source-to-source compiler".
It shows how to install probes in code in a pretty straightforward way, once you have a grammar for the language of interest. The underlying tool, DMS, makes it fairly easy to define the grammar too.
What are some advantages of a GUI debugger like the one in Eclipse, and what are some advantages of using a command-line debugger such as gdb? Does industry use command-line debuggers? And if so, in what situations do people use them?
I usually use gdb, but some advantages I can think of off the top of my head:
Being command line, debugging binaries on remote systems is as easy as opening an ssh connection.
Great scripting support, and the ability to run many commands per breakpoint (see the commands and continue keywords)
Much shorter start-up time and a faster development cycle.
Copy-and-pasteable commands and definable functions that let you repeat common commands more easily
gdb also speaks a well-defined protocol, so you can debug code running on lots of obscure hardware and kernels.
Typing short commands is quicker and more efficient in the long run than working through a GUI (in my opinion).
However, if you're next to a system or runtime you've never used before, a visual debugger can be easier to get started with. Also, having your debugger tightly integrated with your IDE (if you use one) can be a big boost in productivity.
Visual debuggers and command-line ones don't have to be completely separate; there are visual front ends for gdb, such as DDD. (I don't use DDD, however, since it feels ultra kludgy and outdated. It does exist, though. Xcode also wraps gdb for debugging support.)
A command-line debugger is good for debugging a remote system (especially when the connection is slow); it is also useful for low-performance systems or systems without an X server/graphics card. CLI debuggers are also used for quick analysis of core dumps and SIGSEGVs (they are faster to start). Command-line debuggers are more portable: they are installed on almost every system (or they can be easily installed, or even started from the network or a flash drive).
I think that command-line debuggers can be used for programs without source, while graphical debuggers are better for projects with complex data structures/classes.
Another point is that command-line debuggers are easier to automate; e.g., I have a shell script which does full call-graph logging of a program using gdb. It would be very hard to automate a graphical debugger.
It's essentially impossible to compare meaningfully based on the debugger's display. People who like command lines are likely to use text mode, command-driven debuggers. People who like GUIs are likely to use graphical, menu-driven debuggers.
Nearly the only time there's a really strong technical motivation toward one or the other is if you're debugging a windowing system. For example, using a debugger that depends on having a functional X server doesn't work very well if what you're trying to debug is the X server itself.
I was wondering if it's possible, or if anyone knows of any tools out there, to compare the execution of two related programs (for example, assignments in a class) to see how similar they are. For example, not to compare the names of functions, but how they use syscalls. One silly case of this would be testing whether, in two separate programs, a C string is printed as
printf("%s",str)
Or as
for (i=0;i<len;i++) printf("%c",str[i]);
I haven't put much thought into this, but I would imagine that strace / ltrace (maybe even oprofile) would be a good starting point. In particular, this is for UNIX C / C++ programs.
Thanks.
If you have access to the source code of the two programs, you may build a graph of the functions (each function is a node, and there is an edge from A to B if A calls B) and compute some graph similarity metrics. This will catch a source-code copy made by renaming and reorganizing.
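As a crude sketch of that idea (assuming the call graphs have already been extracted into edge lists; comparing the sorted out-degree sequences is just one simple, name-agnostic metric that survives renaming):

    #include <algorithm>
    #include <iostream>
    #include <map>
    #include <set>
    #include <string>
    #include <utility>
    #include <vector>

    using Edge = std::pair<std::string, std::string>;   // caller -> callee
    using CallGraph = std::set<Edge>;

    // Name-agnostic fingerprint: the sorted list of out-degrees, i.e. how many
    // distinct functions each function calls. Renaming does not change it.
    std::vector<int> out_degree_sequence(const CallGraph& g) {
        std::map<std::string, int> out_degree;
        for (const Edge& e : g) ++out_degree[e.first];
        std::vector<int> seq;
        for (const auto& kv : out_degree) seq.push_back(kv.second);
        std::sort(seq.begin(), seq.end());
        return seq;
    }

    int main() {
        CallGraph p1 = {{"main", "printf"}, {"main", "strlen"}, {"helper", "printf"}};
        CallGraph p2 = {{"start", "puts"},  {"start", "strlen"}, {"aux", "puts"}};
        std::cout << (out_degree_sequence(p1) == out_degree_sequence(p2)
                          ? "structurally similar\n"
                          : "structurally different\n");
        return 0;
    }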
An initial idea would be to use ltrace and strace to log the calls and then use diff on the logs. This would obviously only cover the library and system calls. If you need more fine-grained logging, then oprofile might help.
If you have access to the source code, you could instrument your code by compiling it with profiling information and then parse the gcov output after the runs. A purely static source-code analysis may be sufficient if your code does not take different routes depending on external data/state.
I think you can do this kind of thing using valgrind.
A finer-grained version (depending on what access you have to the program source and what exactly you want to compare) would be to use kprobes.
Kernel Dynamic Probes (Kprobes) provides a lightweight interface for kernel modules to implant probes and register corresponding probe handlers. A probe is an automated breakpoint that is implanted dynamically in executing (kernel-space) modules without the need to modify their underlying source. Probes are intended to be used as an ad hoc service aid where minimal disruption to the system is required. They are particularly advocated in production environments where the use of interactive debuggers is undesirable. Kprobes also has substantial applicability in test and development environments. During test, faults may be injected or simulated by the probing module. In development, debugging code (for example a printk) may be easily inserted without having to recompile the module under test.