How to get a debug flow of execution in C++

I work on a global trading system which supports many users. Each user can book, amend, edit, and delete trades. The system is regulated by a central deal capture service, which informs all users of any updates that occur.
The problem comes with crashes: since the production environment is impossible to re-create on a test system, I have to rely on crash dumps and log files.
However, these don't tell me what the user has been doing.
I'd like a system that would (at the time of crashing) dump out a history of what the user has been doing. Anything I add has to go into the live environment, so it can't impact performance too much.
Idea-wise, I was thinking of a macro at the top of each function which acted like a stack trace (only I could supply additional user information, like trade IDs, user dialog choices, etc.). The system would record stack traces (on a per-thread basis) and keep a history in a cyclic buffer (varying in size, depending on how much history you wanted to capture). Then, on a crash, I could dump this history stack.
I'd really like to hear if anyone has a better solution, or if anyone knows of an existing framework?
Thanks
Rich

Your solution sounds pretty reasonable, though perhaps rather than relying on viewing your audit trail in the debugger, you could trigger printing it with atexit() handlers. Something as simple as a stack of strings containing __FILE__, __LINE__ and pthread_self() might be good enough.
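For what it's worth, a minimal sketch of that idea might look like the following. The macro name, buffer size and the single global buffer are all illustrative; a per-thread buffer, as the question suggests, would avoid the lock on the hot path.

    #include <array>
    #include <cstdio>
    #include <cstdlib>
    #include <mutex>
    #include <string>
    #include <pthread.h>

    // Fixed-size cyclic buffer of trace entries; the oldest entries are overwritten.
    class TraceBuffer {
    public:
        void record(std::string entry) {
            std::lock_guard<std::mutex> lock(mutex_);
            entries_[next_ % entries_.size()] = std::move(entry);
            ++next_;
        }
        void dump() {
            std::lock_guard<std::mutex> lock(mutex_);
            for (std::size_t i = 0; i < entries_.size(); ++i) {
                const std::string& e = entries_[(next_ + i) % entries_.size()];
                if (!e.empty()) std::fprintf(stderr, "%s\n", e.c_str());
            }
        }
    private:
        std::array<std::string, 256> entries_;  // history depth is tunable
        std::size_t next_ = 0;
        std::mutex mutex_;
    };

    TraceBuffer g_trace;

    // Placed at the top of each instrumented function; 'extra' carries user context.
    #define TRACE_POINT(extra)                                                      \
        g_trace.record(std::string(__FILE__) + ":" + std::to_string(__LINE__) +     \
                       " tid=" + std::to_string((unsigned long)pthread_self()) +    \
                       " " + (extra))

    int main() {
        std::atexit([] { g_trace.dump(); });   // also call dump() from your crash handler
        TRACE_POINT("booking trade id=42");
        TRACE_POINT("amending trade id=42");
        return 0;
    }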
You could possibly use an existing undo framework, as it's similar to an audit trail, but it's going to be more heavyweight than you want. It will likely be based on the command pattern and expect you to implement execute() methods, though I suppose you could just leave them blank.

Trading systems usually can't afford the performance hit of that level of instrumentation. C++-based systems, in particular, tend to sacrifice ease of debugging for performance. Otherwise, more companies would be developing such systems in Java/C#.
I would avoid an attempt to introduce stack traces into C++. I am also not confident that you could introduce such a system in a way that would not affect the behavior of the program in some way (e.g., affect threading behavior).
It might, IMHO, be preferable to log the external inputs (e.g., user GUI actions and message traffic) rather than attempt to capture things internally in the program. In that case, you might have a better chance of replicating the failure and debugging it.
Are you currently logging all network traffic to/from the client? Many FIX based systems record this for regulatory purposes. Can you easily log your I/O?

I suggest creating another (circular) log file that contains your detailed information. Beware that this file will grow much faster than your other log files.
Another method is to save the last N transactions. Write a program that reads the transaction log and feeds the data back into a test instance of your application; this may help reproduce the cause. I've used this technique with embedded systems before.


Dynamic Linking ~ Limiting a DLL's system access

I know the question might seem a little vague but I will try to explain as clearly as I can.
In C++ there is a way to dynamically link code into your already running program. I am thinking about creating my own plugin system (for learning/research purposes), but I'd like to limit the plugins' system access for security purposes.
I would like to give the plugins limited access to, for example, disk writing, such that they can only call functions from an API I pass in from my application (and write through my predefined interface). Is there a way to enforce this kind of behaviour from the application side?
If not: are there other languages that support secure dynamically linked modules?
You should think of writing a plugin container (or sandbox), then coordinate everything through the container, and make sure to drop privileges that you do not need inside the container process before running the plugin. Running the plugin in its own process means you can run the container as a dedicated user rather than the one who started the program; once you limit that user, the process is automatically limited as well. Having a dedicated user per process is the most common and easiest way, and it is also the only cross-platform way to limit a process; even on Windows you can use this method.
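A rough POSIX sketch of that flow, assuming the container process starts with enough privilege to drop; the plugin path, entry-point name and the user/group IDs below are made up for illustration:

    #include <cstdio>
    #include <cstdlib>
    #include <dlfcn.h>
    #include <grp.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main() {
        pid_t child = fork();                 // run the plugin in its own container process
        if (child == 0) {
            // Drop privileges before touching the plugin: supplementary groups first,
            // then the group, then the user (setuid must come last).
            if (setgroups(0, nullptr) != 0 || setgid(12345) != 0 || setuid(12345) != 0) {
                std::perror("failed to drop privileges");
                std::exit(1);
            }
            void* handle = dlopen("./plugin.so", RTLD_NOW | RTLD_LOCAL);
            if (!handle) { std::fprintf(stderr, "%s\n", dlerror()); std::exit(1); }
            auto entry = reinterpret_cast<void (*)()>(dlsym(handle, "plugin_main"));
            if (entry) entry();               // hand the plugin only the API you choose to expose
            dlclose(handle);
            std::exit(0);
        }
        int status = 0;
        waitpid(child, &status, 0);           // the host process is insulated from plugin crashes
        return 0;
    }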
Limiting access to shared resources the OS provides, like disk, RAM or CPU, depends heavily on the OS, and you have not specified which OS. While it is doable on most OSes, Linux is the prime choice because it is written with multi-seat and server use-cases in mind. For example, in Linux you can use cgroups to easily limit CPU or RAM for each process; you would only need to apply them to your plugin container process. There is blkio to control disk access, but you can still use the traditional quota mechanism in Linux to limit the per-process or per-user share of disk space.
Supporting plugins is an involved process, and the best way to start is reading code that does some of this. Chromium's sandboxing is the best place I can suggest; it is very cleanly written and has nice documentation. Fortunately the code is not very big.
If you prefer less involvement with raw cgroups, there is an even easier mechanism for limiting resources: Docker is fairly new, but it abstracts away the low-level OS constructs so you can easily contain applications without running them in virtual machines.
To block some calls, a first idea may be to hook the system calls which are forbidden, as well as other API calls you don't want. You can also hook the dynamic linking calls to prevent your plugins from loading other DLLs, and hook the disk read/write APIs to block reads/writes.
Take a look at this, it may give you an idea of how you can forbid function calls.
You can also try to sandbox your plugins; look at some open-source sandboxes and understand how they work. That should help you.
In this case you really have to sandbox the environment that the DLL runs in. Building such a sandbox is not easy at all, and it is something you probably do not want to do yourself. System calls can be hidden in strings, or generated through metaprogramming at execution time, so they are hard to detect just by analysing the binary. Luckily, people have already built solutions. For example, Google's Native Client project has the goal of allowing C++ code in general to run safely in the browser. And if it is safe enough for a browser, it is probably safe enough for you, and it might work outside of the browser.

Debugging crashes in production environments

First, I should give you a bit of context. The program in question is
a fairly typical server application implemented in C++. Across the
project, as well as in all of the underlying libraries, error
management is based on C++ exceptions.
My question is pertinent to dealing with unrecoverable errors and/or
programmer errors---the loose equivalent of "unchecked" Java
exceptions, for want of a better parallel. I am especially interested
in common practices for dealing with such conditions in production
environments.
For production environments in particular, two conflicting goals stand
out in the presence of the above class of errors: ease of debugging
and availability (in the sense of operational performance). Each of
these suggests in turn a specific strategy:
Install a top-level exception handler to absorb all uncaught
exceptions, thus ensuring continuous availability. Unfortunately,
this makes error inspection more involved, forcing the programmer to
rely on fine-grained logging or other code "instrumentation"
techniques.
Crash as hard as possible; this enables one to perform a post-mortem
analysis of the condition that led to the error via a core
dump. Naturally, one has to provide a means for the system to resume
operation in a timely manner after the crash, and this may be far
from trivial.
So I end up with two half-baked solutions; I would like a compromise
between service availability and debugging facilities. What am I
missing ?
Note: I have flagged the question as C++ specific, as I am interested
in solutions and idiosyncrasies that apply to it in particular;
nonetheless, I am aware there will be considerable overlap with other
languages/environments.
Disclaimer: Much like the OP I code for servers, so this entire answer is focused on that specific use case. The strategy for embedded software or deployed desktop applications would probably be wildly different; I have no idea.
First of all, there are two important (and rather different) aspects to this question:
Easing investigation (as much as possible)
Ensuring recovery
Let us treat both separately; divide and conquer. And let's start with the tougher bit.
Ensuring Recovery
The main issue with the C++/Java style of try/catch is that it is extremely easy to corrupt your environment, because code in try and catch blocks can mutate state outside its own scope. Note: contrast this with Rust and Go, in which a task should not share mutable data with other tasks and a failure kills the whole task without hope of recovery.
As a result, there are 3 recovery situations:
unrecoverable: the process memory is corrupted beyond repair
recoverable, manually: the process can be salvaged in the top-level handler at the cost of reinitializing a substantial part of its memory (caches, ...)
recoverable, automatically: okay, once we reach the top-level handler, the process is ready to be used again
A completely unrecoverable error is best addressed by crashing. Actually, in a number of cases (such as a pointer outside your process memory), the OS will help in making it crash. Unfortunately, in some cases it won't (a dangling pointer may still point within your process memory), and that's how memory corruption happens. Oops. Valgrind, ASan, Purify, etc. are tools designed to help you catch those unfortunate errors as early as possible; the debugger will assist (somewhat) with those which make it past that stage.
An error that can be recovered from, but requires manual cleanup, is annoying: you will forget to clean up in some rarely hit case. Thus it should be prevented statically. A simple transformation (moving the caches inside the scope of the top-level handler) allows you to turn this into an automatically recoverable situation.
In the latter case, obviously, you can just catch, log, and resume your process, waiting for the next query. Your goal should be for this to be the only situation occurring in Production (cookie points if it does not even occur).
Easing Investigation
Note: I will take the opportunity to promote a project by Mozilla called rr which could really, really, help investigating once it matures. Check the quick note at the end of this section.
Without surprise, in order to investigate you will need data. Preferably, as much as possible, and well ordered/labelled.
There are two (practiced) ways to obtain data:
continuous logging, so that when an exception occurs, you have as much context as possible
exception logging, so that upon an exception, you log as much as possible
Logging continuously implies a performance overhead and (when everything goes right) a flood of useless logs. On the other hand, exception logging implies having enough trust in the system's ability to perform some actions in case of exceptions (which, in the case of bad_alloc... oh well).
In general, I would advise a mix of both.
Continuous Logging
Each log should contain:
a timestamp (as precise as possible)
(possibly) the server name, the process ID and thread ID
(possibly) a query/session correlator
the filename, line number and function name of where this log came from
of course, a message, which should contain dynamic information (if you have a static message, you can probably enrich it with dynamic information)
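As a rough illustration of capturing those fields in one place (the macro name and the output format are just placeholders):

    #include <chrono>
    #include <cstdio>
    #include <sstream>
    #include <string>
    #include <thread>

    // Logs a message together with a timestamp, thread id, file, line and function.
    #define LOG_INFO(msg)                                                                  \
        do {                                                                               \
            auto now = std::chrono::system_clock::now().time_since_epoch();               \
            auto us  = std::chrono::duration_cast<std::chrono::microseconds>(now).count(); \
            std::ostringstream tid;                                                        \
            tid << std::this_thread::get_id();                                             \
            std::fprintf(stderr, "%lld [%s] %s:%d %s - %s\n",                             \
                         static_cast<long long>(us), tid.str().c_str(),                   \
                         __FILE__, __LINE__, __func__, std::string(msg).c_str());          \
        } while (0)

    int main() {
        LOG_INFO("request received, session=" + std::to_string(42));  // dynamic info in the message
        return 0;
    }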
What is worth logging ?
At least I/O. All inputs, at the very least, and outputs can help in spotting the first deviation from expected behavior. I/O includes: the inbound query and corresponding response, as well as interactions with other servers, databases, various local caches, timestamps (for time-related decisions), ...
The goal of such logging is to be able to reproduce the issue spotted in a control environment (which can be setup thanks to all this information). As a bonus, it can be useful as crude performance monitor since it gives some check-points during the process (note: I am talking about monitoring and not profiling for a reason, this can allow you to raise alerts and spot where, roughly, time is spent, but you will need more advanced analysis to understand why).
Exception Logging
The other option is to enrich exceptions. As an example of a crude exception: std::out_of_range yields the following reason (from what()): vector::_M_range_check when thrown from libstdc++'s vector.
This is pretty much useless if, like me, vector is your container of choice and therefore there are about 3,640 locations in your code where this could have been thrown.
The basics, to get a useful exception, are:
a precise message: "access to index 32 in vector of size 4" is slightly more helpful, no ?
a call stack: it requires platform-specific code to retrieve, but it can be automatically captured in your base exception constructor, so go for it!
Note: once you have a call-stack in your exceptions, you will quickly find yourself addicted and wrapping lesser-abled 3rd party software into an adapter layer if only to translate their exceptions into yours; we all did it ;)
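On Linux with glibc, one way to capture the call stack in a base exception class looks roughly like this; the class name is invented here, and backtrace()/backtrace_symbols() are glibc-specific:

    #include <execinfo.h>
    #include <cstdio>
    #include <cstdlib>
    #include <sstream>
    #include <stdexcept>
    #include <string>

    // Base exception that records the call stack at the throw site.
    class TracedError : public std::runtime_error {
    public:
        explicit TracedError(const std::string& msg) : std::runtime_error(msg) {
            void* frames[64];
            int count = ::backtrace(frames, 64);                  // capture raw frames
            char** symbols = ::backtrace_symbols(frames, count);  // note: this allocates
            std::ostringstream os;
            os << msg << "\n";
            if (symbols) {
                for (int i = 0; i < count; ++i) os << "  " << symbols[i] << "\n";
                std::free(symbols);
            }
            what_ = os.str();
        }
        const char* what() const noexcept override { return what_.c_str(); }
    private:
        std::string what_;
    };

    int main() {
        try {
            throw TracedError("access to index 32 in vector of size 4");
        } catch (const std::exception& e) {
            std::fprintf(stderr, "%s", e.what());
        }
        return 0;
    }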
On top of those basics, there is a very interesting use of RAII: attaching notes to the current exception during unwinding. A simple handler that holds a reference to a variable and checks in its destructor whether an exception is unwinding costs only a single if check in the common case, and does all the important logging when unwinding (but then, exception propagation is costly already, so...).
Finally, you can also enrich and rethrow in catch clauses, but this quickly litters the code with try/catch blocks so I advise using RAII instead.
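A small sketch of such an RAII note helper, assuming C++17's std::uncaught_exceptions() is available (the logging call is just a stand-in):

    #include <cstdio>
    #include <exception>
    #include <stdexcept>
    #include <string>

    // Logs its note only if an exception is propagating when the scope unwinds.
    class OnUnwindNote {
    public:
        explicit OnUnwindNote(std::string note)
            : note_(std::move(note)), exceptions_at_entry_(std::uncaught_exceptions()) {}
        ~OnUnwindNote() {
            if (std::uncaught_exceptions() > exceptions_at_entry_)   // a single check in the common case
                std::fprintf(stderr, "while unwinding: %s\n", note_.c_str());
        }
    private:
        std::string note_;
        int exceptions_at_entry_;
    };

    void process_order(int id) {
        OnUnwindNote note("processing order " + std::to_string(id));
        throw std::runtime_error("boom");   // the note above is logged as the stack unwinds
    }

    int main() {
        try { process_order(7); } catch (const std::exception&) {}
        return 0;
    }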
Note: there is a reason that std exceptions do NOT allocate memory: it allows throwing exceptions without the throw itself being preempted by a std::bad_alloc. I advise consciously opting for richer exceptions in general, with the potential of a std::bad_alloc being thrown when attempting to create an exception (which I have yet to see happen). You have to make your own choice.
And Delayed Logging ?
The idea behind delayed logging is that instead of handing every finer-grained trace to your log handler as usual, you defer logging them and only get to them in case of an issue (aka an exception).
The idea, therefore, is to split logging:
important information is logged immediately
finer-grained information is written to a scratch-pad, which can be flushed to the log in case of an exception
Of course, there are questions:
the scratch-pad is (mostly) lost in case of a crash; you should be able to access it via your debugger if you get a memory dump, though it's not as pleasant.
the scratch-pad requires a policy: when to discard it? (end of the session? end of the transaction? ...), how much memory? (as much as it wants? bounded? ...)
what of the performance cost: even without writing the logs to disk/network, it still costs to format them!
I have actually never used such a scratch-pad; so far, all the non-crasher bugs I ever had were solved solely using I/O logging and rich exceptions. Still, should I implement it, I would recommend making it:
transaction local: since I/O is logged, we should not need more insight than this
memory bounded: evicting older traces as we progress
log-level driven: just as regular logging, I would want to be able to only enable some logs to get into the scratch pad
And Conditional / Probabilistic Logging ?
Writing one trace every N is not really interesting; it's actually more confusing than anything. On the other hand, logging in-depth one transaction every N can help!
The idea here is to reduce the amount of logs written, in general, whilst still getting a chance to observe bugs traces in detail in the wild. The reduction is generally driven by the logging infrastructure constraints (there is a cost to transferring and writing all those bytes) or by the performance of the software (formatting the logs slows software down).
The idea of probabilistic logging is to "flip a coin" at the start of each session/transaction to decide whether it'll be a fast one or a slow one :)
A similar idea (conditional logging) is to read a special debug field in the transaction that triggers full logging for it (at the cost of speed).
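Both variants boil down to fixing a per-transaction verbosity flag up front; a tiny sketch, with the sampling ratio and the debug flag invented for illustration:

    #include <cstdio>
    #include <random>

    // Decide once, at the start of a session/transaction, whether to log verbosely.
    bool pick_verbose(bool debug_flag_in_request, double sampled_fraction = 0.01) {
        if (debug_flag_in_request) return true;                 // conditional logging: the client asked for it
        static thread_local std::mt19937 rng{std::random_device{}()};
        std::bernoulli_distribution coin(sampled_fraction);     // probabilistic logging: ~1% of transactions
        return coin(rng);
    }

    int main() {
        bool verbose = pick_verbose(/*debug_flag_in_request=*/false);
        if (verbose) std::printf("full tracing enabled for this transaction\n");
        return 0;
    }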
A quick note on rr
With an overhead of only 20%, and that overhead applying only to CPU processing, it might actually be worth using rr systematically. If this is not feasible, however, it could be feasible to have 1 out of N servers launched under rr and used to catch hard-to-find bugs.
This is similar to A/B testing, but for debugging purposes, and can be driven either by a willing commitment of the client (flag in the transaction) or with a probabilistic approach.
Oh, and in the general case, when you are not hunting down anything, it can be easily deactivated altogether. No sense in paying those 20% then.
That's all folks
I could apologize for the lengthy read, but the truth is I probably just skimmed the topic. Error recovery is hard. I would appreciate comments and remarks to help improve this answer.
If the error is unrecoverable, by definition there is nothing the application can do in a production environment to recover from it. In other words, the top-level exception handler is not really a solution. Even if the application displays a friendly message like "access violation" or "possible memory corruption", that doesn't actually increase availability.
When the application crashes in a production environment, you should get as much information as possible for post-mortem analysis (your second solution).
That said, if you get unrecoverable errors in a production environment, the main problems are your product's QA process (it's lacking) and, well before that, writing unsafe/untested code.
When you finish investigating such a crash, you should not only fix the code, but fix your development process so that such crashes are no longer possible (i.e. if the corruption is an uninitialized pointer write, go over your code base and initialize all pointers and so on).

Detecting process memory injection on windows (anti-hack)

The standard hacking case: a hack injects into a running process and overwrites process memory using the WriteProcessMemory call. In games this is not something you want, because it allows the hacker to change parts of the game and give himself an advantage.
There is a possibility to force the user to run a third-party program along with the game, and I need to know the best way to prevent such injection. I already tried using the EnumProcessModules function, which lists all the process's DLLs, with no success. It seems to me that the hacks inject directly into process memory (end of the stack?), so they go undetected. At the moment I have come down to a few options.
Create a blacklist of files, file patterns, process names and memory patterns of the best-known public hacks and scan for them with the program. The problem with this is that I would need to maintain the blacklist and also ship program updates covering all available hacks. I also found this useful answer, Detecting memory access to a process, but it is possible that some existing DLL already uses those calls, so there could be false positives.
Use ReadProcessMemory to monitor changes at well-known memory offsets (hacks usually use the same offsets to achieve something). I would need to run a few hacks, monitor their behaviour and get samples of hack behaviour compared with a normal run.
Would it be possible to somehow rearrange the process memory after it starts? Maybe just pushing the process memory down the stack could confuse the hack.
This is an example of the hack call:
WriteProcessMemory(phandler,0xsomeoffset,&datatowrite,...);
So unless the hack is a little smarter and searches for the actual start of the process, that would already be a great success. I wonder if there is a system call that could relocate the memory or somehow insert some null data in front of the stack.
So, what would be the best way to go about this? It is a really interesting and dark area of programming, so I would like to hear as many interesting ideas as possible. The goal is to either prevent the hack from working or detect it.
Best regards
From time to time, compute the hash or CRC of the application's image stored in memory and compare it with a known-good hash or CRC.
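A rough Windows sketch of that idea, hashing the main module via GetModuleInformation; in practice you would restrict the hash to the read-only code section, since writable data changes legitimately:

    #include <windows.h>
    #include <psapi.h>      // link against psapi.lib
    #include <cstdint>
    #include <cstdio>

    // FNV-1a hash over a byte range (any checksum/CRC would do).
    static std::uint64_t fnv1a(const unsigned char* data, std::size_t size) {
        std::uint64_t h = 1469598103934665603ULL;
        for (std::size_t i = 0; i < size; ++i) { h ^= data[i]; h *= 1099511628211ULL; }
        return h;
    }

    std::uint64_t hash_main_module() {
        MODULEINFO info = {};
        GetModuleInformation(GetCurrentProcess(), GetModuleHandle(nullptr),
                             &info, sizeof(info));
        // Hashing the whole image for brevity; restrict to the .text section in real code.
        return fnv1a(static_cast<const unsigned char*>(info.lpBaseOfDll), info.SizeOfImage);
    }

    int main() {
        std::uint64_t baseline = hash_main_module();   // compute once at startup
        // ... later, e.g. from a watchdog thread:
        if (hash_main_module() != baseline)
            std::printf("image changed in memory - possible tampering\n");
        return 0;
    }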
Our service http://activation-cloud.com provides the ability to check integrity of application against the signature stored in database.

C++ Benchmark tool

I have some application, which makes database requests. I guess it doesn't actually matter, what kind of the database I am using, but let's say it's a simple SQLite-driven database.
Now, this application runs as a service and does some amount of requests per minute (this number might actually be huge).
I want to benchmark the queries, to retrieve their number and their maximal/minimal/average running time for some period, and I wish to design my own tool for this (obviously there are existing ones, but I need my own for some appropriate reasons :).
So, could you advise an approach for this task?
I guess there are several possible cases:
1) I have access to the application source code. Here, obviously, I want to make some sort of cross-application integration, probably using pipes. Could you advise how this should be done and (if there is one) suggest any other possible solution?
2) I don't have the sources. So, is it even possible to perform some neat injection from my application to benchmark the other one? I hope there is a way, maybe a hacky one.
Thanks a lot.
See C++ Code Profiler for a range of profilers.
Or C++ Logging and performance tuning library for rolling your own simple version
My answer is valid just for case 1).
In my experience, profiling is a fun but difficult task. Using professional tools can be effective, but it can take a lot of time to find the right one and learn how to use it properly. I usually start in a very simple way. I have prepared two very simple classes. The first one, ProfileHelper, populates the start time in its constructor and the end time in its destructor. The second class, ProfileHelperStatistic, is a container with extra statistical capability (a std::multimap plus a few methods to return the average, standard deviation and other funny stuff).
ProfileHelper holds a reference to the container, and before exiting, its destructor pushes the data into the container. You can declare the ProfileHelperStatistic in main, and if you create a ProfileHelper on the stack at the beginning of a specific function, the job is done. The constructor of the ProfileHelper stores the starting time and the destructor pushes the result into the ProfileHelperStatistic.
It is fairly easy to implement, and with minor modifications it can be made cross-platform. The time to create and destroy the object is not recorded, so it will not pollute the results. Calculating the final statistics can be expensive, so I suggest running it once at the end.
You can also customize the information that you store in ProfileHelperStatistic by adding extra fields (like a timestamp or memory usage, for example).
The implementation is fairly easy: two classes that are not bigger than 50 lines each (a bare-bones sketch follows below). Just two hints:
1) catch everything in the destructor!
2) consider using a collection that takes constant time to insert if you are going to store a lot of data.
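A bare-bones sketch along those lines; the class names follow the answer, but the statistics part is reduced to an average per label:

    #include <chrono>
    #include <cstdio>
    #include <map>
    #include <string>
    #include <utility>

    // Collects duration samples per label and reports simple statistics.
    class ProfileHelperStatistic {
    public:
        void add(const std::string& label, double ms) { samples_.emplace(label, ms); }
        void report() const {
            std::map<std::string, std::pair<double, int>> acc;   // sum and count per label
            for (const auto& s : samples_) { acc[s.first].first += s.second; ++acc[s.first].second; }
            for (const auto& a : acc)
                std::printf("%s: %d calls, avg %.3f ms\n",
                            a.first.c_str(), a.second.second, a.second.first / a.second.second);
        }
    private:
        std::multimap<std::string, double> samples_;
    };

    // RAII timer: start time in the constructor, sample pushed in the destructor.
    class ProfileHelper {
    public:
        ProfileHelper(ProfileHelperStatistic& stats, std::string label)
            : stats_(stats), label_(std::move(label)),
              start_(std::chrono::steady_clock::now()) {}
        ~ProfileHelper() {
            try {   // hint 1: catch everything in the destructor
                auto end = std::chrono::steady_clock::now();
                stats_.add(label_, std::chrono::duration<double, std::milli>(end - start_).count());
            } catch (...) {}
        }
    private:
        ProfileHelperStatistic& stats_;
        std::string label_;
        std::chrono::steady_clock::time_point start_;
    };

    ProfileHelperStatistic g_stats;

    void run_query() {
        ProfileHelper p(g_stats, "run_query");   // created on the stack: the job is done
        // ... actual work ...
    }

    int main() {
        for (int i = 0; i < 3; ++i) run_query();
        g_stats.report();                        // calculate the statistics once, at the end
        return 0;
    }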
This is a simple tool and it can help you profile your application in a very effective way. My suggestion is to start with a few macro-level functions (5-7 logical blocks) and then increase the granularity. Remember the 80-20 rule: 20% of the source code uses 80% of the time.
A last note about the database: databases tune performance dynamically; if you run a query several times, by the end it will be quicker than at the beginning (Oracle does this, and I guess other databases do as well). In other words, if you test the application heavily and artificially, focusing on just a few specific queries, you can get overly optimistic results.
I guess it doesn't actually matter what kind of database I am using, but let's say it's a simple SQLite-driven database.
It's very important what kind of database you use, because the database manager might have integrated monitoring.
I can only speak about IBM DB2, but I believe IBM DB2 is not the only DBMS with integrated monitoring tools.
Here, for example, is a short overview of what you can monitor in IBM DB2:
statements (all executed statements, execution count, prepare-time, cpu-time, count of reads/writes: tablerows, bufferpool, logical, physical)
tables (count of reads / writes)
bufferpools (logical and physical reads/writes for data and index, read/write times)
active connections (running statements, count of reads/writes, times)
locks (all locks and type)
and many more
Monitor data can be accessed via SQL or an API from your own software, as for example DB2 Monitor does.
Under Unix, you might want to use gprof and its graphical front-end, kprof. Compile your app with the -pg flag (I assume you're using g++) and run it through gprof and observe the results.
Note, however, that this type of profiling will measure the overall performance of an application, not just SQL queries. If it's the performance of queries you want to measure, you should use special tools that are designed for your DBMS - for example, MySQL has a builtin query profiler (for SQLite, see this question: Is there a tool to profile sqlite queries? )
There is a (Linux) solution you might find interesting, since it could be used in both cases.
It's the LD_PRELOAD trick. LD_PRELOAD is an environment variable that lets you specify a shared library to be loaded right before your program is executed. Symbols loaded from this library will override any others available on the system.
The basic idea is to use this custom library as a wrapper around the original functions.
There are a bunch of resources available that explain how to use this trick: 1, 2, 3
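As an illustration for this question, a preload library that times sqlite3_exec calls could look roughly like this. The signature matches SQLite's public API, the timing/reporting details are up to you, and this only works if the target program links SQLite dynamically:

    // Build: g++ -std=c++11 -shared -fPIC -o libsqltime.so sqltime.cpp -ldl
    // Run:   LD_PRELOAD=./libsqltime.so ./your_service
    #include <dlfcn.h>      // RTLD_NEXT is a GNU extension (enabled by default with g++)
    #include <chrono>
    #include <cstdio>

    using exec_fn = int (*)(void*, const char*,
                            int (*)(void*, int, char**, char**), void*, char**);

    extern "C" int sqlite3_exec(void* db, const char* sql,
                                int (*callback)(void*, int, char**, char**),
                                void* arg, char** errmsg) {
        // Look up the real implementation the first time we are called.
        static exec_fn real = reinterpret_cast<exec_fn>(dlsym(RTLD_NEXT, "sqlite3_exec"));
        auto start = std::chrono::steady_clock::now();
        int rc = real(db, sql, callback, arg, errmsg);
        auto ms = std::chrono::duration<double, std::milli>(
                      std::chrono::steady_clock::now() - start).count();
        std::fprintf(stderr, "sqlite3_exec took %.3f ms: %s\n", ms, sql);
        return rc;
    }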
Here, obviously, I want to make some sort of cross-application integration, probably using pipes.
I don't think that's obvious at all.
If you have access to the application, I'd suggest dumping all the necessary information to a log file and process that log file later on.
If you want to be able to activate and deactivate this behavior on-the-fly, without re-starting the service, you could use a logging library that supports enabling/disabling log channels on-the-fly.
Then you'd only need to send a message to the service by whatever means (socket connection, ...) to enable/disable logging.
If you don't have access to the application, then I think the best way would be what MacGucky suggested: let the profiling/monitoring tools of the DBMS do it. E.g. MS-SQL has a nice profiler that can capture requests to the server, including all kinds of useful data (CPU time for each request, IO time, wait time etc.).
And if it's really SQLite (plus you don't have access to the source) then your chances are rather low. If the program in question uses SQLite as a DLL, then you could substitute your own version of SQLite, modified to write the necessary log files.
Use Apache JMeter to test the performance of your SQL queries under high load.

How to find performance bottlenecks in C++ code

I have a server application written in C++ and deployed on CentOS. I haven't written any part of its code, but I need to optimize its performance. Its current performance is acceptable for a small number of users, but when the number of users increases, the server's performance decreases dramatically.
Are there any tools, techniques or best practices to find out the bottlenecks?
People typically use profilers to determine performance bottlenecks. Earlier SO questions asking for C++ profilers are here and here (depending on the operating system and compiler you use). For Linux, people typically use gprof, just because it comes with the system.
You'll start by building a performance test environment if you don't have one
Production-grade hardware. If you do not have the budget for this, you may as well give up.
Driver program(s) or hardware devices which throw production-like traffic at it at a high rate - as fast as or faster than production. Depending on your protocol and use-case, this may be easy or difficult. One technique is to sample some requests from production and replay them - but this may give unrealistic results, as it will produce higher cache hit rates.
Surrounding infrastructure as similar to production as you can reasonably get
Then reproduce the problem, as it exists in production. Once you've done that, then use a profiler etc, as others have suggested.
This works, without fail.
I like Mike Dunlavey's answer above (so uptick his if you uptick mine).
I'd like to elaborate, for someone in a hurry, on two methods:
a quick way for gcc users to sample stacks with gstack
self-inspection with SIGALRM combined with backtrace (driven by your own timer)
Just a few days ago I did something like this
# while true; do gstack $MYPID; sleep 2; done | logger $PARAMS
using PARAMS that fit my syslog routing rules, so that my app logs were intermixed with the stacks (not a perfect line-up with the events).
The results were on the nose: they pointed me to an area that I didn't think could be an issue at all but was my bottleneck, due to misuse of a reference in a tr1::bind.
With the alarm method, be careful what you do in the signal handler: don't use anything that allocates memory (no cout/cerr/boost), and use only simple formats (e.g. "%08X" with printf).
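A minimal self-sampling sketch along those lines, using glibc's backtrace; backtrace_symbols_fd writes straight to a file descriptor, which avoids allocating in the handler (the 2-second interval is arbitrary):

    #include <execinfo.h>
    #include <signal.h>
    #include <sys/time.h>
    #include <unistd.h>

    static void sample_stack(int /*signo*/) {
        // Only async-signal-safe calls here: no iostream, no malloc, no formatting.
        void* frames[64];
        int count = backtrace(frames, 64);
        backtrace_symbols_fd(frames, count, STDERR_FILENO);
        write(STDERR_FILENO, "----\n", 5);
    }

    int main() {
        // Warm up backtrace once outside the handler (its first call may allocate).
        void* warmup[1];
        backtrace(warmup, 1);

        struct sigaction sa = {};
        sa.sa_handler = sample_stack;
        sigaction(SIGALRM, &sa, nullptr);

        struct itimerval timer = {};
        timer.it_interval.tv_sec = 2;   // sample every 2 seconds
        timer.it_value.tv_sec = 2;
        setitimer(ITIMER_REAL, &timer, nullptr);

        for (;;) pause();               // stand-in for the real application's work
    }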