Generate log after every 2 secs - c++

Hi,
I have developed a library in C++ that keeps track of the new and delete operators and generates logs for them. Now I have to add one more piece of functionality: the library should write the new/delete logs every 2 seconds and refresh the log file each time, so that if the main program core dumps we still have some logs for tracking memory allocation. Any help would be appreciated.
Thanks in advance.

Just write to some buffer and store the timestamp of the last dump to disk; if it is more than 2 seconds old, dump the buffer again and reset the timestamp. But if you want this log for debugging even when a crash occurs, you can still lose vital information from those 2 seconds; maybe you could write every new/delete without the 2-second delay when running in debug mode.
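A minimal sketch of that buffer-and-timestamp idea (the class and member names are placeholders, and the 2-second interval is hard-coded only for illustration):

    #include <chrono>
    #include <fstream>
    #include <string>

    // Sketch: accumulate log lines in memory and flush them to the log file
    // whenever the last flush was more than 2 seconds ago.
    class TimedLogBuffer {
    public:
        explicit TimedLogBuffer(const std::string& path)
            : out_(path, std::ios::app),
              lastFlush_(std::chrono::steady_clock::now()) {}

        void log(const std::string& line) {
            buffer_ += line;
            buffer_ += '\n';
            const auto now = std::chrono::steady_clock::now();
            if (now - lastFlush_ >= std::chrono::seconds(2)) {
                out_ << buffer_;
                out_.flush();      // hand the data to the OS so a crash loses less
                buffer_.clear();
                lastFlush_ = now;
            }
        }

        ~TimedLogBuffer() {        // flush whatever is left on normal shutdown
            out_ << buffer_;
            out_.flush();
        }

    private:
        std::ofstream out_;
        std::string buffer_;
        std::chrono::steady_clock::time_point lastFlush_;
    };

Note that if something like this is called from inside an overridden operator new, the std::string appends would themselves allocate, so a real implementation would need a preallocated buffer and a guard against re-entry.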

For a computer program, 2 seconds is an awfully long time and a lot of allocations/deallocations can happen in that time that don't get logged if the main program crashes.
A better alternative would be to log information about every allocation and deallocation to some persistent storage (a file, for example). This might result in a huge amount of data being logged, so you should only enable the feature when debugging a potentially memory-related problem. But it has the advantage that a core dump does not cost you much information (at most one buffer's worth, if you are using buffered I/O), and you can let some offline analysis tools loose on the logs to locate potential problems for you (or just to filter out the majority of obviously correct allocation/deallocation pairs).
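As a rough illustration of that approach (this is not the asker's actual library; the file name and helper function are made up for the example), the global operators can be replaced so that every call appends one line to a file:

    #include <cstdio>
    #include <cstdlib>
    #include <new>

    // Illustrative sketch only: append one line per global allocation and
    // deallocation to a log file. A real implementation would also replace the
    // array forms, be thread-safe, and avoid allocating from the logging path.
    static FILE* allocLog() {
        static FILE* f = std::fopen("alloc.log", "a");
        return f;
    }

    void* operator new(std::size_t size) {
        void* p = std::malloc(size);
        if (!p) throw std::bad_alloc();
        if (FILE* f = allocLog()) std::fprintf(f, "new %zu -> %p\n", size, p);
        return p;
    }

    void operator delete(void* p) noexcept {
        if (FILE* f = allocLog()) std::fprintf(f, "delete %p\n", p);
        std::free(p);
    }

Because stdio buffers its output, a crash loses at most the last buffer's worth of lines, as described above.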

There are tools for verifying memory access correctness, such as Valgrind. Have you looked at them yet? If not, you should: if you plan to do this sort of logging for every single allocation, it is going to slow your program down a lot, just like those already-written tools do. If you need something much more precisely targeted, then maybe writing your own is a good idea.

Related

Extreme performance difference when reading the same files a second time with C

I have to read binary data into char arrays from large (2 GB) binary files in a C++ program. When reading the files for the first time from my SSD, reading takes about 6.4 seconds per file. But when running the same code again, or even after running a different dummy program that does almost the same thing beforehand, the next reads take only about 1.4 seconds per file. The Windows Task Manager even shows much less disk activity on the second, third, fourth… run. So my guess is that Windows' file caching is sparing me from waiting for data from the SSD when filling the arrays another time.
Is there any clean option to read the files into the file cache before the customer runs the software? Any better option than just loading the files with fread in advance? And how can I make sure the data remains in the file cache until I need it?
Or am I totally wrong with my file cache assumption? Is there another (better) explanation for these different loading times?
Educated guess here:
You most likely are right with your file cache assumption.
Can you pre-load the files before the user runs the software?
Not directly. How would your program know that it is going to be run in the next few minutes?
So you probably need a helper mechanism or tricks.
The options I see here are:
Indexing mechanisms that provide faster and better-targeted access to your data. This is helpful if you only need small chunks of the data at a time.
Attempt to parallelize the loading of the data, so that even if it does not actually get faster, the user has the impression it does, because they can already start working with the data they have while the rest is fetched in the background (see the sketch after this answer).
Have a helper tool starting up with the OS and pre-fetching everything, so you already have it in memory when required. Caution: This has serious implications since you reserve either a large chunk of RAM or even SSD-cache (depending on implementation) for your tool from the start. Only consider doing this if the alternative is the apocalypse…
You can also try to combine the first two options. The key to faster data availability is to figure out what to read in which order, instead of trying to load everything at once en bloc. Divide and conquer.
Without further details on the problem it is impossible to provide more specific solutions though.
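As a minimal sketch of the parallel-loading option, std::async can fetch one file in the background while the program already works on another (the file names and the two-file split are assumptions made for the example):

    #include <fstream>
    #include <future>
    #include <iterator>
    #include <string>
    #include <vector>

    // Read a whole file into memory; used both directly and via std::async.
    std::vector<char> readFile(const std::string& path) {
        std::ifstream in(path, std::ios::binary);
        return std::vector<char>(std::istreambuf_iterator<char>(in),
                                 std::istreambuf_iterator<char>());
    }

    int main() {
        // Start reading the second file on another thread.
        auto pending = std::async(std::launch::async, readFile,
                                  std::string("data2.bin"));

        // Load and start working with the first file while the second is fetched.
        std::vector<char> data1 = readFile("data1.bin");
        // ... process data1 here ...

        // Block only when the second file is actually needed.
        std::vector<char> data2 = pending.get();
        // ... process data2 here ...
        return 0;
    }

The user-visible win comes from overlapping I/O with work, not from the reads themselves getting faster.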

Profiling a multiprocess system

I have a system that I need to profile.
It is made up of tens of processes, mostly C++, some of which have several threads, that communicate with the network and with one another through various system calls.
I know there are performance bottlenecks sometimes, but no one has put in the time/effort to check where they are: they may be in userspace code, inefficient use of syscalls, or something else.
What would be the best way to approach profiling a system like this?
I have thought of the following strategy:
Manually logging the roundtrip times of various code sequences (for example processing an incoming packet or a cli command) and seeing which process takes the largest time. After that, profiling that process, fixing the problem and repeating.
This method seems rather hacky and based on guesswork. I don't like it.
How would you suggest approaching this problem?
Are there tools that would help me out (a multi-process profiler, perhaps)?
What I'm looking for is more of a strategy than just specific tools.
Should I profile every process separately and look for problems? If so, how do I approach this?
Do I try to isolate the problematic processes and go from there? If so, how do I isolate them?
Are there other options?
I don't think there is a single answer to this sort of question, and every type of issue has its own problems and solutions.
Generally, the first step is to figure out WHERE in the big system is the time spent. Is it CPU-bound or I/O-bound?
If the problem is CPU-bound, a system-wide profiling tool can be useful to determine where in the system the time is spent. The next question is of course whether that time is actually necessary or not; no automated tool can tell the difference between a badly written piece of code that does a million completely useless processing steps and one that does a matrix multiplication with a million elements very efficiently. It takes the same amount of CPU time to do both, but one of them isn't actually achieving anything. However, knowing which program takes most of the time in a multi-program system can be a good starting point for figuring out IF that code is well written or can be improved.
If the system is I/O-bound, such as by network or disk I/O, then there are tools for analysing disk and network traffic that can help. But again, expecting the tool to point out what packet response or disk access time you should expect is a different matter: whether you contact Google to search for "kerflerp" or contact your local web server that is a meter away will have a dramatic impact on what a reasonable response time is.
There are lots of other issues. Running two pieces of code in parallel that use LOTS of memory can cause both to run slower than if they were run in sequence, because the high memory usage causes swapping, or because the OS isn't able to use the spare memory for caching file I/O, for example.
On the other hand, two or more simple processes that use very little memory will benefit quite a lot from running in parallel on a multiprocessor system.
Adding logging to your applications so that you can see WHERE they spend their time is another method that works reasonably well, particularly if you KNOW what the use case is where it takes time.
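One low-effort way to do this is an RAII scope timer (a hedged sketch; the class and function names are made up, not from the original post):

    #include <chrono>
    #include <cstdio>
    #include <string>
    #include <utility>

    // Minimal instrumentation sketch: log how long a scope took when it exits.
    class ScopeTimer {
    public:
        explicit ScopeTimer(std::string label)
            : label_(std::move(label)),
              start_(std::chrono::steady_clock::now()) {}

        ~ScopeTimer() {
            const auto elapsed = std::chrono::steady_clock::now() - start_;
            const auto us =
                std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
            std::fprintf(stderr, "%s took %lld us\n", label_.c_str(),
                         static_cast<long long>(us));
        }

    private:
        std::string label_;
        std::chrono::steady_clock::time_point start_;
    };

    void handleIncomingPacket() {
        ScopeTimer t("handleIncomingPacket");  // logged when the scope ends
        // ... actual packet handling ...
    }

Dropping a timer like this around the suspected hot paths (packet handling, CLI commands, and so on) gives you round-trip numbers without a full profiler run.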
If you have a use case where you know "this should take no more than X seconds", running a regular pre- or post-commit test to check that the code is behaving as expected, and that no one has added a lot of code that slows it down, would also be useful.

Why does my logging library cause performance tests to run faster?

I have spent the past year developing a logging library in C++ with performance in mind. To evaluate performance I developed a set of benchmarks to compare my code with other libraries, including a base case that performs no logging at all.
In my last benchmark I measure the total running time of a CPU-intensive task while logging is active and when it is not. I can then compare the time to determine how much overhead my library has. This bar chart shows the difference compared to my non-logging base case.
As you can see, my library ("reckless") adds negative overhead (unless all 4 CPU cores are busy). The program runs about half a second faster when logging is enabled than when it is disabled.
I know I should try to isolate this down to a simpler case rather than asking about a 4000-line program. But there are so many avenues for what to remove, and without a hypothesis I will just make the problem go away when I try to isolate it. I could probably spend another year just doing this. I'm hoping that the collective expertise of Stack Overflow will make this a much shallower problem, or that the cause will be obvious to someone who has more experience than me.
Some facts about my library and the benchmarks:
The library consists of a front-end API that pushes the log arguments onto a lock-free queue (Boost.Lockfree) and a back-end thread that performs string formatting and writes the log entries to disk.
The timing is based on simply calling std::chrono::steady_clock::now() at the beginning and end of the program, and printing the difference.
The benchmark is run on a 4-core Intel CPU (i7-3770K).
The benchmark program computes a 1024x1024 Mandelbrot fractal and logs statistics about each pixel, i.e. it writes about one million log entries.
The total running time is about 35 seconds for the single worker-thread case. So the speed increase is about 1.5%.
The benchmark produces an output file (this is not part of the timed code) that contains the generated Mandelbrot fractal. I have verified that the same output is produced when logging is on and off.
The benchmark is run 100 times (with all the benchmarked libraries, this takes about 10 hours). The bar chart shows the average time and the error bars show the interquartile range.
Source code for the Mandelbrot computation
Source code for the benchmark.
Root of the code repository and documentation.
My question is, how can I explain the apparent speed increase when my logging library is enabled?
Edit: This was solved after trying the suggestions given in comments. My log object is created on line 24 of the benchmark test. Apparently when LOG_INIT() touches the log object it triggers a page fault that causes some or all pages of the image buffer to be mapped to physical memory. I'm still not sure why this improves the performance by almost half a second; even without the log object, the first thing that happens in the mandelbrot_thread() function is a write to the bottom of the image buffer, which should have a similar effect. But, in any case, clearing the buffer with a memset() before starting the benchmark makes everything more sane. Current benchmarks are here
Other things that I tried are:
Running it under the oprofile profiler. I was never able to get it to register any time in the locks, even after enlarging the job to make it run for about 10 minutes. Almost all of the time was in the inner loop of the Mandelbrot computation. But maybe I would be able to interpret the results differently now that I know about the page faults; I didn't think to check whether the image write was taking a disproportionate amount of time.
Removing the locks. This did have a significant effect on performance, but the results were still weird, and in any case I couldn't make that change in any of the multithreaded variants.
Comparing the generated assembly code. There were differences, and the logging build was clearly doing more things, but nothing stood out as an obvious performance killer.
When uninitialised memory is first accessed, page faults will affect timing.
So, before your first call to std::chrono::steady_clock::now(), initialise the memory by running memset() on your sample_buffer.
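In concrete terms, the warm-up might look like this (the buffer size and element type here are placeholders standing in for the benchmark's actual sample_buffer):

    #include <chrono>
    #include <cstdio>
    #include <cstring>
    #include <vector>

    int main() {
        // Placeholder for sample_buffer; the real one is the benchmark's image
        // buffer, so its size and type are assumptions.
        std::vector<unsigned char> sample_buffer(1024 * 1024 * 4);

        // Touch every page up front so later timing is not skewed by page faults.
        std::memset(sample_buffer.data(), 0, sample_buffer.size());

        const auto start = std::chrono::steady_clock::now();
        // ... run the benchmarked work here ...
        const auto stop = std::chrono::steady_clock::now();

        const auto ms =
            std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
        std::printf("elapsed: %lld ms\n", static_cast<long long>(ms));
        return 0;
    }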

Is it better to write data to a file in real time, or at program shutdown?

I'm developing an MMORPG and just want to hear the arguments on both sides of storing information (in this case game logs) to a file.
Would it be better to write the information to a file as it gets created (we're talking possibly 10-30 entries every couple of seconds), or to store it in memory until the program is shutting down and then dump all the information to a file before it terminates?
I can see arguments for both approaches: if we keep it in memory, it will start to take up a lot of resources on the computer, while if we wait until the program shuts down and it goes down via a crash, the information isn't saved, which would be quite bad from an MMORPG standpoint... Then again, I'm sure writing to a file that often can't be very efficient either...
I'm doing all this in C++.
It may make sense to cache a few log messages, but storing it all until shutdown is a no-go.
First, MMOs tend to run for hours, and in a game you don't have that kind of memory to waste.
Second, and more importantly, logs are important for debugging. If your game crashes, you want those logs. If you cache them in memory and your game crashes, they're gone.

Finding memory consumed using core file

I am analysing a high-memory-consumption problem in our software. I have a core file corresponding to this high memory consumption (the core file is generated by killing our application). But I am not able to view the actual memory consumption using this core file. I used TotalView and gdb; with these two I am not getting a snapshot of the total memory consumed by my process, or of which library is eating up all the memory.
This memory consumption builds up over 10 to 20 days, and hence I am trying to find out what has caused it.
Can valgrind help me in analysing this core file?
Any input/suggestion is highly appreciated.
@suresh,
Hi, I'm the product manager for TotalView at Rogue Wave Software.
Can you describe the scenario a bit more? Is the program running along with "normal memory consumption" for a long time and then suddenly the memory consumption goes through the roof? Or is the program slowly and steadily consuming memory till it exhausts available resources?
When it crashes, is it crashing because it literally runs out of memory, or are you killing it because it has started swapping and become unresponsive?
In general I'd recommend running it under MemoryScape (in TotalView or the Standalone version) and when it starts to show unexpected memory consumption you want to pause it and run a memory leak report. It is likely that this will point right to the problem.
It is possible that the memory use isn't a "classical" leak, because you still have pointers referencing the data -- you are simply over-allocating. In this case you won't see anything in the leak report, but you may be able to pick out which allocation has "gone bad" by watching which allocations keep growing. There are a number of ways to do this.
Pause periodically and note how the heap usage breaks down in the Heap Status Source Code report. For example you may note that the number of allocations associated with a specific source code file just keeps increasing.
If you are using TotalView you can use the "set heap baseline" functionality when the program seems to be behaving well, then filter against this baseline. Again you may want to use the source code report (though the graphical and backtrace reports support filtering too).
Or you can use the "export memory data" feature to store an image of what the "normal" heap status is. This creates a binary heap status file. Then let the program run till you get into the state where your program has high memory consumption. At that point you pause your live app, load the stored heap data file and you can do a comparison.
Wow, this is getting long. One final thought. You said you are getting core files. Under the debugger you should get a chance to examine the running program before it gets cleaned up. If this doesn't happen let me know. If you really want to work via core files (for example, this is happening in a production environment and you don't want to run the debugger there) let us know -- there are techniques where we can instrument the application using the HIA and then enable you to do offline analysis of your heap status.
Good luck!
Chris Gotbrath
Principal Product Manager for TotalView and ThreadSpotter, Rogue Wave Software
email: first . last at roguewave . com