Algorithm/Code to compare gdb backtraces - gdb

I need to find duplicates of a backtrace (which I have in text form) in some kind of database. Ideally there would be a fast lookup method (hash-based?) and a slower, more accurate one.
I don't want a full-blown crash reporting system, just the part that compares backtraces. But using code from such an (open source) system would make sense.
What I have found so far are only complex solutions:
https://wiki.ubuntu.com/Apport
https://fedorahosted.org/abrt/wiki
https://launchpad.net/bugzilla-traceparser
https://crash-stats.mozilla.com/products/Firefox
Any suggestions how this could be done?
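One possible shape for the two-level lookup asked about above, as a minimal Python sketch rather than code taken from any of the listed systems. It assumes frames look like standard gdb `bt` output (`#N  [0xADDR in] function (args) ...`); the names `function_names`, `fast_key` and `similarity` are hypothetical, and the simple regular expression will miss function names that contain spaces or parentheses (for example C++ templates).

```python
import difflib
import hashlib
import re

# Assumed frame layout, e.g.:
#   #0  0x00007f12 in raise () from /lib/libc.so.6
#   #1  main (argc=1, argv=0x7ffd1) at main.c:12
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([^\s(]+)\s*\(")


def function_names(backtrace):
    """Extract the function name of each frame, dropping addresses and arguments."""
    names = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def fast_key(backtrace, depth=5):
    """Fast lookup: hash the top `depth` function names; use the digest as a database key."""
    top = function_names(backtrace)[:depth]
    return hashlib.sha1("\n".join(top).encode()).hexdigest()


def similarity(a, b):
    """Slower check: similarity ratio in [0, 1] over the full function-name sequences."""
    return difflib.SequenceMatcher(None, function_names(a), function_names(b)).ratio()
```

Backtraces with the same `fast_key` are likely duplicates. For near misses (an extra frame, a different caller), `similarity` can rank candidates against a threshold that would have to be tuned on real data.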

Related

How to search for images/png/jpeg/any other types in the memory of a program and display it?

Well, lately I have found a very interesting article about Map Hacks in online games.
In it I read that they used a memory scanner to look for images in memory.
How would one write such a program? Is there a freely available solution for this?
If not, how would I code it in C++? How can I know that a piece of memory is an "image"?
I can load my own DLL into the process, so that shouldn't be a big issue.
To answer your questions:
A memory scanner uses OS APIs to read memory from another process and search it for patterns or differences. A great tool for this is Cheat Engine.
The tool mentioned in the article visualizes the memory by coloring pixels according to the value of the bytes in memory. The alignment still needs to be done manually and could be very time consuming. I don't think the mentioned program was ever released.
The main problem is that you can't know that a particular piece of memory is supposed to be a map. Any big regular structure could look like one when colorized and aligned. Finding the actual piece of memory you are looking for is very hard.
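The byte-to-pixel visualization described above can be sketched roughly as follows. This is not the tool from the article; it is a minimal Python illustration that assumes the memory has already been read into a `bytes` object. The function name `bytes_to_pgm` is hypothetical, and the output is a binary PGM (P5) grayscale image that most image viewers can open.

```python
def bytes_to_pgm(data, width):
    """Render raw bytes as a binary PGM (P5) image, one byte per gray pixel."""
    if width <= 0:
        raise ValueError("width must be positive")
    if not data:
        raise ValueError("data must not be empty")
    height = -(-len(data) // width)  # ceiling division
    # Pad the last row with zero bytes so the image is rectangular.
    padded = data.ljust(width * height, b"\x00")
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + padded
```

The manual alignment step corresponds to trying different `width` values (and start offsets) until regular structures line up into something recognizable.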
Additional Info:
A property map in a game is very dynamic. If units or anything else move, the visibility has to update. So the actual format of a map like this is most likely a raw binary bitmap with no specific image format (png, jpg, ...).
Personally, I find looking for a map structure in memory a very inefficient and time-consuming approach. It's beautiful to show to people who have no idea about reverse engineering, but to me it seems very impractical. Which approach is best depends entirely on the game and your creativity.
I hope the following example of how I made a map hack for StarCraft 2 helps.
My idea was to load a replay of a game, where I had full view of the map, and find the difference from loading a normal game, where my vision is restricted. I switched a couple of times between replay and normal game and could indeed find a state variable that was 0 in a normal game and 1 in a replay (a common tool for finding memory like this is Cheat Engine).
Next I loaded the game in a debugger and put a memory access breakpoint on this state variable. Then, when loading a normal game, I would change the value whenever it was accessed while the map was loading. Through some trial and error I was able to find the code location that was responsible for revealing the minimap and the real map. The only task left was to create a DLL that detours that code location and makes sure the map is revealed on every map load.
Reply to typ1232: You mention that it's hard and impractical to find the map structure in memory. Here's a method I have had great success with: Load a map in any game with fog of war, like StarCraft 2. Take a dump of the memory and save it. Send out troops/units to reveal as much of the previously undiscovered map as possible, and take another memory dump. Compare the two dumps and look more closely at the areas in memory where there is a high frequency of changes. This is likely to be where the map is stored.
Sorry if I'm doing it wrong, new to stackoverflow :)
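The dump comparison described in this reply could be sketched like this. It is a minimal Python illustration, assuming both dumps cover the same address range and have been loaded as `bytes`; the function name `changed_blocks` and the block size are hypothetical choices.

```python
def changed_blocks(before, after, block_size=4096, top=10):
    """Return (offset, changed_byte_count) for the blocks that differ the most."""
    length = min(len(before), len(after))
    counts = []
    for start in range(0, length, block_size):
        end = min(start + block_size, length)
        # Count the byte positions in this block whose value changed.
        changed = sum(x != y for x, y in zip(before[start:end], after[start:end]))
        if changed:
            counts.append((start, changed))
    # Blocks with the highest number of changed bytes come first.
    counts.sort(key=lambda item: item[1], reverse=True)
    return counts[:top]
```

Blocks near the top of the result are candidates for the map data. Unrelated state (timers, unit positions, audio buffers) changes too, so repeating the comparison over several dumps would help narrow things down.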
This might be a somewhat broader answer on the subject of "finding data", but there are binary analysis tools out there. For example, cantor.dust is a binary visualization tool (it's only in beta, but the idea remains the same). You can search a memory dump for different patterns that indicate "images" or structures. Search YouTube for cantor dust: the creator gave a presentation at DerbyCon on how he used it to find EFI structures to recreate an exploit of a PNG parser at the EFI level.
I also think saving two memory states (visible map vs. limited-visibility map) and searching for the changes is viable, if not the best option. I am just trying to point out an alternative.

Plotting progress in cplex optimization

I would like to be able to plot the progress of a MIP solved by CPLEX. Specifically, I would like to plot the lower and upper bounds as functions of CPU time. Copying and pasting from the node log does not seem to be the smartest way of proceeding. Is it possible to access this information and print it out (or write it to a file) during the optimization?
I am using the concert technology C++ interface.
You can add a "MIP info callback" using the API routine CPXsetinfocallbackfunc or its analogue in Concert.
(Copying-and-pasting the log it dumps to the terminal is perfectly fine for getting a rough idea of what's going on, but be aware that the results can be highly variable.)
Callbacks are what you are looking for. You can find a nice introduction here:
http://eaton.math.rpi.edu/cplex90html/usrcplex/callbacks.html

A tree like (graphviz) stack trace (Visualize debugging)

I was trying to find if there exists a library or tool that will allow me to visually debug my program. i.e. something that shows a graphviz like tree structure and highlights exactly where I am in the process tree at a breakpoint. This would give a faster understanding of how my process works rather than sequentially debug through and create a tree in my mind.
I found something that partially does what I am looking for, i.e. show a tree structure of my process and the number of calls made per function call
http://www.ibm.com/developerworks/library/l-graphvis/
If it doesn't exist then I might plan on writing something that does the job. Thanks
-CV
The debug visualization plugin for Eclipse sounds like something that might be helpful for you. Furthermore, the venerable Data Display Debugger also has some automatic routines for creating graphs, albeit of the data structures you are currently seeing. I also like the visualization of kcachegrind, but it is not exactly a debugging aid. However, its graphical view shows you the position in the execution tree.
Since there does not seem to be a tool that matches your requirements exactly, maybe these ones will inspire you to write your own ;)

Best approach for doing full-text search with list-of-integers documents

I'm working on a C++/Qt image retrieval system based on similarity that works as follows (I'll try to avoid irrelevant or off-topic details):
I take a collection of images and build an index from them using OpenCV functions. After that, for each image, I get a list of integer values representing important "classes" that each image belongs to. The more integers two images have in common, the more similar they are believed to be.
So, when I want to query the system, I just have to compute the list of integers representing the query image, perform a full-text search (or similar) and retrieve the X most similar images.
My question is: what's the best approach to perform such a search?
I've heard about Lucene, Lemur and other indexing methods, but I don't know if this kind of full-text search is the best way, given that the domain is reduced (only integers instead of words).
I'd like to know about the alternatives in terms of efficiency, accuracy or C++ friendliness.
Thanks!
It sounds to me like you have a vector space model, so Lucene or a similar product may work well for you. In general, an inverted-index model will be good if:
1. You don't know the number of classes in advance
2. There are a lot of classes relative to the number of images
If your problem doesn't fit these criteria, a normal relational DB might work better, as Thomas suggested. If it meets #1 but not #2, you could investigate one of the "column oriented" non-relational databases. I'm not familiar enough with these to tell you how well they would work, but my intuition is that you'll need to replicate a lot of the functionality in an IR toolkit yourself.
Lucene is written in Java and I don't know of any C++ ports. Solr exposes Lucene as a web service, so it's easy enough to access it that way from whatever language you choose.
I don't know much about Lemur, but it looks like it has a similar vectorspace model, and it's written in C++, so that might be easier for you to use.
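To make the inverted-index idea concrete, here is a minimal in-memory Python sketch. It is not how Lucene or Lemur are implemented; the class name `InvertedIndex` and its methods are hypothetical, and the score is simply the number of shared class integers, as described in the question.

```python
from collections import Counter, defaultdict


class InvertedIndex:
    """Maps each class integer to the set of images that contain it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, image_id, classes):
        for c in set(classes):
            self._postings[c].add(image_id)

    def query(self, classes, top_x=10):
        """Return up to top_x (image_id, shared_class_count) pairs, best match first."""
        shared = Counter()
        for c in set(classes):
            # Only images sharing at least one class with the query are touched.
            for image_id in self._postings.get(c, ()):
                shared[image_id] += 1
        return shared.most_common(top_x)
```

For example, after `index.add("a", [1, 2, 3, 4])` and `index.add("b", [3, 4, 5])`, the call `index.query([2, 3, 4])` ranks "a" (3 shared classes) ahead of "b" (2 shared classes). A production system would likely add weighting such as tf-idf and persistent storage.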
You can take a look at Lucene for image retrieval (LIRE) here: http://www.semanticmetadata.net/2006/05/19/lire-lucene-image-retrieval-04-released/
If I'm not mistaken, you are trying to implement typical bag-of-words image retrieval, am I correct? If so, you are probably trying to build an inverted file index. Lucene on its own is not suitable, as you have probably already realized, since it indexes text instead of numbers. Using its classes for querying the index would also be a problem, as it is not designed to "parse" an image into the query vector (i.e. detect keypoints, extract descriptors, then vector-quantize them).
LIRE, on the other hand, has been modified to index feature vectors. However, it does not appear to work out of the box for the bag-of-words model. Also, I think I've read on the author's website that it currently uses brute-force matching rather than an inverted file index to retrieve the images, but I would expect it to be easier to extend than Lucene itself for your purposes.
Hope this helps.

How to quickly debug when something wrong in code workflow?

I frequently encounter the following debugging scenario:
A tester provides reproduction steps for a bug. To find out where the problem is, I play with these steps to reduce them to the minimum necessary. Sometimes, luckily, I find that after a minor change to the steps the problem is gone.
The job then becomes finding the difference in code workflow between the two sets of steps. This is tedious and painful, especially when you are working on a large code base and the flow goes through a lot of code and involves many state changes that you are not familiar with.
So I was wondering whether there are any tools available to compare "code workflow". Having learned the "wt" command in WinDbg, I thought it might be possible. For example, I could run "wt" on some outermost functions with the 2 different sets of reproduction steps and then compare the outputs. It should then be easy to find where the code flow starts to diverge.
But the problem with WinDbg is that "wt" is quite slow (maybe I should write to a log file instead of the screen) and not very user-friendly (compared with the Visual Studio debugger). So I want to ask: are there any existing tools available? Or is it possible, and how difficult would it be, to develop a "plug-in" for the Visual Studio debugger to support this functionality?
Thanks
I'd run it under a profiler in "coverage" mode, then use diff on the results to see which parts of the code were executed in one run but not the other.
Sorry, I don't know of a tool which can do what you want, but even if it existed it doesn't sound like the quickest approach to finding out where the lower layer code is failing.
I would recommend instrumenting your layer's code with high-level logs so you can tell which module fails, stalls, etc. In debug builds, your logger can write to a file, to the debug output window, etc.
In general, failing fast and using exceptions are good ways to find out easily where things go bad.
Doing something after the fact is not going to cut it, since your problem is reproducing it.
The issue with bugs is seldom some internal wackiness but usually what the user is actually doing. If you log all the commands that the user enters, then they can simply send you the log. The same works for button clicks, mouse selections, etc. This will have some cost, but certainly much less than something that keeps track of every method visited.
I am assuming that if you have a large application that you have good logging or tracing.
I work on a large server product with over 40 processes and over one million lines of code. Most of the time the error in the trace file is enough to identify the location of the problem. However, sometimes the error I see in the trace file is caused by some earlier code, and the reason for this can be hard to spot. Then I use a comparative debugging technique:
Reproduce the first scenario, copy the trace to a new file (if the application is multi threaded ensure you only have the trace for the thread that does the work).
Reproduce the second scenario, copy the trace to a new file.
Remove the timestamps from the log files (I use awk or sed for this).
Compare the log files with winmerge or similar, to see where and how they diverge.
This technique can be a little time-consuming, but it is much quicker than stepping through thousands of lines in the debugger.
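The steps above (strip timestamps, then compare) might be scripted along these lines. This is a Python sketch, not the awk/sed and WinMerge workflow itself. The timestamp pattern is an assumption about the log format and would need adjusting, and the function names are hypothetical.

```python
import difflib
import re

# Assumed timestamp prefix, e.g. "2024-01-31 12:00:01.123 "; adapt to the real format.
TIMESTAMP_RE = re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?\s*")


def strip_timestamps(lines):
    """Remove the leading timestamp and trailing newline from every trace line."""
    return [TIMESTAMP_RE.sub("", line.rstrip("\n")) for line in lines]


def first_divergence(trace_a, trace_b):
    """Return the index of the first line where the traces differ, or None if identical."""
    a, b = strip_timestamps(trace_a), strip_timestamps(trace_b)
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def trace_diff(trace_a, trace_b):
    """Return a unified diff of the two traces with timestamps removed."""
    a, b = strip_timestamps(trace_a), strip_timestamps(trace_b)
    return list(difflib.unified_diff(a, b, "scenario1", "scenario2", lineterm=""))
```

With two trace files read via `open(path).readlines()`, `first_divergence` gives the line to start reading from and `trace_diff` gives a textual view similar to what a merge tool would show.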
Another useful technique is producing UML sequence diagrams from trace files. For this you need function entry and exit points logged consistently. Then write a small script to parse your trace files and use sequence.jar to produce UML diagrams as PNG files. This is a great way to understand the logic of code you haven't touched in a while. I wrapped a small awk script in a batch file: I just provide the trace file and the line number to start from, and it untangles the threads, generates the input text for sequence.jar, and then runs it to create the UML diagram.