Writing an MCVE (minimal source code that reproduces an error) automatically? - c++

When I want to ask a question on e.g. stackoverflow I usually have to post the source code.
The problem is, I am using quite a big custom framework, classes structure etc. and the problem-related parts may be localized in many places (sometimes it's very hard to detect which parts of code are important for question). I cannot post the full source code (it would be too big to read efficiently).
For that reason I usually make an effort to write an minimal code (usually in one main.cpp instead of tons of classes) that reproduces the problem.
I wonder - is it possible to automate that process?
The typical things to do here is to replace methods/functions' calls with their bodies, merge files into one .cpp, remove all the "not called" methods & classes, unused variables etc.

The real difficulty here is telling the difference between "it doesn't do what I want", "the bug went away because I removed the essential code" and the case of "it now crashes because I removed something important". And really, hitting the delete key after marking some code "don't need it" is the easy part.
Finding out what is essential to show a problem is the hard part, and it's very difficult to automate this - because it is necessary to understand the difference between what the code should do and what it actually does. Just randomly removing code will not work, because the "new" code may be broken because you removed some essential step, not because you remove unused crud - only humans [that understand the problem] can do that.
Consider this:
Object* something;
void Initialize()
{
something = new Object(1, 2, 3);
}
int main()
{
Initialize();
// Some more code, some of which SOMETIMES sets something = NULL.
something->doStuff(); // Will crash if object is NULL.
}
If we remove Initialize, the code will fail every time, not just every third time. But it's because the Object has not been initialized, rather than the bug in the code [which may be that we should add if (something) before something->doStuff(), or because we shouldn't set it to NULL in the "some more code", so "don't do that"].
When I work on a tricky problem, especially at work where we have test systems that automatically produce code for testing different functionality under different conditions, my first step is to make [or take some existing] code a "small standalone test", which is small and simple, and only does "what is necessary", rather than trying to reduce many thousands of lines of complex code that does lots of extra stuff.
For some projects, there are tools around that helps with identifying "which bit of the code is the problem", for example [bugpoint][1] that finds which "pass" in LLVM is guily of causing a crash.
If you have a version control system that supports this you can "bisect" code to come up with the version that introduced a particular fault [at least sometimes]. However, I had a case at work where some of my code "apparently broke things", but it turns out that some other code was "broken all the time since a long time back" because the other code was not clearing a pointer field that the API manual says should be set to NULL, and my code was inspecting the pointer to find out if it was pointing at the right type thing - which goes horribly wrong when the value is "whatever happens to be in that part of the stack", so it is not NULL and not a valid pointer. I just added the code that made this bug apparent rather than hiding itself.
[1] http://llvm.org/docs/CommandGuide/bugpoint.html

Related

C++ What could cause a line of code in one place to change behavior in an unrelated function?

Consider this mock-up of my situation.
in an external header:
class ThirdPartyObject
{
...
}
my code: (spread among a few headers and source files)
class ThirdPartyObjectWrapper
{
private:
ThirdPartyObject myObject;
}
class Owner
{
public:
Owner() {}
void initialize();
private:
ThirdPartyObjectWrapper myWrappedObject;
};
void Owner::initialize()
{
//not weird:
//ThirdPartyObjectWrapper testWrappedObject;
//weird:
//ThirdPartyObject testObject;
}
ThirdPartyObject is, naturally, an object defined by a third party (static precompiled) library I'm using. ThirdPartyObjectWrapper is a convenience class that eliminates a lot of boiler-plating for working with ThirdPartyObject. Owner::initialize() is called shortly after an instance of Owner is created.
Notice the two lines I have labeled as "weird" and "not weird" in Owner::initialize(). All I'm doing here is creating a couple of objects on the stack with their default constructors. I don't do anything with those objects and they get destroyed when they leave scope. There are no build or linker errors involved, I can uncomment either or both lines and the code will build.
However, if I uncomment "weird" then I get a segmentation fault, and (here's why I say it's weird) it's in a completely unrelated location. Not in the constructor of testObject, like you might expect, but in the constructor of Owner::myObjectWrapper::myObject. The weird line never even gets called, but somehow its presence or absence consistently changes the behavior of an unrelated function in a static library.
And consider that if I only uncomment "not weird" then it runs fine, executing the ThirdPartyObject constructor twice with no problems.
I've been working with C++ for a year so it's not really a surprise to me that something like this would be able happen, but I've about reached the limit of my ability to figure out how this gotcha is happening. I need the input of people with significantly more C++ experience than me.
What are some possibilities that could cause this to happen? What might be going on here?
Also, note, I'm not asking for advice on how to get rid of the segfault. Segfaults I understand, I suspect it's a simple race condition. What I don't understand is the behavior gotcha so that's the only thing I'm trying to get answers for.
My best lead is that it has to do with headers and macros. The third party library actually already has a couple of gotchas having to do with its headers and macros, for example the code won't build if you put your #include's in the wrong order. I'm not changing any #include's so strictly this still wouldn't make sense, but perhaps the compiler is optimizing includes based on the presence of a symbol here? (it would be the only mention of ThirdPartyObject in the file)
It also occurs to me that because I am using Qt, it could be that the Meta-Object Compiler (which generates supplementary code between compilations) might be involved in this. Very unlikely, as Qt has no knowledge of the third party library where the segfault is happening and this is not actually relevant to the functionality of the MOC (since at no point ThirdPartyObject is being passed as an argument), but it's worth investigating at least.
Related questions have suggested that it could be a relatively small buffer overflow or race condition that gets tripped up by compiler optimizations. Continuing to investigate but all leads are welcome.
Typical culprits:
Some build products are stale and not binary-compatible.
You have a memory bug that has corrupted the state of your process, and are seeing a manifestation of that in a completely unrelated location.
Fixing #1 is trivial: delete the build folder and build again. If you're not building in a shadow build folder, you've set yourself up for failure, hopefully you now know enough to stop :)
Fixing #2 is not trivial. View manual memory management and possible buffer overflows with suspicion. Use modern C++ programming techniques to leverage the compiler to help you out: store things by value, use containers, use smart pointers, and use iterators and range-for instead of pointers. Don't use C-style arrays. Abhor C-style APIs of the (Type * array, int count) kind - they don't belong in C++.
What fun. I've boiled this down to the bottom.
//#include <otherthirdpartyheader.h>
#include <thirdpartyobject.h>
int main(...)
{
ThirdPartyObject test;
return 0;
}
This code runs. If I uncomment the first include, delete all build artifacts, and build again, then it breaks. There's obviously a header/macro component, and probably some kind of compiler-optimization component. But, get this, according to the library documentation it should give me a segfault every time because I haven't been doing a required initialization step. So the fact that it runs at all indicates unexpected behavior.
I'm chalking this up to library-specific issues rather than broad spectrum C++ issues. I'll be contacting the vendor going forward from here, but thanks everyone for the help.

Tools to refactor names of types, functions and variables?

struct Foo{
Bar get(){
}
}
auto f = Foo();
f.get();
For example you decide that get was a very poor choice for a name but you have already used it in many different files and manually changing ever occurrence is very annoying.
You also can't really make a global substitution because other types may also have a method called get.
Is there anything for D to help refactor names for types, functions, variables etc?
Here's how I do it:
Change the name in the definition
Recompile
Go to the first error line reported and replace old with new
Goto 2
That's semi-manual, but I find it to be pretty easy and it goes quickly because the compiler error message will bring you right to where you need to be, and most editors can read those error messages well enough to dump you on the correct line, then it is a simple matter of telling it to repeat the last replacement again. (In my vim setup with my hotkeys, I hit F4 for next error message, then dot for repeat last change until it is done. Even a function with a hundred uses can be changed reliably* in a couple minutes.)
You could probably write a script that handles 90% of cases automatically too by just looking for ": Error: " in the compiler's output, extracting the file/line number, and running a plain text replace there. If the word shows up only once and outside a string literal, you can automatically replace it, and if not, ask the user to handle the remaining 10% of cases manually.
But I think it is easy enough to do with my editor hotkeys that I've never bothered trying to script it.
The one case this doesn't catch is if there's another function with the same name that might still compile. That should never happen if you do this change in isolation, because an ambiguous name wouldn't compile without it.
In that case, you could probably do a three-step compiler-assisted change:
Make sure your code compiles before. Then add #disable to the thing you want to rename.
Compile. Every place it complains about it being unusable for being disabled, do the find/replace.
Remove #disable and rename the definition. Recompile again to make sure there's nothing you missed like child classes (the compiler will then complain "method foo does not override any function" so they stand right out too.
So yeah, it isn't fully automated, but just changing it and having the compiler errors help find what's left is good enough for me.
Some limited refactoring support can be found in major IDE plugins like Mono-D or VisualD. I remember that Brian Schott had plans to add similar functionality to his dfix tool by adding dependency on dsymbol but it doesn't seem implemented yet.
Not, however, that all such options are indeed of a very limited robustness right now. This is because figuring out the fully qualified name of any given symbol is very complex task in D, one that requires full semantics analysis to be done 100% correctly. Think about local imports, templates, function overloading, mixins and how it all affects identifying the symbol.
In the long run it is quite certain that we need to wait before reference D compiler frontend becomes available as a library to implement such refactoring tool in clean and truly reliable way.
A good find all feature can be better than a bad refactoring which, as mentioned previously, requires semantic.
Personally I have a find all feature in Coedit which displays the context of a match and works on all the project sources.
It's fast to process the results.

Xcode C++ Unit testing with global variable

I've got a problem when unit testing my program.
The problem is simple but i'm not sure why this is not working.
1 -> i build all my program
2 -> i build my unitTest
3 -> the test is running.
All is ok when it is not about getting global data from the data segment. It seems as if the variable are not initialized / or simply found. So of course all my tests become wrong.
My question is:
Is it totally wrong to build an executable, then running the test on it? Or should i must compile all my code + the unit test in the same time, and then running it? Or is it just a lack of SenTesting framework?
I forgot to mention that this is a C++ const string. Dunno if that change something.
*EDIT***
My assumption was wrong, but i still don't understand the magic beyond! Seems a C++ magic hoydi hoo?
char cstring[] = "***";
std::string cppString = "***";
NSString* nstring = #"***";
- (void)testSync{
STAssertNotNil(nstring, nil); // fine
STAssertNotNil((id)strlen(bbb), nil); // fine
STAssertNotNil((id)cppString.size(), nil); // failed
}
EDIT 2**
Actually this is normal that the C++ is not initialized at this part of the code. If i do a nm on my executable, it appears that my C and Obj-C global are put into the dataSegment. I thought my C++ string was in the same case, but it is actually put into the bss segment. That's means it's uninitialized. The fact is the C++ compiler do some magic beyond and the C++ string is initialized after the main() call and act like if it were into the dataSegment.
I didn't know that testSuit doesn't have main() call, so the C++ object are never initialized. There is some technique in order to call the .ctor before the testSuit. But i am too lazy too explain and it's some kind of topic. I have just replaced my C++ string with a simple char array, and it work perfectly since my value are now POD.
By the way there is no devil in global variable if they are just read-only. ;)
OK, I can see a few faults here.
First of all, this code gives errors on my environment (Xcode 5) and for good reasons (with ARC enabled). I don't know how you got the thing to compile. The reason is that you are casting an integer (or long) to an object, and this will result in many errors, as it is normally an invalid operation. So, the real question is not why the third "assert" failed, but why the second one succeeded.
As far as the second part of your question is concerned, I have to admit that I do not completely understand your question, and you may have to explain it more thoroughly.
In general, unit testing is testing specific parts of your code. Therefore, you typically don't perform the tests on an actual final executable (this is not called unit testing, I believe), nor do you have to compile "all your c++ code + your unit tests at the same time".
Since you are using Xcode, I will give you some indications.
Write your application (or at least a part of it), and find the aspects / functions / objects you want to perform unit tests on.
In separate files, write unit tests that instantiate these objects and test their methods, call them and compare the inputs and outputs.
You should have a second target in your application, that will compile only the unit test source code and the relevant main program code.
Build this target, or press command-U and it will report successes and failures.
So, you need to separate your source code and isolate your classes / methods to make them testable like this. This needs a good architecture and design of the application on your part, and you may need to make some compromises in flexibility (that is up to you to decide). Oh, and I believe that in a testable code you should avoid global variables in general, for various reasons. Global variables are helpful sometimes, but they generally make unit testing really difficult, (and if misused may lead to spaghetti code, but this is a whole different story)
I hope I helped, even without fully understanding the second part of your post.

Insert text into C++ code between functions

I have following requirement:
Adding text at the entry and exit point of any function.
Not altering the source code, beside inserting from above (so no pre-processor or anything)
For example:
void fn(param-list)
{
ENTRY_TEXT (param-list)
//some code
EXIT_TEXT
}
But not only in such a simple case, it'd also run with pre-processor directives!
Example:
void fn(param-list)
#ifdef __WIN__
{
ENTRY_TEXT (param-list)
//some windows code
EXIT_TEXT
}
#else
{
ENTRY_TEXT (param-list)
//some any-os code
if (condition)
{
return; //should become EXIT_TEXT
}
EXIT_TEXT
}
So my question is: Is there a proper way doing this?
I already tried some work with parsers used by compilers but since they all rely on running a pre-processor before parsing, they are useless to me.
Also some of the token generating parser, which do not need a pre-processor are somewhat useless because they generate a memory-mapping of tokens, which then leads to a complete new source code, instead of just inserting the text.
One thing I am working on is to try it with FLEX (or JFlex), if this is a valid option, I would appreciate some input on it. ;-)
EDIT:
To clarify a little bit: The purpose is to allow something like a stack trace.
I want to trace every function call, and in order to follow the call-hierachy, I need to place a macro at the entry-point of a function and at the exit point of a function.
This builds a function-call trace. :-)
EDIT2: Compiler-specific options are not quite suitable since we have many different compilers to use, and many that are propably not well supported by any tools out there.
Unfortunately, your idea is not only impractical (C++ is complex to parse), it's also doomed to fail.
The main issue you have is that exceptions will bypass your EXIT_TEXT macro entirely.
You have several solutions.
As has been noted, the first solution would be to use a platform dependent way of computing the stack trace. It can be somewhat imprecise, especially because of inlining: ie, small functions being inlined in their callers, they do not appear in the stack trace as no function call was generated at assembly level. On the other hand, it's widely available, does not require any surgery of the code and does not affect performance.
A second solution would be to only introduce something on entry and use RAII to do the exit work. Much better than your scheme as it automatically deals with multiple returns and exceptions, it suffers from the same issue: how to perform the insertion automatically. For this you will probably want to operate at the AST level, and modify the AST to introduce your little gem. You could do it with Clang (look up the c++11 migration tool for examples of rewrites at large) or with gcc (using plugins).
Finally, you also have manual annotations. While it may seem underpowered (and a lot of work), I would highlight that you do not leave logging to a tool... I see 3 advantages to doing it manually: you can avoid introducing this overhead in performance sensitive parts, you can retain only a "summary" of big arguments and you can customize the summary based on what's interesting for the current function.
I would suggest using LLVM libraries & Clang to get started.
You could also leverage the C++ language to simplify your process. If you just insert a small object into the code that is constructed on function scope entrance & rely on the fact that it will be destroyed on exit. That should massively simplify recording the 'exit' of the function.
This does not really answer you question, however, for your initial need, you may use the backtrace() function from execinfo.h (if you are using GCC).
How to generate a stacktrace when my gcc C++ app crashes

Visual Studio 2005 C compiler problem when optimizing a switch statement

General Question which may be of interest to others:
I ran into a, what I believe, C++-compiler optimization (Visual Studio 2005) problem with a switch statement. What I'd want to know is if there is any way to satisfy my curiosity and find out what the compiler is trying to but failing to do. Is there any log I can spend some time (probably too much time) deciphering?
My specific problem for those curious enough to continue reading - I'd like to hear your thoughts on why I get problems in this specific case.
I've got a tiny program with about 500 lines of code containing a switch statement. Some of its cases contain some assignment of pointers.
double *ptx, *pty, *ptz;
double **ppt = new double*[3];
//some code initializing etc ptx, pty and ptz
ppt[0]=ptx;
ppt[1]=pty; //<----- this statement causes problems
ppt[2]=ptz;
The middle statement seems to hang the compiler. The compilation never ends. OK, I didn't wait for longer than it took to walk down the hall, talk to some people, get a cup of coffee and return to my desk, but this is a tiny program which usually compiles in less than a second. Remove a single line (the one indicated in the code above) and the problem goes away, as it also does when removing the optimization (on the whole program or using #pragma on the function).
Why does this middle line cause a problem? The compilers optimizer doesn't like pty.
There is no difference in the vectors ptx, pty, and ptz in the program. Everything I do to pty I do to ptx and ptz. I tried swapping their positions in ppt, but pty was still the line causing a problem.
I'm asking about this because I'm curious about what is happening. The code is rewritten and is working fine.
Edit:
Almost two weeks later, I check out the closest version to the code I described above and I can't edit it back to make it crash. This is really annoying, embarrassing and irritating. I'll give it another try, but if I don't get it breaking anytime soon I guess this part of the question is obsolete and I'll remove it. Really sorry for taking your time.
If you need to make this code compilable without changing it too much consider using memcpy where you assign a value to ppt[1]. This should at least compile fine.
However, you problem seems more like another part of the source code causes this behaviour.
What you can also try is to put this stuff:
ppt[0]=ptx;
ppt[1]=pty; //<----- this statement causes problems
ppt[2]=ptz;
in another function.
This should also help compiler a bit to avoid the path it is taking to compile your code.
Did you try renaming pty to something else (i.e. pt_y)? I encountered a couple of times (i.e. with a variable "rect2") the problem that some names seem to be "reserved".
It sounds like a compiler bug. Have you tried re-ordering the lines? e.g.,
ppt[1]=pty;
ppt[0]=ptx;
ppt[2]=ptz;
Also what happens if you juggle about the values that are assigned (which will introduce bugs in your code, but may indicator whether its the pointer or the array that's the issue), e.g.:
ppt[0] = pty;
ppt[1] = ptz;
ppt[2] = ptx;
(or similar).
It's probably due to your declaration of ptx, pty and ptz with them being optimised out to use the same address. Then this action is causing your compiler problems later in your code.
Try
static double *ptx;
static double *pty;
static double *ptz;