gcc compiler optimizations affected code - c++

Unfortunately I am not working with open code right now, so please consider this a question of pure theoretical nature.
The C++ project I am working with seems to be definitely crippled by the following options and at least GCC 4.3 - 4.8 are causing the same problems, didn't notice any trouble with 3.x series (these options might have not been existed or worked differently there), affected are the platforms Linux x86 and Linux ARM. The options itself are automatically set with O1 or O2 level, so I had to find out first what options are causing it:
tree-dominator-opts
tree-dse
tree-fre
tree-pre
gcse
cse-follow-jumps
Its not my own code, but I have to maintain it, so how could I possibly find the sources of the trouble these options are making. Once I disabled the optimizations above with "-fno" the code works.
On a side note, the project does work flawlessly with Visual Studio 2008,2010 and 2013 without any noticeable problems or specific compiler options. Granted, the code is not 100% cross platform, so some parts are Windows/Linux specific but even then I'd like to know what's happening here.
It's no vital question, since I can make the code run flawlessly, but I am still interested how to track down such problems.
So to make it short: How to identify and find the affected code?
I doubt it's a giant GCC bug and maybe there is not even a real fix for the code I am working with, but it's of real interest for me.
I take it that most of these options are eliminations of some kind and I also read the explanations for these, still I have no idea how I would start here.

First of all: try using debugger. If the program crashes, check the backtrace for places to look for the faulty function. If the program misbehaves (wrong outputs), you should be able to tell where it occurs by carefully placing breakpoints.
If it didn't help and the project is small, you could try compiling a subset of your project with the "-fno" options that stop your program from misbehaving. You could brute-force your way to finding the smallest subset of faulty .cpp files and work your way from there. Note: finding a search algorithm with good complexity could save you a lot of time.
If, by any chance, there is a single faulty .cpp file, then you could further factor its contents into several .cpp files to see which functions are the cause of misbehavior.

Related

Antivirus detecting compiled C++ files as trojans

I had installed a c++ compiler for windows with MinGW. I tried to make a simple program:
#include <iostream>
using namespace std;
int main() {
cout << "Hello World!";
return 0;
}
And saved it as try.cc. Afterwards I opened cmd in the folder and ran g++ try.cc -o some.exe. It generated some.exe but my antivirus (avast) recognized it as malware. I thought it could be a false positive, but it specifically said it's a trojan.
I removed the file from the virus chest and uploaded it to "https://www.virustotal.com/"
The result:
24 out of 72 engines detected it as malware and a lot of them as a trojan.
Is this a false positive? Why would it get detected as a trojan? If it is, how do I avoid getting this warning every time I make a new program?
Edit:
Thanks all for the help, I ran a full scan of my computer, with 2 antivirus and everything seemed clean. I also did a scan on the MinGW folder and nothing.
The problem keeps appearing each time I make a new c++ program. I tried modifying the code and the name but the AV kept detecting it as a virus. Funny thing is that changing the code changed the type of virus the av reported.
I'm still not 100% sure that the compiler is clean so I dont know if I should ignore it and run the programs anyway. I downloaded MinGW from "https://osdn.net/projects/mingw/releases/"
If anyone knows how to be completely sure that the executables created are not viruses, only false positives I would be glad they share it.
Edit 2:
It occurred to me that if the compiler is infected and it's adding code, then I might be able to see it with a decompiler/disassembler, feeding it the executable. I downloaded a c++ decompiler I found here "snowman" and used it on the file. The problem is that the code went from 7 lines in the original executable to 5265 and is a bit hard to make sense of it. If someone has some experience with reverse engineering, a link to the original file is in the comments below.
The issue has come up before. Programs compiled with mingw tend to trigger the occasional snake oil (i.e., antivirus program) alarm. That's probably because mingw is a popular tool chain for virus authors and thus its output matches generic patterns occurring in true positives. This has come up over and over again, also on SE (e.g. https://security.stackexchange.com/questions/229576/program-compiled-with-mingw32-is-reported-as-infected). [rant] In my opinion that's true evidence of incapacity for the AV companies because it would be easy to fix and makes you wonder whether the core functions of their programs are better implemented. [/rant]
Your case is a bit suspicious though because the number of triggered AV programs is so large. While I have never heard of a compromised mingw, and a cursory google search did not change that, it's not impossible. Compromising compilers is certainly an efficient method to spread a virus; the most famous example with an added level of indirection is the Ken Thompson hack.
It is also certainly possible that your computer is infected with a non-mingw-originating virus which simply inserts itself into new executables it finds on disk. That should be easy to find out by the usual means. A starting point could be to subject a few other (non-mingw) new executables to the online examination; they should trigger the same AV programs.
Note that while I have some general IT experience I have no special IT security knowledge; take everything I say just as a starting point for your own research and actions.
This could be caused by two things
It really is a trojan, you downloaded your mingw from some places where its code was altered to add a virus inside each program you create. This is done for almost all the commercial compilers, all "free" (cracked) version have that code inside them, each time you compile your code the virus is added to your exe.
The hash of your exe for some reason matched an existing virus, you can confirm if this by altering one characters in your code for example "hello world!" to "hello world?" and see if it is still considered as a virus, if yes, there is a very high chance that your compiler adds viruses to your programs.
Update:
It actually was some kind of hash collision, the compiler wasn't infected. I did change the string in the print function, as suggested, several times, even adding line breaks, but everytime, my AV detected it as malware. I also tried deleting some lines of code (the includes and the print) and it also detected it as malware.
Funny enough, when I added more lines to the code, the AV stopped recognizing it as a virus. Makes you wonder how the hash function used works, and how it relates to the actual content of the programs.
So is solved, and everything was fine, just some AV sloppiness (which I guess has it's reasons).

How do you ascertain that you are running the latest executable?

Every so often I (re)compile some C (or C++) file I am working on -- which by the way succeeds without any warnings -- and then I execute my program only to realize that nothing has changed since my previous compilation. To keep things simple, let's assume that I added an instruction to my source to print out some debugging information onto the screen, so that I have a visual evidence of trouble: indeed, I compile, execute, and unexpectedly nothing is printed onto the screen.
This happened me once when I had a buggy code (I ran out of the bounds of a static array). Of course, if your code has some kind of hidden bug (What are all the common undefined behaviours that a C++ programmer should know about?) the compiled code can be pretty much anything.
This happened me twice when I used some ridiculously slow network hard drive which -- I guess -- simply did not update my executable file after compilation, and I kept running-and-running the old version, despite the updated source. I just speculate here, and feel free to correct me, if such a phenomenon is impossible, but I suspect it has had to do something with certain processes waiting for IO.
Well, such things could of course happen (and they indeed do), when you execute an old version in the wrong directory (that is: you execute something similar, but actually completely unrelated to your source).
It is happening again, and it annoys me enough to ask: how do you make sure that your executable is matching the source you are working on? Should I compare the date strings of the source and the executable in the main function? Should I delete the executable prior compilation? I guess people might do something similar by means of version control.
Note: I was warned that this might be a subjective topic likely doomed to be closed.
Just use ol' good version control possibilities
In easy case you can just add (any) visible version-id in the code and check it (hash, revision-id, timestamp)
If your project have a lot of dependent files and you suspect older version, than "latest", in produced code, you can (except, obvioulsly, good makefile-rules) monitor also version of every file, used for building code (VCS-dependent, but not so heavy trick)
Check the timestamp of your executable. That should give you a hint regarding whether or not it is recent/up-to-date.
Alternatively, calculate a checksum for your executable and display it on startup, then you have a clue that if the csum is the same the executable was not updated.

Segmentation when NOT running on debug

as the title mentions, I have a problem where one executable of a big project that gives a segmentation fault when it runs but is compiled normally and not with debug.
We are working on linux SUSE servers and code is mostly C++. Through bt in gdb, I have been able to see where exactly the problem occurs, which brings me to the question. The file is an auto-generated one which has not been changed for years. The difference now is that we have updated a third party component, gSOAP. Before updating the third party version it worked normally on both debug and not.
With debug flags, the problem disappears magically (for newbies like me).
I am sorry but its not possible to include a lot of code, only the line that is:
/*------------------------------------------------------------.
| yynewstate -- Push a new state, which is found in yystate. |
`------------------------------------------------------------*/
yynewstate:
/* In all cases, when you get here, the value and location stacks
have just been pushed. So pushing a state here evens the stacks. */
yyssp++;
yysetstate:
*yyssp = yystate; <------------------ THIS LINE
So, any help would appreciated. I actually dont understand why this problem rises and what steps I should take to solve it.
EDIT, I dont expect you to solve this particular case for me, as in more to help me understand why in programming this could occur, my case in this code is just an example.
First, please realize that you're using C++, not Java or any other language where the running of your program is always predictable, even runtime issues are predictable.
In C++, things are not predictable as in those languages. Just because your original program hasn't changed for years does not mean the program was error-free. That's how C++ works -- you think you have an error-free program, and it is not really error-free.
From your code, the exception is because yyssp is pointing to something it shouldn't be pointing to, and dereferencing this pointer causes the exception. That is the only thing that could be concluded from the code you posted. Why the pointer is pointing to where it is? We don't know, that is what you need to discover by debugging.
As to why things run differently in debug and release -- again, a bug like this allows a program to run in an unpredictable way. Add or remove code, run it on another machine, run it with differing compiler options, maybe even run it next week, and it might behave differently.
One thing you should not do -- if you make a totally irrelevant code change and magically your program works, do not claim the problem is fixed or resolved. No -- the problem is not fixed -- you've either masked it, or the bug is moved to another part of your code, hidden from you. Every fix that entails things like this must be reasoned as to why the fix addresses the problem.
Too many times, a naive programmer thinks that moving things around, adding or removing lines, and bingo, things work, that becomes the fix. Don't fall into that trap.
someone in my team found a temporary solution for this,
it was the optimization flags that this library is build with.
The default for our build was -O2 while on debug this changes.
Building the library with -O0 (changing the makefile) provides a temporary solution.

Can massive nested loops cause the linker to run endlessly when compiling in Release-Mode?

I'm compiling a very small Win32 command-line application in VS2010 Release-Mode, with all speed optimizations turned on (not memory optimizations).
This application is designed to serve a single purpose - to perform a single pre-defined complex mathematical operation to find a complex solution to a specific problem. The algorithm is completely functional (confirmed) and it compiles and runs fine in Debug-Mode. However, when I compile in Release-Mode (the algorithm is large enough to make use of the optimizations), Link.exe appears to run endlessly, and the code never finishes linking. It sits at 100% CPU usage, with no changes in memory usage (43,232 K).
My application contains only two classes, both of which are pretty short code files. However, the algorithm comprises 20 or so nested loops with inline function calls from within each layer. Is the linker attempting to run though every possible path through these loops? And if so, why is the Debug-Mode linker not having any problems?
This is a tiny command-line application (2KB exe file), and compiling shouldn't take more than a couple minutes. I've waited 30 minutes so far, with no change. I'm thinking about letting it link overnight, but if it really is trying to run through all possible code paths in the algorithm it could end up linking for decades without a SuperComputer handy.
What do I need to do to get the linker out of this endless cycle? Is it possible for such code to create an infinite link-loop without getting a compiler-error prior to the link cycle?
EDIT:
Jerry Coffin pointed out that I should kill the linker and attempt again. I forgot to mention this in the original post, but I have aborted the build, closed and re-opened VS, and attempted to build multiple times. The issue is consistent, but I haven't changed any linker options as of yet.
EDIT2:
I also neglected to mention the fact that I deleted the "Debug" and "Release" folders and re-built from scratch. Same results.
EDIT3:
I just confirmed that turning off function inlining causes the linker to function normally. The problem is I need function inlining, as this is a very performance-sensitive operataion with a minimal memory footprint. This leads me to ask, why would inlining cause such a problem to occur?
EDIT4:
The output that displays during the unending link cycle:
Link:
Generating code
EDIT5:
I confirmed that placing all the code into a single CPP file did not resolve the issue.
Nested loops only affect the linker in terms of Link Time Code Generation. There's tons of options that determine how this works in detail.
For a start I suggest disabling LTCG alltogether to see if there's some other unusual problem.
If it links fine in Release with LTCG disabled you can experiment with inlining limits, intrinsics and optimization level.
Have you done a rebuild? If one of the object files got corrupted, or a .ilk file (though release mode probably isn't using one, I can't be sure about your project settings), then the cleaning pass which rebuild does should cure it.
Closing and re-opening Visual Studio is only helpful in cases when the debugging (or one of the graphical designers) is keeping an open handle to the build product.
Which leads to another suggestion -- check Task Manager and make sure that you don't have a copy of your application still running.
If the linker seems to be the bottle neck and both classes are pretty short, why not put all the code in a single file? Also, there is a multi-processor compilation option in visual studio under C++ general tab in the project settings dialog box. Both of these might help.
Some selected answers:
Debug builds don't inline. Not at all. This makes it possible to put breakpoints in all functions. It wou
It doesn't really matter whether the linker truly runs an infinite loop, or whether it is evaluating 2^20 different combinations of inline/don't inline. The latter still could take 2^20 seconds. (We know the linker is inlining from the "Generating code" message).
Why are you using LTCG anyway? For such a small project, it's entirely feasible to put everything in one .cpp file. Currently, the compiler is just parsing C++, and not generating any code. That also explains why it doesn't have any problem: there's only a single loop in the compiler, over all lines of source code.
Here in my company we discovered something similar.
As far as I know it is no endless loop, just a very long operation.
This means you have to options:
turn off the optimizations
wait until the linker has finished
You have to decide if you really need these optimizations. Here the program is just a little helper, where it does not matter if the program is running 23.4 seconds or 26.8. The gain of program speed compared to the build speed are making the optimizations nearly useless. At least in our special scenario.
I encountered exactly this problem trying to build OSMesa x64 Release mode. My contribution to this discussion is that after letting the linker run overnight, it did complete properly.

Why does my debugger sometimes freak out and do things like not line up with my code?

When I'm using my debugger (in my particular case, it was QT Creator together with GDB that inspired this) on my C++ code, sometimes even after calling make clean followed by make the debugger seems to freak out.
Sometimes it will seem to be lined up with another piece of code's line numbers, and will jump around. Sometimes this is is off by one line, sometimes this is totally off and it'll jump around erratically.
Other times, it'll freak out by stepping into things I didn't ask it to step into, like while stepping over a function call, it might step into the string initialization routine that is part of it.
When I get seg faults, sometimes it's able to tell me where it happened perfectly, and other times it's not even able to display question marks for which functions called the code and from where, and all I see is assembly, even while running the exact same code repeatedly.
I can't seem to figure out a pattern to what causes these failures, and sometimes my debugger is perfectly well behaved.
What are the theoretical reasons behind these debugger freak outs, and what are the concrete steps I can take to prevent them?
There's 3 very common reasons
You're debugging optimized code. This rarely works - optimized code can be reordered/inlined/precomputed/etc. to the point there's no chance whatsoever to map it back to the source code.
You're not debugging, for whatever reason, the binary matching the current source code.
You've invoked undefined behavior somewhere - if whatever stuff your code did, it has messed around with the scaffolding the debugger needs to keep its sanity. This is what usually happens when you get a segfault and you can't get a sane stack trace, you've overwritten/messed with the information(e.g. stack pointers) the debugger needs to do its job.
And probably hundreds more - of the stuff I personally encounter is: debugging multithreaded code; depending on gcc/gdb versions and various other things - there's been quite a handful debugger bugs.
One possible reason is that debuggers are as buggy as any other program!
But the most common reason for a debugger not showing the right source location is that the compiler optimized the code in some way, so there is no simple correspondence between the source code and the executable code. A common optimization that confuses debuggers is inlining, and C++ is very prone to it.
For example, your string initialization routine was probably inlined into the function call, so as far as the debugger was concerned, there was just one function that happened to start with some string initialization code.
If you're tracking down an algorithm bug (as opposed to a coding bug that produces undefined behavior, or a concurrency bug), turning the optimization level down will help you track the bug, because the debugger will have a simpler view of the code.
I have the same question like yours, and I cannot solve it yet. But I have came out one problem solution which is to install a virtual machine and install Unix system in it. And debug it in Linux system. Perhaps it will work.
I have found out the reason, you should rebuild the project every time you changed your code, or the Qt will just run the old version of the code.