So basically, besides possible performance effects, does inlining functions have any considerable effect on how difficult it is to reverse-engineer the program from its compiled and linked binary?
I mean, it should be, since 1) the cracker just sees more machine instructions, instead of nice understandable "call XXXXX", which he may already have discovered to do a certain thing. and 2) inlining provides more possibilities for the compiler to optimize code, and that is even more obfuscation, right?
Also, considering the inline keyword is just a suggestion to the compiler, how much effect can this really have? Should we bother? I mean, of course they will crack it eventually, but if by such simple measures we can make the cracker's life harder, why not?
The choice to inline methods or not should not be based on how easy it is to reverse engineer. The difference between inlining and not will be negligible.
The exception is if you have any anti-piracy code, inlining that, or even using macros to ensure it's "inlined" can help remove a single point of failure.
If you're concerned about this, I suggest looking into obfuscation tools that operate on the binary.
It will decrease the similarity between your input and his output. It generally won't have much effect on his efforts in general though.
Related
When it comes to procedural programming, functional decomposition is ideal for maintaining complicated code. However, functions are expensive- adding to the call stack, passing parameters, storing return addresses. all of this takes extra time! When speed is crucial, how can I get the best of both worlds? I want a highly decomposed program without any necessary overhead introduced by function calls. I'm familiar with the keyword: "inline" but that seems to be only be a suggestion to the compiler, and if used incorrectly by the programmer it will yield an even slower program. I'm using g++, so will the -03 flag optimize away my functions that call functions that call functions..
I just wanted to know, if my concerns are valid and if there are any methods to combat this issue.
First, as always when dealing with performance issues, you should try and measure what are your bottlenecks with a profiler. The first thing coming out is usually not function calls and by a large margin. If you did this, then please read on.
Then, you can anticipate a bit what functions you want inlined by using the inline keyword. The compiler is usually smart enough to know what to inline and what not to inline (it can inline functions you forgot and may not inline some you mentionned if he thinks it won't help).
If (really) you still want to improve performance on function calls and want to force inlining, some compilers allow you to do so (see this question). Please consider that massive inlining may actually decrease performance: your code will use a lot of memory and you may get more cache misses on the code than before (which is not good).
If it's a specific piece of code you're worried about you can measure the time yourself. Just run it in a loop a large number of times and get the system time before and after. Use the difference to find the average time of each call.
As always the numbers you get are subjective, since they will vary depending on your system and compiler. You can compare the times you get from different methods to see which is generally faster, such as replacing the function with a macro. My guess is however you won't notice much difference, or at the very least it will be inconsequential.
If you don't know where the slowdown is follow J.N's advice and use a code profiler and optimise where it's needed. As a rule of thumb always pass large objects to functions by reference or const reference to avoid copy times.
I highly doubt speed is that curcial, but my suggestion would be to use preprocessor macros.
For example
#define max(a,b) ( a > b ? a : b )
This would seem obvious to me, but I don't consider myself an expect in C++, so I may have misunderstood the question.
Suppose I crafted a set of classes to abstract something and now I worry whether my C++ compiler will be able to peel off those wrappings and emit really clean, concise and fast code. How do I find out what the compiler decided to do?
The only way I know is to inspect the disassembly. This works well for simple code, but there're two drawbacks - the compiler might do it different when it compiles the same code again and also machine code analysis is not trivial, so it takes effort.
How else can I find how the compiler decided to implement what I coded in C++?
I'm afraid you're out of luck on this one. You're trying to find out "what the compiler did". What the compiler did is to produce machine code. The disassembly is simply a more readable form of the machine code, but it can't add information that isn't there. You can't figure out how a meat grinder works by looking at a hamburger.
I was actually wondering about that.
I have been quite interested, for the last few months, in the Clang project.
One of Clang particular interests, wrt optimization, is that you can emit the optimized LLVM IR code instead of machine code. The IR is a high-level assembly language, with the notion of structure and type.
Most of the optimizations passes in the Clang compiler suite are indeed performed on the IR (the last round is of course architecture specific and performed by the backend depending on the available operations), this means that you could actually see, right in the IR, if the object creation (as in your linked question) was optimized out or not.
I know it is still assembly (though of higher level), but it does seem more readable to me:
far less opcodes
typed objects / pointers
no "register" things or "magic" knowledge required
Would that suit you :) ?
Timing the code will directly measure its speed and can avoid looking at the disassembly entirely. This will detect when compiler, code modifications or subtle configuration changes have affected the performance (either for better or worse). In that way it's better than the disassembly which is only an indirect measure.
Things like code size can also serve as possible indicators of problems. At the very least they suggest that something has changed. It can also point out unexpected code bloat when the compiler should have boiled down a bunch of templates (or whatever) into a concise series of instructions.
Of course, looking at the disassembly is an excellent technique for developing the code and helping decide if the compiler is doing a sufficiently good translation. You can see if you're getting your money's worth, as it were.
In other words, measure what you expect and then dive in if you think the compiler is "cheating" you.
You want to know if the compiler produced "clean, concise and fast code".
"Clean" has little meaning here. Clean code is code which promotes readability and maintainability -- by human beings. Thus, this property relates to what the programmer sees, i.e. the source code. There is no notion of cleanliness for binary code produced by a compiler that will be looked at by the CPU only. If you wrote a nice set of classes to abstract your problem, then your code is as clean as it can get.
"Concise code" has two meanings. For source code, this is about saving the scarce programmer eye and brain resources, but, as I pointed out above, this does not apply to compiler output, since there is no human involved at that point. The other meaning is about code which is compact, thus having lower storage cost. This can have an impact on execution speed, because RAM is slow, and thus you really want the innermost loops of your code to fit in the CPU level 1 cache. The size of the functions produced by the compiler can be obtained with some developer tools; on systems which use GNU binutils, you can use the size command to get the total code and data sizes in an object file (a compiled .o), and objdump to get more information. In particular, objdump -x will give the size of each individual function.
"Fast" is something to be measured. If you want to know whether your code is fast or not, then benchmark it. If the code turns out to be too slow for your problem at hand (this does not happen often) and you have some compelling theoretical reason to believe that the hardware could do much better (e.g. because you estimated the number of involved operations, delved into the CPU manuals, and mastered all the memory bandwidth and cache issues), then (and only then) is it time to have a look at what the compiler did with your code. Barring these conditions, cleanliness of source code is a much more important issue.
All that being said, it can help quite a lot if you have a priori notions of what a compiler can do. This requires some training. I suggest that you have a look at the classic dragon book; but otherwise you will have to spend some time compiling some example code and looking at the assembly output. C++ is not the easiest language for that, you may want to begin with plain C. Ideally, once you know enough to be able to write your own compiler, then you know what a compiler can do, and you can guess what it will do on a given code.
You might find a compiler that had an option to dump a post-optimisation AST/representation - how readable it would be is another matter. If you're using GCC, there's a chance it wouldn't be too hard, and that someone might have already done it - GCCXML does something vaguely similar. Of little use if the compiler you want to build your production code on can't do it.
After that, some compiler (e.g. gcc with -S) can output assembly language, which might be usefully clearer than reading a disassembly: for example, some compilers alternate high-level source as comments then corresponding assembly.
As for the drawbacks you mentioned:
the compiler might do it different when it compiles the same code again
absolutely, only the compiler docs and/or source code can tell you the chance of that, though you can put some performance checks into nightly test runs so you'll get alerted if performance suddenly changes
and also machine code analysis is not trivial, so it takes effort.
Which raises the question: what would be better. I can image some process where you run the compiler over your code and it records when variables are cached in registers at points of use, which function calls are inlined, even the maximum number of CPU cycles an instruction might take (where knowable at compile time) etc. and produces some record thereof, then a source viewer/editor that colour codes and annotates the source correspondingly. Is that the kind of thing you have in mind? Would it be useful? Perhaps some more than others - e.g. black-and-white info on register usage ignores the utility of the various levels of CPU cache (and utilisation at run-time); the compiler probably doesn't even try to model that anyway. Knowing where inlining was really being done would give me a warm fuzzy feeling. But, profiling seems more promising and useful generally. I fear the benefits are more intuitively real than actually, and compiler writers are better off pursuing C++0x features, run-time instrumentation, introspection, or writing D "on the side" ;-).
The answer to your question was pretty much nailed by Karl. If you want to see what the compiler did, you have to start going through the assembly code it produced--elbow grease is required. As to discovering the "why" behind the "how" of how it implemented your code...every compiler (and every build, potentially), as you mentioned, is different. There are different approaches, different optimizations, etc. However, I wouldn't worry about whether it's emitting clean, concise machine code--cleanliness and concision should be left to the source code. Speed, on the other hand, is pretty much the programmer's responsibility (profiling ftw). More interesting concerns are correctness, maintainability, readability, etc. If you want to see if it made a specific optimization, the compiler docs might help (if they're available for your compiler). You can also just trying searching to see if the compiler implements a known technique for optimizing whatever. If those approaches fail, though, you're right back to reading assembly code. Keep in mind that the code that you're checking out might have little to no impact on performance or executable size--grab some hard data before diving into any of this stuff.
Actually, there is a way to get what you want, if you can get your compiler to
produce DWARF debugging information. There will be a DWARF description for each
out-of-line function and within that description there will (hopefully) be entries
for each inlined function. It's not trivial to read DWARF, and sometimes compilers
don't produce complete or accurate DWARF, but it can be a useful source of information
about what the compiler actually did, that's not tied to any one compiler or CPU.
Once you have a DWARF reading library there are all sorts of useful tools you can
build around it.
Don't expect to use it with Visual C++ as that uses a different debugging format.
(But you might be able to do similar queries through the debug helper library
that comes with it.)
If your compiler manages to translate your "wrappings and emit really clean, concise and fast code" the effort to follow-up the emitted code should be reasonable.
Contrary to another answer I feel that emitted assembly code may well be "clean" if it is (relatively) easily mappable to the original source code, if it doesn't consist of calls all over the place and that the system of jumps is not too complex. With code scheduling and re-ordering an optimized machine code which is also readable is, alas, a thing of the past.
I am wondering if it is still worth with modern compilers and their optimizations to write some critical code in C instead of C++ to make it faster.
I know C++ might lead to bad performance in case classes are copied while they could be passed by reference or when classes are created automatically by the compiler, typically with overloaded operators and many other similar cases; but for a good C++ developer who knows how to avoid all of this, is it still worth writing code in C to improve performance?
I'm going to agree with a lot of the comments. C syntax is supported, intentionally (with divergence only in C99), in C++. Therefore all C++ compilers have to support it. In fact I think it's hard to find any dedicated C compilers anymore. For example, in GCC you'll actually end up using the same optimization/compilation engine regardless of whether the code is C or C++.
The real question is then, does writing plain C code and compiling in C++ suffer a performance penalty. The answer is, for all intents and purposes, no. There are a few tricky points about exceptions and RTTI, but those are mainly size changes, not speed changes. You'd be so hard pressed to find an example that actually takes a performance hit that it doesn't seem worth it do write a dedicate module.
What was said about what features you use is important. It is very easy in C++ to get sloppy about copy semantics and suffer huge overheads from copying memory. In my experience this is the biggest cost -- in C you can also suffer this cost, but not as easily I'd say.
Virtual function calls are ever so slightly more expensive than normal functions. At the same time forced inline functions are cheaper than normal function calls. In both cases it is likely the cost of pushing/popping parameters from the stack that is more expensive. Worrying about function call overhead though should come quite late in the optimization process -- as it is rarely a significant problem.
Exceptions are costly at throw time (in GCC at least). But setting up catch statements and using RAII doesn't have a significant cost associated with it. This was by design in the GCC compiler (and others) so that truly only the exceptional cases are costly.
But to summarize: a good C++ programmer would not be able to make their code run faster simply by writing it in C.
measure! measure before thinking about optimizing, measure before applying optimization, measure after applying optimization, measure!
If you must run your code 1 nanosecond faster (because it's going to be used by 1000 people, 1000 times in the next 1000 days and that second is very important) anything goes.
Yes! it is worth ...
changing languages (C++ to C; Python to COBOL; Mathlab to Fortran; PHP to Lisp)
tweaking the compiler (enable/disable all the -f options)
use different libraries (even write your own)
etc
etc
What you must not forget is to measure!.
pmg nailed it. Just measure instead of global assumptions. Also think of it this way, compilers like gcc separate the front, middle, and back end. so the frontend fortran, c, c++, ada, etc ends up in the same internal middle language if you will that is what gets most of the optimization. Then that generic middle language is turned into assembler for the specific target, and there are target specific optimizations that occur. So the language may or may not induce more code from the front to middle when the languages differ greatly, but for C/C++ I would assume it is the same or very similar. Now the binary size is another story, the libraries that may get sucked into the binary for C only vs C++ even if it is only C syntax can/will vary. Doesnt necessarily affect execution performance but can bulk up the program file costing storage and transfer differences as well as memory requirements if the program loaded as a while into ram. Here again, just measure.
I also add to the measure comment compile to assembler and/or disassemble the output and compare the results of your different languages/compiler choices. This can/will supplement the timing differences you see when you measure.
The question has been answered to death, so I won't add to that.
Simply as a generic question, assuming you have measured, etc, and you have identified that a certain C++ (or other) code segment is not running at optimal speed (which generally means you have not used the right tool for the job); and you know you can get better performance by writing it in C, then yes, definitely, it is worth it.
There is a certain mindset that is common, trying to do everything from one tool (Java or SQL or C++). Not just Maslow's Hammer, but the actual belief that they can code a C construct in Java, etc. This leads to all kinds of performance problems. Architecture, as a true profession, is about placing code segments in the appropriate architectural location or platform. It is the correct combination of Java, SQL and C that will deliver performance. That produces an app that does not need to be re-visited; uneventful execution. In which case, it will not matter if or when C++ implements this constructors or that.
I am wondering if it is still worth with modern compilers and their optimizations to write some critical code in C instead of C++ to make it faster.
no. keep it readable. if your team prefers c++ or c, prefer that - especially if it is already functioning in production code (don't rewrite it without very good reasons).
I know C++ might lead to bad performance in case classes are copied while they could be passed by reference
then forbid copying and assigning
or when classes are created automatically by the compiler, typically with overloaded operators and many other similar cases
could you elaborate? if you are referring to templates, they don't have additional cost in runtime (although they can lead to additional exported symbols, resulting in a larger binary). in fact, using a template method can improve performance if (for example) a conversion would otherwise be necessary.
but for a good C++ developer who knows how to avoid all of this, is it still worth writing code in C to improve performance?
in my experience, an expert c++ developer can create a faster, more maintainable program.
you have to be selective about the language features that you use (and do not use). if you break c++ features down to the set available in c (e.g., remove exceptions, virtual function calls, rtti) then you're off to a good start. if you learn to use templates, metaprogramming, optimization techniques, avoid type aliasing (which becomes increasingly difficult or verbose in c), etc. then you should be on par or faster than c - with a program which is more easily maintained (since you are familiar with c++).
if you're comfortable using the features of c++, use c++. it has plenty of features (many of which have been added with speed/cost in mind), and can be written to be as fast as c (or faster).
with templates and metaprogramming, you could turn many runtime variables into compile-time constants for exceptional gains. sometimes that goes well into micro-optimization territory.
I was going through this link Will it optimize and wondered how can we know what optimizations are done by a particular compiler.
Like does VC8.0 convert if-else statements to switch-case?
Is such information available on msdn?
As everyone seems to be bent on telling the OP that he shouldn't worry about it, there is some useful although not as specific as the OP requested) information about compiler optimization (options).
You'll have to figure out what flags you're using, especially for MSVC and Intel (GCC release build should default to -O2), but here are the links:
GCC
MSVC
Intel
This is about as close as you'll get before disassembling your binary after compilation.
It depends on the level of of optimization you choose for compiler.
you can find a very nice article about it here
First of all, if optimization took place then your program should work faster usually. After that you could inspect disassembly code to find out what kind of optimizations were performed.
I don't know anything about VC8.0, so I'm not sure how you would access that information. However, if you are generally interested in the kinds of optimisations that go on and want to experiment, I recommend you use LLVM. You can look at the unoptimised, disassembled byte code generated from the default C front end, and then run various optimiser passes over it to see what the effect is each time. Because it's a nicer, abstract assembly code, it tends to be a little easier to figure out what is an optimisation derivable from the code and what is a machine-specific optimisation.
Like does VC8.0 convert if-else statements to switch-case?
Compilers do not do magically rewrite your source code. And even if they did, what would that tell you? What you really would want to know is if the compiler compiled it into a jump table or into multiple compare operations. Any dis-assembler will tell you that.
To clarify my point: Writing a switch-case statement does not necesseraly imply that there will be a jump table in the binary. Not needing to worry about this is the whole point of having compilers.
Instead of figuring out which optimizations are done by the compiler in general, it's probably better to NOT have any dependencies on such compiler-specific knowledge.
Instead start out with a good design and algorithm, writing (as much as possible) portable code that's easy to follow. Then profile the code if it's too slow and fix the actual hotspots. Compiler optimizations are useful no doubt, but better is to apply some investigation to what's actually happening in the code. Algorithmic/design improvements at the source level will typically help performance more than the presence or absence of optimizations like transforming if/else into switch-case.
I'm not sure what "convert if/else to switch/case" means. My processor doesn't have a hardware switch/case instruction.
Typical compilers have several different ways to implement switch/case. A well-known one is using a jump table, but this is only done if appropriate.
For if/else, certainly it is normal for compilers to analyse a digraph of execution flow. I would expect a compiler to notice if each condition references the same variable, and I would expect the compiler to treat equivalent forms of conditionals the same way in general. But this isn't something I'd worry about.
IIRC, the general policy in GCC is that regressions in optimisation are tolerable so long as preferred improvements result. Optimisation is complex and what is "generally" a good optimisation isn't always that great. Plus for perfect optimisation, the compiler would have to know things it can't know (e.g. what inputs it will encounter in real life).
The point is that it really isn't worthwhile knowing that much about specific optimisations unless you happen to be a compiler developer. If you depend on something being optimised by V8, that particular optimisation might not happen in V9 or V10.
I have a section of my program that contains a large amount of math with some rather long equations. Its long and unsightly and I wish to replace it with a function. However, chunk of code is used an extreme number of times in my code and also requires a lot of variables to be initialized.
If I'm worried about speed, is the cost of calling the function and initializing the variables negligible here or should i stick to directly coding it in each time?
Thanks,
-Faken
Most compilers are smart about inlining reasonably small functions to avoid the overhead of a function call. For functions big enough that the compiler won't inline them, the overhead for the call is probably a very small fraction of the total execution time.
Check your compiler documentation to understand it's specific approach. Some older compilers required or could benefit from hints that a function is a candidate for inlining.
Either way, stick with functions and keep your code clean.
Are you asking if you should optimize prematurely?
Code it in a maintainable manner first; if you then find that this section is a bottleneck in the overall program, worry about tuning it at that point.
You don't know where your bottlenecks are until you profile your code. Anything you can assume about your code hot spots is likely to be wrong. I remember once I wanted to optimize some computational code. I ran a profiler and it turned out that 70 % of the running time was spent zeroing arrays. Nobody would have guessed it by looking at the code.
So, first code clean, then run a profiler, then optimize the rough spots. Not earlier. If it's still slow, change algorithm.
Modern C++ compilers generally inline small functions to avoid function call overhead. As far as the cost of variable initialization, one of the benefits of inlining is that it allows the compiler to perform additional optimizations at the call site. After performing inlining, if the compiler can prove that you don't need those extra variables, the copying will likely be eliminated. (I assume we're talking about primitives, not things with copy constructors.)
The only way to answer that is to test it. Without knowing more about the proposed function, nobody can really say whether the compiler can/will inline that code or not. This may/will also depend on the compiler and compiler flags you use. Depending on the compiler, if you find that it's really a problem, you may be able to use different flags, a pragma, etc., to force it to be generated inline even if it wouldn't be otherwise.
Without knowing how big the function would be, and/or how long it'll take to execute, it's impossible guess how much effect on speed it'll have if it isn't generated inline.
With both of those being unknown, none of us can really guess at how much effect moving the code into a function will have. There might be none, or little or huge.