How should I compare a C++ metaprogram with C code? (runtime) - c++

I have ported a C program to a C++ template metaprogram. Now I want to compare the runtimes.
Since there is almost no runtime work in the C++ program, how should I compare these two programs?
Can I compare the C runtime with the C++ compile time, or are they simply not comparable?

You can compare anything you want to compare. There is no one true rule of what should be compared.
You can compare the time each version takes to execute, or you can compare the time taken to compile each.
Or you can compare the length of the program, or the number of 'r' characters in the source file.
You could compare the timestamp of each file.
How you should compare the two programs depends on what you want to show!
If you want to show that one executes faster than the other, then run both, time how long they take to execute, and compare those numbers.
If you want to show that one compiles faster than the other, then measure the time it takes to compile each.
If you think the relation between the compile time of the C++ program and the run time of the C program is relevant, then compare those.
Decide what it is you want to show. Then you'll know what to compare.

If I understand correctly, you've replaced a C program with one that is entirely template-based? As a result, you're comparing the time it takes to run the C program with a C++ program that does almost no work at runtime but simply writes the result out.
In this case, I don't think it's quite comparable - the end user will see the C program take x seconds to run, and the C++ one complete immediately. However, the developer will see the C program compile in x seconds, and the C++ one compile in many more seconds.
You could compare the C++ compile time to the C run time, and if the app is designed to produce a result once and never run twice, then yes, you can compare the times in this way. If the program is designed to be run multiple times, then the run time is what you need to compare.
I just hope you put a LOT of comments in your C++ template code though :)
PS. I'm curious - how long does the C take to run, compared to the compile time for both?

Since the C++ program will always produce the same result, why bother with any of it? Compute the result once using either program, and then replace both with:
#include <stdio.h>

int main(void)
{
    printf("<insert correct output here>\n");
    return 0;
}

I think what would make sense is to compare compile times of the two programs, then runtimes, then you can calculate after how many runs you have amortized the additional compile time.

This is what I think you're trying to do:
You haven't said what your C program does, so let's say it computes a cosine to some specified degree of accuracy. You've converted this program into a C++ template-based equivalent which does the same thing, but at compile time, to yield a compile-time constant value. This is a reasonable thing to do, as you may have an algorithm that uses "hard-coded" cosine values and you prefer not to have a table of random-looking numbers. See this article for an example of real-world use (or do a search for Blitz and/or Todd Veldhuizen for more examples).
In that case, you want to compare the compile-time performance of the C++ cosine calculator against the run-time performance of the original C version.
A direct comparison of the time to compile the C++ source file against the time to run the C version will almost certainly show the compile time to be significantly slower. But this is hardly a fair comparison, since the compiler is doing a lot more than just "executing" the template code.
EDIT: You could compensate for the compiler overhead by creating a copy of your C++ program which has some simple code equivalent to what the templated code would generate - i.e. you have to hand-compile your templated code, if that makes sense. If you then time the compilation of that source, the difference between that time and the time to compile your original templated C++ program is presumably just the time required to "execute" the template.

Today's C and C++ compilers share the same backends, and hence most likely generate the same assembly code.
C++ is just a more annotated C, and you can still write good C while Cplusplusing ;)
C is just C++'s older brother.

Related

Compiling C++ code to .EXE which returns double [duplicate]

This question already has answers here:
What should main() return in C and C++?
I am working with a MATLAB optimization platform for black-box cost functions (BBCF).
To give the user a free hand, the BBCF can be any executable file which takes the BBCF's input parameters and must output (return) the cost value, so that the MATLAB optimizer can find the best (least-cost) input parameters.
Considering that, on the one hand, my BBCF is implemented in C++, and on the other hand, the cost value is a double (real number), I need to compile my code to an EXE file that outputs (returns) a double.
But, to the best of my knowledge, when I compile C++ code to an EXE, it "mainly" compiles the main() function, and its output is the return value of main() (i.e. 0 if the run was successful).
An idea could be using a main function that returns double, and then compiling such a main() to an EXE, but, unfortunately, that is not possible in C++ (as explained in this link, or as claimed in the 3rd answer to this question to be a bug of C++, neither of which is the business of this question).
Can anyone provide an idea by which the compiled EXE form of C++ code outputs (returns) a double value?
This is not 'a bug in C++' (by the way, the bug might be in some C++ compiler, not in the language itself) - it's specified in the standard that main() shall return an int:
http://en.cppreference.com/w/cpp/language/main_function
Regarding how to return a non-int from an executable, there are a couple of ways to do that. Two simplest (in terms of how to implement them) solutions come to my mind:
Save it to a file. Then either monitor that file in MATLAB for changes (e.g. compare timestamps) or read it after each execution of your EXE file, depending on how you're going to use it. Not a very efficient solution, but it does the job, and the performance penalty is probably negligible compared to your other calculations.
If you are fine with your cost value losing some numerical accuracy, you can just multiply the double value by some number (the larger this number, the more decimal places you will retain). Then round it, cast it to an int, have it returned from main(), cast it back to double in MATLAB and divide by the same number. The number used as the multiplier should be a power of 2 so that it doesn't introduce additional rounding errors. This method might be particularly useful if your cost value is limited to the range [0, 1], or if you can normalize it to that range and you know that variations below some threshold are unimportant.
In English, 'shall' is an even stronger imperative than 'must'.
Making a change like this would require changes to the operating system and shell. Such changes are unlikely to happen.
The easiest way to pass a double return would be to write it to standard output. Alternatively, there are several methods available for interprocess communication.

Optimizing if-then-else statement in Fortran 77

For my C++ code, I asked this question about two days ago. But I realize now that I have to do the coding in Fortran, since the kernels I write are going to be part of an existing application written in Fortran 77. Therefore I am posting this question again, this time in a Fortran context. Thank you.
I have different functions for square matrix multiplication depending on matrix size which varies from 8x8 through 20x20. The functions differ from each other because each employ different strategies for optimization, namely, different loop permutations and different loop unroll factors. Matrix size is invariant during the life of a program, and is known at compile time. My goal is to reduce the time to decide which function must be used. For example, a naive implementation is:
if (matrixSize == 8) C = mxm8(A, B);
else if (matrixSize == 9) C = mxm9(A,B);
...
else if (matrixSize == 20) C = mxm20(A,B);
The time taken to decide which function to use for every matrix multiplication is non-trivial in this case, especially since matrix multiplication happens frequently in the code. Thanks in advance for any suggestions on how to handle this in Fortran 77.
If matrixSize is a compile time constant in a language sense (i.e. it is a Fortran PARAMETER), then I would expect most optimising compilers to take advantage of that, and completely eliminate a runtime branch.
If matrixSize is not a compile time constant, then you should make it one. Facilities provided in later Fortran language revisions (modules) make it very easy to propagate such a runtime constant from a single point of definition to a point of use.
Note that conforming Fortran 77 is also conforming Fortran 90, and with very few exceptions, will also be conforming Fortran 2015.
If it is known at compile time, then you only need one version of this function. It seems like you could just put each version of the function in its own object file or library, and then link to the appropriate one.
If you meant to say it is known at run time, but does not change over the course of an execution, then you could have 13 versions of the code, one for each size, and use one set of ifs to decide which to use.

C++ array to Halide Image (and back)

I'm getting started with Halide, and whilst I've grasped the basic tenets of its design, I'm struggling with the particulars (read: magic) required to efficiently schedule computations.
I've posted below a MWE of using Halide to copy an array from one location to another. I had assumed this would compile down to only a handful of instructions and take less than a microsecond to run. Instead, it produces 4000 lines of assembly and takes 40ms to run! Clearly, therefore, I have a significant hole in my understanding.
What is the canonical way of wrapping an existing array in a Halide::Image?
How should the function copy be scheduled to perform the copy efficiently?
Minimal working example
#include <Halide.h>
using namespace Halide;
void _copy(uint8_t* in_ptr, uint8_t* out_ptr, const int M, const int N) {
Image<uint8_t> in(Buffer(UInt(8), N, M, 0, 0, in_ptr));
Image<uint8_t> out(Buffer(UInt(8), N, M, 0, 0, out_ptr));
Var x,y;
Func copy;
copy(x,y) = in(x,y);
copy.realize(out);
}
int main(void) {
uint8_t in[10000], out[10000];
_copy(in, out, 100, 100);
}
Compilation Flags
clang++ -O3 -march=native -std=c++11 -Iinclude -Lbin -lHalide copy.cpp
Let me start with your second question: _copy takes a long time because it needs to compile the Halide code to x86 machine code. IIRC, Func caches the machine code, but since copy is local to _copy, that cache cannot be reused. Anyway, scheduling copy is pretty simple because it's a pointwise operation: first, it would probably make sense to vectorize it. Second, it might make sense to parallelize it (depending on how much data there is). For example:
copy.vectorize(x, 32).parallel(y);
will vectorize along x with a vector size of 32 and parallelize along y. (I am making this up from memory, there might be some confusion about the correct names.) Of course, doing all this might also increase compile times...
There is no recipe for good scheduling. I do it by looking at the output of compile_to_lowered_stmt and profiling the code. I also use the AOT compilation provided by Halide::Generator, this makes sure that I only measure the runtime of the code and not the compile time.
Your other question was how to wrap an existing array in a Halide::Image. I don't do that, mostly because I use AOT compilation. However, internally Halide uses a type called buffer_t for everything image-related. There is also a C++ wrapper called Halide::Buffer that makes using buffer_t a little easier; I think it can also be used in Func::realize instead of Halide::Image. The point is: if you understand buffer_t, you can wrap almost everything into something digestible by Halide.
To emphasize the first thing Florian mentioned, which I think is the key point of misunderstanding here: you appear to be timing the compilation of the copy operation ("pipeline," in common Halide terms), not just its execution. Your code size estimate is presumably also for the whole binary resulting from copy.cpp, not just the code in the Halide-generated copy function (which won't actually even appear in the binary you're compiling with clang, since it is only constructed by JITing at runtime in this program).
You can observe the actual cost of your pipeline here by first calling copy.compile_jit() before realize (realize implicitly calls compile_jit the first time it is run, so it's not necessary, but it's valuable to factor apart the runtime from the compile overhead). You would then put your timer exclusively around realize.
If you actually want to pre-compile this (or any other) pipeline for static linking into your ultimate program, which is what it seems you might be expecting, what you really want to do is use Func::compile_to_file in one program to compile and emit the code (as copy.h and copy.o), and then link and call these in another program. Check out tutorial lesson 10 to see this in more detail:
https://github.com/halide/Halide/blob/master/tutorial/lesson_10_aot_compilation_generate.cpp https://github.com/halide/Halide/blob/master/tutorial/lesson_10_aot_compilation_run.cpp

Is it possible to convert all regular programming tasks to compile time using meta-programming?

I read about metaprogramming and found it really interesting. For example, checking whether a number is prime, or calculating a Fibonacci number... I'm curious about its practical usage: if we could convert every runtime solution to metaprogramming, applications would perform much better. Let's say we want to find the max value of an array. At run time this takes O(n) if the array is not sorted. Is it possible to get O(1) with metaprogramming?
Thanks,
Chan
You can't because metaprogramming only works for inputs that are known at compile time. So you can have a metafunction that calculates a Fibonacci number given a constant known at compile time:
int value = Fibonacci<5>::Value;
But it won't work for values that are inputted by a user at runtime:
int input = GetUserInput();
int value = Fibonacci<input>::Value; // Does not compile
Sure, you can recompile the program every time you get new values, but that becomes impractical for non-trivial programs.
Keep in mind that metaprogramming in C++ is basically a "useful accidental abuse" of the way C++ handles templates. Template metaprogramming was definitely not what the C++ standards committee had in mind when creating the C++ standards prior to C++0x. You can only push the compiler so much until you get internal compiler errors (that has changed nowadays with newer compilers, but you still shouldn't go overboard).
There's an (advanced-level) book dedicated to C++ template metaprogramming if you want to see what they are really useful for.
If it ain't known when you hit the compile button, then it won't be solvable by meta-programming.
If you're talking about processing data known at compile-time (as opposed to known at run-time), then theoretically, yes.
In practice, no. Any non-trivial task quickly becomes a tangled nightmare of impenetrable template code, giving even more impenetrable error messages when they fail to compile. Furthermore, most C++ compilers can only tolerate a certain depth of template nesting before they explode.
Sure. Of course, you can't make any system calls, all users will need a compiler to run your program, user input will have to take the form of defining constant expressions, but yeah...if you really, really wanted to you could write just about any program in C++ template code so that it 'runs' during compilation rather than runtime.

What makes EXE's grow in size?

My executable was 364KB in size. It did not use a Vector2D class so I implemented one with overloaded operators.
I changed most of my code from
point.x = point2.x;
point.y = point2.y;
to
point = point2;
This resulted in removing nearly 1/3 of my lines of code and yet my exe is still 364KB. What exactly causes it to grow in size?
The compiler probably optimised your operator overload by inlining it. So it effectively compiles to the same code as your original example would. So you may have cut down a lot of lines of code by overloading the assignment operator, but when the compiler inlines, it takes the contents of your assignment operator and sticks it inline at the calling point.
Inlining is one of the ways an executable can grow in size. It's not the only way, as you can see in other answers.
What makes EXE’s grow in size?
External libraries, especially static libraries and debugging information, total size of your code, runtime library. More code, more libraries == larger exe.
To reduce the size of the exe, you need to process the exe with the GNU strip utility, get rid of all static libraries, get rid of the C/C++ runtime libraries, disable all runtime checks and turn on compiler size optimizations. Working without the CRT is a pain, but it is possible. There is also a wcrt (alternative C runtime) library created for making small applications (although it hasn't been updated/maintained in the last 5 years).
The smallest exe that I was able to create with the msvc compiler is somewhere around 16 kilobytes. It was a Windows application that displayed a single window and required msvcrt.dll to run. I modified it a bit and turned it into a practical joke that wipes out the picture on the monitor.
For impressive exe size reduction techniques, you may want to look at .kkrieger. It is a 3D first-person shooter, 96 kilobytes total. The game has a large and detailed level, supports shaders, real-time shadows, etc., i.e. it is comparable with Sauerbraten (see screenshots). The smallest working Windows application (a 3D demo with music) I ever encountered was 4 kilobytes big, and used compression techniques and (probably) undocumented features (i.e. the fact that a *.com executable could unpack and launch a win32 exe on Windows XP).
In most cases, the size of an *.exe shouldn't really bother you (I haven't seen a diskette for a few years), as long as it is reasonable (below 100 megabytes). For an example of "unreasonable" file size, see the debug build of Qt 4 for mingw.
This resulted in removing nearly 1/3 of my lines of code and yet my exe is still 364KB.
Most likely it is caused by external libraries used by the compiler, runtime checks, etc.
Also, this is an assignment operation. If you aren't using custom types for x (with a copy constructor), the "copy" operation is very likely to compile to a small number of instructions - i.e. removing 1/3 of your lines doesn't guarantee that the compiled code will be 1/3 smaller.
If you want to see how much impact your modification made, you could "ask" the compiler to produce an asm listing for both versions of the program, then compare the results (manually or with diff). Or you could disassemble and compare both versions of the executable. But I'm certain that using GNU strip or removing extra libraries will have more effect than removing assignment operators.
What type is point? If it's two floats, then the compiler will implicitly do a member-by-member copy, which is the same thing you did before.
EDIT: Apparently some people in today's crowd didn't understand this answer and compensated by downvoting. So let me elaborate:
Lines of code have NO relation to the executable size. The source code tells the compiler what assembly line to create. One line of code can cause hundreds if not thousands of assembly instructions. This is particularly true in C++, where one line can cause implicit object construction, destruction, copying, etc.
In this particular case, I suppose that "point" is a class with two floats, so using the assignment operator will perform a member-by-member copy, i.e. it takes every member individually and copies it. Which is exactly the same thing he did before, except that now it's done implicitly. The resulting assembly (and thus executable size) is the same.
Executables are most often sized in 'pages' rather than discrete bytes.
I think this is a good example of why one shouldn't worry too much about code being too verbose if you have a good optimizing compiler. Instead, always code clearly so that fellow programmers can read your code, and leave the optimization to the compiler.
Some links to look into
http://www2.research.att.com/~bs/bs_faq.html#Hello-world
GCC C++ "Hello World" program -> .exe is 500kb big when compiled on Windows. How can I reduce its size?
http://www.catch22.net/tuts/minexe
As for Windows, lots of compiler options in VC++ may be activated like RTTI, exception handling, buffer checking, etc. that may add more behind the scenes to the overall size.
When you compile a C or C++ program into an executable, the compiler translates your code into machine code, applying optimizations as it sees fit.
But simply: more code = more machine code to generate = a larger executable.
Also, check if you have a lot of static/global objects. They substantially increase your exe size if they are not zero-initialized.
For example:
int temp[100] = {0};
int main()
{
}
size of the above program is 9140 bytes on my linux machine.
If I initialize the temp array with 5 (note that {5} sets only the first element; the rest are zero, but the array is no longer all-zero), then the size will shoot up by around 400 bytes. The size of the program below on my Linux machine is 9588.
int temp[100] = {5};
int main()
{
}
This is because zero-initialized global objects go into the .bss segment, which will be zeroed in one go during program startup, whereas the contents of non-zero-initialized objects are embedded in the exe itself.