Test for a program I'm programming - c++

Hey!
I would like to create a test that can find the complexity (time & space) of the program.
function by function...
I thought of doing this with the "time" library, counting seconds while running the functions for a large range of "n".
Does anyone have a better idea? Maybe it already exists? :)
Thanks!
Amihay

Looks like a perfectly reasonable approach, for the time complexity at least. Make sure that your program outputs in a useful format, for example CSV or tab separated, so that you can easily copy/load this into a spreadsheet.
Space complexity might be a bit more tricky to get reliably. For this, you might want to modify your functions so that they return a useful metric. For example, if the main data structure of your algorithm is a map of fixed elements, then returning the maximum size of the map during the run would give you enough information.
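If you go the timing route, here is a minimal sketch of such a harness (the work() function is a hypothetical stand-in for whatever you are testing) using <chrono> and printing CSV, as suggested above:

#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// hypothetical function under test: sums a vector of size n
long long work(const std::vector<int>& v)
{
    long long s = 0;
    for (int x : v) s += x;
    return s;
}

int main()
{
    std::puts("n,seconds");
    for (std::size_t n = 1000; n <= 1000000; n *= 10) {
        std::vector<int> input(n, 1);
        auto start = std::chrono::steady_clock::now();
        volatile long long sink = work(input);  // volatile keeps the call from being optimized away
        (void)sink;
        auto stop = std::chrono::steady_clock::now();
        std::chrono::duration<double> elapsed = stop - start;
        std::printf("%zu,%f\n", n, elapsed.count());
    }
}

Plotting seconds against n in a spreadsheet then lets you eyeball whether the growth looks linear, quadratic, and so on.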

Write some tests and do performance profiling. Of course, you can write your own timing functions, but that is not how it is usually done. A good profiler will provide you with all kinds of information you can imagine.
Check out this tutorial on MSDN about profiling.

Related

How can I compute another bit pattern with identical SHA-160 sum?

I want to test how my application behaves when it is tricked by a forged SHA-160 sum, so I would like to compute a change to the data being summed that still produces the original SHA-160 sum and would therefore go unnoticed. I am using the Botan library in C++ to compute the sum.
How can I compute a change to a bit stream that is around 1500 bits such that its SHA-160 is identical to the original?
The short answer is: you can't.
The long answer is: you can, but only with vast amounts of computation power. The entire purpose of hash algorithms is to make it hard to find collisions. If it were easy to find a collision, then there'd be little point in using the hash algorithm.
To solve your test problem, I suggest abstracting away the file-reading/hash-computing part of your application into a separate class, and then mocking it with a fake hash implementation in order to test the rest of the application.
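A minimal sketch of that abstraction (the names here are illustrative, not Botan's API) might look like this; the test injects a fake that reports whatever digest the test wants the application to believe, so the "tricked by a false sum" path can be exercised without actually finding a SHA-160 collision:

#include <cstdint>
#include <vector>

struct HashProvider {
    virtual ~HashProvider() = default;
    virtual std::vector<std::uint8_t> digest(const std::vector<std::uint8_t>& data) = 0;
};

// The production implementation would wrap Botan's SHA-160 behind this interface.

// Test double: always reports the digest the test chooses.
struct FakeHash : HashProvider {
    std::vector<std::uint8_t> forced;
    explicit FakeHash(std::vector<std::uint8_t> d) : forced(std::move(d)) {}
    std::vector<std::uint8_t> digest(const std::vector<std::uint8_t>&) override { return forced; }
};

// Application logic under test only ever sees the interface.
bool verify(HashProvider& h, const std::vector<std::uint8_t>& data,
            const std::vector<std::uint8_t>& expected)
{
    return h.digest(data) == expected;
}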

How to generate good random seed for a random generator?

I certainly can't use the random generator for that. Currently I'm creating a CRC32 hash from unixtime()+microtime().
Are there any smarter methods than hashing time()+microtime()?
I am not fully satisfied with the results, though. I expected it to be more random, but I can see strong patterns in it; adding more calls to MicroTime() helped, but it gets a lot slower, so I'm looking for a better way of doing this.
This silly code generates the best output I've managed so far; the extra calculations were necessary or I could see patterns in the output:
starthash(crc32);
addtohash(crc32, MicroTime());
addtohash(crc32, time(NULL)); // 64bit
addtohash(crc32, MicroTime()/13.37f);
addtohash(crc32, (10.0f-MicroTime())*1337.0f);
addtohash(crc32, (11130.0f-MicroTime())/1313137.0f);
endhash(crc32);
MicroTime() returns microseconds elapsed from program start. I have overloaded the addtohash() to every possible type.
I would prefer non-library solutions; it's probably only ~10 lines of code anyway. I don't want to install a huge library for something I don't actually need that much, and I'm more interested in the code itself than in just calling it from a library.
If in any doubt, get your seed from CryptGenRandom on Windows, or by reading from /dev/random or /dev/urandom on *NIX systems.
This might be overkill for your purposes, but unless it causes performance problems there's no point messing with low-entropy sources like the time.
It's unlikely to be underkill. And if you're writing code with a real need for high-quality secure random data, and didn't bother mentioning that in the question, well, you get what you deserve ;-)
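For reference, a minimal sketch of seeding from the OS on *NIX (it assumes /dev/urandom is readable; on Windows you would call CryptGenRandom instead, and std::random_device is a portable wrapper that usually draws on one of these sources anyway):

#include <cstdint>
#include <cstdio>
#include <stdexcept>

std::uint32_t os_seed()
{
    std::FILE* f = std::fopen("/dev/urandom", "rb");
    if (!f) throw std::runtime_error("cannot open /dev/urandom");
    std::uint32_t seed = 0;
    if (std::fread(&seed, sizeof seed, 1, f) != 1) {  // read 4 random bytes
        std::fclose(f);
        throw std::runtime_error("short read from /dev/urandom");
    }
    std::fclose(f);
    return seed;
}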
You can look into LFSRs and other pseudorandom generators. Usually an LFSR is a hardware solution, but you can easily implement your own software LFSR.
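For illustration, a minimal sketch of a 32-bit Galois LFSR (assuming the maximal-length taps 32, 22, 2, 1); it is not cryptographically secure, it just spreads a time-based seed into a longer, better-distributed stream:

#include <cstdint>

struct Lfsr32 {
    std::uint32_t state;                      // must be non-zero
    explicit Lfsr32(std::uint32_t seed) : state(seed ? seed : 1u) {}
    std::uint32_t next()
    {
        std::uint32_t lsb = state & 1u;       // bit that will be shifted out
        state >>= 1;
        if (lsb) state ^= 0x80200003u;        // tap mask for taps 32, 22, 2, 1
        return state;
    }
};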

When should I use ASM calls?

I'm planning on writing a game with C++, and it will be extremely CPU-intensive (pathfinding, genetic algorithms, neural networks, ...)
So I've been thinking about how to tackle this situation best so that it would run smoothly.
(treat this top section of the question as side information; I don't want it to restrict the main question, but side notes on it would be nice as well)
Is it worth it to learn how to work with ASM, so I can make ASM calls in C++,
can it give me a significant/notable performance advantage?
In what situations should I use it?
Almost never:
You only want to be using it once you've profiled your C++ code and have identified a particular section as a bottleneck.
And even then, you only want to do it once you've exhausted all C++ optimization options.
And even then, you only want to be using ASM for tight, inner loops.
And even then, it takes quite a lot of effort and skill to beat a C++ compiler on a modern platform.
If you're not an experienced assembly programmer, I doubt you will be able to optimize assembly code better than your compiler.
Also note that assembly is not portable. If you decide to go this way, you will have to write different assembly for all the architectures you decide to support.
Short answer: it depends, most likely you won't need it.
Don't start optimizing prematurely. Write code that is also easy to read and to modify. Separate logical sections into modules. Write something that is easy to extend.
Do some profiling.
You can't tell where your bottlenecks are unless you profile your code. 99% of the time you won't get that much performance gain by writing asm. There's a high chance you might even worsen your performance. Optimizers nowadays are very good at what they do. If you do have a bottleneck, it will most probably be because of some poorly chosen algorithm or at least something that can be remedied at a high-level.
My suggestion is, even if you do learn asm, which is a good thing, don't do it just so you can optimize.
Profile profile profile....
A legitimate use case for going low-level (although sometimes a compiler can infer it for you) is to make use of SIMD instructions such as SSE. I would assume that at least some of the algorithms you mention will benefit from parallel processing.
However, you don't need to write actual assembly; instead you can simply use intrinsic functions. See, e.g., this.
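As a minimal sketch of what that looks like (plain SSE, adding two float arrays four elements at a time; the names and the unaligned loads are just illustrative choices):

#include <xmmintrin.h>  // SSE intrinsics
#include <cstddef>

void add_arrays(const float* a, const float* b, float* out, std::size_t n)
{
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);            // unaligned load of 4 floats
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb)); // 4 additions in one instruction
    }
    for (; i < n; ++i)                              // scalar tail
        out[i] = a[i] + b[i];
}

The compiler may well auto-vectorize a loop this simple on its own; intrinsics pay off in the cases it cannot prove safe.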
Don't get ahead of yourself.
I've posted a sourceforge project showing how a simulation program was massively sped up (over 700x).
This was not done by assuming in advance what needed to be made fast.
It was done by "profiling", which I put in quotes because the method I use is not to employ a profiler.
Rather I rely on random pausing, a method known and used to good effect by some programmers.
It proceeds through a series of iterations.
In each iteration a large source of time-consumption is identified and fixed, resulting in a certain speedup ratio.
As you proceed through multiple iterations, these speedup ratios multiply together (like compound interest).
That's how you get major speedup.
If, and only if, you get to a point where some code is taking a large fraction of time, and it doesn't contain any function calls, and you think you can write assembly code better than the compiler does, then go for it.
P.S. If you're wondering, the difference between using a profiler and random pausing is that profilers look for "bottlenecks", on the assumption that those are localized things. They look for routines or lines of code that are responsible for a large percent of overall time.
What they miss is problems that are diffuse.
For example, you could have 100 routines, each taking 1% of time.
That is, no bottlenecks.
However, there could be an activity being done within many or all of those routines, accounting for 1/3 of the time, that could be done better or not at all.
Random pausing will see that activity with a small number of samples, because you don't summarize, you examine the samples.
In other words, if you took 9 samples, on average you would notice the activity on 3 of them.
That tells you it's big.
So you can fix it and get your 3/2 speedup ratio.
"To understand recursion, you must first understand recursion." That quote comes to mind when I consider my response to your question, which is "until you understand when to use assembly, you should never use assembly." After you have completely implemented your soution, extensively profiled its performance and determined precise bottlenecks, and experimented with several alternative solutions, then you can begin to consider using assembly. If you code a single line of assembly before you have a working and extensively profiled program, you have made a mistake.
If you need to ask, then you don't need it.

How to make large calculations program faster

I'm implementing a compression algorithm. Thing is, it is taking a second for a 20 KiB file, which is not acceptable. I think it's slow because of the calculations.
I need suggestions on how to make it faster. I have some tips already, like shifting bits instead of multiplying, but I really want to be sure of which changes actually help because of the complexity of the program. I also accept suggestions concerning compiler options, I've heard there is a way to make the program do faster mathematical calculations.
Common operations are:
pow(...) function from the math library
large number % 2
large-number multiplication
Edit: the program has no floating point numbers
The question of how to make things faster should not be asked here of other people, but rather of a profiler in your environment. Use the profiler to determine where most of the time is spent; that will hint at which operations need to be improved. Then, if you don't know how to improve them, ask about those specific operations. It is almost impossible to say what you need to change without knowing your original code, and the question does not provide enough information:
pow(...) function: what are the arguments to the function? Is the exponent fixed? How much precision do you need? Can you replace the function with something that yields a similar result?
large number: how large is "large", and what is a "number" in this context? Integers? Floating point?
Your question is very broad; without enough information to give you concrete advice, we have to make do with a general roadmap.
What platform, what compiler? What is "large number"? What have you done already, what do you know about optimization?
Test a release build with optimization (/Ox /LTCG in Visual C++, -O3 IIRC for gcc)
Measure where time is spent - disk access, or your actual compression routine?
Is there a better algorithm, and code flow? The fastest operation is the one not executed.
For a 20 KiB file, the memory working set should not be an issue (unless your compression requires large data structures), so code optimization is indeed the next step.
A modern compiler implements a lot of optimizations already, e.g. replacing a division by a power-of-two constant with a bit shift.
pow is very slow for native integers.
If your code is well written, you may try posting it; maybe someone's up to the challenge.
Hints :-
1) Modulo 2 depends only on the last bit, so x % 2 can be computed as x & 1 (for non-negative x).
2) Power functions can be implemented in O(log n) time, where n is the exponent (the math library should be fast enough, though). Also, for fast powers you may check this out; see the sketch after these hints.
If nothing works, just check if there exists some fast algorithm.
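A minimal sketch of hint 2, exponentiation by squaring for integer bases (O(log n) multiplications), together with the bit test from hint 1:

#include <cstdint>

std::uint64_t ipow(std::uint64_t base, unsigned exp)
{
    std::uint64_t result = 1;
    while (exp != 0) {
        if (exp & 1u)       // current bit of the exponent set?
            result *= base;
        base *= base;       // square for the next bit
        exp >>= 1;
    }
    return result;
}

bool is_odd(std::uint64_t x) { return (x & 1u) != 0; }  // replaces x % 2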

Are there any tools for tracking down bloat in C++?

A carelessly written template here, some excessive inlining there - it's all too easy to write bloated code in C++. In principle, refactoring to reduce that bloat isn't too hard. The problem is tracing the worst offending templates and inlines - tracing those items that are causing real bloat in real programs.
With that in mind, and because I'm certain that my libraries are a bit more bloat-prone than they should be, I was wondering if there are any tools that can track down those worst offenders automatically - i.e. identify those items that contribute most (including all their repeated instantiations and calls) to the size of a particular target.
I'm not much interested in performance at this point - it's all about the executable file size.
Are there any tools for this job, usable on Windows, and fitting with either MinGW GCC or Visual Studio?
EDIT - some context
I have a set of multiway-tree templates that act as replacements for the red-black tree standard containers. They are written as wrappers around non-typesafe non-template code, but they were also written a long time ago, partly as a "will better cache friendliness boost real performance" experiment. The point being, they weren't really written for long-term use.
Because they support some handy tricks, though (search based on custom comparisons/partial keys, efficient subscripted access, search for smallest unused key) they ended up being in use just about everywhere in my code. These days, I hardly ever use std::map.
Layered on top of those, I have some more complex containers, such as two-way maps. On top of those, I have tree and digraph classes. On top of those...
Using map files, I could track down whether non-inline template methods are causing bloat. That's just a matter of finding all the instantiations of a particular method and adding the sizes. But what about unwisely inlined methods? The templates were, after all, meant to be thin wrappers around non-template code, but historically my ability to judge whether something should be inlined or not hasn't been very reliable. The bloat impact of those template inlines isn't so easy to measure.
I have some idea which methods are heavily used, but that's the well-known optimization-without-profiling mistake.
Check out Symbol Sort. I used it a while back to figure out why our installer had grown by a factor of 4 in six months (it turns out the answer was static linking of the C runtime and libxml2).
Map file analysis
I have seen a problem like this some time ago, and I ended up writing a custom tool which analysed map file (Visual Studio linker can be instructed to produce one). The tool output was:
list of function sorted descending by code size, listing only first N
list of source files sorted descending by code size, listing only first N
Parsing a map file is relatively easy (a function's code size can be computed as the difference between the addresses on the current and following lines); the hardest part is probably handling mangled names in a reasonable way. You might find some ready-to-use libraries for both of these; I did it a few years ago and I do not know the current situation.
Here is a short excerpt from a map file, so that you know what to expect:
Address Publics by Value Rva+Base Lib:Object
0001:0023cbb4 ?ApplyScheme@Input@@QAEXPBVParamEntry@@@Z 0063dbb4 f mainInput.obj
0001:0023cea1 ?InitKeys@Input@@QAEXXZ 0063dea1 f mainInput.obj
0001:0023cf47 ?LoadKeys@Input@@QAEXABVParamEntry@@@Z 0063df47 f mainInput.obj
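A minimal sketch of that idea, assuming the MSVC layout excerpted above: read the "Publics by Value" lines, take the address column, and estimate each symbol's size as the distance to the next symbol's address (demangling is left out):

#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

struct Symbol { std::string name; std::uint64_t addr; };

int main(int argc, char** argv)
{
    if (argc < 2) { std::cerr << "usage: mapsize <file.map>\n"; return 1; }
    std::ifstream in(argv[1]);
    std::vector<Symbol> syms;
    std::string line;
    while (std::getline(in, line)) {
        // expected form: "0001:0023cbb4  ?Name@Class@@...  0063dbb4 f  object"
        std::istringstream ss(line);
        std::string section, name, rva;
        if (!(ss >> section >> name >> rva)) continue;
        if (section.find(':') == std::string::npos) continue;   // skip header lines
        try { syms.push_back({name, std::stoull(rva, nullptr, 16)}); }
        catch (...) { /* third column was not a hex address */ }
    }
    for (std::size_t i = 0; i + 1 < syms.size(); ++i) {
        std::uint64_t size = syms[i + 1].addr - syms[i].addr;    // rough size estimate
        std::cout << size << '\t' << syms[i].name << '\n';
    }
}

Sort the output descending and the worst offenders (and their repeated template instantiations) float to the top.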
Symbol Sort
As posted in Ben Staub's answer, Symbol Sort is a ready-to-use command line utility (it comes with complete C# source) which does all of this, with the only difference that it analyses pdb/exe files rather than map files.
So what I'm reading based on your question and your comments is that the library is not actually too large.
The only tool you need to determine this is a command shell, or Windows File explorer. Look at the file size. Is it so big that it causes real actual problems? (Unacceptable download times, won't fit in memory on the target platform, that kind of thing)?
If not, then you should worry about code readability and maintainability and nothing else. And the tool for that is your eyes. Read the code, and take the actions needed to make it more readable if necessary.
If you can point to an actual reason why the executable size is a problem, please edit that into your question, as it is important context.
However, assuming the file size is actually a problem:
Inlined functions are generally not a problem, because the compiler, and no one else, chooses which functions to inline. Simply marking something inline does not inline the actual generated code. The compiler inlines if it determines the trade-off between larger code and less indirection to be worth it. If a function is called often, it will not be inlined, because that would dramatically affect code size, which would hurt performance.
If you're worried that inlined functions cause code bloat, simply compile with the "optimize for size" flag. Then the compiler will restrict inlining to the cases where it doesn't affect executable size noticeably.
For finding out which symbols are biggest, parse the map file as @Suma suggested.
But really, you said it yourself when you mentioned "the well-known optimization-without-profiling mistake."
The very first act of profiling you need to do is to ask: is the executable size actually a problem? In the comments you said that you "have a feeling", which, in a profiling context, is useless and can be translated into "no, the executable size is not a problem".
Profile. Gather data and identify trouble spots. Before worrying about how to bring down the executable size, find out what the executable size is, and identify whether or not that is actually a problem. You haven't done that yet. You read in a book that "code bloat is a problem in C++", and so you assume that code bloat is a problem in your program. But is it? Why? How do you determine that it is?
http://www.sikorskiy.net/prj/amap/index.html
This is a wonderful GUI tool for analysing object file and library sizes from a Visual Studio compiler map file. The tool analyses the map file and generates a report from it; you can also apply filters, and it displays sizes dynamically. Just feed it the map file generated for your DLL/EXE and it will list which functions occupy how much space; you can sort by size as well. Check out the screenshots on the page above.
Basically, you are looking for costly things that you don't need. Suppose there is some category of functions that you don't need taking some large percent of the space, like 20%. Then if you picked 20 random bytes out of the image size, on the average 4 of them (20 * 20%) will be in that category, and you will be able to see them. So basically, you take those samples, look at them, and if you see an obvious pattern of functions that you don't really need, then remove them. Then do it again because other categories of routines that used less space are now taking a higher percentage.
So I agree with Suma that parsing the map file is a good start. Then I would write a routine to walk through it, and every 5% of the way (space-wise) print the routine I am in. That way I get 20 samples. Often I find that a large chunk of object space results from a very small number (like 1) of lines of source code that I could easily have done another way.
You are also worried about too much inlining making functions larger than they could be. To figure that out, I would take each of those sample, and since it represents a specific address in a specific function, I would trace that back to the line of code it is in. That way, I can tell if it is in an expanded function. This is a bit of work, but doable.
A similar problem is how to find tumors when disks get full. The same idea there is to walk the directory tree, adding up the file sizes. Then you walk it again, and as you pass each 5% point, you print out the path of the file you are in. This tells you not only whether you have large files, it tells you whether you have large numbers of small files, and it doesn't matter how deeply they are buried or how widely they are scattered. When you clean out one category of files that you don't need, you can do it again to get the next category, and so on.
Good luck.
Your question seems to tend towards run-time rather than compile-time bloat.
However, if compile-time bloat (plus binary bloat resulting from inefficient compilation) is relevant, then I have to mention the clang tool IWYU (include-what-you-use).
Since IWYU will likely manage to remove quite a number of #includes from your code, this should also help reduce binary bloat. At least for my own environment I can certainly confirm a useful reduction in build time.