I've written a small program to calculate prime numbers using the naive trial-division algorithm. To improve performance, I thought it should only check divisibility against previously detected primes less than or equal to the number's square root.
To do that, I need to keep track of the primes. I've implemented it using dynamic arrays (i.e. using new and delete). Should I use std::vector instead? Which is better in terms of performance? (Maintenance is not an issue.)
Any help would be appreciated. 😊
The ideal answer:
How should any of us know? It depends on your compiler, your OS, your architecture, your standard library implementation, the alignment of the planets...
Benchmark it. Possibly with this. (Haven't used it, but it seems simple enough to use.)
The practical answer:
Use std::vector. Every new and delete you make is a chance for a memory leak, or a double-delete, or otherwise forgetting to do something. std::vector basically does this under the hood anyway. You're more likely to get a sizeable performance boost by maxing out your optimization flags (if you're using gcc, try -Ofast and -march=native).
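For illustration, here is a minimal sketch of that trial-division idea with std::vector doing the bookkeeping (this is not the asker's code; the limit and names are made up):

#include <cstdint>
#include <iostream>
#include <vector>

int main() {
    const std::uint64_t limit = 1000;
    std::vector<std::uint64_t> primes;       // grows on demand, freed automatically
    for (std::uint64_t n = 2; n <= limit; ++n) {
        bool is_prime = true;
        for (std::uint64_t p : primes) {
            if (p * p > n) break;            // only test primes <= sqrt(n)
            if (n % p == 0) { is_prime = false; break; }
        }
        if (is_prime) primes.push_back(n);
    }
    std::cout << primes.size() << " primes found\n";
}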
Also:
Maintenance is not an issue.
Doubt it. Trust me on this one. If nothing else, at least comment your code (but that's another can of worms).
For your purpose, std::vector is probably better: you don't need to manage memory yourself (e.g. growing the array and copying previous results), or reserve too much memory up front to store your results.
Related
I want to make a container with a few elements, and I will just be checking whether an element is part of that set or not.
I know a vector would not be the appropriate container if the set were large, since each lookup would be worst-case O(n) and there are better options that use a hash function or binary trees.
However, I was wondering what happens if my set has few elements (e.g. just 5) and I know that in advance: is it worth implementing the container as a structure with a hash function?
Maybe if the set is small enough, the overhead of applying the hash function is greater than that of iterating through 5 elements.
For example in C++ using std::unordered_set instead of std::vector.
As always, thank you
There are many factors affecting the point where the std::vector falls behind other approaches. See std::vector faster than std::unordered_set? and Performance of vector sort/unique/erase vs. copy to unordered_set for some of the reasons why this is the case. As a consequence, any calculation of this point would have to be rather involved and complicated. The most expedient way to find this point is performance testing.
Keep in mind that some of the factors are based on the hardware being used. So not only do you need performance testing on your development machine, but also performance testing on "typical" end-user machines. Your development machine can give you a gut check, though, (as can an online tool like Quick Bench) letting you know that you are not even in the ballpark yet. (The intuition of individual programmers is notoriously unreliable. Some people see 100 as a big number and worry about performance; others don't worry until numbers hit the billions. Either of these people would be blown away by the other's view.)
Given the difficulty in determining the point where std::vector falls behind, I think this is a good place to give a reminder that premature optimization is usually a waste of time. Investigating this performance out of curiosity is fine, but until this is identified as a performance bottleneck, don't hold up work on more important aspects of your project for this. Choose the approach that fits your code best.
That being said, I personally would assume the break point is well, well over 10 items. So for the 5-element collection of the question, I would be inclined to use a vector and not look back.
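As a rough sketch of the two candidates being compared (the element values are made up; measure on your own data):

#include <algorithm>
#include <unordered_set>
#include <vector>

// Five known-in-advance elements, as in the question.
const std::vector<int> small_vec{2, 3, 5, 7, 11};
const std::unordered_set<int> small_set{2, 3, 5, 7, 11};

bool in_vec(int x) {            // worst case: 5 comparisons, cache-friendly
    return std::find(small_vec.begin(), small_vec.end(), x) != small_vec.end();
}

bool in_set(int x) {            // one hash plus a bucket lookup per query
    return small_set.count(x) != 0;
}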
What's the need to define and implement data structures (e.g. a stack) ourselves if they are already available in the C++ STL?
What are the differences between the two implementations?
First, implementing an existing data structure on your own is a useful exercise. You understand better what it does (so you can understand better what the standard containers do). In particular, you understand better why time complexity is so important.
Then, there is a quality-of-implementation issue. The standard implementation might not be suitable for you.
Let me give an example. Indeed, std::stack implements a stack. It is a general-purpose implementation. Have you measured sizeof(std::stack<char>)? Have you benchmarked it in the case of a million stacks of 3.2 elements on average with a Poisson distribution?
Perhaps in your case, you happen to know that you have millions of stacks of chars (never NUL), and that 99% of them have fewer than 4 elements. With that additional knowledge, you should probably be able to implement something "better" than what the standard C++ stack provides. So std::stack<char> would work, but given that extra knowledge you'd be able to implement it differently. For readability and maintenance you would still use the same methods as std::stack<char>, so your WeirdSmallStackOfChar would have a push method, etc. If later in the project you realize that bigger stacks might be useful (e.g. in 1% of cases), you'll reimplement your stack differently (e.g. if your code base grows to a million lines of C++ and you find that bigger stacks are quite common, you might remove your WeirdSmallStackOfChar class and add typedef std::stack<char> WeirdSmallStackOfChar; ...).
If you happen to know that all your stacks have fewer than 4 chars and that \0 is not valid in them, representing such "stacks" as a char w[4] field is probably the wisest approach. It is fast and easy to code.
So, if performance and memory space matters, you might perhaps code something as weird as
#include <stack>

class MyWeirdStackOfChars {
    bool small;                      // true: the inline buffer is in use
    union {
        std::stack<char>* bigstack;  // heap-allocated stack for the rare big case
        char smallstack[4];          // inline storage for the common small case
    };
};
Of course, that is very incomplete. When small is true your implementation uses smallstack. For the 1% case where it is false, your implementation uses bigstack. The rest of MyWeirdStackOfChars is left as an exercise (not that easy) to the reader. Don't forget to follow the rule of five.
Ok, maybe the above example is not convincing. But what about std::map<int,double>? You might have millions of them, and you might know that 99.5% of them are smaller than 5. You obviously could optimize for that case. It is highly probable that representing small maps by an array of pairs of int & double is more efficient both in terms of memory and in terms of CPU time.
Sometimes you even know that all your maps have fewer than 16 entries (and std::map<int,double> doesn't know that) and that the key is never 0. Then you might represent them differently. In that case, I guess I would be able to implement something much more efficient than what std::map<int,double> provides (probably, because of cache effects, an array of 16 entries of an int and a double is the fastest).
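A minimal sketch of such a flat small-map (the class name and the 16-entry cap are illustrative, not a standard component):

#include <cstddef>
#include <utility>

// Fixed-capacity flat map for the "at most 16 entries" case described above.
class SmallIntDoubleMap {
    std::pair<int, double> entries[16];
    std::size_t count = 0;
public:
    bool insert(int key, double value) {
        if (count == 16) return false;     // full; a real version might fall back to std::map
        entries[count++] = {key, value};
        return true;
    }
    const double* find(int key) const {    // linear scan over a couple of cache lines
        for (std::size_t i = 0; i < count; ++i)
            if (entries[i].first == key) return &entries[i].second;
        return nullptr;
    }
};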
That is why any developer should know the classical algorithms (and have read some Introduction to Algorithms), even if in many cases they will use existing containers. Also be aware of the as-if rule.
The STL implementations of data structures are not perfect for every possible use case.
I like the example of hash tables. I have been using the STL implementation for a while, but mainly for competitive programming contests.
Imagine that you are Google and you have billions of dollars in resources devoted to storing and accessing hash tables. You would probably like to have the best possible implementation for the company's use cases, since it will save resources and make searches faster in general.
Oh, and I forgot to mention that you also have some of the best engineers on the planet working for you (:
(This video is a talk by Kulukundis about the new hash table built by his team at Google: https://www.youtube.com/watch?v=ncHmEUmJZf4)
Some other reasons that justify implementing your own version of Data Structures:
Testing your understanding of a specific structure.
Customizing part of the structure for some peculiar use case.
Seeking better performance than the STL for a specific data structure.
Hating STL errors.
Benchmarking the STL against some simple implementation.
So basically, what I need is to have 28,800 values which can be accessed by an index and can all be set to true or false. Using an array of bools or integers isn't an option, because the size needs to be set at runtime using a parameter. Using a vector is way too slow and memory intensive. I'm new to C++ and therefore don't have any idea how to solve this; can anyone help?
EDIT: Thank you to all who commented! Like I said, I'm new to C++ programming and your answers really helped me understand the functionality behind vectors.
So, after everyone said that vector isn't slow I checked again and it turns out that my program was running so slow because of another bug I had while filling the vector. But especially midor's and Some programmer dude's answers helped me to make the program run a bit faster than before, so thanks!
Using a vector is way too slow and memory intensive.
C++ specializes std::vector<bool> so it uses only as much memory as it needs. One bit per "flag" (+ bookkeeping overhead of course).
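For example, a minimal sketch of a flag array whose size is only known at run time (the function name is made up; 28,800 is the number from the question):

#include <cstddef>
#include <vector>

void demo(std::size_t n) {              // e.g. n == 28,800, passed in at run time
    std::vector<bool> flags(n, false);  // roughly one bit per flag
    flags[42] = true;                   // set a flag (assumes n > 42)
    if (flags[42]) { /* ... */ }        // test a flag
}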
You could only improve on that implementation if you knew its size a priori (which, according to your question, you don't), or if you knew that the bitmap will contain very few set bits (e.g. 1 bit in 50,000, but you'd need to measure whether a more complex implementation would be worth it). For sparse bitmaps, an std::unordered_set<std::uint32_t> that stores the set bits could be an option.
But 28,800 is a very small number, so don't waste your time on optimizations. You won't get any benefit from it.
My question is mostly about the STL rather than the rest of C++, which (I guess) can be about as fast as C as long as classes aren't used at every corner.
The STL is standard in games and in engines like OGRE3D, and its features are nice to use, but the problem is that I don't really know how they work under the hood, so I'd like to know which features can cause serious performance hogs before using them.
I'm very excited to begin that game programming school, and apparently there is no way I'm going to avoid using those advanced features.
Using the STL tends to generate code as good as, if not more efficient than, hand-written code in many cases.
Use a profiler to see where you have problems.
Even where the C++ STL might perform worse, its code is likely to be less error-prone. So only write your own code if the profiler shows there is an issue.
1) Debug builds. These severely slow down many STL containers due to excessive error checking, at least on Microsoft compilers.
2) Excessive dynamic memory allocation. If you have a routine that creates a std::vector inside it AND you call it a few thousand times per frame, it will be very slow, and the bottleneck will be somewhere inside operator new or another memory allocation routine. If you turn this vector into some kind of static buffer (so you don't have to recreate it every time), it will be much faster. Memory allocation is slow; if you have a buffer, it is normally better to reuse it instead of making a new one on each call (see the sketch after this list).
3) Using inefficient algorithms. For example, using linear search instead of binary search, using the wrong sort algorithm (quick sort and heap sort are faster than bubble sort on unsorted data, but insertion sort can be faster than quicksort on partially sorted data), searching linearly instead of using std::map, and so on.
4) Using the wrong kind of container. For example, std::vector isn't suitable for inserting elements at random positions. std::deque, while comparable to std::vector for random access and allowing fast push_front and push_back, can be 10 times slower than std::vector when you subtract two iterators (on MSVC, again).
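Regarding point 2, a minimal sketch of the buffer-reuse idea (all names and counts are made up):

#include <vector>

// Called a few thousand times per frame.
void process_chunk(std::vector<float>& scratch) {
    scratch.clear();                     // keeps the previously allocated capacity
    // ... fill and use scratch ...
}

void frame() {
    static std::vector<float> scratch;   // allocated once, reused on every call
    for (int i = 0; i < 5000; ++i)
        process_chunk(scratch);
}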
Unless you're building the entire engine from scratch, you're not going to notice a difference in using or not using C++ classes or STL. In general, most of the time the CPU is going to be running code that's not even written by you anyway. Plus the overhead imposed by any scripting languages you implement (Lua, Python, etc) will eclipse any small performance penalty you may incur by using C++/STL.
In short, don't worry about it. Writing good OOP code is better than trying to write super-fast code from the get-go. "Premature optimization is the root of much programming evil."
Actually, you can use classes everywhere and still get performance as good as C (and often better than typical C code).
Most of the STL is designed to do most of its "tricky" parts at compile time, so the run-time performance is excellent. The main thing to look out for (especially if you write for things like game consoles or mobile phones that have less capable graphics hardware) is structuring your data to work well with the cache.
Here are, in my opinion, the key points to writing performant C++/STL code:
Learn what memory allocation strategies are for each STL container,
Learn what algorithms work best with what iterator categories,
Learn run-time polymorphism vs compile-time polymorphism.
Good starting points are:
SGI STL Programmer's Guide,
STL Reference,
The Definitive C++ Book Guide and List (SO).
I recommend Effective STL by Scott Meyers. The biggest performance hog of the STL is a lack of understanding of it. Please learn it well!
Also see Optimizing software in C++ by Agner Fog for C++ specific performance related topics.
The other answers are all accurate: the problems with the STL and game programming are mostly due to misuse.
My general approach is the following:
1. Write it with STL and carefully choose the appropriate algorithm, container, etc.
2. Profile for bottlenecks.
3. If it's STL causing the problem, replace it.
Optimizing too early can really slow you down and only cause more problems later.
Of course, it depends on platform as well. Sometimes, you have to write all of your own stuff because you simply can't afford the extra CPU/RAM overhead of STL.
Stroustrup talks about the design and performance of the STL in general, and specifically about the different performance characteristics of the various different container types, in his book The C++ Programming Language (3rd edition).
I don't have experience in gaming, but Electronic Arts developed their own (non-conforming) implementation of the STL. There is an extensive article explaining the motives and design of the library here.
Note that in many cases you will be better off using the STL that comes with your implementation; then measure, then measure again, and make sure that you understand what is going on and what is really a performance problem. Then, and only then, if the problem lies within the STL (and not in how the STL is used), would I use non-standard libraries.
These rarely become performance hogs if used correctly. A profiler should always be your primary means of finding bottlenecks in your code, short of obvious algorithmic inefficiencies (in which case it's still good practice to use a profiler to make sure, even if you are on a tight deadline).
There are some legitimate efficiency concerns, however, if you do come across STL usage to show up as a profiler hotspot.
std::vector<ExpensiveElement> v;
// insert a lot of elements into v
v.push_back(ExpensiveElement(...));
This push_back immediately above has the worst-case scenario of having to linearly copy all the ExpensiveElements inserted so far (if we've exceeded the current capacity). In the best-case scenario, we still have to copy the ExpensiveElement once unnecessarily.
We can mitigate the issue by making vector store shared_ptr, but now we pay for two additional heap allocations per ExpensiveElement inserted (one for the reference counter in boost::shared_ptr, and one for ExpensiveElement) along with the overhead of a pointer indirection each time we want to access ExpensiveElement stored in the vector.
To mitigate the memory allocation/deallocation overhead (generally more likely to be a hotspot than an additional level of indirection), we can implement a fast memory allocator for ExpensiveElement. Nevertheless, imagine if std::vector provided an alloc_back method:
new (v.alloc_back()) ExpensiveElement(...);
This would avoid any copy ctor overhead, but is unsafe and prone to abuse. Nevertheless, that's exactly what I did with our vector-clone in response to hotspots. Note that I work in raytracing which is a field where performance is often one of the highest measures of quality (other than correctness) and we profile our code daily so it's not like we just decided out of the blue that vector wasn't efficient enough for our purposes.
We also had no choice but to implement a vector clone because we provide a software development kit where other people's std::vector implementations may be incompatible with our own. I don't want to give you the wrong idea: explore these kinds of solutions only if your profiler sessions really call for it!
Another common source of inefficiency is when using linked STL containers like set, multiset, map, multimap, and list. However, that's not necessarily their fault, but rather the fault of the default std::allocator being used. These perform a separate memory allocation/deallocation per node so the default allocator can be pretty slow for these purposes, especially across multiple threads (thread contention, better off with per-thread memory pools). You can really get a speed boost by writing your own memory allocator (though this is not a trivial thing to do and don't forget alignment if you do).
I can't emphasize enough that these kinds of optimizations should only be applied in response to the profiler. You'll make your code harder to use and maintain this way, so you should be doing it only in exchange for a solid, demonstrable boost in your application's performance.
This book covers what issues you face when using C++ in games.
I need to write a program (a project for university) that solves (approximately) an NP-hard problem.
It is a variation of the linear ordering problem.
In general, I will have very large inputs (as graphs) and will try to find the best solution (based on a function that will 'rate' each solution).
Will there be a difference if I write this in C-style code (one main and free functions), or if I build a Solver class, create an instance, and invoke a 'run' method from main (similar to Java)?
Also, there will be a lot of floating-point math going on in each iteration.
Thanks!
No.
The biggest performance gains/flaws will come from the algorithm you implement and from how much unneeded work you perform. (Unneeded work could be anything from recalculating a previous value that could have been cached, to using too many malloc/frees instead of memory pools, to passing large immutable data by value instead of by reference.)
The biggest roadblock to optimal code is no longer the language (for properly compiled languages), but rather the programmer.
No, unless you are using virtual functions.
Edit: If you have a case where you need run-time dynamism, then yes, virtual functions are as fast as or faster than a manually constructed if-else statement. However, if you put the virtual keyword in front of a method but don't actually need the polymorphism, you will be paying an unnecessary overhead; the compiler won't optimize it away at compile time. I am just pointing this out because it's one of the features of C++ that breaks the 'zero-overhead principle' (quoting Stroustrup).
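A minimal sketch of that point (the class names are made up):

struct RatedVirtual {
    virtual ~RatedVirtual() = default;
    virtual double rate() const { return 0.0; }  // vtable pointer per object, indirect call
};

struct RatedPlain {
    double rate() const { return 0.0; }          // direct call, trivially inlined
};

If you never actually call rate() through a base-class pointer or reference, the virtual version buys you nothing except the extra indirection and the per-object vtable pointer.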
As a side note, since you mention heavy use of fp math:
The following gcc flags may help you speed things up (I'm sure there are equivalents for Visual C++, but I don't use it): -mfpmath=sse, -ffast-math and -mrecip. (The last two are 'slightly dangerous', meaning that they could give you weird results in edge cases in exchange for the speed. The first one reduces precision a bit -- you get 64-bit doubles instead of 80-bit ones -- but this extra precision is often unneeded.) These flags work equally well for C and C++ compilers.
Depending on your processor, you may also find that simulating true INFINITY with a large-but-not-infinite value gives you a good speed boost. This is because true INFINITY has to be handled as a special case by the processor.
Rule of thumb: do not optimize until you know what to optimize. So start with C++ and get a working prototype. Then profile it and rewrite the bottlenecks in assembly. But as others noted, the chosen algorithm will have a much greater impact than the language.
When speaking of performance, anything you can do in C can be done in C++.
For example, virtual methods are known to be “slow”, but if it's really a problem, you can still resort to C idioms.
C++ also brings templates, which lead to better performance than using void* for generic programming.
The Solver class will be constructed once, I take it, and the run method executed once... in that kind of environment, you won't see a difference. Instead, here are things to watch out for:
Memory management is hellishly expensive. If you need to do lots of little malloc()s, the operating system will eat your lunch. Make a determined effort to re-use whatever data structures you create if you know you'll be doing the same kind of thing again soon!
Instantiating classes generally means... allocating memory! Again, there's practically no cost for instantiating a handful of objects and re-using them. But beware of creating objects only to tear them down and rebuild them soon after!
Choose the right flavor of floating point for your architecture, insofar as the problem permits. It's possible that double will end up being faster than float, although it will need more memory. You should experiment to fine-tune this. Ideally, you'll use a #define or typedef to specify the type so you can change it easily in one place.
Integer calculations are probably faster than floating point. Depending on the numeric range of your data, you may also consider doing it with integers treated as fixed-point decimals. If you need 3 decimal places, you could use ints and just consider them "milli-somethings". You'll have to remember to shift decimals after division and multiplication... but no big deal. If you use any math functions beyond basic arithmetic, of course, that would kill this possibility.
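A tiny sketch of the "milli-somethings" idea (the scale factor and values are illustrative):

#include <cstdint>

const std::int64_t SCALE = 1000;           // 3 decimal places: 3.250 is stored as 3250

std::int64_t mul_fixed(std::int64_t a, std::int64_t b) {
    return a * b / SCALE;                  // divide the extra scale back out
}

std::int64_t div_fixed(std::int64_t a, std::int64_t b) {
    return a * SCALE / b;                  // pre-scale to keep 3 decimal places
}

For example, mul_fixed(3250, 1500) yields 4875, i.e. 3.25 * 1.5 == 4.875.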
Since both are compiled, and compilers are now very good at handling C++, I think the only problem would come from how well optimized your code is. I think it would be easier to write slower code in C++, but that depends on which style your model fits into best.
When it comes down to it, I doubt there will be any real difference, assuming both versions are well written, any libraries you use are well written, and you measure on the same computer.
Function-call vs. member-function-call overhead is unlikely to be the limiting factor compared to file input and the algorithm itself. C++ iostreams are not necessarily super high speed. C has 'restrict' if you're really optimizing; in C++ it's easier to inline function calls. Overall, C++ offers more options for organizing your code clearly, but if it's not a big program, or you're just going to write it in a similar manner whether it's C or C++, then the portability of C libraries becomes more important.
As long as you don't use any virtual functions etc., you won't notice any considerable performance difference. Early C++ was compiled down to C, so as long as you know the specific points where this creates considerable overhead (such as virtual functions), you can reason clearly about the differences.
In addition, using C++ gives you a lot to gain if you use the STL and Boost libraries. The STL in particular provides very efficient and proven implementations of the most important data structures and algorithms, so you can save a lot of development time.
Effectively, it also depends on the compiler you will be using and how it optimizes the code.
First, writing in C++ doesn't imply using OOP; look at the STL algorithms.
Second, C++ can even be slightly faster at runtime (compilation times can be terrible compared to C, but that's because modern C++ tends to rely heavily on abstractions that tax the compiler).
Edit: alright, see Bjarne Stroustrup's discussion of qsort and std::sort, and the article that the FAQ mentions (Learning Standard C++ as a New Language), where he shows that C++-style code can be not only shorter and more readable (because of higher abstractions), but also somewhat faster.
Another aspect: C++ templates can be an excellent tool to generate type-specific / optimized code variations.
For example, C qsort requires a function call to the comparator, whereas std::sort can inline the functor passed. This can make a significant difference when compare and swap themselves are cheap.
Note that you could generate "custom qsorts" optimized for various types with a barrage of defines or a code generator, or by hand - you could do these optimizations in C, too, but at much higher cost.
(It's not a general weapon; templates help only in specific scenarios - usually a single algorithm applied to different data types or with differing small pieces of code injected.)
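A minimal sketch of the qsort-vs-std::sort point from above (the data and names are arbitrary):

#include <algorithm>
#include <cstdlib>
#include <vector>

int compare_ints(const void* a, const void* b) {       // always called through a function pointer
    int x = *static_cast<const int*>(a);
    int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}

void sort_both(std::vector<int>& c_style, std::vector<int>& cpp_style) {
    std::qsort(c_style.data(), c_style.size(), sizeof(int), compare_ints);
    std::sort(cpp_style.begin(), cpp_style.end(),
              [](int a, int b) { return a < b; });      // the lambda can be inlined into the sort
}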
Good answers. I would put it like this:
Make the algorithm as efficient as possible in terms of its formal structure.
C++ will be as fast as C, except that it will tempt you to do dumb things, like constructing objects that you don't have to, so don't take the bait. Things like STL container classes and iterators may look like the latest-and-greatest thing, but they will kill you in a hotspot.
Even so, single-step it at the disassembly level. You should see it working very directly on your problem. If it is spending lots of cycles getting in and out of routines, try some inlining (or macros). If it is wandering off into memory allocation and freeing for much of the time, put a stop to that. If it's got inner loops where the loop overhead is a large percentage, try unrolling the loop.
That's how you can make it about as fast as possible.
I would go with C++ definitely. If you are careful about your design and avoid creating heavy objects inside hotspots you should not see any performance difference but the code is going to be much simpler to understand, maintain, and expand.
Use templates and classes judiciously. Avoid unnecessary object creation by passing objects by reference. Avoid excessive memory allocation; if needed, allocate memory ahead of hotspots. Use the restrict qualifier (the __restrict extension in most C++ compilers) on memory pointers to tell the compiler whether pointers overlap or not.
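A minimal sketch of that no-aliasing hint (note: restrict is standard C; in C++ it is the __restrict extension offered by GCC, Clang and MSVC):

#include <cstddef>

// Promise the compiler that x and y never overlap, so it can vectorize more aggressively.
void axpy(std::size_t n, double a,
          const double* __restrict x, double* __restrict y) {
    for (std::size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}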
As far as optimization goes, pay careful attention to memory alignment. Assuming you are working on an Intel processor, you can make use of vector instructions, provided you tell the compiler (through pragmas) about your memory alignment and aliased pointers. You can also use vector instructions directly via intrinsics.
You can also automatically generate hotspot code using templates and let the compiler optimize it, if you have things like short loops of different sizes. To find performance problems and drill down to your bottlenecks, Intel VTune or OProfile are extremely helpful.
Hope that helps!
I do some DSP coding, where it still pays off to go to assembly language sometimes. I'd say use C or C++, either one, and be prepared to go to assembly language when you need to, especially to exploit SIMD instructions.