I've been asked by a friend to help improve the performance of their code. I'm not very well versed in C++; most of my training is in Java. The program is designed to find optimal values for a set of data, and uses a fairly blunt "try everything" method, which is obviously fairly slow to run. My solution was to replace the outermost loop of the main process with a TBB parallel_for.
However, I am very unfamiliar with this tool, as I've never used it before, and I can't tell whether I've successfully replicated the original loop's behavior with the new one I've written, or even whether my parameters are correctly formatted. I've searched a good few times but can't seem to find a concise explanation of how to format these as opposed to "normal" for loops.
Additional inclusions (obviously):
#include "tbb/parallel_for.h"
The original loop:
for(alpha=30; alpha < 101; alpha= alpha+4) {
The new one:
parallel_for (alpha(30); 101; & {
The example I'm working from:
example:
for (size_t i = 0; i < size; i++)
Example:
parallel_for (size_t(0), size, [&](size_t i)
Sadly, I can't simply test the code, since I don't have the entire program, merely the .cpp file.
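One way to check that a parallel version visits the same values as the original stepped loop is to rewrite the loop over a dense index first. The sketch below is only an illustration under assumptions, not the asker's program: stepped_count, stepped_value and alphas_via_dense_index are hypothetical helper names. It maps index k to alpha = 30 + 4*k, so the loop body has the same shape as the lambda in the parallel_for(size_t(0), size, ...) example quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of iterations of: for (x = first; x < last; x += step)
std::size_t stepped_count(int first, int last, int step) {
    assert(step > 0);
    if (last <= first) return 0;
    return static_cast<std::size_t>((last - first + step - 1) / step);
}

// Value of the loop variable on the k-th iteration (k starts at 0).
int stepped_value(int first, int step, std::size_t k) {
    return first + step * static_cast<int>(k);
}

// Collects every alpha the original loop would visit, using a dense index.
std::vector<int> alphas_via_dense_index() {
    const int first = 30, last = 101, step = 4;
    const std::size_t count = stepped_count(first, last, step);
    std::vector<int> out(count);
    // This serial loop mirrors the quoted form
    //   parallel_for(size_t(0), count, [&](size_t k) { ... });
    // so its body is what would go inside the lambda.
    for (std::size_t k = 0; k < count; ++k) {
        const int alpha = stepped_value(first, step, k);  // local per iteration
        out[k] = alpha;  // each iteration writes a distinct element
    }
    return out;
}
```

Two things to keep in mind when moving the body into the lambda: alpha has to be a local of each iteration rather than a shared variable declared outside the loop, and anything the body writes to must not be shared between iterations. TBB's documentation also lists a parallel_for overload that takes a step argument; whether it is available depends on the TBB version in use.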
I have encountered a horrible situation.
I usually use Visual Studio Code to edit my code, and also to compile and run it (F5).
But I found that VS Code is either too smart or ignores some warning messages for me, and
outputs the right answer; the code also works fine on Ideone. But in the Windows cmd prompt or Dev-C++ my code doesn't output anything and just returns a big number.
I found that the situation I mentioned above occurs with certain code.
The code looks like this:
for (i = 0; i < g.size(); i++)
{
int source;
int dest;
int minWeight = 999;
for (j = 0; i < g[j].size(); j++)
{
// no edge, come to next condition
if (!g[i][j])
continue;
if (g[i][j] < minWeight)
{
source = i;
dest = j;
minWeight = g[i][j];
}
}
if
updateGroup(index, index[source], index[dest]);
else
updateGroup(index, index[source], index[dest]);
}
You may have noticed that the second for loop has the wrong condition; it should change from
j = 0; i < g[j].size(); j++ to j = 0; j < g[i].size(); j++
So I would like to know:
Is there any way to make VS Code stricter?
Why does it still output the right answer in VS Code and on Ideone?
How can I avoid this kind of error that produces no message, or find more easily where my code is wrong?
I really hope someone can help me, and I appreciate all of your suggestions!
There is no way for the compiler or computer to read your mind and guess what you meant to write instead of what you actually wrote.
Even when this mistake results in a bug, it cannot know that you did not intend to write this, or that you meant to write some other specific thing instead.
Even when this bug results in your program having undefined behaviour, it is not possible to detect many cases of undefined behaviour, and it is not worthwhile for a compiler author to attempt to write code to do this, because it's too hard and not useful enough. Even if they did, the compiler could still not guess what you meant instead.
Remember, loops like this don't have to check or increment the same variable that you declared in the preamble; that's just a common pattern (now superseded by the safer ranged-for statement). There's nothing inherently wrong with having a loop that increments i but checks j.
Ultimately, the solution to this problem is to write tests for your code, which is why many organisations have dedicated Quality Assurance teams to search for bugs, and why you should already be testing your code before committing it to your project.
Remember to concentrate and pay close attention and read your code, and eventually such typos will become less common in your work. Of course once in a while you will write a bug, and your tests will catch it. Sometimes your tests won't catch it, which is when your customers will eventually notice it and raise a complaint. Then, you release a new version that fixes the bug.
This is all totally normal software development practice. That's what makes it fun! 😊
1) Crank up your compiler warnings as high as they will go.
2) Use multiple different compilers (they all warn about different things).
3) Know the details of the language well (a multi year effort) and be really, really careful about the code you write.
4) Write (and regularly run) lots of tests.
5) Use tools like sanitizers, fuzzers, linters, static code analyzers etc. to help catch bugs.
6) Build and run your code on multiple platforms to keep it portable and find bugs exposed by different environments/implementations.
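As one concrete illustration of making this kind of typo fail loudly, here is a minimal sketch that uses bounds-checked at() instead of operator[]. It assumes g is a std::vector<std::vector<int>> of non-negative weights where 0 means "no edge"; the Graph alias and the two function names are hypothetical, not from the question. With at(), the mistyped loop condition throws std::out_of_range on every platform instead of silently reading past the end.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Graph = std::vector<std::vector<int>>;

// Returns the column of the lightest non-zero edge in row i, or -1 if none.
// Uses at() so an out-of-range index throws instead of reading stray memory.
int lightest_edge(const Graph& g, std::size_t i) {
    int dest = -1;
    int minWeight = 999;
    for (std::size_t j = 0; j < g.at(i).size(); ++j) {
        const int w = g.at(i).at(j);
        assert(w >= 0);  // assumption: weights are non-negative, 0 = no edge
        if (w != 0 && w < minWeight) {
            minWeight = w;
            dest = static_cast<int>(j);
        }
    }
    return dest;
}

// Same loop header with the typo from the question (tests i, indexes g[j]).
// With at(), the mistake surfaces as std::out_of_range.
bool buggy_loop_throws(const Graph& g, std::size_t i) {
    try {
        for (std::size_t j = 0; i < g.at(j).size(); ++j) {
            (void)g.at(i).at(j);
        }
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

The cost is a bounds check per access, so this is the sort of thing one might enable while debugging and relax later if profiling says it matters.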
I have done my best and read a lot of Q&As on SO.SE, but I haven't found an answer to my particular question. Most for-loop and break related questions refer to nested loops, while I am concerned with performance.
I want to know if using a break inside a for-loop has an impact on the performance of my C++ code (assuming the break almost never gets called). And if it has, I would also like to know roughly how big the penalty is.
I am quite suspicious that it does indeed impact performance (although I do not know how much). So I wanted to ask you. My reasoning goes as follows:
Independently of the extra code for the conditional statements that
trigger the break (like an if), it necessarily adds additional
instructions to my loop.
Further, it probably also interferes when my compiler tries to
unroll the for-loop, as it no longer knows the number of iterations
that will run at compile time, effectively rendering it into a
while-loop.
Therefore, I suspect it does have a performance impact, which could be
considerable for very fast and tight loops.
So this takes me to a follow-up question. Is a for-loop with a break performance-wise equal to a while-loop? Like in the following snippet, where we assume that checkCondition() evaluates 99.9% of the time as true. Do I lose the performance advantage of the for-loop?
// USING WHILE
int i = 100;
while( i-- && checkCondition())
{
// do stuff
}
// USING FOR
for(int i=100; i; --i)
{
if(checkCondition()) {
// do stuff
} else {
break;
}
}
I have tried it on my computer, but I get the same execution time. And being wary of the compiler and its optimization voodoo, I wanted to know the conceptual answer.
EDIT:
Note that I have measured the execution time of both versions in my complete code, without any real difference. Also, I do not trust compiling with -s (which I usually do) for this matter, as I am not interested in the particular result of my compiler. I am rather interested in the concept itself (in an academic sense) as I am not sure if I got this completely right :)
The principal answer is to avoid spending time on similar micro optimizations until you have verified that such condition evaluation is a bottleneck.
The real answer is that CPUs have powerful branch prediction circuits which empirically work really well.
What will happen is that your CPU will predict whether the branch is going to be taken and speculatively execute the code as if the if condition were not even present. Of course this relies on multiple assumptions, like the condition calculation having no side effects (so that part of the loop body depends on it) and the condition always evaluating to false up to a certain point at which it becomes true and stops the loop.
Some compilers also allow you to specify the likelihood of a condition as a hint for branch prediction.
If you want to see the semantic difference between the two code versions, just compile them with -S and examine the generated asm code; there's no other magic way to do it.
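As a sketch of what such a hint can look like: GCC and Clang provide the __builtin_expect builtin, and C++20 adds the [[likely]] and [[unlikely]] attributes. The example below wraps the builtin in a macro that degrades to the plain condition on other compilers. UNLIKELY, check_condition and count_until_break are hypothetical names made up for this illustration; check_condition merely stands in for the question's checkCondition().

```cpp
#include <cassert>

// __builtin_expect is a GCC/Clang builtin; elsewhere the macro
// falls back to the plain condition.
#if defined(__GNUC__) || defined(__clang__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

// Hypothetical stand-in for checkCondition(): true until `limit` is reached.
bool check_condition(int i, int limit) { return i < limit; }

// Counts how many iterations run before the rarely taken break fires.
int count_until_break(int iterations, int limit) {
    assert(iterations >= 0);
    int done = 0;
    for (int i = 0; i < iterations; ++i) {
        if (UNLIKELY(!check_condition(i, limit))) {
            break;  // marked as the cold path
        }
        ++done;
    }
    return done;
}
```

In practice the hint mostly influences how the compiler lays out the generated code; whether it changes the timing at all is something only a measurement on the target machine can show.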
The only sensible answer to "what is the performance impact of ...", is "measure it". There are very few generic answers.
In the particular case you show, it would be rather surprising if an optimising compiler generated significantly different code for the two examples. On the other hand, I can believe that a loop like:
unsigned sum = 0;
unsigned stop = -1;
for (int i = 0; i<32; i++)
{
stop &= checkcondition(); // returns 0 or all-bits-set;
sum += (stop & x[i]);
}
might be faster than:
unsigned sum = 0;
for (int i = 0; i<32; i++)
{
if (!checkcondition())
break;
sum += x[i];
}
for a particular compiler, for a particular platform, with the right optimization levels set, and for a particular pattern of "checkcondition" results.
... but the only way to tell would be to measure.
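To make "measure it" a little more concrete, here is a minimal sketch that puts both loop shapes into functions and times them with std::chrono::steady_clock. Everything in it is an assumption for illustration: check_mask is a hypothetical predicate that depends only on the index, and time_ns is a hypothetical helper, not a proper benchmarking framework. It makes no claim about which variant is faster.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical predicate: all-bits-set while i < cutoff, 0 afterwards.
unsigned check_mask(std::size_t i, std::size_t cutoff) {
    return i < cutoff ? ~0u : 0u;
}

// Branch-free variant: keeps iterating, masks out values after the cutoff.
unsigned sum_masked(const std::vector<unsigned>& x, std::size_t cutoff) {
    unsigned sum = 0;
    unsigned stop = ~0u;
    for (std::size_t i = 0; i < x.size(); ++i) {
        stop &= check_mask(i, cutoff);
        sum += (stop & x[i]);
    }
    return sum;
}

// Early-exit variant: leaves the loop at the cutoff.
unsigned sum_break(const std::vector<unsigned>& x, std::size_t cutoff) {
    unsigned sum = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (!check_mask(i, cutoff)) break;
        sum += x[i];
    }
    return sum;
}

// Runs f `reps` times and returns the elapsed wall-clock nanoseconds.
// Results are added to `sink` so that they are actually used.
template <typename F>
long long time_ns(F f, int reps, unsigned& sink) {
    assert(reps > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) sink += f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

A call such as time_ns([&] { return sum_break(x, 32); }, 1000000, sink) gives one number per variant. An optimizer may still simplify such a tiny loop, so it is worth checking the generated code before trusting the numbers.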
I am doing some experiments on CPU performance. I wonder if anyone knows a formal way or a tool to generate simple code that can run for a period of time (several seconds) and consume significant computation resources of a CPU.
I know there are a lot of CPU benchmarks, but their code is pretty complicated. What I want is a more straightforward program.
As the compiler is very smart, writing some redundant code like the following will not work.
for (int i = 0; i < 100; i++) {
int a = i * 200 + 100;
}
Put the benchmark code in a function in a separate translation unit from the code that calls it. This prevents the code from being inlined, which can lead to aggressive optimizations.
Use parameters for the fixed values (e.g., the number of iterations to run) and return the resulting value. This prevents the optimizer from doing too much constant folding and it keeps it from eliminating calculations for a variable that it determines you never use.
Building on the example from the question:
int TheTest(int iterations) {
    int a = 0; // initialised so the result is defined when iterations <= 0
    for (int i = 0; i < iterations; i++) {
        a = i * 200 + 100;
    }
    return a;
}
Even in this example, there's still a chance that the compiler might realize that only the last iteration matters and completely omit the loop and just return 200*(iterations - 1) + 100, but I wouldn't expect that to happen in many real-life cases. Examine the generated code to be certain.
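To spell out that arithmetic, the sketch below puts the loop next to the closed form it can legally be reduced to. TheTestClosedForm is a hypothetical name for this illustration, and a is initialised to 0 here (an assumption added so that a zero iteration count is well defined).

```cpp
#include <cassert>

// The loop from the example, with `a` initialised so that
// iterations == 0 is well defined.
int TheTest(int iterations) {
    int a = 0;
    for (int i = 0; i < iterations; i++) {
        a = i * 200 + 100;
    }
    return a;
}

// What an optimiser is allowed to reduce the loop to: only the last
// iteration (i == iterations - 1) contributes to the result.
int TheTestClosedForm(int iterations) {
    assert(iterations >= 0);
    return iterations > 0 ? 200 * (iterations - 1) + 100 : 0;
}
```

Both functions return the same value for every non-negative argument that does not overflow int, which is exactly why the loop by itself is not guaranteed to keep the CPU busy.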
Other ideas, like using volatile on certain variables, can inhibit some reasonable optimizations, which might make your benchmark perform worse than actual code.
There are also frameworks, like this one, for writing benchmarks like these.
It's not necessarily your optimiser that removes the code. CPUs these days are very powerful, and you need to increase the challenge level. However, note that your original code is not a good general benchmark: you only use a very small subset of a CPU's instruction set. A good benchmark will try to challenge the CPU on different kinds of operations, to predict the performance in real world scenarios. Very good benchmarks will even put load on various components of your computer, to test their interplay.
Therefore, just stick to a well known published benchmark for your problem. There is a very good reason why they are more involved. However, if you really just want to benchmark your setup and code, then this time, just go for higher counter values:
double j=10000;
for (double i = 0; i < j*j*j*j*j; i++)
{
}
This should work better for now. Note that there are just more iterations. Change j according to your needs.
I have a method in my project which when placed into its own program takes mere seconds to run, when run inside the project where it belongs it takes 5 minutes. I have NO idea why. I have tried profiling, taking bits out, changing this and that. I'm stumped.
It populates a vector of integers to be used by another class, but this class is not currently being instantiated. I have checked as much as I can and it does seem as if there really is nothing else happening but the method magically taking longer in one project than it does in another.
The method is run at startup and takes about 5 minutes, or about 3 seconds if run on its own. What could be causing this? Strange project settings? Multithreading stuff that I'm unaware of? (as far as I know there is none in my project anyway unless it's done automatically).
There is a link to the project here. If anyone could solve this for me I would be so grateful, and as soon as I can will start a bounty for this.
The method is called PopulatePathVectors and runs in Level.cpp. Commenting out the call to the method (in the level constructor) means the program starts up in seconds. The only other class that uses the lists it generates is Agent, but currently none are being instantiated.
EDIT - As requested, here is the code. Although keep in mind that my question is not 'why is the code slow?' but 'why is it fast in one place and slow in my project?'
//parses the text path vector into the engine
void Level::PopulatePathVectors(string pathTable)
{
// Read the file line by line.
ifstream myFile(pathTable);
for (unsigned int i = 0; i < nodes.size(); i++)
{
pathLookupVectors.push_back(vector<vector<int>>());
for (unsigned int j = 0; j < nodes.size(); j++)
{
string line;
if (getline(myFile, line)) //enter if a line is read successfully
{
stringstream ss(line);
istream_iterator<int> begin(ss), end;
pathLookupVectors[i].push_back(vector<int>(begin, end));
}
}
}
}
EDIT - I understand that the code is not the best it could be, but that isn't the point here. It runs quickly on its own, in about 3 seconds, and that's fine for me. The problem I'm trying to solve is why it takes so much longer inside the project.
EDIT - I commented out all of the game code apart from the main game loop. I placed the method into the initialize section of the code which is run once on start up. Apart from a few methods setting up a window it's now pretty much the same as the program with ONLY the method in, only it STILL takes about 5 minutes to run. Now I know it has nothing to do with dependencies on the pathLookupVectors. Also, I know it's not a memory thing where the computer starts writing to the hard drive because while the slow program is chugging away running the method, I can open another instance of VS and run the single method program at the same time which completes in seconds. I realise that the problem might be some basic settings, but I'm not experienced so apologies if this does disappointingly end up being the reason why. I still don't have a clue why it's taking so much longer.
It would be great if this didn't take so long in debug mode as it means waiting 5 minutes every time I make a change. There MUST be a reason why this is being so slow here. These are the other included headers in the cut down project:
#include <d3d10.h>
#include <d3dx10.h>
#include "direct3D.h"
#include "game.h"
#include "Mesh.h"
#include "Camera.h"
#include "Level.h"
#include <vector>
using namespace std;
EDIT - This is a much smaller self-contained project with only a tiny bit of code where the problem still happens.
This is also a very small project with the same code where it runs very fast.
I ran this code in MSVC10 (same compiler you are using) and duplicated your results with the projects you provided. However I was unable to profile with this compiler due to using the express version.
I ran this code in the MSVC9 compiler, and it ran 5 times faster! I also profiled it, and got these results:
Initialize (97.43%)
std::vector::_Assign (29.82%)
std::vector::erase (12.86%)
std::vector::_Make_iter (3.71%)
std::_Vector_const_iterator (3.14%)
std::_Iterator_base (3.71%)
std::~_Ranit (3.64%)
std::getline (27.74%)
std::basic_string::append (12.16%)
std::basic_string::_Grow (3.66%)
std::basic_string::_Eos (3.43%)
std::basic_streambuf::snextc (5.61%)
std::operator<<(std::string) (13.04%)
std::basic_streambuf::sputc(5.84%)
std::vector::push_back (11.84%)
std::_Uninit_move::?? (3.32%)
std::basic_istream::operator>>(int) (7.77%)
std::num_get::get (4.6%)
std::num_get::do_get (4.55%)
The "fast" version got these results: (scaled to match other times):
Initialize (97.42%)
std::_Uninit_copy (31.72%)
std::_Construct (18.58%)
std::_Vector_const_iterator::operator++ (6.34%)
std::_Vector_const_iterator::operator!= (3.62%)
std::getline (25.37%)
std::getline (13.14%)
std::basic_ios::widen (12.23%)
std::_Construct (18.58%)
std::vector::vector (14.05%)
std::_Destroy (14.95%)
std::vector::~vector (11.33%)
std::vector::_Tidy (23.46%)
std::_Destroy (19.89%)
std::vector::~vector (12.23%)
[ntdll.dll] (3.62%)
After studying these results and considering Michael Price's comments many times, it dawned on me to make sure the input files were the same size. When this dawned on me, I realized the profile for the "fast" version does not show std::operator<<(std::string) or std::vector::push_back at all, which seems suspicious. I checked the MethodTest project, and found that it did not have a WepTestLevelPathTable.txt, causing the getline to fail, and the entire function to do almost nothing at all, except allocate a bunch of empty vectors. When I copied the WepTestLevelPathTable.txt to the MethodTest project, it was exactly the same speed as the "slow" version. Case solved. Use a smaller file for debug builds.
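A missing input file went unnoticed here because a failed std::ifstream simply makes every getline return false. A minimal sketch of a loader that fails loudly instead is shown below. ReadPathTable and LoadPathTable are hypothetical names, and the flat one-vector-per-line layout is a simplification of the question's nested pathLookupVectors, assumed only for illustration.

```cpp
#include <fstream>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Parses one vector<int> per line from any input stream.
std::vector<std::vector<int>> ReadPathTable(std::istream& in) {
    std::vector<std::vector<int>> rows;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ss(line);
        std::istream_iterator<int> begin(ss), end;
        rows.push_back(std::vector<int>(begin, end));
    }
    return rows;
}

// Opens the file and throws if it cannot be opened, instead of
// silently producing an empty table.
std::vector<std::vector<int>> LoadPathTable(const std::string& path) {
    std::ifstream file(path.c_str());
    if (!file.is_open()) {
        throw std::runtime_error("cannot open path table: " + path);
    }
    return ReadPathTable(file);
}
```

Taking a std::istream& in the parsing function also makes it easy to feed it a small in-memory table from a std::istringstream when comparing two builds.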
Here're a few methods I believe could slow down the start-up process:
Level::GenerateGridNodes():
void Level::GenerateGridNodes()
{
int gridHeight = gridSize;
int gridWidth = gridSize;
// ADD THIS STATEMENT:
nodes.reserve(gridHeight*gridWidth);
for (int h = 2; h <= gridHeight; h++)
{
for (int w = 2; w <= gridWidth; w++)
{
nodes.push_back(Node(D3DXVECTOR3((float)w, (float)h, 0)));
}
}
}//end GenerateGridNodes()
Level::CullInvalidNodes(): For std::vectors, use the erase-remove idiom to make erasing elements faster. You also need to rethink how this function should work, because it seems to have lots of redundant erasing and adding of nodes. Would it make sense in the code that, instead of erasing, you simply assign the value you push_back() right after deletion to the element you're erasing? So instead of v.erase(itr) followed by v.push_back(new_element), you could simply do *itr = new_element;? DISCLAIMER: I haven't looked at what the function actually does. Honestly, I don't have the time for that. I'm just pointing you to a possibility.
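For reference, a minimal sketch of both suggestions on a plain std::vector<int>. EraseAll and ReplaceFirst are hypothetical names, and int elements stand in for the project's Node type, which is not shown here.

```cpp
#include <algorithm>
#include <vector>

// Erase-remove idiom: std::remove shifts the kept elements forward in one
// pass, then a single erase trims the leftover tail.
void EraseAll(std::vector<int>& v, int value) {
    v.erase(std::remove(v.begin(), v.end(), value), v.end());
}

// Overwrites the first element equal to old_value in place, instead of
// erasing it and pushing the replacement onto the back.
bool ReplaceFirst(std::vector<int>& v, int old_value, int new_value) {
    std::vector<int>::iterator it = std::find(v.begin(), v.end(), old_value);
    if (it == v.end()) return false;
    *it = new_value;
    return true;
}
```

Note that the in-place assignment keeps the new element at the old position rather than at the end, so it is only a drop-in replacement if the order of the nodes does not matter to the rest of the code.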
In Level::LinkNodes():
void Level::LinkNodes()
{
//generates a vector for every node
// ADD THIS BEFORE THE FOR LOOP
nodeAdjacencyVectors.reserve(nodes.size());
for (unsigned int i = 0; ....)
//... Rest of the code
}//end LinkNodes()
In short, you still have got room for a lot of improvement. I believe the main hog is the Level class functions. You should have another look through it and probably rethink how each function should be implemented. Especially those invoked within the constructor of Level class.
It contained a loop within a loop; each inner loop reads each line of an over 30MB data file. Of course it's going to be slow. I would say it's slow by design.
Disclaimer: I tried to search for similar question, however this returned about every C++ question... Also I would be grateful to anyone that could suggest a better title.
There are two prominent loop structures in C++: while and for.
I purposefully ignore the do ... while construct, it is kind of unparalleled
I know of std::for_each and BOOST_FOREACH, but not every loop is a for each
Now, I may be a bit tight, but it always itches me to correct code like this:
int i = 0;
while ( i < 5)
{
// do stuff
++i; // I'm kind and use prefix notation... though I usually witness postfix
}
And transform it into:
for (int i = 0; i < 5; ++i)
{
// do stuff
}
The advantages of for in this example are multiple, in my opinion:
Locality: the variable i only lives in the scope of the loop
Pack: the loop 'control' is packed, so with only looking at the loop declaration I can figure if it is correctly formed (and will terminate...), assuming of course that the loop variable is not further modified within the body
It may be inlined, though I would not always advise it (that makes for tricky bugs)
I have a tendency therefore not to use while, except perhaps for the while(true) idiom but that's not something I have used in a while (pun intended). Even for complicated conditions I tend to stick to a for construct, though on multiple lines:
// I am also a fan of typedefs
for (const_iterator it = myVector.begin(), end = myVector.end();
it != end && isValid(*it);
++it)
{ /* do stuff */ }
You could do this with a while, of course, but then (1) and (2) would not be verified.
I would like to avoid 'subjective' remarks (of the kind "I like for/while better") and I am definitely interested in references to existing coding guidelines / coding standards.
EDIT:
I tend to really stick to (1) and (2) as far as possible, (1) because locality is recommended >> C++ Coding Standards: Item 18, and (2) because it makes maintenance easier if I don't have to scan a whole loop body to look for possible alterations of the control variable (which I take for granted when using a for whose 3rd expression references the loop variables).
However, as gf showed below, while does have its uses:
while (obj.advance()) {}
Note that this is not a rant against while but rather an attempt to find which one of while or for to use depending on the case at hand (and for sound reasons, not mere liking).
Not all loops are for iteration:
while(condition) // read e.g.: while condition holds
{
}
is ok, while this feels forced:
for(;condition;)
{
}
You often see this for any input sources.
You might also have implicit iteration:
while(obj.advance())
{
}
Again, it looks forced with for.
Additionally, when forcing for instead of while, people tend to misuse it:
for(A a(0); foo.valid(); b+=x); // a and b don't relate to loop-control
Functionally, they're the same thing, of course. The only reason to differentiate is to impart some meaning to a maintainer or to some human reader/reviewer of the code.
I think the while idiom is useful for communicating to the reader of the code that a non-linear test is controlling the loop, whereas a for idiom generally implies some kind of sequence. My brain also kind of "expects" that for loops are controlled only by the counting expression section of the for statement arguments, and I'm surprised (and disappointed) when I find someone conditionally messing with the index variable inside the execution block.
You could put it in your coding standard that "for" loops should be used only when the full for loop construct is followed: the index must be initialized in the initializer section, the index must be tested in the loop-test section, and the value of the index must only be altered in the counting expression section. Any code that wants to alter the index in the executing block should use a while construct instead. The rationale would be "you can trust a for loop to execute using only the conditions you can see without having to hunt for hidden statements that alter the index, but you can't assume anything is true in a while loop."
I'm sure there are people who would argue and find plenty of counter examples to demonstrate valid uses of for statements that don't fit my model above. That's fine, but consider that your code can be "surprising" to a maintainer who may not have your insight or brilliance. And surprises are best avoided.
i does not automatically increase within a while loop.
while (i < 5) {
// do something with X
if (X) {
i++;
}
}
One of the most beautiful things in C++ is the algorithms part of the STL. When reading code written properly using the STL, the programmer reads high-level loops instead of low-level loops.
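A small sketch of what "high-level loops" can mean in practice, using std::count_if and std::accumulate. The function names and the even-number predicate are made up for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

bool IsEven(int x) { return x % 2 == 0; }

// Low-level loop: the reader has to trace the counter and the branch.
int CountEvenLoop(const std::vector<int>& v) {
    int n = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (IsEven(v[i])) ++n;
    }
    return n;
}

// High-level loop: the algorithm name states the intent.
int CountEvenAlgo(const std::vector<int>& v) {
    return static_cast<int>(std::count_if(v.begin(), v.end(), IsEven));
}

// Sums the elements without a hand-written loop.
int Sum(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0);
}
```

Both counting functions do the same work; the difference is only in how much the reader has to decode before knowing what the loop is for.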
I don't believe that compilers can optimize significantly better if you chose to express your loop one way or the other. In the end it all boils down to readability, and that's a somewhat subjective matter (even though most people probably agree on most examples' readability factor).
As you have noticed, a for loop is just a more compact way of saying
type counter = 0;
while ( counter != end_value )
{
// do stuff
++counter;
}
While its syntax is flexible enough to allow you to do other things with it, I try to restrict my usage of for to examples that aren't much more complicated than the above. OTOH, I wouldn't use a while loop for the above.
I tend to use for loops when there is some kind of counting going on and the loop ends when the counting ends. Obviously you have your standard for( i=0; i < maxvalue; i++ ), but also things like for( iterator.first(); !iterator.is_done(); iterator.next() ).
I use while loops when it's not clear how many times the loop might iterate, i.e. "loop until some condition that cannot be pre-computed holds (or fails to hold)".
// I am also a fan of typedefs
for (const_iterator it = myVector.begin(), end = myVector.end();
it != end && isValid(*it);
++it)
{ /* do stuff */ }
It seems to me that the above code is rather less readable than the code below.
// I am also a fan of typedefs
const_iterator it = myVector.begin();
const_iterator end = myVector.end();
while (it != end && isValid(*it))
{
/* do stuff */
++it;
}
Personally, I think legibility trumps these kinds of formatting standards. If another programmer can't easily read your code, that leads to mistakes at worst, and at best it results in wasted time which costs the company money.
In Ye Olde C, for and while loops were not the same.
The difference was that in for loops, the compiler was free to assign a variable to a CPU register and reclaim the register after the loop. Thus, code like this had non-defined behaviour:
int i;
for (i = 0; i < N; i++) {
if (f(i)) break;
}
printf("%d", i); /* Non-defined behaviour, unexpected results ! */
I'm not 100% sure, but I believe this is described in K&R
This is fine:
int i = 0;
while (i < N) {
if (f(i)) break;
i++;
}
printf("%d", i);
Of course, this is compiler-dependent. Also, with time, compilers stopped making use of that freedom, so if you run the first code in a modern C compiler, you should get the expected results.
I wouldn't be so quick to throw away do-while loops. They are useful if you know your loop body will run at least once. Consider some code which creates one thread per CPU core. With a for loop it might appear:
for (int i = 0; i < number_of_cores; ++i)
start_thread(i);
Upon entering the loop, the first thing that is checked is the condition, in case number_of_cores is 0, in which case the loop body should never run. Hang on, though - this is a totally redundant check! The user will always have at least one core, otherwise how is the current code running? The compiler can't eliminate the first redundant comparison, since as far as it knows, number_of_cores could be 0. But the programmer knows better. So, eliminating the first comparison:
int i = 0;
do {
start_thread(i++);
} while (i < number_of_cores);
Now every time this loop is hit there is only one comparison instead of two on a one-core machine (with a for loop the condition is true for i = 0, false for i = 1, whereas the do while is false all the time). The first comparison is omitted, so it's faster. With less potential branching, there is less potential for branch mispredicts, so it is faster. Because the while condition is now more predictable (always false on 1-core), the branch predictor can do a better job, which is faster.
Minor nitpick really, but not something to be thrown away - if you know the body will always run at least once, it's premature pessimization to use a for loop.