The code below demonstrates strange behaviour when trying to access an out-of-range index in a vector:
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> a_vector(10, 0);
    for (int i = 0; i < a_vector.size(); i++)
    {
        std::cout << a_vector[i] << ", ";
    }
    for (int j = 0; j <= a_vector.size(); j++)
    {
        std::cout << a_vector[j] << ", ";
    }
    return 0;
}
The first for loop produces the expected 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, output; however, the second loop produces 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1318834149,.
The last number produced by the second loop changes every time the code is run, and is always large, unless the length of the vector is between three and seven (inclusive), in which case the last number is 0. It also persists for larger indices - for example, modifying the stop value of the second loop to j <= a_vector.size() + 2000 keeps producing large numbers until index 1139, at which point it reverts to 0.
Where does this number come from, does it mean anything, and most importantly, why isn't the code throwing an 'out of range' error, which is what I would expect it to do when asked to access the 11th element of a vector 10 elements long?
Did you mean this?
for (int j = 0; j < a_vector.size(); j++)
{
    std::cout << a_vector[j] << ", ";
}
Because you're going outside the vector's range, which is undefined behavior and will return a "random" number every time you run it.
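As for the 'out of range' error you expected: operator[] does no bounds checking by design. The checked alternative is std::vector::at(), which throws std::out_of_range. A minimal sketch:

#include <iostream>
#include <stdexcept>
#include <vector>

int main()
{
    std::vector<int> a_vector(10, 0);
    try
    {
        std::cout << a_vector.at(10); // bounds-checked, unlike a_vector[10]
    }
    catch (const std::out_of_range &e)
    {
        std::cout << "out of range: " << e.what() << '\n';
    }
    return 0;
}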
C++ is powerful, and with great power comes great responsibility. If you go out of range, it lets you. 99.99999999% of the time that isn't a good thing, but it still lets you do it.
As for why it changes every time, the computer is treating the hunk of memory after the end of the array as another int, then displaying it. The value of that int depends on what bits are left in that memory from when it was used last. It might have been used for a string allocated and discarded earlier in the program, it might be padding that the compiler inserts in memory allocations to optimize access, it might be active memory being used by another object. If you have no idea (like in this case), you have no way to know and shouldn't expect any sort of consistent behavior.
(What is the 0.00000001% of the time when it is a good thing, you may ask? Once I intentionally reached beyond the range of an array in a subclass to access some private data in the parent class that had no accessor, to fix a bug. This was in a library I had no control over, so I had to do this or live with the bug. Perhaps not exactly a good thing, but since I was confident of the layout of memory in that specific case, it worked.)
ADDENDUM:
The case I mentioned was a pragmatic solution to a bad situation. Code needed to ship and the vendor wasn't going to fix a bug for months. Although I was confident of the behavior (I knew the exact target platform, the code was statically linked so wasn't going to be replaced by the user, etc.) this introduced code fragility and new responsibility for the future. Namely, the next time the library was updated it would almost certainly break.
So I commented the heck out of the code, explaining the exact issue, what I was doing, and when it should be removed. I also used lots of CAPITAL LETTERS in my commit message. And I told all of the other programmers, just in case I got hit by a bus before the bug was fixed. In other words, I exercised the great responsibility needed to wield this great power.
Related
I have implemented a pixel mask class used for checking for perfect collision. I am using SFML, so the implementation is fairly straightforward:
Loop through each pixel of the image and decide whether it's true or false based on its transparency value. Here is the code I have used:
// Create an Image from the given texture
sf::Image image(texture.copyToImage());

// Measure the time this function takes
sf::Clock clock;
sf::Time time = sf::Time::Zero;
clock.restart();

// Reserve memory for the pixelMask vector to avoid repeated allocation
pixelMask.reserve(image.getSize().x);

// Loop through every pixel of the texture
for (unsigned int i = 0; i < image.getSize().x; i++)
{
    // Create the mask for one line
    std::vector<bool> tempMask;
    // Reserve memory for the tempMask vector to avoid repeated allocation
    tempMask.reserve(image.getSize().y);
    for (unsigned int j = 0; j < image.getSize().y; j++)
    {
        // If the pixel is not transparent
        if (image.getPixel(i, j).a > 0)
            // Some part of the texture is there --> push back true
            tempMask.push_back(true);
        else
            // The user can't see this part of the texture --> push back false
            tempMask.push_back(false);
    }
    pixelMask.push_back(tempMask);
}

time = clock.restart();
std::cout << std::endl << "The creation of the pixel mask took: " << time.asMicroseconds() << " microseconds (" << time.asSeconds() << ")";
I have used an instance of sf::Clock to measure the time.
My problem is that this function takes ages (e.g. 15 seconds) for larger images (e.g. 1280x720). Interestingly, it does so only in debug mode. When compiling the release version, the same texture/image only takes 0.1 seconds or less.
I have tried to reduce memory allocations by using the resize() method, but it didn't change much. I know that looping through almost 1 million pixels is slow, but it should not be 15 seconds slow, should it?
Since I want to test my code in debug mode (for obvious reasons) and I don't want to wait 5 minutes until all the pixel masks have been created, what I am looking for is basically a way to:
Either optimise the code (have I missed something obvious?)
Or get something similar to the release performance in debug mode
Thanks for your help!
Optimizing For Debug
Optimizing for debug builds is generally a very counter-productive idea. It could even have you optimize for debug in a way that not only makes maintaining code more difficult, but may even slow down release builds. Debug builds in general are going to be much slower to run. Even with the flattest kind of C code I write which doesn't pose much for an optimizer to do beyond reasonable register allocation and instruction selection, it's normal for the debug build to take 20 times longer to finish an operation. That's just something to accept rather than change too much.
That said, I can understand the temptation to do so at times. Sometimes you want to debug a certain part of code, only for the other operations in the software to take ages, requiring you to wait a long time before you can even get to the code you are interested in tracing through. I find in those cases that it's helpful, if you can, to separate debug mode input sizes from release mode (ex: having the debug mode only work with an input that is 1/10th of the original size). That does cause discrepancies between release and debug as a negative, but the positives sometimes outweigh the negatives from a productivity standpoint. Another strategy is to build parts of your code in release and just debug the parts you're interested in, like building a plugin in debug against a host application in release.
Approach at Your Own Peril
With that aside, if you really want to make your debug builds run faster and accept all the risks associated, then the main way is to simply pose less work for your compiler to optimize away. That's typically going to be flatter code with more plain old data types, fewer function calls, and so forth.
First and foremost, you might be spending a lot of time on debug mode assertions for safety. See things like checked iterators and how to disable them:
https://msdn.microsoft.com/en-us/library/aa985965.aspx
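A minimal sketch of what disabling them looks like on MSVC (note the caveat: _ITERATOR_DEBUG_LEVEL must be set consistently across all translation units, so the project settings are the safer place for it than a single source file):

#define _ITERATOR_DEBUG_LEVEL 0 // must appear before any standard header
#include <cstddef>
#include <vector>

int main()
{
    std::vector<int> v(1000);
    // Indexing here no longer goes through MSVC's debug-mode checks.
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] = static_cast<int>(i);
    return 0;
}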
For your case, you can easily flatten your nested loop into a single loop. There's no need to create these pixel masks with separate containers per scanline, since you can always get at your scanline data with some basic arithmetic (y*image_width or y*image_stride). So initially I'd flatten the loop. That might even help modestly for release mode. I don't know the SFML API so I'll illustrate with pseudocode.
const int num_pixels = image.w * image.h;
vector<bool> pixelMask(num_pixels);
for (int j = 0; j < num_pixels; ++j)
    pixelMask[j] = image.pixelAlpha(j) > 0;
Just that already might help a lot. Hopefully SFML lets you access pixels with a single index without having to specify column and row (x and y). If you want to go even further, it might help to grab the pointer to the array of pixels from SFML (also hopefully possible) and use that:
const int num_pixels = image.w * image.h;
vector<bool> pixelMask(num_pixels);
const unsigned int* pixels = image.getPixels();
for (int j = 0; j < num_pixels; ++j)
{
    // Assuming 32-bit pixels (should probably use uint32_t).
    // Note that no right shift is necessary when you just want
    // to check for non-zero values.
    const unsigned int alpha = pixels[j] & 0xff000000;
    pixelMask[j] = alpha > 0;
}
Also, vector<bool> stores each boolean as a single bit. That saves memory but translates to a few more instructions for random access. Sometimes you can get a speed-up, even in release, by just using more memory. I'd test both release and debug and time carefully, but you can try this:
const int num_pixels = image.w * image.h;
vector<char> pixelMask(num_pixels);
const unsigned int* pixels = image.getPixels();
char* pixelUsed = &pixelMask[0];
for (int j = 0; j < num_pixels; ++j)
{
    const unsigned int alpha = pixels[j] & 0xff000000;
    pixelUsed[j] = alpha > 0;
}
Loops are faster if they work with constants:
1. In for (unsigned int i = 0; i < image.getSize().x; i++), fetch image.getSize() once before the loop instead of calling it on every iteration.
2. Move the mask for one line (std::vector<bool> tempMask;) out of the loop and reuse it. The lines are all of the same length, I assume.
This should speed you up a bit; see the sketch below.
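A minimal sketch of both suggestions, assuming only the SFML calls already used in the question (image.getSize() and image.getPixel()):

const sf::Vector2u size = image.getSize(); // fetched once, not per iteration
std::vector<bool> tempMask(size.y);        // the mask for one line, reused
for (unsigned int i = 0; i < size.x; i++)
{
    for (unsigned int j = 0; j < size.y; j++)
        tempMask[j] = image.getPixel(i, j).a > 0;
    pixelMask.push_back(tempMask); // push_back copies the reused line
}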
Note that compiling for debugging produces significantly different machine code.
I'm debugging some code at the moment, and I've come across this line:
for (std::size_t j = M; j <= M; --j)
(Written by my boss, who's on holiday.)
It looks really odd to me.
What does it do? To me, it looks like an infinite loop.
std::size_t is guaranteed by the C++ standard to be an unsigned type. And if you decrement an unsigned type from 0, the standard guarantees that the result of doing so is the largest value for that type.
That wrapped-around value is always greater than M¹, so the loop terminates.
So j <= M when applied to an unsigned type is a convenient way of saying "run the loop to zero then stop".
Alternatives exist, such as running j one greater than you want, or even using the slide operator: for (std::size_t j = M + 1; j --> 0; ). These are arguably clearer, although they require more typing. I guess one disadvantage (other than the bewildering effect the original produces on first inspection) is that the wrap-around idiom doesn't port well to languages with no unsigned types, such as Java.
Note also that the scheme that your boss has picked "borrows" a possible value from the unsigned set: it so happens in this case that M set to std::numeric_limits<std::size_t>::max() will not have the correct behaviour. In fact, in that case, the loop is infinite. (Is that what you're observing?) You ought to insert a comment to that effect in the code, and possibly even assert on that particular condition.
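A hedged sketch of that comment-and-assert (process() is a hypothetical stand-in for the loop body):

#include <cassert>
#include <cstddef>
#include <limits>

void process(std::size_t j); // hypothetical per-element action

void countDown(std::size_t M)
{
    // The j <= M trick loops forever if M is the maximum std::size_t,
    // because the wrapped-around value then equals M. Guard against it:
    assert(M != std::numeric_limits<std::size_t>::max());
    for (std::size_t j = M; j <= M; --j)
        process(j); // runs for j = M, M-1, ..., 1, 0, then stops
}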
¹ Subject to M not being std::numeric_limits<std::size_t>::max().
What your boss was probably trying to do was to count down from M to zero inclusive, performing some action on each number.
Unfortunately, there's an edge case where that will indeed give you an infinite loop: the one where M is the maximum size_t value you can have. And although it's well defined what an unsigned value will do when you decrement it from zero, I maintain that the code itself is an example of sloppy thinking, especially as there's a perfectly viable solution without the shortcomings of your boss's attempt.
That safer variant (and more readable, in my opinion, while still maintaining a tight scope limit), would be:
{
    std::size_t j = M;
    do {
        doSomethingWith(j);
    } while (j-- != 0);
}
By way of example, see the following code:
#include <iostream>
#include <cstdint>
#include <climits>

int main (void) {
    uint32_t quant = 0;
    unsigned short us = USHRT_MAX;
    std::cout << "Starting at " << us;
    do {
        quant++;
    } while (us-- != 0);
    std::cout << ", we would loop " << quant << " times.\n";
    return 0;
}
This does basically the same thing with an unsigned short and you can see it processes every single value:
Starting at 65535, we would loop 65536 times.
Replacing the do..while loop in the above code with what your boss basically did will result in an infinite loop. Try it and see:
// Note: us2 must be unsigned short here; a wider type would wrap to a
// value larger than us, and the loop would terminate after all.
for (unsigned short us2 = us; us2 <= us; --us2) {
    quant++;
}
I appear to have coded a class that travels backwards in time. Allow me to explain:
I have a function, OrthogonalCamera::project(), that sets a matrix to a certain value. I then print out the value of that matrix, as such.
cam.project();
std::cout << "My Projection Matrix: " << std::endl << ProjectionMatrix::getMatrix() << std::endl;
cam.project() pushes a matrix onto ProjectionMatrix's stack (I am using the std::stack container), and ProjectionMatrix::getMatrix() just returns the stack's top element. If I run just this code, I get the following output:
2 0 0 0
0 7.7957 0 0
0 0 -0.001 0
-1 -1 -0.998 1
But if I run the code with these two lines after the std::cout call:
float *foo = new float[16];
Mat4 fooMatrix = foo;
Then I get this output:
2 0 0 0
0 -2 0 0
0 0 -0.001 0
-1 1 -0.998 1
My question is the following: what could I possibly be doing such that code executed after I print a value changes the value being printed?
Some of the functions I'm using:
static void load(Mat4 &set)
{
    if (ProjectionMatrix::matrices.size() > 0)
        ProjectionMatrix::matrices.pop();
    ProjectionMatrix::matrices.push(set);
}

static Mat4 &getMatrix()
{
    return ProjectionMatrix::matrices.top();
}
and
void OrthogonalCamera::project()
{
    Mat4 orthProjection = { { 2.0f / (this->r - this->l), 0, 0, -1 * ((this->r + this->l) / (this->r - this->l)) },
                            { 0, 2.0f / (this->t - this->b), 0, -1 * ((this->t + this->b) / (this->t - this->b)) },
                            { 0, 0, -2.0f / (this->farClip - this->nearClip), -1 * ((this->farClip + this->nearClip) / (this->farClip - this->nearClip)) },
                            { 0, 0, 0, 1 } }; // this is apparently the projection matrix for an orthographic projection
    orthProjection = orthProjection.transpose();
    ProjectionMatrix::load(orthProjection);
}
EDIT: whoever formatted my code, thank you. I'm not really too good with the formatting here, and it looks much nicer now :)
FURTHER EDIT: I have verified that the initialization of fooMatrix is running after I call std::cout.
UPTEENTH EDIT: Here is the function that initializes fooMatrix:
typedef Matrix<float, 4, 4> Mat4;

template<typename T, unsigned int rows, unsigned int cols>
Matrix<T, rows, cols>::Matrix(T *set)
{
    this->matrixData = new T*[rows];
    for (unsigned int i = 0; i < rows; i++)
    {
        this->matrixData[i] = new T[cols];
    }
    unsigned int counter = 0; // because I was too lazy to use set[(i * cols) + j]
    for (unsigned int i = 0; i < rows; i++)
    {
        for (unsigned int j = 0; j < cols; j++)
        {
            this->matrixData[i][j] = set[counter];
            counter++;
        }
    }
}
g64th EDIT: This isn't just an output problem. I actually have to use the value of the matrix elsewhere, and its value aligns with the described behaviours (whether or not I print it).
TREE 3rd EDIT: Running it through the debugger gave me a yet again different value:
-7.559 0 0 0
0 -2 0 0
0 0 -0.001 0
1 1 -0.998 1
a(g64, g64)th EDIT: the problem does not exist compiling on linux. Just on Windows with MinGW. Could it be a compiler bug? That would make me sad.
FINAL EDIT: It works now. I don't know what I did, but it works. I've made sure I was using an up-to-date build that didn't have the code that ensures causality still functions, and it works. Thank you for helping me figure this out, Stack Overflow community. As always, you've been helpful and tolerant of my slowness. I'll be hypervigilant for any undefined behaviours or pointer screw-ups that can cause this unpredictability.
You're not writing your program instruction by instruction. You are describing its behavior to a C++ compiler, which then tries to express the same in machine code.
The compiler is allowed to reorder your code, as long as the observable behavior does not change.
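A tiny illustration of that latitude (my own example, not from the question):

// Nothing in a conforming program can observe the order of these two
// stores, so the compiler may reorder them, merge them, or keep the
// values in registers until something actually reads a or b.
int a = 0, b = 0;

void f()
{
    a = 1;
    b = 2;
}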
In other words, the compiler is almost certainly reordering your code. So why does the observable behavior change?
Because your code exhibits undefined behavior.
Again, you are writing C++ code. C++ is a standard, a specification saying what the "meaning" of your code is. You're working under a contract that "As long as I, the programmer, write code that can be interpreted according to the C++ standard, then you, the compiler, will generate an executable whose behavior matches that of my source code".
If your code does anything not specified in this standard, then you have violated this contract. You have fed the compiler code whose behavior can not be interpreted according to the C++ standard. And then all bets are off. The compiler trusted you. It believed that you would fulfill the contract. It analyzed your code and generated an executable based on the assumption that you would write code that had a well-defined meaning. You did not, so the compiler was working under a false assumption. And then anything it builds on top of that assumption is also invalid.
Garbage in, garbage out. :)
Sadly, there's no easy way to pinpoint the error. You can carefully study every piece of your code, or you can try stepping through the offending code in the debugger. Or break into the debugger at the point where the "wrong" value is seen, and study the disassembly and how you got there.
It's a pain, but that's undefined behavior for you. :)
Analysis tools may help: Valgrind on Linux; the /analyze switch, which may or may not be available depending on your version of Visual Studio; and Clang, which has a similar option built in.
What is your compiler? If you are compiling with gcc, try turning on thorough and verbose warnings. If you are using Visual Studio, set your warnings to /W4 and treat all warnings as errors.
Once you have done that and can still compile, if the bug still exists, then run the program through Valgrind. It is likely that at some earlier point in your program you write past the end of some array, and that write is overwriting what you're trying to print. Therefore, when you put more things on the stack, writing past the end of that array will put you in a completely different location in memory, so you are instead overwriting something else. Valgrind was made to catch stuff like that.
I'm trying to optimize some C++ code for speed, and not concerned about memory usage. If I have some function that, for example, tells me if a character is a letter:
bool letterQ ( char letter ) {
    return (letter >= 65 && letter <= 90) ||
           (letter >= 97 && letter <= 122);
}
Would it be faster to just create a lookup table, i.e.
int lookupTable[128];
for (int i = 0; i < 128; i++) {
    lookupTable[i] = // some int value that tells what it is
}
and then modifying the letterQ function above to be
bool letterQ ( char letter ) {
    return lookupTable[letter] == LETTER_VALUE;
}
I'm trying to optimize for speed in this simple region, because these functions are called a lot, so even a small increase in speed would accumulate into long-term gain.
EDIT:
I did some testing, and it seems like a lookup array performs significantly better than a lookup function if the lookup array is cached. I tested this by trying
for (int i = 0; i < size; i++) {
    if (lookupfunction(text[i]))
        // do something
}

against

bool lookuptable[128];
for (int i = 0; i < 128; i++) {
    lookuptable[i] = lookupfunction((char)i);
}
for (int i = 0; i < size; i++) {
    if (lookuptable[(int)text[i]])
        // do something
}
Turns out that the second one is considerably faster - about a 3:1 speedup.
About the only possible answer is "maybe" -- and you can find out by running a profiler or something else to time the code. At one time, it would have been pretty easy to give "yes" as the answer with little or no qualification. Now, given how much faster CPUs have gotten than memory, it's a lot less certain -- you can do a lot of computation in the time it takes to fill one cache line from main memory.
Edit: I should add that in either C or C++, it's probably best to at least start with the functions (or macros) built into the standard library. These are often fairly carefully optimized for the target and (more importantly for most people) support things like switching locales, so you won't be stuck trying to explain to your German users that 'ß' isn't really a letter (and I doubt many will be much amused by "but that's really two letters, not one!").
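For instance, a minimal sketch of the standard-library route (note the cast: passing a plain, possibly negative char to std::isalpha is undefined behavior):

#include <cctype>

bool letterQ(char letter) {
    // std::isalpha is typically a locale-aware table lookup already.
    return std::isalpha(static_cast<unsigned char>(letter)) != 0;
}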
First, I assume you have profiled the code and verified that this particular function is consuming a noticeable amount of CPU time over the runtime of the program?
I wouldn't create a vector, as you're dealing with a very fixed data size. In fact, you could just create a regular C++ array and initialize it at program startup. With a really modern compiler that supports array initializers you can even do something like this:
bool lookUpTable[128] = { false, false, false, ..., true, true, ... };
Admittedly, I'd probably write a small script that generates the code rather than doing it all manually.
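A minimal sketch of the startup-initialization idea (I've sized the table at 256 here, purely as a defensive choice, so that any byte cast to unsigned char indexes it safely):

bool lookUpTable[256];

void initLookUpTable() {
    for (int i = 0; i < 256; ++i)
        lookUpTable[i] = (i >= 'A' && i <= 'Z') || (i >= 'a' && i <= 'z');
}

bool letterQ(char letter) {
    // Cast first: plain char may be signed, and a negative index is UB.
    return lookUpTable[static_cast<unsigned char>(letter)];
}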
For a simple calculation like this, the memory access (caused by a lookup table) is going to be more expensive than just doing the calculation every time.
I am implementing a particle interaction simulator in pthreads,and I keep getting segmentation faults in my pthreads code. The fault occurs in the following loop, which each thread does in the end of each timestep in my thread_routine:
for (int i = first; i < last; i++)
{
    get_id(particles[i], box_id);
    pthread_mutex_lock(&locks[box_id.x + box_no * box_id.y]);
    //cout << box_id.x << "," << box_id.y << "," << thread_id << "l" << endl;
    box[box_id.x][box_id.y].push_back(&particles[i]);
    //cout << box_id.x << box_id.y << endl;
    pthread_mutex_unlock(&locks[box_id.x + box_no * box_id.y]);
}
The strange thing is that if I uncomment one (it doesn't matter which one) or both of the couts, the program runs as expected, with no errors occurring (but this obviously kills performance, and isn't an elegant solution), giving correct output.
box is a globally declared
vector< vector< vector<particle_t*> > > box;
which represents a decomposition of my (square) domain into boxes.
When the loop starts, box[i][j].size() has been set to zero for all i, j, and the loop is supposed to put particles back into the box-structure (the get_id function gives correct results, I've checked)
The array pthread_mutex_t locks is declared as a global
pthread_mutex_t *locks;
and the size is set by thread 0 and the locks initialized by thread 0 before the other threads are created:
locks = (pthread_mutex_t *) malloc(box_no * box_no * sizeof(pthread_mutex_t));
for (int i = 0; i < box_no * box_no; i++)
{
    pthread_mutex_init(&locks[i], NULL);
}
Do you have any idea of what could cause this? The code also runs if the number of processors is set to 1, and it seems like the more processors I run on, the earlier the seg fault occurs (it has run through the entire simulation once on two processors, but this seems to be an exception)
Thanks
This is only an educated guess, but based on the problem going away if you use one lock for all the boxes: push_back has to allocate memory, which it does via the std::allocator template. I don't think allocator is guaranteed to be thread-safe and I don't think it's guaranteed to be partitioned, one for each vector, either. (The underlying operator new is thread-safe, but allocator usually does block-slicing tricks to amortize operator new's cost.)
Is it practical for you to use reserve to preallocate space for all your vectors ahead of time, using some conservative estimate of how many particles are going to wind up in each box? That's the first thing I'd try.
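A hedged sketch of that preallocation, using the globals from the question (n_particles and the safety factor are hypothetical placeholders to tune):

// Run this on thread 0 before the workers start, so push_back never
// has to allocate inside the timestep loop.
const size_t estimate = 2 * n_particles / (box_no * box_no) + 16;
for (int i = 0; i < box_no; i++)
    for (int j = 0; j < box_no; j++)
        box[i][j].reserve(estimate);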
The other thing I'd try is using one lock for all the boxes, which we know works, but moving the lock/unlock operations outside the for loop so that each thread gets to stash all its items at once. That might actually be faster than what you're trying to do -- less lock thrashing.
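A sketch of that variant, with a single hypothetical global_box_lock in place of the per-box locks array:

pthread_mutex_lock(&global_box_lock); // one lock for all boxes, taken once
for (int i = first; i < last; i++)
{
    get_id(particles[i], box_id);
    box[box_id.x][box_id.y].push_back(&particles[i]);
}
pthread_mutex_unlock(&global_box_lock);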
Are the box and box[i] vectors initialized properly? You only say the innermost set of vectors is set. Otherwise it looks like box_id's x or y component is wrong and you are running off the end of one of your arrays.
What part of the loop is it crashing on?