Stack overflow when passing a large object by reference - c++

I have a C++ class which contains structs representing vectors of preset length and matrices of preset size. Each vector is just an array of doubles and each matrix is an array of vectors. I chose not to use std::vector since I don't need to resize the vectors at all and won't be using any of its member functions. I was just looking for a wrapper around my double arrays.
The goal of this class is to perform matrix multiplication of 2 big matrices (512x512) by breaking the matrix into smaller blocks and then performing the multiplication across several nodes on our local computing cluster using MPI. I had an issue with stack overflow exceptions when just trying to break the matrix into the smaller blocks. Here is some code:
// Vector Structs
struct Vec512 { double values[512]; };
struct Vec256 { double values[256]; };
struct Vec128 { double values[128]; };
struct Vec64 { double values[64]; };
// Matrix Structs
struct Mat512 {
Vec512 rows[512];
Mat512(){}
Mat512(MatrixInitEnum e){
switch(e){
case Empty:
for(int row = 0; row < 512; row++){
Vec512 temp;
for(int col = 0; col < 512; col++){
temp.values[col] = 0;
}
rows[row] = temp;
}
break;
case Random:
for(int row = 0; row < 512; row++){
Vec512 temp;
for(int col = 0; col < 512; col++){
temp.values[col] = myRandom();
}
rows[row] = temp;
}
break;
}
}
Vec512 GetRow(int row){
return rows[row];
}
Vec512 GetColumn(int col){
Vec512 column;
for(int i = 0; i < 512; i++){
column.values[i] = rows[i].values[col];
}
return column;
}
void SetValue(int row, int col, double value){
rows[row].values[col] = value;
}
double GetValue(int row, int col){
return rows[row].values[col];
}
};
// Analogous structs for Mat256, Mat128, Mat64
/*Decomposes the big matrix into 4 256x256 matrices in row-major fashion*/
Mat256* DecomposeMatrix256(Mat512 *bigMat){
Mat256 matArray[4];
int beginRow, endRow, beginCol, endCol, rowOffset, colOffset;
for(int it = 0; it < 4; it++){
beginRow = (it/2) * 256;
endRow = beginRow + 256;
beginCol = (it % 2) * 256;
endCol = beginCol + 256;
rowOffset = (it / 2) * 256;
colOffset = (it % 2) * 256;
for(int row = beginRow; row < endRow; row++){
for(int col = beginCol; col < endCol; col++){
double val = bigMat->GetValue(row, col);
matArray[it].SetValue(row - rowOffset, col - colOffset, val);
}
}
}
return matArray;
}
// Analogous methods for breaking into 16 128x128 Mat128s and 64 64x64 Mat64s
Then my main method was simply
int main(int argc, char* argv[])
{
cout << "Welcome, the program is now initializing the matrices.\n";
Mat512* bigMat = new Mat512(Random); // Creates this just fine
Mat256* mats256 = DecomposeMatrix256(bigMat); // Gets here and can step to the signature of the method above without issue
// MPI code to split up the multiplication and to
// wait until user is ready to exit
return 0;
}
Here is what my issue was/is:
I could create the big Mat512 with my random values no problem. I set a breakpoint at the point where the big matrix was created and verified that it was being created successfully. Then I stepped into the call to DecomposeMatrix256(Mat512 * bigMat) and saw that I was getting to the method no problem. Also, when hovering over the bigMat object, Visual Studio showed me that it was indeed receiving the big matrix. When I tried to step into the method, I got a stack overflow exception immediately.
What I am confused about is why I would get the stack overflow before I even create another new object (like the array of 4 256x256 matrices). I am pretty sure I am passing the matrix by reference and not by value (I am used to C# and not C++ so I would be happy to hear that I am simply doing it wrong on the reference passing) so I thought that there wouldn't be a big overhead in simply passing a reference to the big matrix.
I was able to resolve my issue by going into the project configuration settings and increasing the stack reserve size from 1MB (the default) to 8MB (possibly overkill but I just wanted it to work for my debugging purposes).
Can someone explain why I would be getting the overflow when I am simply passing a reference to the big matrix and not the matrix itself (by value)? Again, I got it to work by increasing the stack size but I don't see why this would be necessary when I am passing the object by reference and not by value.
Thanks for reading and for the input. I'd be happy to post anything else that is relevant to helping understand my issue.

DecomposeMatrix256() creates an array of four Mat256 objects on the stack. This is most probably causing the overflow, since it requires a lot of stack space. Space for a function's local variables is typically reserved as soon as the function is entered, which is why the exception appears the moment you step in. The parameter you are passing is not the cause of the overflow.
As another issue, the function returns a pointer to a local variable that goes out of scope at the end of the function. This pointer will then no longer point to a valid object.

This is because DecomposeMatrix256 creates an automatic variable on the stack with:
Mat256* DecomposeMatrix256(Mat512 *bigMat){
Mat256 matArray[4];
Its size is 4*256*256*sizeof(double), which is 4*256*256*8 = 2,097,152 bytes when a double is 8 bytes. That is twice the default 1 MB stack reserve, hence the overflow.
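A minimal sketch of one way to avoid both problems, assuming it is acceptable to return a std::vector instead of a raw pointer (the question does not say so). The vector's elements live on the heap, so the function's stack frame stays small, and returning the vector by value means the caller never holds a pointer to a dead local array. The structs here are trimmed-down stand-ins for the ones in the question.

```cpp
#include <vector>

// Trimmed-down stand-ins for the structs in the question.
struct Vec256 { double values[256]; };
struct Vec512 { double values[512]; };
struct Mat256 { Vec256 rows[256]; };
struct Mat512 { Vec512 rows[512]; };

// Splits the 512x512 matrix into four 256x256 blocks in row-major block order.
// The blocks are stored in a std::vector, so their 2 MB of data is allocated
// on the heap rather than in this function's stack frame.
std::vector<Mat256> DecomposeMatrix256(const Mat512& bigMat) {
    std::vector<Mat256> blocks(4);
    for (int it = 0; it < 4; ++it) {
        const int rowOffset = (it / 2) * 256;  // block row 0 or 1
        const int colOffset = (it % 2) * 256;  // block column 0 or 1
        for (int row = 0; row < 256; ++row) {
            for (int col = 0; col < 256; ++col) {
                blocks[it].rows[row].values[col] =
                    bigMat.rows[rowOffset + row].values[colOffset + col];
            }
        }
    }
    return blocks;
}
```

The caller would keep allocating the Mat512 itself on the heap (as the question already does with new) and pass it as DecomposeMatrix256(*bigMat).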

Related

Creating a large 1D matrix returning exit 11

I have written some code that uses a 1D array to represent a matrix. I am currently testing large input sizes.
When I set rows and cols to 50000 the program exits with code 11.
I tried printing a lot out.
double* create_matrix_1d(int n_rows, int n_cols) {
long long len = (long long ) n_rows * (long long) n_cols;
auto* A = new double[len];
int row, col ;
for(row = 0; row < n_rows; row++) {
for( col = 0; col < n_cols; col++) {
int i = col + row * n_cols;
A[i] = 1; //static_cast <int> (rand()) % 10 ;
}
}
return A;
}
Let's compute the required memory. A double generally uses 8 bytes, hence your matrix requires:
50000*50000*8 = 20000000000 bytes
of memory
20000000000 bytes = 20000000000 / 1024 = 19531250 KB
19531250 / 1024 = 19073 MB
19073 / 1024 = 18.6265 GB
So unless you have a computer with more than 19 GB of RAM, it is normal that you get an out-of-memory error.
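The same arithmetic can be checked in code before attempting the allocation. This is only a sketch: matrix_bytes and to_gib are hypothetical helper names, not part of the question's code, and the figure assumes an 8-byte double.

```cpp
#include <cstdint>

// Bytes needed for a rows x cols matrix of double, computed in 64 bits
// so the product cannot overflow for sizes like 50000 x 50000.
constexpr std::uint64_t matrix_bytes(std::uint64_t rows, std::uint64_t cols) {
    return rows * cols * sizeof(double);
}

// Converts a byte count to gibibytes (1024^3 bytes).
constexpr double to_gib(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}
```

Comparing matrix_bytes(n_rows, n_cols) against the memory actually available gives an early, readable failure instead of a crash later on.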
The answer should be easy, but I cannot be sure, because you did not give enough information about your compiler, the language, and your hardware. Most importantly, I do not see a question.
But I guess that you want to know why your routine fails.
Next: I was not sure what language you use. It looks like plain old C, but you used the keyword auto and a static_cast in a comment, so it should be C++.
First the answer, then some additional comments:
You are trying to allocate about 19 GB on the heap. Depending on the memory model that you are using and the availability of physical RAM, this will most probably fail.
Additionally, you are writing
int i = col + row * n_cols;
This will overflow: with 50000 rows and columns the index reaches almost 2,500,000,000, which is more than a 32-bit int can hold (2,147,483,647).
Second: some improvement proposals.
If you are using modern C++, then you should use modern C++. It sounds strange, but your code is C-style.
Then, if you really want to handle big data, you could consider a database. But I doubt that you really need 19 GB of filled data. There are other techniques available to store only the needed data. You need to change your algorithm.
I commented your code to give at least some proposals for improvement:
// Rows and Cols could be made const. They are not modified in your code
// If you anyway later cast to long long, then you could also make the parameters long long
// You should use unique_ptr to take ownership of the allocated memory
// But this cannot be copied and needs to be "moved" out of the function
// You should use a C++ container to hold your matrix, like a std::vector
double* create_matrix_1d(int n_rows, int n_cols) {
// You should not use C-Style Cast but static_cast
long long len = (long long ) n_rows * (long long) n_cols;
// You should use a unique_ptr to handle the resource
auto* A = new double[len];
int row, col ;
for(row = 0; row < n_rows; row++) {
for( col = 0; col < n_cols; col++) {
// A 32-bit "int i" can hold at most 2^31-1
// so you will get an overflow here
int i = col + row * n_cols;
// You wanted to assign an int to a double
A[i] = 1; //static_cast <int> (rand()) % 10 ;
}
}
return A;
}
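The proposals in the comments above could be combined roughly as follows. This is a sketch, not the original poster's code: index_1d is a hypothetical helper, the element type and fill value are taken from the question, and a 64-bit std::size_t is assumed for the 50000 x 50000 case.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// 64-bit index of element (row, col) in a row-major matrix with n_cols columns.
// Doing the arithmetic in 64 bits avoids the int overflow discussed above.
inline std::uint64_t index_1d(std::uint64_t row, std::uint64_t col,
                              std::uint64_t n_cols) {
    return row * n_cols + col;
}

// The vector owns the memory and releases it automatically (RAII).
// If the allocation cannot be satisfied, the constructor reports it with an
// exception such as std::bad_alloc instead of returning an unusable pointer.
std::vector<double> create_matrix_1d(std::size_t n_rows, std::size_t n_cols) {
    return std::vector<double>(n_rows * n_cols, 1.0);
}
```

Elements are then read and written as A[index_1d(row, col, n_cols)].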
Hope this helps.
The question has already been answered by Vincent.
I am just leaving a few comments about the written code:
It is always safer to use RAII; using standard library data structures like std::vector should help.
Also, the nested loops could have been written as
for (long long i = 0; i < len; ++i)
A[i] = 1.0;
This helps the compiler map the loop to vectorized instructions.
Happy coding!

C++ Avoiding Triple Pointers

I am trying to create an array of X pointers referencing matrices of dimensions Y by 16. Is there any way to accomplish this in C++ without the use of triple pointers?
Edit: Adding some context for the problem.
There are a number of geometries on the screen, each with a transform that has been flattened to a 1x16 array. Each snapshot represents the transforms for each of a number of components. So the matrix dimensions are 16 by num_components by num_snapshots, where the latter two dimensions are known at run-time. In the end, we have many geometries with motion applied.
I'm creating a function that takes a triple pointer argument, though I cannot use triple pointers in my situation. What other ways can I pass this data (possibly via multiple arguments)? Worst case, I thought about flattening this entire 3D matrix to an array, though it seems like a sloppy thing to do. Any better suggestions?
What I have now:
function(..., double ***snapshot_transforms, ...)
What I want to accomplish:
function (..., <1+ non-triple pointer parameters>, ...)
Below isn't the function I'm creating that takes the triple pointer, but shows what the data is all about.
static double ***snapshot_transforms_function (int num_snapshots, int num_geometries)
{
double component_transform[16];
double ***snapshot_transforms = new double**[num_snapshots];
for (int i = 0; i < num_snapshots; i++)
{
snapshot_transforms[i] = new double*[num_geometries];
for (int j = 0; j < num_geometries; j++)
{
snapshot_transforms[i][j] = new double[16];
// 4x4 transform put into a 1x16 array with dummy values for each component for each snapshot
for (int k = 0; k < 16; k++)
snapshot_transforms[i][j][k] = k;
}
}
return snapshot_transforms;
}
Edit2: I cannot create new classes, nor use C++ features like std, as the exposed function prototype in the header file is getting put into a wrapper (that doesn't know how to interpret triple pointers) for translation to other languages.
Edit3: After everyone's input in the comments, I think going with a flattened array is probably the best solution. I was hoping there would be some way to split this triple pointer and organize this complex data across multiple data pieces neatly using simple data types including single pointers. Though I don't think there is a pretty way of doing this given my caveats here. I appreciate everyone's help =)
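For reference, the flattened layout mentioned in Edit3 could look roughly like this. It is only a sketch under the constraints stated above (plain pointers and ints in the signature, no new classes); transform_index and make_snapshot_transforms are hypothetical names, and the dummy values mirror the ones in the question.

```cpp
#include <cstddef>

// Offset of element k (0..15) of the transform for the given geometry in the
// given snapshot, inside one contiguous block laid out snapshot-major.
inline std::size_t transform_index(int snapshot, int geometry, int k,
                                   int num_geometries) {
    return (static_cast<std::size_t>(snapshot) * num_geometries + geometry) * 16 + k;
}

// Allocates num_snapshots * num_geometries * 16 doubles in a single block and
// fills each 1x16 transform with the dummy values 0..15.
// The caller releases the block with delete[].
double* make_snapshot_transforms(int num_snapshots, int num_geometries) {
    const std::size_t total =
        static_cast<std::size_t>(num_snapshots) * num_geometries * 16;
    double* data = new double[total];
    for (int i = 0; i < num_snapshots; i++)
        for (int j = 0; j < num_geometries; j++)
            for (int k = 0; k < 16; k++)
                data[transform_index(i, j, k, num_geometries)] = k;
    return data;
}
```

A function exposed to the wrapper would then take a single double* plus the two int dimensions instead of a double***.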
It is easier, better, and less error prone to use a std::vector. You are using C++ and not C after all. I replaced all of the C-style array pointers with vectors. The typedef doublecube makes it so that you don't have to type vector<vector<vector<double>>> over and over again. Other than that the code basically stays the same as what you had.
If you don't actually need dummy values I would remove that innermost k loop completely. The constructor call already sizes every vector and zero-initializes the elements. (Note that reserve alone would not be enough here: it only sets capacity and leaves the vectors empty, so at() would throw.)
#include <vector>
using std::vector; // so we can just call it "vector"
typedef vector<vector<vector<double>>> doublecube;
static doublecube snapshot_transforms_function (int num_snapshots, int num_geometries)
{
    // I deleted component_transform. It was never used
    // Size all three levels up front: num_snapshots x num_geometries x 16 zeros
    doublecube snapshot_transforms(num_snapshots,
        vector<vector<double>>(num_geometries, vector<double>(16)));
    for (int i = 0; i < num_snapshots; i++)
    {
        for (int j = 0; j < num_geometries; j++)
        {
            // 4x4 transform put into a 1x16 array with dummy values for each component for each snapshot
            for (int k = 0; k < 16; k++)
                snapshot_transforms.at(i).at(j).at(k) = k;
        }
    }
    return snapshot_transforms;
}
Adding a little bit of object-orientation usually makes the code easier to manage -- for example, here's some code that creates an array of 100 Matrix objects with varying numbers of rows per Matrix. (You could vary the number of columns in each Matrix too if you wanted to, but I left them at 16):
#include <algorithm> // for std::fill
#include <vector>
#include <memory> // for shared_ptr (not strictly necessary, but used in main() to avoid unnecessary copying of Matrix objects)
/** Represents a (numRows x numCols) 2D matrix of doubles */
class Matrix
{
public:
// constructor
Matrix(int numRows = 0, int numCols = 0)
: _numRows(numRows)
, _numCols(numCols)
{
_values.resize(_numRows*_numCols);
std::fill(_values.begin(), _values.end(), 0.0f);
}
// copy constructor
Matrix(const Matrix & rhs)
: _numRows(rhs._numRows)
, _numCols(rhs._numCols)
, _values(rhs._values) // copy the elements too, not just the dimensions
{
}
/** Returns the value at (row/col) */
double get(int row, int col) const {return _values[(row*_numCols)+col];}
/** Sets the value at (row/col) to the specified value */
double set(int row, int col, double val) {return _values[(row*_numCols)+col] = val;}
/** Assignment operator */
Matrix & operator = (const Matrix & rhs)
{
_numRows = rhs._numRows;
_numCols = rhs._numCols;
_values = rhs._values;
return *this;
}
private:
int _numRows;
int _numCols;
std::vector<double> _values;
};
int main(int, char **)
{
const int numCols = 16;
std::vector< std::shared_ptr<Matrix> > matrixList;
for (int i=0; i<100; i++) matrixList.push_back(std::make_shared<Matrix>(i, numCols));
return 0;
}

How to use memset or fill_n to initialize a dynamic two dimensional array in C++

I have a 2D array created dynamically.
int **abc = new int*[rows];
for (uint32_t i = 0; i < rows; i++)
{
abc[i] = new int[cols];
}
I want to fill the array with some value (say 1). I can loop over each item and do it.
But is there a simpler way? I am trying to use memset and std::fill_n as mentioned in this post.
std::fill_n(abc, rows * cols, 1);
memset(abc, 1, rows * cols * sizeof(int));
Using memset crashes my program. Using fill_n gives a compile error.
invalid conversion from 'int' to 'int*' [-fpermissive]
What am I doing wrong here ?
You could just use vector:
std::vector<std::vector<int>> abc(rows, std::vector<int>(cols, 1));
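You cannot use std::fill_n or memset on abc directly: abc is an array of int* pointers, and each row is a separate allocation, so the rows*cols ints do not form one contiguous block. You can only use either on the sub-arrays (and memset sets individual bytes, so it is only suitable for values such as 0):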
int **abc = new int*[rows];
for (uint32_t i = 0; i < rows; i++)
{
abc[i] = new int[cols];
std::fill_n(abc[i], cols, 1);
}
Or make the whole thing single-dimensional:
int *abc = new int[rows * cols];
std::fill_n(abc, rows*cols, 1);
Or I guess you could use std::generate_n in combination with std::fill_n, but this just seems confusing:
int **abc = new int*[rows];
std::generate_n(abc, rows, [cols]{
int* row = new int[cols];
std::fill_n(row, cols, 1);
return row;
});
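A side note on memset, as a minimal sketch: even on a contiguous block, memset(…, 1, …) does not store the int value 1, because it writes the byte 0x01 into every byte of every int. The two helper functions are hypothetical and exist only to make the difference visible; the 0x01010101 figure assumes a 4-byte int.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// Fills a contiguous rows*cols block with memset(..., 1, ...) and returns the
// first element. With a 4-byte int every element becomes 0x01010101.
int first_after_memset(std::size_t rows, std::size_t cols) {
    std::vector<int> abc(rows * cols);
    std::memset(abc.data(), 1, abc.size() * sizeof(int));
    return abc[0];
}

// Fills the same kind of block with std::fill_n, which assigns whole ints,
// so every element really is 1.
int first_after_fill_n(std::size_t rows, std::size_t cols) {
    std::vector<int> abc(rows * cols);
    std::fill_n(abc.begin(), abc.size(), 1);
    return abc[0];
}
```

In practice this means memset is only a safe shortcut for filling with 0 (or other values whose bytes are all identical); std::fill_n works for any value.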
I think that your main problem here is that you don't have an array of int values. You have an array of pointers to ints.
You probably should start with int* abc = new int[rows * cols]; and work from there, if I understand what you are trying to achieve here.
Just dereference the row pointer inside the loop you already have:
for (uint32_t i = 0; i < rows; i++)
{
abc[i] = new int[cols];
std::fill_n(*(abc+i), cols, 1);
}
fill_n has no way of knowing where each new int array lives in memory, so you have to pass it every row pointer yourself.
I recommend to read:
A proper way to create a matrix in c++
Since you've already got good, workable answers to solve your problem, I want to add just two pointers left and right from the standard path ;-)
a) is just a link to the documentation of Boost.MultiArray
and b) is something I don't recommend you use, but it might help you to understand what you've initially tried. And since your profile shows visual studio tags, you might come in contact with something like this in the win32 api. If that is the case the documentation usually tells you not to use free()/LocalFree()/... on the elements and the "outer" pointer-pointer but to use a specialized function.
(note: I'm not trying to make this code look pretty or clever; it's a mishmash of c and a little c++-ish junk ;-))
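#include <algorithm> // std::fill_n
#include <cstdlib> // malloc, free
#include <cstring> // memset
const std::size_t rows = 3, cols =4;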
int main()
{
std::size_t x,y;
// allocate memory for 0...rows-1 int* pointers _and_ cols*rows ints
int **abc = (int**)malloc( (rows*sizeof(int*)) + cols*rows*sizeof(int) );
// the memory behind abc is large enough to hold the pointers for abc[0...rows-1]
// + the actual data when accessing abc[0...rows-1][0....cols-1]
int* data = (int*)((abc+rows));
// data now points to the memory right after the int*-pointer array
// i.e. &(abc[0][0]) and data should point to the same location when we're done:
// make abc[0] point to the first row (<-> data+(cols*0)), abc[1] point the second row (<-> data+(cols*1)....
for(y=0;y<rows; y++) {
abc[y] = &(data[y*cols]);
}
// now you can use abc almost like a stack 2d array
for(y=0; y<rows; y++) {
for (x=0; x<cols; x++) {
abc[y][x] = 127;
}
}
// and -since the memory block is contiguous- you can also (with care) use memset
// note: memset writes bytes, so with 4-byte ints each element becomes 0x01010101, not 1
memset(&abc[0][0], 1, sizeof(int)*rows*cols);
// and with equal care ....
std::fill_n( &(abc[0][0]), rows*cols, 127);
// and get rid of the whole thing with just one call to free
free(abc);
return 0;
}

Add 1 to vector<unsigned char> value - Histogram in C++

I guess it's such an easy question (I'm coming from Java), but I can't figure out how it works.
I simply want to increment a vector element by one. The reason for this is that I want to compute a histogram out of image values. But whatever I try, I can only assign a value to the vector, not increment it by one!
This is my histogram function:
void histogram(unsigned char** image, int height,
int width, vector<unsigned char>& histogramArray) {
for (int i = 0; i < width; i++) {
for (int j = 0; j < height; j++) {
// histogramArray[1] = (int)histogramArray[1] + (int)1;
// add histogram position by one if greylevel occured
histogramArray[(int)image[i][j]]++;
}
}
// display output
for (int i = 0; i < 256; i++) {
cout << "Position: " << i << endl;
cout << "Histogram Value: " << (int)histogramArray[i] << endl;
}
}
But whatever I try to add one to the histogramArray position, it leads to just 0 in the output. I'm only allowed to assign concrete values like:
histogramArray[1] = 2;
Is there any simple and easy way? I thought iterators would not be necessary at this point, because I know the exact index position where I want to increment something.
EDIT:
I'm so sorry, I should have been more precise with my question. Thank you for your help so far! The code above is working, but the mean value it computes from the histogram differs from what it should be (by around 90). Also, the histogram values are very different from those shown in a graphics program, even though the image values are exactly the same! That's why I investigated the function and found out that if I set the histogram to zeros and then just try to increase one element, nothing happens! This is the commented code above:
for (int i = 0; i < width; i++) {
for (int j = 0; j < height; j++) {
histogramArray[1]++;
// add histogram position by one if greylevel occured
// histogramArray[(int)image[i][j]]++;
}
}
So the position 1 remains 0, instead of having the value height*width. Because of this, I think the correct calculation histogramArray[image[i][j]]++; is also not working properly.
Do you have any explanation for this? This was my main question, I'm sorry.
Just for completeness, this is my mean function for the histogram:
unsigned char meanHistogram(vector<unsigned char>& histogram) {
int allOccurences = 0;
int allValues = 0;
for (int i = 0; i < 256; i++) {
allOccurences += histogram[i] * i;
allValues += histogram[i];
}
return (allOccurences / (float) allValues) + 0.5f;
}
And I initialize the image like this:
unsigned char** image= new unsigned char*[width];
for (int i = 0; i < width; i++) {
image[i] = new unsigned char[height];
}
But there shouldn't be any problem with the initialization code, since all other computations work perfectly and I am able to manipulate and save the original image. But it's true that I should swap width and height; since I had only square images, it didn't matter so far.
The Histogram is created like this and then the function is called like that:
vector<unsigned char> histogramArray(256);
histogram(array, adaptedHeight, adaptedWidth, histogramArray);
So do you have any clue why this part, histogramArray[1]++;, doesn't increase my histogram? histogramArray[1] remains 0 all the time! histogramArray[1] = 2; works perfectly. Also, histogramArray[(int)image[i][j]]++; seems to calculate something, but as I said, I think it's calculating wrongly.
I appreciate any help very much! The reason why I used a 2D Array is simply because it is asked for. I like the 1D version also much more, because it's way simpler!
You see, the current problem in your code is not incrementing a value versus assigning to it; it's the way you index your image. The way you've written your histogram function and the image access part puts very fine restrictions on how you need to allocate your images for this code to work.
For example, assuming your histogram function is as you've written it above, none of these image allocation strategies will work: (I've used char instead of unsigned char for brevity.)
char image [width * height]; // Obvious; "char[]" != "char **"
char * image = new char [width * height]; // "char*" != "char **"
char image [height][width]; // Most surprisingly, this won't work either.
The reason why the third case won't work is tough to explain simply. Suffice it to say that a 2D array like this will not implicitly decay into a pointer to pointer, and if it did, it would be meaningless. Contrary to what you might read in some books or hear from some people, in C/C++, arrays and pointers are not the same thing!
Anyway, for your histogram function to work correctly, you have to allocate your image like this:
char** image = new char* [height];
for (int i = 0; i < height; ++i)
image[i] = new char [width];
Now you can fill the image, for example:
for (int i = 0; i < height; ++i)
for (int j = 0; j < width; ++j)
image[i][j] = rand() % 256; // Or whatever...
On an image allocated like this, you can call your histogram function and it will work. After you're done with this image, you have to free it like this:
for (int i = 0; i < height; ++i)
delete[] image[i];
delete[] image;
For now, that's enough about allocation. I'll come back to it later.
In addition to the above, it is vital to note the order of iteration over your image. The way you've written it, you iterate over your columns on the outside, and your inner loop walks over the rows. Most (all?) image file formats and many (most?) image processing applications I've seen do it the other way around. The memory allocations I've shown above also assume that the first index is for the row, and the second is for the column. I suggest you do this too, unless you've very good reasons not to.
No matter which layout you choose for your images (the recommended row-major, or your current column-major), it is an issue that you should always keep in mind and take notice of.
Now, on to my recommended way of allocating and accessing images and calculating histograms.
I suggest that you allocate and free images like this:
// Allocate:
char * image = new char [height * width];
// Free:
delete[] image;
That's it; no nasty (de)allocation loops, and every image is one contiguous block of memory. When you want to access row i and column j (note which is which) you do it like this:
image[i * width + j] = 42;
char x = image[i * width + j];
And you'd calculate the histogram like this:
void histogram (
unsigned char * image, int height, int width,
// Note that the elements here are pixel-counts, not colors!
vector<unsigned> & histogram
) {
// Make sure histogram has enough room; you can do this outside as well.
if (histogram.size() < 256)
histogram.resize (256, 0);
int pixels = height * width;
for (int i = 0; i < pixels; ++i)
histogram[image[i]]++;
}
I've eliminated the printing code, which should not be there anyway. Note that I've used a single loop to go through the whole image; this is another advantage of allocating a 1D array. Also, for this particular function, it doesn't matter whether your images are row-major or column major, since it doesn't matter in what order we go through the pixels; it only matters that we go through all the pixels and nothing more.
UPDATE: After the question update, I think all of the above discussion is moot! I believe the problem could be in the declaration of the histogram vector. It should be a vector of unsigned ints, not single bytes. Your problem seems to be that the values of the vector elements stay at zero when you simplify the code and increment just one element, and are off from the values they need to be when you run the actual code. Well, this could be a symptom of numeric wrap-around: an unsigned char counter can only hold 0 to 255 and wraps back to 0 on every 256th increment. If the number of pixels in your image is a multiple of 256 (e.g. a 32x32 or 1024x1024 image), then it is natural that the count comes out as 0 mod 256.
I've already alluded to this point in my original answer. If you read my implementation of the histogram function, you see in the signature that I've declared my vector as vector<unsigned> and have put a comment above it that says this vector counts pixels, so its data type should be suitable.
I guess I should have made it bolder and clearer! I hope this solves your problem.
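The wrap-around is easy to reproduce in isolation. The sketch below uses a hypothetical helper, count_after, that increments histogram[1] a given number of times with the counter type passed as a template argument, which is what the simplified loop in the question does for a width*height image.

```cpp
#include <vector>

// Increments element 1 of a 256-bin histogram 'increments' times and returns
// the stored count. Counter is the element type of the histogram vector.
template <typename Counter>
unsigned long count_after(int increments) {
    std::vector<Counter> histogram(256);  // value-initialized to 0
    for (int i = 0; i < increments; ++i)
        histogram[1]++;
    return histogram[1];
}
```

With an 8-bit unsigned char, count_after<unsigned char>(1024 * 1024) comes back as 0 because 1,048,576 is a multiple of 256, while count_after<unsigned>(1024 * 1024) returns the full 1,048,576.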

How can I optimize this function which handles large c++ vectors?

According to Visual Studio's performance analyzer, the following function is consuming what seems to me to be an abnormally large amount of processor power, seeing as all it does is add between 1 and 3 numbers from several vectors and store the result in one of those vectors.
//Relevant class members:
//vector<double> cache (~80,000);
//int inputSize;
//Notes:
//RealFFT::real is a typedef for POD double.
//RealFFT::RealSet is a wrapper class for a c-style array of RealFFT::real.
//This is because of the FFT library I'm using (FFTW).
//Its bracket operator is overloaded to return a const reference to the appropriate array element
vector<RealFFT::real> Convolver::store(vector<RealFFT::RealSet>& data)
{
int cr = inputSize; //'cache' read position
int cw = 0; //'cache' write position
int di = 0; //index within 'data' vector (ex. data[di])
int bi = 0; //index within 'data' element (ex. data[di][bi])
int blockSize = irBlockSize();
int dataSize = data.size();
int cacheSize = cache.size();
//Basically, this takes the existing values in 'cache', sums them with the
//values in 'data' at the appropriate positions, and stores them back in
//the cache at a new position.
while (cw < cacheSize)
{
int n = 0;
if (di < dataSize)
n = data[di][bi];
if (di > 0 && bi < inputSize)
n += data[di - 1][blockSize + bi];
if (++bi == blockSize)
{
di++;
bi = 0;
}
if (cr < cacheSize)
n += cache[cr++];
cache[cw++] = n;
}
//Take the first 'inputSize' number of values and return them to a new vector.
return Common::vecTake<RealFFT::real>(inputSize, cache, 0);
}
Granted, the vectors in question have sizes of around 80,000 items, but by comparison, a function which multiplies similar vectors of complex numbers (complex multiplication requires 4 real multiplications and 2 additions each) consumes about 1/3 the processor power.
Perhaps it has something to do with the fact that it has to jump around within the vectors rather than just accessing them linearly? I really have no idea though. Any thoughts on how this could be optimized?
Edit: I should mention I also tried writing the function to access each vector linearly, but this requires more total iterations and actually the performance was worse that way.
Turn on compiler optimization as appropriate. A guide for MSVC is here:
http://msdn.microsoft.com/en-us/library/k1ack8f1.aspx