Addition of multidimensional vectors C++ - c++

Having a bit of a headache trying to sum the elements in a 3d Vector.
It's for a k-means algorithm that I'm currently programming; I understand the algorithm and can do it on paper, but the syntax has me a bit tongue-tied at the moment. I might mention that this project is the first time that I've really dealt with complex containers in C++. Currently I want to calculate the new centroid for the points in a cluster, which is done by averaging the positions of every co-ordinate in the cluster. My 3d vector is laid out as a vector of clusters, each of which is a vector of points, each of which is a vector of co-ordinates (I hope that sounds clear; hopefully my code will alleviate any confusion). I'm trying to use iterators at the moment, but am considering going back to ints and indices as I am more comfortable with them, though I feel that I should learn how this syntax works as it seems to be important and powerful.
I'll post just the function that I'm stuck on and the parts of the header that relate to it. If you would like to see any of the other code I'm happy to throw that in too on request, but I feel that this should be enough to show my problem.
.h file parts (public members of class):
vector< vector < vector <float> > > clusters;
vector<vector<float> > avg;
int avgDiv;
.cpp file part with comments to help elaborate my query:
vector<vector<vector<float> > >::iterator threeD;
vector<vector<float> >::iterator row;
vector<float>::iterator col;
for (threeD = clusters.begin(); threeD != clusters.end(); threeD++) {
for (row = threeD->begin(); row != threeD->end(); row++) {
for(col = row->begin(); col != row->end(); col++){
//its this code below that is causing my headache,
//I know that what is written isn't correct,
//it is there to serve as an example of what I've
//been trying to do to sort out my issue.
avg.at(row) ( = or push_back ) ((clusters.at(row).at(col)) + (clusters.at(row+1).at(col)));
}
avgDiv = distance(row->begin(),row->end());
//divide each value in avg vector by the amount of members in row, giving the new centroid for that cluster, loop forward to next cluster. this isn't a problem I should think.
}
}
My problem is that the compiler tells me that there is no matching member function for the call to 'at'. From what I can see in other questions, this is because I'm not passing the right type of object as an argument, though I'm sure that I want to add every element that the iterators point at to the next element in the row.
I've tried to make this as clear as possible; please ask and I will add as much detail as I can to help you answer. I am new to this, and am very happy to take criticism; it will only make me a better programmer. Thank you for your time.

avg.at(index) is used with an integer index; it's just the C-style array[index] notation with bounds checking. Incidentally, in performance-critical code you may prefer [] to avoid the cost of the check.
But row is an iterator, not an integer, so it cannot be passed to at(). It is effectively already a pointer to the current element of the container you are iterating over, so just dereference it to get the value.
*row = the element at the position of iterator 'row'
A good tutorial on C++ iterators http://www.cprogramming.com/tutorial/stl/iterators.html
ps. With vectors and 'maths' type code, it's often simpler to just use array index notation
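To illustrate the index-notation suggestion, here is a minimal sketch of the centroid calculation using plain indices. It assumes the layout described in the question (clusters[c][p][d] is coordinate d of point p in cluster c); the function name computeCentroids and the choice to return an empty centroid for an empty cluster are my own assumptions, not something from the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// clusters[c][p][d]: cluster c, point p, coordinate d (layout assumed from the question).
// Returns one centroid per cluster; an empty cluster yields an empty centroid.
std::vector<std::vector<float>> computeCentroids(
    const std::vector<std::vector<std::vector<float>>>& clusters)
{
    std::vector<std::vector<float>> avg;
    avg.reserve(clusters.size());
    for (std::size_t c = 0; c < clusters.size(); ++c) {
        const auto& points = clusters[c];
        if (points.empty()) {
            avg.emplace_back();  // nothing to average
            continue;
        }
        const std::size_t dims = points[0].size();
        std::vector<float> centroid(dims, 0.0f);
        // Sum each coordinate over all points in the cluster.
        for (std::size_t p = 0; p < points.size(); ++p) {
            assert(points[p].size() == dims);  // all points must have the same dimension
            for (std::size_t d = 0; d < dims; ++d) {
                centroid[d] += points[p][d];
            }
        }
        // Divide by the number of points to get the mean position.
        for (std::size_t d = 0; d < dims; ++d) {
            centroid[d] /= static_cast<float>(points.size());
        }
        avg.push_back(centroid);
    }
    return avg;
}
```

Calling `avg = computeCentroids(clusters);` once per k-means iteration would replace the triple iterator loop; note that the divisor is the number of points in the cluster, not the number of coordinates in a point.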

Related

C++: What is causing this stack smashing error?

Disclaimer: I have limited knowledge of C++ due to switching from a college where they didn't teach C++ to another where it was the only language that was taught.
I'm trying to implement the box counting method for a randomly generated 2D cluster in a lattice that's 54x54.
One of the requirements is that we use a 1D array to represent the 2D square lattice, so a transformation is required to associate x and y values (columns and lines, respectively) to the actual positions of the array.
The transformation is "i = x + y*N", with N being the length of the side of the square lattice (in this case, it would be 54) and i being the position of the array.
The box-counting method, simply put, involves splitting a grid into large squares that get progressively smaller and counting how many contain the cluster in each instance.
The code works in the way that it should for smaller lattice sizes, at least the ones that I could verify (for obvious reasons, I can't verify even a 10x10 lattice by hand). However, when I run it, the box size goes all the way to 1/37 and gives me a "stack smashing detected" error.
From what I understand, the error may have something to do with array sizes, but I've checked the points where the arrays are accessed and made sure they're within the actual dimensions of the array.
A "for" in the function "boxTransform(int grid[], int NNew, int div)" is responsible for the error in question, but I added other functions that I believe are relevant to it.
The rest of the code is just defining a lattice and isolating the aggregate, which is then passed to boxCounting(int grid[]), and creating a .dat file. Those work fine.
To "fit" the larger array into the smaller one, I divide each coordinate (x, y) by the ratio of squares on the large array to the small array. This is how my teacher explained it, and as mentioned before, works fine for smaller array sizes.
EDIT: Thanks to a comment by VTT, I went back and checked if the array index goes out of bounds with the code itself. It is indeed the case, which is likely the origin of the problem.
EDIT #2: It was indeed the origin of the problem. There was a slight error in the calculations that didn't appear for smaller lattice sizes (or I just missed it).
//grid[] is an array containing the cluster
//that I want to analyze.
void boxCounting(int grid[]) {
//N is a global constant; it's the length of the
//side of the square lattice that's being analyzed.
//NNew is the side of the larger squares. It will
//be increased until it reaches N
for (int NNew = 1; N - NNew > 0; NNew++) {
int div = N/NNew;
boxTransform(grid, NNew, div);
}
}
void boxTransform(int grid[], int NNew, int div) {
int gridNew[NNew*NNew];
//Here the array elements are set to zero, which
//I understand C++ cannot do natively
for (int i = 0; i < NNew*NNew; i++) {
gridNew[i] = 0;
}
for (int row = 0; row < N; row++) {
for (int col = 0; col < N; col++) {
if (grid[col + row*N] == 1) {
//This is where the error occurs. The idea here is
//that if a square on the initial grid is occupied,
//the corresponding square on the new grid will have
//its value increased by 1, so I can later check
//how many squares on the larger grid are occupied
gridNew[col/div + (row/div)*NNew]++;
}
}
}
int boxes = countBox(gridNew, NNew);
//Creates a .dat file with the relevant values
printResult(boxes, NNew);
}
int countBox(int grid[], int NNew) {
int boxes = 0;
//Any array values that weren't touched remain at zero,
//so I just have to check that it's greater than zero
//to know if the square is occupied or not
for(int i = 0; i < NNew*NNew; i++) {
if(grid[i] > 0) boxes++;
}
return boxes;
}
Unfortunately this is not enough information to find the exact problem for you but I will try to help.
There are multiple reasons why you should use a dynamic array instead of the fixed-size arrays that you are using, unless fixed-size arrays are required by your exercise.
If you've been learning other languages you might think that a fixed array is good enough, but it's far more dangerous in C++ than in most languages.
int gridNew[NNew*NNew]; You should know that this is not valid according to the C++ standard; it only compiles because GCC supports variable-length arrays as an extension. In standard C++ the size of a fixed array must be known at compile time, which means you can't use a runtime variable to declare an array.
You keep updating global variables to track the size of the array which makes your code super hard to read. You are probably doing this because you know that you are not able to query the size of the array once you pass it to a function.
For both of these problems a dynamic array is the perfect solution. The standard dynamic array implementation in C++ is the std::vector: https://en.cppreference.com/w/cpp/container/vector
When you create a vector you can define its size, and you can query the length of the vector with the size() member function.
Even better: you can use the at() function instead of square brackets ([]) to get an element by index. It does a bounds check for you and throws an exception if the index is out of bounds, which helps a lot in locating this kind of error. In C++, accessing an array with an index that does not exist is undefined behaviour, which might be your problem.
I wouldn't like to write any more features of the vector because it's really easy to find examples on how to do these things, I just wanted to help you where to start.
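As a starting point, here is a rough sketch of what the question's boxTransform could look like with std::vector and at(). It keeps the index formula from the question unchanged, so it does not fix the calculation; it only makes an out-of-bounds write fail loudly. The function name boxTransformChecked and passing N as a parameter (instead of the global constant) are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Same coarse-graining as boxTransform in the question, but with std::vector and at().
// N is passed as a parameter here (the question uses a global constant).
// Throws std::out_of_range if the index formula leaves the new grid.
std::vector<int> boxTransformChecked(const std::vector<int>& grid, int N, int NNew)
{
    assert(N > 0 && NNew > 0 && NNew <= N);
    assert(grid.size() == static_cast<std::size_t>(N) * static_cast<std::size_t>(N));
    const int div = N / NNew;
    // The vector constructor zero-initializes every element, so no manual loop is needed.
    std::vector<int> gridNew(static_cast<std::size_t>(NNew) * static_cast<std::size_t>(NNew), 0);
    for (int row = 0; row < N; ++row) {
        for (int col = 0; col < N; ++col) {
            const auto src = static_cast<std::size_t>(col + row * N);
            if (grid.at(src) == 1) {
                const auto dst = static_cast<std::size_t>(col / div + (row / div) * NNew);
                gridNew.at(dst)++;  // bounds-checked: throws instead of smashing the stack
            }
        }
    }
    return gridNew;
}
```

With N = 54 and NNew = 37, div is 1, so an occupied cell in the last row maps to an index of at least 53*37 = 1961, which is past the 37*37 = 1369 elements of the new grid, and at() throws. Keep in mind that at() only catches indices outside the whole vector; a column index that spills into the next row of the new grid still goes unnoticed.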
VTT was right in his comment. There was a small issue with the transformation to fit the large array into the smaller one that made the index go out of bounds. I only checked this on pen and paper when I should've put it in the actual code, which is why I didn't notice it. Since he didn't post it as an answer, I'm doing so on his behalf.
The int gridNew[NNew*NNew]; bit was kind of a red herring, but I appreciate the lesson and will take that into account when coding in C++ in the future.

Copying vector elements to a vector pair

In my C++ code,
vector <string> strVector = GetStringVector();
vector <int> intVector = GetIntVector();
So I combined these two vectors into a single one,
void combineVectors(vector<string>& strVector, vector <int>& intVector, vector < pair <string, int>>& pairVector)
{
for (int i = 0; i < strVector.size() || i < intVector.size(); ++i )
{
pairVector.push_back(pair<string, int> (strVector.at(i), intVector.at(i)));
}
}
Now this function is called like this,
vector <string> strVector = GetStringVector();
vector <int> intVector = GetIntVector();
vector < pair <string, int>> pairVector;
combineVectors(strVector, intVector, pairVector);
//rest of the implementation
The combineVectors function uses a loop to add the elements of the other two vectors to the vector of pairs. I doubt this is an efficient way, as this function gets called hundreds of times with different data. This might cause a performance issue because it goes through the loop every time.
My goal is to copy both the vectors in "one go" to the vector pair. i.e., without using a loop. Am not sure whether that's even possible.
Is there a better way of achieving this without compromising the performance?
You have clarified that the arrays will always be of equal size. That's a prerequisite condition.
So, your situation is as follows. You have vector A over here, and vector B over there. You have no guarantees whether the actual memory that vector A uses and the actual memory that vector B uses are next to each other. They could be anywhere.
Now you're combining the two vectors into a third vector, C. Again, no guarantees where vector C's memory is.
So, you have really very little to work with, in terms of optimizations. You have no additional guarantees whatsoever. This is pretty much fundamental: you have two chunks of bytes, and those two chunks need to be copied somewhere else. That's it. That's what has to be done, that's what it all comes down to, and there is no other way to get it done, other than doing exactly that.
But there is one thing that can be done to make things a little bit faster. A vector will typically allocate memory for its values in incremental steps, reserving some extra space, initially, and as values get added to the vector, one by one, and eventually reach the vector's reserved size, the vector has to now grab a new larger block of memory, copy everything in the vector to the larger memory block, then delete the older block, and only then add the next value to the vector. Then the cycle begins again.
But you know, in advance, how many values you are about to add to the vector, so you simply instruct the vector to reserve() enough size in advance, so it doesn't have to repeatedly grow itself, as you add values to it. Before your existing for loop, simply:
pairVector.reserve(pairVector.size()+strVector.size());
Now, the for loop will proceed and insert new values into pairVector which is guaranteed to have enough space.
A couple of other things are possible. Since you have stated that both vectors will always have the same size, you only need to check the size of one of them:
for (int i = 0; i < strVector.size(); ++i )
Next step: at() performs bounds checking. This loop ensures that i will never be out of bounds, so at()'s bound checking is also some overhead you can get rid of safely:
pairVector.push_back(pair<string, int> (strVector[i], intVector[i]));
Next: with a modern C++ compiler, the compiler should be able to optimize away, automatically, several redundant temporaries, and temporary copies here. It's possible you may need to help the compiler, a little bit, and use emplace_back() instead of push_back() (assuming C++11, or later):
pairVector.emplace_back(strVector[i], intVector[i]);
Going back to the loop condition, strVector.size() gets evaluated on each iteration of the loop. It's very likely that a modern C++ compiler will optimize it away, but just in case you can also help your compiler check the vector's size() only once:
int n=strVector.size();
for (int i = 0; i < n; ++i )
This is really a stretch, but it might eke out a few extra quantums of execution time. And that's pretty much all of the obvious optimizations here. Realistically, the most to be gained here is by using reserve(). The other optimizations might help things a little bit more, but it all boils down to moving a certain number of bytes from one area in memory to another area. There aren't really special ways of doing that which are faster than other ways.
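Putting those steps together, the function might end up looking roughly like this. It is a sketch that assumes, as stated above, that both input vectors always have the same size; taking the inputs by const reference is my own addition.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

void combineVectors(const std::vector<std::string>& strVector,
                    const std::vector<int>& intVector,
                    std::vector<std::pair<std::string, int>>& pairVector)
{
    // Precondition from the discussion above: both inputs have the same length.
    assert(strVector.size() == intVector.size());

    const std::size_t n = strVector.size();           // read the size once
    pairVector.reserve(pairVector.size() + n);        // one allocation instead of repeated growth
    for (std::size_t i = 0; i < n; ++i) {
        pairVector.emplace_back(strVector[i], intVector[i]);  // construct the pair in place
    }
}
```

The strings are still copied one by one; if the caller no longer needs strVector, moving the strings out of it would avoid that, but that changes the function's contract.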
We can use std::generate() to achieve this:
#include <algorithm>
#include <iostream>
#include <string>
#include <utility>
#include <vector>
using namespace std;
vector <string> strVector{ "hello", "world" };
vector <int> intVector{ 2, 3 };
pair<string, int> f()
{
static int i = -1;
++i;
return make_pair(strVector[i], intVector[i]);
}
int main() {
int min_Size = min(strVector.size(), intVector.size());
vector< pair<string,int> > pairVector(min_Size);
generate(pairVector.begin(), pairVector.end(), f);
for( int i = 0 ; i < min_Size ; i++ )
cout << pairVector[i].first <<" " << pairVector[i].second << endl;
}
I'll try and summarize what you want with some possible answers depending on your situation. You say you want a new vector that is essentially a zipped version of two other vectors which contain two heterogeneous types. Where you can access the two types as some sort of pair?
If you want to make this more efficient, you need to think about what you are using the new vector for? I can see three scenarios with what you are doing.
1. The new vector is a copy of your data so you can do stuff with it without affecting the original vectors (i.e. you still need the original two vectors).
2. The new vector is now the storage mechanism for your data (i.e. you no longer need the original two vectors).
3. You are simply coupling the vectors together to make use and representation easier (i.e. where they are stored doesn't actually matter).
1) Not much you can do aside from copying the data into your new vector. Explained more in Sam Varshavchik's answer.
3) You can do something like Shakil's answer, or use some type of customized iterator.
2) Here you can make some optimisations and do zero copying of the data by using a wrapper class. Note: a wrapper class works only if you don't need the actual std::vector<std::pair> class. You can make a class that you move the data into and create access operators for it. If you can do this, it also allows you to decompose the wrapper back into the original two vectors without copying. Something like this might suffice.
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>
class StringIntContainer {
public:
// Takes over the contents of both vectors; the arguments are left in a moved-from state.
StringIntContainer(std::vector<std::string>& _string_vec, std::vector<int>& _int_vec)
: string_vec_(std::move(_string_vec)), int_vec_(std::move(_int_vec))
{
assert(string_vec_.size() == int_vec_.size());
}
// Returns a copy of the i-th (string, int) pair.
std::pair<std::string, int> operator[] (std::size_t _i) const
{
return std::make_pair(string_vec_[_i], int_vec_[_i]);
}
/* You may want methods that return reference to data so you can edit it*/
// Moves both vectors back out; the container should not be used afterwards.
std::pair<std::vector<std::string>, std::vector<int>> Decompose()
{
return std::make_pair(std::move(string_vec_), std::move(int_vec_));
}
private:
std::vector<std::string> string_vec_;
std::vector<int> int_vec_;
};

Inspecting pointers to objects inside a vector C++

I am currently going through some code and I have a road class with a vector of pointers to lanes (a private member), and this road class includes a lane class. The lane class contains a vector of pointers to vehicles, which is another class that contains simple get and set functions to update and obtain a vehicle's position, velocity etc. Now, I have vehicles moving in separate lanes and I allow them to switch lanes, as happens in real traffic flow. However, I would like each vehicle to continuously find the distance between itself and the vehicle in front, i.e., look in the vehicles vector and find the closest vehicle. Then I intend to use that to decide whether a car should decelerate or not. I would also like to keep track of the cars that are leading the rest, since once a vehicle leaves the display window height it should be deleted.
My attempt at this is as follows:
void Lane::Simulate(double time)
{ // This simulate allows check between other vehicles.
double forwardDistance = 0;
for (unsigned int iV = 0; iV < fVehicles.size(); iV++)
{
for(unsigned int jV = 0; jV < fVehicles.size(); jV++)
{
forwardDistance = fVehicles[iV]->getPosition() - fVehicles[jV]->getPosition();
}
}
if(fVehicles.size() < 15)
{
addRanVehicle(); // Adds a vehicle, with position zero but random velocities, to each lane.
}
for (unsigned int iVehicle = 0; iVehicle < fVehicles.size(); iVehicle++)
{
fVehicles[iVehicle]->Simulate(time); // Updates position based on time, velocity and acceleration.
}
}
There may be a much better method than using this forwardDistance parameter. The idea is to loop over each pair of vehicles, avoid the point iV == jV, and find the vehicle which is in front of the iVth vehicle, and record the distance between the two vehicles into a setDistance() function (which is a function of my Vehicle class). I should then be able to use this to check whether a car is too close, check whether it can overtake, or whether it just has to brake.
Currently, I am not sure how to make an efficient looping mechanism for this.
Investigate the cost of performing an ordered insert of Vehicles into the lane. If the Vehicles are ordered according to position on the road, detecting the distance of two Vehicles is child's play:
Eg
for (size_t n = 0; n + 1 < fVehicles.size(); n++)
{
// n + 1 < size() avoids unsigned wrap-around when the lane is empty
double distance = fVehicles[n]->getPosition() - fVehicles[n+1]->getPosition();
}
This is O(N) vs O(N^2) (using ^ as exponent, not XOR). The price of this simplification is requiring an ordered insert into fVehicles, and that should be O(N): one std::lower_bound to find the insertion point, plus whatever shuffling is required by fVehicles to free up space to place the Vehicle.
Maintaining ordering of fVehicles may be beneficial in other places as well. Visualizing the list (graphically or by print statements) will be much easier, debugging is generally easier on the human brain when everything is in a nice predictable order, and CPUs... They LOVE going in a nice, predictable straight line. Sometimes you get a performance boost that you didn't see coming. Great write-up on that here: Why is it faster to process a sorted array than an unsorted array?
Only way to be sure if this is better is to try it and measure it.
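For what it's worth, a rough sketch of the ordered insert and the single-pass distance calculation could look like this. The Vehicle struct is a hypothetical stand-in for the class in the question, the lane stores vehicles by value, and I've assumed the lane is kept sorted front-to-back (largest position first); none of those details come from the original code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's Vehicle class.
struct Vehicle {
    double position;
    double getPosition() const { return position; }
};

// Orders vehicles front-to-back: the one furthest along the road comes first.
inline bool aheadOf(const Vehicle& a, const Vehicle& b)
{
    return a.getPosition() > b.getPosition();
}

// Inserts v so that the lane stays sorted. lower_bound is a binary search;
// insert() then shifts the elements behind the insertion point.
void insertOrdered(std::vector<Vehicle>& lane, const Vehicle& v)
{
    auto where = std::lower_bound(lane.begin(), lane.end(), v, aheadOf);
    lane.insert(where, v);
}

// gaps[n] is the distance from vehicle n+1 to the vehicle directly ahead of it (vehicle n).
std::vector<double> forwardGaps(const std::vector<Vehicle>& lane)
{
    assert(std::is_sorted(lane.begin(), lane.end(), aheadOf));
    std::vector<double> gaps;
    for (std::size_t n = 0; n + 1 < lane.size(); ++n) {
        gaps.push_back(lane[n].getPosition() - lane[n + 1].getPosition());
    }
    return gaps;
}
```

One caveat: if vehicles can overtake within a lane, their positions change after insertion, so the lane would need re-sorting (or a swap of the two neighbours) after each simulation step to keep the ordering valid.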
Other Suggestions:
Don't use pointers to the vehicles.
Not only are they harder to manage, they can slow you down quite a bit. As mentioned above, modern CPUs are really good at going in straight lines, and pointers can throw a kink in that straight line.
You never really know where in dynamic memory a pointer is going to be relative to the last pointer you looked at. But with a contiguous block of Vehicles, when the CPU loads Vehicle N it can possibly also grab Vehicles N+1 and N+2. If it can't because they are too big, it doesn't matter much because it already knows where they are, and while the CPU is processing, an idle memory channel could be reading ahead and grabbing the data you're going to need soon.
With the pointer you save a bit every time you move a Vehicle from lane to lane (pointers are usually much cheaper than objects to copy), but may suffer on each and every loop iteration in each and every simulation tick and the volume really adds up. Bjarne Stroustrup, God-Emperor of C++, has an excellent write up on this problem using linked lists as an example (Note linked list is often worse than vector of pointer, but the idea is the same).
Take advantage of std::deque.
std::vector is really good at stack-like behaviour. You can add to and remove from the end lightning fast, but if you add to or remove from the beginning, everything in the vector is moved.
Most of the lane insertions are likely to be at one end and the removals at the other simply because older Vehicles will gravitate toward the end as Vehicles are added to the beginning or vice versa. This is a certainty if suggestion 1 is taken and fVehicles is ordered. New vehicles will be added to the lane at the beginning, a few will change lanes into or out of the middle, and old vehicles will be removed from the end. deque is optimized for inserting and removing at both ends so adding new cars is cheap, removing old cars is cheap and you only pay full price for cars that change lanes.
Documentation on std::deque
Addendum
Take advantage of range-based for where possible. Range-based for takes most of the iteration logic away and hides it from you.
Eg this
for (unsigned int iV = 0; iV < fVehicles.size(); iV++)
{
for(unsigned int jV = 0; jV < fVehicles.size(); jV++)
{
forwardDistance = fVehicles[iV]->getPosition() - fVehicles[jV]->getPosition();
}
}
becomes
for (auto v_outer: fVehicles)
{
for (auto v_inner: fVehicles)
{
forwardDistance = v_outer->getPosition() - v_inner->getPosition();
}
}
It doesn't look much better if you are counting lines, but you can't accidentally
iV <= fVehicles.size()
or
fVehicles[iV]->getPosition() - fVehicles[iV]->getPosition()
It removes the possibility for you to make silly, fatal, and hard-to-spot errors.
Let's break one down:
for (auto v_outer: fVehicles)
     ^    ^        ^
    type  |        |
    variable name  |
         Container to iterate
Documentation on Range-based for
In this case I'm also taking advantage of auto. auto allows the compiler to select the type of the data. The compiler knows that fVehicles contains pointers to Vehicles, so it replaces auto with Vehicle * for you. This takes away some of the headaches if you find yourself refactoring the code later.
Documentation on auto
Unfortunately, in this case it can also trap you. If you follow the suggestions above, fVehicles becomes
std::deque<Vehicle> fVehicles;
which means auto is now Vehicle. That makes v_outer a copy, costing you copying time and meaning that if you change v_outer, you change a copy and the original goes unchanged. To avoid that, tend toward
for (auto &v_outer: fVehicles)
The compiler is good at deciding how best to handle that reference or if it even needs it.
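To make the copy-versus-reference trap concrete, here is a small sketch. The Vehicle struct and both function names are hypothetical and only exist for the demonstration.

```cpp
#include <deque>

// Hypothetical minimal Vehicle, just enough to show the difference.
struct Vehicle {
    double position;
};

// 'auto v' deduces Vehicle, so each v is a copy; the lane itself is not modified.
void advanceCopies(std::deque<Vehicle>& lane, double dx)
{
    for (auto v : lane) {
        v.position += dx;  // changes the temporary copy only
    }
}

// 'auto& v' deduces Vehicle&, so each v refers to the element stored in the lane.
void advanceInPlace(std::deque<Vehicle>& lane, double dx)
{
    for (auto& v : lane) {
        v.position += dx;  // changes the stored Vehicle
    }
}
```

If the loop only reads the vehicles, `const auto&` says so explicitly and still avoids the copy.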

C++ row and columns matrix manipulation

I've created a 2D matrix as a vector of vectors like this :
vector<vector<int>> mat;
now I need to swap the row and columns of my matrix for example :
row 0 swapped with row 4
column 5 swapped with column 1
The rows aren't a problem, since there is the swap() function of the STL. Exchanging columns, though, seems quite problematic because, of course, they are not stored as one atomic structure. So at this point I'm really stuck. I've considered doing it by brute force, swapping every element of the columns I'm interested in, but it seems quite inelegant. Any idea of how I could achieve my goal?
If by "elegance" you mean an STL function that does all of this for you, then there's no function like that. The aim of the STL is not to make your code as simple as possible; the creators of C++ only add things to the STL that:
Are really hard to implement with the language's existing tools
Need special support from your compiler (special optimization, etc.)
Have become common
So, just implement it on your own.
If you don't want to use for (;;) loops because it's not "elegant" at some point, then you can do something like this:
/* swapping column i and j */
std::vector<std::vector<int>> mat;
std::for_each(mat.begin(), mat.end(), [i,j](std::vector<int>& a)
{ std::swap(a[i], a[j]); });
Update: If speed is important to you and you want to swap columns as fast as you swap rows (in O(1)), then you can use this implementation (which takes extra space):
std::vector<std::vector<int>> mat;
/* preprocessing */
std::vector<int> permutation(mat[0].size());
std::iota(permutation.begin(), permutation.end(), 0);
/* now, if you need to get the element mat[i][j] */
mat_i_j = mat[i][ permutation[j] ];
/* if you want to swap column i and j */
std::swap(permutation[i], permutation[j]);
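If it helps, the permutation idea can be wrapped up in a small class so that callers never index mat directly. This is only a sketch: the class name PermutedMatrix and its interface are made up for the example, and it assumes every row has the same length.

```cpp
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical wrapper: the matrix plus a column permutation.
class PermutedMatrix {
public:
    explicit PermutedMatrix(std::vector<std::vector<int>> rows)
        : mat_(std::move(rows)), perm_(mat_.empty() ? 0 : mat_[0].size())
    {
        // Start with the identity permutation: logical column j is stored column j.
        std::iota(perm_.begin(), perm_.end(), std::size_t{0});
    }

    // Element access goes through the permutation.
    int& operator()(std::size_t i, std::size_t j) { return mat_[i][perm_[j]]; }

    // Swapping two rows swaps the row vectors' internals, which is O(1).
    void swapRows(std::size_t a, std::size_t b) { std::swap(mat_[a], mat_[b]); }

    // Swapping two columns only swaps two entries of the permutation, also O(1).
    void swapColumns(std::size_t a, std::size_t b) { std::swap(perm_[a], perm_[b]); }

private:
    std::vector<std::vector<int>> mat_;
    std::vector<std::size_t> perm_;
};
```

The trade-off is one extra indirection on every element access, so this pays off only if column swaps are frequent compared to reads.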

Pass vector position in std::for_each

I have a data structure in sparse compressed column format.
For my given algorithm, I need to iterate over all the values in a "column" of data and do a bunch of stuff. Currently, it is working nicely using a regular for loop. The boss wants me to re-code this as a for_each loop for future parallelization.
For those not familiar with sparse compressed column, it use 2 (or 3) vectors to represent the data. One vector is just a long list of values. The second vector is the index of where each column starts.
The current version
// for processing data in column 5
vector values;
vector colIndex;
vector rowIndex;
int column = 5;
for(int i = colIndex[5]; i != colIndex[6]; i++){
value = values[i];
row = rowIndex[i];
// do stuff
}
The key is that I need to know the location(as an integer) in my values column in order to lookup the row position (And a bunch of other stuff I'm not bothering to list here.)
If I use the std::for_each() function, I get the value at the position, not the position. I need the position itself.
One thought, and clearly not efficient, would be to create a vector of integers the same length as my data. That way, I could pass an iterator over that dummy vector to the function in for_each and the value passed to my function would be the position. However, this seems like the least efficient way.
Any thoughts?
My challenge is that I need to know the position in the vector. for_each takes an iterator and sends the value of that iterator to the function.
Use boost::counting_iterator<int>, or implement your own.
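For the "implement your own" route, a bare-bones version might look like the sketch below. CountingIterator is a hand-rolled input iterator that just wraps an int, and sumColumnWeightedByRow is a made-up placeholder for the "do stuff" part; the element types (double values, int indices) are assumptions, since the question doesn't state them.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Minimal counting iterator: dereferencing yields the current integer position.
// (boost::counting_iterator is a more complete version of the same idea.)
class CountingIterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    using pointer = const int*;
    using reference = int;

    explicit CountingIterator(int value) : value_(value) {}

    int operator*() const { return value_; }
    CountingIterator& operator++() { ++value_; return *this; }
    CountingIterator operator++(int) { CountingIterator old = *this; ++value_; return old; }
    bool operator==(const CountingIterator& other) const { return value_ == other.value_; }
    bool operator!=(const CountingIterator& other) const { return value_ != other.value_; }

private:
    int value_;
};

// Placeholder for "do stuff": sums values[i] * rowIndex[i] over one column.
double sumColumnWeightedByRow(const std::vector<double>& values,
                              const std::vector<int>& rowIndex,
                              const std::vector<int>& colIndex,
                              int column)
{
    const auto c = static_cast<std::size_t>(column);
    double total = 0.0;
    // for_each now hands the callback the position i, not the value.
    std::for_each(CountingIterator(colIndex[c]), CountingIterator(colIndex[c + 1]),
                  [&](int i) {
                      const auto k = static_cast<std::size_t>(i);
                      total += values[k] * rowIndex[k];
                  });
    return total;
}
```

Note that accumulating into a shared variable like total is fine for a sequential for_each but would need rethinking (e.g. a reduction) before the loop is parallelized.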
@n.m.'s answer is probably the best, but it is possible with only what the standard library provides, though painfully slow I assume:
void your_loop_func(const T& val){
// Linear search for the element (needs <algorithm>); only correct if the values are unique.
auto it = std::find(values.begin(), values.end(), val);
std::ptrdiff_t index = it - values.begin();
value = val;
row = rowIndex[index];
}
And after writing that, I really can only recommend the Boost counting_iterator version. ;)