Creating objects in a loop

Here is a method creating a Clustering object and returning it by value.
Clustering ClusteringGenerator::makeOneClustering(Graph& G) {
int64_t n = G.numberOfNodes();
Clustering zeta(n);
cluster one = zeta.addCluster();
for (node v = G.firstNode(); v <= n; ++v) {
zeta.addToCluster(one, v);
}
return zeta;
}
This loop calls the method multiple times and adds the pointer to the return value to a vector.
int z = 3;
for (int i = 0; i < z; ++i) {
// FIXME: why is zeta the same each iteration?
Clustering zeta = clusterGen.makeOneClustering(G);
DEBUG(&zeta);
clusterings.push_back(&zeta);
}
The output of the DEBUG statement is
0x7fff4ff894d0
0x7fff4ff894d0
0x7fff4ff894d0
So this means that &zeta is the same pointer in each iteration. Why?
How can I get the desired result (create one Clustering object per iteration and remember it in a vector)?

Because zeta (the one in the loop) is an automatic variable: it ceases to exist once the current loop iteration ends and its destructor has been called. (The zeta inside ClusteringGenerator::makeOneClustering is a local variable too, but there is nothing inherently wrong with that function.) The compiler is thus free to reuse the underlying storage for later variables, such as the zeta of the next loop iteration, and it would be wasteful not to.
Your code is also broken for the same reason: it stores the address of a local variable in a container, although that variable no longer exists once the iteration that called push_back has ended. Dereferencing those stored pointers later is undefined behavior.
To solve this, either use a std::vector<Clustering> and store the objects by value, or, if you really need to store pointers (maybe because you cannot benefit from C++11's move semantics and fear the copying overhead), allocate the loop objects dynamically so that they are not destroyed automatically. In the latter case (which you should reconsider carefully anyway, given that Clustering seems to copy perfectly well) you should use some kind of smart pointer to take care of destroying the dynamically allocated objects.
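The two options described above can be sketched as follows. This is only an illustration under assumptions: the Clustering struct and the free function makeOneClustering are hypothetical stand-ins for the real class and for clusterGen.makeOneClustering(G), which are not shown in the question.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the real Clustering class.
struct Clustering {
    explicit Clustering(std::size_t n) : assignment(n, 0) {}
    std::vector<int> assignment;  // cluster id for each node
};

// Placeholder for clusterGen.makeOneClustering(G): returns by value.
Clustering makeOneClustering(std::size_t n) {
    Clustering zeta(n);
    return zeta;
}

// Option 1: store the objects themselves; every element is a distinct object.
std::vector<Clustering> makeByValue(int z, std::size_t n) {
    std::vector<Clustering> clusterings;
    for (int i = 0; i < z; ++i) {
        clusterings.push_back(makeOneClustering(n));  // temporary is moved in (C++11)
    }
    return clusterings;
}

// Option 2: store owning smart pointers to dynamically allocated objects.
std::vector<std::unique_ptr<Clustering>> makeByPointer(int z, std::size_t n) {
    std::vector<std::unique_ptr<Clustering>> clusterings;
    for (int i = 0; i < z; ++i) {
        clusterings.push_back(
            std::unique_ptr<Clustering>(new Clustering(makeOneClustering(n))));
    }
    return clusterings;
}
```

In both variants the addresses of the stored objects differ from element to element, and nothing has to be deleted by hand.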

You could define
std::vector<Clustering> clusterings;
and then use
clusterings.push_back(clusterGen.makeOneClustering(G));
If you are using C++11 and Clustering is movable, you are not even making a copy. This solution is faster and you don't have to deal with raw pointers.

That's because you are printing the address of the variable you created, and it is the same every time. The same goes for the vector: you are storing the address and not the actual value. If you want to store the value, declare the vector as std::vector<Clustering> and try this:
clusterings.push_back(zeta);
Now you are storing the value and not the address.

Clustering * ClusteringGenerator::makeOneClustering(Graph& G) {
int64_t n = G.numberOfNodes();
Clustering * zeta = new Clustering(n); // the caller takes ownership and must delete it
cluster one = zeta->addCluster();
for (node v = G.firstNode(); v <= n; ++v) {
zeta->addToCluster(one, v);
}
return zeta;
}
This loop calls the method multiple times and adds the returned pointer to a vector.
int z = 3;
for (int i = 0; i < z; ++i) {
// each call returns a pointer to a distinct heap-allocated object
Clustering * zeta = clusterGen.makeOneClustering(G);
DEBUG(zeta);
clusterings.push_back(zeta);
}

Related

Error Iterating Through Members of a Struct in Vector

I have a struct and two vectors in my .h file:
struct FTerm {
int m_delay;
double m_weight;
};
std::vector<FTerm> m_xterms;
std::vector<FTerm> m_yterms;
I've already read in a file to populate values to m_xterms and m_yterms and I'm trying to iterate through those values:
vector<FTerm>::iterator terms;
for (terms = m_xterms.begin(); terms < m_xterms.end(); terms++)
{
int delaylength = m_xterms->m_delay * 2; // Assume stereo
double weight = m_xterms->m_weight;
}
Although I'm pretty sure I have the logic wrong, I currently get the error "expression must have a pointer type". I've been stuck on this for a while, thanks.

Change
int delaylength = m_xterms->m_delay * 2;
double weight = m_xterms->m_weight;
to
int delaylength = terms->m_delay * 2;
// ^^^^^
double weight = terms->m_weight;
// ^^^^^
as you want to access values through
vector<FTerm>::iterator terms;
within the loop
for (terms = m_xterms.begin(); terms < m_xterms.end(); terms++)
// ^^^^^
"Although I'm pretty sure I have the logic wrong, ..."
That can't be answered, unless you give more context about the requirements for the logic.

Along with the problem πάντα ῥεῖ pointed out, your code currently has a problem that it simply doesn't accomplish anything except wasting some time.
Consider:
for (terms = m_xterms.begin(); terms < m_xterms.end(); terms++)
{
int delaylength = m_xterms->m_delay * 2; // Assume stereo
double weight = m_xterms->m_weight;
}
Both delaylength and weight are created upon entry to the block, and destroyed on exit--so we create a pair of values, then destroy them, and repeat for as many items as there are in the vector--but never do anything with the values we compute. They're just computed, then destroyed.
Assuming you fix that, I'd also write the code differently enough that this problem simply isn't likely to happen in the first place. For example, let's assume you really wanted to modify each item in your vector, instead of just computing something from it and throwing away the result. You could do that with code like this:
std::transform(m_xterms.begin(), m_xterms.end(), // Source
m_xterms.begin(), // destination
[](FTerm const &t) { return FTerm{t.m_delay * 2, t.m_weight}; });// computation
Now the code actually accomplishes something, and it seems a lot less likely that we'd end up accidentally writing it incorrectly.
Bottom line: standard algorithms are your friends. Unlike human friends, they love to be used.
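For reference, a self-contained version of the std::transform approach might look like the sketch below. The function name doubleDelays is a hypothetical wrapper added for illustration; FTerm is the struct from the question.

```cpp
#include <algorithm>
#include <vector>

struct FTerm {
    int m_delay;
    double m_weight;
};

// Doubles every delay in place (the "assume stereo" scaling from the question)
// and leaves the weights untouched.
void doubleDelays(std::vector<FTerm>& terms) {
    std::transform(terms.begin(), terms.end(),  // source
                   terms.begin(),               // destination (same range)
                   [](FTerm const& t) { return FTerm{t.m_delay * 2, t.m_weight}; });
}
```

Naming the type in the return statement (FTerm{...}) matters: a lambda without an explicit return type cannot deduce one from a bare braced list.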

Understanding a function return

I am a novice programmer and have only briefly covered the anatomy of a function call (setting up the stack, etc.). I can write a function two different ways and I'm wondering which (if either) is more efficient. This is for a finite element program, so this function could be called several thousand times. It uses the linear algebra library Armadillo.
First way:
void Q4::stiffness(mat &stiff)
{
stiff.zeros(); // sets all elements of the matrix to zero
// a bunch of linear algebra calculations
// ...
stiff *= h;
}
int main()
{
mat elementStiffness(Q4__DOF, Q4__DOF);
mat globalStiffness(totalDOF, totalDOF);
for (int i = 0; i < reallyHugeNumber; i++)
{
elements[i].stiffness(&elementStiffness, PSTRESS);
assemble(&globalStiffness, &elementStiffness);
}
return 0;
}
Second way:
mat Q4::stiffness()
{
mat stiff(Q4__DOF, Q4__DOF); // initializes element stiffness matrix
// a bunch of linear algebra calculations
// ...
return stiff *= h;
}
int main()
{
mat elementStiffness(Q4__DOF, Q4__DOF);
mat globalStiffness(totalDOF, totalDOF);
for (int i = 0; i < reallyHugeNumber; i++)
{
elementStiffness = elements[i].stiffness(PSTRESS);
assemble(&globalStiffness, &elementStiffness);
}
return 0;
}
I think what I'm asking is: with the second way, is mat stiff pushed onto the stack and then copied into elementStiffness? I imagine that pushing the matrix onto the stack and then copying it is much more expensive than passing a matrix by reference and setting its elements to zero.

Passing a variable by reference and doing your calculations on that variable is generally cheaper. Without optimization, returning a variable by value can copy it twice.
First inside the function, into the return value, and then again through the copy constructor or assignment operator, depending on whether the value initializes a new variable or is assigned to an existing one. If you have a user-defined type with a long list of internal state variables, this copying can take a big chunk of the function's processing time.
EDIT#1: I forgot about C++11 and std::move. Many compilers can elide the copy of the returned object, and with C++11 a returned temporary is an rvalue that can be moved instead of copied, which for a heap-backed type essentially just transfers the pointer to its memory.

On the surface, I think the second way will be much more expensive as it both constructs a new mat and copies it to the stack on every call. Of course that depends a bit on how often the mat construction takes place in the first way.
That said, I think the best thing to do is set up an experiment and measure to make sure (agreeing with the suggestion to research).
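Such an experiment could be sketched roughly as follows. Everything here is an assumption made for illustration: std::vector<double> stands in for the Armadillo matrix, kSize is an arbitrary element count, and the loop bodies are placeholders for the real linear algebra.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

using Matrix = std::vector<double>;  // stand-in for arma::mat (assumption)
const std::size_t kSize = 8 * 8;     // hypothetical number of elements

// First way: the caller owns the buffer and the callee overwrites it.
void stiffnessInto(Matrix& stiff, double h) {
    stiff.assign(kSize, 0.0);
    for (std::size_t i = 0; i < kSize; ++i) stiff[i] = h * static_cast<double>(i);
}

// Second way: the callee builds a fresh matrix and returns it by value.
Matrix stiffnessByValue(double h) {
    Matrix stiff(kSize, 0.0);
    for (std::size_t i = 0; i < kSize; ++i) stiff[i] = h * static_cast<double>(i);
    return stiff;
}

// Runs f() `calls` times and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double timeMs(F f, int calls) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < calls; ++i) f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A comparison would then call something like timeMs([&] { stiffnessInto(m, 0.5); }, 100000) and timeMs([&] { m = stiffnessByValue(0.5); }, 100000) on the same Matrix m, built with optimizations enabled. The results should be used afterwards (for example printed) so that the compiler cannot discard the work being measured.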

Passing a QVector pointer as an argument

1) I want to pass a pointer to a QVector to a function and then do things with it. I tried this:
void MainWindow::createLinearVector(QVector<float> *vector, float min, float max )
{
float elementDiff=(max-min)/(vector->size()-1);
if(max>min) min -= elementDiff;
else min += elementDiff;
for(int i=0; i< vector->size()+1 ; i++ )
{
min += elementDiff;
*(vector+i) = min; //Problematic line
}
}
However, the compiler gives me "no match for operator =" for the *(vector+i) = min; line. What would be the best way to perform actions like this on a QVector?
2) The function is supposed to distribute values linearly over the vector for a plot, the way the MATLAB : operator works, for instance vector(a:b:c). What is the simplest and best way to do such things in Qt?
EDIT:
With help from here the initial problem is solved. :)
I also improved the method itself. The precision can be improved a lot by using linear interpolation instead of repeated additions like above. With repeated addition an error accumulates, which linear interpolation largely eliminates.
By the way, the if statement in the first function was unnecessary and could be removed by rearranging things a little, even in the repeated-addition method.
void MainWindow::createLinearVector(QVector<double> &vector, double min, double max )
{
double range = max-min;
double n = vector.size();
vector[0]=min;
for(int i=1; i< n ; i++ )
{
vector[i] = min+ i/(n-1)*range;
}
}
I considered using some kind of enhanced loop for this, but would it be more practical?
With a foreach loop, for instance, I would still have to increment some variable for the interpolation, right? And also add a conditional to skip the first element?

I want to place a float at a certain position in the QVector.
Then use this:
(*vector)[i] = min; // fixed version of the problematic line
vector is a pointer to a QVector, so *vector is a QVector, which can be indexed with [i] like any QVector. However, because [] binds more tightly than *, you need the parentheses to get the order of operations right.
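A small sketch of that precedence point, using std::vector as a stand-in for QVector (an assumption made so the snippet compiles without Qt; the pointer syntax is the same for both). The helper name setElement is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Writes `value` into element i of the container that `vector` points to.
void setElement(std::vector<float>* vector, std::size_t i, float value) {
    // *(vector + i) would step i whole containers past the pointer and then
    // try to assign a float to a container, hence "no match for operator=".
    // (*vector)[i] dereferences the pointer first, then indexes the container.
    (*vector)[i] = value;
}
```

Calling setElement(&v, 1, 2.5f) on a three-element vector v changes only v[1].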

I think you first need to use a mutable iterator for this: Qt doc link
Something like this:
QMutableVectorIterator<float> i(*vector);
i.toBack();
while (i.hasPrevious())
qDebug() << i.{your code}

Right, so it does not make much sense to use a QVector pointer here, for these reasons:
Using a reference for the method parameter is more idiomatic C++ if implicit sharing is not fast enough for you.
In most cases you would not even need a reference when just passing arguments around without getting the result back in the same argument (i.e. an output argument). That is because QVector is implicitly shared and the copy only happens on write, as per the documentation. Luckily, the syntax is the same for the caller and for the internal implementation of the method in both cases, so it is easy to change from one to the other.
Smart pointers are preferable to raw pointers, but here both are unnecessarily complex solutions in my opinion.
So, I would suggest refactoring your code into this:
void MainWindow::createLinearVector(QVector<float> &vector, float min, float max)
{
float elementDiff = (max-min) / (vector.size()-1);
min += ((max>min) ? (-elementDiff) : elementDiff);
for (float &f : vector) { // by reference, so the assignment reaches the element
min += elementDiff;
f = min;
}
}
Note that I fixed up the following things in your code:
Reference type parameter as opposed to pointer
"->" member resolution to "." respectively
Ternary operation instead of the unnatural if/else in this case
A range-based for loop over references instead of low-level indexing, in which case your original point becomes moot (Qt's foreach would not work here, because it iterates over a copy of the container)
This is then how you would invoke the method from the caller:
createLinearVector(vector, fmin, fmax);
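Combining the reference parameter with the interpolation formula from the question's edit gives something like the sketch below. It assumes std::vector<double> as a stand-in for QVector<double> so that it builds without Qt, and it adds guards for empty and one-element vectors that the original code does not have.

```cpp
#include <cstddef>
#include <vector>

// Fills v with evenly spaced values from min to max (inclusive).
void createLinearVector(std::vector<double>& v, double min, double max) {
    const std::size_t n = v.size();
    if (n == 0) return;                    // nothing to fill
    if (n == 1) { v[0] = min; return; }    // avoids dividing by zero below
    const double range = max - min;
    for (std::size_t i = 0; i < n; ++i) {
        // Each element is computed from its index, so rounding errors do not
        // accumulate from one element to the next.
        v[i] = min + static_cast<double>(i) / static_cast<double>(n - 1) * range;
    }
}
```

Because every value depends only on the index, the same code handles ascending and descending ranges without an if statement.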

Memory optimization in huge data set

Dear all, I have implemented some functions and would like to ask some basic things, as I do not have a sound fundamental knowledge of C++. I hope you will be kind enough to tell me what a good approach would be, so that I can learn from you. (This is not homework, and I do not have any experts around me to ask.)
What I did is this: I read the input x, y, z point data (a data set of around 3 GB) from a file, compute one single value for each point, and store it in a vector (result). That vector is then used in the next loop. After that, the vector is not used anymore and I need to get that memory back, as it holds a huge data set. I think I can do this in two ways.
(1) By just initializing a vector and later erasing it (see code-1). (2) By allocating dynamic memory and later de-allocating it (see code-2). I heard that this de-allocation is inefficient because de-allocation itself costs memory, or maybe I misunderstood.
Q1)
I would like to know which way is best in terms of memory and efficiency.
Q2)
Also, I would like to know whether returning by reference from a function is a good way of giving output. (Please look at code-3.)
code-1
int main(){
//read input data (my_data)
vector<double> result;
for (vector<Position3D>::iterator it=my_data.begin(); it!=my_data.end(); it++){
// do some stuff and calculate a "double" value (say value)
//using each point coordinate
result.push_back(value);
// do some other stuff
}
//loop over result and use each value for some other stuff
for (int i=0; i<result.size(); i++){
//do some stuff
}
//result will not be used anymore and thus erase data
result.clear();
}
code-2
int main(){
//read input data
vector<double> *result = new vector<double>;
for (vector<Position3D>::iterator it=my_data.begin(); it!=my_data.end(); it++){
// do some stuff and calculate a "double" value (say value)
//using each point coordinate
result->push_back(value);
// do some other stuff
}
//loop over result and use each value for some other stuff
for (int i=0; i<result->size(); i++){
//do some stuff
}
//de-allocate memory
delete result;
result = 0;
}
code-3
vector<Position3D>& vector<Position3D>::ReturnLabel(VoxelGrid grid, int segment) const
{
vector<Position3D> *points_at_grid_cutting = new vector<Position3D>;
vector<Position3D>::iterator point;
for (point=begin(); point!=end(); point++) {
//do some stuff
}
return (*points_at_grid_cutting);
}

For such huge data sets I would avoid using std containers at all and make use of memory mapped files.
If you prefer to go on with std::vector, use the swap idiom, std::vector<double>().swap(v), to free the allocated memory; vector::clear() alone only resets the size and keeps the capacity.

erase will not free the memory used for the vector. It reduces the size but not the capacity, so the vector still holds enough memory for all those doubles.
The best way to make the memory available again is like your code-1, but let the vector go out of scope:
int main() {
{
vector<double> result;
// populate result
// use results for something
}
// do something else - the memory for the vector has been freed
}
Failing that, the idiomatic way to clear a vector and free the memory is:
vector<double>().swap(result);
This creates an empty temporary vector, then it exchanges the contents of that with result (so result is empty and has a small capacity, while the temporary has all the data and the large capacity). Finally, it destroys the temporary, taking the large buffer with it.
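The difference between clear() and the swap idiom can be observed through capacity(). The two helper functions below are hypothetical names used only for this sketch.

```cpp
#include <cstddef>
#include <vector>

// Empties v with clear(): the size becomes 0, but the buffer is kept.
std::size_t capacityAfterClear(std::vector<double>& v) {
    v.clear();
    return v.capacity();
}

// Empties v with the swap idiom: the buffer moves into a temporary vector,
// which is destroyed at the end of the statement and frees the memory.
std::size_t capacityAfterSwap(std::vector<double>& v) {
    std::vector<double>().swap(v);
    return v.capacity();
}
```

For a vector that held 1000 doubles, the first function still reports a capacity of at least 1000, while the second reports the capacity of a freshly constructed vector. C++11 also offers shrink_to_fit(), but that is only a non-binding request.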
Regarding code-3: it's not good style to return a dynamically-allocated object by reference, since it doesn't provide the caller with much of a reminder that they are responsible for freeing it. Often the best thing to do is return a local variable by value:
vector<Position3D> ReturnLabel(VoxelGrid grid, int segment) const
{
vector<Position3D> points_at_grid_cutting;
// do whatever to populate the vector
return points_at_grid_cutting;
}
The reason is that provided the caller uses a call to this function as the initialization for their own vector, then something called "named return value optimization" kicks in, and ensures that although you're returning by value, no copy of the value is made.
A compiler that doesn't implement NRVO is a bad compiler, and will probably have all sorts of other surprising performance failures, but there are some cases where NRVO doesn't apply - most importantly when the value is assigned to a variable by the caller instead of used in initialization. There are three fixes for this:
1) C++11 introduces move semantics, which basically sort it out by ensuring that assignment from a temporary is cheap.
2) In C++03, the caller can play a trick called "swaptimization". Instead of:
vector<Position3D> foo;
// some other use of foo
foo = ReturnLabel();
write:
vector<Position3D> foo;
// some other use of foo
ReturnLabel().swap(foo);
3) You write a function with a more complicated signature, such as taking a vector by non-const reference and filling the values into that, or taking an OutputIterator as a template parameter. The latter also provides the caller with more flexibility, since they need not use a vector to store the results, they could use some other container, or even process them one at a time without storing the whole lot at once.
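The OutputIterator variant from point 3 could look roughly like this. The Position3D struct, the function name copyPointsAbove and the z-threshold filter are all hypothetical placeholders, since the real selection logic is not shown in the question.

```cpp
#include <iterator>
#include <vector>

struct Position3D { double x, y, z; };  // hypothetical stand-in

// Writes every point whose z exceeds `threshold` to `out` and returns the
// advanced iterator. The caller decides where the results end up.
template <typename OutputIt>
OutputIt copyPointsAbove(const std::vector<Position3D>& points,
                         double threshold, OutputIt out) {
    for (const Position3D& p : points) {
        if (p.z > threshold) *out++ = p;
    }
    return out;
}
```

A caller who wants a vector passes std::back_inserter(result); a caller who wants to process points one at a time can pass any other output iterator instead.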

Your code seems like the computed value from the first loop is only used context-insensitively in the second loop. In other words, once you have computed the double value in the first loop, you could act immediately on it, without any need to store all values at once.
If that's the case, you should implement it that way. No worries about large allocations, storage or anything. Better cache performance. Happiness.

vector<double> result;
for (vector<Position3D>::iterator it=my_data.begin(); it!=my_data.end(); it++){
// do some stuff and calculate a "double" value (say value)
//using each point coordinate
result.push_back(value);
If the "result" vector ends up holding thousands of values, this will cause many reallocations. It is best to reserve a large enough capacity up front with the reserve function:
vector<double> result;
result.reserve(someSuitableNumber);
Note that constructing it as vector<double> result(someSuitableNumber, 0.0) would instead create that many elements, which you would then assign by index rather than push_back.
Reserving reduces the number of reallocations and may make your code faster.
Also I would write : vector<Position3D>& vector<Position3D>::ReturnLabel(VoxelGrid grid, int segment) const
Like this :
void vector<Position3D>::ReturnLabel(VoxelGrid grid, int segment, vector<Position3D> & myVec_out) const //myVec_out is populated inside func
Your idea of returning a reference is correct, since you want to avoid copying.

Destructors in C++ must not fail, so deallocation does not allocate memory, because memory cannot be allocated with a no-throw guarantee.
Aside: instead of looping multiple times, it is probably better to do the operations in an integrated manner. That is, instead of loading the whole dataset and then reducing the whole dataset, read in the points one by one and apply the reduction directly, i.e. instead of
load_my_data()
for_each (p : my_data)
result.push_back(p)
for_each (p : result)
reduction.push_back (reduce (p))
Just do
file f ("file")
while (f)
Point p = read_point (f)
reduction.push_back (reduce (p))
If you don't need to store those reductions, simply output them sequentially
file f ("file")
while (f)
Point p = read_point (f)
cout << reduce (p)

code-1 will work fine and is almost the same as code-2, with no major advantages or disadvantages.
code-3: somebody else should answer that, but I believe the difference between a pointer and a reference in this case would be marginal. I do prefer pointers, though.
That being said, I think you might be approaching the optimization from the wrong angle. Do you really need all points to compute the output of a point in your first loop? Or can you rewrite your algorithm to read only one point, compute the value as you would in your first loop, and then use it immediately the way you want to? Maybe not with single points, but with batches of points. That could potentially cut your memory requirements quite a bit with only a small increase in processing time.

c++ variable declaration

I'm wondering if this code:
int main(){
int p;
for(int i = 0; i < 10; i++){
p = ...;
}
return 0;
}
is exactly the same as that one
int main(){
for(int i = 0; i < 10; i++){
int p = ...;
}
return 0;
}
in terms of efficiency?
I mean, will the p variable be recreated 10 times in the second example?

It's the same in terms of efficiency.
It's not the same in terms of readability. The second is better in this respect, isn't it?
It's a semantic difference that the code keeps hidden, because it makes no difference for int, but it does make a difference to the human reader. Do you want to carry the value of whatever calculation you do in ... outside of the loop? You don't, so you should write code that reflects your intention.
A human reader will need to read through the function and look for other uses of p to confirm that what you did was just premature "optimization" and didn't have any deeper purpose.
Assuming it makes a difference for the type you use, you can help the human reader by commenting your code:
/* p is only used inside the for-loop, to keep it from reallocating */
std::vector<int> p;
p.reserve(10);
for(int i = 0; i < 10; i++){
p.clear();
/* ... */
}

In this case, it's the same. Use the smallest scope possible for the most readable code.
If int were a class with a significant constructor and destructor, then the first (declaring it outside the loop) can be a significant savings - but inside you usually need to recreate the state anyway... so oftentimes it ends up being no savings at all.
One instance where it might make a difference is containers. A string or vector uses internal storage that gets grown to fit the size of the data it is storing. You may not want to reconstruct this container each time through the loop, instead, just clear its contents and it may not need as many reallocations inside the loop. This can (in some cases) result in a significant performance improvement.
The bottom line: write it clearly, and if profiling shows it matters, move it out :)
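The container case mentioned above can be sketched as follows. The function lineLengths and its row/column workload are hypothetical, chosen only to show a buffer being hoisted out of a loop.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Builds one line of `cols` characters per row and records its length,
// reusing a single string declared outside the loop.
std::vector<std::size_t> lineLengths(int rows, int cols) {
    std::vector<std::size_t> lengths;
    std::string line;  // hoisted: constructed once, before the loop
    for (int r = 0; r < rows; ++r) {
        // clear() resets the length; in practice the storage is kept, so
        // later iterations can usually append without reallocating.
        line.clear();
        for (int c = 0; c < cols; ++c) line += 'x';
        lengths.push_back(line.size());
    }
    return lengths;
}
```

Declaring line inside the outer loop would give the same results, but the string would then start from an empty buffer on every row. Whether that difference matters is something only profiling can tell.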

They are equal in terms of efficiency - you should trust your compiler to get rid of the immeasurably small difference. The second is better design.
Edit: This isn't necessarily true for custom types, especially those that deal with memory. If you were writing a loop for any T, I'd sure use the first form just in case. But if you know that it's an inbuilt type, like int, pointer, char, float, bool, etc. I'd go for the second.

In the second example, p is visible only inside the for loop; you cannot use it further in your code.
In terms of efficiency they are equal.
