Exploiting fact that elements of vector are stored in heap? - c++

Let's say you have something like this:
#include <iostream>
#include <vector>
using namespace std;
vector<int> test()
{
vector <int> x(1000);
for (int i = 0; i < 1000; i++)
{
x[i] = 12345;
}
return x;
}
int main(int argc, const char * argv[])
{
vector<int> a = test();
return 0;
}
where within a function you create a vector and fill it with some elements (in this case I chose 12345 but they won't necessarily all be the same).
I have read that the elements of the vector are stored on the heap whereas the reference and header data are stored on the stack. In the above code, when x is returned a copy-constructor must be called, and this takes O(n) time to copy all the elements into a new vector.
However, is it possible to take advantage of the fact that all the elements already exist on the heap, and just return something like a pointer to those elements, then later create a vector that uses that pointer to refer to those exact same elements, thus avoiding the need to copy all the elements of the vector?

The compiler does this for you, freeing you up to write nice, easy-to-read code, rather than mangling your code for the sake of optimization.
When you return a value from a function, the compiler is allowed to elide the return value object. The net effect is that the compiler can just create x in the actual memory location of a.
Even if it doesn't do this (e.g. it chooses not to for some reason, or you disable it with a compiler switch), there is still the possibility of a move.
When a move happens, the vector just transfers ownership of the pointer from x to the return value, and then from the return value to a. This leaves x and the intermediate return value as (in practice) empty vectors, which are then correctly destroyed.
You could explore this by writing a test class (instead of vector<int>) which prints something out for its default constructor, copy-constructor, and move-constructor, e.g.
#include <iostream>
struct A
{
A() { std::cout << "default\n"; }
A(A const &) { std::cout << "copy\n"; }
A(A &&) { std::cout << "move\n"; }
};
A func() { A a; return a; }
int main()
{
A b (func());
}
Output with g++:
default
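Output with g++ -std=c++11 -fno-elide-constructors (under -std=c++17 or later the second move is always elided, so only one move is printed):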
default
move
move
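To see the pointer hand-off directly, rather than through constructor logging, one can compare the address of the vector's buffer before and after a move. This is only a minimal sketch: the names make_filled and move_keeps_buffer are hypothetical, and the check assumes the behaviour of mainstream standard library implementations, where the moved-to vector takes over the source's buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper mirroring test() from the question: build and return by value.
std::vector<int> make_filled(std::size_t n, int value)
{
    std::vector<int> x(n, value);
    return x; // elided (NRVO) or moved; the elements themselves are not copied
}

// Returns true if moving a vector hands over the existing heap buffer.
bool move_keeps_buffer()
{
    std::vector<int> x = make_filled(1000, 12345);
    const int* before = x.data();      // address of the element storage
    std::vector<int> a = std::move(x); // move construction takes over the buffer
    return a.data() == before && a.size() == 1000 && a[999] == 12345;
}
```

If move_keeps_buffer() returns true, no element was copied: a simply points at the storage that x allocated.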

Related

Pass vector by reference to constructor of class

I have a class called test with which I want to associate a large vector, on the order of a million elements. I have tried doing this by passing a pointer to the constructor:
#include <iostream>
#include <vector>
using namespace std;
class test{
public:
vector<double>* oneVector;
test(vector<double>* v){
oneVector = v;
}
int nElem(){return oneVector->size();}
};
int main(){
vector<double> v(1000000);
cout << v.size() << endl;
vector<double>* ptr;
test t(ptr);
cout << t.nElem()<< endl;
return 0;
}
However, this results in "Segmentation fault: 11", precisely when I call t.nElem(). What could be the problem?
This is C++; don't work with raw pointers unless you absolutely need to. If the goal is to take ownership of a std::vector without copying, and you can use C++11, make the constructor accept an rvalue reference and pass it the vector you have finished populating via std::move. Only the vector's internal pointers are copied, not the data, so the copy is avoided (and the original vector is left as an empty shell):
class test{
public:
vector<double> oneVector;
test(vector<double>&& v):oneVector(std::move(v)){
}
int nElem(){return oneVector.size();}
};
int main(){
vector<double> v(1000000);
cout << v.size() << endl;
test t(std::move(v));
cout << t.nElem()<< endl;
return 0;
}
If you really want a pointer to a vector "somewhere else", make sure to actually assign ptr = &v; in your original code. Or new the vector and manage the lifetime across test and main with std::shared_ptr. Take your pick.
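The std::shared_ptr option mentioned above could look roughly like the following. This is a sketch under assumptions: the class name SharedTest and the helper shared_demo are made up for illustration, and whether shared ownership is appropriate depends on who really needs to keep the vector alive.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical variant of the test class that shares ownership of the vector.
class SharedTest
{
public:
    explicit SharedTest(std::shared_ptr<std::vector<double>> v)
        : oneVector(std::move(v)) {}

    std::size_t nElem() const { return oneVector->size(); }

private:
    std::shared_ptr<std::vector<double>> oneVector;
};

// Creates the vector on the heap, hands it to SharedTest, then drops the local handle.
std::size_t shared_demo()
{
    auto v = std::make_shared<std::vector<double>>(1000000);
    SharedTest t(v); // t and v now share ownership
    v.reset();       // the caller gives up its handle; t keeps the vector alive
    return t.nElem();
}
```

The vector is destroyed automatically when the last shared_ptr referring to it goes away, so neither side has to call delete.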
ptr is not initialized. What you "want" to do is:
test t(&v);
However, I think references would suit you better here (it's in the title of your question, after all!). References avoid extra syntax (like -> instead of .) that only makes the code harder to read. Keep in mind that the referenced vector must outlive the test object.
class test
{
std::vector<double>& oneVector;
public:
test(std::vector<double>& v) : oneVector(v) {}
std::size_t nElem() const { return oneVector.size(); }
};
ptr is an uninitialized pointer. This unpredictable value gets copied to t.oneVector. Dereferencing it is undefined behavior.
You need your pointer to actually point at a valid vector.
You forgot to give your pointer the desired value, namely the address of the vector:
vector<double>* ptr = &v;
// ^^^^^^
In your code, ptr remains uninitialized, and your program has undefined behaviour.

Unwanted value changes in 1D and 2D arrays returning a value from a function in c++ code

I have a multi-file program that reads data from a file and stores the values in various arrays. The size of the arrays is not known at compile time. After the values are stored, I use another function to determine the maximum and minimum of each array and return the max/min. Before the "return maximum" statement, the values in the array are correct. After "return maximum", the values are changed or erased.
Here is some of the code including one of the 2D arrays and one of the 1D arrays (there are a few more of those but I removed them so there's less code for you to look at)
EDITED:
FunctionValues.h: (removed destructor block)
class FunctionValues
{
//define variables, set up arrays of unknown size
public:
float **xvel;
int *imax;
int vessels;
int tot_gridpt;
public:
//Constructor -- initialization of an object performed here
FunctionValues(): xvel(NULL), imax(NULL) {}
//Destructor
~FunctionValues() {
}
void read_function(string filename);
};
FunctionValues.cpp: (this reads a file with some imax values, vessel numbers and velocities and stores them in the appropriate arrays, the other includes are also there) All the arrays made are stored in FunctionValues myval object
#include "FunctionValues.h"
using namespace std;
void FunctionValues::read_function(string filename)
{
std::ifstream myfile(filename.c_str());
//acquire variables
myfile >> vessels; //number of vessels
imax = new int[vessels];
//... code reading the file and storing them, then imax and some other values are multiplied to get int tot_gridpt
xvel = new float *[vessels];
for (int i = 0; i < vessels; i++)
{
xvel[i] = new float[tot_gridpt];
}
//arrays filled
for (int i = 0; i < limiter; i++)
{
myfile >> xvel[count][i];
}
}
Gridpts.cpp: (edited: range() arguments and parameters)
#include "FunctionValues.h"
#include "Gridpts.h"
using namespace std;
// forward declarations
float range(float **velocities, const FunctionValues *myval, int num);
void Gridpts::create_grid(FunctionValues *myval, int ptsnum)
{
//find range, 1 for max, 0 for min from smooth wall simulation results rounded to the nearest integer
float maximum = range(myval->xvel, &myval, 1);
float minimum = range(myval->xvel, &myval, 0);
}
range.cpp: (edited: arguments changed to pass by pointer)
float range(float **velocities, const FunctionValues *myval, int num)
{
if (num == 1)
{
float maximum = 0;
for (int round = 0; round < myval->vessels; round++)
{
for (int count = 0; count < myval->tot_gridpt; count++)
{
if (velocities[round][count] > maximum)
{
maximum = velocities[round][count];
}
}
}
maximum = ceil(maximum);
return maximum;
}
main.cpp:
corner_pts.create_grid(&myval, ptsnum);
This is where the error occurs. Printing cout << "CHECKPOINT: " << myval.xvel[0][0] before "return maximum;" gives -0.39032, which is correct. After "return maximum;", nothing is printed, and the program then crashes when it tries to run range() again using the xvel array. The same happens for myval.imax[0].
I apologize for copying in so much code. I tried to only include the essential to what is happening with the array. I have only started programming for about a month so I'm sure this is not the most efficient way to write code but I would greatly appreciate any insight as to why the arrays are being changed after returning a float. Thank you in advance for your time. (And if I have broken any rule about posting format, please let me know!)
So your program crashes when you call range() the second time. Therefore, your issue is most likely there.
Your program is crashing because you are taking your FunctionValues parameter by value, which is then destroyed at the end of the scope of the function, since it is local to the function.
// issue with myval being taken as a copy
float range(float **velocities, FunctionValues myval, int num)
{
//...
} // destructors of by-value function parameters run here, including myval's
Explanation
Your function parameter FunctionValues myval is taken by copy. Since you have no copy constructor defined, this means that the default copy behavior is used. The default copy behavior simply copies the object data from the supplied argument at the call site.
For pointers, since they hold addresses, this means that you are copying the addresses held by those pointers into an object local to the range() function.
Since myval is local to the range() function, its destructor is called at the end of the scope of the function. You are left with dangling pointers; pointers holding the memory addresses of memory that you have already given back to the free store.
Simplified example of your error:
#include <iostream>
class X
{
public:
X() : p{ new int{ 0 } }
{
}
~X()
{
std::cout << "Deleting!" << std::endl; // A
delete p; // B
}
private:
int* p;
};
void func(X param_by_value) // C
{
// ...
}
int main()
{
X x; // D
func(x); // E
func(x); // F
}
1. You have a variable x (D), and you use it to call the function func() (E).
2. func() takes a parameter of type X by value, named param_by_value (C).
3. The data of x is copied into param_by_value. Since param_by_value is local to func(), its destructor is called at the end of func().
4. Both x and param_by_value have an int* data member called p that holds the same address, because of step 3.
5. When param_by_value's destructor is called, it calls delete on param_by_value's p (B), but x's p still holds the address that was deleted.
6. You call func() again (F), and the same steps are repeated: x is copied into param_by_value. This time, however, you try to use memory that has already been given back to the free store (by calling delete on the address) and (luckily) get an error. Worse yet, when main() exits, x's own destructor runs and deletes that same address once more.
You need to do some research into function parameters in C++. Passing by value, passing by reference, passing by pointer, and all of those combined with const.
As user @MichaelBurr points out, you should also look up the rule of three (and rule of five).
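Applied to the simplified X class, the rule of three might look like the sketch below. The names SafeX, by_value and by_const_ref are hypothetical and only serve the illustration; the point is that once the copy operations allocate their own storage, a by-value parameter no longer deletes memory that the caller still owns.

```cpp
#include <cassert>

// Hypothetical variant of X that follows the rule of three:
// destructor, copy constructor and copy assignment are all user-defined.
class SafeX
{
public:
    SafeX() : p(new int(0)) {}
    // Deep copy: allocate a separate int instead of sharing the address.
    SafeX(const SafeX& other) : p(new int(*other.p)) {}
    // Copy the pointed-to value; each object keeps its own allocation.
    SafeX& operator=(const SafeX& other)
    {
        *p = *other.p; // also correct for self-assignment
        return *this;
    }
    ~SafeX() { delete p; }

    int get() const { return *p; }
    void set(int v) { *p = v; }

private:
    int* p;
};

// Taking the parameter by value is now safe: the copy owns its own int.
int by_value(SafeX param)
{
    param.set(99); // modifies only the local copy
    return param.get();
}

// Usually preferable for read-only access: no copy is made at all.
int by_const_ref(const SafeX& param)
{
    return param.get();
}
```

Calling by_value(x) twice in a row is fine here, and x keeps its own value afterwards. For a function like range(), which only reads the data, passing by const reference avoids the copy altogether.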
I'm just wondering why you opted not to use functionality like std::max_element/std::min_element in <algorithm>, and std::valarray or std::vector to allocate a contiguous chunk of memory?
Worst-case scenario, if you're a fan of the explicit nature of 2D arrays x[a][b], you could create a basic matrix:
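#include <cstddef>
#include <functional>
#include <numeric>
#include <valarray>

template <typename T>
class Matrix {
public:
    explicit Matrix(const std::valarray<int>& dims) : dims(dims) {}
    Matrix(const std::valarray<int>& dims, const std::valarray<T>& data) : data(data), dims(dims) {}
    // Returns a copy of the i-th sub-matrix along the first dimension
    // (assumes dims has at least one entry).
    Matrix<T> operator[](int i) const {
        // Drop the first dimension; the remaining ones describe the sub-matrix.
        std::valarray<int> newDims(dims[std::slice(1, dims.size() - 1, 1)]);
        // Number of elements in one sub-matrix.
        std::size_t stride = std::accumulate(std::begin(newDims), std::end(newDims), 1, std::multiplies<int>());
        std::valarray<T> newData(data[std::slice(i * stride, stride, 1)]);
        return Matrix<T>(newDims, newData);
    }
protected:
    std::valarray<T> data;
    std::valarray<int> dims;
};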
I think more reliance on the standard libraries for their correctness will likely solve any memory access/integrity issues.
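As a rough illustration of that suggestion, the maximum search from range() could be written with std::vector and std::max_element as below. The function name range_max and the nested-vector layout are assumptions made for this sketch, not the asker's code; it keeps the original starting value of 0 and the final ceil.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical replacement for the "maximum" branch of range():
// the vectors own their memory, so nothing has to be deleted by hand.
float range_max(const std::vector<std::vector<float>>& velocities)
{
    float maximum = 0.0f; // same starting value as the original loop
    for (const std::vector<float>& row : velocities)
    {
        if (row.empty())
            continue; // max_element on an empty range returns end()
        maximum = std::max(maximum, *std::max_element(row.begin(), row.end()));
    }
    return std::ceil(maximum);
}
```

Because the data is passed by const reference and owned by std::vector, there is no destructor to get wrong and no copy of the arrays is made.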

randomizing constructor parameters in c++'s vector resize() function

I have the following code snippet. I'd like it to fill up a vector with different instances of my Object object. That is, every time it adds an Object to the vector, it should call rand() and populate that object with a set of values unique to the other Object objects. Instead, this populates all the objects with the same values.
std::vector<Object> *objects;
Image::Image(unsigned nObjects)
{
srand(2);
objects = new std::vector<Object>();
this->nObjects = nObjects;
objects->resize(nObjects, Object(rand(), rand(), rand(), rand()));
for(int i = 0; i < objects->size(); ++i)
std::cout << objects->at(i).getX1() << std::endl;
}
That's because you're calling the parameterized constructor once to create a prototype object. After that the copy constructor is being called. As in the documentation:
If the current size is less than count, additional elements are appended and initialized with copies of value.
If you want to add N new items then AFAIK you need to use push_back (or preferably, emplace_back in C++11) in a loop:
#include <cstdlib>
#include <iostream>
#include <vector>
using namespace std;
struct Object {
int alpha;
int beta;
Object (int alpha, int beta) : alpha (alpha), beta (beta) {}
};
int main() {
vector<Object> objs;
int nObjects = 5;
for (int i = 0; i < nObjects; i++) {
objs.emplace_back(rand(), rand());
}
for (Object & o : objs) {
cout << o.alpha << "," << o.beta << "\n";
}
}
There are some routines in <algorithm> like generate_n to overwrite a range of items using the result of a function that is called repeatedly. But I don't think that's what you want here; as it requires the collection to already have objects in it that you overwrite. That's my impression. So when adding new items to the end when each object needs to be generated uniquely, I think emplace_back is the way to go.
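For completeness, std::generate_n can also append new elements if it is given a std::back_inserter instead of an iterator into existing elements. The sketch below is an alternative to the emplace_back loop, not part of the original answer; make_objects, the seed parameter and the 0 to 999 value range are assumptions chosen for the example, and <random> is used in place of rand().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <random>
#include <vector>

struct Object {
    int alpha;
    int beta;
    Object(int alpha, int beta) : alpha(alpha), beta(beta) {}
};

// Hypothetical helper: appends n objects, each built from fresh random values.
std::vector<Object> make_objects(std::size_t n, unsigned seed)
{
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 999);
    std::vector<Object> objs;
    objs.reserve(n);
    // The generator is called once per element, so every Object gets its own values.
    std::generate_n(std::back_inserter(objs), n, [&gen, &dist]() {
        int a = dist(gen); // separate statements fix the order of the two draws
        int b = dist(gen);
        return Object(a, b);
    });
    return objs;
}
```

Unlike resize(n, prototype), nothing here copies a single prototype object; each element comes from its own call to the lambda.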
As an additional note, always remember to reduce your example. If it's not about memory allocation, don't include a new. If two fields are enough, don't use four. And always if possible, submit your question in the form of a "Minimal, Complete, Verifiable Example" (like the code above).

Deep copy of a matrix-like class

I've got a class that should behave like a matrix.
So the use case is something like:
Matrix matrix(10,10);
matrix[0][0]=4;
//set the values for the rest of the matrix
cout<<matrix[1][2]<<endl;
code:
#include <iostream>
#include <cstdlib>
#include <cstdio>
#include <cstring>
#include <sstream>
using namespace std;
class Matrix {
public:
Matrix(int x, int y);
class Proxy {
public:
Proxy(int* _array) : _array(_array) {
}
int &operator[](int index) const {
return _array[index];
}
private:
int* _array;
};
Proxy operator[](int index) const {
return Proxy(_arrayofarrays[index]);
}
Proxy operator[](int index) {
return Proxy(_arrayofarrays[index]);
}
const Matrix& operator=(const Matrix& othersales);
private:
int** _arrayofarrays;
int x, y;
};
Matrix::Matrix(int x, int y) {
_arrayofarrays = new int*[x];
for (int i = 0; i < x; ++i)
_arrayofarrays[i] = new int[y];
}
const Matrix& Matrix::operator=(const Matrix& othermatrix) {
new (this) Matrix(x, y);
for (int i = 0; i < 3; i++)
for (int j = 0; j < 3; j++)
_arrayofarrays[i][j] = othermatrix._arrayofarrays[i][j];
return *this;
}
int main() {
Matrix a(2, 3);
a[0][0] = 1;
a[0][1] = 2;
a[0][2] = 3;
a[1][0] = 4;
a[1][1] = 5;
a[1][2] = 6;
cout << a[1][2] << endl;
//prints out 6
const Matrix b = a;
cout << b[1][2] << endl;
a[1][2] = 3;
cout << a[1][2] << endl;
// prints out 3
cout << b[1][2] << endl;
// prints out 3 as well
}
By calling const Matrix b = a; I want to create new instance of Matrix, that will have the same values as a has in that moment. Nevertheless b is being affected by changing the values in a. So if I change some value in a, then it changes in b as well. And I don't want it to behave like this.
So I need to create a copy of b that would not be affected by a itself.
These might be stupid questions, but for me, as a Java guy and a C++ newbie, all of this is really confusing, so thanks for any helpful advice...
There are a few issues with your implementation. The simple one is the error you are getting...
In your Matrix class, operator[] is a non-const member function, which means that it can only be executed on non-const objects. Your operator= takes the right hand side object by const &, and thus you cannot call operator[] on it. The issue here is that you are not offering an implementation of operator[] that promises not to modify the object, once you add that to your type it should compile.
More important than that is the fact that you are leaking memory. When you call operator= on an object you are creating a different Matrix in place, without previously releasing the memory that it held. That is a memory leak.
The implementation of operator= is also not exception-safe. If allocation of memory for any of the internal arrays fails and throws an exception, you are leaving your object in a state that is neither the original one nor a valid state. This is bad in itself.
Related to the previous point, in as much as correcting one probably leads to the other, your implementation of operator= is not safe if there is aliasing, that is, it fails if you self-assign. The first line will leak the memory and create the new buffers, and from there on you will copy the new buffer into itself, losing the original information.
Finally, the implementation of the type could be improved if you drop the requirement of using operator[] and instead use operator() with the two indices. User code will have to be adapted (and will look less like a bidimensional array), but it provides a bit more freedom of representation (you can store the information internally in any way you want). At the same time, there is no need to allocate an array of N pointers and then N arrays of M ints. You can perform a single memory allocation of NxM ints and do pointer arithmetic to address each location (this is independent of the use of operator[]/operator()), which will reduce the memory footprint and make the layout more compact, improving cache performance (not to mention reducing the number of dynamic allocations from N+1 to one).
By calling const Matrix b = a; I want to create new instance of Matrix, that will have the same values of a in that moment. Nevertheless b is being affected by changing the values in a.
Well, this is yet another issue I missed on the first read. The expression const Matrix b = a; does not involve operator=, but rather the copy constructor. Another thing to google: the Rule of Three (basically, if you implement one of copy constructor, assignment operator or destructor manually, you probably want to implement all three). Without your own copy constructor, the compiler will implicitly define one for you that does a shallow copy (i.e. it copies the pointers stored in Matrix but does not allocate new memory for the data). After the copy is made, both Matrix objects share the same memory, and if your destructor releases the memory, you will run into undefined behavior when the second destructor runs and tries to delete[] the already deleted memory.
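Putting the last two suggestions together, a matrix that stores its elements in a single std::vector and is indexed with operator() gets a correct deep copy without any hand-written copy constructor, assignment operator or destructor. The class name FlatMatrix is hypothetical, and this is only a sketch of that design, not a drop-in replacement for the Proxy-based interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical matrix with one contiguous allocation of rows * cols ints.
class FlatMatrix
{
public:
    FlatMatrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0) {}

    // Element access by (row, column); row-major layout.
    int& operator()(std::size_t r, std::size_t c)
    {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }
    int operator()(std::size_t r, std::size_t c) const
    {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<int> data_; // copies, assigns and frees itself correctly
};
```

With this class, const FlatMatrix b = a; copies every element, so a later a(1, 2) = 3; leaves b(1, 2) unchanged.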

Efficient way to return a std::vector in c++

How much data is copied, when returning a std::vector in a function and how big an optimization will it be to place the std::vector in free-store (on the heap) and return a pointer instead i.e. is:
std::vector *f()
{
std::vector *result = new std::vector();
/*
Insert elements into result
*/
return result;
}
more efficient than:
std::vector f()
{
std::vector result;
/*
Insert elements into result
*/
return result;
}
?
In C++11, this is the preferred way:
std::vector<X> f();
That is, return by value.
With C++11, std::vector has move-semantics, which means the local vector declared in your function will be moved on return and in some cases even the move can be elided by the compiler.
You should return by value.
The standard has a specific feature to improve the efficiency of returning by value. It's called "copy elision", and more specifically in this case the "named return value optimization (NRVO)".
Compilers don't have to implement it, but then again compilers don't have to implement function inlining (or perform any optimization at all). But the performance of the standard libraries can be pretty poor if compilers don't optimize, and all serious compilers implement inlining and NRVO (and other optimizations).
When NRVO is applied, there will be no copying in the following code:
std::vector<int> f() {
std::vector<int> result;
... populate the vector ...
return result;
}
std::vector<int> myvec = f();
But the user might want to do this:
std::vector<int> myvec;
... some time later ...
myvec = f();
Copy elision does not prevent a copy here because it's an assignment rather than an initialization. However, you should still return by value. In C++11, the assignment is optimized by something different, called "move semantics". In C++03, the above code does cause a copy, and although in theory an optimizer might be able to avoid it, in practice it's too difficult. So instead of myvec = f(), in C++03 you should write this:
std::vector<int> myvec;
... some time later ...
f().swap(myvec);
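The claim that C++11 move semantics take care of the assignment case can be checked with an element type that counts its copies. This is a sketch under assumptions: Counted, make_counted and copies_after_assignment are hypothetical names, and the vector uses the default allocator, for which move assignment takes over the source's buffer.

```cpp
#include <cassert>
#include <vector>

// Hypothetical element type that counts how often it is copied.
struct Counted
{
    static int copies;
    int value;
    explicit Counted(int v) : value(v) {}
    Counted(const Counted& o) : value(o.value) { ++copies; }
    Counted(Counted&& o) noexcept : value(o.value) {}
    Counted& operator=(const Counted& o) { value = o.value; ++copies; return *this; }
    Counted& operator=(Counted&& o) noexcept { value = o.value; return *this; }
};
int Counted::copies = 0;

// Builds a vector in place and returns it by value.
std::vector<Counted> make_counted()
{
    std::vector<Counted> result;
    result.reserve(3);
    for (int i = 0; i < 3; ++i)
        result.emplace_back(i); // constructed inside the vector, no copy
    return result;              // elided or moved
}

// Assigns the returned vector to an existing one and reports the copy count.
int copies_after_assignment()
{
    Counted::copies = 0;
    std::vector<Counted> myvec;
    myvec = make_counted(); // move assignment: the buffer changes owner
    return Counted::copies;
}
```

A result of 0 from copies_after_assignment() means the assignment moved the whole buffer and did not copy any element.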
There is another option, which is to offer a more flexible interface to the user:
template <typename OutputIterator> void f(OutputIterator it) {
... write elements to the iterator like this ...
*it++ = 0;
*it++ = 1;
}
You can then also support the existing vector-based interface on top of that:
std::vector<int> f() {
std::vector<int> result;
f(std::back_inserter(result));
return result;
}
This might be less efficient than your existing code, if your existing code uses reserve() in a way more complex than just a fixed amount up front. But if your existing code basically calls push_back on the vector repeatedly, then this template-based code ought to be as good.
It's time I post an answer about RVO, me too...
If you return an object by value, the compiler often optimizes this so it doesn't get constructed twice, since it's superfluous to construct it in the function as a temporary and then copy it. This is called return value optimization: the object is constructed directly in the caller's storage, so it is neither copied nor moved.
A common pre-C++11 idiom is to pass a reference to the object being filled.
Then there is no copying of the vector.
void f( std::vector & result )
{
/*
Insert elements into result
*/
}
If the compiler supports Named Return Value Optimization (http://msdn.microsoft.com/en-us/library/ms364057(v=vs.80).aspx), you can return the vector directly, provided that none of the following applies:
Different paths return different named objects.
There are multiple return paths (even if the same named object is returned on all paths) with EH states introduced.
The named object returned is referenced in an inline asm block.
NRVO optimizes out the redundant copy constructor and destructor calls and thus improves overall performance.
There should be no real diff in your example.
vector<string> getseq(char * db_file)
And if you want to print it in main(), you should do it in a loop.
int main(int argc, char* argv[]) {
vector<string> str_vec = getseq(argv[1]);
for(vector<string>::iterator it = str_vec.begin(); it != str_vec.end(); it++) {
cout << *it << endl;
}
}
The following code works without invoking the copy constructor.
Your routine:
std::vector<unsigned char> foo()
{
std::vector<unsigned char> v;
v.resize(16, 0);
return std::move(v); // forces a move; a plain "return v;" would also avoid the copy and additionally permits NRVO
}
Afterwards, you can use foo() to obtain the vector without copying it:
std::vector<unsigned char>&& moved_v(foo()); // binds a reference to the returned temporary, whose lifetime is extended; no copy is made
Result: moved_v has size 16 and is filled with zeros.
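Whether an explicit std::move in a return statement helps can be checked by counting constructor calls. In the sketch below, Tracer and the four functions are hypothetical names; the expectation is that the plain return never needs more moves than the std::move version, because only the plain form is eligible for NRVO. Some compilers warn about the std::move line for exactly that reason.

```cpp
#include <cassert>
#include <utility>

// Hypothetical type that counts copy and move constructions.
struct Tracer
{
    static int copies;
    static int moves;
    Tracer() {}
    Tracer(const Tracer&) { ++copies; }
    Tracer(Tracer&&) noexcept { ++moves; }
};
int Tracer::copies = 0;
int Tracer::moves = 0;

// Plain return of a local: eligible for NRVO, otherwise moved implicitly.
Tracer plain_return()
{
    Tracer t;
    return t;
}

// Explicit std::move: the operand is no longer a plain name, so NRVO cannot apply.
Tracer moved_return()
{
    Tracer t;
    return std::move(t);
}

int moves_for_plain_return()
{
    Tracer::moves = 0;
    Tracer r = plain_return();
    (void)r; // silence unused-variable warnings
    return Tracer::moves;
}

int moves_for_moved_return()
{
    Tracer::moves = 0;
    Tracer r = moved_return();
    (void)r;
    return Tracer::moves;
}
```

Neither version ever calls the copy constructor, which is the part that matters for a large vector; the difference is only whether a (cheap) move can be skipped as well.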
As nice as "return by value" might be, it's the kind of code that can lead one into error. Consider the following program:
#include <string>
#include <vector>
#include <iostream>
using namespace std;
static std::vector<std::string> strings;
std::vector<std::string> vecFunc(void) { return strings; };
int main(int argc, char * argv[]){
// set up the vector of strings to hold however
// many strings the user provides on the command line
for(int idx=1; (idx<argc); ++idx){
strings.push_back(argv[idx]);
}
// now, iterate the strings and print them using the vector function
// as accessor
for(std::vector<std::string>::iterator idx=vecFunc().begin(); (idx!=vecFunc().end()); ++idx){
cout << "Addr: " << idx->c_str() << std::endl;
cout << "Val: " << *idx << std::endl;
}
return 0;
};
Q: What will happen when the above is executed? A: Undefined behavior, typically a core dump.
Q: Why didn't the compiler catch the mistake? A: Because the program is syntactically, although not semantically, correct.
Q: What happens if you modify vecFunc() to return a reference? A: The program runs to completion and produces the expected result.
Q: What is the difference? A: The compiler does not have to create and manage anonymous objects. The programmer has instructed the compiler to use exactly one object for the iterator and for endpoint determination, rather than two different temporary objects as the broken example does; there, begin() and end() come from separate copies, and the copy that begin() refers to is destroyed at the end of the statement.
The above erroneous program will indicate no errors even if one uses the GNU g++ reporting options -Wall -Wextra -Weffc++.
If you must produce a value, then the following would work in place of calling vecFunc() twice:
std::vector<std::string> lclvec(vecFunc());
for(std::vector<std::string>::iterator idx=lclvec.begin(); (idx!=lclvec.end()); ++idx)...
The above also produces no anonymous objects during iteration of the loop, but requires a possible copy operation (which, as some note, might be optimized away under some circumstances). The reference method, however, guarantees that no copy will be produced. Believing the compiler will perform RVO is no substitute for trying to build the most efficient code you can. If you can moot the need for the compiler to do RVO, you are ahead of the game.
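The two safe patterns described above can be put side by side as follows. StringStore and the two join functions are hypothetical names used only for this sketch; the important detail is that begin() and end() are always taken from the same, still-living vector.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical holder offering both a by-value and a by-reference accessor.
class StringStore
{
public:
    void add(const std::string& s) { strings_.push_back(s); }
    std::vector<std::string> byValue() const { return strings_; }
    const std::vector<std::string>& byRef() const { return strings_; }

private:
    std::vector<std::string> strings_;
};

// Safe with a by-value accessor: bind the result to one named object first.
std::string join_via_local(const StringStore& store)
{
    const std::vector<std::string> local = store.byValue();
    std::string out;
    for (std::vector<std::string>::const_iterator it = local.begin(); it != local.end(); ++it)
        out += *it;
    return out;
}

// Safe without any copy: both iterators refer to the stored vector itself.
std::string join_via_ref(const StringStore& store)
{
    const std::vector<std::string>& ref = store.byRef();
    std::string out;
    for (std::vector<std::string>::const_iterator it = ref.begin(); it != ref.end(); ++it)
        out += *it;
    return out;
}
```

The by-reference version is only valid while the StringStore is alive and unmodified, which is the usual trade-off for avoiding the copy.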
vector<string> func1() const
{
vector<string> parts;
return vector<string>(parts.begin(),parts.end()) ;
}
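This is still efficient from C++11 onwards, as the returned temporary is moved or elided rather than copied. Note, however, that constructing it from the iterator range copies every element of parts; a plain return parts; avoids even that.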