How do I properly translate the following Java code to C++?
Vector v;
v = getLargeVector();
...
Vector getLargeVector() {
Vector v2 = new Vector();
// fill v2
return v2;
}
So here v is a reference. The function creates a new Vector object and returns a reference to it. Nice and clean.
However, let's see the following C++ mirror-translation:
vector<int> v;
v = getLargeVector();
...
vector<int> getLargeVector() {
vector<int> v2;
// fill v2
return v2;
}
Now v is a vector object, and if I understand correctly, v = getLargeVector() will copy all the elements from the vector returned by the function to v, which can be expensive. Furthermore, v2 is created on the stack and returning it will result in another copy (though as far as I know, modern compilers can optimize it away).
Currently this is what I do:
vector<int> v;
getLargeVector(v);
...
void getLargeVector(vector<int>& vec) {
// fill vec
}
But I don't find it an elegant solution.
So my question is: what is the best practice to do it (by avoiding unnecessary copy operations)? If possible, I'd like to avoid normal pointers. I've never used smart pointers so far, I don't know if they could help here.
Most C++ compilers implement return value optimization, which means you can efficiently return an object from a function without the overhead of copying all of its contents.
I would also recommend that you write:
vector<int> v(getLargeVector());
That way you copy-construct the object instead of default-constructing it and then assigning to it.
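As a rough way to check this on your own compiler, here is a minimal sketch, assuming C++11 or later. Tracked and makeTracked are hypothetical names invented for the illustration; the copy counter simply records whether any element was copied on the way out of the function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type that counts how often it is copied.
struct Tracked {
    static int copies;
    int value;
    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked& other) : value(other.value) { ++copies; }
    Tracked(Tracked&& other) noexcept : value(other.value) {}
};
int Tracked::copies = 0;

// Builds the vector locally and returns it by value.
std::vector<Tracked> makeTracked(std::size_t n) {
    std::vector<Tracked> result;
    result.reserve(n);  // one allocation up front, so no reallocation later
    for (std::size_t i = 0; i < n; ++i) {
        result.emplace_back(static_cast<int>(i));  // constructed in place
    }
    assert(result.size() == n);
    return result;  // elided (NRVO) or moved; the elements are not copied
}
```

Initializing with std::vector<Tracked> v(makeTracked(1000)); should leave Tracked::copies at 0, whether the compiler elides the return or falls back to moving the vector.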
void getLargeVector(vector<int>& vec) {
// fill the vector
}
Is a better approach for now. With C++0x, the problem with the first approach goes away, because move operations are used instead of copy operations.
RVO can be relied upon to make this code simple to write, but relying on RVO can also bite you. RVO is a compiler-dependent feature, but more importantly an RVO-capable compiler can disable RVO depending on the code itself. For example, if you were to write:
MyBigObject Gimme(bool condition)
{
if( condition )
return MyBigObject( oneSetOfValues );
else
return MyBigObject( anotherSetOfValues );
}
...then even an RVO-capable compiler won't be able to optimize here. There are many other conditions under which the compiler won't be able to optimize, and so by my reckoning any code that by design relies on RVO for performance or functionality smells.
If you buy in to the idea that one function should have one job (I only sorta do), then your dilemma as to how to return a populated vector becomes much simpler when you realize that your code is broken at the design level. Your function really does two jobs: it instantiates the vector, then it fills it in. Even with all this pedantry aside, however, a more generic & reliable solution exists than to rely on RVO. Simply write a function that populates an arbitrary vector. For example:
#include <cstdlib>
#include <vector>
#include <algorithm>
#include <iterator> // back_inserter, ostream_iterator
#include <iostream>
using namespace std;
template<typename Iter> Iter PopulateVector(Iter it, size_t howMany)
{
for( size_t n = 0; n < howMany; ++n )
{
*(it++) = n;
}
return it;
}
int main()
{
vector<int> ints;
PopulateVector(back_inserter(ints), 42);
cout << "The vector has " << ints.size() << " elements" << endl << "and they are..." << endl;
copy(ints.begin(), ints.end(), ostream_iterator<int>(cout, " "));
cout << endl << endl;
static const size_t numOtherInts = 42;
int otherInts[numOtherInts] = {0};
PopulateVector(&otherInts[0], numOtherInts);
cout << "The other vector has " << numOtherInts << " elements" << endl << "and they are..." << endl;
copy(&otherInts[0], &otherInts[numOtherInts], ostream_iterator<int>(cout, " "));
return 0;
}
Why would you like to avoid normal pointers? Is it because you don't want to worry about memory management, or is it because you are not familiar with pointer syntax?
If you don't want to worry about memory management, then a smart pointer is the best approach. If you are uncomfortable with pointer syntax, then use references.
You have the best solution. Pass by reference is the way to handle that situation.
Sounds like you could do this with a class... but this could be unnecessary.
#include <vector>
using std::vector;
class MySpecialArray
{
vector<int> v;
public:
MySpecialArray()
{
//fill v
}
vector<int> const * getLargeVector()
{
return &v;
}
};
I have following C++ object
std::vector<std::vector<SomeClass>> someClassVectors(sizeOfOuter);
where I know the size of the "outer" vector, but the sizes of the "inner" vectors vary. I need to copy the elements of this structure into a 1D array like this:
SomeClass * someClassArray;
I have a solution where I use std::copy like this
int count = 0;
for (int i = 0; i < sizeOfOuter; i++)
{
std::copy(someClassVectors[i].begin(), someClassVectors[i].end(), &someClassArray[count]);
count += someClassVectors[i].size();
}
but the class includes large matrices which means I cannot have the "vectors" structure and 1D array allocated twice at the same time.
Any ideas?
Do you preallocate someClassArray to a given size beforehand? I'd suggest using a 1D vector instead, to get rid of the known problems with plain arrays, if possible.
What about something like this:
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
int main() {
std::vector<std::vector<int>> someClassVectors {
{1,2,3},
{4,5,6},
{7,8,9}
};
std::vector<int> flat;
while (!someClassVectors.empty())
{
auto& last = someClassVectors.back();
std::move(std::rbegin(last), std::rend(last), std::back_inserter(flat));
someClassVectors.pop_back();
}
std::reverse(std::begin(flat), std::end(flat));
int * someClassArray = flat.data();
std::copy(someClassArray, someClassArray + flat.size(), std::ostream_iterator<int>(std::cout, " "));
}
The extra reverse operation doesn't have an effect on memory metrics - such an approach helps to avoid unneeded memory reallocations resulting from removing vector elements from beginning to end.
EDIT
Inspired by comments I changed copy to move semantics
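If the reverse pass is not wanted, a variation on the same idea is sketched below. flattenByMoving is a hypothetical helper name, and the sketch assumes the element type has a cheap move constructor. It moves the elements front to back and releases each inner vector's buffer as soon as it has been emptied. The flat buffer and the inner buffers still coexist for a while, but anything the elements own on the heap (such as large matrices) is transferred rather than duplicated.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Moves every element of `nested` into one flat vector, in order.
// The inner vectors are left empty; the outer vector keeps its size.
template <typename T>
std::vector<T> flattenByMoving(std::vector<std::vector<T>>& nested) {
    std::size_t total = 0;
    for (const auto& inner : nested) total += inner.size();

    std::vector<T> flat;
    flat.reserve(total);  // single allocation for the flat buffer
    for (auto& inner : nested) {
        flat.insert(flat.end(),
                    std::make_move_iterator(inner.begin()),
                    std::make_move_iterator(inner.end()));
        std::vector<T>().swap(inner);  // release this inner buffer right away
    }
    assert(flat.size() == total);
    return flat;
}
```

The result's data() can then be handed to code that expects a plain SomeClass * array, as long as the flat vector outlives that use.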
Embrace Range-v3 (or whatever will be introduced in C++20) and write a solution in (almost) a single line:
auto flattenedRange = ranges::views::join(someClassVectors);
this gives you a range in flattenedRange, which you can loop over or copy somewhere else easily.
This is a possible use case:
#include <iostream>
#include <vector>
#include <range/v3/view/join.hpp>
int main()
{
std::vector<std::vector<int>> Ints2D = {
{1,2,3},
{4},
{5,6}
};
auto Ints1D = ranges::views::join(Ints2D);
// here, going from Ints1D to a C-style array is easy, and shown in the other answer already
for (auto const& Int : Ints1D) {
std::cout << Int << ' ';
}
std::cout << '\n';
// output is: 1 2 3 4 5 6
}
In case you want to get a true std::vector instead of a range, before writing it into a C-style array, you can include this other header
#include <range/v3/range/conversion.hpp>
and pipe join's output into a conversion function:
auto Ints1D = ranges::views::join(Ints2D) | ranges::to_vector;
// auto deduces std::vector<int>
In terms of standard and versions, it doesn't really require much. In this demo you can see that it compiles and runs just fine with
compiler GCC 7.3
library Range-v3 0.9.1
C++14 standard (option -std=c++14 to g++)
As regards the copies
ranges::views::join(Ints2D) only creates a view on Ints2D, so no copy happens; if the notion of a view is unfamiliar, take a look at Chapter 7 of Functional Programming in C++, which has a very clear explanation of ranges, with pictures and everything;¹
even assigning that output to a variable, auto Ints1D = ranges::views::join(Ints2D);, does not trigger a copy; Ints1D in this case is not a std::vector<int>, even though it behaves like one when we loop over it (because it is a view on the underlying vectors);
converting it to a vector, e.g. via | ranges::to_vector, obviously triggers a copy, because you are no longer requesting a view on a vector, but a true one;
passing the range to an algorithm which loops on its elements doesn't trigger a copy.
Here's an example code that you can try out:
// STL
#include <iostream>
#include <vector>
// Boost and Range-v3
#include <boost/range/algorithm/for_each.hpp>
#include <range/v3/view/join.hpp>
#include <range/v3/range/conversion.hpp>
struct A {
A() = default;
A(A const&) { std::cout << "copy ctor\n"; };
};
int main()
{
std::vector<std::vector<A>> Ints2D = {
{A{},A{}},
{A{},A{}}
};
using boost::range::for_each;
using ranges::to_vector;
using ranges::views::join;
std::cout << "no copy, because you're happy with the range\n";
auto Ints1Dview = join(Ints2D);
std::cout << "copy, because you want a true vector\n";
auto Ints1D = join(Ints2D) | to_vector;
std::cout << "copy, despite the reference, because you need a true vector\n";
auto const& Ints1Dref = join(Ints2D) | to_vector;
std::cout << "no copy, because we moved\n";
auto const& Ints1Dref_ = join(std::move(Ints2D)) | to_vector;
std::cout << "no copy\n";
for_each(join(Ints2D), [](auto const&){ std::cout << "hello\n"; });
}
¹ To give a rough idea of what a range is: you can imagine it as a thing wrapping two iterators, one pointing to the beginning of the range and the other to its end, the former being incrementable via operator++; this operator takes care of the jumps in the correct way. For instance, after viewing the element 3 in Ints2D (which is in Ints2D[0][2]), operator++ makes the iterator jump to the element Ints2D[1][0].
I am coming from a C#/Java background into C++, using Visual Studio Community 2017 & plenty of tutorials. I came to the point where I am unsure of the correct way to write a function that processes a vector of data. Should I force a function to use a pointer / reference? Should I let the compiler sort it out? What is best practice?
This is my main, I ask for an input on vector size, then pass a pointer to the integer value to function that creates and populates vector with values through a simple for loop.
I then pass the array to another function that performs a shuffle.
vector<int> intVector(int* count)
{
vector<int> vi;
for (int i = 1; i <= *count; i++)
vi.push_back(i);
return vi;
}
vector<int> &randVector(vector<int> *v)
{
shuffle(v->begin(), v->end(), default_random_engine());
return *v;
}
int _tmain(int argc, _TCHAR* argv[])
{
int count;
cout << "Enter vector array size: ";
cin >> count; cout << endl;
cout << "Vector of integers: " << endl;
vector<int> vi = intVector(&count);
for_each(vi.begin(), vi.end(), [](int i) {cout << i << " ";});
cout << endl;
vi = randVector(&vi);
cout << "Randomized vector of integers: " << endl;
for_each(vi.begin(), vi.end(), [](int i) {cout << i << " ";});
cout << endl;
return 0;
}
So my question is, what is the best practice in my case to avoid unnecessary copying? Should I even care about it? Should I rely on the compiler to solve it for me?
I am planning to use C++ for game development on desktop and consoles. Understanding memory and performance management is important for me.
In C++ you are the one who decides whether an object gets copied or not.
Regarding your example:
You can avoid using pointers and use a reference instead.
Like in the following:
vector<int>& randVector(vector<int>& v)
{
shuffle(v.begin(), v.end(), default_random_engine());
return v;
}
Note that since you are using a reference, the shuffle operation is already modifying the parameter of randVector so there is no real need to return a reference to it.
As a rule of thumb when you need to pass an object around and you want to avoid a potentially expensive copy you can use references:
void function(<const> Object& v)
{
// do_something_with_v
}
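To make the rule of thumb concrete, here is a small sketch; the function names averageOf and doubleAll are made up for the illustration. One function only reads its argument and takes a const reference, the other modifies the caller's vector in place through a non-const reference. Neither copies the vector.

```cpp
#include <cassert>
#include <vector>

// Read-only access: const reference, no copy, the caller's vector is untouched.
double averageOf(const std::vector<int>& values) {
    assert(!values.empty());  // precondition: at least one element
    long long total = 0;
    for (int x : values) total += x;
    return static_cast<double>(total) / values.size();
}

// In-place modification: non-const reference, no copy, the caller sees the change.
void doubleAll(std::vector<int>& values) {
    for (int& x : values) x *= 2;
}
```

Calling doubleAll(v) changes v itself, while averageOf(v) cannot change it; the compiler enforces the latter.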
The rules on passing in C++ for typical code are pretty straightforward (though obviously still more complex than languages without references/pointers).
In general, prefer references to pointers, unless passing in null is actually something you might do
Prefer to write functions that don't mutate their inputs, and return an output by value
Inputs should be passed by const reference, unless it is a primitive type like an integer, which should be passed by value
If you need to mutate data in place, pass it by non-const reference
See https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#Rf-conventional for more details.
The upshot of this is that here are the "correct" signatures for your two functions:
vector<int> intVector(int count);
void randVector(vector<int> &v);
This doesn't take into account iterators which is probably really the correct "generic" way to write the second function but that is a bit more advanced. But, see std::shuffle which lets you randomize any arbitrary container by leveraging iterators: http://en.cppreference.com/w/cpp/algorithm/random_shuffle.
Since you mentioned unnecessary copying, I will mention that when you return things like vector by value, they should never be copied (I'm assuming you're using C++11 or newer). They will instead be "moved", which doesn't have significant overhead. Thus, in newer C++ code, "out parameters" (passing in arguments by reference to mutate them) is significantly discouraged compared to older versions. Good to know in case you encounter dated advice. However, passing in by reference for something like shuffling or sorting is considered an "in/out" parameter: you want to mutate it in place and the existing data is important, not simply being overwritten.
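Putting the two signatures together, one possible implementation might look like the sketch below. It assumes C++11. The use of std::iota and of a std::mt19937 seeded once from std::random_device is my own choice for the illustration, not something taken from the question, which used a default-constructed default_random_engine.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Creates {1, 2, ..., count} and returns it by value (moved or elided).
std::vector<int> intVector(int count) {
    assert(count >= 0);
    std::vector<int> values(static_cast<std::size_t>(count));
    std::iota(values.begin(), values.end(), 1);
    return values;
}

// Shuffles in place: v is an in/out parameter, so nothing is returned.
void randVector(std::vector<int>& v) {
    static std::mt19937 engine{std::random_device{}()};  // seeded once
    std::shuffle(v.begin(), v.end(), engine);
}
```

In main this becomes vector<int> vi = intVector(count); followed by randVector(vi);, with no pointers and no assignment of the vector back to itself.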
If I define a pointer to an object that defines the [] operator, is there a direct way to access this operator from a pointer?
For example, in the following code I can directly access Vec's member functions (such as empty()) by using the pointer's -> operator, but if I want to access the [] operator I need to first get a reference to the object and then call the operator.
#include <vector>
int main(int argc, char *argv[])
{
std::vector<int> Vec(1,1);
std::vector<int>* VecPtr = &Vec;
if(!VecPtr->empty()) // this is fine
return (*VecPtr)[0]; // is there some sort of ->[] operator I could use?
return 0;
}
I might very well be wrong, but it looks like doing (*VecPtr).empty() is less efficient than doing VecPtr->empty(). Which is why I was looking for an alternative to (*VecPtr)[].
You could do any of the following:
#include <vector>
int main () {
std::vector<int> v(1,1);
std::vector<int>* p = &v;
p->operator[](0);
(*p)[0];
p[0][0];
}
By the way, in the particular case of std::vector, you might also choose: p->at(0), even though it has a slightly different meaning.
return VecPtr->operator[](0);
...will do the trick. But really, the (*VecPtr)[0] form looks nicer, doesn't it?
(*VecPtr)[0] is perfectly OK, but you can use the at function if you want:
VecPtr->at(0);
Keep in mind that this (unlike operator[]) will throw an std::out_of_range exception if the index is not in range.
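A short sketch of the difference, with helper names invented for the illustration: the first two functions are the unchecked spellings and do exactly the same thing, while the third shows at() reporting a bad index through an exception.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Two spellings of unchecked element access through a pointer.
int viaDereference(const std::vector<int>* p, std::size_t i) {
    return (*p)[i];
}
int viaOperatorCall(const std::vector<int>* p, std::size_t i) {
    return p->operator[](i);
}

// Checked access: returns true if at() throws std::out_of_range for index i.
bool atThrows(const std::vector<int>* p, std::size_t i) {
    assert(p != nullptr);
    try {
        (void)p->at(i);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```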
There's another way, you can use a reference to the object:
#include <iostream>
#include <vector>
using namespace std;
int main()
{
vector<int> v = {7};
vector<int> *p = &v;
// Reference to the vector
vector<int> &r = *p;
cout << (*p)[0] << '\n'; // Prints 7
cout << r[0] << '\n'; // Prints 7
return 0;
}
This way, r is the same as v and you can substitute all occurrences of (*p) by r.
Caveat: This will only work if you won't modify the pointer (i.e. change which object it points to).
Consider the following:
#include <iostream>
#include <vector>
using namespace std;
int main()
{
vector<int> v = {7};
vector<int> *p = &v;
// Reference to the vector
vector<int> &r = *p;
cout << (*p)[0] << '\n'; // Prints 7
cout << r[0] << '\n'; // Prints 7
// Caveat: When you change p, r is still the old *p (i.e. v)
vector<int> u = {3};
p = &u; // Doesn't change who r references
//r = u; // Wrong, see below why
cout << (*p)[0] << '\n'; // Prints 3
cout << r[0] << '\n'; // Prints 7
return 0;
}
r = u; is wrong because you can't change references:
This will modify the vector referenced by r (v)
instead of referencing another vector (u).
So, again, this only works if the pointer won't change while still using the reference.
The examples need C++11 only because of vector<int> ... = {...};
You can use it as VecPtr->operator [] ( 0 ), but I'm not sure you'll find it less obscure.
It is worth noting that in C++11 std::vector has a member function 'data' that returns a pointer to the underlying array (both const and non-const versions), allowing you to write the following:
VecPtr->data()[0];
This might be an alternative to
VecPtr->at(0);
which incurs a small runtime overhead, but more importantly its use implies you aren't checking the index for validity before calling it, which is not true in your particular example.
See std::vector::data for more details.
People are advising you to use ->at(0) because of range checking. But here is my advice (from another point of view):
NEVER use ->at(0)! It is really slower. Would you sacrifice performance just because you are lazy enough to not check range by yourself? If so, you should not be programming in C++.
I think (*VecPtr)[0] is ok.
I have a class called test with which I want to associate a large vector with in the order of million elements. I have tried doing this by passing a pointer to the constructor:
#include <iostream>
#include <vector>
using namespace std;
class test{
public:
vector<double>* oneVector;
test(vector<double>* v){
oneVector = v;
}
int nElem(){return oneVector->size();}
};
int main(){
vector<double> v(1000000);
cout << v.size() << endl;
vector<double>* ptr;
test t(ptr);
cout << t.nElem()<< endl;
return 0;
}
However, this results in a Segmentation Fault:11, precisely when I try to do t.nElem(). What could be the problem?
This is C++, don't work with raw pointers if you don't absolutely need to. If the goal is to take ownership of a std::vector without copying, and you can use C++11, make your constructor accept an r-value reference, and give it sole ownership of the std::vector that you're done populating with std::move, which means only vector's internal pointers get copied around, not the data, avoiding the copy (and leaving the original vector an empty shell):
class test{
public:
vector<double> oneVector;
test(vector<double>&& v):oneVector(std::move(v)){
}
int nElem(){return oneVector.size();}
};
int main(){
vector<double> v(1000000);
cout << v.size() << endl;
test t(std::move(v));
cout << t.nElem()<< endl;
return 0;
}
If you really want a pointer to a vector "somewhere else", make sure to actually assign ptr = &v; in your original code. Or new the vector and manage the lifetime across test and main with std::shared_ptr. Take your pick.
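For the std::shared_ptr route, a minimal sketch could look like this. SharedTest is a hypothetical name used to avoid clashing with the test class above, and the null check is an extra precaution of mine rather than part of the original suggestion.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Shares ownership of a vector that lives on the heap.
class SharedTest {
public:
    explicit SharedTest(std::shared_ptr<std::vector<double>> v)
        : oneVector(std::move(v)) {
        assert(oneVector != nullptr);  // refuse to hold a null pointer
    }
    std::size_t nElem() const { return oneVector->size(); }

private:
    std::shared_ptr<std::vector<double>> oneVector;
};
```

In main you would create the data with std::make_shared<std::vector<double>>(1000000) and pass the resulting pointer to the constructor; the vector is destroyed when the last owner goes away.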
ptr is not initialized. What you "want" to do is:
test t(&v);
However, I think you'd be better suited with references here (it's in the title of your question after all!). Using references avoids unnecessary syntax (like -> over .) which just unnecessarily hinder the reading of the code as written.
class test
{
std::vector<double>& oneVector;
public:
test(std::vector<double>& v) : oneVector(v) {}
std::size_t nElem() const { return oneVector.size(); }
};
ptr is an uninitialized pointer. This unpredictable value gets copied to t.oneVector. Dereferencing it is undefined behavior.
You need your pointer to actually point at a valid vector.
You forgot to give your pointer the desired value, namely the address of the vector:
vector<double>* ptr = &v;
// ^^^^^^
In your code, ptr remains uninitialized, and your program has undefined behaviour.
I am wondering whether there is a way to pass a vector directly as an argument, like this:
int xPOS = 5, yPOS = 6, zPOS = 2;
//^this is actually a struct but
//I simplified the code to this
std::vector <std::vector<int>> NodePoints;
NodePoints.push_back(
std::vector<int> {xPOS,yPOS,zPOS}
);
This code of course gives an error: typename not allowed, and expected a ')'
I would have used a struct, but I have to pass the data to an Abstract Virtual Machine where I need to access the node positions as Array[index][index] like:
public GPS_WhenRouteIsCalculated(...)
{
for(new i = 0; i < amount_of_nodes; ++i)
{
printf("Point(%d)=NodeID(%d), Position(X;Y;Z):{%f;%f;%f}",i,node_id_array[i],NodePosition[i][0],NodePosition[i][1],NodePosition[i][2]);
}
return 1;
}
Of course I could do it like this:
std::vector <std::vector<int>> NodePoints;//global
std::vector<int> x;//local
x.push_back(xPOS);
x.push_back(yPOS);
x.push_back(zPOS);
NodePoints.push_back(x);
or this:
std::vector <std::vector<int>> NodePoints;//global
std::vector<int> x;//global
x.push_back(xPOS);
x.push_back(yPOS);
x.push_back(zPOS);
NodePoints.push_back(x);
x.clear();
but then I'm wondering which of the two would be faster/more efficient/better?
Or is there a way to get my initial code working (first snippet)?
Use C++11, or something from boost for this (also you can use simple v.push_back({1,2,3}), vector will be constructed from initializer_list).
http://liveworkspace.org/code/m4kRJ$0
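In case the link goes stale, the C++11 version amounts to something like the following sketch. addNode is a hypothetical helper introduced only to keep the example self-contained.

```cpp
#include <cassert>
#include <vector>

// Appends one {x, y, z} position; the inner vector is built from the braces.
void addNode(std::vector<std::vector<int>>& nodePoints, int x, int y, int z) {
    nodePoints.push_back({x, y, z});
    assert(nodePoints.back().size() == 3);
}
```

After addNode(NodePoints, xPOS, yPOS, zPOS); the values are reachable as NodePoints[i][0], NodePoints[i][1] and NodePoints[i][2].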
You can use boost::assign as well, if you have no C++11.
#include <vector>
#include <boost/assign/list_of.hpp>
using namespace boost::assign;
int main()
{
std::vector<std::vector<int>> v;
v.push_back(list_of(1)(2)(3));
}
http://liveworkspace.org/code/m4kRJ$5
and of course you can use old variant
int ptr[] = {1, 2, 3};
v.push_back(std::vector<int>(ptr, ptr + sizeof(ptr) / sizeof(*ptr)));
If you don't have access to either Boost or C++11 then you could consider quite a simple solution based around a class. By wrapping a vector to store your three points within a class with some simple access controls, you can create the flexibility you need. First create the class:
class NodePoint
{
public:
NodePoint( int a, int b, int c )
{
dim_.push_back( a );
dim_.push_back( b );
dim_.push_back( c );
}
int& operator[]( size_t i ){ return dim_[i]; }
private:
vector<int> dim_;
};
The important thing here is to encapsulate the vector as an aggregate of the object. The NodePoint can only be initialised by providing the three points. I've also provided operator[] to allow indexed access to the object. It can be used as follows:
NodePoint a(5, 6, 2);
cout << a[0] << " " << a[1] << " " << a[2] << endl;
Which prints:
5 6 2
Note that std::vector's operator[] does not check bounds, so an out-of-range index here is undefined behaviour rather than a thrown exception; return dim_.at(i) instead if you want std::out_of_range to be thrown. I don't see this as a perfect solution but it should get you reasonably safely to where you want to be.
If your main goal is to avoid unnecessary copies of vector<>, then here is how you should deal with it.
C++03
Insert an empty vector into the nested vector (e.g. NodePoints) and then use std::swap() or std::vector::swap() upon it.
NodePoints.push_back(std::vector<int>()); // add an empty vector
std::swap(x, NodePoints.back()); // swaps contents of `x` and last element of `NodePoints`
So after the swap(), the contents of x will be transferred to NodePoints.back() without any copying.
C++11
Use std::move() to avoid extra copies
NodePoints.push_back(std::move(x)); // #include<utility>
Here is the explanation of std::move and here is an example.
Both of the above solutions have somewhat similar effect.
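Side by side, the two variants can be sketched as follows. appendBySwap and appendByMove are names made up for the illustration. The difference that matters in practice is the state of x afterwards: after the swap it is guaranteed to be empty, while after the move it is only guaranteed to be valid but unspecified.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// C++03 style: append an empty vector, then swap the contents into it.
void appendBySwap(std::vector<std::vector<int>>& nodePoints, std::vector<int>& x) {
    nodePoints.push_back(std::vector<int>());
    nodePoints.back().swap(x);  // x receives the empty placeholder's contents
    assert(x.empty());
}

// C++11 style: move the vector in; x is left valid but unspecified.
void appendByMove(std::vector<std::vector<int>>& nodePoints, std::vector<int>& x) {
    nodePoints.push_back(std::move(x));
}
```

Either way only the vector's internal pointers change hands, so the elements themselves are never copied.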