Replace vector of vector with flat memory structure - c++

I have the following type:
std::vector<std::vector<int>> indicies
where the size of the inner vector is always 2. The problem is, that vectors are non-contiguous in memory. I would like to replace the inner vector with something contiguous so that I can cast the flattened array:
int *array_a = (int *) &(a[0][0])
It would be nice if the new type has the [] operator, so that I don't have to change the whole code. (I could also implement it myself if necessary). My ideas are either:
std::vector<std::array<int, 2>>
or
std::vector<std::pair<int, int>>
How do these look in memory? I wrote a small test:
#include <iostream>
#include <array>
#include <vector>
int main(int argc, char *argv[])
{
using namespace std;
vector<array<int, 2>> a(100);
cout << sizeof(array<int, 2>) << endl;
for(auto i = 0; i < 10; i++){
for(auto j = 0; j < 2; j++){
cout << "a[" << i << "][" << j << "] "
<<&(a[i][j]) << endl;
}
}
return 0;
}
which results in:
8
a[0][0] 0x1b72c20
a[0][1] 0x1b72c24
a[1][0] 0x1b72c28
a[1][1] 0x1b72c2c
a[2][0] 0x1b72c30
a[2][1] 0x1b72c34
a[3][0] 0x1b72c38
a[3][1] 0x1b72c3c
a[4][0] 0x1b72c40
a[4][1] 0x1b72c44
a[5][0] 0x1b72c48
a[5][1] 0x1b72c4c
a[6][0] 0x1b72c50
a[6][1] 0x1b72c54
a[7][0] 0x1b72c58
a[7][1] 0x1b72c5c
a[8][0] 0x1b72c60
a[8][1] 0x1b72c64
a[9][0] 0x1b72c68
a[9][1] 0x1b72c6c
It seems to work in this case. Is this behavior in the standard or just a lucky coincidence? Is there a better way to do this?

An array<int,2> is going to be a struct containing an array int[2]; the standard does not directly mandate it, but there really is no other sane and practical way to do it.
See 23.3.7 [array] within the standard. There is nothing in the standard I can find that requires sizeof(std::array<char, 10>)==1024 to be false. It would be a ridiculous QOI (quality of implementation); every implementation I have seen has sizeof(std::array<T,N>) == N*sizeof(T), and anything else I would consider hostile.
Arrays must be contiguous containers which are aggregates that can be initialized by up to N arguments of types convertible to T.
The standard permits padding after such an array. I am aware of no compilers that insert such padding.
A buffer of contiguous std::array<int,2> is not guaranteed to be safely accessed as a flat buffer of int. In fact, aliasing rules almost certainly ban such access as undefined behaviour. You cannot even do this with an int[3][7]! See this SO question and answer, and here, and here.
Most compilers will make what you describe work, but the optimizer might decide that access through an int* and through the array<int,2>* cannot access the same memory, and generate insane results. It does not seem worth it.
A standards compliant approach would be to write an array view type (that takes two pointers and forms an iterable range with [] overloaded). Then write a 2d view of a flat buffer, with the lower dimension either a runtime or compile time value. Its [] would then return an array view.
There is going to be code in boost and other "standard extension" libraries to do this for you.
Merge the 2d view with a type owning a vector, and you get your 2d vector.
The only behaviour difference is that where the old vector-of-vectors code copies the lower dimension (like auto inner=outer[i]), it copies data; afterwards the same expression will instead create a view.

Is there a better way to do this?
I recently finished yet-another-version of Game-of-Life.
The game board is 2d, and yes, the vector of vectors has wasted space in it.
In my recent effort I chose to try a 1d vector for the 2d game board.
typedef std::vector<Cell_t*> GameBoard_t;
Then I created a simple indexing function, for when use of row/col added to the code's readability:
inline size_t gbIndx(int row, int col)
{ return ((row * MAXCOL) + col); }
Example: accessing row 27, col 33:
Cell_t* cell = gameBoard[ gbIndx(27, 33) ];
All the Cell_t* in gameBoard are now packed back to back (definition of vector) and trivial to access (initialize, display, etc) in row/col order using gbIndx().
In addition, I could use the simple index for various efforts:
void setAliveRandom(const GameBoard_t& gameBoard)
{
GameBoard_t myVec(gameBoard); // copy cell vector
time_t seed = std::chrono::system_clock::
now().time_since_epoch().count();
// randomize copy's element order
std::shuffle (myVec.begin(), myVec.end(), std::default_random_engine(seed));
int count = 0;
for ( auto it : myVec )
{
if (count & 1) it->setAlive(); // touch odd elements
count += 1;
}
}
I was surprised by how often I did not need row/col indexing.

As far as I know, the elements of a std::vector are contiguous in memory. Take a look at these questions:
Why is std::vector contiguous?,
Are std::vector elements guaranteed to be contiguous?
With a vector of vectors, the structure as a whole is not contiguous, but each inner vector still is, even after you resize it. The outer vector is contiguous too (I edit here, sorry, I misunderstood your question), but only in the sense that the inner vector objects, which hold the pointers to their data, are stored back to back; the data they point to is not.
If you want to implement a structure that is always contiguous, from the first element of the first vector to the last element of the last vector, you can implement it as a custom class that has a vector<int> and elems_per_vector that indicates the number of elements in each inner vector.
Then you can overload operator(), so that accessing a(i,j) actually accesses a.vector[a.elems_per_vector*i+j]. To insert new elements while keeping all inner vectors the same size, though, you'll have to make as many inserts as you have inner vectors.

Related

Pointers vs vectors for arrays c++

In the case I am creating an 'array' on stack in c++, is it better to initialise an empty vector with a reserved number of elements and then pass this to a function like foo() as a reference as below. Or is it better to set an array arrb of size nelems, then using a pointer p_arrb to the address of the first element increment the pointer and assign some value?
#include <iostream>
#include <vector>
void foo(std::vector<int>& arr){
int nelems = arr.capacity();
for (int i = 0; i < nelems; i++){
arr[i] = i;
}
}
int main()
{
int nelems;
std::cout << "Type a number: "; // Type a number and press enter
std::cin >> nelems;
std::vector<int> arr;
arr.reserve(nelems); // Init std lib vector
foo(arr);
int arrb[nelems];
int* p_arrb = &(arrb[0]); // pointer to arrb
for (int i = 0; i < nelems; i ++){
*(p_arrb++) = i; // populate using pointer
}
p_arrb -= nelems; // decrement pointer
return 0;
}
It seems people prefer the use of vector as it is standardised and easier to read? Apart from that, is there any performance benefit to using vector instead of a basic pointer in this case where I do not need to change the size of my vector/array at any point in the code?
What you should use depends on the exact goal you have. In general the best approach is to avoid using "raw arrays" (both dynamic and static) wherever possible.
If you need dynamic array, use std::vector. If you need static array, use std::array.
You can't use the arrb variant because the size of an array must be a compile-time constant in C++, but you are trying to use a runtime size here.
If your compiler is compiling this, then it is doing so only because it supports these so-called variable-length arrays as a non-standard extension. Other compilers will not support them or have differing degree of support or behavior. These arrays are optionally-supported in C, but even there they are probably not worth the trouble they cause.
There is no way to allocate a runtime-dependent amount of memory on the stack in C++ (except if you misuse recursive function calls to simulate it).
So yes, you should use the vector approach. But as discussed in the comments under the question, what you are doing is wrong and causes undefined behavior. You need to either reserve memory and then emplace_back/push_back elements into the vector, or resize the vector to the expected size, after which you may index it directly. Indexing a vector outside the range of elements already created in it causes undefined behavior.

Is it possible to initialize a vector of strings from an array? If so, how?

So for example, on GeeksForGeeks.org, contributing user "Kartik" offers the following example for initializing a vector of integers:
// CPP program to initialize a vector from
// an array.
#include <bits/stdc++.h>
using namespace std;
int main()
{
int arr[] = { 10, 20, 30 };
int n = sizeof(arr) / sizeof(arr[0]);
vector<int> vect(arr, arr + n);
for (int x : vect)
cout << x << " ";
return 0;
}
If I understand what I'm reading correctly, sizeof(arr) is some number (which I assume is the length of the array arr; i.e. 3, please correct me if I'm wrong) divided by sizeof(arr[0]) (which I assume to be 1) -- basically just being a roundabout way of saying 3/1 = 3.
At this point, vector<int> vect(arr, arr + n) appears to be a vector of size 3, with all values initialized to arr + n (which I'm assuming is a way of saying "use the 3 items from arr to instantiate"; again, please correct me if I'm wrong).
Through whatever sorcery, the output is 10 20 30.
Now, regardless of whether or not any of my above rambling is coherent or even remotely correct, my main question is this: can the same technique be used to instantiate some example vector<string> stringVector such that it would iterate through strings designated by some example string stringArray[] = { "wordA", "wordB", "wordC" }? Because, as I understand it, strings have no numeric values, so I imagine it would be difficult to just say vector<string> stringVector(stringArray, stringArray + n) without encountering some funky junk. So if it is possible, how would one go about doing it?
As a rider, why, or in what type of instance, would anyone want to do this for a vector? Does instantiating it from an array (which as I understand it has constant size) defeat the purpose of the vector?
Just as a disclaimer, I'm new to C++ and a lot of the object-oriented syntax involving stuff like std::vector<_Ty, _Alloc>::vector...etc. makes absolutely no sense to me, so I may need that explained in an answer.
To whoever reads this, thank you for taking the time. I hope you're having a good day!
Clarifications:
sizeof(arr): returns the size in bytes of the array, which is 12 because it has 3 ints, and each int in most implementations has a size of 4 bytes, so 3 ints x 4 bytes = 12 bytes.
sizeof(arr[0]): returns the size in bytes of the first element of the array, which is 4 because it is an int array.
vector<int> vect(arr, arr + n): the vector class has multiple constructors. Here we are not using the constructor you are thinking of. We are using a constructor that takes begin and end iterators for a range of elements, making a copy of those elements. Pointers can be used as iterators, where in this case arr is the begin iterator and arr + n is the end iterator.
Note: int* + int returns int*.
Note: We should also consider that the "end" of an array is a pointer to the next space after the last item in the array, and the constructor will copy all the items except the item past the end.
Answer:
Yes, remember that here, the constructor is taking iterators, not any item of the array, so we can do it easily like this with little changes:
#include <bits/stdc++.h>
using namespace std;
int main()
{
// changed int to string and the array values
string arr[] = { "one", "two", "three" };
int n = sizeof(arr) / sizeof(arr[0]);
// changed int to string
vector<string> vect(arr, arr + n);
// changed int to string
for (string x : vect)
cout << x << " ";
return 0;
}
sizeof(arr)
sizeof gets the size of an object in bytes. The size of an object is the total number of bytes required by the object. Note that I'm using "object" in the C++ context, not the OOP context (an instance of a class).
The size of an object of a given type is always the same. A std::string containing "a" is the same size as a string containing the unabridged text of War and Peace. Any object that appears to have a variable size really contains a reference to variable length data stored elsewhere. In the case of std::string at its most basic, it is a pointer to a dynamically allocated array and an integer keeping track of how much of the dynamically allocated array is actually in use by the string. std::vector is similar, typically it's a pointer to the start of its data, a pointer to the end of its data, and a pointer to the first empty position in the data. No matter how big the vector is, sizeof(vector) will return the size of the pointers, any other book-keeping variables in the vector implementation, and any padding needed to guarantee correct memory alignment.
This means every item in an array is always the same size and thus the same distance from one another.
Through whatever sorcery...
The above means that the total size of the array divided by the size of one element in the array, sizeof(arr) / sizeof(arr[0]), will always provide the number of elements in the array. It doesn't matter what the array contains, numerical or otherwise. There are of course prettier ways like
template <class T, size_t N>
size_t getsize (const T (&array)[N])
{
return N;
}
and later
size_t n = getsize(arr);
As a rider, why, or in what type of instance, would anyone want to do this for a vector?
In the old days one could not directly construct a vector pre-loaded with data. No one wants to write some arbitrary number of lines of push_back to pound all the values in manually. It's boring as hell, a programmer almost always has better things to do, and the odds of injecting an error are too high. But you could nicely and easily format an array and feed the array into the vector, if you needed a vector at all. A lot of the time you could live off the array by itself because the contents were unchanging or at worst would only be shuffled around.
But if the number of contents could change, it could be time for a vector. If you're going to add items and you don't know the upper limit, it's time for vector. If you're calling into an API that requires a vector, it's time for a vector.
I can't speak for everybody, but I'm going to assume that like me a lot of people would have loved to have that easy-peasy array-style initialization for vectors, lists, maps, and the rest of the usual gang.
We were forced to write programs that generated the appropriate code to fill up the vector or define an array and copy the array into the vector much like the above example.
In C++11 we got our wish with std::initializer_list and a variety of new initialization options [1] that allowed
vector<string> vect{"abc","def","ghi"};
eliminating most cases where you would find yourself copying an array into a library container. And the masses rejoiced.
This coincided with tools like std::begin and std::end (joined by std::size in C++17) that make converting an array into a vector a cakewalk. Assuming you don't pass the array into a function first.
[1] Unfortunately the list of initialization options can get a lil' bewildering
Yes, you can do so - you just need to define something that the constructor for std::string will take (which is a const char*)
const char * arr[] = { "abc","def","ghi" };
int n = sizeof(arr) / sizeof(arr[0]);
vector<string> vect(arr, arr + n);
for (string &x : vect)
cout << x << " ";
What this is effectively doing is creating the vector from two iterators (a pointer is, loosely, an iterator):
https://en.cppreference.com/w/cpp/container/vector/vector
Constructs the container with the contents of the range [first, last).
This constructor has the same effect as vector(static_cast<size_type>(first), static_cast<value_type>(last), a) if InputIt is an integral type.
And as @MartinYork pointed out, it's much more readable to use the C++ syntax:
const char * arr[] = { "abc","def","ghi" };
vector<string> vect(std::begin(arr), std::end(arr));
So if it is possible, how would one go about doing it?
Simply use vector constructor number 5, which accepts iterators to start and end of range
Constructs the container with the contents of the range [first,
last).
#include <iostream>
#include <vector>
#include <string>
int main()
{
std::string arr[] = { "wordA", "wordB", "wordC" };
std::vector<std::string> v {std::begin(arr), std::end(arr)};
for (auto& str : v)
std::cout << str << "\n";
return 0;
}
Here's how you'd do it. Note that it's a tad awkward to get the length of the array, but that's just because arrays don't carry that information around with them (use a vector!).
#include<string>
#include<vector>
#include<iterator>
#include<iostream>
int main()
{
std::string arr[] = {"abc", "def", "ghi"};
std::vector<std::string> tmp;
std::copy(arr, arr + sizeof(arr)/sizeof(arr[0]), std::back_inserter(tmp));
for(auto str : tmp) {
std::cout<<str<<"\n";
}
}
Update: Yes good point about using std::begin and std::end for the array.

Swap rows in a 2D array with std::swap. How does it work?

I'm mostly just documenting this question as someone may stumble upon it, and may find it useful. And also, I'm very curious how std::swap works on a 2D array like: Arr[10][10].
My question arose because, to my understanding, an array like this is just a 1D array with some reindexing.
For reference:
How are 2-Dimensional Arrays stored in memory?
int main()
{
const int x = 10;
const int y = 10;
int Arr[y][x];
// fill the array with some elements...
for (int i = 0; i < x*y; i++)
{
Arr[i / x][i % x] = i; // row = i / x, column = i % x
}
// swap 'row 5 & 2'
// ??? how does swap know how many elements to swap?
// if it is in fact stored in a 1D array, just the
// compiler will reindex it for us
std::swap(Arr[5], Arr[2]);
return 0;
}
I could understand swapping two 'rows' if our data type is, say a pointer to a pointer like int** Arr2D then swap with std::swap(Arr2D[2], Arr2D[5]) as we do not need to know the length here, we just need to swap the two pointers, pointing to '1D arrays'.
But how does std::swap work with Arr[y][x]?
Is it using a loop maybe, to swap all elements within x length?
std::swap has an overload for arrays that effectively swaps each two elements, again, using std::swap.
As for the size information, it is embedded within the array type (Arr[i] is int[x]), so the compiler knows to deduce T2 as int and N as 10.
OT: Why aren't variable-length arrays part of the C++ standard? (but this particular case is OK)

Resizing std::vector without destroying elements

I keep reusing the same std::vector<int> in order to avoid allocating and deallocating all the time. In a few lines, my code is as follows:
std::vector<int> myVector;
myVector.reserve(4);
for (int i = 0; i < 100; ++i) {
fillVector(myVector);
//use of myVector
//....
myVector.resize(0);
}
In each for iteration, myVector will be filled with up to 4 elements. To keep the code efficient, I want to always reuse myVector. However, in myVector.resize() the elements in myVector are being destroyed. I understand that myVector.clear() will have the same effect.
I think if I could just overwrite the existing elements in myVector I could save some time. However I think the std::vector is not capable of doing this.
Is there any way of doing this? Does it make sense to create a home-grown implementation which overwrites elements ?
Your code is already valid (myVector.clear() has better style than myVector.resize(0) though).
int has no destructor, so "destroying" the elements does nothing.
So resize(0) just sets the size to 0, capacity is untouched.
Simply don't keep resizing myVector. Instead, initialise it with 4 elements (with std::vector<int> myVector(4)) and just assign to the elements instead (e.g. myVector[0] = 5).
However, if it's always going to be fixed size, then you might prefer to use a std::array<int, 4>.
Resizing a vector to 0 will not reduce its capacity and, since your element type is int, there are no destructors to run:
#include <iostream>
#include <vector>
int main() {
std::vector<int> v{1,2,3};
std::cout << v.capacity() << ' ';
v.resize(0);
std::cout << v.capacity() << '\n';
}
// Output: 3 3
Therefore, your code already performs mostly optimally; the only further optimisation you could make would be to avoid the resize entirely, thereby losing the internal "set size to 0" inside std::vector that likely comes down to an if statement and a data member value change.
std::vector is not a solution in this case. You don't want to resize/clear/(de)allocate all over again? Don't.
fillVector() fills 'vector' with number of elements known in each iteration.
A vector is internally represented as a contiguous block of memory, accessed through a T*.
You don't want to (de)allocate memory each time.
Ok. Use simple struct:
struct upTo4ElemVectorOfInts
{
int data[4];
size_t elems_num;
};
And modify fillVector() to save additional info:
void fillVector(upTo4ElemVectorOfInts& vec)
{
//fill vec.data with values
vec.elems_num = filled_num; //save how many values were filled in this iteration
}
Use it in the very same way:
upTo4ElemVectorOfInts myVector;
for (int i = 0; i < 100; ++i)
{
fillVector(myVector);
//use of myVector:
//- myVector.data contains data (it's equivalent of std::vector<>::data())
//- myVector.elems_num will tell you how many numbers you should care about
//nothing needs to be resized/cleared
}
Additional Note:
If you want more general solution (to operate on any type or size), you can, of course, use templates:
template <class T, size_t Size>
struct upToSizeElemVectorOfTs
{
T data[Size];
size_t elems_num;
};
and adjust fillVector() to accept template instead of known type.
This solution is probably the fastest one. You might think: "Hey, and what if I want to fill up to 100 elements? 1000? 10000? What then? A 10000-element array will consume a lot of storage!".
It would consume that storage anyway. A vector resizes itself automatically, and these reallocations are out of your control and thus can be very inefficient. If your array is reasonably small and you can predict the maximum required size, prefer fixed-size storage created on the local stack. It's faster, more efficient and simpler. Of course this won't work for arrays of 1,000,000 elements (you would overflow the stack in this case).
In fact what you have at present is
for (int i = 0; i < 100; ++i) {
myVector.reserve(4);
//use of myVector
//....
myVector.resize(0);
}
I do not see any sense in that code.
Of course it would be better to use myVector.clear() instead of myVector.resize(0);
If you always overwrite exactly 4 elements of the vector inside the loop then you could use
std::vector<int> myVector( 4 );
instead of
std::vector<int> myVector;
myVector.reserve(4);
provided that the function fillVector(myVector); uses the subscript operator to access these 4 elements of the vector instead of the member function push_back.
Otherwise use clear, as suggested earlier.

Making only the outer vector in vector<vector<int>> fixed

I want to create a vector<vector<int>> where the outer vector is fixed (always containing the same vectors), but the inner vectors can be changed. For example:
int n = 2; //decided at runtime
assert(n>0);
vector<vector<int>> outer(n); //outer vector contains n empty vectors
outer.push_back(vector<int>()); //modifying outer vector - this should be error
auto outer_it = outer.begin();
(*outer_it).push_back(3); //modifying inner vector. should work (which it does).
I tried doing simply const vector<vector<int>>, but that makes even the inner vectors const.
Is my only option to create my own custom FixedVectors class, or are there better ways out there to do this?
by definition,
Vectors are sequence containers representing arrays that can change in
size. Just like arrays, vectors use contiguous storage locations for
their elements, which means that their elements can also be accessed
using offsets on regular pointers to its elements, and just as
efficiently as in arrays. But unlike arrays, their size can change
dynamically, with their storage being handled automatically by the
container.
If you aren't looking for a data structure that changes in size, a vector probably isn't the best choice for the outer layer. How about using an array of vectors? This way the array has a fixed size and cannot be resized, while you still have the freedom of choosing its size at runtime.
vector<int> *outer;
int VectSize;
cout << "size of vector array? ";
cin >> VectSize;
outer = new vector<int>[VectSize]; //array created with fixed size
// outer.push_back(...) is not happening: outer is a plain pointer, so this does not compile
Wrap the outer vector in a class which provides only at, begin, end and operator []. Give the class a single constructor that takes its size.
This is most probably the best way:
const vector<unique_ptr<vector<int>>> outer = something(n);
For the something, you might write a function, like this:
vector<unique_ptr<vector<int>>> something(int n)
{
vector<unique_ptr<vector<int>>> v(n);
for (auto & p : v)
p.reset(new vector<int>);
return v;
}