Vectors sharing values - c++

I'm trying to find how to create a vector inside a bigger one.
It is not a vector of vectors but a vector whose data are shared inside a bigger vector:
vector<char> bigger_v(10);
bigger_v.push_back('h');
bigger_v.push_back('e');
bigger_v.push_back('l');
bigger_v.push_back('l');
bigger_v.push_back('o');
vector<char> smaller_v {'w','o','r','l','d'}; //sharing memory with bigger vector
//some magic..
cout<<bigger_v<<endl; // "helloworld"
I don't mind if I have to declare the bigger vector first and reserve enough space to contain the smaller vector, and then declare the smaller vector inside the reserved space of the bigger vector.
My goal is to create a frame (bigger vector) with some fixed-size values (header) and some variable-size values (smaller vector created from another variable-size incoming frame). The smaller vector lets me manipulate the incoming data more easily.

Vectors fundamentally own and manage their contents. This is what vectors do; they may not share memory in any reasonable way. (Hacks to do so result in undefined behavior).
std::span, however, is a view into memory owned by something else.
std::span smaller{ bigger_v.begin()+5, bigger_v.end() };
std::copy_n( "world", 5, smaller.begin() );
std::cout<<'"'<<bigger_v<<"\"\n"; // "helloworld"

I do not think there is any way to have two vectors share the same memory, but I can think of two similar ways to get close to it:
#include <cstddef>
#include <iostream>
#include <vector>
using namespace std;
int main() {
    vector<char> bigger_v;
    bigger_v.reserve(10); // room for header and payload, but no elements yet
    bigger_v.push_back('h');
    bigger_v.push_back('e');
    bigger_v.push_back('l');
    bigger_v.push_back('l');
    bigger_v.push_back('o');
    vector<char> smaller_v{ 'w','o','r','l','d' }; // separate storage, not shared
    for (size_t i = 0; i < smaller_v.size(); ++i)
        bigger_v.push_back(smaller_v[i]); // appends a copy of each element
    for (size_t x = 0; x < bigger_v.size(); ++x)
        cout << bigger_v[x]; // "helloworld"
    cout << endl;
}
The way above copies the elements of smaller_v onto the end of bigger_v. Nothing is shared: smaller_v keeps its own contents (applying std::move to a char would still just copy it), and later changes to one vector do not show up in the other.
The second way I can think of, the one closer to your idea, would involve having the vectors contain pointers. That is, both smaller_v and bigger_v would have pointers to the same memory. I won't explain it here, but if you want that, you could try.
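The pointer-based idea could look roughly like the sketch below. The names SharedChars, make_shared_chars, append_shared and contents are made up for this illustration, and allocating every character separately is rarely worth it for plain char data, so treat it as a demonstration of the idea rather than a recommendation.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical alias: each element is a shared pointer to one character.
using SharedChars = std::vector<std::shared_ptr<char>>;

// Allocates one shared character per character of text.
SharedChars make_shared_chars(const std::string& text) {
    SharedChars out;
    out.reserve(text.size());
    for (char c : text) out.push_back(std::make_shared<char>(c));
    return out;
}

// Appends the pointers of payload to frame; the characters themselves are not copied.
void append_shared(SharedChars& frame, const SharedChars& payload) {
    frame.insert(frame.end(), payload.begin(), payload.end());
}

// Reads the pointed-to characters back into a string.
std::string contents(const SharedChars& v) {
    std::string s;
    for (const auto& p : v) s.push_back(*p);
    return s;
}
```

After append_shared(bigger_v, smaller_v), writing through *smaller_v[0] is visible through bigger_v as well, because both vectors hold pointers to the same characters.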

Related

Replace vector of vector with flat memory structure

I have the following type:
std::vector<std::vector<int>> indicies
where the size of the inner vector is always 2. The problem is, that vectors are non-contiguous in memory. I would like to replace the inner vector with something contiguous so that I can cast the flattened array:
int *array_a = (int *) &(a[0][0])
It would be nice if the new type has the [] operator, so that I don't have to change the whole code. (I could also implement it myself if necessary). My ideas are either:
std::vector<std::array<int, 2>>
or
std::vector<std::pair<int, int>>
How do these look in memory? I wrote a small test:
#include <iostream>
#include <array>
#include <vector>
int main(int argc, char *argv[])
{
using namespace std;
vector<array<int, 2>> a(100);
cout << sizeof(array<int, 2>) << endl;
for(auto i = 0; i < 10; i++){
for(auto j = 0; j < 2; j++){
cout << "a[" << i << "][" << j << "] "
<<&(a[i][j]) << endl;
}
}
return 0;
}
which results in:
8
a[0][0] 0x1b72c20
a[0][1] 0x1b72c24
a[1][0] 0x1b72c28
a[1][1] 0x1b72c2c
a[2][0] 0x1b72c30
a[2][1] 0x1b72c34
a[3][0] 0x1b72c38
a[3][1] 0x1b72c3c
a[4][0] 0x1b72c40
a[4][1] 0x1b72c44
a[5][0] 0x1b72c48
a[5][1] 0x1b72c4c
a[6][0] 0x1b72c50
a[6][1] 0x1b72c54
a[7][0] 0x1b72c58
a[7][1] 0x1b72c5c
a[8][0] 0x1b72c60
a[8][1] 0x1b72c64
a[9][0] 0x1b72c68
a[9][1] 0x1b72c6c
It seems to work in this case. Is this behavior in the standard or just a lucky coincidence? Is there a better way to do this?
An array<int,2> is going to be a struct containing an array int[2]; the standard does not directly mandate it, but there really is no other sane and practical way to do it.
See 23.3.7 [array] within the standard. There is nothing in the standard I can find that requires sizeof(std::array<char, 10>)==1024 to be false. It would be a ridiculous QOI (quality of implementation); every implementation I have seen has sizeof(std::array<T,N>) == N*sizeof(T), and anything else I would consider hostile.
Arrays must be contiguous containers which are aggregates that can be initialized by up to N arguments of types convertible to T.
The standard permits padding after such an array. I am aware of no compilers that insert such padding.
A buffer of contiguous std::array<int,2> is not guaranteed to be safely accessed as a flat buffer of int. In fact, aliasing rules almost certainly ban such access as undefined behaviour. You cannot even do this with an int[3][7]! See this SO question and answer, and here, and here.
Most compilers will make what you describe work, but the optimizer might decide that access through an int* and through the array<int,2>* cannot access the same memory, and generate insane results. It does not seem worth it.
A standards compliant approach would be to write an array view type (that takes two pointers and forms an iterable range with [] overloaded). Then write a 2d view of a flat buffer, with the lower dimension either a runtime or compile time value. Its [] would then return an array view.
There is going to be code in boost and other "standard extension" libraries to do this for you.
Merge the 2d view with a type owning a vector, and you get your 2d vector.
The only behaviour difference is that when the old vector of vector code copies the lower dimension (like auto inner=outer[i]), it copies the data; afterwards it will instead create a view.
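A minimal sketch of that approach, assuming a runtime row length. ArrayView and Flat2D are hypothetical names chosen for this example, not types from boost or the standard library, and bounds checking is left out.

```cpp
#include <cstddef>
#include <vector>

// Non-owning view over a contiguous run of T, built from two pointers.
template <class T>
class ArrayView {
public:
    ArrayView(T* first, T* last) : first_(first), last_(last) {}
    T* begin() const { return first_; }
    T* end() const { return last_; }
    std::size_t size() const { return static_cast<std::size_t>(last_ - first_); }
    T& operator[](std::size_t i) const { return first_[i]; }
private:
    T* first_;
    T* last_;
};

// Owns one flat vector and hands out a row view from operator[].
template <class T>
class Flat2D {
public:
    Flat2D(std::size_t rows, std::size_t cols) : cols_(cols), data_(rows * cols) {}
    ArrayView<T> operator[](std::size_t row) {
        T* first = data_.data() + row * cols_;
        return ArrayView<T>(first, first + cols_);
    }
    T* data() { return data_.data(); }  // the flat buffer, no cast needed
    std::size_t cols() const { return cols_; }
private:
    std::size_t cols_;
    std::vector<T> data_;
};
```

With Flat2D<int> indices(n, 2), existing indices[i][j] code keeps working, and indices.data() is a genuine int* to n * 2 contiguous values. The sketch does not work for T = bool, because std::vector<bool> has no data().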
Is there a better way to do this?
I recently finished yet-another-version of Game-of-Life.
The game board is 2d, and yes, the vector of vectors has wasted space in it.
In my recent effort I chose to try a 1d vector for the 2d game board.
typedef std::vector<Cell_t*> GameBoard_t;
Then I created a simple indexing function, for when use of row/col added to the code's readability:
inline size_t gbIndx(int row, int col)
{ return ((row * MAXCOL) + col); }
Example: accessing row 27, col 33:
Cell_t* cell = gameBoard[ gbIndx(27, 33) ];
All the Cell_t* in gameBoard are now packed back to back (definition of vector) and trivial to access (initialize, display, etc) in row/col order using gbIndx().
In addition, I could use the simple index for various efforts:
void setAliveRandom(const GameBoard_t& gameBoard)
{
GameBoard_t myVec(gameBoard); // copy cell vector
time_t seed = std::chrono::system_clock::
now().time_since_epoch().count();
// randomize copy's element order
std::shuffle (myVec.begin(), myVec.end(), std::default_random_engine(seed));
int count = 0;
for ( auto it : myVec )
{
if (count & 1) it->setAlive(); // touch odd elements
count += 1;
}
}
I was surprised by how often I did not need row/col indexing.
As far as I know, the elements of a std::vector are contiguous in memory. Take a look at these questions:
Why is std::vector contiguous?,
Are std::vector elements guaranteed to be contiguous?
If you have to resize an inner vector, the whole structure will not be contiguous, but each inner vector still is. With a vector of vectors, the outer vector's elements are contiguous too (I edited this, sorry, I misunderstood your question), meaning that the inner vector objects, which hold the pointers to the inner data, sit next to each other, even though the data they point to does not.
If you want to implement a structure that is always contiguous, from the first element of the first vector to the last element of the last vector, you can implement it as a custom class that has a vector<int> and elems_per_vector that indicates the number of elements in each inner vector.
Then you can overload operator(), so that accessing a(i,j) actually accesses a.vector[a.elems_per_vector*i+j]. To insert new elements while keeping all inner vectors the same size, though, you'll have to make as many inserts as you have inner vectors.
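The class described above might be sketched as follows. FlatRows, append_row and append_column are hypothetical names, and the element type is fixed to int to match the question. append_column shows the "one insert per inner vector" cost mentioned above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

class FlatRows {
public:
    explicit FlatRows(std::size_t elems_per_vector)
        : elems_per_vector_(elems_per_vector) {
        if (elems_per_vector == 0) throw std::invalid_argument("rows must not be empty");
    }

    // a(i, j) maps to the flat index elems_per_vector * i + j.
    int& operator()(std::size_t i, std::size_t j) {
        return values_[elems_per_vector_ * i + j];
    }

    std::size_t rows() const { return values_.size() / elems_per_vector_; }

    // Adds a whole inner vector at the end: a single insert.
    void append_row(const std::vector<int>& row) {
        if (row.size() != elems_per_vector_) throw std::invalid_argument("wrong row length");
        values_.insert(values_.end(), row.begin(), row.end());
    }

    // Adds one element to every inner vector: one insert per row,
    // done back to front so the offsets of earlier rows stay valid.
    void append_column(int value) {
        for (std::size_t r = rows(); r-- > 0;) {
            const auto offset = static_cast<std::ptrdiff_t>((r + 1) * elems_per_vector_);
            values_.insert(values_.begin() + offset, value);
        }
        ++elems_per_vector_;
    }

private:
    std::size_t elems_per_vector_;
    std::vector<int> values_;
};
```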

Safe to iterate over std::vector<some_container> while modifying <some_container>'s size?

Suppose I have a vector of some other container type. While iterating over the vector I change the size of the containers. Given that vectors try to remain contiguous in system memory, could the pointer arithmetic fail in loops like this? For example,
#include <stdlib.h>
#include <vector>
using namespace std;
int main(){
vector<vector<double> > vec_vec(4);
for (auto i=vec_vec.begin(); i!=vec_vec.end(); ++i){
for (double j=0; j<100; j+=1.0){
i->push_back(j);
}
}
return 0;
}
I've had no issues using code like this so far, but now I'm wondering if I just got lucky. Is this safe? Does it depend on the kind of container used inside the vector?
That's perfectly OK, you are not changing the outer vector. However there is no guarantee that all vectors will be contiguous in the memory. Each individual inner one will be, but don't expect that they are arranged one after the other in memory.
You are modifying the contents of the elements of the std::vector you are iterating over, not the vector you are iterating over. They are different things.
The first is safe. The second would not be safe, because it can eventually trigger a reallocation that invalidates the iterators.
A vector is a fixed-size management object (size, capacity, pointer) whose contiguous element memory lives elsewhere, pointed to by that pointer.
Thus growing an inner vector does not change the size of the object stored in the outer vector.
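To make the distinction concrete, here is a small sketch. The function names fill_inner and duplicate_rows are made up for this example. The first function only grows the inner vectors, so iterating the outer vector stays valid. The second grows the outer vector, so it walks by index instead of holding an iterator across push_back.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Safe: only the inner vectors grow; the outer vector is never resized.
void fill_inner(std::vector<std::vector<double>>& vec_vec, int count) {
    for (auto& inner : vec_vec) {
        for (int j = 0; j < count; ++j) inner.push_back(j);
    }
}

// Grows the outer vector while walking it. An iterator or range-for over
// vec_vec would be invalidated by a reallocation, so use indices and a
// size captured up front.
void duplicate_rows(std::vector<std::vector<double>>& vec_vec) {
    const std::size_t original = vec_vec.size();
    for (std::size_t i = 0; i < original; ++i) {
        // Copy first, so no reference into the outer vector is held
        // while push_back may reallocate it.
        std::vector<double> copy = vec_vec[i];
        vec_vec.push_back(std::move(copy));
    }
}
```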

Resizing std::vector without destroying elements

I am reusing the same std::vector<int> all the time in order to avoid allocating and deallocating repeatedly. In a few lines, my code is as follows:
std::vector<int> myVector;
myVector.reserve(4);
for (int i = 0; i < 100; ++i) {
fillVector(myVector);
//use of myVector
//....
myVector.resize(0);
}
In each for iteration, myVector will be filled with up to 4 elements. To keep the code efficient, I always want to reuse myVector. However, myVector.resize(0) destroys the elements in myVector. I understand that myVector.clear() will have the same effect.
I think if I could just overwrite the existing elements in myVector I could save some time. However I think the std::vector is not capable of doing this.
Is there any way of doing this? Does it make sense to create a home-grown implementation which overwrites elements ?
Your code is already valid (myVector.clear() has better style than myVector.resize(0) though).
'int destructor' does nothing.
So resize(0) just sets the size to 0, capacity is untouched.
Simply don't keep resizing myVector. Instead, initialise it with 4 elements (with std::vector<int> myVector(4)) and just assign to the elements instead (e.g. myVector[0] = 5).
However, if it's always going to be fixed size, then you might prefer to use a std::array<int, 4>.
Resizing a vector to 0 will not reduce its capacity and, since your element type is int, there are no destructors to run:
#include <iostream>
#include <vector>
int main() {
std::vector<int> v{1,2,3};
std::cout << v.capacity() << ' ';
v.resize(0);
std::cout << v.capacity() << '\n';
}
// Output: 3 3
Therefore, your code already performs mostly optimally; the only further optimisation you could make would be to avoid the resize entirely, thereby losing the internal "set size to 0" inside std::vector that likely comes down to an if statement and a data member value change.
std::vector is not a solution in this case. You don't want to resize/clear/(de)allocate all over again? Don't.
fillVector() fills 'vector' with number of elements known in each iteration.
A vector is internally represented as a contiguous block of memory of elements of type T, accessed through a T*.
You don't want to (de)allocate memory each time.
Ok. Use simple struct:
struct upTo4ElemVectorOfInts
{
int data[4];
size_t elems_num;
};
And modify fillVector() to save additional info:
void fillVector(upTo4ElemVectorOfInts& vec)
{
//fill vec.data with values
vec.elems_num = filled_num; //save how many values were filled in this iteration
}
Use it in the very same way:
upTo4ElemVectorOfInts myVector;
for (int i = 0; i < 100; ++i)
{
fillVector(myVector);
//use of myVector:
//- myVector.data contains data (it's equivalent of std::vector<>::data())
//- myVector.elems_num will tell you how many numbers you should care about
//nothing needs to be resized/cleared
}
Additional Note:
If you want more general solution (to operate on any type or size), you can, of course, use templates:
template <class T, size_t Size>
struct upToSizeElemVectorOfTs
{
T data[Size];
size_t elems_num;
};
and adjust fillVector() to accept template instead of known type.
This solution is probably the fastest one. You might think: "Hey, and if I want to fill up to 100 elements? 1000? 10000? What then? A 10000-element array will consume a lot of storage!".
It would consume that storage anyway. A vector resizes itself automatically, and these reallocations are out of your control and can therefore be very inefficient. If your array is reasonably small and you can predict the maximum required size, prefer fixed-size storage created on the local stack. It's faster, more efficient and simpler. Of course this won't work for arrays of 1,000,000 elements (you would get a stack overflow in that case).
In fact what you have at present is
for (int i = 0; i < 100; ++i) {
myVector.reserve(4);
//use of myVector
//....
myVector.resize(0);
}
I do not see any sense in that code.
Of course it would be better to use myVector.clear() instead of myVector.resize(0);
If you always overwrite exactly 4 elements of the vector inside the loop then you could use
std::vector<int> myVector( 4 );
instead of
std::vector<int> myVector;
myVector.reserve(4);
provided that function fillVector(myVector); uses the subscript operator to access these 4 elements of the vector instead of member function push_back
Otherwise use clear, as suggested earlier.
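The two variants could be sketched like this. The question does not show fillVector, so fill_by_push_back and fill_by_index are hypothetical stand-ins, and the values they write are arbitrary.

```cpp
#include <cstddef>
#include <vector>

// Variant 1: the fill function uses push_back, so the caller's vector is
// cleared first. clear() sets the size to 0 and leaves the capacity alone.
void fill_by_push_back(std::vector<int>& v, int n) {
    v.clear();
    for (int i = 0; i < n; ++i) v.push_back(i);
}

// Variant 2: the vector always has a fixed number of elements and the fill
// function just overwrites them through the subscript operator.
void fill_by_index(std::vector<int>& v, int base) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i] = base + static_cast<int>(i);
    }
}
```

For variant 1, call myVector.reserve(4) once before the loop. As long as no more than 4 elements are pushed per iteration, capacity() stays the same throughout and nothing is reallocated. For variant 2, construct the vector as std::vector<int> myVector(4) and never resize it.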

Error: Deallocating a 2D array

I am developing a program in which one of the task is to read points (x,y and z) from a text file and then store them in an array. Now the text file may contain 10^2 or even 10^6 points, depending upon the text file user selects. Therefore I am defining a dynamic array.
For allocating a dynamic 2D array, I wrote as below and it works fine:
const int array_size = 100000;
float** array = new float* [array_size];
for(int i = 0; i < array_size; ++i){
array[i] = new float[2]; // 0,1,2 being the columns for x,y,z co-ordinates
}
After the points are saved in the array, I write the following to deallocate the allocated memory:
for (int i = 0; i < array_size; i++){
delete [] array[i];
}
delete [] array;
and then my program stops working and shows "Project.exe stopped working".
If I don't deallocate, the program works just fine.
In your comment you say 0,1,2 being the columns for x,y,z co-ordinates. If that's the case, you need to allocate float[3]. When you allocate an array of float[N], you are allocating a chunk of memory of size N * sizeof(float), and you index its elements from 0 to N - 1. Therefore, if you need indices 0,1,2, you need to allocate memory of size 3 * sizeof(float), which makes it float[3]. Writing to index 2 of a float[2] goes past the end of the allocation, which is undefined behaviour and can corrupt the heap so that the later delete[] crashes.
Other than that, I can compile and run the code without an error. If you fix it and still get an error, it might be a problem with your compiler. In that case, try decreasing 100000 to a small number and try again.
You are saying that you are trying to implement a dynamic array, this is what std::vector does and I would highly recommend that you use it. This way you are using something from the standard library that's extremely well tested and you won't run into issues by essentially trying to roll your own version of std::vector. Additionally this approach wraps memory better as it uses RAII which leverages the language to solve a lot of memory management issues. This has other benefits too like making your code more exception safe.
Also if you are storing x,y,z coordinates consider using a struct or a tuple, I think that enhances readability a lot. You can typedef the coordinate type too. Something like std::vector< coord_t > is more readable to me.
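As a rough sketch of that suggestion: the struct name Point, the functions read_points and parse_points, and the whitespace-separated "x y z" input format are all assumptions made for this example, not details from the question.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One point; the member names are chosen for readability.
struct Point {
    float x;
    float y;
    float z;
};

// Reads "x y z" triples until the stream ends or a triple is incomplete.
// The vector grows as needed and frees its memory automatically.
std::vector<Point> read_points(std::istream& in) {
    std::vector<Point> points;
    Point p{};
    while (in >> p.x >> p.y >> p.z) points.push_back(p);
    return points;
}

// Convenience wrapper for text that is already in memory.
std::vector<Point> parse_points(const std::string& text) {
    std::istringstream in(text);
    return read_points(in);
}
```

The same read_points works with a std::ifstream opened on the user's file, and no new or delete[] is needed whether the file holds 10^2 or 10^6 points.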
(Thanx a lot for suggestions!!)
Finally I am using vectors for the stated problem for reasons as below:
1. Unlike raw arrays (not the std::array object, of course), I don't need to manually deallocate the allocated memory.
2. There are numerous built-in methods defined in the vector class.
3. The vector size can be extended at later stages.
Below is how I used 2D Vector to store points (x,y,z co-ordinates)
Initialized (allocated memory) a 2D vector:
vector<vector<float>> array(1000, vector<float>(3));
Where 1000 is the number of rows, and 3 is the number of columns
Once declared, values can be passed simply as:
array[i][j] = some value;
Also, at later stage I declared functions taking vector arguments and returning vectors as:
vector <vector <float>> function_name ( vector <vector <float>>);
vector <vector <float>> function_name ( vector <vector <float>> input_vector_name)
{
return output_vector_name_created_inside_function;
}
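Note: This method creates a copy of the vector when it is passed in (a local vector returned by value is normally moved or elided rather than copied). Returning the vector by reference did not work for me :( which is expected if the reference is to a vector created inside the function, since that vector is destroyed when the function returns.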
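A common way to write such a function is to take the input by const reference and return the result by value. The sketch below assumes a made-up operation (scaling every value); Grid and scaled are hypothetical names.

```cpp
#include <vector>

using Grid = std::vector<std::vector<float>>;

// Takes the input by const reference, so the caller's vector is not copied
// just to be passed in, and returns a new vector by value.
Grid scaled(const Grid& input, float factor) {
    Grid output = input;  // the one deliberate copy, because the result is a new grid
    for (auto& row : output) {
        for (float& value : row) value *= factor;
    }
    return output;  // a local returned by value is moved or elided, not deep-copied
}
```

Returning Grid& here would be a bug, because output stops existing as soon as the function returns.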
For multidimensional arrays I recommend using boost::multi_array.
Example:
typedef boost::multi_array<double, 3> array_type;
array_type A(boost::extents[3][4][2]);
A[0][0][0] = 3.14;

Different addresses while filling a std::vector

Wouldn't you expect the addresses printed by the two loops to be the same? I did, and I cannot understand why (sometimes) they are different.
#include <iostream>
#include <vector>
using namespace std;
struct S {
void print_address() {
cout << this << endl;
}
};
int main(int argc,char *argv[]) {
vector<S> v;
for (size_t i = 0; i < 10; i++) {
v.push_back( S() );
v.back().print_address();
}
cout << endl;
for (size_t i = 0; i < v.size(); i++) {
v[i].print_address();
}
return 0;
}
I tested this code with many local and on-line compilers and the output I get looks like this (the last three figures are always the same):
0xaec010
0xaec031
0xaec012
0xaec013
0xaec034
0xaec035
0xaec036
0xaec037
0xaec018
0xaec019
0xaec010
0xaec011
0xaec012
0xaec013
0xaec014
0xaec015
0xaec016
0xaec017
0xaec018
0xaec019
I spotted this because making some initialization in the first loop I obtained uninitialized object in the subsequent part of the program. Am I missing something?
Because when the vector's capacity changes, it reallocates its elements. If you reserve enough capacity with std::vector::reserve, no reallocation is needed and it will print the same addresses.
vector<S> v;
v.reserve(10);
Note: using std::vector::reserve properly can improve application performance, because it avoids unnecessary reallocations and object copies.
The vector is performing re-allocations in order to grow as needed. Each time it does this, it allocates a larger buffer for the data and copies the elements across. You can see this clearly in the first loop, where each address jump is followed by a larger sequence of consecutive addresses. In the second loop, you just look at the addresses after the final reallocation.
0xaec010
0xaec031 <--
0xaec012 <--
0xaec013
0xaec034 <--
0xaec035
0xaec036
0xaec037
0xaec018 <--
0xaec019
The simplest way to instantiate a vector with 10 S objects would be
std::vector<S> v(10);
This would involve no re-allocations. See also std::vector::reserve.
Vector elements are stored contiguously; that is, they're all in a row in memory. Your vector object has to allocate space for this contiguous block of elements.
Your vector can't just keep having things added to it indefinitely. It has to grow the space it has allocated. The memory model typically doesn't allow us to expand a memory block — we have to create a new one instead. When the vector does this, it has to move all its elements to the new space. This is occurring several times within your first loop.
If you'd done:
vector<S> v;
v.reserve(10);
(which you can, since you know you'll end up with 10 elements), then no re-allocation would have been necessary, and the addresses would not have changed.
I'm not really surprised that they can change. As the vector initially has no size, it's likely to reallocate the vector once or twice during the initial loop. That'll change the base address of the vector. It's not impossible that after a resize, you'll end up using an address you used before (though I find that somewhat surprising. Are you sure about the first part of the addresses?)
If you want to ensure they don't change, you need to add a v.reserve() before you start pushing stuff on it.
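One way to check this without reading addresses by eye is to count how often the first element moves while pushing. This is only a sketch: count_relocations is a made-up helper, and S is reduced to a trivial struct.

```cpp
#include <cstddef>
#include <vector>

struct S {
    char tag = 0;
};

// Pushes count elements and returns how many times the address of the
// first element changed, i.e. how many reallocations moved the data.
std::size_t count_relocations(std::vector<S>& v, std::size_t count) {
    std::size_t relocations = 0;
    const S* first = nullptr;
    for (std::size_t i = 0; i < count; ++i) {
        v.push_back(S());
        if (first != nullptr && &v.front() != first) ++relocations;
        first = &v.front();
    }
    return relocations;
}
```

For a vector on which v.reserve(10) was called first, count_relocations(v, 10) returns 0. For a default-constructed vector the result depends on the implementation's growth policy, which is why the addresses in the question jump a few times.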