How do I serialise/deserialise a std::vector<bool> most efficiently? - c++

I'm trying to write the contents of a std::vector<bool> to disk into a binary file. As the write() method of many of the STL output streams takes in a pointer to the array itself, as well as the number of bytes to write, for a 'normal' vector I'd end up doing something like this:
std::vector<unsigned int> dataVector = {0, 1, 2, 3, 4};
std::fstream outStream = std::fstream("vectordump.bin", std::ios::out | std::ios::binary);
outStream.write((char*) dataVector.data(), dataVector.size() * sizeof(unsigned int));
outStream.close();
However, the std::vector<bool> is a special case, as the STL implementation is allowed to pack the bools into single bits. The above approach will therefore technically not consistently work, because it's unspecified how the data is precisely laid out in memory.
Is there any way of serialising/deserialising my bool vector without having to pack/unpack the data?

I think you're better off to just translate that vector into std::vector<std::byte>/std::vector<unsigned char>.
std::vector<bool> isn't even required to store its elements contiguously, and the specialization has no data() member at all, so there is no portable pointer to write from.
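A minimal sketch of that translation step, assuming one bit per bool and least-significant-bit-first order within each byte (an arbitrary choice, not something the standard prescribes). The names pack_bits and unpack_bits are hypothetical helpers, not library functions. The resulting std::vector<std::uint8_t> can be written with write() like any other vector, as long as the number of bits is stored alongside it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack each bool into one bit, least significant bit first within a byte.
std::vector<std::uint8_t> pack_bits(const std::vector<bool>& bits)
{
    std::vector<std::uint8_t> bytes((bits.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < bits.size(); ++i)
        if (bits[i])
            bytes[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
    return bytes;
}

// Inverse of pack_bits. bit_count has to be stored separately because the
// last byte may be only partly used. Precondition (not checked here):
// bit_count <= bytes.size() * 8.
std::vector<bool> unpack_bits(const std::vector<std::uint8_t>& bytes,
                              std::size_t bit_count)
{
    std::vector<bool> bits(bit_count);
    for (std::size_t i = 0; i < bit_count; ++i)
        bits[i] = ((bytes[i / 8] >> (i % 8)) & 1u) != 0;
    return bits;
}
```

This does pack and unpack explicitly, which the question hoped to avoid, but it gives a file layout that does not depend on how a particular implementation stores std::vector<bool>.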

No, there isn't.
Sorry.
A good reason to avoid this container!

Related

How to store / load big C++ containers

I was wondering how can I store C++ containers for efficient loading, for example how can I store very large vectors of integers. I know I can save them in a file, and make new vector out of that data
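#include <fstream>
#include <vector>
int main()
{
const char *path = "data.txt"; // placeholder file name
std::vector<int> data = {1, 2, 3, 4, 5}; // some elements
std::ofstream file(path); // output stream, since we are writing
for (const auto &c : data)
file << c << " ";
return 0;
}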
but if I want to save 1 gigabyte of data, loading it every time from a file takes a loooooooooong time. So is there a way to store this kind of data, for fast loading that doesn't take forever, if possible I would like to store my own classes this way as well.
std::vector is stored in a contiguous memory block.
If you want to store/load data from a vector to file you should be able to do something like this.
std::string filename{ "test.dat" };
std::vector<int> vec_source = { 1, 2, 3, 4, 5 }; // some elements
// Save to file
std::ofstream OutFile;
OutFile.open(filename, std::ofstream::out | std::ofstream::binary);
OutFile.write(reinterpret_cast<char*>(vec_source.data()), vec_source.size() * sizeof(int));
OutFile.close();
// Prepare
std::vector<int> vec_target;
vec_target.resize(vec_source.size());
// Load from file
std::ifstream InFile;
InFile.open(filename, std::ifstream::in | std::ifstream::binary);
InFile.read(reinterpret_cast<char*>(vec_target.data()), vec_target.size() * sizeof(int));
InFile.close();
See working example here:
https://wandbox.org/permlink/oQuwXxU8q230FaJC
[EDIT]
Few notes and limitations:
Note 1: If you plan to do more than just save/load the whole array, such as changing the data and storing only the changes, you should consider a better method (for example, split the data into chunks and save each chunk separately).
Note 2: This method is correct only for containers that use a contiguous memory block, like std::vector, std::array and std::string, and whose elements are trivially copyable. It will certainly not work for std::list or std::map.
Note 3: Following an interesting discussion between @DavidSchwartz and @Acorn in the comments of this post: this code example will work correctly only if the endianness of the platform is the same when storing and loading the data. It will certainly not work if the platform changes its endianness across runs or if you mix platforms.
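The example above sizes vec_target from vec_source, which is not available when the file is loaded in a later run. One possible way around that, sketched here under the same limitations as the notes above (same platform, same endianness, same sizeof(int)), is to store the element count in front of the data. The names save_ints and load_ints and the 64-bit count header are assumptions made for this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Write the element count first, then the raw elements.
bool save_ints(const std::string& filename, const std::vector<int>& values)
{
    std::ofstream out(filename, std::ios::binary);
    if (!out) return false;
    const std::uint64_t count = values.size();
    out.write(reinterpret_cast<const char*>(&count), sizeof count);
    out.write(reinterpret_cast<const char*>(values.data()),
              static_cast<std::streamsize>(values.size() * sizeof(int)));
    return static_cast<bool>(out);
}

// Read the count back, size the vector, then read the elements in one call.
// The count is trusted as-is; a real loader would validate it first.
bool load_ints(const std::string& filename, std::vector<int>& values)
{
    std::ifstream in(filename, std::ios::binary);
    if (!in) return false;
    std::uint64_t count = 0;
    if (!in.read(reinterpret_cast<char*>(&count), sizeof count)) return false;
    values.resize(static_cast<std::size_t>(count));
    in.read(reinterpret_cast<char*>(values.data()),
            static_cast<std::streamsize>(values.size() * sizeof(int)));
    return static_cast<bool>(in);
}
```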

File I/O for a vector of arrays

This question has good answers on how to write a std::vector into a file: Reading and writing a std::vector into a file correctly
In my case, I have a vector of arrays:
vector<array<double, 3> > vec;
I would like to write into a file in order to get a file having the following format, where the values are doubles and the first number is the position in the vector and the second is the position in the array:
vec0_0 vec0_1 vec0_2 vec1_0 vec1_1 vec1_2 vec2_0 ...
Can I just use...
std::copy(vec.begin(), vec.end(), std::ostreambuf_iterator<char>(FILE));
...or...
size_t sz = vec.size();
FILE.write(reinterpret_cast<const char*>(&vec[0]), sz * sizeof(vec[0]));
...as proposed in the mentioned question for a scalar type, or do I need to do it differently because the type in the vector is an array?
From what I understand, std::array has contiguous storage. However, I don't think that guarantees there is no padding. If that were just a double[3], it would work out of the box, but I think you'd have to test very carefully and worry about portability with a std::array inside the container.
In fact, looking around there is already an example out there of a system that pads.
std::array alignment
sizeof(int) = 4;
sizeof( std::tr1::array< int,3 > ) = 16;
sizeof( std::tr1::array< int,4 > ) = 16;
sizeof( std::tr1::array< int,5 > ) = 32;
Presumably this padding is implementation defined, or maybe you can find it in the standard somewhere. In any case, I'd just iterate the thing or use a non-stl array.
I'd guess the concept is similar to a struct where there is often padding introduced to optimize memory access, however the compiler is optimizing that padding, and it can be turned off on most compilers with #pragma pack statements. Not true of stl containers to my knowledge.
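As a rough sketch of the "just iterate the thing" suggestion, the following writes the values as space-separated text in the vec0_0 vec0_1 vec0_2 vec1_0 ... order asked for. The names write_flat and flat_string are hypothetical. The static_assert at the end is one way to check at compile time whether the raw write() shortcut would even be padding-free on a given implementation.

```cpp
#include <array>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Write every double separated by a single space, outer index first.
void write_flat(std::ostream& out,
                const std::vector<std::array<double, 3>>& vec)
{
    bool first = true;
    for (const auto& triple : vec) {
        for (double value : triple) {
            if (!first) out << ' ';
            out << value;
            first = false;
        }
    }
}

// Convenience wrapper that returns the same text as a string.
std::string flat_string(const std::vector<std::array<double, 3>>& vec)
{
    std::ostringstream os;
    write_flat(os, vec);
    return os.str();
}

// Fails to compile if this implementation pads std::array<double, 3>.
static_assert(sizeof(std::array<double, 3>) == 3 * sizeof(double),
              "std::array<double, 3> has padding on this implementation");
```

write_flat takes any std::ostream, so the same function works with a std::ofstream opened on the target file.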

Is there a Qt C++ data container to use in dynamic size structs?

I thought this problem should be identified and solved by the community but it seems either I'm searching with the wrong keywords or it's really that elusive.
The problem is simple. I want to define a struct in which a dynamic data container (Vector, List, Queue, whatever works) should be defined.
#pragma pack(1)
struct Example
{
int foo;
QVector<int> bar;
};
I need to insert integer values to this Vector (or take any other Qt/STL container) and I want to copy this struct's contents to a byte array in order to write its raw data to a file.
What I encounter is that when I write the following code:
Example exstr;
qDebug()<<sizeof(exstr);
exstr.bar.push_back(12);
exstr.bar.push_back(5);
qDebug()<<sizeof(exstr);
It displays the values:
8
8
Now, this is probably because QVector is just an ordinary pointer which points to the contiguous data but what I need is a dynamically resizeable data container (which would also resize the struct it's in) and allow me to use the contents byte by byte when I try to serialize it.
Thanks for the help in advance.
The QDataStream allows you to serialize some of the Qt data types: Serializing Qt Data Types
QDataStream stream(&file); // we will serialize the data into the file
stream << your_qvector_obj;
//...
QVector<int> new_vec;
stream >> new_vec;
//...
You're on the right track with QVector. The STL vector class will also work. Both of these classes are guaranteed to store their values in contiguous memory.
For QVector, you need something like this:
memcpy (dest, bar.data(), bar.count () * sizeof (int));
For an STL vector, you take the address of the first element:
memcpy (dest, &bar[0], bar.size () * sizeof (int));
In both cases, "dest" must be large enough to hold the data you're copying. I'm using copying here as an example; if you just need a pointer to the data then QVector::data provides that, for an STL vector, the address of the first element provides it.
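The sizeof result in the question stays at 8 because the container object only holds a pointer to heap storage, so the struct itself cannot be copied as raw bytes. A minimal sketch of serialising field by field instead is shown below. It uses std::vector as a stand-in for QVector so that it compiles without Qt, and the layout (foo, then a 32-bit count, then the elements) and the names to_bytes and from_bytes are assumptions for illustration. It does no bounds checking on the input.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct Example {
    std::int32_t foo;
    std::vector<std::int32_t> bar;   // stand-in for QVector<int>
};

// Layout: foo | element count | elements, all in native byte order.
std::vector<unsigned char> to_bytes(const Example& ex)
{
    const std::uint32_t count = static_cast<std::uint32_t>(ex.bar.size());
    std::vector<unsigned char> out(sizeof ex.foo + sizeof count +
                                   count * sizeof(std::int32_t));
    unsigned char* p = out.data();
    std::memcpy(p, &ex.foo, sizeof ex.foo);  p += sizeof ex.foo;
    std::memcpy(p, &count, sizeof count);    p += sizeof count;
    if (count != 0)
        std::memcpy(p, ex.bar.data(), count * sizeof(std::int32_t));
    return out;
}

// Inverse of to_bytes; assumes the buffer was produced by to_bytes.
Example from_bytes(const std::vector<unsigned char>& in)
{
    Example ex{};
    std::uint32_t count = 0;
    const unsigned char* p = in.data();
    std::memcpy(&ex.foo, p, sizeof ex.foo);  p += sizeof ex.foo;
    std::memcpy(&count, p, sizeof count);    p += sizeof count;
    ex.bar.resize(count);
    if (count != 0)
        std::memcpy(ex.bar.data(), p, count * sizeof(std::int32_t));
    return ex;
}
```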

Copying an array into a std::vector

I was searching about this topic and I found many ways to convert an array[] to an std::vector, like using:
assign(a, a + n)
or, direct in the constructor:
std::vector<unsigned char> v ( a, a + n );
Those solve my problem, but I am wondering if it is possible (and correct) to do:
myvet.resize( 10 );
memcpy( &myvet[0], buffer, 10 );
I am wondering this because I have the following code:
IDiskAccess::ERetRead nsDisks::DiskAccess::Read( std::vector< uint8_t >& bufferRead, int32_t totalToRead )
{
uint8_t* data = new uint8_t[totalToRead];
DWORD totalRead;
ReadFile( mhFile, data, totalToRead, &totalRead, NULL );
bufferRead.resize( totalRead );
bufferRead.assign( data, data + totalRead );
delete[] data;
return IDiskAccess::READ_OK;
}
And I would like to do:
IDiskAccess::ERetRead nsDisks::DiskAccess::Read( std::vector< uint8_t >& bufferRead, int32_t totalToRead )
{
bufferRead.resize( totalToRead );
DWORD totalRead;
ReadFile( mhFile, &bufferRead[0], totalToRead, &totalRead, NULL );
bufferRead.resize( totalRead );
return IDiskAccess::READ_OK;
}
(I have removed the error treatment of the ReadFile function to simplify the post).
It is working, but I am afraid that it is not safe. I believe it is OK, as the memory used by the vector is contiguous, but I've never seen anyone use vectors this way.
Is it correct to use vectors like this? Is there any other better option?
Yes, it is safe with std::vector. The C++ standard guarantees that the elements will be stored at contiguous memory locations.
C++11 Standard:
23.3.6.1 Class template vector overview [vector.overview]
A vector is a sequence container that supports random access iterators. In addition, it supports (amortized) constant time insert and erase operations at the end; insert and erase in the middle take linear time. Storage management is handled automatically, though hints can be given to improve efficiency. The elements of a vector are stored contiguously, meaning that if v is a vector where T is some type other than bool, then it obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size().
Yes, it is fine to do that. You might want to do myvet.data() instead of &myvet[0] if it looks better to you, but they both have the same effect. Also, if circumstances permit, you can use std::copy instead and have more type-safety and all those other C++ standard library goodies.
The storage that a vector uses is guaranteed to be contiguous, which makes it suitable for use as a buffer or with other functions.
Make sure that you don't modify the vector (such as calling push_back on it, etc) while you are using the pointer you get from data or &v[0] because the vector could resize its buffer on one of those operations and invalidate the pointer.
That approach is correct; it depends only on the vector having contiguous memory, which is required by the standard. C++11 added a data() member function to vector that returns a pointer to the buffer. Also note that with memcpy you need to pass the size in bytes, not the number of elements in the array.
The memory in vector is guaranteed to be allocated contiguously, and unsigned char is POD, therefore it is totally safe to memcpy into it (assuming you don't copy more than you have allocated, of course).
Do your resize first, and it should work fine.
vector<int> v;
v.resize(100);
memcpy(&v[0], someArrayOfSize100, 100 * sizeof(int));
Yes, the solution using memcpy is correct; the buffer held by a vector is contiguous. But it's not quite type-safe, so prefer assign or std::copy.

How do you copy the contents of an array to a std::vector in C++ without looping?

I have an array of values that is passed to my function from a different part of the program that I need to store for later processing. Since I don't know how many times my function will be called before it is time to process the data, I need a dynamic storage structure, so I chose a std::vector. I don't want to have to do the standard loop to push_back all the values individually, it would be nice if I could just copy it all using something similar to memcpy.
There have been many answers here and just about all of them will get the job done.
However there is some misleading advice!
Here are the options:
vector<int> dataVec;
int dataArray[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
unsigned dataArraySize = sizeof(dataArray) / sizeof(int);
// Method 1: Copy the array to the vector using back_inserter.
{
copy(&dataArray[0], &dataArray[dataArraySize], back_inserter(dataVec));
}
// Method 2: Same as 1 but pre-extend the vector by the size of the array using reserve
{
dataVec.reserve(dataVec.size() + dataArraySize);
copy(&dataArray[0], &dataArray[dataArraySize], back_inserter(dataVec));
}
// Method 3: Memcpy
{
dataVec.resize(dataVec.size() + dataArraySize);
memcpy(&dataVec[dataVec.size() - dataArraySize], &dataArray[0], dataArraySize * sizeof(int));
}
// Method 4: vector::insert
{
dataVec.insert(dataVec.end(), &dataArray[0], &dataArray[dataArraySize]);
}
// Method 5: vector + vector
{
vector<int> dataVec2(&dataArray[0], &dataArray[dataArraySize]);
dataVec.insert(dataVec.end(), dataVec2.begin(), dataVec2.end());
}
To cut a long story short Method 4, using vector::insert, is the best for bsruth's scenario.
Here are some gory details:
Method 1 is probably the easiest to understand. Just copy each element from the array and push it into the back of the vector. Alas, it's slow. Because there's a loop (implied with the copy function), each element must be treated individually; no performance improvements can be made based on the fact that we know the array and vectors are contiguous blocks.
Method 2 is a suggested performance improvement to Method 1; just pre-reserve the size of the array before adding it. For large arrays this might help. However the best advice here is never to use reserve unless profiling suggests you may be able to get an improvement (or you need to ensure your iterators are not going to be invalidated). Bjarne agrees. Incidentally, I found that this method performed the slowest most of the time though I'm struggling to comprehensively explain why it was regularly significantly slower than method 1...
Method 3 is the old school solution - throw some C at the problem! Works fine and fast for POD types. In this case resize is required to be called since memcpy works outside the bounds of vector and there is no way to tell a vector that its size has changed. Apart from being an ugly solution (byte copying!) remember that this can only be used for POD types. I would never use this solution.
Method 4 is the best way to go. Its meaning is clear, it's (usually) the fastest and it works for any objects. There is no downside to using this method for this application.
Method 5 is a tweak on Method 4 - copy the array into a vector and then append it. Good option - generally fast-ish and clear.
Finally, you are aware that you can use vectors in place of arrays, right? Even when a function expects c-style arrays you can use vectors:
vector<char> v(50); // Ensure there's enough space
strcpy(&v[0], "prefer vectors to c arrays");
If you can construct the vector after you've gotten the array and array size, you can just say:
std::vector<ValueType> vec(a, a + n);
...assuming a is your array and n is the number of elements it contains. Otherwise, std::copy() w/resize() will do the trick.
I'd stay away from memcpy() unless you can be sure that the values are plain-old data (POD) types.
Also, worth noting that none of these really avoids the for loop--it's just a question of whether you have to see it in your code or not. O(n) runtime performance is unavoidable for copying the values.
Finally, note that C-style arrays are perfectly valid containers for most STL algorithms--the raw pointer is equivalent to begin(), and (ptr + n) is equivalent to end().
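A small sketch of that last point, with hypothetical helper names sum_array and sorted_copy: the pointer pair (a, a + n) is passed wherever an algorithm or a vector constructor expects a begin/end iterator pair.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Sum a C-style array by passing raw pointers as the iterator pair.
int sum_array(const int* a, std::size_t n)
{
    return std::accumulate(a, a + n, 0);
}

// Build a sorted vector from a C-style array without a hand-written loop.
std::vector<int> sorted_copy(const int* a, std::size_t n)
{
    std::vector<int> v(a, a + n);    // range constructor, as in vec(a, a + n)
    std::sort(v.begin(), v.end());
    return v;
}
```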
If all you are doing is replacing the existing data, then you can do this
std::vector<int> data; // evil global :)
void CopyData(int *newData, size_t count)
{
data.assign(newData, newData + count);
}
std::copy is what you're looking for.
Since I can only edit my own answer, I'm going to make a composite answer from the other answers to my question. Thanks to all of you who answered.
Using std::copy, this still iterates in the background, but you don't have to type out the code.
int foo(int* data, int size)
{
static std::vector<int> my_data; //normally a class variable
std::copy(data, data + size, std::back_inserter(my_data));
return 0;
}
Using regular memcpy. This is probably best used for basic data types (i.e. int) but not for more complex arrays of structs or classes.
vector<int> x(size);
memcpy(&x[0], source, size*sizeof(int));
int dataArray[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };//source
unsigned dataArraySize = sizeof(dataArray) / sizeof(int);
std::vector<int> myvector (dataArraySize );//target
std::copy ( dataArray, dataArray+dataArraySize , myvector.begin() );
//myvector now has 1,2,3,...10 :-)
Yet another answer, since the person said "I don't know how many times my function will be called", you could use the vector insert method like so to append arrays of values to the end of the vector:
vector<int> x;
void AddValues(int* values, size_t size)
{
x.insert(x.end(), values, values+size);
}
I like this way because the implementation of the vector should be able to optimize for the best way to insert the values based on the iterator type and the type itself. You are somewhat relying on the implementation of the STL.
If you need to guarantee the fastest speed and you know your type is a POD type then I would recommend the resize method in Thomas's answer:
vector<int> x;
void AddValues(int* values, size_t size)
{
size_t old_size(x.size());
x.resize(old_size + size, 0);
memcpy(&x[old_size], values, size * sizeof(int));
}
Avoid the memcpy, I say. No reason to mess with pointer operations unless you really have to. Also, it will only work for POD types (like int) but would fail if you're dealing with types that require construction.
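If memcpy is used anyway, one way to guard against the non-POD case is to let the compiler reject unsuitable types. This is a sketch only: append_raw is a hypothetical helper, and std::is_trivially_copyable (C++11) is used here as the compile-time condition under which a byte copy is valid.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Append count elements from src to dest with memcpy. Compilation fails
// for types that cannot legally be copied byte by byte.
template <typename T>
void append_raw(std::vector<T>& dest, const T* src, std::size_t count)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    if (count == 0) return;
    const std::size_t old_size = dest.size();
    dest.resize(old_size + count);                  // create the elements
    std::memcpy(dest.data() + old_size, src, count * sizeof(T));
}
```

Calling append_raw on a std::vector<std::string> would then be a compile error rather than silent memory corruption.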
In addition to the methods presented above, you need to make sure you either call std::vector::resize() or construct the vector to size, so that your vector has enough elements in it to hold your data. If not, you will corrupt memory. This is true of either std::copy() or memcpy(). Note that std::vector::reserve() on its own only allocates capacity; it does not create elements you can write to.
This is the reason to use vector.push_back(), you can't write past the end of the vector.
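The difference between sizing and reserving can be sketched as follows (copy_with_resize and copy_with_reserve are hypothetical names). After resize() the elements exist and can be overwritten in place; after reserve() the vector is still empty, so elements have to be appended, here through std::back_inserter, which calls push_back.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// resize() creates the elements, so std::copy may overwrite them in place.
std::vector<int> copy_with_resize(const int* src, std::size_t n)
{
    std::vector<int> v;
    v.resize(n);                       // size() == n
    std::copy(src, src + n, v.begin());
    return v;
}

// reserve() only allocates capacity; size() stays 0, so elements must be
// appended rather than written through begin() or operator[].
std::vector<int> copy_with_reserve(const int* src, std::size_t n)
{
    std::vector<int> v;
    v.reserve(n);                      // size() == 0, capacity() >= n
    std::copy(src, src + n, std::back_inserter(v));
    return v;
}
```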
Assuming you know how big the items in the vector are:
std::vector<int> myArray;
myArray.resize (item_count, 0);
memcpy (&myArray.front(), source, item_count * sizeof(int));
http://www.cppreference.com/wiki/stl/vector/start