I'm trying to save all the member variables of an object in a binary file. However, the member variables are vectors whose contents are dynamically allocated, so is there any way to combine all the data and save it in a binary file? As of now, it just saves the pointer, which is of little help. Following is my running code.
#include <vector>
#include <iostream>
#include <fstream>
class BaseSaveFile {
protected:
std::vector<float> first_vector;
public:
void fill_vector(std::vector<float> fill) {
first_vector = fill;
}
void show_vector() {
for ( auto x: first_vector )
std::cout << x << std::endl;
}
};
class DerivedSaveFile : public BaseSaveFile {
};
int main ( int argc, char **argv) {
DerivedSaveFile derived;
std::vector<float> fill;
for ( auto i = 0; i < 10; i++) {
fill.push_back(i);
}
derived.fill_vector(fill);
derived.show_vector();
std::ofstream save_object("../save_object.bin", std::ios::out | std::ios::binary);
save_object.write((char*)&derived, sizeof(derived));
}
Currently the size of the binary file is just 24 bytes, but I was expecting much more because of the vector of 10 floats.
"is there any way to combine all the data and save it in a binary file" - of course there is. You write code to iterate over all the data and convert it into a form suitable for writing to a file (one that you know how to later parse when reading it back in). Then you write code to read the file, parse it into meaningful variables and classes, and construct new objects from the read-in data. There's no built-in facility for it, but it's not rocket science - just a bunch of work/code you need to do.
It's called serialisation/de-serialisation btw, in case you want to use your preferred search engine to look up more details.
The problem
You can write the exact binary content of an object to a file:
save_object.write((char*)&derived, sizeof(derived));
However, it is not guaranteed that you can read it back into memory with the reverse read operation. This is only possible for a small subset of objects that have a trivially copyable type and do not contain any pointers.
You can verify whether your type matches this definition with std::is_trivially_copyable<BaseSaveFile>::value, but I can already tell you that it's not, because of the vector.
To simplify the formal definition a bit, trivially copyable types are more or less the types that are composed only of other trivially copyable elements and very elementary data types such as int, float, char, or fixed-size arrays.
The solution: introduction to serialization
The general solution, as mentioned in the other response, is called serialization. But for a more tailored answer, here is how it could look.
You would add the following public method to your type:
std::ostream& save(std::ostream& os){
size_t vsize=first_vector.size();
os.write((char*)&vsize, sizeof(vsize));
os.write((char*)first_vector.data(), vsize*sizeof(float));
return os;
}
This method has access to all the members and can write them to disk. In the case of the vector, you first write down its size (so that you know how big it is when you read the file back later).
You would then add the reverse method:
std::istream& load(std::istream& is){
size_t vsize;
if(is.read((char*)&vsize, sizeof(vsize))) {
first_vector.resize(vsize);
is.read((char*)first_vector.data(), vsize*sizeof(float));
}
return is;
}
Here the trick is to first read the size of the vector on disk, and then resize the vector before loading it.
Note the use of istream and ostream. This allows you to store the data in a file, but you could use any other kind of stream, such as an in-memory string stream, if you want.
Here's a full online example (it uses stringstream because the online service doesn't provide for files to be written).
More serialization ?
There are some serialization tricks to know. First, if you have derived types, you'd need to make load() and save() virtual and provide the derived types with their own overridden versions.
If one of your data members is not trivially copyable, it will need its own load() and save() that you can then invoke recursively. Otherwise you'd need to handle the thing yourself, which is only possible if you can access all the members needed to restore its state.
Finally, you don't need to reinvent the wheel. There are libraries out there that may help, like Boost.Serialization or cereal.
Related
I have a structure that looks like this:
struct SoA
{
int arr1[COUNT];
int arr2[COUNT];
};
And I want it to look like this:
struct AoS
{
int arr1_data;
int arr2_data;
};
std::vector<AoS> points;
as quickly as possible. Order must be preserved.
Is constructing each AoS object individually and pushing it back the fastest way to do this, or is there a faster option?
SoA before;
std::vector<AoS> after;
for (int i = 0; i < COUNT; i++)
after.push_back(AoS{before.arr1[i], before.arr2[i]});
There are SoA/AoS related questions on StackOverflow, but I haven't found one related to fastest-possible conversion. Because of struct packing differences I can't see any way to avoid copying the data from one format to the next, but I'm hoping someone can tell me there's a way to simply reference the data differently and avoid a copy.
Off the wall solutions especially encouraged.
Binary layout of SoA and AoS[]/std::vector<AoS> is different, so there is really no way to transform one into the other without a copy operation.
The code you have is pretty close to optimal - one improvement may be to pre-allocate the vector with the expected number of elements. Alternatively, try a raw array, with both constructing the whole element and per-property initialization. Changes need to be measured carefully (definitely measure using a fully optimized build with array sizes you expect) and weighed against readability/correctness of the code.
If you don't need the exact binary layout (which seems to be the case, as you are using vector), you may be able to achieve similar-looking syntax by creating a couple of custom classes that expose the existing data differently. This avoids copying altogether.
You would need an "array" type (providing indexing/iteration over an instance of SoA) and an "element" type (initialized with a reference to an instance of SoA and an index, exposing accessors for the separate fields at that index).
Rough sketch of code (add iterators,...):
class AoS_Element
{
SoA& soa;
int index;
public:
AoS_Element(SoA& soa, int index) ...
int arr1_data() { return soa.arr1[index];}
int arr2_data() { return soa.arr2[index];}
};
class AoS
{
SoA& soa;
public:
AoS(SoA& _soa):soa(_soa){}
AoS_Element operator[](int index) { return AoS_Element(soa, index);}
};
In the book "Essential C++" (more specifically, part 2.7), the author briefly discusses the usage of template functions with the following example, which displays a diagnostic message and then iterates through the elements of a vector
template <typename T>
void display_message(const string& msg, const vector<T>& vec)
{
cout << msg;
for (int i = 0; i < vec.size(); ++i)
cout << vec[i] << ' ';
}
So, this example got me interested, because I (as many other hobbyist developers, probably) have always taken for granted that in most applications, the standard input/output streams are being used for communication and data processing. The author then mentions that this way of implementing display_message is more flexible. Can you give me an example of a situation where this flexibility "shines", so to speak? In other words, is there a case where the optional 3rd parameter takes on another input/output representation (say, an embedded device), or is it just a simple addition that is supposed to be used with, well, simple constructions instead of the extreme situations I am trying to describe?
EDIT: As #Matteo Italia noticed, this is the function declaration
void display_message(const string&, const vector<T>&, ostream& = cout);
You are confusing two "flexibilities" available in this function.
the template part (which I think is the one the author is talking about) allows you to pass any std::vector<T>, given that T can be output on the stream; i.e. you can pass a vector of integers, doubles, or even of your custom objects, and the function will happily output it on the given stream;[1]
the stream part (which caught your attention) is instead to allow you to specify any (narrow) output stream for the output part; it's useful because you may want to output your message (and your vector) on some other streams; for example, if it's an error message you'll want cerr; and, most importantly, if you are writing to file, you'll pass your file stream.
Notes
[1] notice that in more "STL-like"[2] interfaces you typically won't receive a vector like that, but more probably a couple of iterators. Actually, the standard library prefers an even more abstract way to solve this problem (std::ostream_iterator, which allows you to use std::copy to copy data from container iterators to the output stream);
[2] to nitpickers: I know, and you won't convince me.
Problem
Suppose I have a large array of bytes (think up to 4GB) containing some data. These bytes correspond to distinct objects in such a way that every s bytes (think s up to 32) will constitute a single object. One important fact is that this size s is the same for all objects, not stored within the objects themselves, and not known at compile time.
At the moment, these objects are logical entities only, not objects in the programming language. I have a comparison on these objects which consists of a lexicographical comparison of most of the object data, with a bit of different functionality to break ties using the remaining data. Now I want to sort these objects efficiently (this is really going to be a bottleneck of the application).
Ideas so far
I've thought of several possible ways to achieve this, but each of them appears to have some rather unfortunate consequences. You don't necessarily have to read all of these. I tried to print the central question of each approach in bold. If you are going to suggest one of these approaches, then your answer should respond to the related questions as well.
1. C quicksort
Of course the C quicksort algorithm is available in C++ applications as well. Its signature matches my requirements almost perfectly. But the fact that using that function prohibits inlining of the comparison function means that every comparison carries a function invocation overhead. I had hoped for a way to avoid that. Any experience of how C qsort_r compares to STL sort in terms of performance would be very welcome.
2. Indirection using Objects pointing at data
It would be easy to write a bunch of objects holding pointers to their respective data. Then one could sort those. There are two aspects to consider here. On the one hand, just moving around pointers instead of all the data would mean less memory operations. On the other hand, not moving the objects would probably break memory locality and thus cache performance. Chances that the deeper levels of quicksort recursion could actually access all their data from a few cache pages would vanish almost completely. Instead, each cached memory page would yield only very few usable data items before being replaced. If anyone could provide some experience about the tradeoff between copying and memory locality I'd be very glad.
3. Custom iterator, reference and value objects
I wrote a class which serves as an iterator over the memory range. Dereferencing this iterator yields not a reference but a newly constructed object that holds the pointer to the data and the size s, which is given at construction of the iterator. So these objects can be compared, and I even have an implementation of std::swap for them. Unfortunately, it appears that std::swap isn't enough for std::sort. In some parts of the process, my gcc implementation uses insertion sort (as implemented in __insertion_sort in file stl_algo.h), which moves a value out of the sequence, moves a number of items by one step, and then moves the first value back into the sequence at the appropriate position:
typename iterator_traits<_RandomAccessIterator>::value_type
__val = _GLIBCXX_MOVE(*__i);
_GLIBCXX_MOVE_BACKWARD3(__first, __i, __i + 1);
*__first = _GLIBCXX_MOVE(__val);
Do you know of a standard sorting implementation which doesn't require a value type but can operate with swaps alone?
So I'd not only need my class which serves as a reference, but I would also need a class to hold a temporary value. And as the size of my objects is dynamic, I'd have to allocate that on the heap, which means memory allocations at the very leaves of the recursion tree. Perhaps one alternative would be a value type with a static size that should be large enough to hold objects of the sizes I currently intend to support. But that would mean there would be even more hackery in the relation between the reference_type and the value_type of the iterator class. And it would mean I would have to update that size for my application to one day support larger objects. Ugly.
If you can think of a clean way to get the above code to manipulate my data without having to allocate memory dynamically, that would be a great solution. I'm using C++11 features already, so using move semantics or similar won't be a problem.
4. Custom sorting
I even considered reimplementing all of quicksort. Perhaps I could make use of the fact that my comparison is mostly a lexicographical compare, i.e. I could sort sequences by first byte and only switch to the next byte when the first byte is the same for all elements. I haven't worked out the details on this yet, but if anyone can suggest a reference, an implementation or even a canonical name to be used as a keyword for such a byte-wise lexicographical sorting, I'd be very happy. I'm still not convinced that with reasonable effort on my part I could beat the performance of the STL template implementation.
5. Completely different algorithm
I know there are many, many kinds of sorting algorithms out there. Some of them might be better suited to my problem. Radix sort comes to my mind first, but I haven't really thought this through yet. If you can suggest a sorting algorithm more suited to my problem, please do so. Preferably with implementation, but even without.
Question
So basically my question is this:
“How would you efficiently sort objects of dynamic size in heap memory?”
Any answer to this question which is applicable to my situation is good, no matter whether it is related to my own ideas or not. Answers to the individual questions marked in bold, or any other insight which might help me decide between my alternatives, would be useful as well, particularly if no definite answer to a single approach turns up.
The most practical solution is to use the C style qsort that you mentioned.
template <unsigned S>
struct my_obj {
enum { SIZE = S };
const void *p_;
my_obj (const void *p) : p_(p) {}
//...accessors to get data from pointer
static int c_style_compare (const void *a, const void *b) {
my_obj aa(a);
my_obj bb(b);
return (aa < bb) ? -1 : (bb < aa);
}
};
template <unsigned N, typename OBJ>
void my_sort (char (&large_array)[N], const OBJ &) {
qsort(large_array, N/OBJ::SIZE, OBJ::SIZE, OBJ::c_style_compare);
}
(Or, you can call qsort_r if you prefer.) Since STL sort inlines the comparison calls, you may not get the fastest possible sorting. If all your system does is sorting, it may be worth adding the code to get custom iterators to work. But if most of the time your system is doing something other than sorting, the extra gain you get may just be noise to your overall system.
Since there are only 32 different object variations (1 to 32 bytes), you could easily create an object type for each and select a call to std::sort based on a switch statement. Each call will get inlined and highly optimized.
Some object sizes might require a custom iterator, as the compiler will insist on padding native objects to align to address boundaries. Pointers can be used as iterators in the other cases since a pointer has all the properties of an iterator.
I'd agree with std::sort using a custom iterator, reference and value type; it's best to use the standard machinery where possible.
You worry about memory allocations, but modern memory allocators are very efficient at handing out small chunks of memory, particularly when being repeatedly reused. You could also consider using your own (stateful) allocator, handing out length s chunks from a small pool.
If you can overlay an object onto your buffer, then you can use std::sort, as long as your overlay type is copyable. (In this example, 4 64bit integers). With 4GB of data, you're going to need a lot of memory though.
As discussed in the comments, you can have a selection of possible sizes based on some number of fixed size templates. You would have to have pick from these types at runtime (using a switch statement, for example). Here's an example of the template type with various sizes and example of sorting the 64bit size.
Here's a simple example:
#include <vector>
#include <algorithm>
#include <iostream>
#include <ctime>
#include <cstdlib>
template <int WIDTH>
struct variable_width
{
unsigned char w_[WIDTH];
};
typedef variable_width<8> vw8;
typedef variable_width<16> vw16;
typedef variable_width<32> vw32;
typedef variable_width<64> vw64;
typedef variable_width<128> vw128;
typedef variable_width<256> vw256;
typedef variable_width<512> vw512;
typedef variable_width<1024> vw1024;
bool operator<(const vw64& l, const vw64& r)
{
const __int64* l64 = reinterpret_cast<const __int64*>(l.w_);
const __int64* r64 = reinterpret_cast<const __int64*>(r.w_);
return *l64 < *r64;
}
std::ostream& operator<<(std::ostream& out, const vw64& w)
{
const __int64* w64 = reinterpret_cast<const __int64*>(w.w_);
out << *w64;
return out;
}
int main()
{
srand(time(NULL));
std::vector<unsigned char> buffer(10 * sizeof(vw64));
vw64* w64_arr = reinterpret_cast<vw64*>(&buffer[0]);
for(int x = 0; x < 10; ++x)
{
(*(__int64*)w64_arr[x].w_) = rand();
}
std::sort(
w64_arr,
w64_arr + 10);
for(int x = 0; x < 10; ++x)
{
std::cout << w64_arr[x] << '\n';
}
std::cout << std::endl;
return 0;
}
Given the enormous size (4GB), I would seriously consider dynamic code generation. Compile a custom sort into a shared library, and dynamically load it. The only non-inlined call should be the call into the library.
With precompiled headers, the compilation times may actually not be that bad. The whole <algorithm> header doesn't change, nor does your wrapper logic. You just need to recompile a single predicate each time. And since it's a single function you get, linking is trivial.
#define OBJECT_SIZE 32
struct structObject
{
unsigned char* pObject;
bool operator < (const structObject &n) const
{
for(int i=0; i<OBJECT_SIZE; i++)
{
if(*(pObject + i) != *(n.pObject + i))
return (*(pObject + i) < *(n.pObject + i));
}
return false;
}
};
int _tmain(int argc, _TCHAR* argv[])
{
std::vector<structObject> vObjects;
unsigned char* pObjects = (unsigned char*)malloc(10 * OBJECT_SIZE); // 10 Objects
for(int i=0; i<10; i++)
{
structObject stObject;
stObject.pObject = pObjects + (i*OBJECT_SIZE);
*stObject.pObject = 'A' + 9 - i; // Add a value to the start to check the sort
vObjects.push_back(stObject);
}
std::sort(vObjects.begin(), vObjects.end());
free(pObjects);
return 0;
}
To skip the #define
struct structObject
{
unsigned char* pObject;
};
struct structObjectComparerAscending
{
int iSize;
structObjectComparerAscending(int _iSize)
{
iSize = _iSize;
}
bool operator ()(structObject &stLeft, structObject &stRight)
{
for(int i=0; i<iSize; i++)
{
if(*(stLeft.pObject + i) != *(stRight.pObject + i))
return (*(stLeft.pObject + i) < *(stRight.pObject + i));
}
return false;
}
};
int _tmain(int argc, _TCHAR* argv[])
{
int iObjectSize = 32; // Read it from somewhere
std::vector<structObject> vObjects;
unsigned char* pObjects = (unsigned char*)malloc(10 * iObjectSize);
for(int i=0; i<10; i++)
{
structObject stObject;
stObject.pObject = pObjects + (i*iObjectSize);
*stObject.pObject = 'A' + 9 - i; // Add a value to the start to work with something...
vObjects.push_back(stObject);
}
std::sort(vObjects.begin(), vObjects.end(), structObjectComparerAscending(iObjectSize));
free(pObjects);
return 0;
}
I am new to C++ and data structures. I have code to approximate nearest neighbors, and for that I implemented a kd-tree in C++.
My question is: how can I write the kd-tree into a file, and how can I read it back from that file?
Thanks for any help
See boost::serialization. You may choose between several output formats - plain text, xml, binary
If you're new to C++, you just have to understand what exactly you need and implement it in a correct, simple way. So no Boost dependency is needed.
First, your kd-tree likely stores pointers to objects and does not own them. Consider dumping/loading via structures that actually own the objects (that is, are responsible for their lifetime), thus avoiding duplicates and leaks. Second, trees are usually not stored in files; instead they are constructed each time you load some geometry, because they require more storage than just an array of objects and they can contain duplicates that you would need to track separately.
Thereby, once you have figured out who owns your objects, your read/write procedures will look like
int main(int argc, char** argv) {
std::string filename = "geometry_dump.txt";
if (argc == 2) { // filename explicitly provided
filename = argv[1];
}
ProblemDriver driver; // stores geometry owner/owners
bool res = driver.GetGeometry(filename);
if (res) res = driver.SolveProblem();
if (res) res = driver.DumpGeometry();
return res ? 0 : 1; // exit code 0 on success
}
In the place where you access the geometric data itself (like double x, y;) you must include <iostream>; try to read something about C++ I/O if your question is about that. Objects that own x and y must have the corresponding friend functions
std::ostream& operator<< (std::ostream& out, const MyPoint& point) {
out << point.x() << ' ' << point.y() << '\n';
return out;
}
std::istream& operator>> (std::istream& in, MyPoint& point) {
double x, y;
in >> x >> y;
point.set(x, y);
return in;
}
Meaning you create an ofstream and an ifstream respectively in the ProblemDriver methods (GetGeometry, DumpGeometry) that invoke these functions.
I have to serialize an object that contains a std::vector<unsigned char> that can contain thousands of elements; with vector sizes like that, the serialization doesn't scale well.
According to the documentation, Boost provides a wrapper class array that wraps the vector for optimizations, but it generates the same XML output. Diving into the Boost code, I've found a class named use_array_optimization that seems to control the optimization but is somehow deactivated by default. I've also tried to override the serialize function, with no results.
I would like to know how to activate that optimization, since the Boost documentation is unclear.
The idea behind the array optimization is that, for arrays of types that can be archived by simply "dumping" their representation as-is to the archive, "dumping" the whole array at once is faster than "dumping" one element after the other.
I understand from your question that you are using the xml archive. The array optimization does not apply in that case because the serialization of the elements implies a transformation anyway.
Finally, I used the BOOST_SERIALIZATION_SPLIT_MEMBER() macro and coded two functions for loading and saving. The Save function looks like:
template<class Archive>
void save(Archive & ar, const unsigned int version) const
{
using boost::serialization::make_nvp;
std::string sdata;
Vector2String(vData, sdata);
ar & boost::serialization::make_nvp("vData", sdata);
}
The Vector2String function simply takes the data in the vector and formats it into a std::string. The load function uses a function that reverses the encoding.
You have several ways to serialize a vector with Boost Serialization to XML.
From what I read in the comments, you are looking for Case 2 below.
I think you cannot change how std::vector is serialized by the library after including boost/serialization/vector.hpp; however, you can replace the code there with your own, e.g. with something close to Case 2.
0. Library default, not optimized
The first is to use the default given by the library, which as far as I know won't optimize anything:
#include <boost/serialization/vector.hpp>
...
std::vector<double> vec(4);
std::iota(begin(vec), end(vec), 0);
std::ofstream ofs{"default.xml"};
boost::archive::xml_oarchive xoa{ofs, boost::archive::no_header};
xoa << BOOST_NVP(vec);
output:
<vec>
<count>4</count>
<item_version>0</item_version>
<item>0.00000000000000000e+00</item>
<item>1.00000000000000000e+00</item>
<item>2.00000000000000000e+00</item>
<item>3.00000000000000000e+00</item>
</vec>
1. Manually, exploiting that the data is contiguous
#include <boost/serialization/array_wrapper.hpp> // for make_array
...
std::ofstream ofs{"array.xml"};
boost::archive::xml_oarchive xoa{ofs, boost::archive::no_header};
auto const size = vec.size();
xoa << BOOST_NVP(size) << boost::serialization::make_nvp("data", boost::serialization::make_array(vec.data(), vec.size()));
output:
<size>4</size>
<data>
<item>0.00000000000000000e+00</item>
<item>1.00000000000000000e+00</item>
<item>2.00000000000000000e+00</item>
<item>3.00000000000000000e+00</item>
</data>
2. Manually, exploiting that the data is binary and contiguous
#include <boost/serialization/binary_object.hpp>
...
std::ofstream ofs{"binary.xml"};
boost::archive::xml_oarchive xoa{ofs, boost::archive::no_header};
auto const size = vec.size();
xoa << BOOST_NVP(size) << boost::serialization::make_nvp("binary_data", boost::serialization::make_binary_object(vec.data(), vec.size()*sizeof(double)));
output:
<size>4</size>
<binary_data>
AAAAAAAAAAAAAAAAAADwPwAAAAAAAABAAAAAAAAACEA=
</binary_data>
I think this makes the XML technically not portable.