Write/Read a kdtree into a file - c++

I am new to C++ and data structures. I have code to approximate nearest neighbors, and for that I implemented a kd-tree in C++.
My question: how can I write the kd-tree to a file, and how can I read it back from that file?
Thanks for any help.

See boost::serialization. You may choose between several output formats - plain text, xml, binary

If you're new to C++, the main thing is to understand exactly what you need and to implement it in a simple, correct way, so no Boost dependency is needed.
First, your kd-tree likely stores pointers to objects and does not own them. Consider dumping/loading via the structures that actually own the objects (that is, are responsible for their lifetime), thus avoiding duplicates and leaks. Second, trees are usually not stored in files; instead they are rebuilt each time you load some geometry, because they require more storage than a plain array of objects and they can contain duplicates that you would need to track separately.
So, once you have figured out who owns your objects, your read/write procedures will look like
#include <string>
int main(int argc, char** argv) {
    std::string filename = "geometry_dump.txt";
    if (argc == 2) { // filename explicitly provided
        filename = argv[1];
    }
    ProblemDriver driver; // stores the geometry owner(s)
    bool res = driver.GetGeometry(filename);
    if (res) res = driver.SolveProblem();
    if (res) res = driver.DumpGeometry();
    return res ? 0 : 1; // exit code 0 signals success
}
Wherever you access the geometric data itself (like double x, y;) you must include <iostream>; if your question is really about C++ I/O, try reading up on that first. Objects that own x, y should have corresponding stream operators (declared as friends if they need access to private members):
std::ostream& operator<<(std::ostream& out, const MyPoint& point) {
    // separate the coordinates so they can be read back
    out << point.x() << ' ' << point.y() << '\n';
    return out;
}
std::istream& operator>>(std::istream& in, MyPoint& point) {
    double x, y;
    if (in >> x >> y) point.set(x, y); // only update the point on a successful read
    return in;
}
This means you create an ifstream and an ofstream respectively in the ProblemDriver methods (GetGeometry, DumpGeometry) that invoke these functions.
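As a rough illustration of the "store the points, rebuild the tree" idea above, here is a minimal sketch. Point2, dump_points and load_points are hypothetical names, not part of the question's code, and the file layout (a count followed by one "x y" pair per line) is just one possible assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical point type standing in for whatever owns x and y.
struct Point2 {
    double x;
    double y;
};

// Write the point count, then one "x y" pair per line.
void dump_points(std::ostream& out, const std::vector<Point2>& pts) {
    // Enough digits for a double to survive the text round trip.
    out.precision(std::numeric_limits<double>::max_digits10);
    out << pts.size() << '\n';
    for (std::size_t i = 0; i < pts.size(); ++i) {
        out << pts[i].x << ' ' << pts[i].y << '\n';
    }
}

// Read the count, then that many points; returns false on malformed input.
bool load_points(std::istream& in, std::vector<Point2>& pts) {
    std::size_t n = 0;
    if (!(in >> n)) return false;
    pts.clear();
    for (std::size_t i = 0; i < n; ++i) {
        Point2 p;
        if (!(in >> p.x >> p.y)) return false;
        pts.push_back(p);
    }
    return true;
}

// Round trip through an in-memory stream; a std::fstream works the same way.
bool round_trip_demo() {
    std::vector<Point2> pts;
    const Point2 a = {1.5, -2.0};
    const Point2 b = {3.0, 4.25};
    pts.push_back(a);
    pts.push_back(b);

    std::stringstream file;
    dump_points(file, pts);
    assert(file.good()); // the write itself succeeded

    std::vector<Point2> back;
    if (!load_points(file, back) || back.size() != pts.size()) return false;
    return back[0].x == 1.5 && back[0].y == -2.0 &&
           back[1].x == 3.0 && back[1].y == 4.25;
}
```

After load_points succeeds you would rebuild the kd-tree from the loaded vector with your existing construction code, rather than storing the tree nodes themselves.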

Related

saving object information into a binary file

I'm trying to save all the member variables of an object in a binary file. However, the member variables are vectors, which are dynamically allocated. So, is there any way to combine all the data and save it in a binary file? As of now, it just saves the pointer, which is of little help. Following is my running code.
#include <vector>
#include <iostream>
#include <fstream>
class BaseSaveFile {
protected:
std::vector<float> first_vector;
public:
void fill_vector(std::vector<float> fill) {
first_vector = fill;
}
void show_vector() {
for ( auto x: first_vector )
std::cout << x << std::endl;
}
};
class DerivedSaveFile : public BaseSaveFile {
};
int main ( int argc, char **argv) {
DerivedSaveFile derived;
std::vector<float> fill;
for ( auto i = 0; i < 10; i++) {
fill.push_back(i);
}
derived.fill_vector(fill);
derived.show_vector();
std::ofstream save_object("../save_object.bin", std::ios::out | std::ios::binary);
save_object.write((char*)&derived, sizeof(derived));
}
Currently the size of the binary file is just 24 bytes. But I was expecting it to be much larger because of the vector of 10 floats.
"is there any way to combine all the data and save it in a binary file" - of course there is. You write code to iterate over all the data and convert it into a form suitable for writing to a file (that you know how to later parse when reading it back in). Then you write code to read the file, parse it into meaningful variables classes and construct new objects from the read-in data. There's no built-in facility for it, but it's not rocket science - just a bunch of work/code you need to do.
It's called serialisation/de-serialisation btw, in case you want to use your preferred search engine to look up more details.
The problem
You can write the exact binary content of an object to a file:
save_object.write((char*)&derived, sizeof(derived));
However, it is not guaranteed that you can read it back into memory with the reverse read operation. This is only possible for a small subset of objects that have a trivially copyable type and do not contain any pointer.
You can verify if your type matches this definition with std::is_trivially_copyable<BaseSaveFile>::value but I can already tell you that it's not because of the vector.
To simplify the formal definition a bit, trivially copyable types are more or less the types that are composed only of other trivially copyable elements and very elementary data types such as int, float, char, or fixed-size arrays.
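The check mentioned above can be done at compile time. This is a small sketch with two hypothetical types (PlainPoint and WithVector are made up for the illustration):

```cpp
#include <type_traits>
#include <vector>

// Hypothetical types, only for illustration.
struct PlainPoint {
    float x;
    float y;
};

struct WithVector {
    std::vector<float> values; // owns heap memory through an internal pointer
};

// A struct of plain floats can be written and read back byte for byte.
static_assert(std::is_trivially_copyable<PlainPoint>::value,
              "PlainPoint can be dumped with a raw write()");

// A struct holding a std::vector cannot: the bytes would only contain
// the vector's bookkeeping, not the floats themselves.
static_assert(!std::is_trivially_copyable<WithVector>::value,
              "WithVector needs real serialization");
```

Both assertions hold, so the file compiles; flipping either condition turns the mistake into a compile error instead of a corrupted file.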
The solution: introduction to serialization
The general solution, as mentioned in the other response, is called serialization. But for a more tailored answer, here is what it would look like.
You would add the following public method to your type:
std::ostream& save(std::ostream& os){
size_t vsize=first_vector.size();
os.write((char*)&vsize, sizeof(vsize));
os.write((char*)first_vector.data(), vsize*sizeof(float));
return os;
}
This method has access to all the members and can write them to the disk. For the case of the vector, you'd first write down its size (so that you know how big it is when you'll read the file later on).
You would then add the reverse method:
std::istream& load(std::istream& is){
size_t vsize;
if(is.read((char*)&vsize, sizeof(vsize))) {
first_vector.resize(vsize);
is.read((char*)first_vector.data(), vsize*sizeof(float));
}
return is;
}
Here the trick is to first read the size of the vector on disk, and then resize the vector before loading it.
Note the use of istream and ostream. This allows you to store the data on a file, but you could use any other kind of stream such as in memory string stream if you want.
Here is a full online example (it uses a stringstream because the online service doesn't allow files to be written).
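A round trip through an in-memory stream might look like the sketch below. FloatBag is a hypothetical stand-in for BaseSaveFile, and the on-disk layout (a size_t count followed by the raw floats) is the one described above; it is only readable on a machine with the same size_t width, float format and byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical stand-in for BaseSaveFile with the save()/load() pair.
class FloatBag {
public:
    explicit FloatBag(const std::vector<float>& v = std::vector<float>())
        : values_(v) {}
    const std::vector<float>& values() const { return values_; }

    // Layout: element count, then the raw float data.
    std::ostream& save(std::ostream& os) const {
        const std::size_t n = values_.size();
        os.write(reinterpret_cast<const char*>(&n), sizeof n);
        if (n != 0) {
            os.write(reinterpret_cast<const char*>(values_.data()),
                     n * sizeof(float));
        }
        return os;
    }

    // Read the count first so the vector can be resized before loading.
    std::istream& load(std::istream& is) {
        std::size_t n = 0;
        if (is.read(reinterpret_cast<char*>(&n), sizeof n)) {
            values_.resize(n);
            if (n != 0) {
                is.read(reinterpret_cast<char*>(values_.data()),
                        n * sizeof(float));
            }
        }
        return is;
    }

private:
    std::vector<float> values_;
};

// Save into a binary string stream, load into a fresh object, compare.
bool round_trip_demo() {
    std::vector<float> src;
    for (int i = 0; i < 10; ++i) src.push_back(static_cast<float>(i));
    const FloatBag original(src);

    std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
    original.save(buffer);
    assert(buffer.good()); // the write succeeded

    FloatBag restored;
    restored.load(buffer);
    return restored.values() == original.values();
}
```

Replacing the stringstream with a std::ofstream and std::ifstream opened with std::ios::binary gives the file version. A real loader should also sanity-check the count before resizing, since a corrupted file could request a huge allocation.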
More serialization ?
There are some serialization tricks to know. First, if you have derived types, you'd need to make load() and save() virtual and provide the derived types with their own overridden version.
If one of your data members is not trivially copyable, it would need its own load() and save() that you could then invoke recursively. Or you'd need to handle it yourself, which is only possible if you can access all the members you'd need to restore its state.
Finally, you don't need to reinvent the wheel. There are some libraries out there that may help, like Boost.Serialization or cereal.
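The point about derived types can be sketched as follows. Shape and Circle are hypothetical classes invented for this illustration; the only thing being shown is that each override saves the base part first and then its own members.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical base class with virtual save()/load().
class Shape {
public:
    explicit Shape(int id = 0) : id_(id) {}
    virtual ~Shape() {}
    int id() const { return id_; }

    virtual std::ostream& save(std::ostream& os) const {
        return os.write(reinterpret_cast<const char*>(&id_), sizeof id_);
    }
    virtual std::istream& load(std::istream& is) {
        return is.read(reinterpret_cast<char*>(&id_), sizeof id_);
    }

private:
    int id_;
};

// Hypothetical derived class that adds one more member.
class Circle : public Shape {
public:
    explicit Circle(int id = 0, double radius = 0.0)
        : Shape(id), radius_(radius) {}
    double radius() const { return radius_; }

    // Save the base part first, then the members this class adds.
    std::ostream& save(std::ostream& os) const override {
        Shape::save(os);
        return os.write(reinterpret_cast<const char*>(&radius_), sizeof radius_);
    }
    // Load in exactly the same order as save() wrote.
    std::istream& load(std::istream& is) override {
        Shape::load(is);
        return is.read(reinterpret_cast<char*>(&radius_), sizeof radius_);
    }

private:
    double radius_;
};

bool virtual_round_trip_demo() {
    std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
    const Circle original(7, 2.5);
    const Shape& as_base = original;
    as_base.save(buffer); // dispatches to Circle::save
    assert(buffer.good());

    Circle restored;
    Shape& target = restored;
    target.load(buffer); // dispatches to Circle::load
    return restored.id() == 7 && restored.radius() == 2.5;
}
```

This sketch assumes the reader already knows which concrete type to construct before calling load(). If the file can contain a mix of types, you would additionally need to write some type tag in front of each object.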

C++: Member functions that read and write from txt file

I apologize in advance for how poorly this question is asked, I'm really struggling here.
I am writing a class named Point in C++ with private members x and y, and member functions getX, getY, setX, setY, read and write. I have been able to do everything except read and write, as I am awful with input/output files. I have the following declaration for read and write:
void read(istream& ins);
void write(ostream& outs);
The RME is as follows for read:
* Requires: ins is in good state.
* Modifies: ins, x, y.
* Effects: Reads point in form (x,y)
and for write:
* Requires: outs is in good state.
* Modifies: outs.
* Effects: Writes point in form (x,y).
'read' takes ordered points like (1, 5), (2, 7), etc. from a given file "data1.txt" and extracts the x and y components (at least, I believe this is what it should do). I was provided with a test suite for read:
void test_point() {
Point pt1;
pt1.setX(15);
cout << "pt1 is: " << pt1 << endl;
ifstream input_file;
input_file.open("data1.txt");
pt1.read(input_file);
cout << "pt1 is: " << pt1 << endl;
return;}
I really have no idea how to write the read function. I have tried defining characters a, b, c, and integers u, v, and executing:
ins >> a >> u >> b >> v >> c;
but that didn't work. Could someone please help me see how to implement this?
Quite a few things are missing from your question that you will need for this class to be usable. For one, reading a whole file of ordered points should not be implemented as a member function. If anything, you could use a loop such as:
while(input_file) {
// set a point to have members x, y that were read from file
// store this point in a vector of points
}
Otherwise there is no reasonable way to store a bunch of points that you have read from a file.
Your idea for this solution within the read function should definitely work (assuming chars a, b, c, and ints u, v):
input_stream >> a >> u >> b >> v >> c;
In fact, with the signature that you have been given (taking an istream as an argument), there really is no need for much else in the read function, as it mutates the object and doesn't need to return anything. Your implementation is the simplest way to parse such "records" in a structured file.
After you read all these, set the x coordinate of the caller object to u, and the y to v. This should be a void function, does not need to return anything but should alter the Point object that it is being called for. This member function should be called on a temporary point object in the loop that I mentioned, then add that object to a vector of points as such:
vector<Point> points;
while(...) {
    Point pt;              // declare Point
    pt.read(input_file);   // initialize with values from read
    points.push_back(pt);  // the point you just created
}
If in fact you need to be able to read multiple points.
In summary, your read function needs:
To check that the ifstream is actually good first of all, which I didn't mention yet (but this is one of your requirements):
if (!input_file) { /* however you want to handle this error */ }
The temporary char and int variables (serving as buffers) to read into (by the way, you can even read straight into x and y instead of reading into u and v then copying, just saying).
If you choose to not read straight into x and y, you must assign the u and v values to x and y. Now your object is complete.
As for the write function, instead of std::cout you will use the name of the ofstream that was passed as an argument to the write function, and write the records with the format that you have shown. This is (in essence) no different from printing to a console, except the output is on a text file.
Note: Make sure you understand the difference between iostream objects (istream, ostream) and fstream (ifstream, ofstream, fstream) objects, which are preferable in this case.
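Putting the pieces of this answer together, a read/write pair could look like the sketch below. The Point class here is a guess at the asker's class (only getX, getY, read and write are shown), read_all is a hypothetical helper, and the input is assumed to hold one "(x, y)" record after another separated by whitespace.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

class Point {
public:
    Point() : x(0), y(0) {}
    int getX() const { return x; }
    int getY() const { return y; }

    // Reads a point in the form (x,y); whitespace between tokens is skipped.
    void read(std::istream& ins) {
        char open = 0, comma = 0, close = 0;
        int u = 0, v = 0;
        if (ins >> open >> u >> comma >> v >> close) {
            if (open == '(' && comma == ',' && close == ')') {
                x = u;
                y = v;
            } else {
                ins.setstate(std::ios_base::failbit); // wrong punctuation
            }
        }
    }

    // Writes the point in the form (x,y).
    void write(std::ostream& outs) const {
        outs << '(' << x << ',' << y << ')';
    }

private:
    int x;
    int y;
};

// Reads points until the stream fails (end of input or a malformed record).
std::vector<Point> read_all(std::istream& ins) {
    std::vector<Point> points;
    for (;;) {
        Point pt;
        pt.read(ins);
        if (!ins) break;
        points.push_back(pt);
    }
    return points;
}

bool point_io_demo() {
    std::istringstream in("(1, 5)\n(2, 7)\n");
    const std::vector<Point> points = read_all(in);
    assert(points.size() == 2);

    std::ostringstream out;
    points[1].write(out);
    return points[0].getX() == 1 && points[0].getY() == 5 &&
           out.str() == "(2,7)";
}
```

Because read and write take std::istream& and std::ostream&, the same functions accept an ifstream/ofstream, a string stream, or std::cin/std::cout.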

Comparing objects in C and C++

I'm working on a project where I should instrument a program (written in C and C++) by inserting a print statement before the statements that satisfy some criteria. Then I should compare those values across different executions.
Since in C there are structures, while in C++ one can also define classes, I was wondering if there is a particular method that:
1. Allows printing primitives as well as complex data structures.
2. Allows comparing those values across different executions, based on the format used by the print module (point 1).
Just an example to clarify my question. Suppose that I have two different executions with this data structure:
struct Point {
int x, y;
};
int main() {
int k = random();
Point p = foo(k);
some_print(p); // Print the value of 'p' in a file
return 0;
}
and then, another module will compare the two values of the point 'p' generated with the two executions.
The pragmatic C++-way of printing an object is defining a friend function:
std::ostream& operator<<(std::ostream& os, const Point& point) {
return os << "(" << point.x << "," << point.y << ")";
}
It's usually class-specific so you need to implement it yourself; however, you might use some form of reflection. Particularly interesting is a CppCon-talk from Antony Polukhin [1] which gives reflection for POD types (like Point above). Generic reflection without external tools is N/A yet (as of 2016), there's a proposal on it. If you can't / don't want to wait, you can do multiple things:
Parse C++ code: ctags comes to mind.
Macros: It's relatively easy to write a FIELDS macro that defines a reflection class and the fields.
FIELDS(
(int)x,
(int)y
)
Tuples: Works only if you define all your fields on the same inheritance level. Inherit privately from a std::tuple<> which contains all your fields. Make const and optionally non-const getters for fields in terms of std::get<>. Then you can iterate over the types of your tuple.
(Would love to add more - pls. write comments if you have ideas.)
All the reflection methods also give you operator==() basically for free. Note that it's more pragmatic to add operator<() when possible. The former can be defined in terms of the latter (albeit suboptimally: a == b iff !(a < b) && !(b < a)), and the latter gives you std::set<> and std::map<>. Or you can do all the comparisons in terms of reflection.
[1] https://www.youtube.com/watch?v=abdeAew3gmQ
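Without any reflection, the comparison operators for the Point from the question can be written by hand. The sketch below uses std::tie for the lexicographic ordering (a choice made here, not something the answer prescribes) and derives operator== from operator< exactly as described above.

```cpp
#include <cassert>
#include <set>
#include <tuple>

struct Point {
    int x;
    int y;
};

// Lexicographic order: compare x first, then y.
inline bool operator<(const Point& a, const Point& b) {
    return std::tie(a.x, a.y) < std::tie(b.x, b.y);
}

// Equality derived from the ordering: neither element is less than the other.
inline bool operator==(const Point& a, const Point& b) {
    return !(a < b) && !(b < a);
}

// With operator< in place, Point works as a key in std::set and std::map.
bool ordering_demo() {
    const Point p = {1, 5};
    const Point q = {2, 1};
    const Point r = {1, 5}; // same value as p

    std::set<Point> seen;
    seen.insert(q);
    seen.insert(p);
    seen.insert(r); // duplicate of p, not inserted again
    assert(seen.size() == 2);

    return *seen.begin() == p && !(p == q);
}
```

For comparing two executions, each run could print its points with operator<< and a separate tool could parse both outputs and compare them with operator==.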
What you could do in C++ is implement an equality check in your specific class.
That way you have a boolean comparison that checks whether two objects are equal.
In Java style this would be object1.equals(object2), returning either true or false; in C++ the idiomatic form is operator==.
To give an example, take a look at the following (an example I found online):
#include <string>
class Car
{
private:
    std::string m_make;
    std::string m_model;
    // declared as a friend so the two-argument form can read the private members
    friend bool operator== (const Car &c1, const Car &c2)
    {
        return (c1.m_make == c2.m_make &&
                c1.m_model == c2.m_model);
    }
};
Something like this should be implemented in your own class.

Passing multiple variables back from a single function?

I have an assignment (see below for the question) for a beginner's C++ class, where I am asked to pass 2 values back from a single function. I am pretty sure of my understanding of how to use functions and the general structure of what the program should be, but I am having trouble figuring out how to pass two variables back to "main" from the function.
Assignment:
Write a program that simulates an airplane race. The program will display a table showing the speed in km/hour and distance in km traveled by two airplanes every second until one of them has gone 10 kilometers.
These are the requirements for the program:
-The program will use a function that has the following parameters: time and acceleration.
-The function will pass back two data items: speed and distance.
You have two options (well, three really, but I'm leaving pointers out).
1. Take references to output arguments and assign them within the function.
2. Return a data structure which contains all of the return values.
Which option is best depends on your program. If this is a one-off function that isn't called from many places then you may choose to use option #1. I assume by "speed" you mean the "constant velocity" which is reached after "time" of acceleration.
void calc_velocity_profile(double accel_time,
double acceleration,
double &out_velocity, // these last two are
double &out_distance); // assigned in the function
If this is a more general purpose function and/or a function which will be called by many clients I would probably prefer option #2.
struct velocity_profile {
double velocity;
double distance;
};
velocity_profile calc_velocity_profile(double accel_time, double acceleration);
Everything being equal, I prefer option #2. Given the choice, I like a function which returns a value instead of a function which mutates its input.
2017 Update: This is discussed in the C++ Core Guidelines:
F.21 To return multiple "out" values, prefer returning a tuple or struct
However, I would lean towards returning a struct over a tuple due to named, order-independent access that is encapsulated and reusable as an explicit strong type.
In the special case of returning a bool and a T, where the T is only filled if the bool is true, consider returning a std::optional<T>. See this CPPCon17 video for an extended discussion.
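For the struct-returning option, a filled-in version might look like this. The physics is an assumption made for the sketch: the plane starts from rest and accelerates at a constant rate, so v = a*t and d = a*t*t/2. Units are whatever the caller uses consistently.

```cpp
#include <cassert>

struct velocity_profile {
    double velocity;
    double distance;
};

// Assumes the object starts at rest and accelerates at a constant rate.
velocity_profile calc_velocity_profile(double accel_time, double acceleration) {
    assert(accel_time >= 0.0); // negative durations make no sense here
    velocity_profile result;
    result.velocity = acceleration * accel_time;                     // v = a*t
    result.distance = 0.5 * acceleration * accel_time * accel_time;  // d = a*t^2/2
    return result;
}
```

A caller then reads both values by name, for example `velocity_profile p = calc_velocity_profile(2.0, 3.0);` followed by `p.velocity` and `p.distance`.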
Struct version:
struct SpeedInfo{
float speed;
float distance;
};
SpeedInfo getInfo()
{
SpeedInfo si;
si.speed = //...
si.distance = //...
return si;
}
The benefit of this is that you get an encapsulated type with named access.
Reference version:
void getInfo(float& speed, float& distance)
{
speed = //...
distance = //...
}
You have to pass in the output vars:
float s;
float d;
getInfo(s, d);
Pointer version:
void getInfo(float* speed, float* distance)
{
if(speed)
{
*speed = //...
}
if(distance)
{
*distance= //...
}
}
Pass the memory address of the output variable:
float s;
float d;
getInfo(&s, &d);
Pointer version is interesting because you can just pass a nullptr/NULL/0 for things you aren't interested in; this can become useful when you are using such a function that potentially takes a lot of params, but are not interested in all the output values. e.g:
float d;
getInfo(nullptr, &d);
This is something which you can't do with references, although they are safer.
There is already such a data structure in C++, named std::pair. It is declared in the header <utility>. So the function could look like this:
std::pair<int, int> func( int time, int acceleration )
{
// some calculations
std::pair<int, int> ret_value;
ret_value.first = speed_value;
ret_value.second = distance_value;
return ( ret_value );
}
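On the calling side, the two members of the pair can be unpacked into named variables with std::tie. The sketch below uses double instead of int and assumes constant acceleration from rest; speed_and_distance is a hypothetical name for the function.

```cpp
#include <cassert>
#include <tuple>
#include <utility>

// first = speed, second = distance (constant acceleration from rest assumed).
std::pair<double, double> speed_and_distance(double time, double acceleration) {
    assert(time >= 0.0);
    const double speed = acceleration * time;
    const double distance = 0.5 * acceleration * time * time;
    return std::make_pair(speed, distance);
}

// Unpack the pair into two named variables instead of using .first/.second.
bool unpack_demo() {
    double speed = 0.0;
    double distance = 0.0;
    std::tie(speed, distance) = speed_and_distance(4.0, 2.0);
    return speed == 8.0 && distance == 16.0;
}
```

The drawback compared with a struct is that nothing in the type says which member is the speed and which is the distance, so the order has to be documented.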

Sort objects of dynamic size

Problem
Suppose I have a large array of bytes (think up to 4GB) containing some data. These bytes correspond to distinct objects in such a way that every s bytes (think s up to 32) will constitute a single object. One important fact is that this size s is the same for all objects, not stored within the objects themselves, and not known at compile time.
At the moment, these objects are logical entities only, not objects in the programming language. I have a comparison on these objects which consists of a lexicographical comparison of most of the object data, with a bit of different functionality to break ties using the remaining data. Now I want to sort these objects efficiently (this is really going to be a bottleneck of the application).
Ideas so far
I've thought of several possible ways to achieve this, but each of them appears to have some rather unfortunate consequences. You don't necessarily have to read all of these. I tried to print the central question of each approach in bold. If you are going to suggest one of these approaches, then your answer should respond to the related questions as well.
1. C quicksort
Of course the C quicksort algorithm is available in C++ applications as well. Its signature matches my requirements almost perfectly. But the fact that using that function will prohibit inlining of the comparison function will mean that every comparison carries a function invocation overhead. I had hoped for a way to avoid that. Any experience about how C qsort_r compares to STL in terms of performance would be very welcome.
2. Indirection using Objects pointing at data
It would be easy to write a bunch of objects holding pointers to their respective data. Then one could sort those. There are two aspects to consider here. On the one hand, just moving around pointers instead of all the data would mean less memory operations. On the other hand, not moving the objects would probably break memory locality and thus cache performance. Chances that the deeper levels of quicksort recursion could actually access all their data from a few cache pages would vanish almost completely. Instead, each cached memory page would yield only very few usable data items before being replaced. If anyone could provide some experience about the tradeoff between copying and memory locality I'd be very glad.
3. Custom iterator, reference and value objects
I wrote a class which serves as an iterator over the memory range. Dereferencing this iterator yields not a reference but a newly constructed object to hold the pointer to the data and the size s which is given at construction of the iterator. So these objects can be compared, and I even have an implementation of std::swap for these. Unfortunately, it appears that std::swap isn't enough for std::sort. In some parts of the process, my gcc implementation uses insertion sort (as implemented in __insertion_sort in file stl_algo.h) which moves a value out of the sequence, moves a number of items by one step, and then moves the first value back into the sequence at the appropriate position:
typename iterator_traits<_RandomAccessIterator>::value_type
__val = _GLIBCXX_MOVE(*__i);
_GLIBCXX_MOVE_BACKWARD3(__first, __i, __i + 1);
*__first = _GLIBCXX_MOVE(__val);
Do you know of a standard sorting implementation which doesn't require a value type but can operate with swaps alone?
So I'd not only need my class which serves as a reference, but I would also need a class to hold a temporary value. And as the size of my objects is dynamic, I'd have to allocate that on the heap, which means memory allocations at the very leaves of the recursion tree. Perhaps one alternative would be a value type with a static size that should be large enough to hold objects of the sizes I currently intend to support. But that would mean that there would be even more hackery in the relation between the reference_type and the value_type of the iterator class. And it would mean I would have to update that size for my application to one day support larger objects. Ugly.
If you can think of a clean way to get the above code to manipulate my data without having to allocate memory dynamically, that would be a great solution. I'm using C++11 features already, so using move semantics or similar won't be a problem.
4. Custom sorting
I even considered reimplementing all of quicksort. Perhaps I could make use of the fact that my comparison is mostly a lexicographical compare, i.e. I could sort sequences by first byte and only switch to the next byte when the first byte is the same for all elements. I haven't worked out the details on this yet, but if anyone can suggest a reference, an implementation or even a canonical name to be used as a keyword for such a byte-wise lexicographical sorting, I'd be very happy. I'm still not convinced that with reasonable effort on my part I could beat the performance of the STL template implementation.
5. Completely different algorithm
I know there are many, many kinds of sorting algorithms out there. Some of them might be better suited to my problem. Radix sort comes to my mind first, but I haven't really thought this through yet. If you can suggest a sorting algorithm more suited to my problem, please do so. Preferably with implementation, but even without.
Question
So basically my question is this:
“How would you efficiently sort objects of dynamic size in heap memory?”
Any answer to this question which is applicable to my situation is good, no matter whether it is related to my own ideas or not. Answers to the individual questions marked in bold, or any other insight which might help me decide between my alternatives, would be useful as well, particularly if no definite answer to a single approach turns up.
The most practical solution is to use the C style qsort that you mentioned.
#include <cstdlib> // qsort
template <unsigned S>
struct my_obj {
    enum { SIZE = S };
    const void *p_;
    my_obj (const void *p) : p_(p) {}
    //...accessors to get data from pointer
    //...operator< comparing the SIZE bytes behind p_
    static int c_style_compare (const void *a, const void *b) {
        my_obj aa(a);
        my_obj bb(b);
        return (aa < bb) ? -1 : (bb < aa);
    }
};
template <unsigned N, typename OBJ>
void my_sort (char (&large_array)[N], const OBJ &) {
    // qsort reorders the array in place, so the array must not be const
    qsort(large_array, N/OBJ::SIZE, OBJ::SIZE, OBJ::c_style_compare);
}
(Or, you can call qsort_r if you prefer.) Since STL sort inlines the comparison calls and qsort cannot, you may not get the fastest possible sorting this way. If all your system does is sorting, it may be worth it to add the code to get custom iterators to work. But, if most of the time your system is doing something other than sorting, the extra gain you get may just be noise to your overall system.
Since there are only 32 different object sizes (1 to 32 bytes), you could easily create an object type for each and select a call to std::sort based on a switch statement. Each call will get inlined and highly optimized.
Some object sizes might require a custom iterator, as the compiler will insist on padding native objects to align to address boundaries. Pointers can be used as iterators in the other cases since a pointer has all the properties of an iterator.
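The switch-based dispatch suggested in this answer can be sketched like this. Record, RecordLess, sort_fixed and sort_records are hypothetical names; the comparison is a plain memcmp (the asker's tie-breaking rule is left out), and only a handful of sizes are listed to keep the example short. Treating the byte buffer as an array of Record<S> relies on Record<S> having exactly S bytes and no alignment requirement, which the static_assert checks for the size part.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// One fixed-size record type per supported object size.
template <std::size_t S>
struct Record {
    unsigned char bytes[S];
};

// Function object so that std::sort can inline the comparison.
template <std::size_t S>
struct RecordLess {
    bool operator()(const Record<S>& a, const Record<S>& b) const {
        return std::memcmp(a.bytes, b.bytes, S) < 0; // bytewise lexicographic
    }
};

template <std::size_t S>
void sort_fixed(unsigned char* data, std::size_t count) {
    static_assert(sizeof(Record<S>) == S, "Record must not be padded");
    // Assumption: viewing the buffer as Record<S> objects is acceptable here.
    Record<S>* first = reinterpret_cast<Record<S>*>(data);
    std::sort(first, first + count, RecordLess<S>());
}

// Picks the instantiation matching the runtime size s.
// Returns false if s is not one of the sizes compiled into this sketch.
bool sort_records(unsigned char* data, std::size_t count, std::size_t s) {
    assert(data != nullptr || count == 0);
    switch (s) {
        case 1:  sort_fixed<1>(data, count);  return true;
        case 2:  sort_fixed<2>(data, count);  return true;
        case 4:  sort_fixed<4>(data, count);  return true;
        case 8:  sort_fixed<8>(data, count);  return true;
        case 16: sort_fixed<16>(data, count); return true;
        case 32: sort_fixed<32>(data, count); return true;
        default: return false;
    }
}
```

For example, a 6-byte buffer holding the three 2-byte records "ba", "az", "ab" is reordered to "ab", "az", "ba" by `sort_records(buffer, 3, 2)`. Covering every size from 1 to 32 only needs the remaining case labels.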
I'd agree with std::sort using a custom iterator, reference and value type; it's best to use the standard machinery where possible.
You worry about memory allocations, but modern memory allocators are very efficient at handing out small chunks of memory, particularly when being repeatedly reused. You could also consider using your own (stateful) allocator, handing out length s chunks from a small pool.
If you can overlay an object onto your buffer, then you can use std::sort, as long as your overlay type is copyable. (In this example, 4 64bit integers). With 4GB of data, you're going to need a lot of memory though.
As discussed in the comments, you can have a selection of possible sizes based on some number of fixed-size templates. You would have to pick from these types at runtime (using a switch statement, for example). Here's an example of the template type with various sizes and an example of sorting the vw64 type.
Here's a simple example:
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <ctime>
#include <iostream>
#include <vector>
// WIDTH is the object size in bytes.
template <int WIDTH>
struct variable_width
{
    unsigned char w_[WIDTH];
};
typedef variable_width<8> vw8;
typedef variable_width<16> vw16;
typedef variable_width<32> vw32;
typedef variable_width<64> vw64;
typedef variable_width<128> vw128;
typedef variable_width<256> vw256;
typedef variable_width<512> vw512;
typedef variable_width<1024> vw1024;
// Reads the first 8 bytes of the object as the sort key.
// memcpy avoids the alignment and aliasing problems of a pointer cast.
std::int64_t key(const vw64& w)
{
    std::int64_t k;
    std::memcpy(&k, w.w_, sizeof k);
    return k;
}
bool operator<(const vw64& l, const vw64& r)
{
    return key(l) < key(r);
}
std::ostream& operator<<(std::ostream& out, const vw64& w)
{
    return out << key(w); // write to the stream passed in, not to std::cout
}
int main()
{
    std::srand(static_cast<unsigned>(std::time(NULL)));
    std::vector<unsigned char> buffer(10 * sizeof(vw64));
    vw64* w64_arr = reinterpret_cast<vw64*>(&buffer[0]);
    for(int x = 0; x < 10; ++x)
    {
        const std::int64_t value = std::rand();
        std::memcpy(w64_arr[x].w_, &value, sizeof value);
    }
    std::sort(
        w64_arr,
        w64_arr + 10);
    for(int x = 0; x < 10; ++x)
    {
        std::cout << w64_arr[x] << '\n';
    }
    std::cout << std::endl;
    return 0;
}
Given the enormous size (4GB), I would seriously consider dynamic code generation. Compile a custom sort into a shared library, and dynamically load it. The only non-inlined call should be the call into the library.
With precompiled headers, the compilation times may actually be not that bad. The whole <algorithm> header doesn't change, nor does your wrapper logic. You just need to recompile a single predicate each time. And since it's a single function you get, linking is trivial.
#include <algorithm>
#include <cstdlib>
#include <vector>
#define OBJECT_SIZE 32
struct structObject
{
    unsigned char* pObject;
    // bytewise lexicographic comparison of the data pointed to
    bool operator < (const structObject &n) const
    {
        for(int i=0; i<OBJECT_SIZE; i++)
        {
            if(*(pObject + i) != *(n.pObject + i))
                return (*(pObject + i) < *(n.pObject + i));
        }
        return false;
    }
};
int main()
{
    std::vector<structObject> vObjects;
    unsigned char* pObjects = (unsigned char*)malloc(10 * OBJECT_SIZE); // 10 Objects
    for(int i=0; i<10; i++)
    {
        structObject stObject;
        stObject.pObject = pObjects + (i*OBJECT_SIZE);
        *stObject.pObject = 'A' + 9 - i; // Add a value to the start to check the sort
        vObjects.push_back(stObject);
    }
    std::sort(vObjects.begin(), vObjects.end()); // sorts the pointers, not the data
    free(pObjects);
    return 0;
}
To skip the #define
#include <algorithm>
#include <cstdlib>
#include <vector>
struct structObject
{
    unsigned char* pObject;
};
// Comparator that carries the object size, which is only known at run time.
struct structObjectComparerAscending
{
    int iSize;
    structObjectComparerAscending(int _iSize)
    {
        iSize = _iSize;
    }
    bool operator ()(const structObject &stLeft, const structObject &stRight) const
    {
        for(int i=0; i<iSize; i++)
        {
            if(*(stLeft.pObject + i) != *(stRight.pObject + i))
                return (*(stLeft.pObject + i) < *(stRight.pObject + i));
        }
        return false;
    }
};
int main()
{
    int iObjectSize = 32; // Read it from somewhere
    std::vector<structObject> vObjects;
    unsigned char* pObjects = (unsigned char*)malloc(10 * iObjectSize);
    for(int i=0; i<10; i++)
    {
        structObject stObject;
        stObject.pObject = pObjects + (i*iObjectSize);
        *stObject.pObject = 'A' + 9 - i; // Add a value to the start to work with something...
        vObjects.push_back(stObject);
    }
    std::sort(vObjects.begin(), vObjects.end(), structObjectComparerAscending(iObjectSize));
    free(pObjects);
    return 0;
}