Writing objects to hard disk files in C++ - c++

I want to write an instance of a class that contains several different data types to disk and read it back whenever I need it.
I used the code below to do this. The problem is that whenever I save the object, a file is created in the folder but it is only 1 KB in size. Also, when I read the file from the same function that saves it, I can read the variables in the class, but when I move the reading code to another function and open the file from there, the variables cannot be read. How can I fix the problem? Thanks in advance.
Write to a file:
stream.open("configuration/KMeansModel.bin", std::ios::out | std::ios::binary);
stream.write((char *)& kmeans, sizeof(kmeans));
stream.close();
Read from the file:
KMeans::KMeans kmeans_(umapFeatureLabel_);
stream_.open("configuration/KMeansModel.bin", std::ios::in, std::ios::binary);
stream_.read((char *)& kmeans_, sizeof(kmeans_));
stream_.close();
Class definition:
class KMeans
{
private:
int m_K;
int m_iters;
int m_dimensions;
int m_total_features;
std::vector<Cluster> m_clusters;
std::unordered_map<std::string, std::string> m_umapFeatureLabel;
std::unordered_map<int, std::vector<std::vector<long double>>> m_umapClusterFeatureList;
int getNearestClusterId(Feature feature);
public:
KMeans::KMeans::KMeans();
KMeans(std::unordered_map<std::string, std::string>& umapFeatureLabel);
void run(std::vector<Feature>& allFeatures);
void predict(Feature feature);
void updateKMeans(std::vector<Feature>& allNewFeaturesRead);
std::string getLabelOfFeature(std::string feature);
};

The bad news:
Your file saving code uses the sizeof operator, and your data structure includes vector and map objects.
For example, as far as sizeof is concerned, a std::vector object has a small fixed size, absolutely regardless of the number of elements. On common 64-bit implementations that is 24 bytes: three 8-byte pointers, or equivalently a pointer to the actual elements plus the element count and the capacity.
Say your vector has 100 elements, 8 bytes per element, and the elements are stored starting at memory address 424000. The write method will dutifully store into the file the number 424000 and the bookkeeping values that encode the count of 100; but it will make absolutely no attempt to save into the file memory locations from 424000 to 424800. For it has no way to know that 424000 is a pointer; that's just a number.
Hence, the file does not contain the information that is necessary to restore the vector state.
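To make this concrete, here is a minimal sketch (the helper names rawSize and payloadSize are made up for illustration). It compares the number of bytes a raw write of the vector object would copy with the number of bytes the elements actually occupy:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: the number of bytes that a raw
// stream.write((char*)&v, sizeof(v)) would copy for this vector.
std::size_t rawSize(const std::vector<double>& v)
{
    return sizeof(v);   // size of the vector object itself, not of its elements
}

// Bytes actually needed to hold the elements, which live elsewhere on the heap.
std::size_t payloadSize(const std::vector<double>& v)
{
    return v.size() * sizeof(double);
}
```

For an empty vector and for one with 100 elements, rawSize returns the same value, while payloadSize grows with the element count. That gap is exactly what never reaches the file.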
As mentioned in the comments above, the subject of saving complex pointer-based data structures into simple byte arrays for the purpose of file storage or network transmission is known as serialization or marshalling/unmarshalling.
It is a non-obvious subject of its own, in the same way as sorting algorithms or matrix multiplication are non-obvious subjects. It would probably take you a lot of time to come up with a properly debugged solution of your own, one that takes care of maintaining consistency between the saving and restoring code, and so on.
The good news:
Serialization is a non-obvious subject, but it is also an old, well-known subject. So instead of painfully coming up with your own solution, you can rely on existing, publicly available code.
In similar fashion, the only situations where you would have to come up with your own matrix multiplication code are when:
you're doing it purely for fun and/or self-training
you are writing a PhD thesis on matrix multiplication
you're paid to write linear algebra code
Other than these, you would probably rely on, say, existing LAPACK code.
Regarding serialization, as hinted to by Botje in the comments above, the Boost web site provides a C++ serialization library, along with a suitable tutorial.
Sample code:
I am providing below a small code sample using the Boost library. A simple guinea pig object contains an integer value, a string and a map. Of course, I am shamelessly borrowing from the Boost tutorial.
We need to include a couple of header files:
#include <map>
#include <string>
#include <fstream>
#include <iostream>
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <boost/serialization/utility.hpp>
#include <boost/serialization/map.hpp>
The object class, which pretends to store some token geographical info:
class CapitalMap
{
public:
CapitalMap(const std::string& myName, int myVersion) :
_name(myName), _version(myVersion)
{};
CapitalMap() = default; // seems required by serialization
inline void add(const std::string& country, const std::string& city)
{ _cmap[country] = city; }
void fdump(std::ostream& fh);
private:
std::string _name;
int _version;
std::map<std::string, std::string> _cmap;
friend class boost::serialization::access; // ALLOW FOR FILE ARCHIVAL
template<class Archive>
void serialize(Archive& ar, const unsigned int version)
{
ar & _name;
ar & _version; // mind the name conflict with plain "version" argument
ar & _cmap;
}
};
A small debugging utility function:
void CapitalMap::fdump(std::ostream& ofh) // text dumping utility for debug
{
ofh << "CapitalMap name = \"" << _name << "\" version = " <<
_version << '\n';
for (const auto& pair : _cmap) {
auto country = pair.first; auto city = pair.second;
ofh << city << " is the capital of " << country << '\n';
}
}
Code to create the object, save it on disk, and (implicitly) deallocate it:
void buildAndSaveCapitalMap (const std::string& archiveName,
const std::string& mapName,
int version)
{
CapitalMap euroCapitals(mapName, version);
euroCapitals.add("Germany", "Berlin");
euroCapitals.add("France", "Paris");
euroCapitals.add("Spain", "Madrid");
euroCapitals.fdump(std::cout); // just for checking purposes
// save data to archive file:
std::ofstream ofs(archiveName);
boost::archive::text_oarchive oa(ofs);
oa << euroCapitals;
// ofstream connection closed automatically here
// archive object deleted here - because going out of scope
// CapitalMap object deleted here - because going out of scope
}
Small main program to create the file and then restore the object state from that file:
int main(int argc, char* argv[])
{
const std::string archiveName{"capitals.dat"};
std::cout << std::endl;
buildAndSaveCapitalMap(archiveName, "EuroCapitals", 42);
// go restore our CapitalMap object to its original state:
CapitalMap cm; // object created in its default state
std::ifstream ifs(archiveName);
boost::archive::text_iarchive inAr(ifs);
inAr >> cm; // read back object ...
std::cout << std::endl;
cm.fdump(std::cout); // check that it's actually back and in good shape ...
std::cout << std::endl;
return 0;
}
The problem of maintaining consistency between saving and restoring code is brilliantly solved by altering the meaning of operator “&” according to the direction of travel.
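To illustrate the idea without Boost, here is a toy sketch of my own (ToyOut, ToyIn and Point are hypothetical names, and this is not how Boost implements its archives). The same serialize function is instantiated once with an archive whose operator& writes, and once with an archive whose operator& reads:

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Toy "output archive" (not Boost): operator& writes a value to the stream.
struct ToyOut {
    std::ostream& os;
    template <class T>
    ToyOut& operator&(const T& value) { os << value << ' '; return *this; }
};

// Toy "input archive": the very same operator& reads a value instead.
struct ToyIn {
    std::istream& is;
    template <class T>
    ToyIn& operator&(T& value) { is >> value; return *this; }
};

struct Point {
    int x = 0;
    int y = 0;
    std::string tag;   // must not contain whitespace in this toy format

    // One member function describes the layout for both directions.
    template <class Archive>
    void serialize(Archive& ar) { ar & x & y & tag; }
};

// Save a Point to text, using the output archive.
std::string save(Point p)
{
    std::ostringstream out;
    ToyOut ar{out};
    p.serialize(ar);
    return out.str();
}

// Restore a Point from text, using the input archive.
Point load(const std::string& text)
{
    std::istringstream in(text);
    ToyIn ar{in};
    Point p;
    p.serialize(ar);
    return p;
}
```

Because the field list exists in only one place, the saving and restoring code cannot drift apart when a member is added or reordered.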
Minor problems along the way:
on a Linux distro, you need to get packages: boost, boost-devel, boost-serialization
it seems the object class needs to have a default constructor
you need to include files such as “boost/serialization/map.hpp” manually
Program execution:
$ g++ serialw00.cpp -lboost_serialization -o ./serialw00.x
$ ./serialw00.x
CapitalMap name = "EuroCapitals" version = 42
Paris is the capital of France
Berlin is the capital of Germany
Madrid is the capital of Spain
CapitalMap name = "EuroCapitals" version = 42
Paris is the capital of France
Berlin is the capital of Germany
Madrid is the capital of Spain
$
More details here: SO_q_523872

Related

Get variable value from its name as a string at runtime in C++

I'm trying to reproduce (*) something similar to Python fstring, or at least its format function (and while at it, I'd like to implement something like its "Mini-language").
(*) N.B.: please note that I am aware of the existence of the standard lib's format library, as well as the existence of the {fmt} library; but,
a: neither the g++ (11.2.1) nor the clang++ (12.0.1) that I have on my machine can compile code including <format>, and
b: I don't want to use the excellent {fmt} lib, because I'm precisely trying to do my own thing/thingy.
I'm going to use a string in input to my format object, and any number of additional arguments, like that:
// First, some vars
std::string stef{"Stéphane"};
std::string cpp{"C++"};
int ilu3t{3000};
// Then the big deal
std::string my_fstring = badabwe::format(
"My name is {stef}, and I love {cpp} {ilu3t} !",
cpp,
stef,
ilu3t
);
// Obviously, only the 1st parameter is positional!
// my_fstring should now be:
// My name is Stéphane, and I love C++ 3000 !
That's one of the first problems I have to solve. I think this process is called reflection (please let me know if that's the case).
Next, I need to handle a variable number of arguments; the 1st parameter is the only positional and mandatory one (I'm still trying to find a way to iterate over a parameter pack), but that's a subject for another question.
A function is not aware of the names of the arguments passed to it. An argument doesn't even have to have a name:
void foo(int x); // name of the argument is x
foo(42); // 42 has no name
As suggested in a comment, if you want some mapping between strings (the values to be substituted) and strings (their names), then you can use a map. To spare the caller from spelling out this mapping, you can use a macro (usually to be avoided, but for now it's the only way to get the name of a variable as a string):
#include <iostream>
#include <string>
#include <unordered_map>
using token_t = std::unordered_map<std::string,std::string>;
std::string format(const std::string& tokenized,const token_t& token) {
return "test";
}
#define tokenize(token) { #token , to_string(token) }
using std::to_string;
std::string to_string(const std::string& str) { return str; }
int main() {
std::string stef{"Stéphane"};
std::string cpp{"C++"};
int ilu3t{3000};
std::string my_fstring = format(
"My name is {stef}, and I love {cpp} {ilu3t} !",
{
tokenize(cpp),
tokenize(stef),
tokenize(ilu3t)
}
);
}
I assumed that you can use std::to_string, though there is no std::to_string(const std::string&) hence I added a custom implementation.
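The format function above is only a stub that returns "test". Here is a minimal sketch of one way to fill it in, assuming that a token is anything between a '{' and the next '}', and that unknown names should be left untouched (both are my assumptions, not requirements from the question):

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

using token_t = std::unordered_map<std::string, std::string>;

// Replace every "{name}" in tokenized by its value when the name is known;
// unknown names and unmatched braces are copied through unchanged.
std::string format(const std::string& tokenized, const token_t& token)
{
    std::string result;
    std::size_t pos = 0;
    while (pos < tokenized.size()) {
        const std::size_t open = tokenized.find('{', pos);
        if (open == std::string::npos) break;
        const std::size_t close = tokenized.find('}', open + 1);
        if (close == std::string::npos) break;
        result.append(tokenized, pos, open - pos);             // text before '{'
        const std::string name = tokenized.substr(open + 1, close - open - 1);
        const auto it = token.find(name);
        if (it != token.end())
            result += it->second;                              // substitute the value
        else
            result.append(tokenized, open, close - open + 1);  // keep "{name}" as is
        pos = close + 1;
    }
    result.append(tokenized, pos, std::string::npos);          // trailing text
    return result;
}
```

This does nothing about a format mini-language or escaped braces; it only shows the lookup step that the map makes possible.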

Is it possible to load/read shape_predictor_68_face_landmarks.dat at compile time?

I am trying to build a C++ application in Visual Studio using DLIB's face_landmark_detection_ex.cpp. The built application is run from the command prompt, with the trained model and the image file passed as arguments:
face_landmark_detection_ex.exe shape_predictor_68_face_landmarks.dat image.jpg
This shape_predictor_68_face_landmarks.dat is the trained model for 68 landmarks used to perform detection on the input image, and it needs to be loaded at run time every time to perform any detection. I am trying to do the following things:
Load this shape_predictor_68_face_landmarks.dat when building the application, i.e. at compile time.
Read this shape_predictor_68_face_landmarks.dat inside the code, so that my application does not take more memory every time it starts.
Is there any way to pack this file inside my application so that it takes less physical memory to run?
Update:
How can I store this shape_predictor_68_face_landmarks.dat file in a static buffer, so that shape_predictor can read from this buffer every time?
Yes, it's possible, but it depends on Visual Studio and is not cross-platform.
Create a resource file and include shape_predictor_68_face_landmarks.dat in your project. See https://msdn.microsoft.com/ru-ru/library/7zxb70x7.aspx for details. This makes the build put the file into your exe/dll.
Open the resource at run time and get a memory pointer: https://msdn.microsoft.com/en-us/library/windows/desktop/ee719660(v=vs.85).aspx
Create a memory stream (std::istream) from the pointer.
Deserialize from this stream with dlib::deserialize.
Here is a minimal example, but without the resource reading:
#include <string>
#include <iostream>
#include <fstream>
#include <vector>
#include <dlib/image_processing/shape_predictor.h>
struct membuf : std::streambuf {
membuf(char const* base, size_t size) {
char* p(const_cast<char*>(base));
this->setg(p, p, p + size);
}
};
struct imemstream : virtual membuf, std::istream {
imemstream(char const* base, size_t size)
: membuf(base, size)
, std::istream(static_cast<std::streambuf*>(this)) {
}
};
using namespace dlib; //its important to use namespace dlib for deserialize work correctly
using namespace std;
int main(int argc, const char* argv[])
{
const char* file_name = "shape_predictor_68_face_landmarks.dat";
ifstream fs(file_name, ios::binary | ios::ate);
streamsize size = fs.tellg();
fs.seekg(0, ios::beg);
std::vector<char> buffer(size);
if (fs.read(buffer.data(), size))
{
cout << "Successfully read " << size << " bytes from " << file_name << " into buffer" << endl;
imemstream stream(&buffer.front(), size); // here we are loading from memory buffer. you can change this line to use pointer from Resource
shape_predictor sp;
deserialize(sp, stream);
cout << "Deserialized shape_predictor" << endl;
}
else cout << "Failed to read " << file_name << " into buffer" << endl;
return 0;
}
And about memory usage.
First of all, you should know that shape_predictor::operator() is const, and the documentation says that it is safe to use one shape_predictor from different threads.
So you can create one shape_predictor at the start of the program and use it many times, even from different threads.
Next, putting the shape predictor inside a resource means it is loaded into RAM when the program starts, and deserializing it from the resource makes a copy of this memory, which leads to RAM usage overhead. If you need the minimal possible RAM usage, you should load it from a file.
And your last question: how to have it initialized by the compiler. There is no ready-to-use solution for that, but you can use the code from the deserialize function in shape_predictor.h and load it manually. I think this is a bad solution, because you will not get lower RAM usage compared to loading the file.
So my recommendation is to load one shape_predictor from a file and use it globally for all threads.
I know this is an old question, but a Visual Studio only solution would not have worked in my case since I am using dlib in Linux/macOS. Here is a Unix compatible solution that I came up with.
What I did was to use the xxd tool to convert the model file into the unsigned char [] representation of the file contents, write that into a custom header file, and use that inside deserialize (rather than read in the file during execution).
The following command would generate the header file for shape_predictor_68_face_landmarks.dat:
xxd -i shape_predictor_68_face_landmarks.dat > shape_predictor_68_face_landmarks.hpp
If you look inside shape_predictor_68_face_landmarks.hpp, there will be 2 variables: shape_predictor_68_face_landmarks_dat of type unsigned char [] containing the contents of the model file and shape_predictor_68_face_landmarks_dat_len of type unsigned int.
Inside your dlib driver code, you would do the following
...
#include "shape_predictor_68_face_landmarks.hpp"
...
shape_predictor sp;
std::stringstream landmarksstream;
landmarksstream.write((const char*)shape_predictor_68_face_landmarks_dat, shape_predictor_68_face_landmarks_dat_len);
deserialize(sp, landmarksstream);
A word of warning: be careful about opening files generated by xxd because they can be quite large and cause your text editor to crash.
I can't speak to the efficiency of this method, but it does allow for the model file to be "read in" at compile time rather than execution time.
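Note that std::stringstream::write copies the whole array into the stream. If that extra copy matters, one option is a read-only stream buffer that points straight at the embedded array, in the spirit of the membuf from the earlier answer. The sketch below is only an illustration under assumptions: demo_dat and demo_dat_len are a tiny stand-in for what xxd -i generates, firstLine is a made-up helper, and I have not measured this with the real model file.

```cpp
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>

// Stand-in for what `xxd -i` generates; real names derive from the file name.
unsigned char demo_dat[] = { 0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x0a };
unsigned int demo_dat_len = 6;

// Read-only stream buffer that points at existing memory instead of copying it.
struct membuf : std::streambuf {
    membuf(const unsigned char* base, std::size_t size) {
        char* p = const_cast<char*>(reinterpret_cast<const char*>(base));
        setg(p, p, p + size);   // get area = [p, p + size); nothing is ever written
    }
};

// Helper for the demo: read the first line through a std::istream on the array.
std::string firstLine()
{
    membuf buf(demo_dat, demo_dat_len);
    std::istream in(&buf);
    std::string line;
    std::getline(in, line);
    return line;
}
```

With dlib, the istream built this way would presumably be passed to deserialize(sp, in) in place of the stringstream.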

Using a std::stringstream from static member function for simplicity

I need an interface to write short messages to a log file. The messages often contain multiple parts, such as an identifier together with a value.
To do this I've created a class that handles a lot of minor stuff, such as creating filenames with timestamps and so on. I don't want to use a variable argument list (int nargs, ...), so I thought my best option was to pass a std::stringstream to the write function instead.
I want to be able to write these calls as one-liners without having to create a std::stringstream every time. Therefore I created a static member function that returns a stringstream object I can use with my write function, but for some reason it doesn't work.
MyClass.h
class MyClass {
public:
static std::stringstream& stream();
void write(std::ostream& datastream);
private:
static std::stringstream* _stringstream;
};
MyClass.cpp
std::stringstream* MyClass::_stringstream = new std::stringstream();
std::stringstream& MyClass::stream() {
MyClass::_stringstream->str(std::string());
return *MyClass::_stringstream;
}
void MyClass::write(std::string data) {
this->_fhandle << data << std::endl;
}
void MyClass::write(std::ostream& datastream) {
std::string data = dynamic_cast<std::ostringstream&>(datastream).str();
this->write(data);
}
main.cpp
MyClass* info = new MyClass();
info->write("Hello, world");
info->write(MyClass::stream() << "Lorem" << ", " << "ipsum");
info->write(MyClass::stream() << "dolor sit" << " amet");
The code compiles, but when executing the application I get a std::bad_cast exception...
That's because you are creating an std::stringstream, which
doesn't derive from an std::ostringstream. Just create an
std::ostringstream, and the bad_cast should disappear.
Having said that, reusing the std::ostringstream many times
like this is generally not a good idea; the iostream classes are
full of state, which will not be reset between each use. It's
better to create a new instance each time. (The classical
solution for this sort of thing is to create a copyable wrapper
class, which forwards to an std::ostream. An instance of this
is returned by info->write(), so you can write info->write() << "Hello, world" ....)
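A rough sketch of such a wrapper, under my own assumptions: LogLine and Logger are hypothetical names, the sink is any std::ostream (the question's class would use its file handle), and the message is emitted when the last copy of the wrapper is destroyed.

```cpp
#include <memory>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical wrapper: collects one message and emits it when the last copy dies.
class LogLine {
public:
    explicit LogLine(std::ostream& sink)
        : state_(std::make_shared<State>(sink)) {}

    // Forward anything streamable to the private buffer.
    template <class T>
    LogLine& operator<<(const T& value) {
        state_->buffer << value;
        return *this;
    }

private:
    struct State {
        explicit State(std::ostream& s) : sink(s) {}
        ~State() { sink << buffer.str() << '\n'; }   // one line per message
        std::ostream& sink;
        std::ostringstream buffer;                   // fresh stream state per message
    };
    std::shared_ptr<State> state_;   // makes LogLine cheap and safe to copy
};

// Hypothetical logger; the question's MyClass would hold a file stream instead.
class Logger {
public:
    explicit Logger(std::ostream& sink) : sink_(sink) {}
    LogLine write() { return LogLine(sink_); }
private:
    std::ostream& sink_;
};
```

A call then reads log.write() << "Lorem" << ", " << "ipsum"; and the line is written at the end of that statement, when the temporary goes away. No cast is involved, so the bad_cast cannot occur, and each message starts with clean stream state.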

C++ struct serialization

I'm implementing a data buffer which receives audio data packages via a procedure call (no network protocols, just two applications running on the same machine) from one application, puts them in a struct and writes them to a mapped file.
So the writer application may call my app's procedure, which would be something like void writeData (DataItem data, Timestamp ts), about 15 times a second, with each data item 2 MB in size.
My app shall store the data into a struct like
Struct DataItem
{
long id;
... Data;
Time insertTime;
}
and write it to a file for future reading purposes.
So since it's hard to save the struct to the file as it is, I think(?) I need to write it as binary. I'm not sure whether I need to use some kind of serialization, like Boost serialization, or not.
I also don't know how to align this data for memory-mapped files, or how to reconstruct the data when reading it back from the file.
I searched the internet but couldn't find many code examples. Any sample code would be highly appreciated.
By the way I'm using Windows 7 x64 embedded and Visual Studio 2008.
Thanks...
A common C++ way to serialize would be:
#include <fstream>
#include <iostream>
#include <string>
struct myStruct
{
int IntData;
float FloatData;
std::string StringData;
};
std::ostream& operator<<(std::ostream &os, const myStruct &myThing)
{
os
<< myThing.IntData << " "
<< myThing.FloatData << " "
<< myThing.StringData << " "
;
return os;
}
std::istream& operator>>(std::istream &is, myStruct &myThing)
{
is
>> myThing.IntData
>> myThing.FloatData
>> myThing.StringData;
return is;
}
void WriteThing()
{
myStruct myThing;
myThing.IntData = 42;
myThing.FloatData = 0.123;
myThing.StringData = "My_String_Test";
std::ofstream outFile;
outFile.open("myFile.txt");
outFile << myThing;
}
void ReadThing()
{
myStruct myThing;
std::ifstream inFile;
inFile.open("myFile.txt");
inFile >> myThing;
}
Please note:
std::string defines operators << and >>. Those will be called in the code above.
Streams treat whitespace characters as delimiters. Storing strings with blanks would require additional handling.
If you plan to keep your data through updates of your software, you must implement some sort of file versioning.
Refer to the docs of fstream to find out how to move the file pointer using seek etc. on a single large file.
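For the strings-with-blanks point, one possible form of that additional handling is std::quoted from <iomanip>. This is a sketch assuming a C++14 compiler (so not the Visual Studio 2008 mentioned in the question); Record and roundTrip are hypothetical names:

```cpp
#include <iomanip>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

struct Record {          // hypothetical struct, mirrors myStruct above
    int id = 0;
    std::string text;    // may contain blanks and quotes
};

std::ostream& operator<<(std::ostream& os, const Record& r)
{
    // std::quoted wraps the string in quotes and escapes embedded quotes.
    return os << r.id << ' ' << std::quoted(r.text);
}

std::istream& operator>>(std::istream& is, Record& r)
{
    // On input, std::quoted reads up to the closing quote, blanks included.
    return is >> r.id >> std::quoted(r.text);
}

// Write a record to text and read it back.
Record roundTrip(const Record& in)
{
    std::stringstream ss;
    ss << in;
    Record out;
    ss >> out;
    return out;
}
```

On older compilers, writing the string length before the characters is a common alternative.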
Use boost::serialization with a text archive.
It is the most "standard" way of achieving platform independence.
Optionally, you can put gzip compression on top of it.
Are you sure you are asking about C++ and not C#? Your code example looks like C#.
In C++, if your struct format is not going to change, then you can just write the array out to disk.
Here is an example as you requested, but this is really C 101 stuff:
FILE* output=fopen ("myfile", "wb");
fwrite (array, sizeof (mystruct), number_of_elements_in_array, output);
fclose (output);
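A caveat worth spelling out: writing raw bytes like this only makes sense when the struct is trivially copyable (no std::string, no vectors, no pointers), and the file is then tied to the compiler's layout and the machine's byte order. Here is a small sketch with a compile-time check; Sample and writeAndReadBack are hypothetical names, and a temporary file is used only to keep the example self-contained:

```cpp
#include <cstddef>
#include <cstdio>
#include <type_traits>
#include <vector>

struct Sample {          // hypothetical fixed-layout struct: no pointers, no std::string
    long id;
    float value;
};

// Raw fwrite/fread is only meaningful for trivially copyable types.
static_assert(std::is_trivially_copyable<Sample>::value,
              "Sample must be trivially copyable to be written as raw bytes");

// Write the array to a temporary file, rewind, and read it back.
std::vector<Sample> writeAndReadBack(const std::vector<Sample>& items)
{
    std::vector<Sample> result(items.size());
    std::FILE* f = std::tmpfile();
    if (!f) return {};
    const std::size_t written =
        std::fwrite(items.data(), sizeof(Sample), items.size(), f);
    std::rewind(f);
    const std::size_t got =
        std::fread(result.data(), sizeof(Sample), written, f);
    std::fclose(f);
    result.resize(got);   // keep only what was actually read
    return result;
}
```

If the static_assert fires for your struct, that is the signal to use real serialization instead.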

How to serialize an object to send over network

I'm trying to serialize objects to send over network through a socket using only STL. I'm not finding a way to keep objects' structure to be deserialized in the other host. I tried converting to string, to char* and I've spent a long time searching for tutorials on the internet and until now I have found nothing.
Is there a way to do it only with STL?
Are there any good tutorials?
I am almost trying boost, but if there is how to do it with STL I'd like to learn.
You can serialize with anything. All serialization means is that you are converting the object to bytes so that you can send it over a stream (like an std::ostream) and read it with another (like an std::istream). Just overload operator <<(std::ostream&, const T&) and operator >>(std::istream&, T&) where T is each of your types, and do the same for all the types contained in your types.
However, you should probably just use an already-existing library (Boost is pretty nice). There are tons of things that a library like Boost does for you, like byte-ordering, taking care of common objects (like arrays and all the stuff from the standard library), providing a consistent means of performing serialization and tons of other stuff.
My first question will be: do you want serialization or messaging ?
It might seem stupid at first, since you asked for serialization, but then I have always distinguished the two terms.
Serialization is about taking a snapshot of your memory and restoring it later on. Each object is represented as a separate entity (though they might be composed)
Messaging is about sending information from one point to another. The message usually has its own grammar and may not reflect the organization of your Business Model.
Too often I've seen people using Serialization where Messaging should have been used. It does not mean that Serialization is useless, but it does mean that you should think ahead. It's quite difficult to alter the BOM once you have decided to serialize it, especially if you decide to relocate some part of the information (move it from one object to another)... because how then are you going to decode the "old" serialized version?
Now that that's been cleared up...
... I will recommend Google's Protocol Buffers.
You could perfectly rewrite your own using the STL, but you would end up doing work that has already been done, and unless you wish to learn from it, it's quite pointless.
One great thing about protobuf is that it's language agnostic in a way: ie you can generate the encoder/decoder of a given message for C++, Java or Python. The use of Python is nice for message injection (testing) or message decoding (to check the output of a logged message). It's not something that would come easy were you to use the STL.
Serializing C++ Objects over a Network Socket
This is 6 years late but I just recently had this problem and this was one of the threads that I came across in my search on how to serialize object through a network socket in C++. This solution uses just 2 or 3 lines of code. There are a lot of answers that I found work but the easiest that I found was to use reinterpret_cast<obj*>(target) to convert the class or structure into an array of characters and feed it through the socket. Here's an example.
Class to be serialized:
/* myclass.h */
#ifndef MYCLASS_H
#define MYCLASS_H
class MyClass
{
public:
int A;
int B;
MyClass(){A=1;B=2;}
~MyClass(){}
};
#endif
Server Program:
/* server.cpp */
#include <iostream>
#include <string>
#include <unistd.h> // read()
#include "myclass.h"
int main (int argc, char** argv)
{
// Open socket connection.
// ...
// Loop continuously until terminated.
while(1)
{
// Read serialized data from socket.
char buf[sizeof(MyClass)];
read(newsockfd,buf, sizeof(MyClass));
MyClass *msg = reinterpret_cast<MyClass*>(buf);
std::cout << "A = " << std::to_string(msg->A) << std::endl;
std::cout << "B = " << std::to_string(msg->B) << std::endl;
}
// Close socket connection.
// ...
return 0;
}
Client Program:
/* client.cpp */
#include <cstdio>    // printf, fgets
#include <strings.h> // bzero
#include <unistd.h>  // write()
#include "myclass.h"
int main(int argc, char *argv[])
{
// Open socket connection.
// ...
while(1)
{
char buffer[256];
printf("Please enter the message: ");
bzero(buffer,256);
fgets(buffer,255,stdin);
MyClass msg;
msg.A = 1;
msg.B = 2;
// Write serialized data to socket.
char* tmp = reinterpret_cast<char*>(&msg);
write(sockfd,tmp, sizeof(MyClass));
}
// Close socket connection.
// ...
return 0;
}
Compile both server.cpp and client.cpp using g++ with -std=c++11 as an option. You can then open two terminals and run both programs, however, start the server program before the client so that it has something to connect to.
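Keep in mind that the reinterpret_cast approach only works when both ends use the same compiler settings, padding and byte order, and when the class contains no pointers, strings or containers. Also, read() may return fewer bytes than requested, which the code above does not check. A more portable variant encodes each field explicitly. The following is a sketch under assumptions: Msg is a hypothetical stand-in for MyClass with fixed-width fields, and the socket calls are left out.

```cpp
#include <array>
#include <cstdint>

struct Msg {             // hypothetical stand-in for MyClass with fixed-width fields
    std::int32_t A;
    std::int32_t B;
};

// Store a 32-bit value in big-endian ("network") order at out[0..3].
inline void putU32(unsigned char* out, std::uint32_t v)
{
    out[0] = static_cast<unsigned char>(v >> 24);
    out[1] = static_cast<unsigned char>(v >> 16);
    out[2] = static_cast<unsigned char>(v >> 8);
    out[3] = static_cast<unsigned char>(v);
}

// Inverse of putU32.
inline std::uint32_t getU32(const unsigned char* in)
{
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8)  |
            static_cast<std::uint32_t>(in[3]);
}

// Field-by-field encoding: 8 bytes on the wire, independent of padding and endianness.
std::array<unsigned char, 8> encode(const Msg& m)
{
    std::array<unsigned char, 8> buf{};
    putU32(buf.data(),     static_cast<std::uint32_t>(m.A));
    putU32(buf.data() + 4, static_cast<std::uint32_t>(m.B));
    return buf;
}

// Rebuild the message from the 8 wire bytes.
Msg decode(const std::array<unsigned char, 8>& buf)
{
    Msg m;
    m.A = static_cast<std::int32_t>(getU32(buf.data()));
    m.B = static_cast<std::int32_t>(getU32(buf.data() + 4));
    return m;
}
```

The sender would write the 8 bytes returned by encode, and the receiver would loop on read until all 8 bytes have arrived before calling decode.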
Hope this helps.
I got it!
I used stringstream to serialize objects, and I sent the result as a message using the stringstream's str() method and then the string's c_str().
Look.
class Object {
public:
int a;
string b;
void methodSample1 ();
void methodSample2 ();
friend ostream& operator<< (ostream& out, Object& object) {
out << object.a << " " << object.b; //The space (" ") is necessary to separate the elements
return out;
}
friend istream& operator>> (istream& in, Object& object) {
in >> object.a;
in >> object.b;
return in;
}
};
/* Server side */
int main () {
Object o;
stringstream ss;
o.a = 1;
o.b = "2"; //b is a string
ss << o; //serialize
string msg = ss.str();
write (socket, msg.c_str(), msg.size() + 1); //send the text and its terminating '\0'
}
/* Client side */
int main () {
Object o2;
stringstream ss2;
char buffer[20] = {0};
string temp;
read (socket, buffer, sizeof(buffer) - 1); //receive - the buffer size must be adjusted, it's a sample
temp.assign(buffer);
ss2 << temp;
ss2 >> o2; //unserialize
}
I'm not sure whether it is necessary to convert to a string before serializing (ss << o); maybe it is possible to work directly with the char buffer.
I think you should use Google Protocol Buffers in your project. For network transport, Protocol Buffers have many advantages over XML for serializing structured data. Protocol buffers:
are simpler
are 3 to 10 times smaller
are 20 to 100 times faster
are less ambiguous
generate data access classes that are easier to use programmatically
and so on. I think you should read https://developers.google.com/protocol-buffers/docs/overview about protobuf.