Splitting a file and passing the data on to other classes - c++

In my current project, I have a lot of binary files of different formats. Several of them act as simple archives, and therefore I am trying to come up with a good approach for passing extracted file data on to other classes.
Here's a simplified example of my current approach:
class Archive {
private:
std::istream &fs;
void Read();
public:
Archive(std::istream &fs); // Calls Read() automatically
~Archive();
const char* Get(int archiveIndex);
size_t GetSize(int archiveIndex);
};
class FileFormat {
private:
std::istream &fs;
void Read();
public:
FileFormat(std::istream &fs); // Calls Read() automatically
~FileFormat();
};
The Archive class basically parses the archive and reads the stored files into char pointers.
In order to load the first FileFormat file from an Archive, I would currently use the following code:
std::ifstream fs("somearchive.arc", std::ios::binary);
Archive arc(fs);
std::istringstream ss(std::string(arc.Get(0), arc.GetSize(0)), std::ios::binary);
FileFormat ff(ss);
(Note that some files in an archive could be additional archives but of a different format.)
When reading the binary data, I use a BinaryReader class with functions like these:
BinaryReader::BinaryReader(std::istream &fs) : fs(fs) {
}
char* BinaryReader::ReadBytes(unsigned int n) {
char* buffer = new char[n];
fs.read(buffer, n);
return buffer;
}
unsigned int BinaryReader::ReadUInt32() {
unsigned int buffer;
fs.read((char*)&buffer, sizeof(unsigned int));
return buffer;
}
I like the simplicity of this approach but I'm currently struggling with a lot of memory errors and SIGSEGVs and I'm afraid that it's because of this method. An example is when I create and read an archive repeatedly in a loop. It works for a large number of iterations, but after a while, it starts reading junk data instead.
My question to you is if this approach is feasible (in which case I ask what I am doing wrong), and if not, what better approaches are there?

The flaws of code in the OP are:
You are allocating heap memory and returning a pointer to it from one of your functions. This may lead to memory leaks. You may have no problem with leaks for now, but you must keep this in mind while designing your classes.
When dealing with the Archive and FileFormat classes, the user always has to take the internal structure of your archive into account. This undermines the idea of data encapsulation.
When a user of your class framework creates an Archive object, they only get a way to extract a pointer to some raw data. The user must then pass this raw data to a completely independent class. You will also have more than one kind of FileFormat. Even without the need to watch for leaky heap allocations, dealing with such a system will be highly error-prone.
Let's try to apply some OOP principles to the task. Your Archive object is a container of files of different formats. So an Archive's equivalent of Get() should generally return File objects, not a pointer to raw data:
// We need a way to store the file type in the archive index
enum TFileType { BYTE_FILE, UINT32_FILE, /*...*/ };
class BaseFile {
public:
virtual ~BaseFile() {} // Archive deletes files through BaseFile*, so the destructor must be virtual
virtual TFileType GetFileType() const = 0;
/* Your abstract interface here */
};
class ByteFile : public BaseFile {
public:
ByteFile(istream &fs);
virtual ~ByteFile();
virtual TFileType GetFileType() const
{ return BYTE_FILE; }
unsigned char GetByte(size_t index);
protected:
/* implementation of data storage and reading procedures */
};
class UInt32File : public BaseFile {
public:
UInt32File(istream &fs);
virtual ~UInt32File();
virtual TFileType GetFileType() const
{ return UINT32_FILE; }
uint32_t GetUInt32(size_t index);
protected:
/* implementation of data storage and reading procedures */
};
class Archive {
public:
Archive(const char* filename);
~Archive();
BaseFile* Get(int archiveIndex)
{ return (m_Files.at(archiveIndex)); }
/* ... */
protected:
vector<BaseFile*> m_Files;
};
Archive::Archive(const char* filename)
{
ifstream fs(filename, ios::binary); // binary mode, since the archive holds raw bytes
//Here we need to:
//1. Read archive index
//2. For each file in index do something like:
switch(CurrentFileType) {
case BYTE_FILE:
m_Files.push_back(new ByteFile(fs));
break;
case UINT32_FILE:
m_Files.push_back(new UInt32File(fs));
break;
//.....
}
}
Archive::~Archive()
{
for(size_t i = 0; i < m_Files.size(); ++i)
delete m_Files[i];
}
int main(int argc, char** argv)
{
Archive arch("somearchive.arc");
BaseFile* pbf;
ByteFile* pByteFile;
pbf = arch.Get(0);
//Here we can use GetFileType() or typeid to make a proper cast
//An example of former:
switch ( pbf->GetFileType() ) {
case BYTE_FILE:
pByteFile = dynamic_cast<ByteFile*>(pbf);
ASSERT(pByteFile != 0 );
//Working with byte data
break;
/*...*/
}
//alternatively you may omit GetFileType() and rely solely on C++
//typeid-related stuff
}
That's just a general idea of classes that may simplify the use of archives in your application.
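The manual delete loop in the destructor above can be avoided by letting the container own the files. The following is only a minimal sketch of that idea using std::unique_ptr (C++11), not the answer's original code: the AddFile and Count helpers and the trivial ByteFile body are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

enum TFileType { BYTE_FILE, UINT32_FILE };

class BaseFile {
public:
    virtual ~BaseFile() {}                      // objects are destroyed through BaseFile*
    virtual TFileType GetFileType() const = 0;
};

// Hypothetical minimal file type that simply keeps its bytes.
class ByteFile : public BaseFile {
public:
    explicit ByteFile(std::vector<unsigned char> bytes) : m_bytes(std::move(bytes)) {}
    virtual TFileType GetFileType() const { return BYTE_FILE; }
    unsigned char GetByte(std::size_t index) const { return m_bytes.at(index); }
private:
    std::vector<unsigned char> m_bytes;
};

class Archive {
public:
    // Hypothetical helper: the archive takes ownership of the file.
    void AddFile(std::unique_ptr<BaseFile> file) { m_files.push_back(std::move(file)); }
    // Non-owning access; the pointer stays valid while the Archive is alive.
    BaseFile* Get(std::size_t index) const { return m_files.at(index).get(); }
    std::size_t Count() const { return m_files.size(); }
private:
    std::vector<std::unique_ptr<BaseFile>> m_files;  // no hand-written destructor needed
};

// Usage example.
inline void archive_example() {
    Archive arch;
    std::vector<unsigned char> bytes;
    bytes.push_back(0x2A);
    arch.AddFile(std::unique_ptr<BaseFile>(new ByteFile(bytes)));
    BaseFile* file = arch.Get(0);
    assert(file->GetFileType() == BYTE_FILE);
    ByteFile* byteFile = dynamic_cast<ByteFile*>(file);
    assert(byteFile != 0 && byteFile->GetByte(0) == 0x2A);
}
```

With this layout the files are released automatically when the Archive goes out of scope, and callers never receive a pointer they are expected to delete.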
Keep in mind, though, that good class design may help with preventing memory leaks, clarifying code and so on, but whatever classes you have, you will still have to deal with binary data storage problems. For example, if your archive stores 64 bytes of byte data followed by 8 uint32s and you somehow read 65 bytes instead of 64, the following ints will be read as junk. You may also encounter alignment and endianness problems (the latter matters if your applications are supposed to run on several platforms). Still, good class design may help you produce better code that addresses such problems.
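As a rough illustration of the endianness point, an integer can be assembled from individual bytes instead of being read straight into an unsigned int. This sketch assumes the file stores 32-bit values in little-endian order, which the question does not state; ReadUInt32LE is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Reads a 32-bit unsigned integer stored in little-endian byte order
// (an assumed file layout). Returns false and leaves `value` untouched
// if fewer than 4 bytes were available.
inline bool ReadUInt32LE(std::istream& in, std::uint32_t& value) {
    unsigned char bytes[4];
    if (!in.read(reinterpret_cast<char*>(bytes), sizeof(bytes))) {
        return false;
    }
    value = static_cast<std::uint32_t>(bytes[0])
          | (static_cast<std::uint32_t>(bytes[1]) << 8)
          | (static_cast<std::uint32_t>(bytes[2]) << 16)
          | (static_cast<std::uint32_t>(bytes[3]) << 24);
    return true;
}

// Usage example: four bytes of data followed by one stray byte.
inline void endian_example() {
    std::istringstream in(std::string("\x78\x56\x34\x12\xFF", 5), std::ios::binary);
    std::uint32_t value = 0;
    assert(ReadUInt32LE(in, value) && value == 0x12345678u);
    assert(!ReadUInt32LE(in, value));   // only one byte left: the short read is reported
    assert(value == 0x12345678u);       // the previous value is preserved
}
```

Because the result is built with shifts, it does not depend on the byte order or alignment rules of the machine running the code.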

It is asking for trouble to pass a pointer from your function and expect the user to know to delete it, unless the function name is such that it is obvious to do so, e.g. a function that begins with the word create.
So
Foo * createFoo();
is likely to be a function that creates an object that the user must delete.
A preferable solution would, for starters, be to return std::vector<char> or allow the user to pass std::vector<char> & to your function and you write the bytes into it, setting its size if necessary. (This is more efficient if doing multiple reads where you can reuse the same buffer).
You should also learn const-correctness.
As for your "after a while it fills with junk", where do you check for end of file?
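A minimal sketch of the two suggestions above, assuming a reworked BinaryReader (the signature below is not the asker's): the caller passes a reusable std::vector<char>, and every read reports whether the stream actually delivered the requested bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

class BinaryReader {
public:
    explicit BinaryReader(std::istream& in) : m_in(in) {}

    // Fills `out` with exactly n bytes. On a short read (for example at
    // end of file) it returns false and leaves `out` empty, so junk is
    // never handed to the caller.
    bool ReadBytes(std::size_t n, std::vector<char>& out) {
        out.resize(n);
        if (n != 0 && !m_in.read(out.data(), static_cast<std::streamsize>(n))) {
            out.clear();
            return false;
        }
        return true;
    }

private:
    std::istream& m_in;
};

// Usage example: six bytes available, so the second 4-byte read must fail.
inline void reader_example() {
    std::istringstream in(std::string("abcdef"), std::ios::binary);
    BinaryReader reader(in);
    std::vector<char> buffer;               // reused across reads
    assert(reader.ReadBytes(4, buffer));
    assert(std::string(buffer.begin(), buffer.end()) == "abcd");
    assert(!reader.ReadBytes(4, buffer));   // only 2 bytes left
    assert(buffer.empty());
}
```

Nothing is allocated with new here, so there is nothing for the caller to delete, and a failed read is visible at the call site instead of surfacing later as corrupted data.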

Related

Efficient way of storing a large amount of character data between transactions in C++

For our application we have the following scenario:
Firstly, we get a large amount of data (in some cases this can be more than 100MB) through a 3rd party API into our class via a constructor, like:
class DataInputer
{
public:
DataInputer(int id, const std::string& data) : m_id(id), m_data(data) {}
int handle() { /* Do some stuff */ }
private:
int m_id; // initialized from the int constructor argument
std::string m_data;
};
The chain of invocation going into our class DataInputer looks like:
int dataInputHandler()
{
std::string inputStringFromThirdParty = GiveMeStringFrom3rdPartyMagic(); // <- 1.
int inputIntFromThirdParty = GiveMeIntFrom3rdPartyMagic();
return DataInputer(inputIntFromThirdParty, inputStringFromThirdParty).handle();
}
We have some control over how dataInputHandler handles its string (the line marked with 1. is where the string is created as an actual object), but no control over what GiveMeStringFrom3rdPartyMagic actually uses to provide it (if it matters to anyone, this data arrives from somewhere over a network connection). As a consolation, we have full control over the DataInputer class.
Now, what the application is supposedly doing is to hold on to the string and the associated integer ID till a later point when it can send to another component (via a different network connection) provided the component provides a valid ID (this is the short description). The problem is that we can't (don't want to) do it in the handle method of the DataInputer class, it would block it for an unknown amount of time.
As a rudimentary solution, we were thinking on creating an "in-memory" string store for all the various strings that will come in from all the various network clients, where the bottom line consists of a:
std::map<int, std::string> idStringStore;
where the int identifies the id of the string, the string is the actual data, and DataInputer::handle does something like idStringStore.emplace(m_id, m_data);
The problem is that unnecessarily copying a string which is on the size of 100s of megabytes can be a very time consuming process, so I would like to ask the community if they have any recommendations or best practices for scenarios like this.
An important mention: we are bound to C++11 for now :(
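Use move semantics to pass the 3rd-party data into your DataInputer constructor. The std::move in the initializer is required: inside the constructor, data is a named rvalue reference and therefore an lvalue, so without std::move the string would be copied: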
class DataInputer
{
public:
DataInputer(int id, std::string&& data) : m_id(id), m_data(std::move(data)) {}
int handle() { /* Do some stuff */ }
private:
int m_id; // initialized from the int constructor argument
std::string m_data;
};
And pass GiveMeStringFrom3rdPartyMagic() directly as an argument to the constructor without first copying into inputStringFromThirdParty.
int dataInputHandler()
{
int inputIntFromThirdParty = GiveMeIntFrom3rdPartyMagic();
return DataInputer(inputIntFromThirdParty, GiveMeStringFrom3rdPartyMagic()).handle();
}
Of course, you can use a std::map or any other STL container that supports move-semantics. The point is that move-semantics, generally, is what you're looking to use to avoid needless copies.
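To carry the same idea through to the store mentioned in the question, handle() can move the string into the map rather than copy it. This is a sketch under assumptions: the IdStringStore typedef, passing the store by reference and the 0/1 return codes are illustrative choices, not part of the original code. It only uses C++11 features.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical in-memory store keyed by ID, as described in the question.
typedef std::map<int, std::string> IdStringStore;

class DataInputer {
public:
    DataInputer(int id, std::string&& data, IdStringStore& store)
        : m_id(id), m_data(std::move(data)), m_store(store) {}

    // Hands the buffer over to the store. After this call m_data must not
    // be relied upon (it is in a valid but unspecified state).
    // Returns 0 on success, 1 if the ID was already present.
    int handle() {
        return m_store.emplace(m_id, std::move(m_data)).second ? 0 : 1;
    }

private:
    int m_id;
    std::string m_data;
    IdStringStore& m_store;
};

// Usage example.
inline void store_example() {
    IdStringStore store;
    std::string big(1000, 'x');
    assert(DataInputer(7, std::move(big), store).handle() == 0);
    assert(store.at(7).size() == 1000);
    // A second insert with the same ID is rejected and the stored value is kept.
    assert(DataInputer(7, std::string("dup"), store).handle() == 1);
    assert(store.at(7).size() == 1000);
}
```

If several threads feed the same store, access to the map would additionally need synchronization, which this sketch leaves out.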

C++ object that modifies itself in memory

A friend of mine and some other guy have written the following code that according to my C++ knowledge should be very dangerous:
Recovery& Recovery::LoadRecoverFile() {
fstream File("files/recover.dat", ios::in | ios::binary);
if (File.is_open()) {
while (!File.eof()) {
File.read(reinterpret_cast<char*>(this), sizeof(Recovery)); // <----- ?
}
}
File.close();
return *this; // <----- ?
}
Could you give me your opinion why this is bad and how should it be done correctly?
They basically write an object of class Recovery to a file and when required they read it in with the above method.
Edit:
Just to give some additional information about the code. This is what class Recovery contains.
class Recovery {
public:
Recovery();
virtual ~Recovery();
void createRecoverFile();
void saveRecoverFile( int level, int win, int credit, gameStates state, int clicks );
Recovery& LoadRecoverFile();
const vector<Card>& getRecoverCards() const;
void setRecoverCards(const vector<Card>& recoverCards);
int getRecoverClicks() const;
void setRecoverClicks(int recoverClicks);
int getRecoverCredit() const;
void setRecoverCredit(int recoverCredit);
int getRecoverLevel() const;
void setRecoverLevel(int recoverLevel);
gameStates getRecoverState() const;
void setRecoverState(gameStates recoverState);
int getRecoverTime() const;
void setRecoverTime(int recoverTime);
int getRecoverWin() const;
void setRecoverWin(int recoverWin);
private:
int m_RecoverLevel;
int m_RecoverCredit;
gameStates m_RecoverState;
};
This saves the object to a file:
void Recovery::saveRecoverFile(int level, int win, int credit, gameStates state,
int clicks) {
m_RecoverLevel = level;
m_RecoverCredit = credit;
m_RecoverState = state;
ofstream newFile("files/recover.dat", ios::binary | ios::out);
if (newFile.is_open()) {
newFile.write(reinterpret_cast<char*>(this), sizeof(Recovery));
}
newFile.close();
}
That's how it is used:
m_Recovery.LoadRecoverFile();
credit.IntToTextMessage(m_Recovery.getRecoverCredit());
level.IntToTextMessage(m_Recovery.getRecoverLevel());
m_cardLogic.setTempLevel(m_Recovery.getRecoverLevel());
Timer::g_Timer->StartTimer(m_Recovery.getRecoverLevel() + 3);
It probably is undefined behavior (unless Recovery is a POD made only of scalar fields).
It probably won't work if the Recovery class has a vtable, unless perhaps the process which is reading is the same process which wrote it. Vtables contain function pointers (usually, addresses of some machine code). And these function pointers would vary from one process to another one (even if they are running the same binary), e.g. because of ASLR.
It also won't work if Recovery contains other objects (e.g. a std::vector<std::shared_ptr<Recovery>> ...., or your gameStates), because these sub-objects won't be constructed correctly.
It could work sometimes. But what you apparently are looking for is serialization (then I would suggest using a textual format like JSON, but see also libs11n) or application checkpointing. You should design your application with those goals from the very start.
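For the three fields shown in the question, a textual format can be very small. The following is a hedged sketch rather than a full serialization scheme: the RecoveryData struct, the SaveRecovery/LoadRecovery names and the whitespace-separated layout are assumptions, and it uses plain iostreams instead of JSON.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>

enum gameStates {
    MENU, STARTGAME, GAMEOVER, RECOVERY, RULES_OF_GAMES, VIEW_CARDS, STATISTIC
};

// Hypothetical plain data holder for the values that need to survive a restart.
struct RecoveryData {
    int level;
    int credit;
    gameStates state;
};

// Writes the fields as whitespace-separated decimal text.
inline bool SaveRecovery(std::ostream& out, const RecoveryData& rec) {
    out << rec.level << ' ' << rec.credit << ' ' << static_cast<int>(rec.state) << '\n';
    return static_cast<bool>(out);
}

// Parses into temporaries first; `rec` is modified only if every field is valid.
inline bool LoadRecovery(std::istream& in, RecoveryData& rec) {
    int level = 0, credit = 0, state = 0;
    if (!(in >> level >> credit >> state)) return false;
    if (state < MENU || state > STATISTIC) return false;   // reject unknown states
    rec.level = level;
    rec.credit = credit;
    rec.state = static_cast<gameStates>(state);
    return true;
}

// Usage example with an in-memory stream standing in for the file.
inline void recovery_text_example() {
    RecoveryData saved = {3, 250, GAMEOVER};
    std::stringstream file;
    assert(SaveRecovery(file, saved));
    assert(file.str() == "3 250 2\n");
    RecoveryData loaded = {0, 0, MENU};
    assert(LoadRecovery(file, loaded));
    assert(loaded.level == 3 && loaded.credit == 250 && loaded.state == GAMEOVER);
    std::istringstream broken("3 oops");
    assert(!LoadRecovery(broken, loaded));
    assert(loaded.level == 3);   // a failed load leaves the object untouched
}
```

The same two functions work with std::ofstream and std::ifstream, and the resulting file does not depend on the compiler's object layout or on any vtable pointer.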
It really depends on what a Recovery object contains. If it contains pointers to data, open resource descriptors and things like that you will not be able to store those on a file in a meaningful way. Restoring a pointer in this way may set its value, but the value it pointed will most certainly not be where you expect it to be anymore.
If Recovery is a POD this should work.
You may want to look at this question and this other question, which are similar to yours.
As Galik correctly points out, using
while (!File.eof()) {
doesn't make much sense. Instead, you should use
if ( File.read(/* etc etc */) ) {
// Object restored successfully.
}
else {
// Revert changes and signal that object was not loaded.
}
The caller of the function needs to have a way to know if the loading was successful. The method is already a member function, so a better definition could be:
/* Returns true if the file was read successfully, false otherwise.
* If reading fails the previous state of the object is not modified.
*/
bool Recovery::LoadRecoverFile(const std::string & filename);
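The contract in that comment can be sketched as follows, assuming the persistent data is kept in a separate plain struct (RecoveryState is a hypothetical name) and that streams are passed in instead of a filename so the example stays self-contained. The static_assert makes the "only for POD-like types" condition a compile-time check.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>

enum gameStates {
    MENU, STARTGAME, GAMEOVER, RECOVERY, RULES_OF_GAMES, VIEW_CARDS, STATISTIC
};

// Hypothetical plain struct holding only the data that is written to disk.
struct RecoveryState {
    int level;
    int credit;
    gameStates state;
};

// Raw byte I/O is only meaningful for trivially copyable types
// (no virtual functions, no std::vector members, and so on).
static_assert(std::is_trivially_copyable<RecoveryState>::value,
              "RecoveryState must be trivially copyable for raw I/O");

inline bool WriteRecoveryBinary(std::ostream& out, const RecoveryState& state) {
    out.write(reinterpret_cast<const char*>(&state), sizeof(state));
    return static_cast<bool>(out);
}

// Returns true on success. On failure `state` keeps its previous value,
// because the bytes are read into a temporary first.
inline bool ReadRecoveryBinary(std::istream& in, RecoveryState& state) {
    RecoveryState temp;
    if (!in.read(reinterpret_cast<char*>(&temp), sizeof(temp))) {
        return false;
    }
    state = temp;
    return true;
}

// Usage example with in-memory streams standing in for the file.
inline void recovery_binary_example() {
    RecoveryState saved = {5, 120, STARTGAME};
    std::stringstream file(std::ios::in | std::ios::out | std::ios::binary);
    assert(WriteRecoveryBinary(file, saved));
    RecoveryState loaded = {0, 0, MENU};
    assert(ReadRecoveryBinary(file, loaded));
    assert(loaded.level == 5 && loaded.credit == 120 && loaded.state == STARTGAME);
    std::istringstream truncated(std::string("\x01\x02", 2), std::ios::binary);
    assert(!ReadRecoveryBinary(truncated, loaded));
    assert(loaded.level == 5);   // unchanged after the failed read
}
```

A file written this way is still tied to the compiler, platform and struct layout that produced it, so it is suitable for a local save file at best.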
Personally I would recommend storing the game state in text format rather than binary. Binary data like this is non-portable, sometimes even between different versions of the same compiler on the same computer or even using different compiler configuration options.
That being said, whether or not you go the binary route, the main problem I see with the code is the lack of error checking. And the whole idea of getting a Recovery object to hoist itself by its own petard makes error checking very difficult.
I have knocked up something I think is more robust. I don't know the proper program structure you are using so this probably won't match what you need. But it may serve as an example of how this can be approached.
Most importantly always check for errors, report them where appropriate and return them to the caller.
enum gameStates
{
MENU, STARTGAME, GAMEOVER, RECOVERY, RULES_OF_GAMES, VIEW_CARDS, STATISTIC
};
const std::string RECOVER_FILE = "files/recover.dat";
struct Recovery
{
int m_RecoverLevel;
int m_RecoverCredit;
gameStates m_RecoverState;
};
struct WhateverClass
{
Recovery m_Recovery;
bool LoadRecoverFile(Recovery& rec);
public:
bool recover();
};
// Supply the Recover object to be restored and
// return true or false to know it succeeded or not
bool WhateverClass::LoadRecoverFile(Recovery& rec)
{
std::ifstream file(RECOVER_FILE, std::ios::binary);
if(!file.is_open())
{
log("ERROR: opening the recovery file: " << RECOVER_FILE);
return false;
}
if(!file.read(reinterpret_cast<char*>(&rec), sizeof(Recovery)))
{
log("ERROR: reading from recovery file: " << RECOVER_FILE);
return false;
}
return true;
}
bool WhateverClass::recover()
{
if(!LoadRecoverFile(m_Recovery))
return false;
credit.IntToTextMessage(m_Recovery.m_RecoverCredit); // Recovery is a plain struct here, so use its fields directly
level.IntToTextMessage(m_Recovery.m_RecoverLevel);
m_cardLogic.setTempLevel(m_Recovery.m_RecoverLevel);
Timer::g_Timer->StartTimer(m_Recovery.m_RecoverLevel + 3);
return true;
}
Hope this helps.
Hi everyone. Actually, the StateManager class contains only integers (gameStates is an enum):
#ifndef STATEMANAGER_H_
#define STATEMANAGER_H_
enum gameStates {
MENU, STARTGAME, GAMEOVER, RECOVERY, RULES_OF_GAMES, VIEW_CARDS, STATISTIC
};
class StateManager {
public:
static StateManager* stateMachine;
StateManager();
virtual ~StateManager();
gameStates getCurrentGameStates() const;
void setCurrentGameStates(gameStates currentGameStates);
private:
gameStates m_currentGameStates;
};
#endif /* STATEMANAGER_H_ */

C++ Custom Binary Resource File

I have spent countless hours searching for information about a topic like this. I am writing my own custom game engine for fun using SDL in C++. I'm trying to create a custom binary file which will manage my in game resources. So far I've not been able to get vectors to play nice when it comes to storing each 'type' of object I place in the file. So I dropped the idea of using vectors and went to arrays. I have both examples below where I use both a vector or an array. So, first I create a header for the file. Here is the struct:
struct Header
{
const char* name; // Name of Header file
float version; // Resource version number
int numberOfObjects;
int headerSize; // The size of the header
};
Then after creating the header, I have another struct which defines how an object is stored in memory. Here it is:
struct ObjectData{
int id;
int size;
const char* name;
// std::vector<char> data; // Does not work very well
// unsigned char* data; // Also did not
// Also does not work, because I do not know the size yet until I have the data.
// char data[]
};
The major issue with this struct is that the vector does not play well, an unsigned char pointer kept giving me issues, and an array of char data (for hexadecimal storage) was not working because my compiler does not like variable arrays.
The final struct is my resource file structure.
struct ResourceFile
{
Header header;
int objectCount;
// Again, vectors giving me issues because of how they are constructed internally
// std::vector<ObjectData> objectList;
// Below does not work because, again, no variable data types;
// ObjectData objects[header.numberOfObjects]
};
My goal is to be able to write out a single struct to a binary file. Like so:
Header header;
header.name = "Resources.bin";
header.version = 1.0f;
header.headerSize = sizeof(header);
//vector<char> Object1 = ByteReader::LoadFile("D:\\TEST_FOLDER\\test.obj");
//vector<char> Object2 = ByteReader::LoadFile("D:\\TEST_FOLDER\\test.obj");
ObjectData cube;
cube.id = 0;
cube.name = "Evil Cubie";
cube.data = ByteReader::LoadFile("D:\\TEST_FOLDER\\test.obj");
cube.size = sizeof(cube.id) + sizeof(cube.name) + cube.data.size();
ofstream resourceFile("D:\\TEST_FOLDER\\Resources.bin", ios::out|ios::app|ios::binary);
resourceFile << header.name << header.version << header.headerSize;;
resourceFile << cube.id << cube.name << cube.size;
for each (char ch in cube.data)
{
resourceFile << ch;
}
resourceFile.close();
/*
ObjectData cube2;
cube.id = 1;
cube.name = "Ugle Cubie";
for each (char ch in Object1)
{
cube.object.push_back(ch);
}
*/
//resourceFile.data.push_back(cube);
//resourceFile.data.push_back(cube2);
//resourceFile.header.numberOfObjects = resourceFile.data.size();
//FILE* dat = fopen(filename, "wb");
//fwrite(&resourceFile, sizeof(resourceFile), 1, dat); // <-- write to resource file
//fclose(dat);
As you noticed above, I tried two different ways. The first way I tried it was using good old fwrite. The second way was not even writing it in binary even though I told the computer to do so through the flags accepted by ofstream.
My goal was to get the code to work fluently like this:
ResourceFile resourceFile;
resourceFile.header.name = "Resources.bin";
resourceFile.header.version = 1;
resourceFile.header.numberOfObjects = 2;
resourceFile.header.headerSize = sizeof(resourceFile.header);
ObjectData cube;
ObjectData cube2;
resourceFile.data.push_back(cube);
resourceFile.data.push_back(cube2);
resourceFile.header.numberOfObjects = resourceFile.data.size();
FILE* dat = fopen(filename, "wb");
fwrite(&resourceFile, sizeof(resourceFile), 1, dat); // <-- write to resource file
fclose(dat);
Still no cigar. Does anyone have any pointers (no pun intended) or a proper example of a resource manager?
This is one of the things I specialize in, so here you go. There is a whole school of programming around this, but the basic rules I follow are:
1) Use FIXED-LENGTH structures for things with a "constant" layout.
These are things like the flag bits of the file, bytes indicating the # of sub-records, etc. Put as much of the file contents into these structures as you can- they are very efficient especially when combined with a good I/O system.
You do this using the preprocessor directive "#pragma pack(1)", which packs a struct's members on byte boundaries with no padding between them:
#ifdef WINDOWS
#pragma pack(push)
#endif
#pragma pack(1)
struct FixedSizeHeader {
uint32 FLAG_BYTES[1]; // All members are fixed-size arrays for a reason
char NAME[20];
};
#ifdef WINDOWS
#pragma pack(pop)
#endif
#ifdef LINUX
#pragma pack()
#endif
2) Create a base class, a pure interface with a name like "Serializable". It is your high-level API for staging entire file objects into and out of raw memory.
class Serializable { // Yes, the name comes from Java. The idea, however, predates it
public:
// Choose your buffer type- char[], std::string, custom
virtual bool WriteToBinary(char* buffer) const = 0;
};
NOTE: To support a static "Load" you will need all your "Serializable"s to have an additional static function. There are several (very different) ways to support that, none of which the language alone will enforce since C++ doesn't have "virtual static".
3) Create your aggregate classes for managing each file type. They should have the same name as the file type. Depending on file structure, each may in turn contain more "aggregator" classes before you get down to the fixed structures.
Here's an example:
class GameResourceFile : public Serializable
{
private:
// Operator= and the copy ctor should point to the same data for files,
// since that is what you get with FILE*
protected:
// Actual member variables- allows specialized (derived) file types direct access
FixedSizeHeader* hdr; // You don't have to use pointers here
ContentManager* innards; // Another aggregator- implements "Serializable"
GameResourceFile(FixedSizeHeader* hdr, ContentManager* innards)
: hdr(hdr), innards(innards) {}
virtual ~GameResourceFile() { delete hdr; delete innards; }
public:
virtual bool WriteToBinary(char* outBuffer) const
{
// For fixed portions, use this
memcpy(outBuffer, hdr, sizeof(FixedSizeHeader)); // This is why we 'pack'
outBuffer += sizeof(FixedSizeHeader); // Improve safety...
return innards->WriteToBinary(outBuffer);
}
// C++ doesn't enforce this, but you can via convention
static GameResourceFile* Load(const char* filename)
{
// Load file into a buffer- You'll want your own code here
// Now that's done, we have a buffer
char* srcContents;
FixedSizeHeader* hdr = new FixedSizeHeader();
memcpy(hdr, srcContents, sizeof(FixedSizeHeader));
srcContents += sizeof(FixedSizeHeader);
ContentManager* innards = ContentManager::Load( srcContents); // NOT the file
if(!innards) {
return 0;
}
return new GameResourceFile(hdr, innards);
}
};
Notice how this works- each piece is responsible for serializing itself into the buffer, until we get to "primitive" structures that we can add via memcpy() (you can make ALL the components 'Serializable' classes). If any piece fails to add, the call returns "false" and you can abort.
I STRONGLY recommend using a pattern like "referenced object" to avoid the memory management issues. However, even if you don't you now provide users a nice, one-stop shopping method to load data objects from files:
GameResourceFile* resource = GameResourceFile::Load("myfile.game");
if(!resource) { // Houston, we have a problem
return -1;
}
The best thing yet is to add all low-level manipulation and retrieval APIs for that kind of data to "GameResourceFile". Then any low-level state machine coordination for committing changes to disk & such is all localized to 1 object.
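Applied to the asker's ObjectData, the variable-length parts (the name and the data) can be written with a length prefix in front of each. The following is a minimal sketch of that technique, not the answer's code: ObjectRecord, the helper names and the record layout are assumptions, and integers are copied in host byte order for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical in-memory form of one resource.
struct ObjectRecord {
    std::uint32_t id;
    std::string name;
    std::vector<char> data;
};

// Appends a 32-bit value in host byte order (a portable format would fix the order).
inline void AppendUInt32(std::vector<char>& out, std::uint32_t value) {
    const char* p = reinterpret_cast<const char*>(&value);
    out.insert(out.end(), p, p + sizeof(value));
}

// Reads a 32-bit value at `pos` and advances `pos`; fails on a short buffer.
inline bool TakeUInt32(const std::vector<char>& in, std::size_t& pos, std::uint32_t& value) {
    if (pos > in.size() || in.size() - pos < sizeof(value)) return false;
    std::memcpy(&value, in.data() + pos, sizeof(value));
    pos += sizeof(value);
    return true;
}

// Assumed layout: id | name length | name bytes | data length | data bytes
inline void WriteRecord(std::vector<char>& out, const ObjectRecord& rec) {
    AppendUInt32(out, rec.id);
    AppendUInt32(out, static_cast<std::uint32_t>(rec.name.size()));
    out.insert(out.end(), rec.name.begin(), rec.name.end());
    AppendUInt32(out, static_cast<std::uint32_t>(rec.data.size()));
    out.insert(out.end(), rec.data.begin(), rec.data.end());
}

// On failure `rec` is left unchanged; `pos` may have advanced.
inline bool ReadRecord(const std::vector<char>& in, std::size_t& pos, ObjectRecord& rec) {
    ObjectRecord temp;
    std::uint32_t length = 0;
    if (!TakeUInt32(in, pos, temp.id)) return false;
    if (!TakeUInt32(in, pos, length) || in.size() - pos < length) return false;
    temp.name.assign(in.begin() + pos, in.begin() + pos + length);
    pos += length;
    if (!TakeUInt32(in, pos, length) || in.size() - pos < length) return false;
    temp.data.assign(in.begin() + pos, in.begin() + pos + length);
    pos += length;
    rec = temp;
    return true;
}

// Usage example.
inline void record_example() {
    ObjectRecord cube;
    cube.id = 0;
    cube.name = "Evil Cubie";
    cube.data.assign(3, 'v');
    std::vector<char> buffer;
    WriteRecord(buffer, cube);
    assert(buffer.size() == 4 + 4 + 10 + 4 + 3);
    std::size_t pos = 0;
    ObjectRecord loaded;
    assert(ReadRecord(buffer, pos, loaded));
    assert(loaded.name == "Evil Cubie" && loaded.data.size() == 3 && pos == buffer.size());
    buffer.pop_back();                       // simulate a truncated file
    pos = 0;
    assert(!ReadRecord(buffer, pos, loaded));
}
```

The finished buffer can be written to disk with a single ofstream::write call in binary mode. Pointers such as const char* name are never written, only the bytes they refer to.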

Subdata (substring-like?) of a shared_ptr

I have a data buffer stored in a shared_ptr<void>.
This buffer is organized in several encapsulated layers so that I end up with:
-----------------------------------...
- Header 1 | Header 2 | Data
-----------------------------------...
(Actually it's an Ethernet packet where I decapsulate the layers one after the other).
Once I read Header 1, I would like to pass the rest of the packet to the next layer for reading, so I would like to create a pointer to :
-----------------------...
- Header 2 | Data
-----------------------...
It would be very easy with a raw pointer, as it would just be a matter of pointer arithmetic. But how can I achieve that with a shared_ptr ? (I use boost::shared_ptr) :
I cannot create a new shared_ptr to "first shared_ptr.get() + offset" because it makes no sense to get the ownership to just Header 2 + Data (and delete would crash eventually)
I do not want to copy the data because it would be silly
I want the ownership on the whole buffer to be shared between the two objects (ie. as long as the parent object or the one which requires only Header 2 needs the data, the data should not be deleted).
I could wrap that up in a structure like boost::tuple<shared_ptr<void>, int /*offset*/, int /*length*/> but I wonder if there is a more convenient / elegant way to achieve that result.
Thanks,
I would recommend encapsulating the layers each in a class that knows how to deal with the data as though it were that layer. Think each one as a view into your buffer. Here is a starting point to get you thinking.
class Layer1{
public:
Layer1(shared_ptr<void> buffer) : buffer_(buffer) { }
/* All the functions you need for treating your buffer as a Layer 1 type */
void DoSomething() {}
private:
shared_ptr<void> buffer_;
};
class Layer2{
public:
Layer2(shared_ptr<void> buffer) : buffer_(buffer) { }
/* All the functions you need for treating your buffer as a Layer 2 type */
void DoSomethingElse() {}
private:
shared_ptr<void> buffer_;
};
And how to use it:
shared_ptr<void> buff = getBuff(); //< Do what you need to get the raw buffer.
// I show these together, but chances are, sections of your code will only need
// to think about the data as though it belongs to one layer or the other.
Layer1 l1(buff);
Layer2 l2(buff);
l1.DoSomething();
l2.DoSomethingElse();
Laying things out this way allows you to write functions that operate solely on that layer even though they internally represent the same data.
But, this is by no means perfect.
Perhaps Layer2 should be able to call Layer1's methods. For that you would want inheritance as well. I don't know enough about your design to say whether that would be helpful. Other room for improvement is replacing the shared_ptr<void> with a class that has helpful methods for dealing with the buffer.
can you just use a simple wrapper?
something like this maybe?
class HeaderHolder : protected shared_ptr<void> {
public:
// Constructor and blah blah
void* operator* () {
offset += a_certain_length;
return (static_cast<char*>(get()) + offset); // shared_ptr<void> cannot be dereferenced, so offset from the raw pointer
}
};
By the way, I just used a simple wrapper that I reproduce here if someone ever stumbles on the question.
class DataWrapper {
public:
DataWrapper (shared_ptr<void> pData, size_t offset, size_t length) : mpData(pData), mOffset(offset), mLength(length) {}
void* GetData() {return (unsigned char*)mpData.get() + mOffset;}
// same with const...
void SkipData (size_t skipSize) { mOffset += skipSize; mLength -= skipSize; }
size_t GetLength() const {return mLength;}
// Then you can add operator+, +=, (void*), -, -=
// if you need pointer-like semantics.
// Also a "memcpy" member function to copy just this buffer may be useful
// and other helper functions if you need
private:
shared_ptr<void> mpData;
size_t mOffset, mLength;
};
Just be careful when you use GetData: be sure that the buffer will not be freed while you use the unsafe void*. It is safe to use the void* as long as you know the DataWrapper object is alive (because it holds a shared_ptr to the buffer, so it prevents it from being freed).
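An alternative that none of the answers above use is the aliasing constructor of shared_ptr, which creates a pointer to an arbitrary address while sharing ownership of the original object. The sketch below uses std::shared_ptr (C++11); boost::shared_ptr offers, as far as I know, an equivalent constructor in reasonably recent versions. SubBuffer is a hypothetical helper name, and the 14-byte header length is only an example value.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Returns a pointer `offset` bytes into the buffer that shares ownership
// of the WHOLE buffer with `whole` (shared_ptr aliasing constructor).
// The caller is responsible for keeping `offset` within the buffer.
inline std::shared_ptr<void> SubBuffer(const std::shared_ptr<void>& whole, std::size_t offset) {
    unsigned char* base = static_cast<unsigned char*>(whole.get());
    return std::shared_ptr<void>(whole, base + offset);
}

// Usage example.
inline void sub_buffer_example() {
    // A 64-byte zero-initialized buffer with a matching array deleter.
    std::shared_ptr<void> packet(new unsigned char[64](), [](void* p) {
        delete[] static_cast<unsigned char*>(p);
    });
    void* raw = packet.get();

    std::shared_ptr<void> payload = SubBuffer(packet, 14);   // skip a 14-byte header
    assert(packet.use_count() == 2);                          // both share one control block

    packet.reset();                                           // the parent layer lets go
    assert(payload.use_count() == 1);                         // the buffer is still alive
    assert(static_cast<unsigned char*>(payload.get()) ==
           static_cast<unsigned char*>(raw) + 14);
}   // the whole buffer is freed here, through the original deleter
```

The length is not tracked by the pointer itself, so a small wrapper that stores it next to the aliased pointer may still be worthwhile.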

serializing objects in C++ and storing as a blob type in mysql

I am using mysql/C++ connector to connect to a mysql database. I have some complex data structures so I need to serialize those and save in the database.
I tried something like the following.
vector<int> vectorTest(10,100);
istream *blob = NULL;
ostringstream os;
int size_data = sizeof(vector<int>);
blob = new istringstream((char*)&vectorTest, istringstream::in | istringstream::binary);
string qry = "INSERT INTO vector(id,object) VALUES (?,?)";
prep_stmt = con->prepareStatement(qry);
prep_stmt->setInt(1,1);
prep_stmt->setBlob(2,blob);
prep_stmt->execute();
I just tried a small example here. However the vector object is not getting saved.
Alternatively, can I use the following approach?
ostringstream os;
int size_data = sizeof(vector<int>);
os.write((char*)&vectorTest, size_data);
However I don't know how to redirect the outputstream to an inputstream, because the setBlob() method needs an istream as the input parameter.
How can I get either of these examples working? If my approach is incorrect, can anyone provide a code example or improve the given code segment? Your immediate response is greatly appreciated.
Thanks
You're going about this completely the wrong way. This isn't "serialization", in fact it's quite possibly the opposite of serialization -- it's just trying to write out a raw memory dump of a vector into the database. Imagine for a second that vector looked something like this:
struct vector_int {
unsigned int num_elements;
int* elements;
};
Where elements is a dynamically allocated array that holds the elements of the vector.
What you would end up writing out to your database is the value of num_elements and then the value of the pointer elements. The element data would not be written to the database, and if you were to load the pointer location back into a vector on a different run of your program, the location it points to would contain garbage. The same sort of thing will happen with std::vector, since it contains dynamically allocated memory that will be written out as pointer values in your case, along with other internal state that may not be valid if reloaded.
The whole point of "serialization" is to avoid this. Serialization means turning a complex object like this into a sequence of bytes that contains all of the information necessary to reconstitute the original object. You need to iterate through the vector and write out each integer that's in it. And moreover, you need to devise a format where, when you read it back in, you can determine where one integer ends and the next begins.
For example, you might whitespace-delimit the ints, and write them out like this:
1413 1812 1 219 4884 -57 12
And then when you read this blob back in you would have to parse this string back into seven separate integers and insert them into a newly-created vector.
Example code to write out:
vector<int> vectorTest(10,100);
ostringstream os;
for (vector<int>::const_iterator i = vectorTest.begin(); i != vectorTest.end(); ++i)
{
os << *i << " ";
}
// Then insert os.str() into the DB as your blob
Example code to read in:
// Say you have a blob string called "blob"
vector<int> vectorTest;
istringstream is(blob);
int n;
while(is >> n) {
vectorTest.push_back(n);
}
Now, this isn't necessarily the most efficient approach, space-wise, since this turns your integers into strings before inserting them into the database, which will take much more space than if you had just inserted them as binary-coded integers. However, the code to write out and read in would be more complex in that case because you would have to concern yourself with how you pack the integers into a byte sequence and how you parse a byte sequence into a bunch of ints. The code above uses strings so that the standard library streams can make this part easy and give a more straightforward demonstration of what serialization entails.
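For comparison, here is a sketch of the more compact binary variant mentioned above, together with one way to obtain an std::istream over the result, which is what the question's setBlob call expects. The fixed 4-byte little-endian encoding and the EncodeInts/DecodeInts names are assumptions made for illustration; no MySQL Connector calls are used in the sketch itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Encodes each int as 4 bytes, least significant byte first.
inline std::string EncodeInts(const std::vector<std::int32_t>& values) {
    std::string blob;
    blob.reserve(values.size() * 4);
    for (std::size_t i = 0; i < values.size(); ++i) {
        const std::uint32_t u = static_cast<std::uint32_t>(values[i]);
        for (int shift = 0; shift < 32; shift += 8) {
            blob.push_back(static_cast<char>((u >> shift) & 0xFFu));
        }
    }
    return blob;
}

// Decodes a blob produced by EncodeInts; `values` is replaced only on success.
inline bool DecodeInts(const std::string& blob, std::vector<std::int32_t>& values) {
    if (blob.size() % 4 != 0) return false;          // not a whole number of ints
    std::vector<std::int32_t> result;
    for (std::size_t i = 0; i < blob.size(); i += 4) {
        std::uint32_t u = 0;
        for (int b = 0; b < 4; ++b) {
            u |= static_cast<std::uint32_t>(static_cast<unsigned char>(blob[i + b])) << (8 * b);
        }
        result.push_back(static_cast<std::int32_t>(u));   // assumes two's complement
    }
    values.swap(result);
    return true;
}

// Usage example.
inline void blob_example() {
    std::vector<std::int32_t> values;
    values.push_back(1413);
    values.push_back(-57);
    values.push_back(12);
    const std::string blob = EncodeInts(values);
    assert(blob.size() == 12);

    // An input stream over the encoded bytes; its address could be handed
    // to an API that takes std::istream*.
    std::istringstream blobStream(blob, std::ios::in | std::ios::binary);
    const std::string readBack((std::istreambuf_iterator<char>(blobStream)),
                               std::istreambuf_iterator<char>());
    std::vector<std::int32_t> decoded;
    assert(DecodeInts(readBack, decoded) && decoded == values);
}
```

The stream object presumably has to stay alive until the statement has been executed, since only a pointer to it is passed along.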
My solution to writing to a MySQL database was to use the Visitor design pattern and an abstract base class. I did not use the BLOB data structure, instead used fields (columns):
struct Field
{
// Every field has a name.
virtual const std::string get_field_name(void) = 0;
// Every field value can be converted to a string (except Blobs)
virtual const std::string get_value_as_string(void) = 0;
// {Optional} Every field knows its SQL type.
// This is used in creating the table.
virtual unsigned int get_sql_type(void) = 0;
// {Optional} Every field has a length
virtual size_t get_field_length(void) = 0;
};
I built a hierarchy including fields for numbers, bool, and strings. Given a Field pointer or reference, an SQL INSERT and SELECT statement can be generated.
A Record would be a container of fields. Just provide a for_each() method with a visitor:
struct Field_Functor
{
virtual void operator()(const Field& f) = 0;
};
struct Record
{
void for_each(Field_Functor& functor)
{
//...
functor(field_container[i]); // or something similar
}
};
By using a more true Visitor design pattern, the SQL specifics are moved into the visitor. The visitor knows the field attributes due to the method called. This reduces the Field structure to having only get_field_name and get_value_as_string methods.
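struct Field_Integer;
struct Field_String; // forward declarations for every type the visitor handles
struct Field_Double;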
struct Visitor_Base
{
virtual void process(const Field_Integer& fi) = 0;
virtual void process(const Field_String& fs) = 0;
virtual void process(const Field_Double& fd) = 0;
};
struct Field_With_Visitor
{
virtual void accept_visitor(Visitor_Base& vb) = 0;
};
struct Field_Integer : Field_With_Visitor
{
void accept_visitor(Visitor_Base& vb)
{
vb.process(*this);
}
};
The record using the `Visitor_Base`:
struct Record_Using_Visitor
{
void accept_visitor(Visitor_Base& visitor)
{
Field_Container::iterator iter;
for (iter = m_fields.begin();
iter != m_fields.end();
++iter)
{
(*iter)->accept_visitor(visitor);
}
return;
}
};
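Put together, the pieces above might look like the following compact sketch. It is an illustration under assumptions rather than the answerer's actual code: only two field types are shown, the fields carry public name/value members, and Text_Visitor is a hypothetical visitor that builds plain "name=value" text. Real SQL generation would need proper escaping or bound parameters.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

struct Field_Integer;
struct Field_String;

struct Visitor_Base {
    virtual ~Visitor_Base() {}
    virtual void process(const Field_Integer& fi) = 0;
    virtual void process(const Field_String& fs) = 0;
};

struct Field_With_Visitor {
    virtual ~Field_With_Visitor() {}
    virtual void accept_visitor(Visitor_Base& vb) const = 0;
};

struct Field_Integer : Field_With_Visitor {
    std::string name;
    int value;
    Field_Integer(const std::string& n, int v) : name(n), value(v) {}
    // Overload resolution picks process(const Field_Integer&).
    virtual void accept_visitor(Visitor_Base& vb) const { vb.process(*this); }
};

struct Field_String : Field_With_Visitor {
    std::string name;
    std::string value;
    Field_String(const std::string& n, const std::string& v) : name(n), value(v) {}
    virtual void accept_visitor(Visitor_Base& vb) const { vb.process(*this); }
};

// Hypothetical visitor: all type-specific formatting lives here, not in the fields.
struct Text_Visitor : Visitor_Base {
    std::ostringstream out;
    virtual void process(const Field_Integer& fi) { out << fi.name << '=' << fi.value << ';'; }
    virtual void process(const Field_String& fs) { out << fs.name << "='" << fs.value << "';"; }
};

struct Record_Using_Visitor {
    std::vector<std::shared_ptr<Field_With_Visitor> > m_fields;
    void accept_visitor(Visitor_Base& visitor) const {
        for (std::size_t i = 0; i < m_fields.size(); ++i) {
            m_fields[i]->accept_visitor(visitor);
        }
    }
};

// Usage example.
inline void visitor_example() {
    Record_Using_Visitor record;
    record.m_fields.push_back(std::make_shared<Field_Integer>("id", 1));
    record.m_fields.push_back(std::make_shared<Field_String>("title", "cube"));
    Text_Visitor visitor;
    record.accept_visitor(visitor);
    assert(visitor.out.str() == "id=1;title='cube';");
}
```

Adding another output format then means writing another visitor, while the field classes stay unchanged.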
My current hurdle is handling BLOB fields with MySQL C++ Connector and wxWidgets.
You may also want to add the tags: MySQL and database to your next questions.
Boost has a serialization library (though I have never used it).
Or use XML or JSON.