i want write an Object to file (*.txt)
class MyCircle{
double x;
double y;
double radius;
char index;
int check;}
class Question{
int index;
int quantityOfAnswers;
MyCircle [] arrCircles;
char solution;}
class answersShee{
int answersSheetID;
string answersSheetName;
int quantityOfCandidateID;
int quantityOfCodeExamination;
int quantityOfQuestion;
Question arrCandidateIDs[quantityOfCandidateID];
Question arrCodeExaminations[quantityOfCodeExamination];
Question arrQuestions[quantityOfQuestion];}
i want write and read class answersSheet to a file(*.txt) but in C++ dont't have write an Object like in Java :(
I really don't understand. There are many solutions existing on the internet. Search for "c++ object serialization".
Binary or Textual representation in file.
Before you start writing an object to a file, you need to decide whether you are writing binary or textual representations.
A binary representation mirrors the bits and bytes of the platform's representation. Numbers and other structures may not be easily readable when viewed by a text editor. This is usually the most efficient, but least portable solution. Pointers don't do well, neither may multi-byte integers (research "Endianess" or "little endian"). Variable length fields require more structure in the binary field (such as a length field or sentinel value).
A textual representation is the human readable form. The textual form is more portable and easier to read by humans. This form is slower to read and write (due to the translations to and from textual format). Also, pointer values do not port with this format (unless you use numeric link values).
Writing textual representation to file.
There are a plethora of examples on the internet. I will summarize and say that this functionality usually involves overloading operator << and operator >>. Which means adding functionality to your classes and structures (or you could make a free standing function).
Serialization Libraries
Many people use these. I cannot recommend or not recommend these because I haven't used them. In summary, these libraries simplify the reading and writing of objects from and to a file.
Writing Binary Format
Again, this requires adding functionality to your classes and structures (or making free standing functions). I like to augment the typical case of using the std::istream::read and std::ostream::write methods.
I use a technique of:
1. Asking each object for the size that the object would occupy in a buffer, and summing the values.
1.1. This is recursive, each object would ask each member for its size.
2. Allocating a buffer of uint8_t for the size.
3. Pass a pointer to the buffer to each object and tell each object to write its contents at the location pointed to by the buffer pointer (the objects will increment the pointer accordingly).
4. Write the buffer as one big block, using binary translation, to the file.
5. The reading is similar, different order.
Related
I am trying to build and write a binary request and have a "is this possible" type question. It might be important for me to mention the recipiant of the request is not aware of the data structure I have included below, it's just expecting a sequence of bytes, but using a struct seemed like a handy way to prepare the pieces of the request, then write them easily.
Writing the header and footer is fine as they are fixed size but I'm running into problems with the struct "Details", because of the vector. For now Im writing to a file so I can check the request is to spec, but the intention is to write to a PLC using boost asio serial port eventually
I can use syntax like so to write a struct, but that writes pointer addresses rather than values when it gets to the vector
myFile.write((char*) &myDataRequest, drSize);
I can use this sytax to write a vector by itself, but I must include the indexer at 0 to write the values
myFile.write((char*) &myVector[0], vectorSize);
Is there an elegant way to binary write a struct containing a vector (or other suitable collection), doing it in one go? Say for example if I declared the vector differently, or am I resigned to making multiple writes for the content inside the struct. If I replace the vector with an array I can send the struct in one go (without needing to include any indexer) but I dont know the required size until run time so I don't think it is suitable.
My Struct
struct Header
{ ... };
struct Details
{
std::vector<DataRequest> DRList;
};
struct DataRequest
{
short numAddresses; // Number of operands to be read Bytes 0-1
unsigned char operandType; // Byte 2
unsigned char Reserved1; //Should be 0xFF Byte 3
std::vector<short> addressList; // either, starting address (for sequence), or a list of addresses (for non-sequential)
};
struct Footer
{ ... };
It's not possible because the std::vector object doesn't actually contain an array but rather a pointer to a block of memory. However, I'm tempted to claim that being able to write a raw struct like that is not desireable:
I believe that by treating a struct as a block of memory you may end up sending padding bytes, I don't think this is desireable.
Depending on what you write to you may find that writes are buffered anyway, so multiple write calls aren't actually less efficient.
Chances are that you want to do something with the fields being sent over. In particular, with the numeric values you send. This requires enforcing a byte order which both sides of the transmission agree on. In order to be portable, you should exlicitely convert the byte order to make sure that your software is portable (if this is required).
To make a long story short: I suspect writing out each field one by one is not less efficient, it also is more correct.
This is not really a good strategy, since even if you could do this you're copying memory content directly to file. If you change the architecture/processor your client will get different data. If you write a method taking your struct and a filename, which writes the structs values individually and iterates over the vector writing out its content, you'll have full control over the binary format your client expects and are not dependent on the compilers current memory representation.
If you want convenience for marshalling/unmarshalling you should take a look at the boost::serialization library. They do offer a binary archive (besides text and xml) but it has its own format (e.g. it has a version number, which serialization lib was used to dump the data) so it is probably not what your client wants.
What exactly is the format expected at the other end? You have to write
that, period. You can't just write any random bytes. The probability
that just writing an std::vector like you're doing will work is about
as close to 0 as you can get. But the probability that writing a
struct with only int will work is still less than 50%. If the other
side is expecting a specific sequence of bytes, then you have to write
that sequence, byte by byte. To write an int, for example, you must
still write four (or whatever the protocol requires) bytes, something
like:
byte[0] = (value >> 24) & 0xFF;
byte[1] = (value >> 16) & 0xFF;
byte[2] = (value >> 8) & 0xFF;
byte[3] = (value ) & 0xFF;
(Even here, I'm supposing that your internal representation of negative
numbers corresponds to that of the protocol. Usually the case, but not
always.)
Typically, of course, you build your buffer in a std::vector<char>,
and then write &buffer[0], buffer.size(). (The fact that you need a
reinterpret_cast for the buffer pointer should signal that your
approach is wrong.)
code looks like this:
struct Dog {
string name;
unsigned int age;
};
int main()
{
Dog d = {.age = 3, .name = "Lion"};
FILE *fp = fopen("dog.txt", "wb");
fwrite(&d, sizeof(d), 1, fp); //write d into dog.txt
}
My problem is what's the point of write a data object or structure into a binary file? I assume it is for making the data generated in a running program persistent, right? If yes, then how can I get the data back? Using fread?
This makes me think of database-like stuff, dose database write data into disk the same way?
You can do it but you will have a lot of issues to care about:
structure types: all your data needs really be into struct or you can just writing a pointer to some other place.
structure changes: if you need change your structure you will need write a converter to read old struct and write the new.
language interoperability: will be hard to access the data using other language
It was a common practice in the early days before relational databases popularization. You can make index files pointing to a record number.
However nowadays I will advice you to make serialization and write strings instead binaries.
NOTE:
if string is something like char[40] your code maybe will survive... but if your question is about C++ and string is a class then kill you child before it grows up! The string object characters are not into your struct but in the heap.
Writing data in binary is extremely useful and much faster then reading/writing in text, take for instance video games (Although not every video game does this), when the game is saved all of the nescessary structures/classes and other data are written into a save file in binary.
It is just one use for using binary, but the major reason for doing this is speed.
And to read the data back, you will need to know the format that you saved it in, for instance as a simple example, if I saved an integer, char array of n size, and a boolean, I would need to read the binary file in as an integer, char array of n size, and a boolean. Otherwise the data is read improperly and will not be very useful at all
Be careful. The type of field 'name' in your structure is 'string'. This class contains data allocated dynamically. So writing 'string' data into file this way only pointers will be writed, not data itself.
The C++ Middleware Writer supports binary serialization to/from files.
From a marshalling perspective the "unsigned int age" member of your struct is a potential problem. I'd consider changing the type to uint32_t.
I have a challenging situation; we will have programs on Mac, PC, iOS and Android receiving files in a legacy format and parsing data from those files. We cannot change how those files are created.
The files are produced by a C++ program filling a struct with numbers and Strings and then writing it out. Here's a sanitized version.
struct MyObject {
String Kfkj(MAXKYS);
String Oern(MAXKYS);
String Vdflj(MAXKYS, 9);
int Muic;
int Tdfkj;
int VdfkAsdk;
int SsdjsdDsldsk;
int Ndsoief;
String TdflsajPdlj;
String TdckjdfPas;
String AdsfakjIdd;
int IdkfjdKasdkj;
int AsadkjaKadkja(MAXKYS);
int Kasldsdkj;
bool Usadl;
String PsadkjOasdj(9);
String PasdkjOsdkj;
};
Primitives and Strings, as you can see.
Then here is how they write it out to a file:
MyInstance MyObject;
FileName = "C:\MyFile.ab2"
ofstream fout (FileName, ios::binary);
fout.write((char*)& MyInstance, sizeof(MyInstance));
There is no option for us to translate it once and then distribute the file to other platforms; we must translate it on each and every different platform, and this is what we have to work with. I'd appreciate any information on how C++ serializes data, so we know how to parse the file.
EDIT: solution
The feedback I received from multiple answers here was VERY helpful. Using that, I did extensive analysis with hex editors and discovered:
the elements come in the file one after another
a "String," in this case, starts with an int describing how many characters follow the int for that String. If the String does not exist, it will still have that int with a value of 0.
integers, for the files and machines I saw, are two bytes, little-endian, and MOSTLY unsigned (there were a few that were signed, just to keep me on my toes)
the boolean was two bytes, with apparently -1 (FF FF) representing "true"
So far we have not ran into issues with different padding or endianness on different devices, but those are very real concerns. The skilled notes and warnings in these answers provides us with more ammunition to try to convince the client to change to a less fragile alternative, such as XML or JSON, for transferring data online across platforms.
As for those of you asking if the developer was fired... well, let's just say their code is very old, but after multiple conversations we're still having trouble convincing them writing out the C++ struct and trying to read that on different platforms is not a good idea.
You're going to run into many problems.
C++ doesn't have a specific format for serializing data per se. It is highly dependent on the computer architecture/processor that you are running on.
The compiler is allowed to add padding to help alignment on systems. When we say alignment we basically are referring to an architecture/processor's affinity for having data lie on specific byte boundaries. For example, some processors vastly prefer floating point numbers to lie at 4 or 8 byte boundaries - if they don't the processor may work much slower or may not work at all.
So, you can't simply know what padding your system is adding magically.
What you can do is use #pragma pack(1) / #pragma pack(0) to stop your compiler from padding your numbers.
PS: you also have to worry about endianness. What if one computer is running on big-endian and one is little endian? They will interpret bytes differently without a conversion.
Simply put, you either have to fix the application generating the files so it uses a proper serialization scheme OR you need to look at it running on a SPECIFIC computer, look at exactly how it writes the files, and write a translator for every target platform (which is just silly).
Interesting Suggestion
If you're really stuck, write an app that monitors the folder where you write files. Have the app pick up the files (since it's on the same PC it'll be able to read their format without issue). Have it write the files back in XML or some other true serialization format and distribute those instead.
Whoa - that's crazy. So String objects don't contain any pointers? Must not- because you claim this is working code.
Anyway, that code isn't doing any serialization. Its just writing the structure out to file exactly the way it is laid out in memory. The only issue you have is that on some platforms padding and sizeof integral types like int may be different.
You'll have to find the size of the integral types, and use that information in reader/writer for newer platforms to make sure they get laid out the same way on the legacy platform.
You're running a real risk with that code though. As it is, a compiler change could suddenly cause the file layout to change.
The format of your data file is entirely down to the compiler that your C++ program is compiled with, and the definition of your String class. You can rely on the fields being in the order they're declared in, and in this case, I think you can rely on there not being any padding at the start, but that's about all. Some tips that might help you out in this case:-
You don't give the definition of the String class you're using. If it's a typedef for std::string, you're completely screwed, because the contents of the string aren't in the memory. I assume your C++ programmers are using some special local buffer, in which case I'll guess you will find the first bytes of the object are the string, and there is some amount of useless padding afterwards. I hope the struct contains an int at the start telling you how much data in it is useful.
You'll probably find the int fields are four bytes long.
You'll probably find the bool field is one byte long, followed by three bytes of useless padding. Only one bit, most likely the bottom bit, will be set.
That's about all the useful guesswork I can offer you. In your target language, make sure to read the whole file in as the closest thing to a byte array available in the language, and only after that, use the language features to convert it into the right kind of thing in your language. Don't try reading it in as integers, as that won't let you byte-swap if you're on a platform with different endianness to the C++ program. I suggest also looking through the file in a text editor to reverse-engineer it and help you find the offset of each field.
Last piece of advice: consider printing P45s (or pink slips, or whatever you have in your country) for whichever programmers or project managers thought this kind of 'serialization' was a good idea. This kind of sloppy work might have been acceptable in a life-or-death situation, but they have seriously screwed you over in a way you're going to find it very hard to recover from. Writing the code to read in these files will not be that hard, if it's only one struct like this, but keeping it reliable will be a world of pain, and they've effectively made it impossible for themselves to change compilers or compiler version safely.
The way it's done, the struct is written in raw form to a file. So basically what you need to know to parse this file is the binary layout of your struct.
Basically, the fields are just one after the other, so to read an int, you just read 4 bytes and cast that to an int, etc.
Strings are a particular case. It's not clear from your code whether this "String" type is an inline array of characters, or a pointer to such an array. In the first case, you need to know how many characters each string contains and simply read that number of characters sequentially. In the second case, you won't be able to get the string back, since it won't have been written to file. The pointer will be useless to you.
One last concern is whether the struct is packed or not. Since you gave no indication to that, by default struct fields are aligned to 4-bytes boundaries, so there may be space for instance after the boolean field that you need to account for. If the struct is packed, then each field comes directly after the previous.
So, to make a long story short, figure out your struct binary layout using its definition and, if all else fails, inspecting the memory at run-time with the debugger, or use a hex editor to study the output file. Then write that specification down somewhere and this will give you what you need to read from the file. It's impossible to tell exactly what that layout is simply by looking at the pseudo-definition you gave.
Writing in an ofstream does not serialize data. This code write the raw memory content of the struct as it was a string of char. Depending of your compiler, its version, its options and the system it is running on the content will be completely different.
Even the number of bits of a char is allowed to change between c++ implementation.
Data referenced by the object of the struct won't be written (forget the content of std::string).
If you cannot change the writer code. You must know the alignment policy, the size of base type and the data representation. You will have to analyze files produced by hand, for example with an hexadecimal editor like this one
http://www.physics.ohio-state.edu/~prewett/hexedit/
, and probably look at your compiler documentation.
If you can change the writer code. Use proper serialization like json, protocol buffer or simply xml.
No one has pointed out something that sticks out to me as particularly problematic (maybe because I've been bit by it). That problem: the data member bool Usadl;. sizeof(bool) varies across platforms, across compilers, and even across releases of the same compiler. Common values for sizeof(bool) are 4 and 1. This will bite you. It's getting hard to find a big endian machine nowadays, very, very hard to find a computer where CHAR_BIT is not 8 or sizeof(int) is not 4. This is not the case for sizeof(bool).
In agreement with everyone else, Chad's team needs to document the structure of the records in the file, and then make sure the program that produces the file writes this structure explicitly, including element sizes, padding, and endianness. Don't depend on class layout to do this for you. That's just asking for trouble.
The best way would probably be to use JSON or if you want a more robust solution go with something like Avro. Avro has a C++ API and a Java API, so it covers most of the cases you're encountering.
Background:
I'm using Google's protobuf, and I would like to read/write several gigabytes of protobuf marshalled data to a file using C++. As it's recommended to keep the size of each protobuf object under 1MB, I figured a binary stream (illustrated below) written to a file would work. Each offset contains the number of bytes to the next offset until the end of the file is reached. This way, each protobuf can stay under 1MB, and I can glob them together to my heart's content.
[int32 offset]
[protobuf blob 1]
[int32 offset]
[protobuf blob 2]
...
[eof]
I have an implemntation that works on Github:
src/glob.hpp
src/glob.cpp
test/readglob.cpp
test/writeglob.cpp
But I feel I have written some poor code, and would appreciate some advice on how to improve it. Thus,
Questions:
I'm using reinterpret_cast<char*> to read/write the 32 bit integers to and from the binary fstream. Since I'm using protobuf, I'm making the assumption that all machines are little-endian. I also assert that an int is indeed 4 bytes. Is there a better way to read/write a 32 bit integer to a binary fstream given these two limiting assumptions?
In reading from fstream, I create a temporary fixed-length char buffer, so that I can then pass this fixed-length buffer to the protobuf library to decode using ParseFromArray, as ParseFromIstream will consume the entire stream. I'd really prefer just to tell the library to read at most the next N bytes from fstream, but there doesn't seem to be that functionality in protobuf. What would be the most idiomatic way to pass a function at most N bytes of an fstream? Or is my design sufficiently upside down that I should consider a different approach entirely?
Edit:
#codymanix: I'm casting to char since istream::read requires a char array if I'm not mistaken. I'm also not using the extraction operator >> since I read it was poor form to use with binary streams. Or is this last piece of advice bogus?
#Martin York: Removed new/delete in favor of std::vector<char>. glob.cpp is now updated. Thanks!
Don't use new []/delete[].
Instead us a std::vector as deallocation is guaranteed in the event of exceptions.
Don't assume that reading will return all the bytes you requested.
Check with gcount() to make sure that you got what you asked for.
Rather than have Glob implement the code for both input and output depending on a switch in the constructor. Rather implement two specialized classes like ifstream/ofstream. This will simplify both the interface and the usage.
I'm working on a GUI framework, where I want all the elements to be identified by ascii strings of up to 8 characters (or 7 would be ok).
Every time an event is triggered (some are just clicks, but some are continuous), the framework would callback to the client code with the id and its value.
I could use actual strings and strcmp(), but I want this to be really fast (for mobile devices), so I was thinking to use char constants (e.g. int id = 'BTN1';) so you'd be doing a single int comparison to test for the id. However, 4 chars isn't readable enough.
I tried an experiment, something like-
long int id = L'abcdefg';
... but it looks as if char constants can only hold 4 characters, and the only thing making a long int char constant gives you is the ability for your 4 characters to be twice as wide, not have twice the amount of characters. Am I missing something here?
I want to make it easy for the person writing the client code. The gui is stored in xml, so the id's are loaded in from strings, but there would be constants written in the client code to compare these against.
So, the long and the short of it is, I'm looking for a cross-platform way to do quick 7-8 character comparison, any ideas?
Are you sure this is not premature optimisation? Have you profiled another GUI framework that is slow purely from string comparisons? Why are you so sure string comparisons will be too slow? Surely you're not doing that many string compares. Also, consider strcmp should have a near optimal implementation, possibly written in assembly tailored for the CPU you're compiling for.
Anyway, other frameworks just use named integers, for example:
static const int MY_BUTTON_ID = 1;
You could consider that instead, avoiding the string issue completely. Alternatively, you could simply write a helper function to convert a const char[9] in to a 64-bit integer. This should accept a null-terminated string "like so" up to 8 characters (assuming you intend to throw away the null character). Then your program is passing around 64-bit integers, but the programmer is dealing with strings.
Edit: here's a quick function that turns a string in to a number:
__int64 makeid(const char* str)
{
__int64 ret = 0;
strncpy((char*)&ret, str, sizeof(__int64));
return ret;
}
One possibility is to define your IDs as a union of a 64-bit integer and an 8-character string:
union ID {
Int64 id; // Assuming Int64 is an appropriate typedef somewhere
char name[8];
};
Now you can do things like:
ID id;
strncpy(id.name, "Button1", 8);
if (anotherId.id == id.id) ...
The concept of string interning can be useful for this problem, turning string compares into pointer compares.
Easy to get pre-rolled Components
binary search tree for the win -- you get a red-black tree from most STL implementations of set and map, so you might want to consider that.
Intrusive versions of the STL containers perform MUCH better when you move the container nodes around a lot (in the general case) -- however they have quite a few caveats.
Specific Opinion -- First Alternative
If I was you I'd stick to a 64-bit integer type and bundle it in a intrusive container and use the library provided by boost. However if you are new to this sort of thing then use stl::map it is conceptually simpler to grasp, and it has less chances of leaking resources since there is more literature and guides out there for these types of containers and the best practises.
Alternative 2
The problem you are trying to solve I believe: is to have a global naming scheme which maps to handles. You can create a mapping of names to handles so that you can use the names to retrieve handles:
// WidgetHandle is a polymorphic base class (i.e., it has a virtual method),
// and foo::Luv implement WidgetHandle's interface (public inheritance)
foo::WidgetHandle * LuvComponent =
Factory.CreateComponent<foo::Luv>( "meLuvYouLongTime");
....
.... // in different function
foo::WidgetHandle * LuvComponent =
Factory.RetrieveComponent<foo::Luv>("meLuvYouLongTime");
Alternative 2 is a common idiom for IPC, you create an IPC type say a pipe in one process and you can ask the kernel for to retrieve the other end of the pipe by name.
I see a distinction between easily read identifiers in your code, and the representation being passed around.
Could you use an enumerated type (or a large header file of constants) to represent the identifier? The names of the enumerated types could then be as long and meaningful as you wish, and still fit in (I am guessing) a couple of bytes.
In C++0x, you'll be able to use user-defined string literals, so you could add something like 7chars..id or "7chars.."id:
template <char...> constexpr unsigned long long operator ""id();
constexpr unsigned long long operator ""id(const char *, size_t);
Although I'm not sure you can use constexpr for the second one.