struct Vector
{
float x, y, z;
};
void func(Vector *vectors) { /*...*/ }
usage:
float *coords = load(file);
func(coords);
I have a question about the alignment of structures in C++. I will pass a set of points to the function func(). Is it OK to do it in the way shown above, or is this relying on platform-dependent behavior? (It works at least with my current compiler.) Can somebody recommend a good article on the topic?
Or, is it better to directly create a set of points while loading the data from the file?
Thanks
Structure alignment is implementation-dependent. However, most compilers give you a way of specifying that a structure should be "packed" (that is, arranged in memory with no padding bytes between fields). For example:
struct Vector {
float x;
float y;
float z;
} __attribute__((__packed__));
The above code will cause the gcc compiler to pack the structure in memory, making it easier to dump to a file and read back in later. The exact way to do this may be different for your compiler (details should be in your compiler's manual).
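With MSVC (and also accepted by gcc and clang) the equivalent spelling is the pack pragma; a minimal sketch of the same struct:

#pragma pack(push, 1) // 1-byte packing: no padding between fields
struct Vector {
float x;
float y;
float z;
};
#pragma pack(pop)     // restore the previous packing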
I always list members of packed structures on separate lines in order to be clear about the order in which they should appear. (This is purely stylistic: float x, y, z; declares the members in the same order, and the language guarantees that members are laid out in declaration order. One declaration per line simply makes the intended layout explicit.)
If you are reading the data from a file, you need to validate the data before passing it to func. No amount of data alignment enforcement will make up for a lack of input validation.
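For example, a minimal sanity check might look like this (a sketch; it assumes "valid" simply means the coordinates are finite):

#include <cmath>

// Hypothetical check: reject NaN/Inf values read from the file.
bool is_valid(const Vector& v)
{
    return std::isfinite(v.x) && std::isfinite(v.y) && std::isfinite(v.z);
}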
Edit:
After further reading your code, I understand more what you are trying to do. You have a structure that contains three float values, and you are accessing it with a float* as if it were an array of floats. This is very bad practice. You don't know what kind of padding that your compiler might be using at the beginning or end of your structure. Even with a packed structure, it's not safe to treat the structure like an array. If an array is what you want, then use an array. The safest way is to read the data out of the file, store it into a new object of type struct Vector, and pass that to func. If func is defined to take a struct Vector* as an argument and your compiler is allowing you to pass a float* without griping, then this is indeed implementation-dependent behavior that you should not rely on.
Use an operator>> extraction overload.
std::istream& operator>>(std::istream& stream, Vector& vec) {
stream >> vec.x;
stream >> vec.y;
stream >> vec.z;
return stream;
}
Now you can do:
std::ifstream MyFile("My Filepath", std::ios::openmodes);
Vector vec;
MyFile >> vec;
func(&vec);
Prefer passing by reference to passing by pointer:
void func(Vector& vectors)
{ /*...*/ }
The difference here between a pointer and a reference is that a pointer can be NULL or point to some strange place in memory. A reference refers to an existing object.
As far as alignment goes, don't concern yourself. Compilers handle this automagically (at least alignment in memory).
If you are talking about alignment of binary data in a file, search for the term "serialization".
First of all, your example code is bad:
float *coords = load(file);
func(coords);
You're passing func() a pointer to a float var instead of a pointer to a Vector object.
Secondly, Vector's total size is equal to (sizeof(float) * 3), or in other words 12 bytes.
I'd consult my compiler's manual to see how to control the struct's alignment, and just for peace of mind I'd set it to, say, 16 bytes.
That way I'll know that a file containing one vector is always exactly 16 bytes in size, and that I need to read only 16 bytes.
Edit:
Check MSVC9's align capabilities.
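If your compiler supports C++11, alignas is a portable way to get the same effect (a sketch, assuming 4-byte floats):

// alignas(16) rounds sizeof(Vector) up to 16, so each record on disk
// occupies exactly 16 bytes (12 bytes of floats + 4 bytes of padding).
struct alignas(16) Vector
{
    float x, y, z;
};
static_assert(sizeof(Vector) == 16, "expected 16-byte records");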
Writing binary data is non portable between machines.
About the only portable thing is text (and even that can't be fully relied on, since not all systems use the same text format; luckily most accept the 127 ASCII characters, and hopefully soon we will standardize on something like Unicode, he says with a smile).
If you want to write data to a file you must decide the exact format of the file. Then write code that will read the data from that format and convert it into your specific hardware's representation for that type. Now this format could be binary or it could be a serialized text format; it does not matter much for performance (as the disk I/O speed will probably be your limiting factor). In terms of compactness the binary format will probably be more efficient. In terms of ease of writing decoding functions on each platform, the text format is definitely easier, as a lot of it is already built into the streams.
So simple solution:
Read/Write to a serialized text format.
Also no alignment issues.
#include <algorithm>
#include <fstream>
#include <vector>
#include <iterator>
struct Vector
{
float x, y, z;
};
std::ostream& operator<<(std::ostream& stream, Vector const& data)
{
return stream << data.x << " " << data.y << " " << data.z << " ";
}
std::istream& operator>>(std::istream& stream, Vector& data)
{
return stream >> data.x >> data.y >> data.z;
}
int main()
{
// Copy an array to a file
Vector data[] = {{1.0,2.0,3.0}, {2.0,3.0,4.0}, { 3.0,4.0,5.0}};
std::ofstream file("plop");
std::copy(data, data+3, std::ostream_iterator<Vector>(file));
// Read data from a file.
std::vector<Vector> newData; // use a vector as we don't know how big the file is.
std::ifstream input("inputFile");
std::copy(std::istream_iterator<Vector>(input),
std::istream_iterator<Vector>(),
std::back_inserter(newData)
);
}
Related
I am writing a parser in C++ to parse a well-defined binary file. I have declared all the required structs. Since only particular fields are of interest to me, in my structs I have skipped the non-required fields by declaring char arrays of size equal to the skipped bytes. So I am just reading the file into a char array and casting the char pointer to my struct pointer. Now the problem is that all data fields in that binary are in big-endian order, so after typecasting I need to change the endianness of all the struct fields. One way is to do it manually for each and every field. But there are various structs with many fields, so it would be very cumbersome to do manually. What's the best way to achieve this? Since I'll be parsing very huge files (in the TBs), I also need it to be fast.
EDIT: I have used __attribute__((packed)), so there is no need to worry about padding.
If you can do misaligned accesses with no penalty, and you don't mind compiler- or platform-specific tricks to control padding, this can work. (I assume you are OK with this since you mention __attribute__((packed))).
In this case the nicest approach is to write value wrappers for your raw data types, and use those instead of the raw types when declaring your struct in the first place. Remember the value wrapper must be trivial/POD-like for this to work. If you have a POSIX platform you can use ntohs/ntohl for the endian conversion; it's likely to be better optimized than whatever you write yourself.
If misaligned accesses are illegal or slow on your platform, you need to deserialize instead. Since we don't have reflection yet, you can do this with the same value wrappers (plus an Ignore<N> placeholder that skips N bytes for fields you're not interested in), and declare them in a tuple instead of a struct - you can iterate over the members of a tuple and tell each one to deserialize itself from the message.
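A possible shape for such a value wrapper (a sketch under the POSIX assumption above; the names are mine):

#include <arpa/inet.h> // ntohl, POSIX
#include <cstdint>

// Trivial wrapper: stores the raw big-endian bytes exactly as read from
// the file, converts to host order on access.
struct BeUint32
{
    uint32_t raw;
    operator uint32_t() const { return ntohl(raw); }
};

// Declare the message struct with wrappers instead of raw integers;
// packed, so this relies on cheap misaligned access as discussed above.
struct Header
{
    BeUint32 magic;
    BeUint32 payload_size;
} __attribute__((packed));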
One way to do that is to combine the C preprocessor with C++ operators. Write a couple of C++ classes like this one:
#include "immintrin.h"
class FlippedInt32
{
int value;
public:
inline operator int() const
{
return _bswap( value );
}
};
class FlippedInt64
{
__int64 value;
public:
inline operator __int64() const
{
return _bswap64( value );
}
};
Then,
#define int FlippedInt32
before including the header that defines these structures. #undef it immediately after the #include.
This will replace all int fields in the structures with FlippedInt32, which has the same size but returns flipped bytes.
If it’s your own structures which you can modify you don’t need the preprocessor part. Just replace the integers with the byte-flipping classes.
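The usage would look roughly like this (a sketch; "wire_structs.h" stands in for whatever header declares the structures):

// Every "int" field in the included structs becomes FlippedInt32.
#define int FlippedInt32
#include "wire_structs.h" // hypothetical header declaring the structures
#undef int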
If you can come up with a list of offsets (in bytes, relative to the top of the file) of the fields that need endian-conversion, as well as the sizes of those fields, then you could do all of the endian-conversion with a single for-loop, directly on the char array. E.g. something like this (pseudocode):
struct EndianRecord {
size_t offsetFromTop;
size_t fieldSizeInBytes;
};
std::vector<EndianRecord> todoList;
// [populate the todo list here...]
char * rawData = [pointer to the raw data]
for (size_t i=0; i<todoList.size(); i++)
{
const EndianRecord & er = todoList[i];
ByteSwap(&rawData[er.offsetFromTop], er.fieldSizeInBytes);
}
struct MyPackedStruct * data = (struct MyPackedStruct *) rawData;
// Now you can just read the member variables
// as usual because you know they are already
// in the correct endian-format.
... of course the difficult part is coming up with the correct todoList, but since the file format is well-defined, it should be possible to generate it algorithmically (or better yet, create it as a generator with e.g. a GetNextEndianRecord() method that you can call, so that you don't have to store a very large vector in memory)
I have a binary file with some layout I know. For example let format be like this:
2 bytes (unsigned short) - length of a string
5 bytes (5 x chars) - the string - some id name
4 bytes (unsigned int) - a stride
24 bytes (6 x float - 2 strides of 3 floats each) - float data
The file should look like (I added spaces for readability):
5 hello 3 0.0 0.1 0.2 -0.3 -0.4 -0.5
Here 5 is 2 bytes: 0x05 0x00, "hello" is 5 bytes, and so on.
Now I want to read this file. Currently I do it so:
load file to ifstream
read this stream to char buffer[2]
cast it to unsigned short: unsigned short len{ *((unsigned short*)buffer) };. Now I have length of a string.
read a stream to vector<char> and create a std::string from this vector. Now I have string id.
the same way read next 4 bytes and cast them to unsigned int. Now I have a stride.
while not end of file read floats the same way - create a char bufferFloat[4] and cast *((float*)bufferFloat) for every float.
This works, but to me it looks ugly. Can I read directly into an unsigned short or float or string etc. without creating a char[x]? If not, what is the correct way to cast? (I read that the style I'm using is an old style.)
P.S.: while writing the question, a clearer formulation came to mind: how do I cast an arbitrary number of bytes starting at an arbitrary position in a char[x]?
Update: I forgot to mention explicitly that the string and float data lengths are not known at compile time and are variable.
If it is not for learning purposes, and if you have freedom in choosing the binary format, you'd be better off considering something like protobuf, which will handle the serialization for you and allow you to interoperate with other platforms and languages.
If you cannot use a third party API, you may look at QDataStream for inspiration
Documentation
Source code
The C way, which would work fine in C++, would be to declare a struct:
#pragma pack(1)
struct contents {
// data members;
};
Note that
You need to use a pragma to make the compiler align the data as-it-looks in the struct;
This technique only works with POD types
And then cast the read buffer directly into the struct type:
std::vector<char> buf(sizeof(contents));
file.read(buf.data(), buf.size());
contents *stuff = reinterpret_cast<contents *>(buf.data());
Now if your data's size is variable, you can split the read into several chunks. To read a single binary object from the buffer, a reader function comes in handy:
template<typename T>
const char *read_object(const char *buffer, T& target) {
target = *reinterpret_cast<const T*>(buffer);
return buffer + sizeof(T);
}
The main advantage is that such a reader can be specialized for more advanced C++ objects:
template<typename CT>
const char *read_object(const char *buffer, std::vector<CT>& target) {
size_t size = target.size();
CT const *buf_start = reinterpret_cast<const CT*>(buffer);
std::copy(buf_start, buf_start + size, target.begin());
return buffer + size * sizeof(CT);
}
And now in your main parser:
int n_floats;
iter = read_object(iter, n_floats);
std::vector<float> my_floats(n_floats);
iter = read_object(iter, my_floats);
Note: As Tony D observed, even if you can get the alignment right via #pragma directives and manual padding (if needed), you may still encounter incompatibility with your processor's alignment, in the form of (best case) performance issues or (worst case) trap signals. This method is probably interesting only if you have control over the file's format.
Currently I do it so:
load file to ifstream
read this stream to char buffer[2]
cast it to unsigned short: unsigned short len{ *((unsigned short*)buffer) };. Now I have length of a string.
That last risks a SIGBUS (if your character array happens to start at an odd address and your CPU can only read 16-bit values that are aligned at an even address), performance problems (some CPUs read misaligned values, but more slowly; others, like modern x86s, are fine and fast), and/or endianness issues. I'd suggest reading the two characters, then you can say (x[0] << 8) | x[1] or vice versa, using htons if you need to correct for endianness.
read a stream to vector<char> and create a std::string from this vector. Now I have string id.
No need... just read directly into the string:
std::string s(the_size, ' ');
if (input_stream.read(&s[0], s.size()) &&
    input_stream.gcount() == s.size())
    ...use s...
the same way read next 4 bytes and cast them to unsigned int. Now I have a stride.
while not end of file read floats the same way - create a char bufferFloat[4] and cast *((float*)bufferFloat) for every float.
Better to read the data directly over the unsigned ints and floats, as that way the compiler will ensure correct alignment.
This works, but to me it looks ugly. Can I read directly into an unsigned short or float or string etc. without creating a char[x]? If not, what is the correct way to cast? (I read that the style I'm using is an old style.)
struct Data
{
uint32_t x;
float y[6];
};
Data data;
if (input_stream.read((char*)&data, sizeof data) &&
input_stream.gcount() == sizeof data)
...use x and y...
Note the code above avoids reading data into potentially unaligned character arrays: it's unsafe to reinterpret_cast data in a potentially unaligned char array (including inside a std::string) due to alignment issues. Again, you may need some post-read conversion with htonl if there's a chance the file content differs in endianness.
If there's an unknown number of floats, you'll need to calculate and allocate sufficient storage with alignment of at least 4 bytes, then aim a Data* at it... it's legal to index past the declared array size of y as long as the memory content at the accessed addresses was part of the allocation and holds a valid float representation read in from the stream. Simpler - but with an additional read so possibly slower - read the uint32_t first, then new float[n] and do a further read into there....
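A sketch of that simpler two-read variant (the stream name and error handling are mine):

#include <cstdint>
#include <fstream>
#include <vector>

void read_floats(std::ifstream& file)
{
    uint32_t n = 0;
    if (!file.read(reinterpret_cast<char*>(&n), sizeof n))
        return; // short read
    std::vector<float> floats(n); // properly aligned storage for n floats
    if (file.read(reinterpret_cast<char*>(floats.data()), n * sizeof(float)))
    {
        // ...use floats[0..n-1], correcting endianness if needed...
    }
}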
Practically, this type of approach can work and a lot of low level and C code does exactly this. "Cleaner" high-level libraries that might help you read the file must ultimately be doing something similar internally....
I actually implemented a quick and dirty binary format parser to read .zip files (following Wikipedia's format description) just last month, and being modern I decided to use C++ templates.
On some specific platforms, a packed struct could work, however there are things it does not handle well... such as fields of variable length. With templates, however, there is no such issue: you can get arbitrarily complex structures (and return types).
A .zip archive is relatively simple, fortunately, so I implemented something simple. Off the top of my head:
using Buffer = std::pair<unsigned char const*, size_t>;
template <typename OffsetReader>
class UInt16LEReader: private OffsetReader {
public:
UInt16LEReader() {}
// Note: "or" is an alternative token for || in C++, so it cannot be used as a name.
explicit UInt16LEReader(OffsetReader const reader): OffsetReader(reader) {}
uint16_t read(Buffer const& buffer) const {
OffsetReader const& offsetReader = *this;
size_t const offset = offsetReader.read(buffer);
assert(offset <= buffer.second && "Incorrect offset");
assert(offset + 2 <= buffer.second && "Too short buffer");
unsigned char const* begin = buffer.first + offset;
// http://commandcenter.blogspot.fr/2012/04/byte-order-fallacy.html
return (uint16_t(begin[0]) << 0)
+ (uint16_t(begin[1]) << 8);
}
}; // class UInt16LEReader
// Variants follow for UInt[8|16|32][LE|BE]...
Of course, the basic OffsetReader actually has a constant result:
template <size_t O>
class FixedOffsetReader {
public:
size_t read(Buffer const&) const { return O; }
}; // class FixedOffsetReader
and since we are talking templates, you can switch the types at leisure (you could implement a proxy reader which delegates all reads to a shared_ptr which memoizes them).
What is interesting, though, is the end-result:
// http://en.wikipedia.org/wiki/Zip_%28file_format%29#File_headers
class LocalFileHeader {
public:
template <size_t O>
using UInt32 = UInt32LEReader<FixedOffsetReader<O>>;
template <size_t O>
using UInt16 = UInt16LEReader<FixedOffsetReader<O>>;
UInt32< 0> signature;
UInt16< 4> versionNeededToExtract;
UInt16< 6> generalPurposeBitFlag;
UInt16< 8> compressionMethod;
UInt16<10> fileLastModificationTime;
UInt16<12> fileLastModificationDate;
UInt32<14> crc32;
UInt32<18> compressedSize;
UInt32<22> uncompressedSize;
using FileNameLength = UInt16<26>;
using ExtraFieldLength = UInt16<28>;
using FileName = StringReader<FixedOffsetReader<30>, FileNameLength>;
using ExtraField = StringReader<
CombinedAdd<FixedOffsetReader<30>, FileNameLength>,
ExtraFieldLength
>;
FileName filename;
ExtraField extraField;
}; // class LocalFileHeader
This is rather simplistic, obviously, but incredibly flexible at the same time.
An obvious axis of improvement would be to improve chaining since here there is a risk of accidental overlaps. My archive reading code worked the first time I tried it though, which was evidence enough for me that this code was sufficient for the task at hand.
I had to solve this problem once. The data files were packed FORTRAN output. Alignments were all wrong. I succeeded with preprocessor tricks that did automatically what you are doing manually: unpack the raw data from a byte buffer to a struct. The idea is to describe the data in an include file:
BEGIN_STRUCT(foo)
UNSIGNED_SHORT(length)
STRING_FIELD(length, label)
UNSIGNED_INT(stride)
FLOAT_ARRAY(3 * stride)
END_STRUCT(foo)
Now you can define these macros to generate the code you need, say the struct declaration, include the above, undef and define the macros again to generate unpacking functions, followed by another include, etc.
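For instance, one expansion pass that generates the struct declaration might look like this (a sketch; the fixed maximum sizes MAX_LABEL_LEN and MAX_FLOATS are mine, standing in for the variable-length fields):

// Pass 1: expand the description into a struct declaration.
#define BEGIN_STRUCT(name)       struct name {
#define UNSIGNED_SHORT(field)    unsigned short field;
#define STRING_FIELD(len, field) char field[MAX_LABEL_LEN];
#define UNSIGNED_INT(field)      unsigned int field;
#define FLOAT_ARRAY(n)           float floats[MAX_FLOATS];
#define END_STRUCT(name)         };

#include "foo_description.h" // the description file shown above

#undef BEGIN_STRUCT
#undef UNSIGNED_SHORT
// ...undef the rest, then redefine the macros to emit the unpack
// function and include the description again.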
NB I first saw this technique used in gcc for abstract syntax tree-related code generation.
If CPP is not powerful enough (or such preprocessor abuse is not for you), substitute a small lex/yacc program (or pick your favorite tool).
It's amazing to me how often it pays to think in terms of generating code rather than writing it by hand, at least in low level foundation code like this.
You had better declare a structure (with 1-byte packing; how to do this depends on the compiler). Write using that structure, and read using the same structure. Put only POD types in the structure, hence no std::string etc. Use this structure only for file I/O or other inter-process communication; use a normal struct or class to hold the data for further use in the C++ program.
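A sketch of that separation, using the record layout from the question (the names are mine):

#include <cstdint>
#include <string>
#include <vector>

#pragma pack(push, 1) // 1-byte packing; the pragma syntax varies by compiler
struct RecordOnDisk   // POD view, used only for file I/O
{
    uint16_t len;
    char     id[5];
    uint32_t stride;
    float    data[6];
};
#pragma pack(pop)

struct Record         // normal C++ type for everyday use
{
    std::string id;
    std::vector<float> data;
};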
Since all of your data is variable, you can read the two blocks separately and still use casting:
struct id_contents
{
uint16_t len;
char id[];
} __attribute__((packed)); // assuming gcc, ymmv
struct data_contents
{
uint32_t stride;
float data[];
} __attribute__((packed)); // assuming gcc, ymmv
class my_row
{
const id_contents* id_;
const data_contents* data_;
size_t size_;
public:
my_row(const char* buffer) {
id_= reinterpret_cast<const id_contents*>(buffer);
size_ = sizeof(*id_) + id_->len;
data_ = reinterpret_cast<const data_contents*>(buffer + size_);
size_ += sizeof(*data_) +
data_->stride * sizeof(float); // or however many, 3*float?
}
size_t size() const { return size_; }
};
That way you can use Mr. kbok's answer to parse correctly:
const char* buffer = getPointerToDataSomehow();
my_row data1(buffer);
buffer += data1.size();
my_row data2(buffer);
buffer += data2.size();
// etc.
I personally do it this way:
// some code which loads the file in memory
#pragma pack(push, 1)
struct someFile { int a, b, c; char d[0xEF]; };
#pragma pack(pop)
someFile* f = (someFile*) (file_in_memory);
int filePropertyA = f->a;
Very effective way for fixed-size structs at the start of the file.
Use a serialization library. Here are a few:
Boost serialization and Boost fusion
Cereal (my own library)
Another library called cereal (same name as mine but mine predates theirs)
Cap'n Proto
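To give a flavour, a minimal Boost.Serialization sketch (the struct and file name are mine; see the Boost docs for the full API):

#include <fstream>
#include <boost/archive/binary_oarchive.hpp>

struct Vec
{
    float x, y, z;

    // Boost calls this for both saving and loading.
    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/)
    {
        ar & x & y & z;
    }
};

int main()
{
    std::ofstream ofs("vectors.bin", std::ios::binary);
    boost::archive::binary_oarchive oa(ofs);
    Vec v{1.0f, 2.0f, 3.0f};
    oa << v; // reading back mirrors this with binary_iarchive and >>
}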
The Kaitai Struct library provides a very effective declarative approach, which has the added bonus of working across programming languages.
After installing the compiler, you will want to create a .ksy file that describes the layout of your binary file. For your case, it would look something like this:
# my_type.ksy
meta:
  id: my_type
  endian: be # for big-endian, or "le" for little-endian
seq: # describes the actual sequence of data one-by-one
  - id: len
    type: u2 # unsigned short in C++, two bytes
  - id: my_string
    type: str
    size: 5
    encoding: UTF-8
  - id: stride
    type: u4 # unsigned int in C++, four bytes
  - id: float_data
    type: f4 # a four-byte floating point number
    repeat: expr
    repeat-expr: 6 # repeat six times
You can then compile the .ksy file using the kaitai struct compiler ksc:
# wherever the compiler is installed
# -t specifies the target language, in this case C++
/usr/local/bin/kaitai-struct-compiler my_type.ksy -t cpp_stl
This will create a my_type.cpp file as well as a my_type.h file, which you can then include in your C++ code:
#include <fstream>
#include <kaitai/kaitaistream.h>
#include "my_type.h"
int main()
{
std::ifstream ifs("my_data.bin", std::ifstream::binary);
kaitai::kstream ks(&ifs);
my_type_t obj(&ks);
std::cout << obj.len() << '\n'; // you can now access properties of the object
return 0;
}
Hope this helped! You can find the full documentation for Kaitai Struct here. It has a load of other features and is a fantastic resource for binary parsing in general.
I use the ragel tool to generate pure C procedural source code (no tables) for microcontrollers with 1-2K of RAM. It does not use any file I/O or buffering, and it produces both easy-to-debug code and a .dot/.pdf file with a state machine diagram.
ragel can also output Go, Java, etc. code for parsing, but I did not use these features.
The key feature of ragel is the ability to parse any byte-based data, but you can't dig into bit fields. Another limitation is that ragel can parse regular structures, but has no recursion and no grammar-based parsing.
I am in the process of learning C++. My first project is to write a parser for a binary-file format we use at my lab. I was able to get a parser working fairly easily in Matlab using "fread", and it looks like that may work for what I am trying to do in C++. But from what I've read, it seems that using an ifstream is the recommended way.
My question is two-fold. First, what, exactly, are the advantages of using ifstream over fread?
Second, how can I use ifstream to solve my problem? Here's what I'm trying to do. I have a binary file containing a structured set of ints, floats, and 64-bit ints. There are 8 data fields all told, and I'd like to read each into its own array.
The structure of the data is as follows, in repeated 288-byte blocks:
Bytes 0-3: int
Bytes 4-7: int
Bytes 8-11: float
Bytes 12-15: float
Bytes 16-19: float
Bytes 20-23: float
Bytes 24-31: int64
Bytes 32-287: 64x float
I am able to read the file into memory as a char * array, with the fstream read command:
char * buffer;
ifstream datafile (filename,ios::in|ios::binary|ios::ate);
datafile.read (buffer, filesize); // Filesize in bytes
So, from what I understand, I now have a pointer to an array called "buffer". If I were to call buffer[0], I should get a 1-byte memory address, right? (Instead, I'm getting a seg fault.)
What I now need to do really ought to be very simple. After executing the above ifstream code, I should have a fairly long buffer populated with a number of 1's and 0's. I just want to be able to read this stuff from memory, 32-bits at a time, casting as integers or floats depending on which 4-byte block I'm currently working on.
For example, if the binary file contained N 288-byte blocks of data, each array I extract should have N members each. (With the exception of the last array, which will have 64N members.)
Since I have the binary data in memory, I basically just want to read from buffer, one 32-bit number at a time, and place the resulting value in the appropriate array.
Lastly - can I access multiple array positions at a time, a la Matlab? (e.g. array(3:5) -> [1,2,1] for array = [3,4,1,2,1])
Firstly, the advantage of using iostreams, and in particular file streams, relates to resource management. Automatic file stream variables will be closed and cleaned up when they go out of scope, rather than having to manually clean them up with fclose. This is important if other code in the same scope can throw exceptions.
Secondly, one possible way to address this type of problem is to simply define the stream insertion and extraction operators in an appropriate manner. In this case, because you have a composite type, you need to help the compiler by telling it not to add padding bytes inside the type. The following code should work on gcc and microsoft compilers.
#include <cstdint>
#pragma pack(push, 1)
struct MyData
{
int i0;
int i1;
float f0;
float f1;
float f2;
float f3;
uint64_t ui0;
float f4[64];
};
#pragma pack(pop)
std::istream& operator>>( std::istream& is, MyData& data ) {
is.read( reinterpret_cast<char*>(&data), sizeof(data) );
return is;
}
std::ostream& operator<<( std::ostream& os, const MyData& data ) {
os.write( reinterpret_cast<const char*>(&data), sizeof(data) );
return os;
}
char * buffer;
ifstream datafile (filename,ios::in|ios::binary|ios::ate);
datafile.read (buffer, filesize); // Filesize in bytes
you need to allocate the buffer first, before you read into it:
buffer = new char[filesize];
datafile.read (buffer, filesize);
As to the advantages of ifstream: it is a matter of abstraction. You can represent the contents of your file in a more convenient way. You then do not have to work with raw buffers, but can instead build the structure using classes and hide the details of how it is stored in the file, for instance by overloading the << operator.
You might perhaps look for serialization libraries for C++. Perhaps s11n might be useful.
This question shows how you can convert data from a buffer to a certain type. In general, you should prefer using a std::vector<char> as your buffer. This would then look like this:
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>
int main() {
std::ifstream input("your_file.dat");
std::vector<char> buffer;
std::copy(std::istreambuf_iterator<char>(input),
std::istreambuf_iterator<char>(),
std::back_inserter(buffer));
}
This code will read the entire file into your buffer. The next thing you'd want to do is to write your data into valarrays (for the selection you want). valarray is constant in size, so you have to be able to calculate the required size of your array up-front. This should do it for your format:
std::valarray<int> array1(buffer.size()/288); // each entry takes up 288 bytes
Then you'd use a normal for-loop to insert the elements into your arrays:
for(int i = 0; i < buffer.size()/288; i++) {
    array1[i] = *(reinterpret_cast<int *>(&buffer[i*288]));     // first position
    array2[i] = *(reinterpret_cast<int *>(&buffer[i*288 + 4])); // second position
}
Note that the sizes of the fundamental types are platform-dependent; if int is not 4 bytes on your system, the offsets above will be wrong. This question explains a bit about C++ and the sizes of types.
The selection you describe there can be achieved using valarray.
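For example, your Matlab array(3:5) example maps to a 0-based std::slice (a sketch):

#include <valarray>

int main()
{
    std::valarray<int> a = {3, 4, 1, 2, 1};
    // start index 2, length 3, stride 1 -> elements {1, 2, 1}
    std::valarray<int> sel = a[std::slice(2, 3, 1)];
}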
I've been toying around with C++ more and more these days and have only had a couple experiences with ofstream at this point. Most of said experiences have been doing simple file output of variables and reading them back in with ifstream. What I haven't done is anything with objects.
Let's assume that I have an object that is being written to frequently (say a game, where the object is the character). Every time the character is hit, the hp is re-written; every time they defeat an enemy, they gain experience.... My basic idea is to write a simple text-based dungeon crawling game. But how am I going to make some kind of an autosave file? Do I just write out every attribute of my object to a file individually and then move on to bigger and better things from there? If I had to do it right now that's how I'd go about doing it, but I can't help feeling that there's an easier way than that....
Anyone care to help me output the contents of an entire object (and its respective attributes) to a file?
You could just write the object to a file by copying its contents in memory.
But then you hit the tricky bits! You can't copy any items that are pointers to memory, because when you load them back in they don't own that memory. That means copying things like std::string which may internally contain their own memory allocation is also tricky.
Then even with standard types there are issues if you plan to read them on a different machine with a different number of bits, or a different byte order.
The process is called serialisation - there are a few standard techniques to make it easier.
Take a look at this code :
//! Struct for a 2d marker
struct Marker2d{
double x; //!< x coordinate of marker in calibration phantom
double y; //!< y coordinate of marker in calibration phantom
int id; //!< some unique id (used for sequence id as well as for assigned 3d id)
int code; //!< type of marker (small = 0, large = 1)
float size; //!< real size of marker in 2D image (in pixel)
double distanceToNearest; //!< distance to nearest other marker
/**
* Overloaded stream insertion operator. Abbreviation for the output of 2d marker.
\param output_out A reference to an std::ostream instance indicating the output stream
\param marker_in A constant Marker2d reference indicating the 2d marker that we want to output
\return a std::ostream reference containing the new output data
*/
friend std::ostream & operator<<(std::ostream & output_out, const Marker2d & marker_in)
{
return output_out<< std::fixed << std::setprecision(15) <<marker_in.x<<"\t"<<marker_in.y<<"\t"<<marker_in.id<<"\t"
<<marker_in.code<<"\t"<<marker_in.size<<"\t"<<marker_in.distanceToNearest;
}
/**
* Overloaded stream extraction operator.
\param s_in A reference to an std::istream instance indicating the input stream
\param marker_out A Marker2d reference indicating the 2d marker that will have its data members populated
\return a std::istream reference indicating the input stream
*/
friend std::istream& operator>>(std::istream& s_in, Marker2d & marker_out)
{
s_in >> marker_out.x >> marker_out.y >> marker_out.id >> marker_out.code >> marker_out.size >> marker_out.distanceToNearest;
return s_in;
}
};
This is a simple struct with overloaded >> and << operators, which allows you to write an object to a file with myOfstreamFile << obj; and read it back the other way around.
If you have, say, a thousand objects stored in a file, you can simply put them in a container like this:
std::vector<Marker2d> myMarkers;
std::ifstream f( fileName_in.c_str() );
if(!f)
    // std::runtime_error, since standard std::exception has no string constructor
    throw std::runtime_error("AX.Algorithms.ComputeAssignmentsTest::getMarkersFromFile - Could not open file : " + fileName_in + " for reading!");
//cool one liner to read objects from file...
std::copy(std::istream_iterator<AX::Calibration::Marker2d>(f), std::istream_iterator<AX::Calibration::Marker2d>(), std::back_inserter(myMarkers));
Of course you could provide other forms of input and output e.g. save to .xml format and parse it as a dom tree etc. This is just a sample.
EDIT : This will work for relatively simple objects. Look at serialization if you need something more complex
Search the web and SO for "serialization". There are some bumps to watch out for: floating point, endianness, and variable-length fields (strings).
Good luck!
Behind your simple question hides a complex topic. Please take a look at boost::serialization (here, for instance). Any time spent learning Boost is very rewarding.
There's a nice library in Boost called Boost.Serialization. If you're looking for performance, it is probably not the right choice, but looking at the source code and usage may give you some ideas. Serialization/deserialization can be tricky, especially when you have lots of nested components. JSON is a very nice format for serializing general objects.
Here's a hacky trick that will likely get coders here to rip out their hair and scream at me (this only works for static objects and not ones that use dynamic memory allocation):
#include <cstdio>
#include <cstring>
class TestA
{
private:
long Numerics;
char StaticArray[11]; // 11 bytes: "Input data" is 10 chars plus the null terminator
int Data[3];
public:
TestA(){Numerics = 10; strcpy(StaticArray,"Input data"); Data[0] = 100; Data[1] = 200; Data[2] = 300;}
void Test(){Numerics = 1000; strcpy(StaticArray,"Data input"); Data[0] = 300; Data[1] = 200; Data[2] = 100;}
void Print()
{
printf("Numerics is: %ld\nStaticArray is: %s\n%d %d %d\n",Numerics,StaticArray,Data[0],Data[1],Data[2]);
}
};
int main()
{
TestA Test;
FILE *File = fopen("File.txt","wb");
Test.Test();
Test.Print();
fwrite((char *)&Test,sizeof(Test),1,File); //Treats the object as if it is a char array
fclose(File);
TestA Test2;
File = fopen("File.txt","rb");
fread((char *)&Test2,sizeof(Test2),1,File); //Treats the object as if it is a char array
Test2.Print();
fclose(File);
return 0;
}
Which results in:
Numerics is: 1000
StaticArray is: Data input
300 200 100
Numerics is: 1000
StaticArray is: Data input
300 200 100
Opening the file reveals the written data:
è Data input w, È d
The above trick allows for easy conversion into a byte-based format. Naturally this is hacky, but classes (or objects) should be expected to supply their own object-to-char array conversion process.
I have an object with several text strings as members. I want to write this object to the file all at once, instead of writing each string to file. How can I do that?
You can overload operator>> and operator<< to read/write to a stream.
Example Entry struct with some values:
struct Entry2
{
string original;
string currency;
Entry2() {}
Entry2(string& in);
Entry2(string& original, string& currency)
: original(original), currency(currency)
{}
};
istream& operator>>(istream& is, Entry2& en);
ostream& operator<<(ostream& os, const Entry2& en);
Implementation:
using namespace std;
istream& operator>>(istream& is, Entry2& en)
{
is >> en.original;
is >> en.currency;
return is;
}
ostream& operator<<(ostream& os, const Entry2& en)
{
os << en.original << " " << en.currency;
return os;
}
Then you open filestream, and for each object you call:
ifstream in(filename.c_str());
Entry2 e;
in >> e;
//if you want to use read (note: read takes a char*, not a const char*):
//in.read(reinterpret_cast<char*>(&e), sizeof(e));
in.close();
Or output:
Entry2 e;
// set values in e
ofstream out(filename.c_str());
out << e;
out.close();
Or if you want to use stream read and write then you just replace relevant code in operators implementation.
When the variables are private inside your struct/class, you need to declare the operators as friend functions.
You can implement any format/separators you like. When your string includes spaces, use getline(), which takes a stream and a string, instead of >>, because operator>> uses whitespace as the delimiter by default. It all depends on your separators.
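For instance, a getline-based variant of the extraction operator above might look like this (a sketch assuming tab-separated fields rather than the space-separated format shown earlier):

istream& operator>>(istream& is, Entry2& en)
{
    getline(is, en.original, '\t'); // read up to the tab separator
    getline(is, en.currency);       // read the rest of the line
    return is;
}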
It's called serialization. There are many serialization threads on SO.
There is also a nice serialization library included in Boost.
http://www.boost.org/doc/libs/1_42_0/libs/serialization/doc/index.html
basically you can do
myFile<<myObject
and
myFile>>myObject
with boost serialization.
If you have:
struct A {
    char a[30], b[25], c[15];
    int x;
};
then you can write it all just with write(fh, ptr, sizeof(struct A)).
Of course, this isn't portable (because we're not saving the endianness or size of int), but that may not be an issue for you.
If you have:
struct A {
    char *a, *b, *c;
    int d;
};
then you're not looking to write the object; you're looking to serialize it. Your best bet is to look in the Boost libraries and use their serialization routines, because it's not an easy problem in languages without reflection.
There's not really a simple way; it's C++ after all, not PHP or JavaScript.
http://www.parashift.com/c++-faq-lite/serialization.html
Boost also has some library for it: http://www.boost.org/doc/libs/release/libs/serialization ... like Tronic already mentioned :)
The better method is to write each field individually along with the string length.
As an alternative, you can create a char array (or std::vector<char>) and write all the members into the buffer, then write the buffer to the output.
The underlying thorn is that a compiler is allowed to insert padding between members in a class or structure. Using memcpy or std::copy will result in padding bytes being written to the output.
Just remember that you need to either write the string lengths and the content or the content followed by some terminating character.
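A sketch of length-prefixed string output (the helper name is mine):

#include <cstdint>
#include <ostream>
#include <string>

void write_string(std::ostream& os, const std::string& s)
{
    const std::uint32_t len = static_cast<std::uint32_t>(s.size());
    os.write(reinterpret_cast<const char*>(&len), sizeof len); // length first
    os.write(s.data(), len);                                   // then the bytes
}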
Other people will suggest checking out the Boost Serialization library.
Unfortunately that is generally not quite possible. If your struct only contains plain data (no pointers or complex objects), you can store it as one chunk, but care must be taken if portability is an issue: padding, data type sizes, and endianness all make this problematic.
You can use Boost.Serialization to minimize the amount of code required for proper portable and versionable serialization.
Assuming your goal is as stated, to write out the object with a single call to write() or fwrite() or whatever, you'd first need to copy the string and other object data into a single contiguous block of memory. Then you could write() that block of memory out with a single call. Or you might be able to do a vector-write by calling writev(), if that call is available on your platform.
That said, you probably won't gain much by reducing the number of write calls. Especially if you are using fwrite() or similar already, then the C library is already doing buffering for you, so the cost of multiple small calls is minimal anyway. Don't put yourself through a lot of extra pain and code complexity unless it will actually do some good...