Passing raw data in C++ - c++

Up until now, whenever I wanted to pass some raw data to a function (like a function that loads an image from a buffer), I would do something like this:
void Image::load(const char* buffer, std::size_t size);
Today I took a look at the Boost libraries, more specifically at the property_tree/xml_parser.hpp header, and I noticed this function signature:
template<typename Ptree>
void read_xml(std::basic_istream<typename Ptree::key_type::value_type>&,
Ptree &, int = 0);
This actually made me curious: is this the correct way to pass around raw data in C++, by using streams? Or am I misinterpreting what the function is supposed to be used for?
If it's the former, could you please point me to some resource where I can learn how to use streams for this? I haven't found much myself (mainly API references), and I have't been able to find the Boost source code for the XML parser either.
Edit: Some extra details
Seems there's been some confusion as to what I want. Given a data buffer, how can I convert it to a stream such that it is compatible with the read_xml function I posted above? Here's my specific use case:
I'm using the SevenZip C library to read an XML file from an archive. The library will provide me with a buffer and its size, and I want to put that in stream format such that it is compatible with read_xml. How can I do that?

Well, streams are quite used in C++ because of their conveniences:
- error handling
- they abstract away the data source, so whether you are reading from a file, an audio source, a camera, they are all treated as input streams
- and probably more advantages I don't know of
Here is an overview of the IOstream library, perhaps that might better help you understand what's going on with streams:
http://www.cplusplus.com/reference/iostream/
Understanding what they are exactly will help you understand how and when to use them.

There's no single correct way to pass around data buffers. A combination of pointer and length is the most basic way; it's C-friendly. Passing a stream might allow for sequential/chunked processing - i. e. not storing the whole file in memory at the same time. If you want to pass a mutable buffer (that might potentially grow), a vector<char>& would be a good choice.
Specifically on Windows, a HGLOBAL or a section object handle might be used.
The C++ philosophy explicitly allows for many different styles, depending on context and environment. Get used to it.

Buffers of raw memory in C++ can either be of type unsigned char*, or you can create a std::vector<unsigned char>. You typically don't want to use just a char* for your buffer since char is not guaranteed by the standard to use all the bits in a single byte (i.e., this will end up varying by platform/compiler). That being said, streams have some excellent uses as well, considering that you can use a stream to read bytes from a file or some other input, etc., and from there, store that data in a buffer.

Seems there's been some confusion as to what I want. Given a data buffer, how can I convert it to a stream such that it is compatible with the read_xml function I posted above?
Easily (I hope PTree::Key_type::value_type would be something like char):
istringstream stream(string(data, len));
read_xml(stream, ...);
More on string streams here.

This is essentially using a reference to pass the stream contents. So behind the scene's it's essentially rather similar to what you did so far and it's essentially the same - just using a different notation. Simplified, the reference just hides the pointer aspect, so in your boost example you're essentially working with a pointer to the stream.
References got the advantage avoiding all the referencing/dereferencing and are therefore easier to handle in most situations. However they don't allow you multiple levels of (de-)referencing.
The following two example functions do essentially the same:
void change_a(int &var, myclass &cls)
{
var = cls.convert();
}
void change_b(int *var, myclass *cls)
{
*var = cls->convert();
}
Talking about the passed data itself: It really depends on what you're trying to achieve and what's more effective. If you'd like to modify a string, utilizing an object of class std::string might be more convenient than using a classic pointer to a buffer (char *). Streams got the advantage that they can represent several different things (e.g. data stream on the network, a compressed stream or simply a file or memory stream). This way you can write single functions or methods that accept a stream as input and will instantly work without worrying about the actual stream source. Doing this with classic buffers can be more complicated. On the other side you shouldn't forget that all objects will add some overhead, so depending on the job to be done a simple pointer to a character string might be perfectly fine (and the most effective solution). There's no "the one way to do it".

Related

Serialization doubts

I have to create an application layer protocol for a C++ application, but I have some doubts about how I can do it, especially about the serialization:
My idea is to create a class for describing the header, something like this:
class Header {
int type;
int length;
char[] message;
}
Now, in order to serialize it and to pass it through a socket, I'm thinking about using Boost Serialization. But my question is: is it "cross-platform"? In the sense that if I want to receive the data into a Python/Ruby/any-other-language server with its own class, can I do it or not (since I've serialized a C++ class)?
If not, is useful to serialize the class data into a JSON/XML file and transmit it?
If I want to serialize an object into a string, Does I have to pay attention to the big/little-endian and/or the string encoding and/or other details?
Since, not all the machines using the same number of bytes to define, for example, the primitive data types, is it necessary to use something like uint32_t data types to force the system to use a certain amount of bytes?
Thank you very much!
I think you should check this answer first.
Serializing into a C-style string is fine as far as big/little endian woes go, but is not as good performance-wise.
Using C-style string serialization will mostly solve this problem as well. On read side you should make sure that when parsing numbers they don't exceed local data size.
If this is a serious project then maybe consider using JSON or XML.

Is it possible to override std::istream read method?

I want to use std::istream to read data from a given class which only provides 2 methods:
// Returns a byte from the stream (consuming it)
uint8_t getChar(OwnIOStream stream);
// Makes the passed pointer point to the data in the stream
bool getCharBlockPtr(OwnIOStream stream, uint8_t** buffer, uin32_t maxSize, uint32_t* size);
I first thought of inheriting from stream_buf and implement the underflow method using the getChar() method. However I would like to use the getCharBlockPtr() instead to avoid copies of data (I assume calling underflow for each read byte will decrease the performance). The problem is that I need to know in advance the quantity of bytes I want to read each time. This is why I was thinking whether it would be possible to override the read method of istream.
Using boost is not an option.
Deriving from streambuf is a perfectly valid option.
Whether you need to do much (if any) more than override underflow depends. If OwnIOStream is already buffering data, then you might get adequate performance from just doing that. I'd start by doing that, and seeing how things go.
The next obvious step would be to use getCharBlockPtr. A streambuf has a setg that you use to set the pointers to the beginning, current position, and end of the buffer. At least if I'm interpreting things correctly, you'd call getCharBlockPtr and get the beginning and end. You'd then call setg to set the beginning and current positions to the beginning, and the end to the end. From there, the streambuf should be able to read data directly from the buffer. When it runs out, it'll call underflow, and you'll need to get more data again.
One note: it doesn't look like OwnIOStream supports a putback area, which streams normally do expect to support. Getting this to work correctly (especially putting back a character when at the beginning of a buffer) may be somewhat non-trivial to support correctly.
The general rule is that most classes from the standard C++ library are not meant to be derived. The exception here is the streambuf class which is intended to be the low level interface between an IO capable object and a C++ stream.
So the standard way here would be to build a custom streambuf calling getChar or better getCharBlockPtr that would allow buffering. Jerry's answer explains how to do it.
But it looks like you have special requirement here:
you know at a higher level the number of bytes you want to read
you are in a performance critical operation and want to avoid buffer copies if possible - but this last point should be reviewed after profiling...
Then, asking for the full power of a C++ stream may not be the way to go. Building a custom class that would only implement the methods of a istream that you need to call, as well as few extractors would still give a nice code at high level and could give a more optimized code at low level.
But you and only you can know:
whether your higher level code requires a C++ stream - then a custom class is not an option and you will have to use a custom streambuf
whether you could directly give the expected size to the streambuf from the calling code as a hint. More or less:
class OwnStreamBuf: public std::streambuf {
...
void sizehint(size_t hint) { /* size for next read access */
this->hint = hint
}
int_type underflow() {
...
cr = getCharBlockPtr(stream, buffer, hint, size); /* use hint as maxSize */
...
}
}
...
OwnStreamBuf buf(...);
class OwnIStream: public istream ...
OwnIStream is(buf)
...
buf.sizehint(n)
is >> special_obj;
...
whether you only need a handful of extractors and a custom class will be simpler to build from scratch.

C++ Truncating an iostream Class

For a project I'm working on for loading/storing data in files, I made the decision to implement the iostream library, because of some of the features it holds over other file io libraries. One such feature, is the ability to use either the deriving fstream or stringstream classes to allow the loading of data from a file, or an already existent place in memory. Although, so far, there is one major fallback, and I've been having trouble getting information about it for a while.
Similar to the functions available in the POSIX library, I was looking for some implementation of the truncate or ftruncate functions. For stringstream, this would be as easy as passing the associated string back to the stream after reconstructing it with a different size. For fstream, I've yet to find any way of doing this, actually, because I can't even find a way to pull the handle to the file out of this class. While giving me a solution to the fstream problem would be great, an even better solution to this problem would have to be iostream friendly. Because every usage of either stream class goes through an iostream in my program, it would just be easier to truncate them through the base class, than to have to figure out which class is controlling the stream every time I want to change the overall size.
Alternatively, if someone could point me to a library that implements all of these features I'm looking for, and is mildly easy to replace iostream with, that would be a great solution as well.
Edit: For clarification, the iostream classes I'm using are more likely to just be using only the stringstream and fstream classes. Realistically, only the file itself needs to be truncated to a certain point, I was just looking for a simpler way to handle this, that doesn't require me knowing which type of streambuf the stream was attached to. As the answer suggested, I'll look into a way of using ftruncate alongside an fstream, and just handle the two specific cases, as the end user of my program shouldn't see the stream classes anyways.
You can't truncate an iostream in place. You have to copy the first N bytes from the existing stream to a new one. This can be done with the sgetn() and sputn() methods of the underlying streambuf object, which you can obtain by iostream::rdbuf().
However, that process may be I/O intensive. It may be better to use special cases to manipulate the std::string or call ftruncate as you mentioned.
If you want to be really aggressive, you can create a custom std::streambuf derivative class which keeps a pointer to the preexisting rdbuf object in a stream, and only reads up to a certain point before generating an artifiicial end-of-file. This will look to the user like a shorter I/O sequence, but will not actually free memory or filesystem space. (It's not clear if that is important to you.)

C++ way to serialize a message?

In my current project I have a few different interfaces that require me to serialize messages into byte buffers. I feel like I'm probably not doing it in a way that would make a true C++ programmer happy (and I'd like to).
I would typically do something like this:
struct MyStruct {
uint32_t x;
uint64_t y;
uint8_t z[80];
};
uint8_t* serialize(const MyStruct& s) {
uint8_t* buffer = new uint8_t[sizeof(s)];
uint8_t* temp = buffer;
memcpy(temp, &s.x, sizeof(s.x));
temp += sizeof(s.x);
//would also have put in network byte order...
... etc ...
return buffer;
}
Excuse any typos, that was just an example off the top of my head. Obviously it can get more complex if the structure I'm serializing has internal pointers.
So, I have two questions that are closely related:
Is there any problem in the specific scenario above with serializing by casting the struct directly to a char buffer assuming I know that the destination systems are in the same endianness?
Main question: Is there a better... erm... C++? way to do this aside from the addition of smart pointers? I feel like this is such a common problem that the STL probably handles it - and if it doesn't, I'm sure there's a better way to do it anyway using C++ mechanisms.
EDIT Bonus points if you can do a clean example of serializing this structure in a better way using standard C++/STL without added libraries.
You might want to take a look at Google Protocol Buffers (also known as protobuf). You define your data in a language neutral IDL and then run it through a generator to generate your C++ classes. It'll take care of byte ordering issues and can provide a very compact binary form.
By using this you'll not only be able to save your C++ data but it'll be useable in other languages (C#, Java, Python etc) as there are protobuf implementation available for them.
You should probable use either Boost::serialization or streams directly. More information in either of the links to the right.
Is it possible to serialize and deserialize a class in C++?
just voted AzPs reply as the answer, checking Boost first is the way to go.
in addition about your code sample:
1 - changing the signature of your serialization function to a method taking a file:
void MyStruct::serialize(FILE* file) // or stream
{
int size = sizeof(this);
fwrite(&size, sizeof(int), 1, file); // write size
fwrite(this, 1, size, file); // write raw bytes of struct
}
reduces the necessity to copy the struct.
2 - yes, your code makes the serialized bytes dependent on your platform, compiler and compiler settings. this is not good or bad, if the same binary writes and reads the serialized bytes, this might be beneificial because of simplicity and performance. But it is not only endianness, also packing and struct layout affects compatibility. For instance a 32bit or a 64bit version of your app will change the layout of our struct quite for sure. Finally serializing the raw footprint also serializes padding bytes - the bytes the compiler might put between struct fields, an overhead undesirable for high traffic network streams (see google protocol buffers as they hunt every bit they can save).
EDIT:
i see you added "embedded". yes, then such simple serialize / deserialize (mirror implementation of the above serialize) methods might be a good and simple choice.

C++ Iterate an istream

What is the best way to parse or iterate an istream? I need to create a function that takes an istream, parses it and creates an object so was wondering the easiest way to do this. Even something that could convert it to string would be dandy.
You can use an istream_iterator.
typedef std::istream_iterator<std::string> streamiter;
for (streamiter it = streamiter(some_istream); it != streamiter(); it++) {
// process words
}
This will split the input stream at all whitespaces.
Since C++ doesn't have reflection and persistence built in, you cannot write a function that reads any object and then see what it came up with. You need to know what you are looking for before-hand and read that specifically. (Of course you could always just read tokens and feed those into a parser.)
If you know exactly which type of object to read from the stream, it's often good to give that class a constructor taking a std::istream&. Since usually the class is also where the code to write into the stream is, this puts both of them close together, which is best for maintenance. The parsing code then just creates the object passing the stream to the constructor.
If you don't know which type you will encounter, you will have to write a (probably simple) parsing function. Such formats should start with an identifier that tells which type of objects follow. Your parsing function would first have to read that identifier, and then branch into code that reads the appropriate type from the stream. Since at this point it knows what type of object to read from the stream, reading the actual objects can be implemented in constructors as described above.