The Beast websocket example stores the data in a multibuffer:
The implementation uses a sequence of one or more character arrays
of varying sizes. Additional character array objects are appended to
the sequence to accommodate changes in the size of the character
sequence.
Looking at the interface, it is not completely clear to me how it works. From the description it can be seen as an array of buffers, yet the output seems to be a single chunk of data. Does this mean the "one or more arrays" apply only to the internal structure?
In the example code the data is read into the buffer as follows: m_websocketStream.async_read(m_buffer, ...)
Does each async_read operation create a new internal buffer?
If that is the case, how should it be interpreted at the other end, e.g. how can it be read into a std::string or std::vector?
Looking into the sources, data() returns const_buffers_type, which is a forward declaration.
For that member the documentation provides the following description, which is not of much help:
The type used to represent the input sequence as a list of buffers.
using const_buffers_type = implementation_defined;
The definition seems to come from the header file boost/asio/buffer.hpp, which is included as well. The overall structure, however, is somewhat opaque to me.
I am just trying to understand how to handle the data as bytes or convert it to a std::string.
I tried the following, but this is not allowed either:
std::string( boost::asio::buffer_cast<const char*>(m_buffer.data())
           , boost::asio::buffer_size(m_buffer.data()) );
Can anyone enlighten me a little?
data() returns an object meeting the requirements of ConstBufferSequence (http://www.boost.org/doc/libs/1_65_0/doc/html/boost_asio/reference/ConstBufferSequence.html). prepare() returns an object meeting the requirements of MutableBufferSequence (http://www.boost.org/doc/libs/1_65_0/doc/html/boost_asio/reference/MutableBufferSequence.html)
All of the dynamic buffers in beast meet the requirements of DynamicBuffer, described in http://www.boost.org/doc/libs/develop/libs/beast/doc/html/beast/concepts/DynamicBuffer.html
If you want to convert a buffer sequence into a string you need to loop over each element and append it to the string individually. Such a function might look like this:
template<class ConstBufferSequence>
std::string
to_string(ConstBufferSequence const& buffers)
{
    std::string s;
    s.reserve(boost::asio::buffer_size(buffers));
    for(boost::asio::const_buffer b : buffers)
        s.append(boost::asio::buffer_cast<char const*>(b),
            boost::asio::buffer_size(b));
    return s;
}
Alternatively, if you want to avoid the buffer copy, you can use something like beast::flat_buffer, which guarantees that all of its buffer sequences have length one. Something like this:
inline
std::string
to_string(beast::flat_buffer const& buffer)
{
    return std::string(boost::asio::buffer_cast<char const*>(
        beast::buffers_front(buffer.data())),
        boost::asio::buffer_size(buffer.data()));
}
For more information on buffers, see http://www.boost.org/doc/libs/1_65_0/doc/html/boost_asio/overview/core/buffers.html
In the latest versions of Beast, there is now the function buffers_to_string which will do this for you in a single function call.
Related
A C function API accepts uint8_t* and size_t as parameters:
bool foo(uint8_t* buff, size_t bufflen)
What is the best way to manage and handle this in the C++ layer when invoking the API? Is string, vector, or list a better option?
Just make sure that when calling this API from C++ you always pass a uint8_t pointer. A plain array uint8_t arr[x] (where x is any positive number) will also work. Just make sure the address you pass points to data of type uint8_t and that the buffer size is correct.
e.g. uint8_t arr[6]; for this the call will be foo(arr, 6);
You probably want std::vector<uint8_t>, passing data() and size().
You can't pass a container to the C function. You can still use one in your C++ code, but you'll need to pass a pointer to the data, in accordance with the C function's parameters. Use a vector. It is equivalent to an array in C, in that the data is stored contiguously in memory.
std::vector<uint8_t> myData;
// ... fill myData
// for c++11 and later,
foo(myData.data(), myData.size());
// pre-c++11
foo(&(myData[0]), myData.size());
Is string, vector or list a better option?
Well, list is a non-starter, because it will not store the sequence in contiguous memory. In other words, the C code would not be able to take a pointer to the first element and increment it to get to the second element.
As for the other two, that depends on the rest of your C++ code. I would lean towards vector rather than string, but you haven't really provided enough context for that to be any more than a general feeling.
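Putting the vector answer above together, here is a self-contained sketch; the body of foo is invented purely so the call has an observable result, since the question only gives its signature:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Stand-in for the C API from the question. The body here is hypothetical:
// it sums the bytes and reports whether any were non-zero.
bool foo(std::uint8_t* buff, std::size_t bufflen)
{
    return std::accumulate(buff, buff + bufflen, 0u) > 0;
}

// C++ side: keep the bytes in a std::vector and hand the C function a
// pointer/length pair via data() and size().
bool call_foo(std::vector<std::uint8_t>& bytes)
{
    return foo(bytes.data(), bytes.size());
}
```

Because vector storage is contiguous, data() is exactly the uint8_t* the C side expects, with no copy.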
Usually I would go with a helper class that has a method taking either a vector or a custom structure that acts like a span - i.e. a pair<void*,int> - or perhaps even an actual span (but I'm not allowed the C++14 crayons).
If the data really is character-based, std::string and string spans can work well, but if it is really binary data, vector and vector spans are the better encapsulation, IMBO.
I still don't want to call that directly from application code if what is actually in there is structured data. You can easily write a method that takes an expected structure type and generates the pointer and sizeof(instance).
You can write a generic template that would accept any structure and convert it to a void*/char* and length, but that tends to open your code up to more accidents.
I would like to write a struct to a file as binary. The struct has two members: one is POD-only, but the problem is that the second member is a string:
struct ToWrite
{
    std::string _str;
    PODType _pod;
};
If I was simply writing the POD type as binary I would do:
file.write((char*)&_pod, sizeof(_pod));
and to read back:
const PODType& pod = *reinterpret_cast<const PODType*>(&bytes[position]);
However, I'm aware the string is more-complicated because you need to record the size. If I were to add a third class member, being an int containing the size of the string, how do you write and read the struct?
You need to do three things:
Define a storage format for the structure as a stream of bytes.
Write code to convert the structure into an array of bytes in the format you defined in step 1.
Write code to parse the array of bytes you defined in step 1 and fill in the structure.
If you want to find more information, the best search keyword to use is probably "serialization". There are lots of serialization libraries that you can use to save you from having to go through this complexity every time you need to serialize/deserialize a data structure. I personally like protocol buffers and boost serialization, but there are lots of options.
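As a concrete illustration of steps 1-3 for the struct in the question, here is a minimal hand-rolled sketch. The PODType fields are invented for the example, and the format deliberately ignores endianness and versioning, which a real serialization library would handle:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical POD payload standing in for the question's PODType.
struct PODType {
    std::int32_t x;
    double y;
};

struct ToWrite {
    std::string _str;
    PODType _pod;
};

// Step 1: the format is [u32 string length][string bytes][raw POD bytes].
// Step 2: serialize the structure into a byte vector in that format.
std::vector<std::uint8_t> serialize(const ToWrite& v)
{
    std::uint32_t len = static_cast<std::uint32_t>(v._str.size());
    std::vector<std::uint8_t> out(sizeof len + len + sizeof v._pod);
    std::uint8_t* p = out.data();
    std::memcpy(p, &len, sizeof len);        p += sizeof len;
    std::memcpy(p, v._str.data(), len);      p += len;
    std::memcpy(p, &v._pod, sizeof v._pod);
    return out;
}

// Step 3: parse the byte vector and fill in the structure.
ToWrite deserialize(const std::vector<std::uint8_t>& in)
{
    ToWrite v;
    const std::uint8_t* p = in.data();
    std::uint32_t len;
    std::memcpy(&len, p, sizeof len);        p += sizeof len;
    v._str.assign(reinterpret_cast<const char*>(p), len); p += len;
    std::memcpy(&v._pod, p, sizeof v._pod);
    return v;
}
```

The byte vector can then be handed to file.write()/file.read() as a single block.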
I don't quite understand the advantage of using streambuf over a regular array.
Let me explain my problem. I have a network connection which is encrypted using Rijndael 128 ECB, plus a simple cipher for remaining data shorter than 16 bytes. The packets are structured as length_of_whole_packet + operationcode + data. Do I really have to copy all the data out of the streambuf just to apply the decryption algorithm? Why make another copy of data I already have?
I have the same problem with sending data. When in secured mode the packet structure is length_of_whole_packet + crc + data, where crc and data are encrypted. I could make some monstrosity like MakePacket(HEADER, FORMAT, ...) that would allocate an array, format the packet, add the crc, and encrypt it, but I would like to avoid a vararg function. I can't use structs because the packets have dynamic length - they can contain arrays or strings. If I used MakePacket(unsigned char opcode, streambuf& sb), there would be a problem with the crc again: I would have to make a copy to encrypt it.
Should I use the vararg monstrosity for sending, with a regular array as the buffer, in combination with unsigned char pbyRecvBuffer[BUFFERMAXLEN] for recv?
I'm not really sure how to design this communication while avoiding copies of data.
Thank you for your answers.
When using streambufs, copying of data can often be minimized by using algorithms that operate on iterators, such as std::istreambuf_iterator or boost::asio::buffers_iterator, rather than copying data from a streambuf into another data structure.
For stream-like application protocols, boost::asio::streambuf is often superior to boost::asio::buffer() compatible types, such as a raw array. For example, consider HTTP, where a delimiter is used to identify boundaries between variable length headers and bodies. The higher-level read_until() operations provide an elegant way to read the protocol, as Boost.Asio will handle the memory allocation, detect the delimiter, and invoke the completion handler once the message boundary has been reached. If an application used a raw array, it would need to read chunks and copy each fragmented chunk into an aggregated memory buffer until the appropriate delimiter was identified.
If the application can determine the exact amount of bytes to read, then it may be worth considering using boost::array for fixed length portions and std::vector for the variable length portions. For example, an application protocol with a:
fixed length body could be read into a boost::array.
fixed length header that contains enough information to determine the length of the following variable length body could use a std::vector to read the fixed size header, resize the vector once the body length has been determined, then read the body.
In context of the question, if length_of_whole_packet is of a fixed length, the application could read it into std::vector, resize the vector based on the determined body length, then read the remaining data into the vector. The decryption algorithm could then operate directly on the vector and use an output iterator, such as std::back_insert_iterator, with an auxiliary output buffer if the algorithm cannot be done in-place. The same holds true for encrypting data to be written.
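The read-the-length-then-resize idea above might be sketched like this; a hypothetical FakeStream stands in for the socket so the pattern is visible without Boost, and validation of the length field is omitted for brevity:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a socket: read() copies the next n bytes from a
// fixed test buffer, the way a blocking socket read would deliver them.
struct FakeStream {
    const std::uint8_t* data;
    std::size_t size;
    std::size_t pos = 0;
    void read(void* dst, std::size_t n) {
        std::memcpy(dst, data + pos, n);
        pos += n;
    }
};

// Read the fixed-size length field first, resize the vector to the full
// packet length, then read the variable-length remainder in place. The
// decryption algorithm could then operate directly on the vector.
std::vector<std::uint8_t> read_packet(FakeStream& s)
{
    std::uint16_t whole_len;                  // length_of_whole_packet
    s.read(&whole_len, sizeof whole_len);
    std::vector<std::uint8_t> packet(sizeof whole_len);
    std::memcpy(packet.data(), &whole_len, sizeof whole_len);
    packet.resize(whole_len);                 // grow to the full packet size
    s.read(packet.data() + sizeof whole_len,
           whole_len - sizeof whole_len);     // opcode/crc + data
    return packet;
}
```

Only one buffer is allocated per packet, and nothing is copied a second time before decryption.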
I have a structure that contains a string. Something like this:
struct Chunk {
    int a;
    string b;
    int c;
};
So I suppose that I cannot write and read this structure from a file using the fread and fwrite functions, because the string may reserve different amounts of memory.
But such code seems to work correctly:
Chunk var;
fwrite(&var, sizeof(Chunk), 1, file);
fread(&var, sizeof(Chunk), 1, file);
Is there really some problem with it?
You are justified in doubting this. You should only stream POD types with fwrite and fread and string is not POD.
You shouldn't do it like this, because different implementations of std::string have different internal structures.
In general you should only serialize integral types, the boolean type, and binary data (if you can call that serializing). Make sure to use a single endianness if you are thinking of sharing serialized data between platforms.
Watch out with floats, doubles and pointers. They can become very pesky.
You'll have to watch out with C/C++ structs too, because they can contain unpredictable amounts of padding.
You should serialize data.
You might like to do this manually - when it comes to std::string, check out:
const charT* std::string::c_str() const
const charT* std::string::data() const
When it comes to more complex objects, you might be interested in things like Google Protocol Buffers and/or Thrift.
I have a situation where I need to process large (many GB's) amounts of data as such:
build a large string by appending many smaller (C char*) strings
trim the string
convert the string into a C++ const std::string for processing (read only)
repeat
The data in each iteration are independent.
My question is, I'd like to minimise (if possible eliminate) heap allocated memory usage, as it at the moment is my largest performance problem.
Is there a way to convert a C string (char*) into a stl C++ string (std::string) without requiring std::string to internally alloc/copy the data?
Alternatively, could I use stringstreams or something similar to re-use a large buffer?
Edit: Thanks for the answers, for clarity, I think a revised question would be:
How can I build (via multiple appends) a C++ std::string efficiently? And if performing this action in a loop, where each iteration is totally independent, how can I re-use the allocated space?
You can't actually form a std::string without copying the data. A stringstream would probably reuse the memory from pass to pass (though I think the standard is silent on whether it actually has to), but it still wouldn't avoid the copying.
A common approach to this sort of problem is to write the code which processes the data in step 3 to use a begin/end iterator pair; then it can easily process either a std::string, a vector of chars, a pair of raw pointers, etc. Unlike passing it a container type like std::string, it would no longer know or care how the memory was allocated, since it would still belong to the caller. Carrying this idea to its logical conclusion is boost::range, which adds all the overloaded constructors to still let the caller just pass a string/vector/list/any sort of container with .begin() and .end(), or separate iterators.
Having written your processing code to work on an arbitrary iterator range, you could then even write a custom iterator (not as hard as it sounds, basically just an object with some standard typedefs and the ++/*/=/==/!= operators overloaded to get a forward-only iterator) that takes care of advancing to the next fragment each time it hits the end of the one it's working on, skipping over whitespace (I assume that's what you meant by trim). That way you never have to assemble the whole string contiguously at all. Whether or not this would be a win depends on how many fragments you have and how large they are. This is essentially what the SGI rope mentioned by Martin York is: a string where append forms a linked list of fragments instead of a contiguous buffer, making it suitable for much longer values.
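Such a fragment-hopping iterator might be sketched like this; the class name and the minimal operator set are ours, and the whitespace skipping is left out for brevity:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Forward-only char iterator over a vector of fragments. When the current
// fragment is exhausted it advances to the next non-empty one, so callers
// see one continuous character sequence without any concatenation.
class fragment_iterator {
    const std::vector<std::string>* frags_;
    std::size_t frag_ = 0, pos_ = 0;
    void skip_empty() {
        while (frag_ < frags_->size() && pos_ >= (*frags_)[frag_].size()) {
            ++frag_; pos_ = 0;
        }
    }
public:
    explicit fragment_iterator(const std::vector<std::string>& f,
                               bool at_end = false)
        : frags_(&f), frag_(at_end ? f.size() : 0) { skip_empty(); }
    char operator*() const { return (*frags_)[frag_][pos_]; }
    fragment_iterator& operator++() { ++pos_; skip_empty(); return *this; }
    bool operator==(const fragment_iterator& o) const {
        return frag_ == o.frag_ && pos_ == o.pos_;
    }
    bool operator!=(const fragment_iterator& o) const { return !(*this == o); }
};
```

Processing code written against a begin/end pair never learns that the storage is fragmented.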
UPDATE (since I still see occasional upvotes on this answer):
C++17 introduces another choice: std::string_view, which has replaced std::string in many function signatures. It is a non-owning reference to character data, implicitly convertible from std::string, but it can also be constructed explicitly from contiguous data owned somewhere else, avoiding the unnecessary copying that std::string imposes.
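A minimal C++17 sketch of that approach; count_spaces is a stand-in for whatever read-only processing the question's step 3 performs:

```cpp
#include <cstddef>
#include <cstring>
#include <string_view>

// Read-only processing written against std::string_view. The view wraps an
// existing char buffer, so no allocation or copy happens anywhere.
std::size_t count_spaces(std::string_view sv)
{
    std::size_t n = 0;
    for (char c : sv)
        if (c == ' ') ++n;
    return n;
}
```

Calling it on a raw C buffer is just std::string_view(buf, len); the same function also accepts a std::string without change.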
Is it at all possible to use a C++ string in step 1? If you use string::reserve(size_t), you can allocate a large enough buffer to prevent multiple heap allocations while appending the smaller strings, and then you can just use that same C++ string throughout all of the remaining steps.
See this link for more information on the reserve function.
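The reserve-and-reuse pattern might look like this; the 1024-byte reservation and the fragment-batch layout are placeholder assumptions for the example:

```cpp
#include <string>
#include <vector>

// One std::string is allocated up front and recycled across iterations:
// clear() resets the length to zero but keeps the capacity, so appends in
// later iterations reuse the same heap block instead of reallocating.
std::string buffer_reuse_demo(
    const std::vector<std::vector<const char*>>& batches)
{
    std::string s;
    s.reserve(1024);                 // single up-front allocation (size is a guess)
    std::string last;
    for (const auto& batch : batches) {
        s.clear();                   // length -> 0, capacity is retained
        for (const char* frag : batch)
            s += frag;               // appends stay within reserved storage
        last = s;                    // stand-in for the read-only processing
    }
    return last;
}
```

If a realistic upper bound on the assembled size is known, reserving it once removes heap traffic from the loop entirely.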
To help with really big strings SGI has the class Rope in its STL.
Non-standard, but it may be useful.
http://www.sgi.com/tech/stl/Rope.html
Apparently rope is in the next version of the standard :-)
Note the developer joke. A rope is a big string. (Ha Ha) :-)
This is a lateral thinking answer, not directly addressing the question but "thinking" around it. Might be useful, might not...
Read-only processing of std::string doesn't really require a very complex subset of std::string's features. Is there a possibility that you could do a search/replace on the code that performs all the processing on std::strings so it takes some other type instead? Start with a blank class:
class lightweight_string { };
Then replace all std::string references with lightweight_string. Perform a compilation to find out exactly what operations are needed on lightweight_string for it to act as a drop-in replacement. Then you can make your implementation work however you want.
Is each iteration independent enough that you can use the same std::string for each iteration? One would hope that your std::string implementation is smart enough to re-use memory if you assign a const char * to it when it was previously used for something else.
Assigning a char * into a std::string must always at least copy the data. Memory management is one of the main reasons to use std::string, so you won't be able to override it.
In this case, might it be better to process the char* directly instead of assigning it to a std::string?