I am using the C++ msgpack implementation and have hit a roadblock on how to pack binary data. My binary data is a buffer of the following type:
unsigned char* data;
The data variable points to an array which is actually an image. What I want to do is pack this using msgpack. There seems to be no example of how to actually pack binary data. From the format specification raw bytes are supported, but I am not sure how to make use of the functionality.
I tried using a vector of character pointers like the following:
msgpack::sbuffer temp_sbuffer;
std::vector<char*> vec;
msgpack::pack(temp_sbuffer, vec);
But this results in a compiler error since there is no function template for T=std::vector.
I have also simply tried the following:
msgpack::pack(temp_sbuffer, "Hello");
But this also results in a compilation error (i.e. no function template for T=const char [6]).
Thus, I was hoping someone could give me advice on how to use msgpack C++ to pack binary data represented as a char array.
Josh provided a good answer but it requires the copying of byte buffers to a vector of char. I would rather minimize copying and use the buffer directly (if possible). The following is an alternative solution:
Looking through the source code and trying to determine how different data types are packed according to the specification, I happened upon msgpack::packer<>::pack_raw(size_t l) and msgpack::packer<>::pack_raw_body(const char* b, size_t l). While there appears to be no documentation for these methods, this is how I would describe them.
msgpack::packer<>::pack_raw(size_t l): This method appends the type identification to the buffer (i.e. fix raw, raw16 or raw32) as well as the size information (which is the argument to the method).
msgpack::packer<>::pack_raw_body(const char* b, size_t l): This method appends the raw data to the buffer.
The following is a simple example of how to pack a character array:
msgpack::sbuffer temp_sbuffer;
msgpack::packer<msgpack::sbuffer> packer(&temp_sbuffer);
packer.pack_raw(5); // Indicate that you are packing 5 raw bytes
packer.pack_raw_body("Hello", 5); // Pack the 5 bytes
The above example can be extended to pack any binary data. This allows one to pack directly from byte arrays/buffers without having to copy to an intermediate (i.e. a vector of char).
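To make the header step more concrete, here is a minimal sketch of the bytes that such a header consists of, written against the MessagePack format rather than the msgpack-c source. It assumes the older raw family named above (fix raw as 0xa0 | length for fewer than 32 bytes, raw16 as 0xda, raw32 as 0xdb, lengths in big-endian order). raw_header is a hypothetical helper, not a msgpack-c function.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: builds the header bytes that precede a raw payload
// of `len` bytes in the older MessagePack "raw" family.
std::vector<unsigned char> raw_header(std::size_t len) {
    std::vector<unsigned char> out;
    if (len < 32) {
        // fix raw: type and length share a single byte (101xxxxx)
        out.push_back(static_cast<unsigned char>(0xa0u | len));
    } else if (len <= 0xffffu) {
        // raw16: marker byte, then a 16-bit big-endian length
        out.push_back(0xda);
        out.push_back(static_cast<unsigned char>((len >> 8) & 0xffu));
        out.push_back(static_cast<unsigned char>(len & 0xffu));
    } else {
        // raw32: marker byte, then a 32-bit big-endian length
        // (this sketch assumes len fits in 32 bits)
        out.push_back(0xdb);
        for (int shift = 24; shift >= 0; shift -= 8)
            out.push_back(static_cast<unsigned char>((len >> shift) & 0xffu));
    }
    return out;
}
```

For the "Hello" example, raw_header(5) yields the single byte 0xa5, and the five payload bytes follow it unchanged.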
If you can store your image in a vector<unsigned char> instead of a raw array of unsigned char, then you can pack that vector:
#include <iostream>
#include <string>
#include <vector>
#include <msgpack.hpp>
int main()
{
std::vector<unsigned char> data;
for (unsigned i = 0; i < 10; ++i)
data.push_back(i * 2);
msgpack::sbuffer sbuf;
msgpack::pack(sbuf, data);
msgpack::unpacked msg;
msgpack::unpack(&msg, sbuf.data(), sbuf.size());
msgpack::object obj = msg.get();
std::cout << obj << std::endl;
}
Strangely, this only works for unsigned char. If you try to pack a buffer of char instead (or even an individual char), it won't compile.
MessagePack has a raw_ref type which you could use like so:
#include "msgpack.hpp"
class myClass
{
public:
msgpack::type::raw_ref r;
MSGPACK_DEFINE(r);
};
int _tmain(int argc, _TCHAR* argv[])
{
const char* str = "hello";
myClass c;
c.r.ptr = str;
c.r.size = 6;
// From here on down it's just the standard MessagePack example...
msgpack::sbuffer sbuf;
msgpack::pack(sbuf, c);
msgpack::unpacked msg;
msgpack::unpack(&msg, sbuf.data(), sbuf.size());
msgpack::object o = msg.get();
myClass d;
o.convert(&d);
OutputDebugStringA(d.r.ptr);
return 0;
}
Disclaimer: I found this by poking around the header files, not through reading the non-existent documentation on serialising raw bytes, so it may not be the 'correct' way (though it was defined along with all the other 'standard' types a serialiser would want to explicitly handle).
msgpack-c has been updated since this question and its answers were posted, so I'd like to describe the current situation.
C-style arrays have been supported since msgpack-c version 2.0.0. See https://github.com/msgpack/msgpack-c/releases
msgpack-c can pack a const char array such as "hello".
The type conversion rules are documented at https://github.com/msgpack/msgpack-c/wiki/v2_0_cpp_adaptor#predefined-adaptors.
A char array is mapped to STR. If you want to use BIN instead of STR, you need to wrap it with msgpack::type::raw_ref.
That is the packing overview.
Here is the description of unpacking and converting:
https://github.com/msgpack/msgpack-c/wiki/v2_0_cpp_object#conversion
Unpack means creating a msgpack::object from a MessagePack-formatted byte stream. Convert means converting a msgpack::object to a C++ object.
If the MessagePack-formatted data is STR and the convert target type is a char array, the data is copied to the array and, if the array has extra capacity, '\0' is appended. If the MessagePack-formatted data is BIN, '\0' is not added.
Here is a code example based on the original question:
#include <msgpack.hpp>
#include <iomanip>
#include <iostream>
inline
std::ostream& hex_dump(std::ostream& o, char const* p, std::size_t size ) {
o << std::hex << std::setfill('0');
// std::setw only applies to the next insertion, so set it for every byte
while(size--) o << std::setw(2) << (static_cast<int>(*p++) & 0xff) << ' ';
return o;
}
int main() {
{
msgpack::sbuffer temp_sbuffer;
// since 2.0.0 char[] is supported.
// See https://github.com/msgpack/msgpack-c/wiki/v2_0_cpp_adaptor#predefined-adaptors
msgpack::pack(temp_sbuffer, "hello");
hex_dump(std::cout, temp_sbuffer.data(), temp_sbuffer.size()) << std::endl;
// packed as STR See https://github.com/msgpack/msgpack/blob/master/spec.md
// '\0' is not packed
auto oh = msgpack::unpack(temp_sbuffer.data(), temp_sbuffer.size());
static_assert(sizeof("hello") == 6, "");
char converted[6];
converted[5] = 'x'; // to check overwriting, put NOT '\0'.
// '\0' is automatically added if the char array has enough size and the MessagePack format is STR
oh.get().convert(converted);
std::cout << converted << std::endl;
}
{
msgpack::sbuffer temp_sbuffer;
// since 2.0.0 char[] is supported.
// See https://github.com/msgpack/msgpack-c/wiki/v2_0_cpp_adaptor#predefined-adaptors
// packed as BIN
msgpack::pack(temp_sbuffer, msgpack::type::raw_ref("hello", 5));
hex_dump(std::cout, temp_sbuffer.data(), temp_sbuffer.size()) << std::endl;
auto oh = msgpack::unpack(temp_sbuffer.data(), temp_sbuffer.size());
static_assert(sizeof("hello") == 6, "");
char converted[7];
converted[5] = 'x';
converted[6] = '\0';
// only first 5 bytes are written if MessagePack format is BIN
oh.get().convert(converted);
std::cout << converted << std::endl;
}
}
Running Demo:
https://wandbox.org/permlink/mYJyYycfsQIwsekY
What I am trying to do is to save multiple different pointers to unique wchar_t strings into a vector. My current code is this:
std::vector<wchar_t*> vectorOfStrings;
wchar_t* bufferForStrings;
for (i = 0; i < some_source.length; i++) {
// copy some string to the buffer...
vectorOfStrings.push_back(bufferForStrings);
}
This results in bufferForStrings being added to the vector again and again, which is not what I want.
RESULT:
[0]: (pointer to buffer)
[1]: (pointer to buffer)
...
What I want is this:
[0]: (pointer to unique string)
[1]: (pointer to other unique string)
...
From what I know about this type of string, the pointer points to the beginning of an array of characters which ends in a null terminator.
So, the current code effectively results in the same string being copied to the buffer again and again. How do I fix this?
The simplest way is to use std::wstring, provided by the standard library, as the type for your vector's elements. The constructor that class provides will implicitly copy the contents of your wchar_t*-pointed buffer into the vector (in the push_back() call).
Here's a short demo:
#include <string>
#include <vector>
#include <iostream>
int main()
{
wchar_t test[][8] = { L"first", L"second", L"third", L"fourth" };
std::vector<std::wstring> vectorOfStrings;
wchar_t* bufferForStrings;
size_t i, length = 4;
for (i = 0; i < length; i++) {
// copy some string to the buffer...
bufferForStrings = test[i];
vectorOfStrings.push_back(bufferForStrings);
}
for (auto s : vectorOfStrings) {
std::wcout << s << std::endl;
}
return 0;
}
Further, if you later need access to the vector's elements as wchar_t* pointers, you can use each element's c_str() member function to retrieve such a pointer (though that will be const qualified).
There are other methods, if you want to avoid using the std::wstring class; for 'ordinary' char* buffers, you could use the strdup() function to create a copy of the current buffer, and send that to push_back(). Unfortunately, the equivalent wcsdup() function is not (yet) part of the standard library (though Microsoft and others have implemented it).
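As a sketch of the non-wstring route, the helper below copies a wide string into freshly allocated storage, roughly what wcsdup() does where it is available. The names dup_wide and collect_copies are hypothetical, and std::unique_ptr is used here (an assumption, not part of the answer above) so that each copy is released automatically.

```cpp
#include <cstddef>
#include <cwchar>
#include <memory>
#include <vector>

// Hypothetical helper: returns an owned copy of a null-terminated wide string.
std::unique_ptr<wchar_t[]> dup_wide(const wchar_t* src) {
    const std::size_t len = std::wcslen(src) + 1;   // include the terminator
    std::unique_ptr<wchar_t[]> copy(new wchar_t[len]);
    std::wmemcpy(copy.get(), src, len);
    return copy;
}

// Fills a vector with independent copies while reusing one scratch buffer.
std::vector<std::unique_ptr<wchar_t[]>> collect_copies() {
    const wchar_t* sources[] = { L"first", L"second", L"third" };
    wchar_t buffer[16];
    std::vector<std::unique_ptr<wchar_t[]>> out;
    for (const wchar_t* s : sources) {
        std::wcscpy(buffer, s);            // the buffer is overwritten each time
        out.push_back(dup_wide(buffer));   // but each element owns its own copy
    }
    return out;
}
```

Each element's get() then gives a non-const wchar_t* that stays valid for as long as the vector holds it.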
Below is a simplified example of my problem. I have some external byte data which appears to be a string with cp1252 encoded degree symbol 0xb0. When it is stored in my program as an std::string it is correctly represented as 0xffffffb0. However, when that string is then written to a file, the resulting file is only one byte long with just 0xb0. How do I write the string to the file? How does the concept of UTF-8 come into this?
#include <cstdio>
#include <fstream>
#include <iostream>
#include <string>
typedef struct
{
char n[40];
} mystruct;
static void dump(const std::string& name)
{
std::cout << "It is '" << name << "'" << std::endl;
const char *p = name.data();
for (size_t i=0; i<name.size(); i++)
{
printf("0x%02x ", p[i]);
}
std::cout << std::endl;
}
int main()
{
const unsigned char raw_bytes[] = { 0xb0, 0x00};
mystruct foo;
foo = *(mystruct *)raw_bytes;
std::string name = std::string(foo.n);
dump(name);
std::ofstream my_out("/tmp/out.bin", std::ios::out | std::ios::binary);
my_out << name;
my_out.close();
return 0;
}
Running the above program produces the following on STDOUT
It is '�'
0xffffffb0
First of all, this is a must read:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Once you are done with that, you have to understand what type p[i] has.
It is char, which in C and C++ is a small integer type whose signedness is implementation-defined. On most common platforms it is signed, so a char can be negative!
Now, since you have cp1252 characters, some of them are outside the range of ASCII. With a signed char, these characters are seen as negative values!
Now, when they are converted to int, the sign bit is replicated, and when you try to print the result, you will see 0xffffff<actual byte value>.
To handle that in C, first you should cast to unsigned char:
printf("0x%02x ", (unsigned char)p[i]);
then the default conversion will fill in the missing bits with zeros and printf() will give you a proper value.
Now, in C++ this is a bit more nasty, since char and unsigned char are treated by stream operators as a character representation. So to print them in hex manner, it should be like this:
int charToInt(char ch)
{
return static_cast<int>(static_cast<unsigned char>(ch));
}
std::cout << std::hex << charToInt(s[i]);
Note that a direct conversion from char to unsigned int will not fix the problem: the negative char value is sign-extended during the conversion, so the upper bits are still set.
See here: https://wandbox.org/permlink/sRmh8hZd78Oar7nF
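The difference between the two conversions can be seen in isolation with the sketch below. It uses signed char explicitly so that the behaviour does not depend on whether plain char is signed on a given platform, and it assumes 8-bit bytes. The function names are hypothetical.

```cpp
#include <cstdio>
#include <string>

// Converts straight to unsigned int: a negative value wraps around,
// so the upper bits come out set (0xffffffb0 with a 32-bit unsigned int).
unsigned int widen_direct(signed char c) {
    return static_cast<unsigned int>(c);
}

// Goes through unsigned char first, so only the low 8 bits survive.
unsigned int widen_via_uchar(signed char c) {
    return static_cast<unsigned int>(static_cast<unsigned char>(c));
}

// Formats one byte as two hex digits with a "0x" prefix.
std::string hex_byte(signed char c) {
    char buf[8];
    std::snprintf(buf, sizeof(buf), "0x%02x", widen_via_uchar(c));
    return buf;
}
```

With the byte 0xb0 (which is -80 as a signed char), widen_via_uchar returns 0xb0, while widen_direct returns a value with the upper bits set.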
UTF-8 has nothing to do with this issue.
Off-topic: please, when you write pure C++ code, do not use C. It is pointless and makes code harder to maintain, and it is not faster. So:
do not use char* or char[] to store strings. Just use std::string.
do not use printf(), use std::cout (or the fmt library, if you like format strings; it became the basis for std::format in C++20).
do not use alloc(), malloc(), free() - in modern C++, use std::make_unique() and std::make_shared().
I'm trying to store the address of a pointer as a string. In other words, I want to insert the content of the bytes that make up the address into a char vector.
What is the best way of doing this?
I need a fully portable method, including for 64 bit system.
To get an array (or vector, if you prefer that) of the actual bytes of the address, this should do the trick:
int foo = 10;
int* bar = &foo;
// Interpret pointer as array of bytes
unsigned char const* b = reinterpret_cast<unsigned char const*>(&bar);
// Copy that array into a std::array
std::array<unsigned char, sizeof(void*)> bytes;
std::copy(b, b + sizeof(void*), bytes.begin());
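An equivalent sketch using std::memcpy instead of reinterpret_cast, with a round trip back to a pointer, might look like this. The function names are hypothetical. The byte order in the array is whatever the platform uses, so the bytes should only be treated as meaningful within the same process.

```cpp
#include <array>
#include <cstring>

using PointerBytes = std::array<unsigned char, sizeof(void*)>;

// Copies the object representation of the pointer into a byte array.
PointerBytes pointer_to_bytes(const void* p) {
    PointerBytes bytes{};
    std::memcpy(bytes.data(), &p, sizeof(p));
    return bytes;
}

// Rebuilds the pointer from bytes produced by pointer_to_bytes().
const void* bytes_to_pointer(const PointerBytes& bytes) {
    const void* p = nullptr;
    std::memcpy(&p, bytes.data(), sizeof(p));
    return p;
}
```

Because the array size comes from sizeof(void*), the same code compiles unchanged on 32-bit and 64-bit targets.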
To get an array containing the hexadecimal representation split up into single characters (whatever sense that makes), I'd use a stringstream - as some of the others already suggested. You can also use snprintf to get a string representation of the address, but that's more the C-style way.
// Turn pointer into string
std::stringstream ss;
ss << bar;
std::string s = ss.str();
// Copy character-wise into a vector; the printed form is implementation-defined
// (it often carries a "0x" prefix), so size the container from the string itself
std::vector<char> hex(s.begin(), s.end());
The simplest way is to do
char buf[sizeof(void*) * 2 + 3];
snprintf(buf, sizeof(buf), "%p", /* the address here */ );
std::string serialized (std::to_string ((intptr_t) ptr));
The C++ way to do this would be to use string streams:
#include <string>
#include <sstream>
int main()
{
MyType object;
std::stringstream ss;
std::string result;
ss << &object; // puts the formatted address of object into the stream
result = ss.str(); // gets the stream as a std::string
return 0;
}
void storeAddr(vector<string>& v,void *ptr)
{
stringstream s;
s << (void*)ptr ;
v.push_back(s.str());
}
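If the stored string later has to be turned back into a pointer, one possible sketch (not taken from the answers above) goes through std::uintptr_t and a decimal string. std::uintptr_t is an optional type in the standard, and the sketch assumes it fits in unsigned long long. Both helper names are hypothetical.

```cpp
#include <cstdint>
#include <string>

// Formats the address as a decimal number.
std::string address_to_string(const void* p) {
    return std::to_string(reinterpret_cast<std::uintptr_t>(p));
}

// Parses a string produced by address_to_string() back into a pointer.
const void* string_to_address(const std::string& s) {
    const auto value = static_cast<std::uintptr_t>(std::stoull(s));
    return reinterpret_cast<const void*>(value);
}
```

The round trip is only meaningful inside the process that produced the string, while the pointed-to object is still alive.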
Pay attention to base64_decode in http://www.adp-gmbh.ch/cpp/common/base64.html
std::string base64_decode(std::string const& encoded_string)
The function is supposed to return a byte array holding binary data. However, it returns std::string. My guess is that the author is trying to avoid performing explicit dynamic memory allocation.
I tried to verify that the output is correct.
int main()
{
unsigned char data[3];
data[0] = 0; data[1] = 1; data[2] = 2;
std::string encoded_string = base64_encode(data, 3);
// AAEC
std::cout << encoded_string << std::endl;
std::string decoded_string = base64_decode(encoded_string);
for (int i = 0; i < decoded_string.length(); i++) {
// 0, 1, 2
std::cout << (int)decoded_string.data()[i] << ", ";
}
std::cout << std::endl;
getchar();
}
The decoded output is correct. I just want to confirm: is it valid for std::string to hold binary data, to avoid manual dynamic memory management?
std::string s;
s += (char)0;
// s.length() will return 1.
Yes, you can store any sequence of char in a std::string. That includes any binary data.
Yes. std::string can hold any char value ('\0' has no special meaning). However I wouldn't be surprised finding some C++ functions (e.g. from external libraries) having problems with strings with embedded NULs.
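A small sketch of that caveat: the string keeps every byte, but anything that treats c_str() as a C string stops at the first NUL. The helper names are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Builds a std::string from raw bytes, keeping embedded '\0' bytes.
std::string make_blob(const unsigned char* data, std::size_t size) {
    return std::string(reinterpret_cast<const char*>(data), size);
}

// What a C-style API would see when handed blob.c_str().
std::size_t c_string_length(const std::string& blob) {
    return std::strlen(blob.c_str());
}
```

For the bytes { 1, 0, 2 }, make_blob() gives a string of size 3, while c_string_length() reports 1.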
Anyway I don't understand what you are going to gain with an std::string instead of std::vector<unsigned char> that would make your intentions more clear and that offers more guarantees (e.g. that all the bytes are in contiguous not-shared memory so that you can pass &x[0] to someone expecting a plain buffer for direct access).
I don't think it's completely valid.
Care must be taken with string and binary data because std::string uses the char type internally, and whether char is signed or unsigned depends on the implementation. I prefer to use basic_string<unsigned char> and be sure of what I'm reading.
I don't think one should use std::string for byte data storage. The methods it provides aren't designed to deal with byte data, and you put yourself at risk, since any change (or "optimization") to std::string could break your code.
It is better to use std::vector<unsigned char> or std::vector<byte> (where byte is a typedef for uint8_t) to express the nature of the data. You will no longer have string-specific functions available, which is what you want for binary data.
You need an array of characters (not a string) to store the binary data. The best option is to use a vector.
We are setting a char [] to some hex values i.e.
char test1[] = {0x30,0x31,0x32,0x33,0x34,0x35};
Then we copy it into a string using
string teststring(test1, sizeof(test1));
Is the array supposed to be null-terminated? Or, given the way we do the assignment, is C++ smart enough to treat it as null-terminated and append the terminator anyway?
Since you are using the sizeof operator and supplying the length of the array, you should not need to add the NULL.
You can find the API for the constructors here. It mentions this explicitly.
As mentioned in other solutions however, if you decided to create an array of wchar_ts then you would need to modify your supplied length argument to the constructor as follows:
sizeof(test1) / sizeof(wchar_t)
This is because the sizeof operator returns the size of the target in bytes, not the number of elements. For your current question, using char does not have this requirement: the size of a char is defined by the C++ standard to be 1, so no division is needed.
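One way to avoid the manual division is to let a template deduce the element count, as in this sketch. from_array is a hypothetical helper, not a standard function.

```cpp
#include <cstddef>
#include <string>

// Builds a string from every element of a built-in array, including any
// embedded or trailing null characters. N is the element count, not the
// size in bytes, so the same code works for char and wchar_t.
template <typename CharT, std::size_t N>
std::basic_string<CharT> from_array(const CharT (&arr)[N]) {
    return std::basic_string<CharT>(arr, N);
}
```

Calling from_array(test1) on the six-element char array from the question produces a std::string of length 6 with no terminator needed in the array.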
It is not null-terminated. Nulls have no special meaning for std::string.
You can do something like this freely:
#include <string>
#include <iomanip>
#include <iostream>
int main()
{
char test1 [] = {0x30, 0x31, 0, 0x33, 0, 0x35};
std::string teststring(test1, sizeof(test1));
for(size_t i = 0; i<teststring.size(); ++i)
std::cout << std::hex << std::showbase << (int)teststring[i] << ' ';
std::cout << '\n';
}
Although, of course, if I used std::string teststring(test1); as the constructor in this example, the resulting string would have been 2 characters long.
Using this string constructor you don't need to add an extra ending 0.
However, I'd add the ending 0 just in case, or so that this variable could be used in other contexts where pointers to chars are used. In this case, the sizeof should be modified:
string teststring(test1, sizeof(test1) - 1);