const unsigned char * to std::string - c++

sqlite3_column_text returns a const unsigned char*, how do I convert this to a std::string? I've tried std::string(), but I get an error.
Code:
temp_doc.uuid = std::string(sqlite3_column_text(this->stmts.read_documents, 0));
Error:
1>.\storage_manager.cpp(109) : error C2440: '<function-style-cast>' : cannot convert from 'const unsigned char *' to 'std::string'
1> No constructor could take the source type, or constructor overload resolution was ambiguous

You could try:
temp_doc.uuid = std::string(reinterpret_cast<const char*>(
sqlite3_column_text(this->stmts.read_documents, 0)
));
While std::string could have a constructor that takes const unsigned char*, apparently it does not.
Why not, then? You could have a look at this somewhat related question: Why do C++ streams use char instead of unsigned char?
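A minimal sketch of that cast wrapped in a helper, assuming the caller passes the pointer returned by sqlite3_column_text together with the byte count returned by sqlite3_column_bytes for the same column. The name column_to_string is hypothetical and not part of SQLite. The helper also guards against the null pointer that sqlite3_column_text returns for a NULL column, because constructing a std::string from a null pointer is undefined behaviour.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of SQLite): copies `bytes` bytes starting at
// `text` into a std::string. A null pointer or non-positive count yields an
// empty string.
std::string column_to_string(const unsigned char* text, int bytes) {
    if (text == nullptr || bytes <= 0) {
        return std::string();
    }
    // char and unsigned char have the same size, so the cast only changes
    // how the bytes are viewed; no byte is altered.
    return std::string(reinterpret_cast<const char*>(text),
                       static_cast<std::size_t>(bytes));
}
```

With SQLite this might be used by first storing the result of sqlite3_column_text(stmt, 0) in a variable, then calling sqlite3_column_bytes(stmt, 0), and passing both to the helper. Doing it in two statements keeps the order of the two calls fixed.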

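On the off-chance you actually want a string of unsigned characters, you could create your own type (note that the standard does not require a std::char_traits specialization for unsigned char, so this may not compile with every standard library):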
typedef std::basic_string <unsigned char> ustring;
You should then be able to say things like:
ustring s = sqlite3_column_text(this->stmts.read_documents, 0);

The reason people typically use an (unsigned char *) type is to indicate that the data is binary and not plain ASCII text. I know libxml does this, and from the looks of it, sqlite is doing the same thing.
The data you're getting back from the sqlite call is probably UTF-8 encoded Unicode text. While a reinterpret_cast may appear to work, if someone ever stores text in the field that is not plain ASCII, your program probably won't be well-behaved.
The std::string class isn't designed with Unicode in mind, so if you ask for the length() of a string, you'll get the number of bytes, which, in UTF-8, is not necessarily the same thing as the number of characters.
Short answer: the simple cast may work, if you're certain the data is just ASCII. If it can be any UTF-8 data, then you need to handle encoding/decoding in a smarter way.
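To illustrate the difference between bytes and characters, here is a small sketch that counts UTF-8 code points in a std::string. The function name utf8_code_points is made up for this example, and it assumes the input is well-formed UTF-8 (it does no validation). It counts code points, which is still not always the same as user-perceived characters.

```cpp
#include <cstddef>
#include <string>

// Counts UTF-8 code points by skipping continuation bytes, which always
// have the bit pattern 10xxxxxx. Assumes `s` holds well-formed UTF-8.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;  // lead byte or plain ASCII byte starts a new code point
        }
    }
    return count;
}
```

For a string holding "h", the two-byte sequence 0xC3 0xA9 (é), and "llo", size() reports 6 while the function above reports 5.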

I'm not familiar with sqlite3_column_text, but when you call the std::string constructor, you'll want to cast to (const char*). I believe that it has a constructor for that type.
However, it is odd that this sqlite function is returning an unsigned char*. Is it returning a Pascal string (first char is the length of the string)? If so, then you'll have to create the std::string with the bytes and the length.

if temp_doc.uuid is a std::string try :
temp_doc.uuid = reinterpret_cast<const char*>(sqlite3_column_text(this->stmts.read_documents, 0));

try:
temp_doc.uuid = std::string(reinterpret_cast<const char*>(sqlite3_column_text(this->stmts.read_documents, 0)));

You can't construct a std::string from const unsigned char* -- you have to cast it to const char* first:
temp_doc.uuid = std::string( reinterpret_cast< const char* >(
sqlite3_column_text(this->stmts.read_documents, 0) ) );

I'm no expert but this example here seems much simpler:
string name = (const char*) (sqlite3_column_text(res, 0));

An old but important question if you have to preserve the full information in the unsigned char sequence. In my opinion, reinterpret_cast does not do that. I found an interesting solution under converting string to vector
which I modified to
basic_string<unsigned char> temp = sqlite3_column_text(stmt, 0);
string firstItem( temp.begin(), temp.end() );
Since I am programming for gtkmm, you can realize the conversion into a Glib::ustring with
basic_string<unsigned char> temp = sqlite3_column_text(stmt, 0);
Glib::ustring firstItem = string( temp.begin(), temp.end() );

Related

working with binary data and unsigned char

I'm writing a program that reads the content of a binary file (specifically a Windows PE file. Wikipedia page and detailed PE structure).
Because of the binary data in the file, the characters often "fall out" of the ASCII range (0-127), which results in negative values.
To make sure I won't work with unwanted negative values, I can either pass const unsigned char * or convert the resulting char in the calculation to unsigned char.
On one hand, passing const unsigned char * makes sense because the data is non-ASCII, has a numeric value, and thus should be treated as positive.
In addition, it'll let me perform calculations without the need to cast the result to unsigned char.
On the other hand, I can't pass constant strings (const char *, such as pre-defined strings "MZ", "PE\0\0" etc.) to functions without first casting them to const unsigned char *.
What would be the better approach or best-practice in this scenario?
I think I'd use unsigned char, but avoid casting, and instead define a little class named ustring (or something similar). You have a couple of choices with that. One would be to instantiate std::basic_string over unsigned char. This can be useful (it gives you all of std::string's functionality, but with unsigned chars instead of chars). The obvious disadvantage is that it's probably overkill, and has essentially no compatibility with std::string, even though it's almost exactly the same thing.
The other obvious possibility would be to define your own class. Since you apparently care mostly about string literals, I'd probably go this way. The class would be initialized with a string literal, and it would just hold a pointer to the string, but as unsigned char * instead of just char *.
Then there's one more step to make life better: define a user defined literal operator named something like _us, so creating an object of your type from a string literal will look something like this: auto DOS_sig = "MZ"_us;
#include <cstddef>
#include <cstring>

class ustring {
    unsigned char const *data;
    std::size_t len;
public:
    ustring(unsigned char const *s, std::size_t len)
        : data(s)
        , len(len)
    {}
    // data is unsigned char const *, so this conversion needs an explicit cast.
    operator char const *() const { return reinterpret_cast<char const *>(data); }
    bool operator==(ustring const &other) const {
        // note: memcmp treats what you pass it as unsigned chars.
        return len == other.len && 0 == std::memcmp(data, other.data, len);
    }
    // you probably want to add more stuff here.
};
// The length parameter of a string literal operator must be std::size_t.
ustring operator"" _us(char const * const s, std::size_t len) {
    return ustring(reinterpret_cast<unsigned char const *>(s), len);
}
If I'm not mistaken, this should be pretty easy to work with. For example, let's assume you've memory mapped what you think is a PE file, with its base address at mapped_file. To see if it has a DOS signature, you might do something like this:
if (ustring(&mapped_file[0], 2) == "MZ"_us)
std::cerr << "File appears to be an executable.\n";
else
std::cerr << "file does not appear to be an executable.\n";
Caution: I haven't tested this, so fencepost errors and such are likely. (The length passed to the user-defined literal operator does not include the NUL terminator.) This isn't intended to represent finished code, just a sketch of a general direction that might be useful to explore.
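If a whole class feels like too much, a lighter-weight option is to compare the bytes directly with std::memcmp, which takes const void* and therefore accepts both const unsigned char* and const char* without a cast. This is only a sketch of an alternative, not the approach described above, and the helper name has_signature is invented for the example.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns true if the first `sig_len` bytes of `data`
// match `sig`. `size` is the number of readable bytes in `data`.
bool has_signature(const unsigned char* data, std::size_t size,
                   const char* sig, std::size_t sig_len) {
    // std::memcmp takes const void*, so both pointer types convert
    // implicitly and the bytes are compared as unsigned values.
    return size >= sig_len && std::memcmp(data, sig, sig_len) == 0;
}
```

The explicit length matters for signatures such as "PE\0\0", where strlen would stop at the first zero byte; in that case the call would pass 4 as sig_len.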

convert unsigned char* to std::string

I am not very good at typecasting. I have a string in an xmlChar* (which is unsigned char*), and I want to convert it to a std::string.
xmlChar* name = "Some data";
I tried my best to typecast, but I couldn't find a way to convert it.
std::string sName(reinterpret_cast<char*>(name));
reinterpret_cast<char*>(name) casts from unsigned char* to char* in an unsafe way but that's the one which should be used here. Then you call the ordinary constructor of std::string.
You could also do it C-style (not recommended):
std::string sName((char*) name);
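I think the accepted solution is a little bit risky and not that good, to be honest. If what you want is the numeric value of a single unsigned char as text, you can use std::to_string:
unsigned char char1{192};
auto result = std::to_string(char1);
Now result equals std::string("192"). Note that this converts one value to its decimal representation; it does not copy the characters that an unsigned char* points to.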

How do I convert the contents of an unsigned char * to a const char *?

I came across reinterpret_cast, and most of the time it was brought up a warning was given, so I am wondering if there are other alternatives (or clean implementations of reinterpret_cast, of course).
You don't say what warning was given or what the problem was, but casting to char* with reinterpret_cast should work without warnings:
unsigned char *a;
const char *b = reinterpret_cast<char*>(a);
It depends on what you're trying to do.
If you just want to access the contents as char, then a simple
static_cast or using the value in a context where a char is expected
will do the trick.
If you need to pass the buffer to a function expecting a char const*,
a reinterpret_cast is about the only solution.
If you want a string, using the pointers into the buffer will be fine:
std::string
bufferToString( unsigned char const* buffer, size_t length )
{
return std::string( buffer, buffer + length );
}
or you can copy into an existing string:
myString.assign( buffer, buffer + length );
myString.append( buffer, buffer + length );
// etc.
Any string function (or algorithm, like std::copy) which takes two
iterators can be used. All that is required is that dereferencing the
iterator result in a type which converts implicitly to char, which is
the case of unsigned char.
(You cannot use the string functions which take a buffer address and a
length, as these are not templates, and require the buffer address to
have type char const*. And while unsigned char converts implicitly
to char, unsigned char* requires a reinterpret_cast to convert it
to char*.)
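The iterator-range approach can be checked with a short sketch. It reuses the idea of bufferToString above under a different name, and adds a hypothetical byte_at helper that reads an element back as an unsigned value. The point it illustrates is that embedded zero bytes and values above 127 survive the copy; the round trip of values above 127 assumes the usual two's-complement conversion between char and unsigned char.

```cpp
#include <cstddef>
#include <string>

// Builds the string from an iterator range, so no cast is needed and
// embedded zero bytes are copied like any other byte.
std::string buffer_to_string(const unsigned char* buffer, std::size_t length) {
    return std::string(buffer, buffer + length);
}

// Hypothetical helper: reads one element back as an unsigned value.
unsigned char byte_at(const std::string& s, std::size_t i) {
    return static_cast<unsigned char>(s[i]);
}
```

By contrast, constructing the string from a reinterpret_cast pointer without a length would stop at the first zero byte.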

const char * to vector<unsigned char> Initialisation

I understand that using vector is a good way to store binary data when using C++ and the STL. However, for my unit tests I'd like to initialise the vector using a const char* C string variable.
I'm attempting to use a variant of the code found here - Converting (void*) to std::vector<unsigned char> - to do this:
const char* testdata = "the quick brown fox jumps over the lazy dog.";
unsigned char* buffer = (unsigned char*)testdata;
typedef vector<unsigned char> bufferType;
bufferType::size_type size = strlen((const char*)buffer);
bufferType vec(buffer, size);
However the VC++ compiler is not happy with the line initialising the vector, stating:
error C2664: 'std::vector<_Ty>::vector(unsigned int,const _Ty &)' : cannot convert parameter 1 from 'char *' to 'unsigned int'
I appreciate the extreme n00bity of this question and am fully prepared for much criticism on the code above :)
Thanks in advance,
Chris
It should be
bufferType vec(buffer, buffer + size);
not
bufferType vec(buffer, size);
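The reason the two-argument form with a size fails is that vector's (count, value) constructor expects a count first, and a pointer does not convert to one; the range constructor wants a begin and an end iterator instead. A minimal sketch, with to_bytes as a made-up helper name:

```cpp
#include <cstring>
#include <vector>

// Copies the characters of a NUL-terminated string (without the terminator)
// into a vector<unsigned char> using the iterator-range constructor.
// Each char is converted to unsigned char element by element, so no
// pointer cast is needed.
std::vector<unsigned char> to_bytes(const char* text) {
    return std::vector<unsigned char>(text, text + std::strlen(text));
}
```

Called as to_bytes("abc"), this yields a vector of three elements.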
std::transform is useful for just this sort of problem. You can use it to "transform" one piece of data at a time. See documentation here:
http://www.cplusplus.com/reference/algorithm/transform/
The following code works in VS2010. (I created a std::string from your const char* array, but you could probably avoid that if you really wanted to.)
#include <algorithm>
#include <string>
#include <vector>
int main(int, char*[])
{
    // Initial test data
    const char* testdata = "the quick brown fox jumps over the lazy dog.";
    // Transform from 'const char*' to 'vector<unsigned char>'
    std::string input(testdata);
    std::vector<unsigned char> output(input.length());
    std::transform(input.begin(), input.end(), output.begin(),
        [](char c)
        {
            return static_cast<unsigned char>(c);
        });
    // Use the transformed data in 'output'...
    return 0;
}
Here is what worked for me:
// Fetch data into vector
std::vector<char> buffer = <myMethod>.getdata();
// Get a char pointer to the data in the vector
char* buf = buffer.data();
// cast from char pointer to unsigned char pointer
unsigned char* membuf = reinterpret_cast<unsigned char*>(buf);
// now convert to vector<unsigned char> buffer
std::vector<unsigned char> vec(membuf, membuf + buffer.size());
// display vector<unsigned char>
CUtils::<myMethodToShowDataBlock>(vec);
What you intended to do seems to be something like:
bufferType vec(testdata, next(testdata, strlen(testdata)));
There is no need for the intermediate buffer variable. The conversion from char to unsigned char will happen implicitly.
Note that this does not grab the terminating '\0' character from testdata. So if you wanted to be able to do something like cout << vec.data(), you wouldn't be able to. If you want that, you could do bufferType vec(testdata, next(testdata, strlen(testdata) + 1)), or you may just want to consider doing:
basic_string<unsigned char> vec(testdata, next(testdata, strlen(testdata)));
This will preserve a hidden '\0'. Because this is not a std::string you won't be able to do cout << vec, but cout << vec.data() will work. I've created a Live Example of each of these.

Best way to create a string buffer for binary data

When I try the following, I get an error:
unsigned char * data = "00000000"; //error: cannot convert const char to unsigned char
Is there a special way to do this which I'm missing?
Update
For the sake of brevity, I'll explain what I'm trying to achieve:
I'd like to create a StringBuffer in C++ which uses unsigned values for raw binary data. It seems that an unsigned char is the best way to accomplish this. Is there a better method?
std::vector<unsigned char> data(8, '0');
Or, if the data is not uniform:
auto & arr = "abcdefg";
std::vector<unsigned char> data(arr, arr + sizeof(arr) - 1);
Or, so you can assign directly from a literal:
std::basic_string<unsigned char> data = (const unsigned char *)"abcdefg";
Yes, do this:
const char *data = "00000000";
A string literal is an array of char, not unsigned char.
If you need to pass this to a function that takes const unsigned char *, well, you'll need to cast it:
foo(reinterpret_cast<const unsigned char *>(data));
You have many ways. One is to write:
const unsigned char *data = (const unsigned char *)"00000000";
Another, which is more recommended is to declare data as it should be:
const char *data = "00000000";
And when you pass it to your function:
myFunc((const unsigned char *)data);
Note that, in general a string of unsigned char is unusual. An array of unsigned chars is more common, but you wouldn't initialize it with a string ("00000000")
Response to your update
If you want raw binary data, first let me tell you that instead of unsigned char, you may be better off using bigger element types, such as long int or long long. This is because when you perform operations on the binary data (which is an array), the number of operations is cut by a factor of 4 or 8, which is a speed boost.
Second, if you want your class to represent binary values, don't initialize it with a string, but with individual values. In your case would be:
unsigned char data[] = {0x30, 0x30, 0x30, 0x30, /* etc */};
Note that I assume you are storing binary as binary! That is, you get 8 bits in an unsigned char. If, on the other hand, you mean binary as in a string of 0s and 1s, that is not really a good idea, but in that case you don't need unsigned char; plain char is sufficient.
unsigned char data[] = "00000000";
This will copy "00000000" into an unsigned char[] buffer, which also means that the buffer won't be read-only like a string literal.
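A small sketch of what that array initialisation gives you. The function name count_zeros_after_edit is invented for the example; it exists only to show that the array has nine elements (eight characters plus the terminator) and that writing to it is fine because it is a copy, not the literal itself.

```cpp
#include <cstddef>

// Overwrites the first element of a local copy of the literal and returns
// how many '0' characters remain.
std::size_t count_zeros_after_edit() {
    unsigned char data[] = "00000000";  // 8 characters + terminating '\0'
    static_assert(sizeof(data) == 9, "array includes the terminator");
    data[0] = '1';  // fine: modifies the local copy, not the literal
    std::size_t zeros = 0;
    // Stop one short of sizeof(data) to skip the terminator.
    for (std::size_t i = 0; i + 1 < sizeof(data); ++i) {
        if (data[i] == '0') {
            ++zeros;
        }
    }
    return zeros;
}
```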
The reason the way you're doing it won't work is that you're pointing data at a string literal (const char[]), so data has to be of type const char*. You can't do that without explicitly casting "00000000", such as: (unsigned char*)"00000000".
Note that in C++ string literals are of type const char[] (in C they are plain char[]); either way, if you try to modify them, you will cause undefined behaviour, which often shows up as an access violation error.
You're trying to assign a string value to a pointer to unsigned char. You cannot do that. If you have a pointer, you can assign only a memory address or NULL to it.
Use const char * instead.
Your target variable is a pointer to an unsigned char. "00000000" is a string literal. Its type is const char[9]. You have two type mismatches here. One is that unsigned char and char are different types. The lack of a const qualifier is also a big problem.
You can do this:
unsigned char * data = (unsigned char *)"00000000";
But this is something you should not do. Ever. Casting away the constness of a string literal will get you in big trouble some day.
The following is better. It preserves constness, and reading the characters through a pointer to unsigned char is well defined, because unsigned char is allowed to alias any object type:
const unsigned char * data = (const unsigned char *)"00000000";
Here you are preserving the constness but you are changing the pointer type from char* to unsigned char*.
@Holland -
unsigned char * data = "00000000";
One very important point I'm not sure we're making clear: the string "00000000\0" (9 bytes, including delimiter) might be in READ-ONLY MEMORY (depending on your platform).
In other words, if you defined your variable ("data") this way, and you passed it to a function that might try to CHANGE "data" ... then you could get an ACCESS VIOLATION.
The solution is:
1) declare as "const char *" (as the others have already said)
... and ...
2) TREAT it as "const char *" (do NOT modify its contents, or pass it to a function that might modify its contents).