Converting (void*) to std::vector<unsigned char>

I have a (void*) buffer that I need to convert to std::vector<unsigned char> before I can pass it on. Unfortunately, my C++ casting skills are a little weak. Any suggestions?

You will need the length of the buffer. Once you do, we can do this:
unsigned char *charBuf = (unsigned char*)voidBuf;
/* create a vector by copying out the contents of charBuf */
std::vector<unsigned char> v(charBuf, charBuf + len);
Okay, the comment got me started on why I did not use reinterpret_cast:
In C++, the C-style cast is a convenience notation -- the compiler tries const_cast, static_cast and reinterpret_cast (alone or combined with const_cast), in that order, and uses the first one that compiles.
The result of a reinterpret_cast is largely implementation-defined, so it should be the last tool you reach for (used only when you are knowingly doing something non-portable).
The conversion between void * and char * (or unsigned char *) is portable (you could actually use static_cast if you are really picky).
The problem with the C-style cast is that this added flexibility can cause heartaches when the pointer type changes.
Note: I agree with the general convention of not casting as much as possible. However, without any source provided, this is the best I could do.

You can't simply cast a void* to a std::vector<unsigned char> because the memory layout of the latter includes other objects, such as the size and the number of bytes currently allocated.
Assuming the buffer is pointed to by buf and its length is n:
vector<unsigned char> vuc(static_cast<char*>(buf), static_cast<char*>(buf) + n);
will create a copy of the buffer that you can safely use.
[EDIT: Added static_cast<char*>, which is needed for pointer arithmetic.]
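The copy described in the answers above can be wrapped in a small helper. This is only a minimal sketch: the name to_byte_vector and its parameters are illustrative assumptions, not something from the original question.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: copies len bytes starting at buf into a new vector.
// The caller must guarantee that buf points to at least len readable bytes.
std::vector<unsigned char> to_byte_vector(const void* buf, std::size_t len)
{
    // static_cast is enough for void* -> unsigned char*; the cast is
    // needed because pointer arithmetic on void* is not allowed.
    const unsigned char* first = static_cast<const unsigned char*>(buf);
    return std::vector<unsigned char>(first, first + len);
}
```

The returned vector owns its own copy, so the original buffer can be freed or reused afterwards.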

The only time this would be legitimate is if you had already created a vector, and simply wanted to get it back.
void SomeFunc(void* input);
int main() {
    std::vector<unsigned char> v;
    SomeFunc((void*) &v);
}
void SomeFunc(void* input) {
    // Now, you could cast that void* back into a pointer to the vector
    std::vector<unsigned char>* v_ = (std::vector<unsigned char>*)input;
}
I haven't actually tried to see if this will run, but that's the spirit of it. That said, if you are making this from scratch, you are definitely doing it wrong. This is really bad. The only time this could be even remotely understandable is if you are forced to implement the already defined "SomeFunc()".

Using the std::vector class to wrap an already allocated buffer is not a solution: a std::vector object manages its own memory and deallocates it at destruction time.
A complicated solution might be to write your own allocator that uses an already allocated buffer, but you have to be very careful in several scenarios, such as vector resizing.
If you have that void* buffer bound through some C API functions, then you can forget about conversion to std::vector.
If you need only a copy of that buffer, it can be done like this:
std::vector< unsigned char> cpy(
(unsigned char*)buffer, (unsigned char*)buffer + bufferSize);
where bufferSize is the size in chars of the copied buffer.

Related

Dynamic memory on a function new char[size] vs char[size]

So I have this function that has a string with a pre-defined buffer (the buffer is defined when calling a function).
My question is, why doesn't the compiler throw an error when I do the following (without the new operator)?
int crc32test(unsigned char *write_string, int buffer_size){
// Append CRC32 to string
int CRC_NBYTES = 4;
int new_buffer_size = buffer_size + CRC_NBYTES; // Current buffer size + CRC
// HERE (DECLARATION OF THE STRING)
unsigned char appendedcrc_string[new_buffer_size];
return 0;
}
Isn't THIS the correct way to do it?
int crc32test(unsigned char *write_string, int buffer_size){
// Append CRC32 to string
int CRC_NBYTES = 4;
int new_buffer_size = buffer_size + CRC_NBYTES; // Current buffer size + CRC
// HERE (DECLARATION OF THE STRING USING NEW)
unsigned char * appendedcrc_string = new unsigned char[new_buffer_size+1];
delete[] appendedcrc_string ;
return 0;
}
And I actually compiled both, and both worked. Why isn't the compiler throwing me any error?
And is there a reason to use the new operator if apparently the former function works too?

There are a few answers here already, and I'm going to repeat several things said already. The first form you use is not valid C++, but will work in certain versions of GCC and Clang... It is decidedly non-portable.
There are a few options that you have as alternatives:
Use std::basic_string<unsigned char> (std::string itself is not a template) for your input and s.append(reinterpret_cast<unsigned char*>(&crc), 4);
Similarly, you can use std::vector<unsigned char>
If your need is just for a simple resizable buffer, you can use std::unique_ptr<unsigned char[]> and use memcpy & std::swap, etc to move the data into a resized buffer and then free the old buffer.
As a non-portable alternative for temporary buffer creation, the alloca() function carves out a buffer by twiddling the stack pointer. It doesn't play very well with C++ features but it can be used if extremely careful about ensuring that the function will never have an exception thrown from it.
Store the CRC with the buffer in a structure like
struct input {
    std::unique_ptr<unsigned char[]> buffer;
    uint32_t crc;
};
And deal with the concatenation of the CRC and buffer someplace else in your code (i.e. on output). This, I believe is the best method.

The first code is ill-formed, however some compilers default to a mode where non-standard extensions are accepted.
You should be able to specify compiler switches for standard conformance. For example, in gcc, -std=c++17 -pedantic.
The second code is "correct", although it is not the preferred way either: you should use a container that frees the memory when execution leaves the scope, instead of a manual delete. For example, std::vector<unsigned char> buf(new_buffer_size + 1);.

The first example uses a C99 feature called Variable Length Arrays (VLA), that e.g. g++ by default supports as a C++ language extension. It's non-standard code.
Instead of the second example and similar, you should preferably use std::vector.
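As a rough illustration of the std::vector approach recommended above, the sketch below appends a 4-byte CRC to a copy of the input. The function name append_crc32, the caller-supplied CRC value and the most-significant-byte-first order are all assumptions made for this example; the original question does not specify them.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns a copy of the input with a 32-bit CRC appended.
// The CRC value is supplied by the caller; computing it is out of scope here.
// Assumption: the CRC is stored most-significant byte first.
std::vector<unsigned char> append_crc32(const unsigned char* data,
                                        std::size_t size,
                                        std::uint32_t crc)
{
    std::vector<unsigned char> out;
    out.reserve(size + 4);                     // one allocation, sized at run time
    out.insert(out.end(), data, data + size);  // copy the payload
    for (int shift = 24; shift >= 0; shift -= 8) {
        out.push_back(static_cast<unsigned char>((crc >> shift) & 0xFFu));
    }
    return out;  // the storage is released automatically when the vector dies
}
```

Unlike the variable length array, this is standard C++, and unlike the new[]/delete[] version there is no manual cleanup to forget.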

Can I reinterpret std::vector<char> as a std::vector<unsigned char> without copying?

I have a reference to std::vector<char> that I want to use as a parameter to a function which accepts std::vector<unsigned char>. Can I do this without copying?
I have the following function and it works; however, I am not sure whether a copy actually takes place. Could someone help me understand this? Is it possible to use std::move to avoid the copy, or is it already not being copied?
static void showDataBlock(bool usefold, bool usecolor,
std::vector<char> &chunkdata)
{
char* buf = chunkdata.data();
unsigned char* membuf = reinterpret_cast<unsigned char*>(buf);
std::vector<unsigned char> vec(membuf, membuf + chunkdata.size());
showDataBlock(usefold, usecolor, vec);
}
I was thinking that I could write:
std::vector<unsigned char> vec(std::move(membuf),
std::move(membuf) + chunkdata.size());
Is this overkill? What actually happens?

...is it possible to use std::move to avoid copy or is it already not
being copied
You cannot move between two unrelated containers. A std::vector<char> is not a std::vector<unsigned char>, and hence there is no legal way to "move ~ convert" the contents of one to the other in O(1) time.
You can either copy:
void showData( std::vector<char>& data){
std::vector<unsigned char> udata(data.begin(), data.end());
for(auto& x : udata)
modify( x );
....
}
or cast it in realtime for each access...
inline unsigned char& as_uchar(char& ch){
return reinterpret_cast<unsigned char&>(ch);
}
void showDataBlock(std::vector<char>& data){
for(auto& x : data){
modify( as_uchar(x) );
}
}

If you have a v1 of type std::vector<T1> and need a v2 of type std::vector<T2> there is no way around copying the data, even if T1 and T2 are "similar" like char and unsigned char.
Use standard library:
std::vector<unsigned char> v2;
std::copy(v1.begin(), v1.end(), std::back_inserter(v2));
The only possible way around it is to somehow work with only one type: either obtain std::vector<T2> from the start if possible, or work with std::vector<T1> from now on (maybe add an overload that deals with it). Or create generic code (templates) that can deal with any [contiguous] container.
I think reinterpret_cast and std::move should make it possible to
avoid copy
no, it can't
please elaborate - why not?
A vector can steal resources (move data) only from another vector of the same type. That's how its interface was designed.
To do what you want you would need a release() method that would release the vector's ownership of the underlying data and return it as a (unique) pointer, and a move constructor/assignment that would acquire the underlying data from a (unique) pointer. (And even then you would still require a reinterpret_cast which is... danger zone)
std::vector has none of those. Maybe it should have. It just doesn't.
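The difference between the two cases can be observed by comparing data() pointers. The sketch below is illustrative only; the function names are made up for this example, and the buffer reuse in the first function is what mainstream standard library implementations do for a same-type move.

```cpp
#include <utility>
#include <vector>

// Same element type: the move constructor takes over the existing buffer.
// Returns true if the new vector's storage is the source's old storage.
bool move_reuses_buffer()
{
    std::vector<char> src(1024, 'x');
    const char* before = src.data();
    std::vector<char> dst(std::move(src));
    return dst.data() == before;
}

// Different element type: only the iterator-range constructor applies,
// and it allocates new storage and copies element by element.
bool range_ctor_copies()
{
    std::vector<char> src(1024, 'x');
    std::vector<unsigned char> dst(src.begin(), src.end());
    return static_cast<const void*>(dst.data()) !=
           static_cast<const void*>(src.data());
}
```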

As others already pointed out, there is no way around the copy without changing showDataBlock.
I think you have two options:
Extend showDataBlock to work on both signed char and unsigned char (ie. make it a template) or
Don't take the container as argument but an iterator range instead. You could then (in case of value_type being char) use special iterators that convert from char to unsigned char element by element.
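The first option could look roughly like the sketch below. Since the body of showDataBlock is not shown in the question, a hypothetical hex_dump function stands in for it; the point is only that one template serves every one-byte element type without a second vector.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for showDataBlock: formats each byte as two hex digits.
// It accepts std::vector<char>, std::vector<unsigned char> or
// std::vector<signed char> without converting the container.
template <typename Byte>
std::string hex_dump(const std::vector<Byte>& data)
{
    static_assert(sizeof(Byte) == 1, "hex_dump expects a one-byte element type");
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(data.size() * 2);
    for (Byte b : data) {
        // Convert per element, so no second vector is needed.
        const unsigned char u = static_cast<unsigned char>(b);
        out.push_back(digits[u >> 4]);
        out.push_back(digits[u & 0x0F]);
    }
    return out;
}
```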

I guess you have written another overload:
showDataBlock(usefold, usecolor, std::vector<unsigned char> & vec);
You are trying to convert a std::vector<T> into a std::vector<T2>.
There is no way to avoid the copy.
Each std::vector owns its storage; roughly speaking, it holds a raw pointer to it.
The main point is that this raw pointer cannot be shared among multiple std::vector objects.
I think that is by design.
I think it is a good thing; otherwise the CPU would waste time keeping track of the sharing.
The code ...
std::move(membuf)
... moves a raw pointer, which actually does nothing (it is the same as passing membuf).
To optimize, first ask why you want to convert from std::vector<char> to std::vector<unsigned char> in the first place.
Would it be better to create a new class C that can present the data as both char and unsigned char? (e.g. C::getChar() and C::getUnsignedChar(); it could store only char and provide the conversion as a non-static member function)
If that doesn't help, I suggest creating a custom data structure.
I often do that when it is needed.
However, in this case, I don't think any optimization is needed.
The copy is fine unless this is performance-critical code.

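While unsigned char and char are distinct types, I think they're similar enough in this case (same-size PODs) to get away with a reinterpret_cast of the entire templated class. Be aware that the standard does not define this behavior; it merely tends to work on common implementations.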
static void showDataBlock(bool usefold, bool usecolor,
std::vector<char> &chunkdata)
{
showDataBlock(usefold, usecolor, reinterpret_cast< std::vector<unsigned char>&>(chunkdata));
}
However, I tend to find these problems are due to not designing the best architecture. Look at the bigger picture of what this software is supposed to be doing to identify why you need to work with both signed and unsigned char blocks of data.

I ended up doing something like this :
// Defined first so that the by-value overload below can call it.
static bool showDataBlock(bool usefold, bool usecolor, std::vector<unsigned char> &chunkdata)
{
    // showing the data
    return true;
}
static void showDataBlock(bool usefold, bool usecolor, std::vector<char> chunkdata)
{
    // chunkdata is a by-value copy; the copy is reinterpreted as unsigned char
    std::vector<unsigned char> &cache = reinterpret_cast<std::vector<unsigned char>&>(chunkdata);
    showDataBlock(usefold, usecolor, cache);
}
This solution allowed me to pass the vector by reference or by value.
It seems to be working. Whether it is the best solution I do not know; however, you all came up with some really good suggestions. Thank you all.
I agree that I cannot avoid the copy, so I let the copy happen through normal by-value parameter passing.
If you find this solution wrong, please provide a better one in a comment rather than just downvoting.

Convert size_t to vector<unsigned char>

I want to convert size_t to vector of unsigned chars. This vector is defined as 4 bytes.
Could anybody suggest a suitable way to do that?

Once you've reconciled yourself to the fact that your std::vector is probably going to have to be bigger than that - it will need to have sizeof(size_t) elements - one well-defined way is to access the data buffer of such an appropriately sized vector and use ::memcpy:
size_t bar = 0; /*initialise this else the copy code behaviour is undefined*/
std::vector<uint8_t> foo(sizeof(bar)); /*space must be allocated at this point*/
::memcpy(foo.data(), &bar, sizeof(bar));
There is an overload of data() that returns a non-const pointer to the data buffer. I'm exploiting this here. Accessing the data buffer in this way is unusual but other tricks (using unions etc.) often lead to code whose behaviour is, in general, undefined.

By "convert", I'll assume you mean "copy", since vector will allocate and own its memory. You can't just give it a pointer and expect to use your own memory.
An efficient way to do so which avoids two-stage construction (that causes initialization of the array with zero) is to do this:
auto ptr = reinterpret_cast<uint8_t*>(&the_size);
vector<uint8_t> vec{ptr, ptr + sizeof(size_t)};
Note that sizeof(size_t) is not required to be 4. So you shouldn't write your code assuming that it is.
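If exactly 4 bytes are required regardless of sizeof(size_t), one option is to extract the bytes with shifts, which also fixes the byte order independently of the machine. This is a sketch under assumptions the question does not state: the function name is hypothetical, the order chosen is least-significant byte first, and values that do not fit in 32 bits are truncated.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: stores the low 32 bits of value in 4 bytes,
// least-significant byte first. Values above 0xFFFFFFFF are truncated.
std::vector<unsigned char> to_4_bytes_le(std::size_t value)
{
    const std::uint32_t v = static_cast<std::uint32_t>(value);
    std::vector<unsigned char> out(4);
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<unsigned char>((v >> (8 * i)) & 0xFFu);
    }
    return out;
}
```

In contrast to the memcpy and reinterpret_cast versions above, the result does not depend on the endianness of the host.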

You could write a generic converter using std::bitset
template <typename T>
std::vector<unsigned char> Type_To_Bit_Vector(T type, char true_char, char false_char){
    // convert type to bitset (needs <bitset>, <climits>, <cstddef> and <vector>)
    std::bitset<sizeof(type) * CHAR_BIT> bset(type);
    // convert bitset to vector<unsigned char>, least significant bit first
    std::vector<unsigned char> vec;
    vec.reserve(bset.size());
    for(std::size_t i = 0 ; i < bset.size() ; i++){
        if (bset[i]){
            vec.push_back(true_char);
        }else{
            vec.push_back(false_char);
        }
    }
    return vec;
}
You could then get a desired vector representation like so:
auto vec = Type_To_Bit_Vector(size_t(123),'1','0');

Can I get a non-const C string back from a C++ string?

Const-correctness in C++ is still giving me headaches. In working with some old C code, I find myself needing to turn a C++ string object into a C string and assign it to a variable. However, the variable is a char * and c_str() returns a const char *. Is there a good way to get around this without having to roll my own function to do it?
edit: I am also trying to avoid calling new. I will gladly trade slightly more complicated code for fewer memory leaks.

C++17 and newer:
foo(s.data(), s.size());
C++11, C++14:
foo(&s[0], s.size());
However this needs a note of caution: The result of &s[0]/s.data()/s.c_str() is only guaranteed to be valid until any member function is invoked that might change the string. So you should not store the result of these operations anywhere. The safest is to be done with them at the end of the full expression, as my examples do.
Pre-C++11 answer:
Since, for reasons inexplicable to me, nobody answered this the way I do now, and since other questions are now being closed pointing to this one, I'll add this here, even though coming a year too late will mean that it hangs at the very bottom of the pile...
With C++03, std::string isn't guaranteed to store its characters in a contiguous piece of memory, and the result of c_str() doesn't need to point to the string's internal buffer, so the only way guaranteed to work is this:
std::vector<char> buffer(s.begin(), s.end());
foo(&buffer[0], buffer.size());
s.assign(buffer.begin(), buffer.end());
This is no longer true in C++11.
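A short sketch of the C++11 form in use follows. The function legacy_upper is a made-up stand-in for old C code that wants a writable char*; it is not part of the question.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical legacy-style function: needs a writable char* and a length.
void legacy_upper(char* buf, std::size_t len)
{
    for (std::size_t i = 0; i < len; ++i) {
        buf[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(buf[i])));
    }
}

// Passes the string's own storage to the legacy function (C++11 and later).
// The pointer is used only within this one call and is never stored.
std::string upper_copy(std::string s)
{
    if (!s.empty()) {
        legacy_upper(&s[0], s.size());
    }
    return s;
}
```

The legacy function must not write past s.size() characters; the string does not grow to accommodate it.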

There is an important distinction you need to make here: is the char* to which you wish to assign this "morally constant"? That is, is casting away const-ness just a technicality, and you really will still treat the string as a const? In that case, you can use a cast - either C-style or a C++-style const_cast. As long as you (and anyone else who ever maintains this code) have the discipline to treat that char* as a const char*, you'll be fine, but the compiler will no longer be watching your back, so if you ever treat it as a non-const you may be modifying a buffer that something else in your code relies upon.
If your char* is going to be treated as non-const, and you intend to modify what it points to, you must copy the returned string, not cast away its const-ness.

I guess there is always strcpy.
Or use char* strings in the parts of your C++ code that must interface with the old stuff.
Or refactor the existing code to compile with the C++ compiler and then to use std::string.

There's always const_cast...
std::string s("hello world");
char *p = const_cast<char *>(s.c_str());
Of course, that's basically subverting the type system, but sometimes it's necessary when integrating with older code.

If you can afford extra allocation, instead of a recommended strcpy I would consider using std::vector<char> like this:
// suppose you have your string:
std::string some_string("hello world");
// you can make a vector from it like this:
std::vector<char> some_buffer(some_string.begin(), some_string.end());
// suppose your C function is declared like this:
// some_c_function(char *buffer);
// you can just pass this vector to it like this:
some_c_function(&some_buffer[0]);
// if that function wants a buffer size as well,
// just give it some_buffer.size()
To me this is a bit more of a C++ way than strcpy. Take a look at Meyers' Effective STL Item 16 for a much nicer explanation than I could ever provide.

You can use the copy method:
len = myStr.copy(cStr, myStr.length());
cStr[len] = '\0';
Where myStr is your C++ string and cStr a char * with at least myStr.length()+1 size. Also, len is of type size_t and is needed, because copy doesn't null-terminate cStr.
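The copy approach can be made safe against short destination buffers with a small wrapper. This is a sketch; the name copy_to_c_buffer and the truncation policy are assumptions for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copies src into dst (capacity dst_size bytes),
// truncating if necessary, and always null-terminates when dst_size > 0.
// Returns the number of characters copied, excluding the terminator.
std::size_t copy_to_c_buffer(const std::string& src, char* dst, std::size_t dst_size)
{
    if (dst_size == 0) {
        return 0;
    }
    // std::string::copy copies at most the requested count and adds no '\0'.
    const std::size_t len = src.copy(dst, dst_size - 1);
    dst[len] = '\0';
    return len;
}
```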

Just use const_cast<char*>(str.data())
Do not feel bad or weird about it, it's perfectly good style to do this.
It's guaranteed to work in C++11. The fact that it's const qualified at all is arguably a mistake by the original standard before it; in C++03 it was possible to implement string as a discontinuous list of memory, but no one ever did it. There is not a compiler on earth that implements string as anything other than a contiguous block of memory, so feel free to treat it as such with complete confidence.

If you know that the std::string is not going to change, a C-style cast will work.
std::string s("hello");
char *p = (char *)s.c_str();
Of course, p is pointing to some buffer managed by the std::string. If the std::string goes out of scope or the buffer is changed (i.e., written to), p will probably be invalid.
The safest thing to do would be to copy the string if refactoring the code is out of the question.

std::string vString;
vString.resize(256); // allocate some space, up to you
char* vStringPtr(&vString.front());
// assign the value to the string (by using a function that copies the value).
// don't exceed vString.size() here!
// now make sure you erase the extra capacity after the first encountered \0.
vString.erase(std::find(vString.begin(), vString.end(), 0), vString.end());
// and here you have the C++ string with the proper value and bounds.
This is how you turn a C++ string to a C string. But make sure you know what you're doing, as it's really easy to step out of bounds using raw string functions. There are moments when this is necessary.

If c_str() is returning to you a copy of the string object internal buffer, you can just use const_cast<>.
However, if c_str() is giving you direct access to the string object's internal buffer, make an explicit copy instead of removing the const.

Since c_str() gives you direct const access to the data structure, you probably shouldn't cast it. The simplest way to do it without having to preallocate a buffer is to just use strdup.
char* tmpptr;
tmpptr = strdup(myStringVar.c_str());
oldfunction(tmpptr);
free(tmpptr);
It's quick, easy, and correct.

In C++, if you want a char * from a string.c_str()
(to give it for example to a function that only takes a char *),
you can cast it to char * directly to lose the const from .c_str()
Example:
launchGame((char *) string.c_str());

C++17 adds a char* string::data() noexcept overload. So if your string object isn't const, the pointer returned by data() isn't either and you can use that.

Is it really that difficult to do yourself?
#include <algorithm>
#include <cstring>
#include <string>
// Returns a newly allocated, null-terminated copy; the caller must delete[] it.
char *convert(const std::string &str)
{
    size_t len = str.length();
    char *buf = new char[len + 1];
    memcpy(buf, str.data(), len);
    buf[len] = '\0';
    return buf;
}
// Copies into a caller-supplied buffer of len bytes (len >= 1), truncating if needed.
char *convert(const std::string &str, char *buf, size_t len)
{
    // Never read more characters than the string actually holds.
    size_t n = std::min(len - 1, str.length());
    memcpy(buf, str.data(), n);
    buf[n] = '\0';
    return buf;
}
// A crazy template solution to avoid passing in the array length
// but loses the ability to pass in a dynamically allocated buffer
template <size_t len>
char *convert(const std::string &str, char (&buf)[len])
{
    return convert(str, buf, len);
}
Usage:
std::string str = "Hello";
// Use buffer we've allocated
char buf[10];
convert(str, buf);
// Use buffer allocated for us
char *dynbuf = convert(str);
delete [] dynbuf;
// Use dynamic buffer of known length
dynbuf = new char[10];
convert(str, dynbuf, 10);
delete [] dynbuf;

Setting boost dynamic_bitset from a string

Dynamic bitset
I have a use case where I need to populate a
boost::dynamic_bitset<unsigned char> from a std::string buffer.
Can you suggest how to go about this? I need to come up with a function:
void populateBitSet (std::string &buffer,
boost::dynamic_bitset<unsigned char> & bitMap) {
//populate bitMap from a string buffer
}

If you have binary data like this:
string buffer = "0101001111011";
You want to initialize it like this (turns out there's a constructor that handles this case):
void populateBitSet (std::string &buffer, boost::dynamic_bitset<unsigned char> & bitMap)
{
bitMap = boost::dynamic_bitset<unsigned char> (buffer);
}
If you want the raw data, use the iterator constructor:
void populateBitSet (std::string &buffer, boost::dynamic_bitset<unsigned char> & bitMap)
{
bitMap = boost::dynamic_bitset<unsigned char> (buffer.begin(), buffer.end());
}
These do end up allocating the needed memory twice, so you might be better off with a stack allocation and a swap. Or you can just wait until C++0x and let the move semantics do their thing.
// Unnecessary in C++0x
void populateBitSet (std::string &buffer, boost::dynamic_bitset<unsigned char> & bitMap)
{
boost::dynamic_bitset<unsigned char> localBitmap(buffer.begin(), buffer.end());
bitMap.swap(localBitmap);
}
Edit:
To clarify why the first versions allocate twice as much memory:
Take a look at another way to write the first version:
typedef boost::dynamic_bitset<unsigned char> bits; // just to shorten the examples.
void populateBitSet (std::string &buffer, bits &bitMap)
{
const bits &temp = bits(buffer); // 1. initialize temporary
bitMap = temp; // 2. Copy over data from temp to bitMap
}
If you put these two lines together, as in the first example, you still get a temporary constructed on the stack, followed by an assignment. In 1, boost needs to allocate enough memory for the entire set of bits. In 2, boost needs to allocate enough memory again to hold that same set of bits and then copy the values over. It's possible that bitMap already has enough memory, so it may not always need to reallocate, but it's also possible that it will free its backing memory and reallocate from scratch anyway.
Most containers that fit the STL mold also have a swap function that you can use in place of assignment when you intend to throw away one side of the swap. These are usually O(1) and non-throwing as they often just involve swapping some pointers. See this GotW for another reason why these are useful.
In C++0x, you'll be able to use assignment and still get the advantages of swap. Since you can overload on r-values (like the temporary), the container knows that when you assign a temporary, it can cannibalize the temp and basically do a swap. The Visual Studio Team Blog has covered rvalues and move semantics quite well here.
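The same two idioms can be sketched with std::vector<unsigned char> standing in for boost::dynamic_bitset<unsigned char>, so that the example needs only the standard library. The function names are hypothetical, and the substitution is an assumption made purely for illustration.

```cpp
#include <string>
#include <vector>

// Builds the new contents in a local object, then swaps it into the output.
// The swap exchanges internal pointers, so the bytes are not copied again.
void populate_with_swap(const std::string& buffer, std::vector<unsigned char>& out)
{
    std::vector<unsigned char> local(buffer.begin(), buffer.end());
    out.swap(local);
}   // local now holds the old contents of out and frees them here

// With C++11 move assignment, plain assignment from a temporary has the
// same effect: the temporary's storage is taken over instead of copied.
void populate_with_move(const std::string& buffer, std::vector<unsigned char>& out)
{
    out = std::vector<unsigned char>(buffer.begin(), buffer.end());
}
```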
