Dynamic bitset
I have a use case where I need to populate a
boost::dynamic_bitset<unsigned char> from a std::string buffer.
Can you suggest how to go about this? I need to come up with a function:
void populateBitSet (std::string &buffer,
boost::dynamic_bitset<unsigned char> & bitMap) {
//populate bitMap from a string buffer
}
If you have binary data like this:
string buffer = "0101001111011";
You want to initialize it like this (turns out there's a constructor that handles this case):
void populateBitSet (std::string &buffer, boost::dynamic_bitset<unsigned char> & bitMap)
{
bitMap = boost::dynamic_bitset<unsigned char> (buffer);
}
If you want the raw data, use the iterator constructor:
void populateBitSet (std::string &buffer, boost::dynamic_bitset<unsigned char> & bitMap)
{
bitMap = boost::dynamic_bitset<unsigned char> (buffer.begin(), buffer.end());
}
These do end up allocating the needed memory twice, so you might be better off constructing a local bitset and swapping it in. Or you can just wait for C++0x (now C++11) and let move semantics do their thing.
// Unnecessary in C++0x
void populateBitSet (std::string &buffer, boost::dynamic_bitset<unsigned char> & bitMap)
{
boost::dynamic_bitset<unsigned char> localBitmap(buffer.begin(), buffer.end());
bitMap.swap(localBitmap);
}
Edit:
To clarify why the first versions allocate twice as much memory:
Take a look at another way to write the first version:
typedef boost::dynamic_bitset<unsigned char> bits; // just to shorten the examples.
void populateBitSet (std::string &buffer, bits &bitMap)
{
const bits &temp = bits(buffer); // 1. initialize temporary
bitMap = temp; // 2. Copy over data from temp to bitMap
}
If you put these two lines together, as in the first example, you still get a temporary constructed on the stack, followed by an assignment. In step 1, boost needs to allocate enough memory for the entire set of bits. In step 2, boost needs to allocate enough memory again to hold that same set of bits and then copy the values over. It's possible that bitMap already has enough memory, so it may not always need to reallocate, but it's also possible that it will free its backing memory and reallocate from scratch anyway.
Most containers that fit the stl mold also have a swap function that you can use in place of assignment when you intend to throw away one side of the swap. These are usually O(1) and non-throwing as they often just involve swapping some pointers. See this GotW for another reason why these are useful.
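As a rough illustration of the swap idiom with a standard container, here is a minimal sketch that uses std::vector<unsigned char> in place of the bitset. The name populateBytes is made up for this example and the assert is only a sanity check.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: replaces the contents of `out` with the bytes of `buffer`.
void populateBytes(const std::string& buffer, std::vector<unsigned char>& out)
{
    // Build the new contents in a local object: this is the only allocation.
    std::vector<unsigned char> local(buffer.begin(), buffer.end());

    // Exchange the internals of the two vectors; no elements are copied.
    out.swap(local);

    // Sanity check: `out` now holds one byte per character of `buffer`.
    assert(out.size() == buffer.size());

    // `local` now owns the old contents of `out` and releases them on return.
}
```

The old storage of `out` is released when `local` goes out of scope, so nothing is allocated a second time.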
In C++0x, you'll be able to use assignment and still get the advantages of swap. Since you can overload on rvalues (like the temporary), the container knows that when you assign a temporary, it can cannibalize that temporary and basically do a swap. The Visual Studio Team Blog has covered rvalues and move semantics quite well here.
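For comparison, a small sketch of the same idea with move assignment, again using std::vector<unsigned char> as a stand-in for the bitset. The names makeBytes, populateBytesByMove and moveKeepsStorage are hypothetical. The last function checks that the storage is handed over rather than copied, which is what mainstream standard library implementations do with the default allocator.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical factory: returns the bytes of `buffer` as a new vector.
std::vector<unsigned char> makeBytes(const std::string& buffer)
{
    return std::vector<unsigned char>(buffer.begin(), buffer.end());
}

void populateBytesByMove(const std::string& buffer, std::vector<unsigned char>& out)
{
    // The right-hand side is a temporary (an rvalue), so this selects
    // the move assignment operator instead of the copy assignment operator.
    out = makeBytes(buffer);
}

// Returns true if move assignment transferred the existing storage
// to the target instead of allocating and copying.
bool moveKeepsStorage()
{
    std::vector<unsigned char> source = makeBytes("0101001111011");
    const unsigned char* before = source.data();

    std::vector<unsigned char> target;
    target = std::move(source);

    return target.data() == before;
}
```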
Related
I have an std::vector. I want to copy the contents of the vector into a char* buffer of a certain size.
Is there a safe way to do this?
Can I do this?
memcpy(buffer, _v.begin(), buffer_size);
or this?
std::copy(_v.begin(), _v.end(), buffer); // throws a warning (unsafe)
or this?
for (int i = 0; i < _v.size(); i++)
{
*buffer = _v[i];
buffer++;
}
Thanks..
std::copy(_v.begin(), _v.end(), buffer);
This is the preferred way to do this in C++. It is safe to copy this way if buffer is large enough.
If you just need char*, then you can do this:
char *buffer=&v[0];//v is guaranteed to be a contiguous block of memory.
//use buffer
Note that changing the data pointed to by buffer also changes the vector's content!
Or, if you need a copy, allocate v.size() bytes and use std::copy:
char *buffer = new char[v.size()];
std::copy(v.begin(), v.end(), buffer);
Don't forget to delete[] buffer; after you're done, else you'll leak memory.
But then why invite a problem that requires you to manage the memory yourself, especially when you can do better, such as:
auto copy = v; // that's a simpler way to make copies!
// and then use copy as new buffer.
// no need to manually delete anything. :-)
Hope that helps.
The safest way to copy a vector<char> into a char * buffer is to copy it to another vector, and then use that vector's internal buffer:
std::vector<char> copy = _v;
char * buffer = &copy[0];
Of course, you can also access _v's buffer if you don't actually need to copy the data. Also, beware that the pointer will be invalidated if the vector is resized.
If you need to copy it into a particular buffer, then you'll need to know that the buffer is large enough before copying; there are no bounds checks on arrays. Once you've checked the size, your second method is best. (The first only works if vector::iterator is a pointer, which isn't guaranteed; although you could change the second argument to &_v[0] to make it work. The third does the same thing, but is more complicated, and probably should be fixed so it doesn't modify buffer).
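A minimal sketch of the "check the size, then copy" approach. The helper name copyToBuffer and its truncate-on-overflow policy are assumptions for illustration. You might prefer to report an error instead of truncating.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies at most `buffer_size` chars from `v` into
// `buffer` and returns the number of chars actually copied.
std::size_t copyToBuffer(const std::vector<char>& v, char* buffer, std::size_t buffer_size)
{
    // A null buffer is only acceptable when there is no room to write to.
    assert(buffer != nullptr || buffer_size == 0);

    // Never write more than the destination can hold.
    const std::size_t n = std::min(v.size(), buffer_size);
    std::copy_n(v.begin(), n, buffer);
    return n;
}
```

The return value lets the caller detect truncation by comparing it with v.size().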
Well, you want to assign to *buffer for case 3, but that should work. The first one almost certainly won't work.
EDIT: I stand corrected regarding #2.
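#include <fstream>
#include <iterator>
#include <string>
#include <vector>

static std::vector<unsigned char> read_binary_file (const std::string &filename)
{
    // binary mode is only for switching off newline translation
    std::ifstream file(filename, std::ios::binary);
    // stop the extraction operator from skipping whitespace bytes
    file.unsetf(std::ios::skipws);
    // find the file size by seeking to the end, then rewind
    file.seekg(0, std::ios::end);
    const std::streampos file_size = file.tellg();
    file.seekg(0, std::ios::beg);
    // reserve rather than resize: constructing vec(file_size) and then
    // inserting would leave file_size zero bytes after the real data
    std::vector<unsigned char> vec;
    vec.reserve(file_size);
    vec.insert(vec.begin(),
               std::istream_iterator<unsigned char>(file),
               std::istream_iterator<unsigned char>());
    return vec;
}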
and then:
auto vec = read_binary_file(filename);
auto src = new char[vec.size()];
std::copy(vec.begin(), vec.end(), src);
but remember to delete[] src later.
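If the manual delete[] is a concern, one option is to let a smart pointer own the array. This is only a sketch, and copyToOwnedBuffer is a made-up name.

```cpp
#include <algorithm>
#include <memory>
#include <vector>

// Hypothetical helper: copies the bytes of `vec` into a heap array that is
// released automatically when the returned unique_ptr is destroyed.
std::unique_ptr<char[]> copyToOwnedBuffer(const std::vector<unsigned char>& vec)
{
    std::unique_ptr<char[]> src(new char[vec.size()]);
    std::copy(vec.begin(), vec.end(), src.get());
    return src;
}
```

The raw char* is still available through get() for APIs that need it, and no explicit delete[] is required.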
I have a reference to std::vector<char> that I want to use as a parameter to a function which accepts std::vector<unsigned char>. Can I do this without copying?
I have the following function and it works; however, I am not sure whether a copy actually takes place. Could someone help me understand this? Is it possible to use std::move to avoid the copy, or is it already not being copied?
static void showDataBlock(bool usefold, bool usecolor,
std::vector<char> &chunkdata)
{
char* buf = chunkdata.data();
unsigned char* membuf = reinterpret_cast<unsigned char*>(buf);
std::vector<unsigned char> vec(membuf, membuf + chunkdata.size());
showDataBlock(usefold, usecolor, vec);
}
I was thinking that I could write:
std::vector<unsigned char> vec(std::move(membuf),
std::move(membuf) + chunkdata.size());
Is this overkill? What actually happens?
...is it possible to use std::move to avoid copy or is it already not
being copied
You cannot move between two unrelated containers. A std::vector<char> is not a std::vector<unsigned char>, and hence there is no legal way to "move ~ convert" the contents of one to the other in O(1) time.
You can either copy:
void showData( std::vector<char>& data){
std::vector<unsigned char> udata(data.begin(), data.end());
for(auto& x : udata)
modify( x );
....
}
or cast it in realtime for each access...
inline unsigned char& as_uchar(char& ch){
return reinterpret_cast<unsigned char&>(ch);
}
void showDataBlock(std::vector<char>& data){
for(auto& x : data){
modify( as_uchar(x) );
}
}
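If the consumer only needs to read the bytes, another way to avoid the copy is to pass a pointer and a length, since the standard allows any object to be inspected through an unsigned char pointer. A minimal sketch follows. sumBytes is a hypothetical stand-in for the real display routine.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical consumer that only needs read access to raw bytes.
unsigned sumBytes(const unsigned char* data, std::size_t size)
{
    unsigned total = 0;
    for (std::size_t i = 0; i < size; ++i)
        total += data[i];
    return total;
}

// Overload for std::vector<char>: views the existing storage as
// unsigned char without copying any element.
unsigned sumBytes(const std::vector<char>& data)
{
    return sumBytes(reinterpret_cast<const unsigned char*>(data.data()), data.size());
}
```

This casts the element pointer, not the vector itself, so it does not depend on the two vector types having the same layout.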
If you have a v1 of type std::vector<T1> and need a v2 of type std::vector<T2> there is no way around copying the data, even if T1 and T2 are "similar" like char and unsigned char.
Use the standard library:
std::vector<unsigned char> v2;
std::copy(v1.begin(), v1.end(), std::back_inserter(v2));
The only possible way around it is to somehow work with only one type: either obtain a std::vector<T2> from the start if possible, or work with std::vector<T1> from now on (maybe add an overload that deals with it). Or create generic code (templates) that can deal with any contiguous container.
I think reinterpret_cast and std::move should make it possible to
avoid copy
no, it can't
please elaborate - why not?
A vector can steal resources (move data) only from another vector of the same type. That's how its interface was designed.
To do what you want you would need a release() method that gives up the vector's ownership of the underlying data and returns it as a (unique) pointer, plus a move constructor/assignment that acquires the underlying data from a (unique) pointer. (And even then you would still require a reinterpret_cast, which is... danger zone.)
std::vector has none of those. Maybe it should have. It just doesn't.
As others already pointed out, there is no way around the copy without changing showDataBlock.
I think you have two options:
Extend showDataBlock to work on both char and unsigned char (i.e. make it a template), or
Don't take the container as an argument but an iterator range instead. You could then (when the value_type is char) use special iterators that convert from char to unsigned char element by element.
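A sketch of how the two options could be combined: a function template over an iterator range that converts each element as it is read. hexDump is a hypothetical stand-in for the real showDataBlock, whose body was not posted.

```cpp
#include <string>
#include <vector>

// Hypothetical replacement for showDataBlock: accepts any iterator range
// whose elements convert to unsigned char and formats them as hex.
template <typename Iterator>
std::string hexDump(Iterator first, Iterator last)
{
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (; first != last; ++first) {
        // Convert element by element; works for char and unsigned char alike.
        const unsigned char byte = static_cast<unsigned char>(*first);
        if (!out.empty())
            out += ' ';
        out += digits[byte >> 4];
        out += digits[byte & 0x0f];
    }
    return out;
}
```

Both std::vector<char> and std::vector<unsigned char> can then be passed as hexDump(v.begin(), v.end()) without an intermediate copy.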
I guess you have written another overload:
showDataBlock(usefold, usecolor, std::vector<unsigned char> & vec);
You are trying to convert a std::vector<T> to a std::vector<T2>.
There is no way to avoid the copy.
Each std::vector owns its own storage; roughly speaking, it holds a raw pointer to it.
The main point is that you can't share that raw pointer among multiple std::vector objects.
I think this is by design.
I think it is a good thing, otherwise the vectors would waste CPU time keeping track of shared ownership.
The code ...
std::move(membuf)
... moves the raw pointer, which actually does nothing (it is the same as passing membuf).
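This can be checked with a small experiment. The function name rangeConstructorCopies is made up, and the check assumes chunkdata is not empty.

```cpp
#include <utility>
#include <vector>

// Returns true if std::move left the pointer unchanged and the new vector
// still ended up with its own separate storage. Assumes a non-empty input.
bool rangeConstructorCopies(std::vector<char>& chunkdata)
{
    unsigned char* membuf = reinterpret_cast<unsigned char*>(chunkdata.data());

    // std::move on a raw pointer just yields the same pointer value.
    unsigned char* moved = std::move(membuf);

    // The iterator-range constructor copies every element it is given.
    std::vector<unsigned char> vec(moved, moved + chunkdata.size());

    return moved == membuf && vec.data() != membuf;
}
```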
To optimize, you should first ask why you want to convert from std::vector<char> to std::vector<unsigned char> in the first place.
Would it be a better idea to create a new class C that can be read as both char and unsigned char (e.g. C::getChar() and C::getUnsignedChar(), perhaps storing only char but providing the conversion as a non-static member function)?
If that doesn't help, I suggest creating a new custom data structure.
I often do that when it is needed.
However, in this case, I don't think it needs any optimization.
The copy is OK for me, unless this is performance-critical code.
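While unsigned char and char are unrelated types, I think they're similar enough in this case (same-size PODs) to get away with a reinterpret_cast of the entire templated class. Note that the standard does not define this behaviour, so it only works to the extent that your implementation lays out both vector types identically.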
static void showDataBlock(bool usefold, bool usecolor,
std::vector<char> &chunkdata)
{
showDataBlock(usefold, usecolor, reinterpret_cast< std::vector<unsigned char>&>(chunkdata));
}
However, I tend to find these problems are due to not designing the best architecture. Look at the bigger picture of what this software is supposed to be doing to identify why you need to work with both signed and unsigned char blocks of data.
I ended up doing something like this :
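// The overload that does the work must be declared before it is called.
static bool showDataBlock(bool usefold, bool usecolor, std::vector<unsigned char> &chunkdata)
{
    // showing the data
    return true;
}
// Takes the vector by value, so the caller's data is copied on the way in.
static void showDataBlock(bool usefold, bool usecolor, std::vector<char> chunkdata)
{
    // Not sanctioned by the standard: relies on both vector types sharing a layout.
    std::vector<unsigned char> &cache = reinterpret_cast<std::vector<unsigned char>&>(chunkdata);
    showDataBlock(usefold, usecolor, cache);
}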
This solution allowed me to pass the vector either by reference or by value.
It seems to be working. Whether it is the best solution I do not know, but you all came up with some really good suggestions. Thank you all.
I agree that I cannot avoid the copy, so I let the copy be done by normal parameter passing.
If you find this solution wrong, please provide a better one in a comment rather than just downvoting.
I want to convert size_t to vector of unsigned chars. This vector is defined as 4 bytes.
Could anybody suggest a suitable way to do that?
Once you've reconciled yourself to the fact that your std::vector is probably going to have to be bigger than that - it will need to have sizeof(size_t) elements - one well-defined way is to access the data buffer of such an appropriately sized vector and use ::memcpy:
size_t bar = 0; /*initialise this else the copy code behaviour is undefined*/
std::vector<uint8_t> foo(sizeof(bar)); /*space must be allocated at this point*/
::memcpy(foo.data(), &bar, sizeof(bar));
There is an overload of data() that returns a non-const pointer to the data buffer. I'm exploiting this here. Accessing the data buffer in this way is unusual but other tricks (using unions etc.) often lead to code whose behaviour is, in general, undefined.
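A small round-trip sketch of the memcpy approach. The helper names toBytes and fromBytes are made up for illustration, and the byte order of the result is whatever the host machine uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copies the object representation of `value`
// into a vector of sizeof(std::size_t) bytes (host byte order).
std::vector<std::uint8_t> toBytes(std::size_t value)
{
    std::vector<std::uint8_t> bytes(sizeof value);
    std::memcpy(bytes.data(), &value, sizeof value);
    return bytes;
}

// Hypothetical inverse: rebuilds the value from bytes produced by toBytes.
std::size_t fromBytes(const std::vector<std::uint8_t>& bytes)
{
    assert(bytes.size() == sizeof(std::size_t));
    std::size_t value = 0;
    std::memcpy(&value, bytes.data(), sizeof value);
    return value;
}
```

Because the bytes are in host order, they are only meaningful on a machine with the same endianness and the same sizeof(size_t).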
By "convert", I'll assume you mean "copy", since vector will allocate and own its memory. You can't just give it a pointer and expect to use your own memory.
An efficient way to do so which avoids two-stage construction (that causes initialization of the array with zero) is to do this:
auto ptr = reinterpret_cast<uint8_t*>(&the_size);
vector<uint8_t> vec{ptr, ptr + sizeof(size_t)};
Note that sizeof(size_t) is not required to be 4. So you shouldn't write your code assuming that it is.
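If the vector really must be exactly 4 bytes, one option is to encode the value explicitly instead of copying its object representation. The sketch below assumes big-endian output and rejects values that do not fit. Both choices, and the name toBigEndian32, are assumptions made for this example, since the question does not specify them.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: encodes `value` as exactly 4 bytes, most significant
// byte first. Throws if the value needs more than 32 bits.
std::vector<std::uint8_t> toBigEndian32(std::size_t value)
{
    if (value > 0xFFFFFFFFu)
        throw std::out_of_range("value does not fit in 4 bytes");

    const std::uint32_t v = static_cast<std::uint32_t>(value);
    std::vector<std::uint8_t> bytes(4);
    bytes[0] = static_cast<std::uint8_t>(v >> 24);
    bytes[1] = static_cast<std::uint8_t>(v >> 16);
    bytes[2] = static_cast<std::uint8_t>(v >> 8);
    bytes[3] = static_cast<std::uint8_t>(v);
    return bytes;
}
```

Using shifts keeps the result independent of the host's endianness and of sizeof(size_t).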
You could write a generic converter using std::bitset
#include <bitset>
#include <climits>
#include <cstddef>
#include <vector>

template <typename T>
std::vector<unsigned char> Type_To_Bit_Vector(T type, char true_char, char false_char){
    // convert the value to a bitset with one entry per bit of T
    std::bitset<sizeof(type) * CHAR_BIT> bset(type);
    // convert the bitset to vector<unsigned char>, least significant bit first
    std::vector<unsigned char> vec;
    vec.reserve(bset.size());
    for(std::size_t i = 0; i < bset.size(); i++){
        vec.push_back(bset[i] ? true_char : false_char);
    }
    return vec;
}
You could then get a desired vector representation like so:
auto vec = Type_To_Bit_Vector(size_t(123),'1','0');
I have a (void*) buffer that I need to convert to std::vector<unsigned char> before I can pass it on. Unfortunately, my C++ casting skills are a little weak. Any suggestions?
You will need the length of the buffer. Once you do, we can do this:
unsigned char *charBuf = (unsigned char*)voidBuf;
/* create a vector by copying out the contents of charBuf */
std::vector<unsigned char> v(charBuf, charBuf + len);
Okay, the comment got me started on why I did not use reinterpret_cast:
In C++, the C-style cast is a convenience notation -- it asks the compiler to choose the safest and most portable form of conversion from the set of available cast operators.
The reinterpret_cast is implementation defined and should always be the last thing on your mind (and used when you are necessarily doing a non-portable thing knowingly).
The conversion between (unsigned doesn't change the type) char * and void * is portable (you could actually use static_cast if you are really picky).
The problem with the C-style cast is: the added flexibility can cause heartaches when the pointer type changes.
Note: I agree with the general convention of not casting as much as possible. However, without any source provided, this is the best I could do.
You can't simply cast a void* to a std::vector<unsigned char> because the memory layout of the latter includes other objects, such as the size and the number of bytes currently allocated.
Assuming the buffer is pointed to by buf and its length is n:
vector<unsigned char> vuc(static_cast<char*>(buf), static_cast<char*>(buf) + n);
will create a copy of the buffer that you can safely use.
[EDIT: Added static_cast<char*>, which is needed for pointer arithmetic.]
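Wrapped up as a function, the copy might look like the sketch below. The name toVector is made up, and the caller is assumed to know the buffer length in bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies `n` bytes starting at `buf` into a new vector.
std::vector<unsigned char> toVector(const void* buf, std::size_t n)
{
    // A null buffer is only acceptable when there is nothing to copy.
    assert(buf != nullptr || n == 0);

    // static_cast is enough to go from void* to a byte pointer.
    const unsigned char* first = static_cast<const unsigned char*>(buf);
    return std::vector<unsigned char>(first, first + n);
}
```

The returned vector owns its own copy, so the original buffer can be freed or reused afterwards.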
The only time this would be legitimate is if you had already created a vector, and simply wanted to get it back.
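#include <vector>

void SomeFunc(void* input);

int main() {
    std::vector< unsigned char > v;
    SomeFunc((void*) &v);
}

void SomeFunc(void* input) {
    // Now, you could cast that void* back into a pointer to the vector
    std::vector< unsigned char >* v_ = (std::vector<unsigned char>*)input;
    (void)v_; // silence the unused-variable warning in this sketch
}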
I haven't actually tried to see if this will run, but that's the spirit of it. That said, if you are making this from scratch, you are definitely doing it wrong. This is really bad. The only time this could be even remotely understandable is if you are forced to implement the already defined "SomeFunc()".
Using the std::vector class for an already allocated buffer is not a solution. A std::vector object manages its memory and deallocates it at destruction time.
A complicated solution might be to write your own allocator, that uses an already allocated buffer, but you have to be very careful on several scenarios, like vector resizing, etc.
If you have that void* buffer bound through some C API functions, then you can forget about conversion to std::vector.
If you need only a copy of that buffer, it can be done like this:
std::vector< unsigned char> cpy(
(unsigned char*)buffer, (unsigned char*)buffer + bufferSize);
where bufferSize is the size in chars of the copied buffer.