void* or char* for generic buffer representation? - c++

I'm designing a Buffer class whose purpose is to represent a chunk of memory.
My underlying buffer is a char* (well, a boost::shared_array<char> actually, but it doesn't really matter).
I'm stuck at deciding what prototype to choose for my constructor:
Should I go with:
Buffer(const void* buf, size_t buflen);
Or with:
Buffer(const char* buf, size_t buflen);
Or something else?
What is usually done, and why?

The API is clearer to the user if a buffer has type void* and a string has type char*. Compare the declarations of memcpy and strcpy.

For the constructor and other API functions, the advantage of void* is that it allows the caller to pass in a pointer to any type without having to do an unnecessary cast. If it makes sense for the caller to be able to pass in any type, then void* is preferable. If it really only makes sense for the caller to be able to pass in char*, then use that type.
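As a rough illustration of that point, here is a minimal sketch of a Buffer whose constructor takes const void*. The class body is hypothetical: it copies the bytes into a std::vector<unsigned char>, which is only a stand-in for the shared_array<char> mentioned in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical Buffer: owns a copy of the bytes it is given.
class Buffer {
public:
    // const void* lets callers pass a pointer to any object type
    // without writing a cast at the call site.
    Buffer(const void* buf, std::size_t buflen) : data_(buflen) {
        assert(buf != nullptr || buflen == 0);
        if (buflen != 0) {
            std::memcpy(data_.data(), buf, buflen);
        }
    }

    std::size_t size() const { return data_.size(); }
    const unsigned char* data() const { return data_.data(); }

private:
    // Assumption: a vector stands in for the question's shared_array<char>.
    std::vector<unsigned char> data_;
};
```

With this signature, a call such as Buffer b(&someInt, sizeof someInt); compiles as is, whereas a const char* parameter would force the caller to cast.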

C++17
C++17 introduced std::byte specifically for this.
Its definition is actually simple: enum class byte : unsigned char {};.
I generally used unsigned char as the underlying type (I don't want signedness to mess with my buffer, for I don't know what reason). However, I usually typedefed it:
// C++11
using byte = unsigned char;
// C++98
typedef unsigned char byte;
And then refer to it as byte* which neatly conveys the meaning in my opinion, better than either char* or void* at least.
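For comparison with the alias approach, here is a small sketch of std::byte in use (C++17). The helper names xor_bytes and first_value are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// XOR every byte in the buffer with a key. std::byte supports bitwise
// operators but no arithmetic, so it cannot be mistaken for a character
// or a number.
inline void xor_bytes(std::vector<std::byte>& buf, std::byte key) {
    for (std::byte& b : buf) {
        b ^= key;
    }
}

// Converting back to an integer has to be spelled out explicitly.
inline unsigned first_value(const std::vector<std::byte>& buf) {
    assert(!buf.empty());
    return std::to_integer<unsigned>(buf.front());
}
```

Unlike a typedef of unsigned char, std::byte is a distinct type, so accidental arithmetic or printing as a character fails to compile.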

I'd prefer char*, because for me personally it plays better with being "a buffer". void* seems more like "a pointer to I don't know what". Besides, it is what your underlying storage is, anyway.

I'd recommend uint8_t, which is defined in stdint.h. It's basically the same thing as the "typedef unsigned char byte;" that others have been recommending, but it has the advantage of being part of the C standard.
As for void*, I would only use that for polymorphism. ie. I'd only call something a void pointer if I didn't yet know what type of thing it would be pointing to. In your case you've got an array of bytes, so I'd label it as such by using uint8_t* as the type.

I prefer unsigned char * or uint8_t * for buffer implementations, since void * has the annoying restriction that you can't perform pointer math on it. So if you want to process some data at some offset from the buffer, or just break your buffer up into chunks or whatever, you are stuck casting to some other type anyway to do the math.
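I prefer unsigned char * or uint8_t * over plain char * because of the special rules regarding aliasing and char *, which have the potential to seriously pessimize some loops working on your buffers. (Strictly speaking, the standard grants unsigned char the same aliasing permission as plain char, and uint8_t is usually a typedef for unsigned char, so whether this helps depends on the compiler.)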
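The pointer-math restriction can be sketched as follows. The helper advance_bytes is a hypothetical name; it only shows the cast that a void* interface forces on the implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: return a pointer `offset` bytes into a buffer.
// Arithmetic on void* is ill-formed in standard C++ (some compilers allow
// it as an extension), so convert to unsigned char* before adding.
inline void* advance_bytes(void* base, std::size_t offset) {
    assert(base != nullptr);
    return static_cast<unsigned char*>(base) + offset;
}
```

If the buffer is stored as unsigned char* in the first place, the same step is just base + offset with no cast.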

Related

What type of cast is suitable to convert from unsigned char* to char*?

When I write data to and from a buffer to save to a file I tend to use std::vector<unsigned char>, and I treat those unsigned chars just as bytes to write anything into, so:
int sizeoffile = 16;
std::vector<unsigned char> buffer(sizeoffile);
std::ifstream inFile("somefile", std::ios::binary | std::ios::in);
inFile.read(buffer.data(), sizeoffile); // Argument of type unsigned char* is incompatible
// with parameter of type char*
The first argument of ifstream::read() wants a char pointer, but my vector buffer is unsigned char. What sort of cast is suitable here to read the data into my buffer? It's essentially a conversion between unsigned char* and char*. I can do it with reinterpret_cast or a C-style cast, but this makes me think I'm doing something wrong, as these are not often recommended. Have I made the wrong choice of data type (unsigned char) for my buffer?
The safest thing to do will be not to use a cast directly, but to use a helper template that restricts itself to casting between types with compatible representations.
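#include <type_traits>
// Compiles only when T and U differ at most in signedness (needs C++17).
template<typename T, typename U>
auto treat_as(U* ptr) -> std::enable_if_t<std::is_same_v<std::make_signed_t<T>, std::make_signed_t<U>>, T>*
{ return reinterpret_cast<T*>(ptr); }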
and then
inFile.read(treat_as<char>(&buffer[0]), sizeoffile);
If someday the vector's element type changes to, say, wchar_t, this call will fail to compile, whereas a reinterpret_cast would silently start doing the wrong thing.
The similarity between char and unsigned char is a red herring here: for any trivially copyable type, you can reinterpret_cast its address to char* for filling via istream::read because char has special permission to alias any type. (Arguably it should work even for types like std::tuple<int> with trivial copy constructors but non-trivial copy assignment operators, but the standard doesn’t promise that. On the other hand, pointers are trivially copyable, but that doesn’t mean you can load pointer values from other executions!)
You have to use sizeof in general, of course; it might be wise to use it even if it’s 1 to protect against future type changes.
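A minimal sketch of that advice, assuming a hypothetical helper named read_elements: it casts the element pointer to char* for istream::read and multiplies by sizeof(T), so the byte count stays right if the element type changes later.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical helper: fill a vector of trivially copyable elements from a
// binary stream. Returns the number of whole elements actually read.
template <typename T>
std::size_t read_elements(std::istream& in, std::vector<T>& out) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte reads are only valid for trivially copyable types");
    if (out.empty()) {
        return 0;
    }
    // char may alias any object type, so this cast is permitted.
    in.read(reinterpret_cast<char*>(out.data()),
            static_cast<std::streamsize>(out.size() * sizeof(T)));
    // gcount() reports how many bytes the last read extracted.
    return static_cast<std::size_t>(in.gcount()) / sizeof(T);
}
```

For the code in the question, the call would look like read_elements(inFile, buffer); the cast lives in one place instead of at every read.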

Why pass a pointer as a (char *) and cast to a (long *)

I know legacy is always a justification, but I wanted to check out this example from MariaDB and see if I understand it enough to critique what's going on,
static int show_open_tables(THD *, SHOW_VAR *var, char *buff) {
var->type = SHOW_LONG;
var->value = buff;
*((long *)buff) = (long)table_cache_manager.cached_tables();
return 0;
}
Here they're taking in char* and they're writing it to var->value which is also a char*. Then they force a pointer to a long in the buff and set the type to a SHOW_LONG to indicate it as such.
I'm wondering why they would use a char* for this though and not a uintptr_t -- especially being when they're forcing pointers to longs and other types in it.
Wasn't the norm pre-uintptr_t to use void* for polymorphism in C++?
There seem to be two questions here, so I've split my answer up.
Using char*
Using a char* is fine. Character types (char, signed char, and unsigned char) are specially treated by the C and C++ standards. The C standard defines the following rules for accessing an object:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
a type that is the signed or unsigned type corresponding to the effective type of the object,
a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
a character type.
This effectively means character types are the closest the standards come to defining a 'byte' type (std::byte in C++17 is just defined as enum class byte : unsigned char {}).
However, as per the above rules, casting a char* to a long* and then assigning through it is incorrect (although it generally works in practice). memcpy should be used instead. For example:
long cached_tables = table_cache_manager.cached_tables();
memcpy(buff, &cached_tables, sizeof(cached_tables));
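Wrapped into a pair of helpers, the same idea might look like the sketch below. The names store_long and load_long are hypothetical, and the caller is assumed to guarantee that the buffer has room for sizeof(long) bytes.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: copy a long into a char buffer without forming
// a long* that points into it.
inline void store_long(char* buff, long value) {
    assert(buff != nullptr);
    std::memcpy(buff, &value, sizeof value);
}

// Hypothetical helper: read the long back out the same way.
inline long load_long(const char* buff) {
    assert(buff != nullptr);
    long value;
    std::memcpy(&value, buff, sizeof value);
    return value;
}
```

Because memcpy works on bytes, this also behaves correctly when buff is not suitably aligned for a long, which the cast-and-assign version does not guarantee.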
void* would also be a legitimate choice. Whether it is better is a matter of opinion. I would say the clearest option would be to add a type alias for char to convey the intent to use it as a byte type (e.g. typedef char byte_t). Off the top of my head, though, I can think of several prominent libraries that use char as is, as a byte type. For example, the Boost memory-mapped file code gives a char*, and leveldb uses std::string as a byte buffer type (presumably to take advantage of SSO).
Regarding uintptr_t:
uintptr_t is an optional type defined as an unsigned integer capable of holding a pointer. If you want to store the address of a pointed-to object in an integer, then it is a suitable type to use. It is not a suitable type to use here.
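To show what uintptr_t is actually for, here is a small sketch of the pointer-to-integer round trip. The function names are hypothetical, and the code assumes the implementation provides uintptr_t at all, since the type is optional.

```cpp
#include <cassert>
#include <cstdint>

// Store the address held in a pointer as an unsigned integer.
inline std::uintptr_t to_address(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

// Recover the pointer from the integer produced by to_address.
inline const void* from_address(std::uintptr_t a) {
    return reinterpret_cast<const void*>(a);
}
```

Note that this stores the address itself. It says nothing about the bytes at that address, which is why it does not fit a function that writes a value into a buffer.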
they're taking in char* and they're writing it to var->value which is also a char*. Then they force a pointer to a long in the buff and set the type to a SHOW_LONG to indicate it as such.
Or something. That code is hideous.
I'm wondering why they would use a char* for this though and not a uintptr_t -- especially being when they're forcing pointers to longs and other types in it.
Who knows? Who knows what the guy was on when he wrote it? Who cares? That code is hideous, we certainly shouldn't be trying to learn from it.
Wasn't the norm pre-uintptr_t to use void* for polymorphism in C++?
Yes, and it still is. The purpose of uintptr_t is to define an integer type that is big enough to hold a pointer.
I wanted to check out this example from MariaDB and see if I understand it enough to critique what's going on
You might have reservations about doing so but I certainly don't, that API is just a blatant lie. The way to do it (if you absolutely have to) would (obviously) be:
static int show_open_tables(THD *, SHOW_VAR *var, long *buff) {
var->type = SHOW_LONG;
var->value = (char *) buff;
*buff = (long)table_cache_manager.cached_tables();
return 0;
}
Then at least it is no longer a ticking time bomb.
Hmmm, OK, maybe (just maybe) that function is used in a dispatch table somewhere and therefore needs (unless you cast it) to have a specific signature. If so, I'm certainly not going to dig through 10,000 lines of code to find out (and anyway, I can't, it's so long it crashes my tablet).
But if anything, that would just make it worse. Now that time bomb has become a stealth bomber. And anyway, I don't believe it's that for a moment. It's just a piece of dangerous nonsense.

When defining a function that will operate on memory, is it more correct to pass addresses as void* or uint8*?

Most of the existing run-time memory functions accept or return void*, which enables passing of arguments without explicitly casting the pointer types. Should this pattern be replicated when creating custom memory functions?
In other words, which of the following is more correct, and why:
int read_bytes( void * dest, size_t count );
or
int read_bytes( uint8_t * dest, size_t count );
?
I would recommend using void*. Otherwise, every call to the function will look like:
int n = read_bytes((uint8_t*)&myVar, sizeof(myVar));
instead of just
int n = read_bytes(&myVar, sizeof(myVar));
void* is a general purpose pointer in C/C++. It is used in cases where you don't want a specific type specified with the data and avoids the need to cast the pointer. It is also the pointer that you want to use with raw addresses.
You would use uint8_t where you want to specify that you are really dealing with unsigned 8-bit integers.
If you treat memory only as raw, opaque memory, and not as a sequence of bytes, then void * is an appropriate type. This may be idiomatic, for example, when using placement-new to create objects in memory. There are also some traditional C APIs that use void pointers for memory references, like memcpy or memchr, so occasionally it can be convenient to use the same type.
On the other hand, if you're thinking of memory as an array of bytes, and especially if you want to access random bytes in memory (i.e. perform pointer or iterator arithmetic), you should absolutely use a char pointer type. There's a certain debate about which one is best; typically, for I/O you want plain char as the "system's I/O data type" (e.g. reading/writing I/O). On the other hand, if you want to operate on arithmetic byte values, unsigned char is more appropriate. The two types are layout-compatible, though, so feel free to treat one as the other if that's necessary.
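The two views can be combined: void* at the interface, a byte pointer inside. The sketch below assumes a hypothetical in-memory ByteSource in place of whatever the real read_bytes reads from, so its signature has one more parameter than the one in the question, and it returns size_t rather than int.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical in-memory source standing in for a file or socket.
struct ByteSource {
    const unsigned char* data;
    std::size_t size;
    std::size_t pos;
};

// void* at the interface: callers pass &obj without a cast.
// Returns the number of bytes copied, which may be less than count.
inline std::size_t read_bytes(ByteSource& src, void* dest, std::size_t count) {
    const std::size_t remaining = src.size - src.pos;
    const std::size_t n = count < remaining ? count : remaining;
    if (n != 0) {
        // Offsetting into the source needs a byte pointer, not void*.
        std::memcpy(dest, src.data + src.pos, n);
        src.pos += n;
    }
    return n;
}
```

A caller can then write read_bytes(src, &myVar, sizeof myVar) for any trivially copyable myVar, as in the first answer.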

Conversion of C Style Casting to C++ Style Casting

I'm not familiar with C++ casting and I want to convert my C-style cast to a C++ cast. Here is my code:
typedef unsigned char u8;
u8 sTmp[20] = {0};
//.. code to put string data in sTmp
char* sData;
sData = (char*)&(sTmp[0]);
Here, I want to convert (char*)&(sTmp[0]) to a C++ cast.
Many thanks.
Your cast is unnecessarily complicated. You get the first element of the array and then the address of that element. In expressions, arrays decay into pointers, so you can get the address of the first element from the array's name alone:
sData = (char*)sTmp;
Like @Richard said above, the best way to do the cast in C++ is to use reinterpret_cast, like this:
sData = reinterpret_cast<char*>(sTmp);
Finally, sTmp (as I already mentioned) will decay to a pointer in expressions, specifically an unsigned char* (which is the usual way of addressing raw memory), so it is very likely that you don't actually need to cast it to char* at all (unless you have to print it, which doesn't seem right anyway).
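If the cast is needed in several places, one option is to wrap it once. The helper name as_chars is hypothetical; this is only a sketch of keeping the reinterpret_cast in a single, searchable spot.

```cpp
#include <cassert>
#include <cstring>

typedef unsigned char u8;

// View a u8 buffer as char* so it can be handed to char-based APIs.
// char and unsigned char have the same size and either may alias the
// other, so this cast does not change which bytes are seen.
inline char* as_chars(u8* p) {
    return reinterpret_cast<char*>(p);
}

inline const char* as_chars(const u8* p) {
    return reinterpret_cast<const char*>(p);
}
```

The assignment from the question then reads sData = as_chars(sTmp);.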

What is the meaning of this?

Code:
void *buff;
char *r_buff = (char *)buff;
I can't understand the type casting of buff. Please help.
Thank you.
buff is a pointer to some memory, where the type of its content is unspecified (hence the void).
The second line tells that r_buff shall point to the same memory location, and the contents shall be interpreted as char(s).
buff is typed as a void pointer, which means it points to memory without declaring anything about the contents.
When you cast to char *, you declare that you're interpreting the pointer as being a char pointer.
In well written C++, you should not use C-style casts. So your cast should look like this:
void *buff;
char *r_buff = static_cast<char *>(buff);
See here for an explanation of what the C++ casting operators do.
By its name, buff is likely to be a memory buffer in which to write data, possibly binary data.
There are reasons why one might want to cast it to char *, potentially to use pointer arithmetic on it as one is writing because you cannot do that with a void*.
For example if you are supplied also a size (likely) and your API requires not pointer and size but 2 pointers (begin and end) you will need pointer arithmetic to determine where the end is.
The code could well be C, in which case the cast is correct. If the code is C++, though, a static_cast is preferable, although the C cast is not incorrect in this instance. The reason a static_cast is generally preferred is that the compiler will catch more occasions when you cast incorrectly that way, and it is also more easily greppable. However, casting in general breaks type-safety rules and is therefore best avoided much of the time. (Not that it is never correct, as it may be here.)
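The begin/end scenario described above can be sketched like this. ByteRange and make_range are hypothetical names, and size is assumed not to exceed the real length of the buffer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical pair of pointers delimiting a writable byte range.
struct ByteRange {
    char* begin;
    char* end;
};

// Turn a (pointer, size) pair into (begin, end). The static_cast is needed
// because void* does not support pointer arithmetic.
inline ByteRange make_range(void* buff, std::size_t size) {
    assert(buff != nullptr || size == 0);
    char* first = static_cast<char*>(buff);
    return ByteRange{first, first + size};
}
```

static_cast is enough here because converting from void* back to an object pointer type is one of the conversions it is allowed to perform; reinterpret_cast is not required.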