As a follow up to my previous question (Variable Length Array Performance Implications (C/C++)), I am having a bit of trouble maintaining const correctness using the C system call writev(). Namely, it appears as though I have run into the exact same problem as this user did in C, although I am using C++:
https://codereview.stackexchange.com/questions/9547/casting-const-pointer-to-non-const-pointer-when-using-struct-iovec
Here's a snippet of my code:
int my_awesome_transmit_function(const uint8_t* const buffer, const size_t length) {
// ... some stuff happens...
struct iovec iov[2];
iov[1].iov_base = buffer; // compiler yells about this
iov[1].iov_len = length;
// ... some more code you don't care about
writev(my_fd, iov, 2);
}
Given the solution presented from the CodeReview post, I have implemented the following change to the line that's giving me issues since I'd like to avoid the C-style cast:
iov[1].iov_base = const_cast<uint8_t*>(buffer);
Is this a valid way of using const cast? As far as I can tell, writev guarentees that the iov structure will not be modified (http://linux.die.net/man/2/writev). This blog post (http://blog.aaronballman.com/2011/06/when-should-you-use-const_cast/) leads me to believe that this is a valid usage (since I am never modifying the buffer), but I'd like to be certain since generally wherever I see const_cast or reinterpret_cast, I get a bit leery.
Thanks in advance for the help.
Yes, your use of const_cast is fine.
If one of the array elements is not going to be modified (ever), then you could declare it as a const.
const struct iovec iov_c(buffer, length); // create a constructor for your struct! always neater than
// assigning their members directly in code.
This assumes you have control over the signature of writev(..) and can pass in two iovec pointers.
If not, the usage of const_cast there looks alright.
Related
Say we declared a char* buffer:
char *buf = new char[sizeof(int)*4]
//to delete:
delete [] buf;
or a void* buffer:
void *buf = operator new(sizeof(int)*4);
//to delete:
operator delete(buf);
How would they differ if they where used exclusively with the purpose of serving as pre-allocated memory?- always casting them to other types(not dereferencing them on their own):
int *intarr = static_cast<int*>(buf);
intarr[1] = 1;
Please also answer if the code above is incorrect and the following should be prefered(only considering the cases where the final types are primitives like int):
int *intarr = static_cast<int*>(buf);
for(size_t i = 0; i<4; i++){
new(&intarr[i]) int;
}
intarr[1] = 1;
Finally, answer if it is safe to delete the original buffer of type void*/char* once it is used to create other types in it with the latter aproach of placement new.
It is worth clarifying that this question is a matter of curiosity. I firmly believe that by knowing the bases of what is and isnt possible in a programming language, I can use these as building blocks and come up with solutions suitable for every specific use case when I need to in the future. This is not an XY question, as I dont have a specific implementation of code in mind.
In any case, I can name a few things I can relate to this question off the top of my head(pre-allocated buffers specifically):
Sometimes you want to make memory buffers for custom allocation. Sometimes even you want to align these buffers to cache line boundaries or other memory boundaries. Almost always in the name of more performance and sometimes by requirement(e.g. SIMD, if im not mistaken). Note that for alignment you could use std::aligned_alloc()
Due to a technicality, delete [] buf; may be considered to be UB after buf has been invalidated due to the array of characters being destroyed due to the reuse of the memory.
I wouldn't expect it to cause problems in practice, but using raw memory from operator new doesn't suffer from this technicality and there is no reason to not prefer it.
Even better is to use this:
T* memory = std::allocator<T>::allocate(n);
Because this will also work for overaligned types (unlike your suggestions) and it is easy to replace with a custom allocator. Or, simply use std::vector
for(size_t i = 0; i<4; i++){
new(&intarr[i]) int;
}
There is a standard function for this:
std::uninitialized_fill_n(intarr, 4, 0);
OK, it's slightly different that it initialises with an actual value, but that's probably better thing to do anyway.
Say I want to store the size of a std::vector in an int I have the following options, to my knowledge:
int size = vector.size(); // Throws an implicit conversion warning
int size = (int)vector.size(); // C like typecasting is discouraged and forbidden in many code standards
int size = static_cast<int>(vector.size()); // This makes me want to gouge my eyes out (it's ugly)
Is there any other option that avoids all of the above issues?
I'm going to frame challenge this question. You shouldn't want a short and elegant solution to this problem.
Casting in any language, including C++, is basically the programmer's equivalent to swearing: you'll do it sometimes because it's easy and effortless, but you shouldn't. It means that somewhere, somehow, your design got screwed up. Maybe you need to pass the size of an array to an old API, but the old API didn't use size_t. Maybe you designed a piece of code to use float's, but in the actual implementation, you treat them like int's.
Regardless, casting is being used to patch over mistakes made elsewhere in the code. You shouldn't want a short and simple solution to resolve that. You should prefer something explicit and tedious, for two reasons:
It signals to other programmers that the cast isn't a mistake: that it's something intentional and necessary
To make you less likely to do it; and to instead focus on making sure your types are what you intended, rather than what the target API is expecting.
So embrace the static_cast, dynamic_cast, const_cast, and reinterpret_cast style of writing your code. Instead of trying to find ways to make the casts easier, find ways to refactor your code so they're less necessary.
If you're prepared to disregard all of that instead, then write something like this:
template<typename T, typename U>
T as(U && u) {
return static_cast<T>(u);
}
int size = as<int>(values.size());
bool poly_type::operator==(base_type const& o) {
if(this == &o)
return true;
if(typeid(*this) == typeid(o)) {
return as<poly_type const&>(o).value == value;
} else {
return false;
}
}
That'll at least reduce the amount of typing you end up using.
I'm going to answer your question just like you've asked. The other answers say why you shouldn't do it. But if you still want to have this, use this function:
#include <assert.h>
#include <limits.h>
inline int toInt(std::size_t value) {
assert(value<=MAX_INT);
return static_cast<int>(value);
}
Usage:
int size = toInt(vector.size());
toInt asserts if the input value is out of range. Feel free to modify it to your needs.
Storing a vector size , which might exceed the maximum value of int, in an int is an ugly operation in the first place. This is reflected in your compiler warning or the fact that you have to write ugly code to suppress the warning.
The presence of the static_cast informs other people reading the code that there is a potential bug here, the program might malfunction in various ways if the vector size does exceed INT_MAX.
Obviously (I hope?) the best solution is to use the right type for the value being stored, auto size = vector.size();.
If you really are determined to use int for whatever reason then I would recommend adding code to handle the case of the vector begin too large (e.g. throw before the int declaration if it is), or add a code comment explaining why that was never possible.
With no comments, the reader can't tell if your cast was just because you wanted to shut the compiler up and didn't care about the potential bug; or if you knew what you were doing.
There are problems, where we need to fill buffers with mixed types. Two examples:
programming OpenGL/DirectX, we need to fill vertex buffers, which can have mixed types (which is basically an array of struct, but the struct maybe described by a run-time data)
creating a memory allocator: putting header/trailer information to the buffer (size, flags, next/prev pointer, sentinels, etc.)
The problem can be described like this:
there is an allocation function, which gives back some memory (new, malloc, OS dependent allocation function, like mmap or VirtualAlloc)
there is a need to put mixed types into an allocated buffer, at various offsets
A solution can be this, for example writing an int to an offset:
void *buffer = <allocate>;
int offset = <some_offset>;
char *ptr = static_cast<char*>(buffer);
*reinterpret_cast<int*>(ptr+offset) = int_value;
However, this is inconvenient, and has UB at least two places:
ptr+offset is UB, as there is no char array at ptr
writing to the result of reinterpret_cast is UB, as there is no int there
To solve the inconvenience problem, this solution is often used:
union Pointer {
void *asVoid;
bool *asBool;
byte *asByte;
char *asChar;
short *asShort;
int *asInt;
Pointer(void *p) : asVoid(p) { }
};
So, with this union, we can do this:
Pointer p = <allocate>;
p.asChar += offset;
*p.asInt++ = int_value; // write an int to offset
*p.asShort++ = short_value; // then a short afterwards
// other writes here
This solution is convenient for filling buffers, but has further UB, as the solution uses non-active union members.
So, my question is: how can one solve this problem in a strictly standard conformant, and most convenient way? I mean, I'd like to have the functionality which the union solution gives me, but in a standard conformant way.
(Note: suppose, that we have no alignment issues here, alignment is taken care of by using proper offsets)
A simple (and conformant) way to handle these things is leveraging std::memcpy to move whatever values you need into the correct offsets in your storage area, e.g.
std::int32_t value;
char *ptr;
int offset;
// ...
std::memcpy(ptr+offset, &value, sizeof(value));
Do not worry about performance, since your compiler will not actually perform std::memcpy calls in many cases (e.g. small values). Of course, check the assembly output (and profile!), but it should be fine in general.
If I define an variable:
int (**a)[30];
It is pointer. This pointer points to a pointer which points to an array of 30 ints.
How to declare it or initialize it?
int (**a)[10] = new [10][20][30];
int (**a)[10] = && new int[10];
All doesn't work.
The direct answer to your question of how to initialize a (whether or not that's what you actually need) is
int (**a)[10] = new (int (*)[10]);
I don't think this is actually what you want though; you probably want to initialize the pointer to point to an actual array, and either way std::vector is the better way to do it.
If you want an answer to the question as it stands, then you can do this kind of thing:
int a[30];
int (*b)[30] = &a;
int (**c)[30] = &b;
But it's unlikely to be what you want, as other people have commented. You probably need to clarify your underlying goal - people can only speculate otherwise.
Just to follow on from MooingDuck's remark, I can in fact see a way to do it without the typedef, but not directly:
template <typename T>
T *create(T *const &)
{
return new T;
}
int (**a)[30] = create(a);
It's not pretty though.
What do you expect to get by writing &(&var)? This is an equivalent of address of address of a block of memory. Doing things like this just to satisfy the number of * in your code makes no sense.
Think about it - how can you get an address of an address? Even if, by some sheer luck or weird language tricks you manage to do it, there no way it will work.
I have the following declaration in a file that gets generated by a perl script ( during compilation ):
struct _gamedata
{
short res_count;
struct
{
void * resptr;
short id;
short type;
} res_table[3];
}
_gamecoderes =
{
3,
{
{ &char_resource_ID_RES_welcome_object_ID,1002, 1001 },
{ &blah_resource_ID_RES_another_object_ID,1004, 1003 },
{ &char_resource_ID_RES_someting_object_ID,8019, 1001 },
}
};
My problem is that struct _gamedata is generated during compile time and the number of items in res_table will vary. So I can't provide a type declaring the size of res_table in advance.
I need to parse an instance of this structure, originally I was doing this via a pointer to a char ( and not defining struct _gamedata as a type. But I am defining res_table.
e.g.
char * pb = (char *)_gamecoderes;
// i.e. pb points to the instance of `struct _gamedata`.
short res_count = (short *)pb;
pb+=2;
res_table * entry = (res_table *)pb;
for( int i = 0; i < res_count; i++ )
{
do_something_with_entry(*entry);
}
I'm getting wierd results with this. I'm not sure how to declare a type _struct gamedata as I need to be able to handle a variable length for res_table at compile time.
Since the struct is anonymous, there's no way to refer to the type of this struct. (res_table is just the member name, not the type's name). You should provide a name for the struct:
struct GameResult {
short type;
short id;
void* resptr;
};
struct _gamedata {
short res_count;
GameResult res_table[3];
};
Also, you shouldn't cast the data to a char*. The res_count and entry's can be extracted using the -> operator. This way the member offsets can be computed correctly.
_gamedata* data = ...;
short res_count = data->res_count;
GameResult* entry = data->res_table;
or simply:
_gamedata* data;
for (int i = 0; i < data->res_count; ++ i)
do_something_with_entry(data->res_table[i]);
Your problem is alignment. There will be at least two bytes of padding in between res_count and res_table, so you cannot simply add two to pb. The correct way to get a pointer to res_table is:
res_table *table = &data->res_table;
If you insist on casting to char* and back, you must use offsetof:
#include <stddef.h>
...
res_table *table = (res_table *) (pb + offsetof(_gamedata, res_table));
Note: in C++ you may not use offsetof with "non-POD" data types (approximately "types you could not have declared in plain C"). The correct idiom -- without casting to char* and back -- works either way.
Ideally use memcpy(3), at least use type _gamedata, or define a protocol
We can consider two use cases. In what I might call the programmer-API type, serialization is an internal convenience and the record format is determined by the compiler and library. In the more formally defined and bulletproof implementation, a protocol is defined and a special-purpose library is written to portably read and write a stream.
The best practice will differ depending on whether it makes sense to create a versioned protocol and develop stream I/O operations.
API
The best and most completely portable implementation when reading from compiler-oject serialized streams would be to declare or dynamically allocate an exact or max-sized _gamedata and then use memcpy(3) to pull the data out of the serial stream or device memory or whatever it is. This lets the compiler allocate the object that is accessed by compiler code and it lets the developer allocate the object that is accessed by developer (i.e., char *) logic.
But at a minimum, set a pointer to _gamedata and the compiler will do everything for you. Note also that res_table[n] will always be at the "right" address regardless of the size of the res_table[] array. It's not like making it bigger changes the location of the first element.
General serialization best practice
If the _gamedata object itself is in a buffer and potentially misaligned, i,e., if it is anything other than an object allocated for a _gamedata type by the compiler or dynamically by a real allocator, then you still have potential alignment issues and the only correct solution is to memcpy(3) each discrete type out of the buffer.
A typical error is to use the misaligned pointer anyway, because it works (slowly) on x86. But it may not work on mobile devices, or future architectures, or on some architectures when in kernel mode, or with advanced optimizations enabled. It's best to stick with real C99.
It's a protocol
Finally, when serializing binary data in any fashion you are really defining a protocol. So, for maximum robustness, don't let the compiler define your protocol. Since you are in C, you can generally handle each fundamental object discretely with no loss in speed. If both the writer and reader do it, then only the developers have to agree on the protocol, not the developers and the compilers and the build team, and the C99 authors, and Dennis M. Ritchie, and probably some others.
As #Zack points out, there is padding between elements of your structure.
I'm assuming you have a char* because you've serialized the structure (in a cache, on disk, or over the network). Just because you are starting with a char * doesn't mean you have to access the entire struct the hard way. Cast it to a typed pointer, and let the compiler do the work for you:
_gamedata * data = (_gamedata *) my_char_pointer;
for( int i = 0; i < data->res_count; i++ )
{
do_something_with_entry(*data->res_table[i]);
}