Valid use of reinterpret_cast?

Valid use of reinterpret_cast? - c++

Empirically the following works (gcc and VC++), but is it valid and portable code?
typedef struct
{
int w[2];
} A;
struct B
{
int blah[2];
};
void my_func(B b)
{
using namespace std;
cout << b.blah[0] << b.blah[1] << endl;
}
int main(int argc, char* argv[])
{
using namespace std;
A a;
a.w[0] = 1;
a.w[1] = 2;
cout << a.w[0] << a.w[1] << endl;
// my_func(a); // compiler error, as expected
my_func(reinterpret_cast<B&>(a)); // reinterpret, magic?
my_func( *(B*)(&a) ); // is this equivalent?
return 0;
}
// Output:
// 12
// 12
// 12
Is the reinterpret_cast valid?
Is the C-style cast equivalent?
Where the intention is to have the bits located at &a interpreted as a
type B, is this a valid / the best approach?
(Off topic: For those that want to know why I'm trying to do this, I'm dealing with two C libraries that want 128 bits of memory, and use structs with different internal names - much like the structs in my example. I don't want memcopy, and I don't want to hack around in the 3rd party code.)

In C++11, this is fully allowed if the two types are layout-compatible, which is true for structs that are identical and have standard layout. See this answer for more details.
You could also stick the two structs in the same union in previous versions of C++, which had some guarantees about being able to access identical data members (a "common initial sequence" of data members) in the same order for different structure types.
In this case, yes, the C-style cast is equivalent, but reinterpret_cast is probably more idiomatic.

Related

Zero sized array in struct managed by shared pointer

Consider the following structure:
struct S
{
int a;
int b;
double arr[0];
} __attribute__((packed));
As you can see, this structure is packed and has Zero sized array at the end.
I'd like to send this as binary data over the network (assume I took care of endianity).
In C/C++ I could just use malloc to allocate as much space as I want and use free later.
I'd like this memory to be handled by std::shared_ptr.
Is there a straight forward way of doing so without special hacks?

I'd like this memory to be handled by std::shared_ptr.
Is there a straight forward way of doing so without special hacks?
Sure, there is:
shared_ptr<S> make_buffer(size_t s)
{
auto buffer = malloc(s); // allocate as usual
auto release = [](void* p) { free(p); }; // a deleter
shared_ptr<void> sptr(buffer, release); // make it shared
return { sptr, new(buffer) S }; // an aliased pointer
}
This works with any objects that are placed in a malloced buffer, not just when there are zero-sized arrays, provided that the destructor is trivial (performs no action) because it is never called.
The usual caveats about zero-sized arrays and packed structures still apply as well, of course.

double arr[0];
} __attribute__((packed));
Zero sized arrays are not allowed as data members (nor as any other variable) in C++. Furthermore, there is no such attribute as packed in C++; it is a language extension (as such, it may be considered to be a special hack). __attribute__ itself is a language extension. The standard syntax for function attributes uses nested square brackets like this: [[example_attribute]].
I'd like to send this as binary data over the network
You probably should properly serialise the data. There are many serialisation specifications although none of them is universally ubiquitous and none of them is implemented in the C++ standard library. Indeed, there isn't a standard API for network commnication either.
A straightforward solution is to pick an existing serialisation format and use an existing library that implements it.

First, let me explain again why I have this packed structure:
it is used for serialization of data over the network so there's a header file with all network packet structures.
I know it generates bad assembly due to alignment issues, but I guess that this problem persists with regular serialization (copy to char * buffer with memcpy).
Zero size arrays are supported both by gcc and clang which I use.
Here's an example of a full program with a solution to my question and it's output (same output for gcc and g++).
compiled with -O3 -std=c++17 flags
#include <iostream>
#include <memory>
#include <type_traits>
#include <cstddef>
struct S1
{
~S1() {std::cout << "deleting s1" << std::endl;}
char a;
int b;
int c[0];
} __attribute__((packed));
struct S2
{
char a;
int b;
int c[0];
};
int main(int argc, char **argv)
{
auto s1 = std::shared_ptr<S1>(static_cast<S1 *>(::operator
new(sizeof(S1) + sizeof(int) * 1e6)));
std::cout << "is standart: " << std::is_standard_layout<S1>::value << std::endl;
for (int i = 0; i < 1e6; ++i)
{
s1->c[i] = i;
}
std::cout << sizeof(S1) << ", " << sizeof(S2) << std::endl;
std::cout << offsetof(S1, c) << std::endl;
std::cout << offsetof(S2, c) << std::endl;
return 0;
}
This is the output:
is standart: 1
5, 8
5
8
deleting s1
Is there anything wrong with doing this?
I made sure using valgrind all allocations/frees work properly.

Cast an object value without pointers

Let's assume that A and B are two classes (or structures) having no inheritance relationships (thus, object slicing cannot work). I also have an object b of the type B. I would like to interpret its binary value as a value of type A:
A a = b;
I could use reinterpret_cast, but I would need to use pointers:
A a = reinterpret_cast<A>(b); // error: invalid cast
A a = *reinterpret_cast<A *>(&b); // correct [EDIT: see *footnote]
Is there a more compact way (without pointers) that does the same? (Including the case where sizeof(A) != sizeof(B))
Example of code that works using pointers: [EDIT: see *footnote]
#include <iostream>
using namespace std;
struct C {
int i;
string s;
};
struct S {
unsigned char data[sizeof(C)];
};
int main() {
C c;
c.i = 4;
c.s = "this is a string";
S s = *reinterpret_cast<S *>(&c);
C s1 = *reinterpret_cast<C *>(&s);
cout << s1.i << " " << s1.s << endl;
cout << reinterpret_cast<C *>(&s)->i << endl;
return 0;
}
*footnote: It worked when I tried it, but it is actually an undefined behavior (which means that it may work or not) - see comments below

No. I think there's nothing in the C++ syntax that allows you to implicitly ignore types. First, that's against the notion of static typing. Second, C++ lacks standardization at binary level. So, whatever you do to trick the compiler about the types you're using might be specific to a compiler implementation.
That being said, if you really wanna do it, you should check how your compiler's data alignment/padding works (i.e.: struct padding in c++) and if there's a way to control it (i.e.: What is the meaning of "__attribute__((packed, aligned(4))) "). If you're planning to do this across compilers (i.e.: with data transmitted across the network), then you should be extra careful. There are also platform issues, like different addressing models and endianness.

Yes, you can do it without a pointer:
A a = reinterpret_cast<A &>(b); // note the '&'
Note that this may be undefined behaviour. Check out the exact conditions at http://en.cppreference.com/w/cpp/language/reinterpret_cast

function that returns an array of objects

I have got a structure
class pyatno {
int pyatnoNumber;
int locX, locY;
bool possible;
char *number;
char pyatnoView[4][4];
}
the idea is to make a function, that would return an array of pyatno.pyatnoView objects, but there is a mess. I don't understand how exactly I can get access to this "property". I am not strong in c++, so if it isn't real, or i am talking something wrong, explain please, cause I am really stacked in this question.

As you mentioned that you are not very strong with c++, and your question is rather unclear, here are several suggestions.
To get access to a class's attributes, c++ has the notion of visibility; The default visibility is private, that is, attributes and functions will not be visible outside of the class:
class Foo {
int some_value;
};
There are several ways you can retrieve data from an object, however to put it simply, you should either make the attribute public:
class Foo {
public:
int some_value;
};
or expose it via accessors/mutators:
class Foo {
int some_value;
public:
int get_some_value() { return some_value; }
void set_some_value(int v) { some_value = v; }
};
Another thing to note is that you can not return arrays! In c++, when an array passes a function boundary (that is to say, passed as a parameter to, or returned from), and in a lot of other cases, an array will 'decay' in to a pointer. For example, the following is how I would pass an array of characters (otherwise known as a c-string) to a function:
#include <iostream>
using namespace std;
void print_cstr(const char *cstr) {
cout << cstr << endl;
}
int main() {
const char my_cstr[20] = "foo bar baz qux";
print_cstr(my_cstr);
return 0;
}
So what happens for N-dimensional arrays? Well, if char[1] decays to char*, then char[1][1] will decay to char**, and so on. You might have noticed this with the older main signature in C programs, which is used to pass an array of strings representing arguments passed to the program:
int main(int argc, char **argv) { ... }
It is very important that you realise that this really is no longer an array. C style strings are a bit special, in that they are conventionally terminated with a null byte \0, which means that you can usually tell where the end of the string is, or how long it is. However, you no longer have any information on how long the array is! For example, this is completely legal:
#include <iostream>
using namespace std;
void bad_fn(const int *nums) {
for (unsigned i = 0; i < 20; ++i) {
cout << "num " << i << " = " << nums[i] << endl;
}
}
int main() {
const int my_nums[5] = { 1, 2, 3, 4, 5, };
bad_fn(my_nums);
return 0;
}
Your function will end up reading memory beyond the bounds of the array, as it has no way of knowing where the array begins or ends (after all, array indexes are just pointer arithmetic). If you do not want to have to worry about keeping track of, and passing around the length of your array (and I would suggest that you do not!), please look at using one of the C++ standard library's containers. std::vector and std::array are two examples that would fit in the use case you have provided, and you can find decent documentation for them here.

C/C++ Struct memory layout equivalency

Consider the following C struct and C++ struct declarations:
extern "C" { // if this matters
typedef struct Rect1 {
int x, y;
int w, h;
} Rect1;
}
struct Vector {
int x;
int y;
}
struct Rect2 {
Vector pos;
Vector size;
}
Are the memory layouts of Rect1 and Rect2 objects always identical?
Specifically, can I safely reinterpret_cast from Rect2* to Rect1* and assume that all four int values in the Rect2 object are matched one on one to the four ints in Rect1?
Does it make a difference if I change Rect2 to a non-POD type, e.g. by adding a constructor?

I would think so, but I also think there could (legally) be padding between Rect2::pos and Rect2::size. So to make sure, I would add compiler-specific attributes to "pack" the fields, thereby guaranteeing all the ints are adjacent and compact. This is less about C vs. C++ and more about the fact that you are likely using two "different" compilers when compiling in the two languages, even if those compilers come from a single vendor.
Using reinterpret_cast to convert a pointer to one type to a pointer to another, you are likely to violate "strict aliasing" rules. Assuming you do dereference the pointer afterward, which you would in this case.
Adding a constructor will not change the layout (though it will make the class non-POD), but adding access specifiers like private between the two fields may change the layout (in practice, not only in theory).

Are the memory layouts of Rect1 and Rect2 objects always identical?
Yes. As long as certain obvious requirements hold, they are guaranteed to be identical. Those obvious requirements are about the target platform/architecture being the same in terms of alignment and word sizes. In other words, if you are foolish enough to compile the C and C++ code for different target platforms (e.g., 32bit vs. 64bit) and try to mix them, then you'll be in trouble, otherwise, you don't have to worry, the C++ compiler is basically required to produce the same memory layout as if it was in C, and ABI is fixed in C for a given word size and alignment.
Specifically, can I safely reinterpret_cast from Rect2* to Rect1* and assume that all four int values in the Rect2 object are matched one on one to the four ints in Rect1?
Yes. That follows from the first answer.
Does it make a difference if I change Rect2 to a non-POD type, e.g. by adding a constructor?
No, or at least, not any more. The only important thing is that the class remains a standard-layout class, which is not affected by constructors or any other non-virtual member. That's valid since the C++11 (2011) standard. Before that, the language was about "POD-types", as explained in the link I just gave for standard-layout. If you have a pre-C++11 compiler, then it is very likely still working by the same rules as the C++11 standard anyway (the C++11 standard rules (for standard-layout and trivial types) were basically written to match what all compiler vendors did already).

For a standard-layout class like yours you could easily check how members of a structure are positioned from the structure beginning.
#include <cstddef>
int x_offset = offsetof(struct Rect1,x); // probably 0
int y_offset = offsetof(struct Rect1,y); // probably 4
....
pos_offset = offsetof(struct Rect2,pos); // probably 0
....
http://www.cplusplus.com/reference/cstddef/offsetof/

Yes, they will always be the same.
You could try running the below example here cpp.sh
It runs as you expect.
// Example program
#include <iostream>
#include <string>
typedef struct Rect1 {
int x, y;
int w, h;
} Rect1;
struct Vector {
int x;
int y;
};
struct Rect2 {
Vector pos;
Vector size;
};
struct Rect3 {
Rect3():
pos(),
size()
{}
Vector pos;
Vector size;
};
int main()
{
Rect1 r1;
r1.x = 1;
r1.y = 2;
r1.w = 3;
r1.h = 4;
Rect2* r2 = reinterpret_cast<Rect2*>(&r1);
std::cout << r2->pos.x << std::endl;
std::cout << r2->pos.y << std::endl;
std::cout << r2->size.x << std::endl;
std::cout << r2->size.y << std::endl;
Rect3* r3 = reinterpret_cast<Rect3*>(&r1);
std::cout << r3->pos.x << std::endl;
std::cout << r3->pos.y << std::endl;
std::cout << r3->size.x << std::endl;
std::cout << r3->size.y << std::endl;
}

Does a simple cast to perform a raw copy of a variable break strict aliasing?

I've been reading about strict aliasing quite a lot lately. The C/C++ standards say that the following code is invalid (undefined behavior to be correct), since the compiler might have the value of a cached somewhere and would not recognize that it needs to update the value when I update b;
float *a;
...
int *b = reinterpret_cast<int*>(a);
*b = 1;
The standard also says that char* can alias anything, so (correct me if I'm wrong) compiler would reload all cached values whenever a write access to a char* variable is made. Thus the following code would be correct:
float *a;
...
char *b = reinterpret_cast<char*>(a);
*b = 1;
But what about the cases when pointers are not involved at all? For example, I have the following code, and GCC throws warnings about strict aliasing at me.
float a = 2.4;
int32_t b = reinterpret_cast<int&>(a);
What I want to do is just to copy raw value of a, so strict aliasing shouldn't apply. Is there a possible problem here, or just GCC is overly cautious about that?
EDIT
I know there's a solution using memcpy, but it results in code that is much less readable, so I would like not to use that solution.
EDIT2
int32_t b = *reinterpret_cast<int*>(&a); also does not work.
SOLVED
This seems to be a bug in GCC.

If you want to copy some memory, you could just tell the compiler to do that:
Edit: added a function for more readable code:
#include <iostream>
using std::cout; using std::endl;
#include <string.h>
template <class T, class U>
T memcpy(const U& source)
{
T temp;
memcpy(&temp, &source, sizeof(temp));
return temp;
}
int main()
{
float f = 4.2;
cout << "f: " << f << endl;
int i = memcpy<int>(f);
cout << "i: " << i << endl;
}
[Code]
[Updated Code]
Edit: As user/GMan correctly pointed out in the comments, a full-featured implementation could check that T and U are PODs. However, given that the name of the function is still memcpy, it might be OK to rely on your developers treating it as having the same constraints as the original memcpy. That's up to your organization. Also, use the size of the destination, not the source. (Thanks, Oli.)

Basically the strict aliasing rules is "it is undefined to access memory with another type than its declared one, excepted as array of characters". So, gcc isn't overcautious.

If this is something you need to do often, you can also just use a union, which IMHO is more readable than casting or memcpy for this specific purpose:
union floatIntUnion {
float a;
int32_t b;
};
int main() {
floatIntUnion fiu;
fiu.a = 2.4;
int32_t &x = fiu.b;
cout << x << endl;
}
I realize that this doesn't really answer your question about strict-aliasing, but I think this method makes the code look cleaner and shows your intent better.
And also realize that even doing the copies correctly, there is no guarantee that the int you get out will correspond to the same float on other platforms, so count any network/file I/O of these floats/ints out if you plan to create a cross-platform project.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js