Consider the following struct:
struct Vector4D
{
union
{
double components[4];
struct { double x, y, z, t; } Endpoint;
};
};
I believe I have seen something similar in WinApi's IPAddress struct. The idea is to be able to access the array elements both by index and by name, for example:
Vector4D v;
v.components[2] = 3.0;
ASSERT(v.Endpoint.z == 3.0); // let's ignore precision issues for now
In the C++ standard there is a guarantee that there will be no "empty" space at the beginning of a POD-struct, that is, the element x will be located right at the beginning of the Endpoint struct. So far, so good. But I can't find any guarantee that there will be no empty space, or padding if you will, between x and y, or y and z, etc. I haven't checked the C99 standard, though.
The problem is that if there is padding between the members of the Endpoint struct, the idea will not work.
Questions:
Am I right that there is indeed no guarantee that this will work, in either C or C++?
Will this work in practice on any known implementation? In other words, do you know of any implementation where this doesn't work?
Is there any standard (i.e. not compiler-specific) way to express the same idea? Might the C++0x alignment features help?
By the way, this isn't something I am doing in production code, don't worry, just curious. Thanks in advance.
yes
depends on the alignment needs of the architecture and the compiler's strategy
no, but you could make an object wrapper (but you will end up with .z() instead of just .z)
Most compilers should support squashing a structure using a pragma or an attribute. #pragma pack for example.
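The object wrapper from point 3 might look like the following minimal sketch. The array is the only storage, so there is no padding question at all, and named access goes through member functions. The accessor names x(), y(), z() and t() are illustrative, not taken from any library.

```cpp
// Sketch of the wrapper idea: the array is the only data member, and the
// "names" are accessor functions, so nothing depends on member layout.
struct Vector4D
{
    double components[4];

    double& x() { return components[0]; }
    double& y() { return components[1]; }
    double& z() { return components[2]; }
    double& t() { return components[3]; }

    const double& x() const { return components[0]; }
    const double& y() const { return components[1]; }
    const double& z() const { return components[2]; }
    const double& t() const { return components[3]; }
};
```

Usage: after `Vector4D v = {}; v.components[2] = 3.0;`, the expression `v.z()` reads the same element, and `v.x() = 1.5;` writes `components[0]`.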
You can circumvent any memory alignment issues by having references to each element of the array, as long as you declare the array before the references in the class to ensure they point to valid data. Having said that I doubt alignment would be an issue with doubles, but could be for other types (float on 64bit arch perhaps?)
#include <iostream>
using namespace std;
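struct Vector4D
{
Vector4D() : components(), x(components[0]), y(components[1]), z(components[2]), t(components[3]) { }
// The implicitly generated copy constructor would bind the new references to the
// source object's array, so copy the values and bind to our own array instead.
// (Copy assignment cannot be generated at all because of the reference members.)
Vector4D(const Vector4D& other) : x(components[0]), y(components[1]), z(components[2]), t(components[3])
{
for (int i = 0; i < 4; ++i) components[i] = other.components[i];
}
double components[4];
double& x;
double& y;
double& z;
double& t;
};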
int main()
{
Vector4D v;
v.components[0] = 3.0;
v.components[1] = 1.0;
v.components[2] = 4.0;
v.components[3] = 15.0;
cout << v.x << endl;
cout << v.y << endl;
cout << v.z << endl;
cout << v.t << endl;
}
Hope this helps.
When it comes to the standard, there are two problems with it:
It is unspecified what happens when you write to one member of a union and read from another; see the C standard, 6.2.6.1 and K.1.
The standard does not guarantee that the layout of the struct matches the layout of the array; see the C standard, 6.7.2.1.10, for details.
Having said this, in practice this will work on normal compilers. In fact, this kind of code is widespread and is often used to reinterpret values of one type as values of another type.
In practice, padding bytes will not cause an issue because all the members are of type double, so compilers lay out Endpoint exactly like a double array. That means v.Endpoint.z is essentially the same as v.components[2].
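Whether the named members line up with the array can at least be checked at compile time. A minimal sketch, assuming a C++11 compiler: if the unnamed struct is exactly four doubles wide, there can be no padding between or after x, y, z and t. Passing the check only rules out the padding problem; reading a union member other than the one last written is still outside what ISO C++ defines, even though common compilers support it.

```cpp
// The struct from the question, unchanged.
struct Vector4D
{
    union
    {
        double components[4];
        struct { double x, y, z, t; } Endpoint;
    };
};

// Compile-time guard (C++11): Endpoint can only be exactly four doubles
// wide if there is no padding between or after its members.
static_assert(sizeof(Vector4D::Endpoint) == sizeof(double[4]),
              "Endpoint has padding; named and indexed access would disagree");
```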
I know that C & C++ are different languages standardized by different committees.
I know that, as in C, efficiency has been a major design goal for C++ from the beginning. So I think that if a feature doesn't incur any runtime overhead and is efficient, it should be added to the language. The C99 standard has some very useful and efficient features, and one of them is compound literals. I was reading about compound literals recently.
Following is a program that shows the use of compound literals.
#include <stdio.h>
// Structure to represent a 2D point
struct Point
{
int x, y;
};
// Utility function to print a point
void printPoint(struct Point p)
{
printf("%d, %d", p.x, p.y);
}
int main()
{
// Calling printPoint() without creating any temporary
// Point variable in main()
printPoint((struct Point){2, 3});
/* Without compound literal, above statement would have
been written as
struct Point temp = {2, 3};
printPoint(temp); */
return 0;
}
So, because of the compound literal, no extra object of type struct Point is created, as mentioned in the comments. Isn't that more efficient, since it avoids an extra copy? Why does C++ still not support this useful feature? Are there any problems with compound literals?
I know that some compilers, like g++, support compound literals as an extension, but that leads to unportable code that isn't strictly standard-conforming. Is there any proposal to add this feature to C++ as well? If C++ doesn't support a feature of C, there must be some reason behind it, and I want to know that reason.
I think that there is no need for compound literals in C++, because this functionality is already covered, to some extent, by its OOP capabilities (objects, constructors and so on).
Your program may simply be rewritten in C++ as:
#include <cstdio>
struct Point
{
Point(int x, int y) : x(x), y(y) {}
int x, y;
};
void printPoint(Point p)
{
std::printf("%d, %d", p.x, p.y);
}
int main()
{
printPoint(Point(2, 3)); // passing an anonymous object
}
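If Point keeps no user-declared constructor, C++11 brace initialization gets even closer to the C syntax. A sketch assuming a C++11 compiler. One difference worth knowing: a C compound literal is an lvalue, whereas Point{2, 3} is a temporary, so its address cannot be taken.

```cpp
#include <cstdio>

// No user-declared constructor, so Point is an aggregate and can be
// list-initialized directly at the call site.
struct Point
{
    int x, y;
};

void printPoint(Point p)
{
    std::printf("%d, %d\n", p.x, p.y);
}

// Usage:
//   printPoint(Point{2, 3});  // explicit type, closest to (struct Point){2, 3}
//   printPoint({2, 3});       // type taken from the parameter
```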
Consider the following C struct and C++ struct declarations:
extern "C" { // if this matters
typedef struct Rect1 {
int x, y;
int w, h;
} Rect1;
}
struct Vector {
int x;
int y;
};
struct Rect2 {
Vector pos;
Vector size;
};
Are the memory layouts of Rect1 and Rect2 objects always identical?
Specifically, can I safely reinterpret_cast from Rect2* to Rect1* and assume that all four int values in the Rect2 object are matched one on one to the four ints in Rect1?
Does it make a difference if I change Rect2 to a non-POD type, e.g. by adding a constructor?
I would think so, but I also think there could (legally) be padding between Rect2::pos and Rect2::size. So to make sure, I would add compiler-specific attributes to "pack" the fields, thereby guaranteeing all the ints are adjacent and compact. This is less about C vs. C++ and more about the fact that you are likely using two "different" compilers when compiling in the two languages, even if those compilers come from a single vendor.
Using reinterpret_cast to convert a pointer to one type into a pointer to another, you are likely to violate the "strict aliasing" rules, assuming you dereference the pointer afterward, which you would in this case.
Adding a constructor will not change the layout (though it will make the class non-POD), but adding access specifiers like private between the two fields may change the layout (in practice, not only in theory).
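If strict aliasing is the concern, copying instead of casting is the portable route: memcpy between trivially copyable objects of the same size is well defined, and compilers typically turn it into plain moves. A minimal sketch assuming C++11; toRect2 is a hypothetical helper name. The size checks stand in for a layout check: if both types are exactly four ints wide, neither has padding, so the ints sit at matching offsets.

```cpp
#include <cstring>
#include <type_traits>

extern "C" {
typedef struct Rect1 {
    int x, y;
    int w, h;
} Rect1;
}

struct Vector { int x; int y; };
struct Rect2 { Vector pos; Vector size; };

// Four ints wide means no padding, so offsets match one to one.
static_assert(sizeof(Rect1) == 4 * sizeof(int), "Rect1 has padding");
static_assert(sizeof(Rect2) == 4 * sizeof(int), "Rect2 has padding");
static_assert(std::is_trivially_copyable<Rect2>::value, "Rect2 must be memcpy-able");

// Hypothetical helper: copies the bytes instead of aliasing the pointer,
// which sidesteps the strict-aliasing rule entirely.
inline Rect2 toRect2(const Rect1& r)
{
    Rect2 out;
    std::memcpy(&out, &r, sizeof out);
    return out;
}
```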
Are the memory layouts of Rect1 and Rect2 objects always identical?
Yes. As long as certain obvious requirements hold, they are guaranteed to be identical. Those obvious requirements are about the target platform/architecture being the same in terms of alignment and word sizes. In other words, if you are foolish enough to compile the C and C++ code for different target platforms (e.g., 32bit vs. 64bit) and try to mix them, then you'll be in trouble, otherwise, you don't have to worry, the C++ compiler is basically required to produce the same memory layout as if it was in C, and ABI is fixed in C for a given word size and alignment.
Specifically, can I safely reinterpret_cast from Rect2* to Rect1* and assume that all four int values in the Rect2 object are matched one on one to the four ints in Rect1?
Yes. That follows from the first answer.
Does it make a difference if I change Rect2 to a non-POD type, e.g. by adding a constructor?
No, or at least, not any more. The only important thing is that the class remains a standard-layout class, which is not affected by constructors or any other non-virtual member. That has been the rule since the C++11 (2011) standard. Before that, the language talked about "POD types". If you have a pre-C++11 compiler, it is very likely still working by the same rules as the C++11 standard anyway (the C++11 rules for standard-layout and trivial types were basically written to match what all compiler vendors already did).
For a standard-layout class like yours you could easily check how members of a structure are positioned from the structure beginning.
#include <cstddef>
std::size_t x_offset = offsetof(struct Rect1, x); // probably 0
std::size_t y_offset = offsetof(struct Rect1, y); // probably 4
....
std::size_t pos_offset = offsetof(struct Rect2, pos); // probably 0
....
http://www.cplusplus.com/reference/cstddef/offsetof/
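The same offsetof values can be compared at compile time, so a mismatch fails the build instead of surfacing at run time. A sketch assuming C++11 (static_assert, <type_traits>); Rect1 is written as a plain C++ struct here, and the constructor on Rect2 is there only to show that it does not affect standard-layout status.

```cpp
#include <cstddef>
#include <type_traits>

struct Rect1 { int x, y; int w, h; };

struct Vector { int x; int y; };

struct Rect2
{
    Rect2() : pos(), size() {}  // non-trivial constructor: not POD, still standard-layout
    Vector pos;
    Vector size;
};

static_assert(std::is_standard_layout<Rect2>::value, "offsetof needs standard layout");

// Each int in Rect2 must sit at the same offset as its Rect1 counterpart.
static_assert(offsetof(Rect2, pos) + offsetof(Vector, x) == offsetof(Rect1, x), "x");
static_assert(offsetof(Rect2, pos) + offsetof(Vector, y) == offsetof(Rect1, y), "y");
static_assert(offsetof(Rect2, size) + offsetof(Vector, x) == offsetof(Rect1, w), "w");
static_assert(offsetof(Rect2, size) + offsetof(Vector, y) == offsetof(Rect1, h), "h");
static_assert(sizeof(Rect2) == sizeof(Rect1), "total size");
```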
Yes, they will always be the same.
You can try running the example below on cpp.sh.
It runs as you expect.
// Example program
#include <iostream>
#include <string>
typedef struct Rect1 {
int x, y;
int w, h;
} Rect1;
struct Vector {
int x;
int y;
};
struct Rect2 {
Vector pos;
Vector size;
};
struct Rect3 {
Rect3():
pos(),
size()
{}
Vector pos;
Vector size;
};
int main()
{
Rect1 r1;
r1.x = 1;
r1.y = 2;
r1.w = 3;
r1.h = 4;
Rect2* r2 = reinterpret_cast<Rect2*>(&r1);
std::cout << r2->pos.x << std::endl;
std::cout << r2->pos.y << std::endl;
std::cout << r2->size.x << std::endl;
std::cout << r2->size.y << std::endl;
Rect3* r3 = reinterpret_cast<Rect3*>(&r1);
std::cout << r3->pos.x << std::endl;
std::cout << r3->pos.y << std::endl;
std::cout << r3->size.x << std::endl;
std::cout << r3->size.y << std::endl;
}
Consider the following:
#include <vector>
using namespace std;
struct Vec2
{
float m_x;
float m_y;
};
vector<Vec2> myArray;
int main()
{
myArray.resize(100);
for (int i = 0; i < 100; ++i)
{
myArray[i].m_x = (float)(i);
myArray[i].m_y = (float)(i);
}
float* raw;
raw = reinterpret_cast<float*>(&(myArray[0]));
}
Is raw guaranteed to have 200 contiguous floats with the correct values? That is, does the standard guarantee this?
EDIT: If the above is guaranteed, and if Vec2 has some functions (non-virtual) and a constructor, is the guarantee still there?
NOTE: I realize this is dangerous, in my particular case I have no choice as I am working with a 3rd party library.
Since you have no choice because of the third-party library, you may add a compile-time check of the structure size:
struct Vec2
{
float a;
float b;
};
int main()
{
int assert_s[ sizeof(Vec2) == 2*sizeof(float) ? 1 : -1 ];
}
It would increase your confidence of your approach (which is still unsafe due to reinterpret_cast, as mentioned).
raw = reinterpret_cast<float*>(&(myArray[0]));
ISO C++98 9.2/17:
A pointer to a POD struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note: There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning, as necessary to achieve appropriate alignment. —end note ]
And finally, a runtime check of the corresponding addresses would make such a solution rather safe. It can be done in unit tests, or even at every program start (on a small test array).
Putting it all together:
#include <vector>
#include <cassert>
#include <cstdio>
using namespace std;
struct Vec2
{
float a;
float b;
};
int main()
{
// Compile-time check: negative array size (compile error) if Vec2 has padding
int assert_s[ sizeof(Vec2) == 2*sizeof(float) ? 1 : -1 ];
(void)assert_s;
typedef vector<Vec2> Vector;
Vector v(32);
float *first=static_cast<float*>(static_cast<void*>(&v[0]));
// Runtime check: every member sits where the flat float view expects it
for(Vector::size_type i=0,size=v.size();i!=size;++i)
{
assert((first+i*2) == (&(v[i].a)));
assert((first+i*2+1) == (&(v[i].b)));
}
printf("layout checks passed\n");
}
No, this is not safe, because the compiler is free to insert padding between or after the two floats in the structure, and so the floats of the structure may not be contiguous.
If you still want to try it, you can add compile-time checks to gain more confidence that it will work:
#include <cstddef> // for offsetof
static_assert(sizeof(Vec2) == sizeof(float) * 2, "Vec2 struct is too big!");
static_assert(offsetof(Vec2, b) == sizeof(float), "Vec2::b at the wrong offset!");
The only guarantee that a reinterpret_cast gives is that you get the original pointer back when you reinterpret_cast the result back to the original type.
In particular, raw is not guaranteed to have 200 contiguous floats with the correct values.
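If the library only needs a contiguous float buffer, copying into one removes the layout question altogether, at the cost of one pass over the data. A minimal sketch; flatten is a hypothetical helper name, and Vec2 matches the question.

```cpp
#include <cstddef>
#include <vector>

struct Vec2
{
    float m_x;
    float m_y;
};

// Copies member by member, so the result is 2 * src.size() contiguous
// floats no matter what padding the compiler put inside Vec2.
std::vector<float> flatten(const std::vector<Vec2>& src)
{
    std::vector<float> out;
    out.reserve(src.size() * 2);
    for (std::size_t i = 0; i < src.size(); ++i)
    {
        out.push_back(src[i].m_x);
        out.push_back(src[i].m_y);
    }
    return out;
}
```

The library would then receive `&flat[0]` (or `flat.data()` in C++11) for `std::vector<float> flat = flatten(myArray);`.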
I've been reading about strict aliasing quite a lot lately. The C/C++ standards say that the following code is invalid (undefined behavior, to be precise), since the compiler might have the value of *a cached somewhere and would not recognize that it needs to update the value when I write through b:
float *a;
...
int *b = reinterpret_cast<int*>(a);
*b = 1;
The standard also says that char* can alias anything, so (correct me if I'm wrong) the compiler would reload all cached values whenever a write through a char* is made. Thus the following code would be correct:
float *a;
...
char *b = reinterpret_cast<char*>(a);
*b = 1;
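Reading through unsigned char* is the sanctioned way to look at an object's bytes. A small sketch; byteAt and printBytes are hypothetical helper names, and the byte order printed depends on the platform's endianness, so no particular output is claimed.

```cpp
#include <cstddef>
#include <cstdio>

// Character types may alias any object, so reading a float's storage
// through unsigned char* is well defined (unlike reading it through int*).
unsigned char byteAt(const float& value, std::size_t i)
{
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&value);
    return bytes[i];  // caller must keep i < sizeof(float)
}

// Prints the object representation in memory order.
void printBytes(const float& value)
{
    for (std::size_t i = 0; i < sizeof value; ++i)
        std::printf("%02x ", static_cast<unsigned>(byteAt(value, i)));
    std::printf("\n");
}
```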
But what about the cases when pointers are not involved at all? For example, I have the following code, and GCC throws warnings about strict aliasing at me.
float a = 2.4;
int32_t b = reinterpret_cast<int&>(a);
What I want to do is just copy the raw value of a, so strict aliasing shouldn't apply. Is there a possible problem here, or is GCC just being overly cautious?
EDIT
I know there's a solution using memcpy, but it results in code that is much less readable, so I would like not to use that solution.
EDIT2
int32_t b = *reinterpret_cast<int*>(&a); also does not work.
SOLVED
This seems to be a bug in GCC.
If you want to copy some memory, you could just tell the compiler to do that:
Edit: added a function for more readable code:
#include <iostream>
using std::cout; using std::endl;
#include <string.h>
template <class T, class U>
T memcpy(const U& source)
{
T temp;
memcpy(&temp, &source, sizeof(temp));
return temp;
}
int main()
{
float f = 4.2;
cout << "f: " << f << endl;
int i = memcpy<int>(f);
cout << "i: " << i << endl;
}
Edit: As GMan correctly pointed out in the comments, a full-featured implementation could check that T and U are PODs. However, given that the function is still named memcpy, it might be OK to rely on your developers treating it as having the same constraints as the original memcpy. That's up to your organization. Also, use the size of the destination, not the source. (Thanks, Oli.)
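With C++11 the checks mentioned in that edit can be written directly. A minimal sketch; bit_copy is a hypothetical name chosen so it does not shadow memcpy. Requiring equal sizes means the copy can neither over-read the source nor leave part of the destination unset. C++20 standardizes the same idea as std::bit_cast in <bit>, with the same size and trivially-copyable requirements.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical checked variant of the memcpy<T>() helper above.
// T must be default-constructible for the local `result` below.
template <class T, class U>
T bit_copy(const U& source)
{
    static_assert(sizeof(T) == sizeof(U), "source and destination sizes differ");
    static_assert(std::is_trivially_copyable<T>::value &&
                  std::is_trivially_copyable<U>::value,
                  "bit_copy requires trivially copyable types");
    T result;
    std::memcpy(&result, &source, sizeof result);
    return result;
}
```

Usage mirrors the helper above: `int i = bit_copy<int>(f);`, which simply fails to compile on a platform where int and float differ in size.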
Basically, the strict aliasing rule says that it is undefined to access memory through a type other than its declared one, except as an array of characters. So GCC isn't being overcautious.
If this is something you need to do often, you can also just use a union, which IMHO is more readable than casting or memcpy for this specific purpose:
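#include <cstdint>
#include <iostream>
union floatIntUnion {
float a;
std::int32_t b;
};
int main() {
floatIntUnion fiu;
fiu.a = 2.4f;
// Read the other member through the union itself. ISO C++ leaves this
// undefined, but GCC documents union type punning as supported when the
// access goes through the union type (not through a pointer or reference
// to the member).
std::int32_t x = fiu.b;
std::cout << x << std::endl;
}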
I realize that this doesn't really answer your question about strict-aliasing, but I think this method makes the code look cleaner and shows your intent better.
Also note that, even if you do the copy correctly, there is no guarantee that the int you get out will correspond to the same float on other platforms, so rule out sending these raw values over the network or writing them to files if you plan a cross-platform project.
I was looking through the source of OpenDE and came across some weird usage of the array indexing operator '[]' on a class. Here's a simplified example showing the syntax:
#include <iostream>
class Point
{
public:
Point() : x(2.8), y(4.2), z(9.5) {}
operator const float *() const
{
return &x;
}
private:
float x, y, z;
};
int main()
{
Point p;
std::cout << "x: " << p[0] << '\n'
<< "y: " << p[1] << '\n'
<< "z: " << p[2];
}
Output:
x: 2.8
y: 4.2
z: 9.5
What's going on here? Why does this syntax work? The Point class contains no overloaded operator [], and yet this code must be doing some automatic conversion somewhere.
I've never seen this kind of usage before -- it definitely looks unusual and surprising to say the least.
Thanks
p is implicitly converted to a const float* pointing at x, and p[i] is then evaluated as *(that pointer + i). So p[0] is x, p[1] is y, and so on (provided the compiler put no padding between the floats).
It's a pretty weird idea (and confusing!) to do it this way, of course. It's usually preferable to store x, y, and z in an array and have a function to get the entire array if they really want to do things this way.
The idea here is to give access to the members of the Point by either subscript or name. If you want to do that, however, you'd be better off overloading operator[] something like this:
#include <cstddef>
#include <stdexcept>
struct Point {
float x, y, z;
float &operator[](std::size_t subscript) {
switch(subscript) {
case 0: return x;
case 1: return y;
case 2: return z;
default: throw std::range_error("bad subscript");
}
}
};
This way, if the compiler inserts padding between the floats, it will still work -- and anybody who can read C++ should be able to understand it without any problems.
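A variant of the same idea replaces the switch with a static table of pointers to members, which also makes a const overload cheap to add. A minimal sketch; the private helper member() is a hypothetical name, and std::out_of_range is used here instead of std::range_error as a matter of taste. Like the switch version, it makes no assumption about padding.

```cpp
#include <cstddef>
#include <stdexcept>

struct Point
{
    float x, y, z;

    float& operator[](std::size_t i) { return this->*member(i); }
    const float& operator[](std::size_t i) const { return this->*member(i); }

private:
    // Maps an index to a pointer to the corresponding named member.
    static float Point::* member(std::size_t i)
    {
        static float Point::* const table[3] = { &Point::x, &Point::y, &Point::z };
        if (i >= 3) throw std::out_of_range("bad subscript");
        return table[i];
    }
};
```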
This is just a way of treating your member data as an array. You can also do this with structs. This is useful when you want readability, yet want to be able to iterate over simple data structures. An example use would be to declare a matrix this way:
typedef struct {
CGFloat m11,m12,m13,m14;
CGFloat m21,m22,m23,m24;
CGFloat m31,m32,m33,m34;
CGFloat m41,m42,m43,m44;
} CATransform3D;
You can conveniently reference each cell by name, yet you can also pass around a pointer to m11 everywhere (and C will see your struct as an array, m11 being the first element), and iterate over all the elements.