float bits and strict aliasing - c++

I am trying to extract the bits from a float without invoking undefined behavior. Here is my first attempt:
unsigned foo(float x)
{
unsigned* u = (unsigned*)&x;
return *u;
}
As I understand it, this is not guaranteed to work due to strict aliasing rules, right? Does it work if I take an intermediate step with a character pointer?
unsigned bar(float x)
{
char* c = (char*)&x;
unsigned* u = (unsigned*)c;
return *u;
}
Or do I have to extract the individual bytes myself?
unsigned baz(float x)
{
unsigned char* c = (unsigned char*)&x;
return c[0] | c[1] << 8 | c[2] << 16 | (unsigned)c[3] << 24; // cast avoids shifting a byte into the sign bit of int
}
Of course this has the disadvantage of depending on endianness, but I could live with that.
The union hack is definitely undefined behavior, right?
unsigned uni(float x)
{
union { float f; unsigned u; };
f = x;
return u;
}
Just for completeness, here is a reference version of foo. Also undefined behavior, right?
unsigned ref(float x)
{
return (unsigned&)x;
}
So, is it possible to extract the bits from a float (assuming both are 32 bits wide, of course)?
EDIT: And here is the memcpy version as proposed by Goz. Since many compilers do not support static_assert yet, I have replaced static_assert with some template metaprogramming:
#include <cstring> // memcpy
template <bool, typename T>
struct requirement;
template <typename T>
struct requirement<true, T>
{
typedef T type;
};
unsigned bits(float x)
{
requirement<sizeof(unsigned)==sizeof(float), unsigned>::type u;
memcpy(&u, &x, sizeof u);
return u;
}

About the only way to truly avoid any issues is to use memcpy.
#include <cstring> // memcpy
unsigned int FloatToInt( float f )
{
static_assert( sizeof( float ) == sizeof( unsigned int ), "Sizes must match" );
unsigned int ret;
memcpy( &ret, &f, sizeof( float ) );
return ret;
}
Because you are copying a fixed, compile-time-constant number of bytes, the compiler can optimise the memcpy away.
That said, the union method is VERY widely supported.
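The same memcpy approach also works in the other direction. A minimal sketch, assuming `float` and `unsigned int` are both 32 bits wide and `float` is IEEE 754 binary32; `IntToFloat` is a hypothetical name chosen to mirror `FloatToInt`:

```cpp
#include <cstring>

// Bits of a float as an unsigned int (same as FloatToInt in the answer).
unsigned int FloatToInt(float f)
{
    static_assert(sizeof(float) == sizeof(unsigned int), "Sizes must match");
    unsigned int ret;
    std::memcpy(&ret, &f, sizeof(float));
    return ret;
}

// Hypothetical inverse: rebuild a float from its bit pattern.
float IntToFloat(unsigned int u)
{
    static_assert(sizeof(float) == sizeof(unsigned int), "Sizes must match");
    float ret;
    std::memcpy(&ret, &u, sizeof(float));
    return ret;
}
```

With optimisation enabled, mainstream compilers typically reduce each of these calls to a single register move.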

The union hack is definitely undefined behavior, right?
Yes and no. According to the standard, it is definitely undefined behavior. But it is such a commonly used trick that GCC, MSVC and, as far as I know, every other popular compiler explicitly guarantee that it is safe and will work as expected.

The following does not violate the aliasing rule, because it never uses an lvalue of one type to access an object of a different type:
template<typename B, typename A>
B noalias_cast(A a) {
union N {
A a;
B b;
N(A a):a(a) { }
};
return N(a).b;
}
unsigned bar(float x) {
return noalias_cast<unsigned>(x);
}

If you really want to be agnostic about the size of the float type and just return the raw bits, do something like this:
#include <cstring> // memcpy
void float_to_bytes(char *buffer, float f) {
union {
float x;
char b[sizeof(float)];
};
x = f;
memcpy(buffer, b, sizeof(float));
}
Then call it like so:
float a = 12345.6789;
char buffer[sizeof(float)];
float_to_bytes(buffer, a);
This technique will, of course, produce output specific to your machine's byte ordering.
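If all you need is the raw bytes, the union is not required: `std::memcpy` copies the object representation directly. A sketch, where `float_to_bytes_memcpy` and `is_little_endian` are hypothetical helpers; the byte-order check only distinguishes the two common layouts (little- and big-endian):

```cpp
#include <cstring>

// Copy the object representation of f into buffer (must hold sizeof(float) bytes).
void float_to_bytes_memcpy(unsigned char* buffer, float f)
{
    std::memcpy(buffer, &f, sizeof(float));
}

// True if the least significant byte of an unsigned int is stored first.
bool is_little_endian()
{
    const unsigned int one = 1;
    unsigned char first;
    std::memcpy(&first, &one, 1);
    return first == 1;
}
```

For example, 1.0f is 0x3F800000 in IEEE 754 binary32, so its bytes come out as 00 00 80 3F on a little-endian machine and 3F 80 00 00 on a big-endian one.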

Related

Real case that will break when access non-active union members?

According to cppreference and Purpose of Unions in C and C++, the code below is UB:
#include <cstdint>
// convert char[8] to uint64_t
uint64_t convert(char c[8]) {
union{
uint64_t v;
char c[8];
} u;
for(int i = 0; i < 8; i++) {
u.c[i] = c[i];
}
return u.v;
}
// another example
union U {
uint64_t v;
struct{
uint32_t l;
uint32_t h;
}d;
};
uint64_t setlow(uint64_t v, uint32_t l) {
U u{v};
u.d.l = l;
return u.v;
}
However, even though it's UB, this kind of usage is very convenient, and I find it actually works with most compilers (GCC/Clang). So I want to know: is there any compiler/implementation that, in practice, will break the code above?
The reason the language hasn’t blessed the compiler extension (based on questionable C wording) is that it doesn’t fit with the rest of the language. Consider:
union U {
float f;
int i;
int operator()();
};
static int putget(float &f,int &i) {
i=0;
f=1;
return i;
}
int U::operator()() {return putget(f,i);}
With the function definition immediately available, GCC and Clang elide the store to i, and yet they return 0 as if that store had not only happened but wasn’t dead.
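For the two functions in the question, there are well-defined equivalents that never read an inactive member. A sketch using `std::memcpy` for `convert` and plain masking for `setlow`; `convert_memcpy` and `setlow_mask` are hypothetical names. Note that the union version of `setlow` replaces the numerically low 32 bits only on little-endian targets, whereas the mask version does so regardless of byte order:

```cpp
#include <cstdint>
#include <cstring>

// Same result as the union version: the 8 bytes are read in native byte order.
std::uint64_t convert_memcpy(const char c[8])
{
    std::uint64_t v;
    std::memcpy(&v, c, sizeof v);
    return v;
}

// Replace the numerically low 32 bits of v with l, independent of byte order.
std::uint64_t setlow_mask(std::uint64_t v, std::uint32_t l)
{
    return (v & 0xFFFFFFFF00000000ull) | l;
}
```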

Out of bounds array accesses in C++ and reinterpret_cast

Say I have code like this
struct A {
int header;
unsigned char payload[1];
};
A* a = reinterpret_cast<A*>(new unsigned char[sizeof(A)+100]);
a->payload[50] = 42;
Is this undefined behavior? Creating a pointer that points outside payload should be undefined AFAIK, but I'm unsure whether this is also true in the case where I have allocated the memory after the array.
The standard says p[n] is the same as *(p + n) and "if the expression P points to the i-th element of an array object, the expressions (P)+N point to the i+n-th element of the array". In the example, payload points to an element in the array allocated with new, so this might be OK.
If possible, it would be nice if your answers contained references to the C++ standard.
So this use of reinterpret_cast is undefined behavior: we may reinterpret_cast an object to char or unsigned char and access it that way, but we can never go the other way, from a char or unsigned char buffer to another type. If we do:
Accessing the object through the new pointer or reference invokes undefined behavior. This is known as the strict aliasing rule.
So yes this is a violation of the strict aliasing rule.
Consider the code:
struct {char x[4]; char a; } foo;
int work_with_foo(int i)
{
foo.a = 1;
foo.x[i]++;
return foo.a;
}
Even though the program would "own" the storage at foo.x+4, the fact that access via the array type is only defined for the first four elements would allow a compiler to, among other things, replace the above code with either of the following:
int work_with_foo(int i) { foo.a = 1; foo.x[i]++; return 1; }
int work_with_foo(int i) { foo.x[i]++; foo.a = 1; return 1; }
The above substitutions are clearly permissible under the Standard. It is less clear what alternate ways of writing the increment would force the compiler to behave as though it reloads foo.a. For example, I think the code *(i+(char*)&foo)+=1; would have defined behavior when i equals the offset of foo.a, and I would think the same should be true of *(i+(char*)&foo.x)+=1;, but I'm not sure about *(i+foo.x)+=1; or *(i+(char*)foo.x)+=1;.
This old C hack is never necessary in C++.
Consider:
#include <cstdint>
#include <utility>
#include <memory>
template<std::size_t Size>
struct A {
int header;
unsigned char payload[Size];
};
struct polyheader
{
// not named "concept": that is a keyword since C++20
struct concept_base
{
virtual int& header() = 0;
virtual unsigned char* payload() = 0;
virtual std::size_t size() const = 0;
virtual ~concept_base() = default; // required: _impl deletes a model<Size> through a concept_base*
};
template<std::size_t Size>
struct model : concept_base
{
using a_type = A<Size>;
model(a_type a) : _a(std::move(a)) {}
int& header() override {
return _a.header;
}
unsigned char* payload() override {
return _a.payload;
}
std::size_t size() const override {
return Size;
}
A<Size> _a;
};
int& header() { return _impl->header(); }
unsigned char* payload() { return _impl->payload(); }
std::size_t size() const { return _impl->size(); }
template<std::size_t Size>
polyheader(A<Size> a)
: _impl(std::make_unique<model<Size>>(std::move(a)))
{}
std::unique_ptr<concept_base> _impl;
};
int main()
{
auto p1 = polyheader(A<40>());
auto p2 = polyheader(A<80>());
}
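If the payload size is only known at run time, a simpler alternative to both the trailing-array hack and the template is to let a standard container own the payload. A minimal sketch; `Message` is a hypothetical name:

```cpp
#include <cstddef>
#include <vector>

// Header plus a run-time-sized payload: no reinterpret_cast and no
// indexing past the declared bounds of an array member.
struct Message {
    int header = 0;
    std::vector<unsigned char> payload;

    explicit Message(std::size_t payload_size) : payload(payload_size) {}
};
```

The trade-off is that the payload lives in a separate heap allocation instead of directly after the header, as it does in the question's single block.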

Convert pointer to float?

I have an unsigned char*. Typically this points to a chunk of data, but in some cases the pointer IS the data, i.e. an int value is cast to the unsigned char* pointer (unsigned char* intData = (unsigned char*)myInteger;), and vice versa.
However, I need to do this with a float value, and it keeps giving me conversion errors.
unsigned char* data;
float myFloat = (float)data;
How can I do this?
bit_cast:
#include <cstring>
#include <type_traits>
template <class Dest, class Source>
inline Dest bit_cast(Source const &source) {
static_assert(sizeof(Dest)==sizeof(Source), "size of destination and source objects must be equal");
static_assert(std::is_trivially_copyable<Dest>::value, "destination type must be trivially copyable.");
static_assert(std::is_trivially_copyable<Source>::value, "source type must be trivially copyable");
Dest dest;
std::memcpy(&dest, &source, sizeof(dest));
return dest;
}
Usage:
char *c = nullptr;
float f = bit_cast<float>(c); // compiles only where sizeof(char*) == sizeof(float), e.g. a 32-bit target
c = bit_cast<char *>(f);
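C++20 adds `std::bit_cast` in `<bit>`, with the same constraints as the function above (equal sizes, trivially copyable types). A sketch that uses it when the library provides it, detected through the `__cpp_lib_bit_cast` feature-test macro, and falls back to `memcpy` otherwise; `float_bits` is a hypothetical name and the test values assume a 32-bit IEEE 754 `float`:

```cpp
#include <cstdint>
#include <cstring>
#if defined(__has_include)
#  if __has_include(<version>)
#    include <version>  // library feature-test macros (C++20)
#  endif
#endif
#if defined(__cpp_lib_bit_cast)
#  include <bit>
#endif

// Return the bit pattern of a 32-bit float, preferring std::bit_cast when available.
std::uint32_t float_bits(float f)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
#if defined(__cpp_lib_bit_cast)
    return std::bit_cast<std::uint32_t>(f);
#else
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);  // pre-C++20 fallback, same idea as bit_cast above
    return u;
#endif
}
```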
The only correct way to use a given variable to store other data is to copy the data byte-wise:
#include <algorithm> // std::copy
template <typename T>
void store(unsigned char * & p, T const & val)
{
static_assert(sizeof(unsigned char *) >= sizeof(T));
char const * q = reinterpret_cast<char const *>(&val);
std::copy(q, q + sizeof(T), reinterpret_cast<char *>(&p));
}
Usage:
unsigned char * p;
store(p, 1.5);
store(p, 12UL);
The matching retrieval function:
template <typename T>
T load(unsigned char * const & p)
{
static_assert(sizeof(unsigned char *) >= sizeof(T));
T val;
char const * q = reinterpret_cast<char const *>(&p);
std::copy(q, q + sizeof(T), reinterpret_cast<char *>(&val));
return val;
}
Usage:
auto f = load<float>(p);
If your compiler supports it (GCC does), then use a union. This is undefined behavior according to the C++ standard.
union {
unsigned char* p;
float f;
} pun;
pun.p = data;
float myFloat = pun.f;
This works if sizeof(unsigned char *) == sizeof(float). If pointers are larger than floats then you have to rethink your strategy.
See the Wikipedia article on type punning, and in particular the section on using a union.
GCC allows type punning through a union as long as you access the memory through the union directly, rather than casting a pointer to a union type... see this IBM discussion on type-pun problems for correct and incorrect ways of using GCC for type punning.
Also see Wikipedia's article on strong and weak typing and a well-researched article on type punning and strict aliasing.
unsigned char* data;
float myFloat = *(float*)data;

Concise way to initialize a block of memory with magic numbers

A few examples of what I'm referring to:
typedef struct SOME_STRUCT {
unsigned int x1;
unsigned int x2;
unsigned int x3;
unsigned int x4;
// What I expected would work, but doesn't; the 2nd parameter gets
// turned into an 8-bit quantity at some point within memset
SOME_STRUCT() { memset( this, 0xFEEDFACE, sizeof( *this ) ); }
// Something that worked, but seems hokey/hackish
SOME_STRUCT() {
unsigned int *me = (unsigned int *)this;
for( int ii = 0; ii < sizeof(*this)/sizeof(*me); ++ii ) {
me[ii] = 0xFEEDFACE;
}
}
// The far-more-verbose-but-C++-way-of-doing-it
// This works, but doesn't lend itself very well
// to being a drop-in way to pull this off on
// any struct.
SOME_STRUCT() : x1( 0xFEEDFACE )
, x2( 0XFEEDFACE )
, x3( 0XFEEDFACE )
, x4( 0XFEEDFACE ) {}
// This would work, but I figured there would be a standard
// function that would alleviate the need to do it myself
SOME_STRUCT() { my_memset( this, 0xFEEDFACE, sizeof(*this) ); }
}
I can't use valgrind here, and my options are limited as far as various debugging libraries I have access to -- which is why I'm doing it myself for this one-off case.
Here’s a partial example of using std::generate() safely:
#include <algorithm>
struct Wizard {
size_t i;
static unsigned char magic[4];
Wizard() : i(0) {}
unsigned char operator()() {
size_t j = i++;
i %= sizeof(magic); // Keeps i in range so magic[j] never reads past the end.
return magic[j];
}
};
unsigned char Wizard::magic[4] = {0xDE,0xAD,0xBE,0xEF};
// inside SOME_STRUCT's constructor:
std::generate(reinterpret_cast<unsigned char*>(this),
reinterpret_cast<unsigned char*>(this) + sizeof(*this),
Wizard());
(Of course, the endianness may or may not be right, depending on how you’re looking and what you’re expecting to see when you do!)
I would declare this constructor:
SOME_STRUCT( unsigned int magic) : x1 (magic), x2 (magic), x3 (magic), x4 (magic) {}
This is very similar to your third option, and seems to be the natural C++ way of doing it.
A point not made by others is this:
I think it is unsafe to do this for non-POD types. Ironically, adding the initialization into a constructor makes the type non-POD. Therefore I propose a free-standing function that checks for POD-ness statically (the sample uses C++0x type_traits, but you could use Boost as well):
#include <iostream>
#include <type_traits>
template <typename T>
typename std::enable_if<std::is_pod<T>::value>::type FeedFace(T& v) // returns void; a void* return type would require a return statement
{
static const unsigned char MAGIC[] = { 0xFE, 0xED, 0xFA, 0xCE };
unsigned char *me = reinterpret_cast<unsigned char *>(&v);
for( size_t ii = 0; ii < sizeof(T)/sizeof(unsigned char); ++ii )
me[ii] = MAGIC[ii % sizeof(MAGIC)/sizeof(unsigned char)];
}
struct Pod { char data[37]; };
struct NonPod : Pod { virtual ~NonPod() { } };
int main()
{
Pod pod;
FeedFace(pod);
NonPod nonpod;
// FeedFace(nonpod); // fails to compile (no matching function call)
return 0;
}
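A variant of `FeedFace` that checks `std::is_trivially_copyable` (the trait that governs byte-wise copying; `std::is_pod` is deprecated in C++20) and takes the pattern as a parameter. It uses a `static_assert` rather than `enable_if`, so misuse produces a readable error instead of "no matching function". A sketch; `fill_pattern` is a hypothetical name:

```cpp
#include <cstddef>
#include <type_traits>

// Overwrite every byte of v with a repeating byte pattern.
template <typename T, std::size_t N>
void fill_pattern(T& v, const unsigned char (&pattern)[N])
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "byte-wise filling requires a trivially copyable type");
    unsigned char* bytes = reinterpret_cast<unsigned char*>(&v);
    for (std::size_t i = 0; i < sizeof(T); ++i)
        bytes[i] = pattern[i % N];
}
```

As with the other byte-wise approaches here, the resulting integer values depend on the machine's byte order.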
I assume this allows for nasty hacky stuff, like this:
#include <iomanip>
#include <iostream>
#include <algorithm>
using namespace std;
int main(void)
{
struct SOME_STRUCT {
unsigned int x1;
unsigned int x2;
unsigned int x3;
unsigned int x4;
} foo;
fill(reinterpret_cast<unsigned int *>(&foo),
reinterpret_cast<unsigned int *>(&foo) + sizeof(foo) / sizeof(unsigned int),
(unsigned int)0xDEADBEEF);
cout << foo.x1 << endl;
cout << foo.x2 << endl;
cout << foo.x3 << endl;
cout << foo.x4 << endl;
return (0);
}
Basically abusing std::fill() with pointer casts.
You could reinterpret_cast this as a char* and then use std::generate with a predicate that rotates through the values you care about. If I get time later I'll try to sketch the code.
Also, have you considered, for example, an LD_PRELOAD memory-checking malloc library?
Here's another hacky method.
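SOME_STRUCT() {
x1 = 0xFEEDFACE;
// memmove copies as if through a temporary buffer, so one overlapping call would only set x2.
for (size_t done = sizeof(x1); done < sizeof(*this); done *= 2) {
size_t rest = sizeof(*this) - done;
memcpy(reinterpret_cast<char*>(this) + done, this, rest < done ? rest : done);
}
}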
Even if your memset() attempt did work, it makes an assumption about the structure packing and is therefore not guaranteed to be correct. There is no programmatic way to iterate through the members of a struct and assign them in C or C++. You will therefore need to be content with assigning the members individually. Having said that, if you feel that you are comfortable with the memory layout of the structure and don't need to worry about portable code, you can just as easily initialize it with a for loop.
unsigned int i, *ar = (unsigned int *)&my_struct;
for (i = 0; i < sizeof(my_struct) / sizeof(unsigned int); i++) {
ar[i] = 0xdeadbeef;
}

C++ union array and vars?

There's no way to do something like this in C++, is there?
union {
{
Scalar x, y;
}
Scalar v[2];
};
Where x == v[0] and y == v[1]?
Since you are using C++ and not C, and since they are of the same type, why not just make x a reference to v[0] and y a reference to v[1]?
How about
union {
struct {
int x;
int y;
};
int v[2];
};
edit:
union a {
struct b { int first, second; } bee;
int v[2];
};
Ugly, but more accurate, since anonymous structs (as in the first version) are a compiler extension rather than standard C++.
Try this:
template<class T>
struct U1
{
U1();
T v[2];
T& x;
T& y;
};
template<class T>
U1<T>::U1()
:x(v[0])
,y(v[1])
{}
int main()
{
U1<int> data;
data.x = 1;
data.y = 2;
}
I've used something like this before. I'm not sure it's 100% OK by the standard, but it seems to be OK with every compiler I've needed to use it on.
struct Vec2
{
float x;
float y;
float& operator[](int i) { return *(&x+i); }
};
You can add bounds checking etc. to operator[] if you want (you probably should), and you can provide a const version of operator[] too.
If you're concerned about padding (and don't want to add the appropriate platform specific bits to force the struct to be unpadded) then you can use:
#include <cassert>
struct Vec2
{
float x;
float y;
float& operator[](int i) {
assert(i>=0);
assert(i<2);
return (i==0)?x:y;
}
const float& operator[](int i) const {
assert(i>=0);
assert(i<2);
return (i==0)?x:y;
}
};
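Another option that keeps named members, makes no assumption about padding, and avoids pointer arithmetic across separate members is to index through an array of pointers-to-member. A sketch reusing the `Vec2` name from the answer above:

```cpp
#include <cassert>
#include <cstddef>

struct Vec2
{
    float x;
    float y;

    // Map index 0/1 to the member x/y via pointers-to-member.
    float& operator[](std::size_t i)
    {
        static float Vec2::* const members[] = { &Vec2::x, &Vec2::y };
        assert(i < 2);
        return this->*members[i];
    }

    const float& operator[](std::size_t i) const
    {
        static float Vec2::* const members[] = { &Vec2::x, &Vec2::y };
        assert(i < 2);
        return this->*members[i];
    }
};
```

Vec2 stays an aggregate, so `Vec2 v{1.0f, 2.0f};` still works, and adding a member means extending only the `members` tables.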
I was looking for a similar thing and eventually came up with a solution.
I was looking to have a data storage object that I could use as both an array of values and as individual values (for end-user flexibility in writing Arduino libraries).
Here is what I came up with:
#include <iostream>
using std::cout;
class data{
float _array[3];
public:
float& X = _array[0];
float& Y = _array[1];
float& Z = _array[2];
float& operator[](int index){
if (index >= 3) return _array[0]; //Make this action whatever you want...
return _array[index];
}
float* operator&(){return _array;}
};
int main(){
data Test_Vector;
Test_Vector[0] = 1.23; Test_Vector[1] = 2.34; Test_Vector[2] = 3.45;
cout<<"Member X = "<<Test_Vector.X;
cout<<"Member Y = "<<Test_Vector.Y;
cout<<"Member Z = "<<Test_Vector.Z;
float* vector_array = &Test_Vector;
cout<<"Array = {"<<vector_array[0]<<", "<<vector_array[1]<<", "<<vector_array[2]<<"}";
}
Thanks to operator overloading, we can use the data object as if it were an array, and we can pass it by reference in function calls (just like an array)!
If someone with more C++ experience has a better way of applying this end product, I would love to see it!
EDIT: Changed up the code to be more cross-platform friendly
Given your example:
union
{
struct
{
Scalar x, y;
};
Scalar v[2];
};
As others have noted, in general, the standard does not guarantee that there will be no padding between x and y, and actually compilers inserting padding in structures is pretty common behavior.
On the other hand, with solutions like:
struct U
{
U(); // declared so the out-of-class definition below compiles
int v[2];
int& x;
int& y;
};
U::U()
: x(v[0])
, y(v[1])
{}
what I don't like mainly is the fact that I have to mention x, y twice. For cases where I have more than just a few elements (say 10), this becomes much less readable and harder to maintain - e.g. if you want to change the order of x,y then you have to change the indexes below too (well not mandatory but otherwise order in memory wouldn't match order of fields, which would not be recommended). Also, U can no longer be a POD since it needs a user-defined constructor. And finally, the x & y references consume additional memory.
Hence, the (acceptable for me) compromise I've come up with is:
struct Point
{
enum CoordType
{
X,
Y,
COUNT
};
int coords[CoordType::COUNT];
};
typedef Point::CoordType PtCoord;
With this you can then do:
Point p;
for ( int i = 0; i < PtCoord::COUNT; i++ )
p.coords[i] = 100;
std::cout << p.coords[PtCoord::X] << " " << p.coords[PtCoord::Y] << std::endl;
// 100 100
A bit sophisticated but I prefer this over the references suggestion.
Depending on what "Scalar" is, yes, you can do that in C++. The syntax is almost exactly (maybe even exactly exactly, but I'm rusty on unions) what you wrote in your example. It's the same as C, except that there are restrictions on the types that can be in the unions (before C++11, union members could not have non-trivial constructors, destructors, or copy-assignment operators). Here's the relevant Wikipedia article.
Anonymous unions are standard C++ and export their members to the enclosing scope; anonymous structs are not standard C++, but GCC, Clang and MSVC accept them as an extension, so you can do this:
typedef int Scalar;
struct Vector
{
union
{
struct
{
Scalar x, y;
};
Scalar v[2];
};
};
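Whichever union variant you use, the code still assumes x and y are laid out exactly like v[0] and v[1]. You can at least have the compiler reject padding with `static_assert` and `offsetof`. A sketch with a hypothetical named struct `XY`; this only checks the layout, it does not make reading v[1] after writing y defined behavior in standard C++:

```cpp
#include <cstddef>

typedef int Scalar;

struct XY
{
    Scalar x, y;
};

// Fail to compile if XY does not have the same size and member offsets as Scalar[2].
static_assert(sizeof(XY) == sizeof(Scalar[2]), "unexpected padding in XY");
static_assert(offsetof(XY, y) == sizeof(Scalar), "unexpected padding between x and y");
```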