Are casts as safe as unions?

Are casts as safe as unions? - c++

I want to split large variables like floats into byte segments and send these serially byte by byte via UART. I'm using C/C++.
One method could be to deepcopy the value I want to send to a union and then send it. I think that would be 100% safe but slow. The union would look like this:
union mySendUnion
{
mySendType sendVal;
char[sizeof(mySendType)] sendArray;
}
Another option could be to cast the pointer to the value I want to send, into a pointer to a particular union. Is this still safe?
The third option could be to cast the pointer to the value I want to send to a char, and then increment a pointer like this:
sendType myValue = 443.2;
char* sendChar = (char*)myValue;
for(char i=0; i< sizeof(sendType) ; i++)
{
Serial.write(*(sendChar+j), 1);
}
I've had succes with the above pointer arithmetics, but I'm not sure if it's safe under all circumstances. My concern is, what if we for instance is using a 32 bit processor and want to send a float. The compiler choose to store this 32 bit float into one memory cell, but does only store one single char into each 32 bit cell.
Each counter increment would then make the program pointer increment one whole memory cell, and we would miss the float.
Is there something in the C standard that prevents this, or could this be an issue with a certain compiler?

First off, you can't write your code in "C/C++". There's no such language as "C/C++", as they are fundamentally different languages. As such, the answer regarding unions differs radically.
As to the title:
Are casts as safe as unions?
No, generally they aren't, because of the strict aliasing rule. That is, if you type-pun a pointer of one certain type with a pointer to an incompatible type, it will result in undefined behavior. The only exception to this rule is when you read or manipulate the byte-wise representation of an object by aliasing it through a pointer to (signed or unsigned) char. As in your case.
Unions, however, are quite different bastards. Type punning via copying to and reading from unions is permitted in C99 and later, but results in undefined behavior in C89 and all versions of C++.
In one direction, you can also safely type pun (in C99 and later) using a pointer to union, if you have the original union as an actual object. Like this:
union p {
char c[sizeof(float)];
float f;
} pun;
union p *punPtr = &pun;
punPtr->f = 3.14;
send_bytes(punPtr->c, sizeof(float));
Because "a pointer to a union points to all of its members and vice versa" (C99, I don't remember the exact pargraph, it's around 6.2.5, IIRC). This isn't true in the other direction, though:
float f = 3.14;
union p *punPtr = &f;
send_bytes(punPtr->c, sizeof(float)); // triggers UB!
To sum up: the following code snippet is valid in both C89, C99, C11 and C++:
float f = 3.14;
char *p = (char *)&f;
size_t i;
for (i = 0; i < sizeof f; i++) {
send_byte(p[i]); // hypotetical function
}
The following is only valid in C99 and later:
union {
char c[sizeof(float)];
float f;
} pun;
pun.f = 3.14;
send_bytes(pun.c, sizeof float); // another hypotetical function
The following, however, would not be valid:
float f = 3.14;
unsigned *u = (unsigned *)&f;
printf("%u\n", *u); // undefined behavior triggered!
Another solution that is always guaranteed to work is memcpy(). The memcpy() function does a bytewise copying between two objects. (Don't get me started on it being "slow" -- in most modern compilers and stdlib implementations, it's an intrinsic function).

A general advice when sending floating point data on a byte stream would be to use some serialization technology, to ensure that the data format is well defined (and preferably architecture neutral, beware of endianness issues!).
You could use XDR -or perhaps ASN1- which is a binary format (see xdr(3) for more). For C++, see also libs11n
Unless speed or data size is very critical, I would suggest instead a textual format like JSON or perhaps YAML (textual formats are more verbose, but easier to debug and to document). There are several good libraries supporting it (e.g. jsoncpp for C++ or jansson for C).
Notice that serial ports are quite slow (w.r.t. CPU). So the serialization processing time is negligible.
Whatever you do, please document the serialization format (even for an internal project).

The cast to [[un]signed] char [const] * is legal and it won't cause issues when reading, so that is a fine option (that is, after fixing char *sendChar = reinterpret_cast<char*>(&myValue);, and since you are at it, make it const)
Now the next problem comes on the other side, when reading, as you cannot safely use the same approach for reading. In general, the cost of copying the variables is much less than the cost of sending over the UART, so I would just use the union when reading out of the serial.

Related

Passing pointers to arrays of unrelated, but compatible, types without copying?

(Disclaimer: At this point, this is mostly academic interest.)
Imagine I have such an external interface, that is, I do not control it's code:
// Provided externally: Cannot (easily) change this:
// fill buffer with n floats:
void data_source_external(float* pDataOut, size_t n);
// send n data words from pDataIn:
void data_sink_external(const uint32_t* pDataIn, size_t n);
Is it possible within standard C++ to "move" / "stream" data between these two interfaces without copying?
That is, is there any way to make the following be non-UB, without copying of the data between two correctly typed buffers?
int main()
{
constexpr size_t n = 64;
float fbuffer[n];
data_source_external(fbuffer, n);
// These hold and can be checked statically:
static_assert(sizeof(float) == sizeof(uint32_t), "same size");
static_assert(alignof(float) == alignof(uint32_t), "same alignment");
static_assert(std::numeric_limits<float>::is_iec559 == true, "IEEE 754");
// This is clearly UB. Any way to make this work without copying the data?
const uint32_t* buffer_alias = static_cast<uint32_t*>(static_cast<void*>(fbuffer));
// **Note**:
// + reinterpret_cast would also be UB.
data_sink_external(buffer_alias, n);
// ...
As far as I can tell the following would be defined behavior, at least with regard to strict aliasing:
...
uint32_t ibuffer[n];
std::memcpy(ibuffer, fbuffer, n * sizeof(uint32_t));
data_sink_external(ibuffer, n);
but given that the ibuffer will have exactly the same bits as the fbuffer this seems quite insane.
Or would we expect optimizing compilers to optimize even this copy away? (In a now deleted comment-like answer a user posted a godbolt link that seems to indicate, at least on first glance, that clang 11 indeed would be able to optimize out the memcpy.)

I didn't test and can't comment yet (cause not enough reputation). But reinterpret_cast may help in this situation.
Documentation
Basically it tells the compiler, hey treat this pointer as if it was the specified type in the cast.

Manipulating byte vector through float pointer

Is it possible to manipulate an std::vector<unsigned char> through its data pointer as if it were a container of float?
Here is an example that compiles and (seemingly?) runs as desired (GCC 4.8, C++11):
#include <iostream>
#include <vector>
int main()
{
std::vector<unsigned char> bytes(2 * sizeof(float));
auto ptr = reinterpret_cast<float *>(bytes.data());
ptr[0] = 1.1;
ptr[1] = 1.2;
std::cout << ptr[0] << ", " << ptr[1] << std::endl;
return 0;
}
This snippet successfully writes/reads data from the byte buffer as if it were an array of float. From reading about reinterpret_cast I'm afraid that this might be undefined behavior. My confidence in understanding the type aliasing details is too little for me to be sure.
Is the code snippet undefined behavior as outlined above? If so, is there another way to achieve this sort of byte manipulation?

Legal answer
No, this is not permitted.
C++ isn't just "a load of bytes" — the compiler (and, more abstractly, the language) have been told that you have a container of unsigned chars, not a container of floats. No floats exist, and you can't pretend that they do.
The rule you're looking for, which is known as strict aliasing, may be found under [basic.lval]/8.
The opposite would work, because it is permitted (via a special rule in that same paragraph) to examine the bytes of any type via an unsigned char*. But in your case, the quickest safe and correct way to "get" a float from something that starts life as unsigned char is to std::memcpy or std::copy those bytes into an actual float that exists:
std::vector<unsigned char> bytes(2 * sizeof(float));
float f1, f2;
// Extracting values
std::memcpy(
reinterpret_cast<unsigned char*>(&f1),
bytes.data(),
sizeof(float)
);
std::memcpy(
reinterpret_cast<unsigned char*>(&f2),
bytes.data() + sizeof(float),
sizeof(float)
);
// Putting them back
f1 = 1.1;
f2 = 1.2;
std::memcpy(
bytes.data(),
reinterpret_cast<unsigned char*>(&f1),
sizeof(float)
);
std::memcpy(
bytes.data() + sizeof(float),
reinterpret_cast<unsigned char*>(&f2),
sizeof(float)
);
This is fine as long as those bytes form a valid representation of float on your system. Granted it looks a little unwieldy, but a quick wrapper function will make short work of it.
A common alternative, assuming you only care about floats and don't need a resizable buffer, is to produce some std::aligned_storage then do a bunch of placement new into the resulting buffer. Since C++17, you could alternatively play around with std::launder, though resizing the vector (read: reallocating its buffer) would also be inadvisable in that scenario.
Also, these approaches are quite involved and result in complex code that not all your readers will be able to follow. If you can launder your data such that it "is" a sequence of floats, you may as well just make yourself a nice std::vector<float> in the first place. Per the above, it is permitted to get and use an unsigned char* to that buffer if you wish.
It ought to be noted that there is much code out there in the wild that uses your original approach (particularly in older projects with a barebones C heritage). On many implementations, it may appear to work. But it is a common misconception that it is valid and/or safe, and you're prone to instruction "re-ordering" (or other optimisations) if you rely on it.
Hedge-betting answer
For what it's worth, if you disable strict aliasing (GCC permits this as an extension, and LLVM doesn't even implement it), then you can probably get away with your original code. Just be careful.

Is it possible to manipulate an std::vector through its data pointer as if it were a container of float?
Not quite. Your example has UB indeed.
However, you can reuse the storage of those bytes to create the floats there. Example:
float* ptr = std::launder(reinterpret_cast<float*>(bytes.data()));
std::uninitialized_fill_n(ptr, 2, 0.0f);
After this, the lifetime of the unsigned char objects has ended, end there are floats there instead. Using ptr is well defined.
Whether this would be useful for you is another matter. Start with a simpler design first: Why not simply use std::vector<float>?

What is the use of intptr_t?

I know it is an integer type that can be cast to/from pointer without loss of data, but why would I ever want to do this? What advantage does having an integer type have over void* for holding the pointer and THE_REAL_TYPE* for pointer arithmetic?
EDIT
The question marked as "already been asked" doesn't answer this. The question there is if using intptr_t as a general replacement for void* is a good idea, and the answers there seem to be "don't use intptr_t", so my question is still valid: What would be a good use case for intptr_t?

The primary reason, you cannot do bitwise operation on a void *, but you can do the same on a intptr_t.
On many occassion, where you need to perform bitwise operation on an address, you can use intptr_t.
However, for bitwise operations, best approach is to use the unsigned counterpart, uintptr_t.
As mentioned in the other answer by #chux, pointer comparison is another important aspect.
Also, FWIW, as per C11 standard, §7.20.1.4,
These types are optional.

There's also a semantic consideration.
A void* is supposed to point to something. Despite modern practicality, a pointer is not a memory address. Okay, it usually/probably/always(!) holds one, but it's not a number. It's a pointer. It refers to a thing.
A intptr_t does not. It's an integer value, that is safe to convert to/from a pointer so you can use it for antique APIs, packing it into a pthread function argument, things like that.
That's why you can do more numbery and bitty things on an intptr_t than you can on a void*, and why you should be self-documenting by using the proper type for the job.
Ultimately, almost everything could be an integer (remember, your computer works on numbers!). Pointers could have been integers. But they're not. They're pointers, because they are meant for different use. And, theoretically, they could be something other than numbers.

The uintptr_t type is very useful when writing memory management code. That kind of code wants to talk to its clients in terms of generic pointers (void *), but internally do all kinds of arithmetic on addresses.
You can do some of the same things by operating in terms of char *, but not everything, and the result looks like pre-Ansi C.
Not all memory management code uses uintptr_t - as an example, the BSD kernel code defines a vm_offset_t with similar properties. But if you are writing e.g. a debug malloc package, why invent your own type?
It's also helpful when you have %p available in your printf, and are writing code that needs to print pointer sized integral variables in hex on a variety of architectures.
I find intptr_t rather less useful, except possibly as a way station when casting, to avoid the dread warning about changing signedness and integer size in the same cast. (Writing portable code that passes -Wall -Werror on all relevant architectures can be a bit of a struggle.)

What is the use of intptr_t?
Example use: order comparing.
Comparing pointers for equality is not a problem.
Other compare operations like >, <= may be UB. C11dr §6.5.8/5 Relational operators.
So convert to intptr_t first.
[Edit] New example: Sort an array of pointers by pointer value.
int ptr_cmp(const void *a, const void *b) {
intptr_t ia = (intptr) (*((void **) a));
intptr_t ib = (intptr) (*((void **) b));
return (ia > ib) - (ia < ib);
}
void *a[N];
...
qsort(a, sizeof a/sizeof a[0], sizeof a[0], ptr_cmp);
[Former example]
Example use: Test if a pointer is of an array of pointers.
#define N 10
char special[N][1];
// UB as testing order of pointer, not of the same array, is UB.
int test_special1(char *candidate) {
return (candidate >= special[0]) && (candidate <= special[N-1]);
}
// OK - integer compare
int test_special2(char *candidate) {
intptr_t ca = (intptr_t) candidate;
intptr_t mn = (intptr_t) special[0];
intptr_t mx = (intptr_t) special[N-1];
return (ca >= mn) && (ca <= mx);
}
As commented by #M.M, the above code may not work as intended. But at least it is not UB. - just non-portably functionality. I was hoping to use this to solve this problem.

(u)intptr_t is used when you want to do arithmetic on pointers, specifically bitwise operations. But as others said, you'll almost always want to use uintptr_t because bitwise operations are better done in unsigned. However if you need to do an arithmetic right shift then you must use intptr_t1. It's usually used for storing data in the pointer, usually called tagged pointer
In x86-64 you can use the high 16/7 bits of the address for data, but you must do sign extension manually to make the pointer canonical because it doesn't have a flag for ignoring the high bits like in ARM2. So for example if you have char* tagged_address then you'll need to do this before dereferencing it
char* pointer = (char*)((intptr_t)tagged_address << 16 >> 16);
The 32-bit Chrome V8 engine uses smi (small integer) optimization where the low bits denote the type
|----- 32 bits -----|
Pointer: |_____address_____w1| # Address to object, w = weak pointer
Smi: |___int31_value____0| # Small integer
So when the pointer's least significant bit is 0 then it'll be right shifted to retrieve the original 31-bit signed int
int v = (intptr_t)address >> 1;
For more information read
Using the extra 16 bits in 64-bit pointers
Pointer magic for efficient dynamic value representations
Another usage is when you pass a signed integer as void* which is usually done in simple callback functions or threads
void* my_thread(void *arg)
{
intptr_t val = (intptr_t)arg;
// Do something
}
int main()
{
pthread_t thread1;
intptr_t some_val = -2;
int r = pthread_create(&thread1, NULL, my_thread, (void*)some_val);
}
1 When the implementation does arithmetic shift on signed types of course
2 Very new x86-64 CPUs may have UAI/LAM support for that

How to cast char array to int at non-aligned position?

Is there a way in C/C++ to cast a char array to an int at any position?
I tried the following, bit it automatically aligns to the nearest 32 bits (on a 32 bit architecture) if I try to use pointer arithmetic with non-const offsets:
unsigned char data[8];
data[0] = 0; data[1] = 1; ... data[7] = 7;
int32_t p = 3;
int32_t d1 = *((int*)(data+3)); // = 0x03040506 CORRECT
int32_t d2 = *((int*)(data+p)); // = 0x00010203 WRONG
Update:
As stated in the comments the input comes in tuples of 3 and I cannot
change that.
I want to convert 3 values to an int for further
processing and this conversion should be as fast as possible.
The
solution does not have to be cross platform. I am working with a very
specific compiler and processor, so it can be assumed that it is a 32
bit architecture with big endian.
The lowest byte of the result does not matter to me (see above).
My main questions at the moment are: Why has d1 the correct value but d2 does not? Is this also true for other compilers? Can this behavior be changed?

No you can't do that in a portable way.
The behaviour encountered when attempting a cast from char* to int* is undefined in both C and C++ (possibly for the very reasons that you've spotted: ints are possibly aligned on 4 byte boundaries and data is, of course, contiguous.)
(The fact that data+3 works but data+p doesn't is possibly due to to compile time vs. runtime evaluation.)
Also note that the signed-ness of char is not specified in either C or C++ so you should use signed char or unsigned char if you're writing code like this.
Your best bet is to use bitwise shift operators (>> and <<) and logical | and & to absorb char values into an int. Also consider using int32_tin case you build to targets with 16 or 64 bit ints.

There is no way, converting a pointer to a wrongly aligned one is undefined.
You can use memcpy to copy the char array into an int32_t.
int32_t d = 0;
memcpy(&d, data+3, 4); // assuming sizeof(int) is 4
Most compilers have built-in functions for memcpy with a constant size argument, so it's likely that this won't produce any runtime overhead.
Even though a cast like you've shown is allowed for correctly aligned pointers, dereferencing such a pointer is a violation of strict aliasing. An object with an effective type of char[] must not be accessed through an lvalue of type int.
In general, type-punning is endianness-dependent, and converting a char array representing RGB colours is probably easier to do in an endianness-agnostic way, something like
int32_t d = (int32_t)data[2] << 16 | (int32_t)data[1] << 8 | data[0];

Casting long int as a struct pointer

I know this is a rather noobish question, but no amount of googling or permutations of code seem to work.
I have a structure which is defined like this.
typedef struct
{
int rate;
int duration;
} DummyStructure;
Now, i try to use code similar to this.
//
DummyStructure* structure;
DummyStructure* structure2;
long int point;
//
structure = (DummyStructure*)malloc(sizeof(DummyStructure));
structure->rate = 19;
structure->duration = 92;
point = (long int)&structure;
// now i'd like to set structure2 to the same memory location as 1.
// point is the 8-byte int location (i assume?) of structure.
// so naturally i'd assume that by casting as a DummyStructure pointer
// that long int would act as a pointer to that 1.
// It doesn't.
structure2 = (DummyStructure*)point;
I stress that i've tried every permutation of ** and * that is possible. I just don't get it. Either it doesn't compile, or it does, and when it does i end up with seemingly random numbers for the fields contained in structure2. I assume that somehow i'm winding up with an incorrect memory location, but how else can you get it except from using the &?
I have the memory location, right? How do i set the structure to that location?
EDIT; I forgot to mention (and subsequent answers have asked) but i'm intending to use this to wrap libvorbis for jni. Using jni means that i can't pass-back any of the structs that libvorbis does, but it requires them for its core functions. Therefore my wrapper is going to use vorbis directly to make the structs, and i pass back to java the pointer to them so that when i need to fill the buffer with more sound, i can simply re-reference the struct objects from the integer value of the pointer.

Why are you trying to cast pointers to integers and back? Is it just to learn, to figure something out, to work around some (untold) restriction, or what? It's a rather strange thing to be doing in a "plain" program such as this, as there is no point.
One likely cause of your problems is that there's no guarantee that a pointer will even fit in a long int. You can check by adding this code:
printf("a regular pointer is %u bytes, long int is %u",
(unsigned int) sizeof structure, (unsigned int) sizeof point);
If the numbers printed are different, that's probably the largest cause of your problems.
If you're using C99, you should #include <stdint.h> and then use the intptr_t type instead of unsigned long to hold a pointer, in any case.

structure is already a pointer, so you don't have to take the address there:
long int point = reinterpret_cast<long int>(structure);
DummyStructure* structure2 = reinterpret_cast<DummyStructure*>(point);

structure is already a pointer. You just want to do point = (long int) structure; (although, realistically, why a long int is involved at all, I don't know. It's a lot easier to just do structure2=structure; which works fine since structure and structure2 are both pointers.)
When you do &structure you're getting the memory location where the pointer itself is stored, which is why it isn't the correct value. You really probably don't want to ever use &structure unless it's being passed into a function which is going to change which DummyStructure structure points to.

Others have answered your question, but I'd like to make a more general comment. You mention JNI; in this case, you don't want long int, but jlong (which will be a typedef to either long int or long long int, depending on the machine. The problem is that long will have a different size, depending on the machine, and will map to a different Java type. Of course, you're counting on the fact that jlong will be big enough to hold a pointer, but since jlong is 64 bits, this seems like a safe bet for the immediate future (and beyond—I don't see a time coming where 64 bits of addressing doesn't suffice).
Also, I would suggest you borrow an idea from Swig, and avoid the subtleties of pointer to integral conversions, by using something like the following:
jlong forJNI = 0;
*reinterpret_cast<DummyStructure*>( &forJNI ) = structure;
// ...
structure2 = *reinterpret_cast<DummyStructure*>( &forJNI );
This is ugly, but it is guaranteed to work (with one caveat) for all
systems where sizeof(DummyStructure*) <= 64.
Just be sure to compile with strict aliasing turned off. (You have to
do this anytime you cast between pointers and ints. IMHO, you shouldn't
have to in cases where the casts are visible to the compiler, but some
compiler writers prefer breaking code intentionally, even when the
intent is clear.)

Long ints aren't the same as pointers. Why don't you just do:
DummyStructure** point;
structure = malloc(sizeof(DummyStructure));
structure->rate = 19;
structure->duration = 92;
point = &structure;
structure2 = *point;
The problem is probably a combination of the fact that 1) you don't dereference point. structure2 is a pointer to structure which is itself a pointer. You'd have to do:
structure2 = *((DummyStructure*)point);
But on top of that is the fact that long ints aren't the same as pointers. There's probably also a signedness issue here.

point = (long int)&structure;
This takes the address of structure which is a DummyStructure* and assign it to point. So point should be a double pointer (pointer to pointer). And when you assign structure2, it should be properly type casted.
typedef struct
{
int rate;
int duration;
} DummyStructure;
DummyStructure* structure;
DummyStructure* structure2;
long int **point;
structure = (DummyStructure*)malloc(sizeof(DummyStructure));
structure->rate = 19;
structure->duration = 92;
point = (long int **)&structure;
structure2 = (DummyStructure*)*point;
If your intention is to make structure2 point to the same memory location as structure, why don't you directly assign it rather than having an intermediate long int **.

The bug is that point is the address of structure, which is itself a pointer to a DummyStructure. In order for structure2 to point to the same thing as structure, you need to dereference point. Ignoring for a second all length, signedness, and similar issues,
structure2 = *(DummyStructure**)point;
would fix your code. But why not just:
structure2 = structure;
If you really want to hold a pointer in something generic, hold it in a void*. At least that's the right size.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Are casts as safe as unions? - c++

Related

Passing pointers to arrays of unrelated, but compatible, types without copying?

Manipulating byte vector through float pointer

What is the use of intptr_t?

How to cast char array to int at non-aligned position?

Casting long int as a struct pointer

Categories

Resources