What is the most simple and efficient why to copy an int to a boost/std::array?
The following seems to work, but I'm not sure if this is the most appropriate way to do it:
int r = rand();
boost::array<char, sizeof(int)> send_buf;
std::copy(reinterpret_cast<char*>(&r), reinterpret_cast<char*>(&r + sizeof(int)), &send_buf[0]);
Just for comparison, here's the same thing with memcpy:
#include <cstring>
int r = rand();
boost::array<char, sizeof(int)> send_buf;
std::memcpy(&send_buf[0], &r, sizeof(int));
Your call whether an explosion of casts (and the opportunity to get them wrong) is better or worse than the C++ "sin" of using a function also present in C ;-)
Personally I think memcpy is quite a good "alarm" for this kind of operation, for the same reason that C++-style casts are a good "alarm" (easy to spot while reading, easy to search for). But you might prefer to have the same alarm for everything, in which case you can cast the arguments of memcpy to void*.
Btw, I might use sizeof r for both sizes rather than sizeof(int), but it sort of depends whether the context demands that the array "is big enough for r (which happens to be an int)" or "is the same size as an int (which r happens to be)". Since it's a send buffer, I guess the buffer is the size that the wire protocol demands and r is supposed to match the buffer, rather than the other way around. So sizeof(int) is probably appropriate but 4 or PROTOCOL_INTEGER_SIZE might be more appropriate still.
The idea is correct, but you have a bug:
reinterpret_cast<char*>(&r + sizeof(int))
Should be:
reinterpret_cast<char*>(&r) + sizeof(int)
or
reinterpret_cast<char*>(&r+1)
These or a memcpy equivalent are OK. Anything else risks alignment issues.
It's common currency to use reinterpret_cast for those purposes but the Standard makes it pretty clear that static_cast via void* is perfectly acceptable. In fact in the case of a type like int then reinterpret_cast<char*>(&r) is defined to have the semantics of static_cast<char*>(static_cast<void*>(&r)). Why not be explicit and use that outright?
If you get into the habit, you have less chance in the future of using a reinterpret_cast which will end up having implementation-defined semantics rather than a static_cast chain which will always have well-defined semantics.
Do note that you're allowed to treat a pointer to a single object as if it were a pointer into an array of one (cf. 5.7/4). This is convenient for obtaining the second pointer.
int r = rand();
boost::array<char, sizeof(int)> send_buf;
auto cast = [](int* p) { return static_cast<char*>(static_cast<void*>(p)); };
std::copy(cast(&r), cast(&r + 1), &send_buf[0]);
Minor bug as pointed out by Michael Anderson
But you could do this:
#include <iostream>
union U
{
int intVal;
char charVal[sizeof(int)];
};
int main()
{
U val;
val.intVal = 6;
std::cout << (int)val.charVal[0] << ":" << (int)val.charVal[1] << ":" << (int)val.charVal[2] << ":" << (int)val.charVal[3] << "\n";
}
Related
This has been bugging me for a very long time: how to do pointer conversion from anything to char * to dump binary to disk.
In C, you don't even think about it.
double d = 3.14;
char *cp = (char *)&d;
// do what u would do to dump to disk
However, in C++, where everyone is saying C-cast is frowned upon, I've been doing this:
double d = 3.14;
auto cp = reinterpret_cast<char *>(&d);
Now this is copied from cppreference,
so I assume this is the proper way.
However, I've read from multiple sources saying this is UB.
(e.g. this one)
So I can't help wonder if there is any "DB" way at all (According to that post, there's none).
Another scenario I often encounter is to implement an API like this:
void serialize(void *buffer);
where you would dump a lot of things to this buffer. Now, I've been doing this:
void serialize(void *buffer) {
int intToDump;
float floatToDump;
int *ip = reinterpret_cast<int *>(buffer);
ip[0] = intToDump;
float *fp = reinterpret_cast<float *>(&ip[1]);
fp[0] = floatToDump;
}
Well, I guess this is UB as well.
Now, is there truly no "DB" way to accomplish either of these tasks?
I've seen someone using uintptr_t to accomplish sth similar to serialize task with pointer as integer math along with sizeof,
but I'm guessing here that it's UB as well.
Even though they are UB, compiler writers usually do the rational things to make sure everything is okay.
And I'm okay with that: it's not an unreasonable thing to ask for.
So my questions really are, for the two common tasks mentioned above:
Is there truly no "DB" way to accomplish them that will satisfy the ultimate C++ freaks?
Any better way to accomplish them other than what I've been doing?
Thanks!
Your serialize implementation's behavior is undefined because you violate the strict aliasing rules. The strict aliasing rules say, in short, that you cannot reference any object via a pointer or reference to a different type. There is one major exception to that rule though: any object may be referenced via a pointer to char, unsigned char, or (since C++17) std::byte. Note that this exception does not apply the other way around; a char array may not be accessed via a pointer to a type other than char.
That means that you can make your serialize function well-defined by changing it as so:
void serialize(char* buffer) {
int intToDump = 42;
float floatToDump = 3.14;
std::memcpy(buffer, &intToDump, sizeof(intToDump));
std::memcpy(buffer + sizeof(intToDump), &floatToDump, sizeof(floatToDump));
// Or you could do byte-by-byte manual copy loops
// i.e.
//for (std::size_t i = 0; i < sizeof(intToDump); ++i, ++buffer) {
// *buffer = reinterpret_cast<char*>(&intToDump)[i];
//}
//for (std::size_t i = 0; i < sizeof(floatToDump); ++i, ++buffer) {
// *buffer = reinterpret_cast<char*>(&floatToDump)[i];
//}
}
Here, rather than casting buffer to a pointer to an incompatible type, std::memcpy casts a pointer to the object to serialize to a pointer to unsigned char. In doing so, the strict aliasing rules are not violated, and the program's behavior remains well-defined. Note that the exact representation is still unspecified; as it will depend on your CPU's endianess.
Let's assume that A and B are two classes (or structures) having no inheritance relationships (thus, object slicing cannot work). I also have an object b of the type B. I would like to interpret its binary value as a value of type A:
A a = b;
I could use reinterpret_cast, but I would need to use pointers:
A a = reinterpret_cast<A>(b); // error: invalid cast
A a = *reinterpret_cast<A *>(&b); // correct [EDIT: see *footnote]
Is there a more compact way (without pointers) that does the same? (Including the case where sizeof(A) != sizeof(B))
Example of code that works using pointers: [EDIT: see *footnote]
#include <iostream>
using namespace std;
struct C {
int i;
string s;
};
struct S {
unsigned char data[sizeof(C)];
};
int main() {
C c;
c.i = 4;
c.s = "this is a string";
S s = *reinterpret_cast<S *>(&c);
C s1 = *reinterpret_cast<C *>(&s);
cout << s1.i << " " << s1.s << endl;
cout << reinterpret_cast<C *>(&s)->i << endl;
return 0;
}
*footnote: It worked when I tried it, but it is actually an undefined behavior (which means that it may work or not) - see comments below
No. I think there's nothing in the C++ syntax that allows you to implicitly ignore types. First, that's against the notion of static typing. Second, C++ lacks standardization at binary level. So, whatever you do to trick the compiler about the types you're using might be specific to a compiler implementation.
That being said, if you really wanna do it, you should check how your compiler's data alignment/padding works (i.e.: struct padding in c++) and if there's a way to control it (i.e.: What is the meaning of "__attribute__((packed, aligned(4))) "). If you're planning to do this across compilers (i.e.: with data transmitted across the network), then you should be extra careful. There are also platform issues, like different addressing models and endianness.
Yes, you can do it without a pointer:
A a = reinterpret_cast<A &>(b); // note the '&'
Note that this may be undefined behaviour. Check out the exact conditions at http://en.cppreference.com/w/cpp/language/reinterpret_cast
According to Effective C++, "casting object addresses to char* pointers and then using pointer arithemetic on them almost always yields undefined behavior."
Is this true for plain-old-data? for example in this template function I wrote long ago to print the bits of an object. It works splendidly on x86, but... is it portable?
#include <iostream>
template< typename TYPE >
void PrintBits( TYPE data ) {
unsigned char *c = reinterpret_cast<unsigned char *>(&data);
std::size_t i = sizeof(data);
std::size_t b;
while ( i>0 ) {
i--;
b=8;
while ( b > 0 ) {
b--;
std::cout << ( ( c[i] & (1<<b) ) ? '1' : '0' );
}
}
std::cout << "\n";
}
int main ( void ) {
unsigned int f = 0xf0f0f0f0;
PrintBits<unsigned int>( f );
return 0;
}
It certainly is not portable. Even if you stick to fundamental types, there is endianness and there is sizeof, so your function will print different results on big-endian machines, or on machines where sizeof(int) is 16 or 64. Another issue is that not all PODs are fundamental types, structs may be POD, too.
POD struct members may have internal paddings according to the implementation-defined alignment rules. So if you pass this POD struct:
struct PaddedPOD
{
char c;
int i;
}
your code would print the contents of padding, too. And that padding will be different even on the same compiler with different pragmas and options.
On the other side, maybe it's just what you wanted.
So, it's not portable, but it's not UB. There are some standard guarantees: you can copy PODs to and from array of char or unsigned char, and the result of this copying via char buffer will hold the original value. That implies that you can safely traverse that array, so your function is safe. But nobody guarantees that this array (or object representation) of objects with same type and value will be the same on different computers.
BTW, I couldn't find that passage in Effective C++. Would you quote it, pls? I could say, if a part of your code already contains lots of #ifdef thiscompilerversion, sometimes it makes sense to go all-nonstandard and use some hacks that lead to undefined behavior, but work as intended on this compiler version with this pragmas and options. In that sense, yes, casting to char * often leads to UB.
Yes, POD types can always be treated as an array of chars, of size sizeof (TYPE). POD types are just like the corresponding C types (that's what makes them "plain, old"). Since C doesn't have function overloading, writing "generic" functions to do things like write them to files or network streams depends on the ability to access them as char arrays.
I am trying to use implement the LSB lookup method suggested by Andrew Grant in an answer to this question: Position of least significant bit that is set
However, it's resulting in a segmentation fault. Here is a small program demonstrating the problem:
#include <iostream>
typedef unsigned char Byte;
int main()
{
int value = 300;
Byte* byteArray = (Byte*)value;
if (byteArray[0] > 0)
{
std::cout<< "This line is never reached. Trying to access the array index results in a seg-fault." << std::endl;
}
return 0;
}
What am I doing wrong?
I've read that it's not good practice to use 'C-Style' casts in C++. Should I use reinterpret_cast<Byte*>(value) instead? This still results in a segmentation fault, though.
Use this:
(Byte*) &value;
You don't want a pointer to address 300, you want a pointer to where 300 is stored. So, you use the address-of operator & to get the address of value.
While Erik answered your overall question, as a followup I would say emphatically -- yes, reinterpret_cast should be used rather than a C-style cast.
Byte* byteArray = reinterpret_cast<Byte*>(&value);
The line should be:
Byte* byteArray = (Byte*)&value;
You should not have to put the (void *) in front of it.
-Chert
char *array=(char*)(void*)&value;
Basically you take a pointer to the beginning of the string and recast it to a pointer to a byte.
#Erik already fixed your primary problem, but there is a subtle one that you still have. If you are only looking for the least significant bit, there is no need to bother with the cast at all.
int main()
{
int value = 300;
if (value & 0x00000001)
{
std::cout<< "LSB is set" << std::endl;
}
return 0;
}
I've been reading about strict aliasing quite a lot lately. The C/C++ standards say that the following code is invalid (undefined behavior to be correct), since the compiler might have the value of a cached somewhere and would not recognize that it needs to update the value when I update b;
float *a;
...
int *b = reinterpret_cast<int*>(a);
*b = 1;
The standard also says that char* can alias anything, so (correct me if I'm wrong) compiler would reload all cached values whenever a write access to a char* variable is made. Thus the following code would be correct:
float *a;
...
char *b = reinterpret_cast<char*>(a);
*b = 1;
But what about the cases when pointers are not involved at all? For example, I have the following code, and GCC throws warnings about strict aliasing at me.
float a = 2.4;
int32_t b = reinterpret_cast<int&>(a);
What I want to do is just to copy raw value of a, so strict aliasing shouldn't apply. Is there a possible problem here, or just GCC is overly cautious about that?
EDIT
I know there's a solution using memcpy, but it results in code that is much less readable, so I would like not to use that solution.
EDIT2
int32_t b = *reinterpret_cast<int*>(&a); also does not work.
SOLVED
This seems to be a bug in GCC.
If you want to copy some memory, you could just tell the compiler to do that:
Edit: added a function for more readable code:
#include <iostream>
using std::cout; using std::endl;
#include <string.h>
template <class T, class U>
T memcpy(const U& source)
{
T temp;
memcpy(&temp, &source, sizeof(temp));
return temp;
}
int main()
{
float f = 4.2;
cout << "f: " << f << endl;
int i = memcpy<int>(f);
cout << "i: " << i << endl;
}
[Code]
[Updated Code]
Edit: As user/GMan correctly pointed out in the comments, a full-featured implementation could check that T and U are PODs. However, given that the name of the function is still memcpy, it might be OK to rely on your developers treating it as having the same constraints as the original memcpy. That's up to your organization. Also, use the size of the destination, not the source. (Thanks, Oli.)
Basically the strict aliasing rules is "it is undefined to access memory with another type than its declared one, excepted as array of characters". So, gcc isn't overcautious.
If this is something you need to do often, you can also just use a union, which IMHO is more readable than casting or memcpy for this specific purpose:
union floatIntUnion {
float a;
int32_t b;
};
int main() {
floatIntUnion fiu;
fiu.a = 2.4;
int32_t &x = fiu.b;
cout << x << endl;
}
I realize that this doesn't really answer your question about strict-aliasing, but I think this method makes the code look cleaner and shows your intent better.
And also realize that even doing the copies correctly, there is no guarantee that the int you get out will correspond to the same float on other platforms, so count any network/file I/O of these floats/ints out if you plan to create a cross-platform project.