I am trying to insert a uint16_t value into a uint8_t array using pointers. I would think the code below would work, but I haven't been able to get it to. Any clues as to what the problem is?
uint8_t myarray[10];
uint16_t value = 10000;
uint16_t * myptr = (uint16_t *)(myarray+2);
*myptr = value;
I know I can do it like so, but why doesn't the above work?
uint8_t myarray[10];
uint16_t value = 10000;
uint8_t * myptr = (myarray+2);
uint8_t * myptr2 =(myarray+3);
*myptr = value>>8;
*myptr2 =value;
The second version writes the most significant byte (with value 39) to myarray[2], and the least significant (with value 16) to myarray[3].
The first version will write the two bytes in an order determined by the endianness of your computer. Most modern computers are little-endian, meaning that the least significant byte of a multi-byte integer value comes first in memory - so this version will write the two bytes in the opposite order to the other version.
I'm assuming that that's the problem you're seeing; if it's something else, then please be more specific than "haven't been able to do it".
Also, the first version technically has undefined behaviour, and might do something completely unexpected on a sufficiently exotic computer. I suggest that you stick to well-defined code like the second version; only use dubious optimisations if profiling reveals both that the well-defined code is too slow, and that the dodgy pointer-aliasing code is faster. I would also suggest using reinterpret_cast rather than the evil C-style cast; it wouldn't change the behaviour, but it would be easier to see that there's something wonky going on.
You can do it like this:
uint8_t * value_data = reinterpret_cast<uint8_t*>(&value); // cast to `(unsigned) char*` is allowed by standard
myarray[0] = value_data[0];
myarray[1] = value_data[1];
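A minimal sketch of two well-defined ways to place a 16-bit value at an arbitrary offset in a byte array, assuming the goal is the one in the question (write at myarray+2). The helper names store_u16_native and store_u16_be are hypothetical, not from the answers above. The first copies the bytes in the host's order with memcpy; the second fixes the order to most-significant-byte-first with shifts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: store `value` at buf + offset in the host's byte order.
// memcpy has no alignment or aliasing requirements, so any offset is fine.
inline void store_u16_native(std::uint8_t* buf, std::size_t offset, std::uint16_t value)
{
    assert(buf != nullptr);
    std::memcpy(buf + offset, &value, sizeof value);
}

// Hypothetical helper: store `value` most significant byte first,
// regardless of the host's endianness.
inline void store_u16_be(std::uint8_t* buf, std::size_t offset, std::uint16_t value)
{
    assert(buf != nullptr);
    buf[offset]     = static_cast<std::uint8_t>(value >> 8);
    buf[offset + 1] = static_cast<std::uint8_t>(value & 0xFF);
}
```

With value = 10000 (0x2710), store_u16_be(myarray, 2, value) leaves 39 in myarray[2] and 16 in myarray[3] on every host; store_u16_native gives the same two bytes in host order, which is the reverse on a little-endian machine.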
Related
I Need to switch the order of bytes so that an int16 with contents (byte1, byte2) -> (byte2, byte1). I did this using a union:
union ConversionUnion
{
uint8_t m_8[4];
uint16_t m_16[2];
uint32_t m_32;
};
//use
uint16_t example = 0xFFDE;
ConversionUnion converter;
converter.m_16[0] = example;
std::swap(converter.m_8[0], converter.m_8[1]);
example = converter.m_16[0]; //0xDEFF
Now this does work on gcc, but I have been informed that this is undefined behavior (gcc 6.3, C++11).
Questions:
1) Is this really undefined behavior? I ask because I've seen this before in embedded code. Other Stack Overflow questions seem to debate this; who is actually correct (for C++11 and C++14)?
2) If this is undefined behavior, can byte order swapping be done in a portable way without a bunch of bit shifting? I really hate bit shifting, it's horribly ugly.
Type punning is allowed via char*, so why not just use that rather than a union?
uint16_t example = 0xFFDE;
char *char_alias = reinterpret_cast<char*>(&example);
std::swap(char_alias[0], char_alias[1]);
Relying on undefined behavior or obscure semantics of the language (union) is not necessarily more idiomatic or easier to read. I find this loop much easier to parse:
uint32_t example = 0xc001c0de;
unsigned char *p = reinterpret_cast<unsigned char*>(&example);
for (size_t low = 0, high = sizeof(example) - 1;
high > low;
++low, --high)
{
std::swap(p[low], p[high]);
}
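The same idea can be wrapped in a reusable function. This is a sketch only: reverse_bytes is a hypothetical name, and it swaps in memcpy plus std::reverse on a local byte buffer instead of aliasing the object through a pointer, which keeps the value semantics simple.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: return `value` with the order of its bytes reversed.
template <typename T>
T reverse_bytes(T value)
{
    static_assert(std::is_integral<T>::value, "reverse_bytes expects an integer type");
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));   // copy out the object representation
    std::reverse(bytes, bytes + sizeof(T));  // reverse the byte order
    std::memcpy(&value, bytes, sizeof(T));   // copy the reversed bytes back
    return value;
}
```

For example, reverse_bytes(std::uint16_t{0xFFDE}) yields 0xDEFF, and the 32-bit value 0xc001c0de becomes 0xdec001c0.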
People disagree to some extent about this one. I think that it's undefined behavior, but having my opinion is not a valuable addition.
Byte swapping is easy with unsigned types (BTW byte-swapping signed types doesn't make sense). Just extract individual bytes and rearrange. Hide the ugliness in a constexpr function or a macro.
constexpr uint16_t bswap(uint16_t value) // multiple statements in a constexpr body need C++14
{
uint16_t high_byte = (value >> 8) & 0xff;
uint16_t low_byte = value & 0xff;
return (low_byte << 8) | high_byte;
}
BTW if you see something in embedded code, and it works, this isn't an indication that it's safe! Embedded code often sacrifices portability for efficiency, sometimes using undefined behavior where it was the only way to convince a particular compiler to generate efficient code.
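As an illustration of hiding the shifting inside constexpr functions, here is a sketch that sticks to a single return statement so it also compiles as C++11. The names bswap16 and bswap32 are hypothetical; the static_assert lines let the compiler check the results.

```cpp
#include <cassert>
#include <cstdint>

// Swap the two bytes of a 16-bit unsigned value.
constexpr std::uint16_t bswap16(std::uint16_t v)
{
    return static_cast<std::uint16_t>(((v & 0x00FFu) << 8) | ((v >> 8) & 0x00FFu));
}

// Reverse the four bytes of a 32-bit unsigned value.
constexpr std::uint32_t bswap32(std::uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

// Checked at compile time.
static_assert(bswap16(0xFFDE) == 0xDEFF, "16-bit swap");
static_assert(bswap32(0xC001C0DEu) == 0xDEC001C0u, "32-bit swap");
```

Because only values are manipulated, the result does not depend on the host's endianness or on any aliasing rule.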
Is there a way in C/C++ to cast a char array to an int at any position?
I tried the following, but it automatically aligns to the nearest 32 bits (on a 32 bit architecture) if I try to use pointer arithmetic with non-const offsets:
unsigned char data[8];
data[0] = 0; data[1] = 1; ... data[7] = 7;
int32_t p = 3;
int32_t d1 = *((int*)(data+3)); // = 0x03040506 CORRECT
int32_t d2 = *((int*)(data+p)); // = 0x00010203 WRONG
Update:
As stated in the comments the input comes in tuples of 3 and I cannot change that.
I want to convert 3 values to an int for further processing and this conversion should be as fast as possible.
The solution does not have to be cross platform. I am working with a very specific compiler and processor, so it can be assumed that it is a 32 bit architecture with big endian.
The lowest byte of the result does not matter to me (see above).
My main questions at the moment are: Why has d1 the correct value but d2 does not? Is this also true for other compilers? Can this behavior be changed?
No you can't do that in a portable way.
The behaviour encountered when attempting a cast from char* to int* is undefined in both C and C++ (possibly for the very reasons that you've spotted: ints are possibly aligned on 4 byte boundaries and data is, of course, contiguous.)
(The fact that data+3 works but data+p doesn't is possibly due to compile time vs. runtime evaluation.)
Also note that the signed-ness of char is not specified in either C or C++ so you should use signed char or unsigned char if you're writing code like this.
Your best bet is to use the bitwise shift operators (>> and <<) and bitwise | and & to absorb char values into an int. Also consider using int32_t in case you build for targets with 16 or 64 bit ints.
There is no way; converting a pointer to a wrongly aligned one is undefined.
You can use memcpy to copy the char array into an int32_t.
int32_t d = 0;
memcpy(&d, data+3, sizeof d); // sizeof(int32_t) is 4; memcpy has no alignment requirement
Most compilers have built-in functions for memcpy with a constant size argument, so it's likely that this won't produce any runtime overhead.
Even though a cast like you've shown is allowed for correctly aligned pointers, dereferencing such a pointer is a violation of strict aliasing. An object with an effective type of char[] must not be accessed through an lvalue of type int.
In general, type-punning is endianness-dependent, and converting a char array representing RGB colours is probably easier to do in an endianness-agnostic way, something like
int32_t d = (int32_t)data[2] << 16 | (int32_t)data[1] << 8 | data[0];
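A sketch of both approaches for the case described in the question's update: 3-byte tuples, big-endian interpretation, lowest byte of the result unused. The helper names load_be24_high and load_u32_native are hypothetical. The shift version gives the same value on any host; the memcpy version reads 4 bytes in the host's own order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: pack the 3 bytes at p into the upper 24 bits of a
// 32-bit value (big-endian interpretation); the lowest byte is left as 0.
inline std::uint32_t load_be24_high(const unsigned char* p)
{
    assert(p != nullptr);
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8);
}

// Hypothetical helper: read 4 bytes in host byte order from any address.
inline std::uint32_t load_u32_native(const unsigned char* p)
{
    assert(p != nullptr);
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}
```

With data holding 0 through 7, load_be24_high(data + p) for p = 3 is 0x03040500 whether p is a constant or a runtime value.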
I know that the most common method to test endianness programmatically is to cast to char* like this:
short temp = 0x1234;
char* tempChar = (char*)&temp;
But can it be done by casting to short* like this:
unsigned char test[2] = {1,0};
if ( *(short *)test == 1)
//Little-Endian
else
//Big-Endian
Am I right that the "test" buffer will be saved (on x86 platforms) in the memory using Little-Endian convention (from right-to-left: "0" at lower address, "1" at higher) just like in case with the "temp" var?
And more generally if I have a string:
char tab[] = "abcdef";
How would it be stored in the memory? Will it be reversed like: "fedcba"?
Thx. in advance:-)
PS.
Is there any way to see how exactly the data of a program look like in the system memory?.
I would like to see that byte-swap in Little-Endian in "real life".
Your code would probably work in practice (you could have just tried it!). However, technically, it invokes undefined behaviour; the standard doesn't allow you to access a char array through a pointer of another type.
And more generally if I have a string: char tab[] = "abcdef"; How would it be stored in the memory? Will it be reversed like: "fedcba"?
No. Otherwise tab[0] would give you f.
Your alternative method for checking endianness would work.
char tab[] = "abcdef" would be stored in that same order: abcdef
Endianness comes into play when you access multiple bytes (short, int, and so on). When you try to access tab[] as a short array using a little endian machine, you'd read it as ba, dc, fe (whatever their actual byte equivalents are, this is the order the chars are "evaluated" in the short).
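A union is the safer choice only in C: C99 and later define reading a union member other than the one last written. In C++ that read is undefined behavior, so copying the bytes with memcpy is the standards-compliant route there.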
Neither way is guaranteed to work; furthermore, the latter invokes undefined behavior.
First fails if sizeof(char) == sizeof(short).
Second may fail for the same reason, and is also unsafe: result of the pointer cast may have wrong alignment for short, and accessing the (short) value invokes undefined behavior (3.10.15).
But yes, the char buffer is stored sequentially into memory so that &test[0] < &test[1],
and more generally, as others have already said, char tab[] = "abcdef" is not reversed or otherwise permuted regardless of endianness.
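A sketch of an endianness check that avoids the char-to-short cast entirely by copying the first byte of a known value with memcpy. The function name is_little_endian is hypothetical, and the check only distinguishes the two common layouts.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if the least significant byte of a multi-byte
// integer is stored at the lowest address on this host.
inline bool is_little_endian()
{
    const std::uint16_t probe = 0x0001;
    unsigned char first;
    std::memcpy(&first, &probe, 1);  // inspect the byte at the lowest address
    return first == 0x01;
}
```

A char array such as tab[] = "abcdef" is unaffected either way: tab[0] is 'a' on every host, because endianness only concerns the byte order inside multi-byte objects.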
Would the following be the most efficient way to get an int16 (short) value from a byte array?
inline __int16* ReadINT16(unsigned char* ByteArray,__int32 Offset){
return (__int16*)&ByteArray[Offset];
};
Assume the byte array contains a dump of the bytes in the same endian format as the machine this code is being called on. Alternatives are welcome.
It depends on what you mean by "efficient", but note that in some architectures this method will fail if Offset is odd, since the resulting 16 bit int will be misaligned and you will get an exception when you subsequently try to access it. You should only use this method if you can guarantee that Offset is even, e.g.
inline int16_t ReadINT16(uint8_t *ByteArray, int32_t Offset){
assert((Offset & 1) == 0); // Offset must be multiple of 2
return *(int16_t*)&ByteArray[Offset];
};
Note also that I've changed this slightly so that it returns a 16 bit value directly, since returning a pointer and then subsequently de-referencing it will most likely be less "efficient" than just returning a 16 bit value directly. I've also switched to the standard fixed-width integer types from <stdint.h> - I recommend you do the same.
I'm surprised no one has suggested this yet for a solution that is both alignment safe and correct across all architectures. (well, any architecture where there are 8 bits to a byte).
inline int16_t ReadINT16(uint8_t *ByteArray, int32_t Offset)
{
int16_t result;
memcpy(&result, ByteArray+Offset, sizeof(int16_t));
return result;
};
And I suppose the overhead of memcpy could be avoided:
inline int16_t ReadINT16(uint8_t *ByteArray, int32_t Offset)
{
int16_t result;
uint8_t* ptr1=(uint8_t*)&result;
uint8_t* ptr2 = ptr1+1;
*ptr1 = ByteArray[Offset];     // the original ignored Offset and always read bytes 0 and 1
*ptr2 = ByteArray[Offset + 1];
return result;
};
I believe alignment issues don't generate exceptions on x86. And if I recall, Windows (when it ran on DEC Alpha and others) would trap the alignment exception and fix it up (at a modest perf hit). And I do remember learning the hard way that SPARC on SunOS just flat out crashes when you have an alignment issue.
inline __int16* ReadINT16(unsigned char* ByteArray,__int32 Offset)
{
return (__int16*)&ByteArray[Offset];
};
Unfortunately this has undefined behaviour in C++, because you are accessing storage using two different types, which is not allowed under the strict aliasing rules. You can access the storage of a type using a char*, but not the other way around.
From previous questions I asked, the only safe way really is to use memcpy to copy the bytes into an int and then use that. (Which will likely be optimised to the same code you'd hope anyway, so just looks horribly inefficient).
Your code will probably work, and most people seem to do this... But the point is that you can't go crying to your compiler vendor when one day it generates code that doesn't do what you'd hope.
I see no problem with this, that's exactly what I'd do. As long as the byte array is safe to access and you make sure that the offset is correct (shorts are 2 bytes so you may want to make sure that they can't do odd offsets or something like that)
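Two sketches that pull the answers above together. The names read_unaligned and read_i16_le are hypothetical. The first generalizes the memcpy approach to any trivially copyable type and uses host byte order, as the question assumes. The second covers a case the question does not state, a dump whose byte order is fixed as little-endian, and is included only as an assumption-labeled alternative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: read a T in host byte order from any offset,
// aligned or not.
template <typename T>
T read_unaligned(const std::uint8_t* bytes, std::size_t offset)
{
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    assert(bytes != nullptr);
    T result;
    std::memcpy(&result, bytes + offset, sizeof(T));
    return result;
}

// Hypothetical helper, ASSUMING the dump is little-endian regardless of host:
// assemble the value from individual bytes, then reinterpret as signed.
inline std::int16_t read_i16_le(const std::uint8_t* bytes, std::size_t offset)
{
    assert(bytes != nullptr);
    const std::uint16_t raw =
        static_cast<std::uint16_t>(bytes[offset] | (bytes[offset + 1] << 8));
    std::int16_t result;
    std::memcpy(&result, &raw, sizeof result);
    return result;
}
```

Neither function cares whether Offset is odd, so the assert on alignment is no longer needed.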
I have been working on a legacy C++ application and am definitely outside of my comfort-zone (a good thing). I was wondering if anyone out there would be so kind as to give me a few pointers (pun intended).
I need to cast 2 bytes in an unsigned char array to an unsigned short. The bytes are consecutive.
For an example of what I am trying to do:
I receive a string from a socket and place it in an unsigned char array. I can ignore the first byte, and then the next 2 bytes should be converted to an unsigned short. This will be on Windows only, so there are no big/little endian issues (that I am aware of).
Here is what I have now (not working obviously):
//packetBuffer is an unsigned char array containing the string "123456789" for testing
//I need to convert bytes 2 and 3 into the short, 2 being the most significant byte
//so I would expect to get 515 (2*256 + 3) instead all the code I have tried gives me
//either errors or 2 (only converting one byte)
unsigned short myShort;
myShort = static_cast<unsigned short>(packetBuffer[1]);
Well, you are widening the char into a short value. What you want is to interpret two bytes as a short. static_cast cannot cast from unsigned char* to unsigned short*. You have to cast to void*, then to unsigned short*:
unsigned short *p = static_cast<unsigned short*>(static_cast<void*>(&packetBuffer[1]));
Now, you can dereference p and get the short value. But the problem with this approach is that you cast from unsigned char*, to void* and then to some different type. The Standard doesn't guarantee the address remains the same (and in addition, dereferencing that pointer would be undefined behavior). A better approach is to use bit-shifting, which will always work:
unsigned short p = (packetBuffer[1] << 8) | packetBuffer[2];
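The shift can be wrapped in a small function. This is a sketch with a hypothetical name, read_be16; it treats the first byte as the most significant one, as the question asks.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: combine two consecutive bytes into a 16-bit value,
// first byte most significant. Works at any address and on any host.
inline std::uint16_t read_be16(const unsigned char* p)
{
    assert(p != nullptr);
    return static_cast<std::uint16_t>((static_cast<unsigned>(p[0]) << 8) | p[1]);
}
```

For a buffer holding the raw bytes {1, 2, 3}, read_be16(&packetBuffer[1]) is 515. For the test string "123456789" the bytes are the character codes '2' and '3' (0x32 and 0x33, assuming ASCII), so the result is 12851 rather than 515.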
This is probably well below what you care about, but keep in mind that you could easily get an unaligned access doing this. x86 is forgiving and the abort that the unaligned access causes will be caught internally and will end up with a copy and return of the value so your app won't know any different (though it's significantly slower than an aligned access). If, however, this code will run on a non-x86 (you don't mention the target platform, so I'm assuming x86 desktop Windows), then doing this will cause a processor data abort and you'll have to manually copy the data to an aligned address before trying to cast it.
In short, if you're going to be doing this access a lot, you might look at making adjustments to the code so as not to have unaligned reads, and you'll see a performance benefit.
unsigned short myShort = *(unsigned short *)&packetBuffer[1];
The bit shift above has a bug:
unsigned short p = (packetBuffer[1] << 8) | packetBuffer[2];
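If packetBuffer holds 8-bit bytes and the compiler skips the usual integer promotions (some non-conforming embedded compilers do), the shift turns packetBuffer[1] into zero, leaving you with only packetBuffer[2]. A conforming C or C++ compiler promotes the operand to int before shifting, so no bits are lost there.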
Even so, this is still preferable to pointers. To avoid the above problem I spend a few extra lines of code; at anything above the most literal zero-optimization setting it results in the same machine code:
unsigned short p;
p = packetBuffer[1]; p <<= 8; p |= packetBuffer[2];
Or to save some clock cycles and not shift the bits off the end:
unsigned short p;
p = (((unsigned short)packetBuffer[1])<<8) | packetBuffer[2];
You have to be careful with pointers, the optimizer will bite you, as well as memory alignments and a long list of other problems. Yes, done right it is faster, done wrong the bug can linger for a long time and strike when least desired.
Say you were lazy and wanted to do some 16 bit math on an 8 bit array. (little endian)
unsigned short *s;
unsigned char b[10];
s=(unsigned short *)&b[0];
if(b[0]&7)
{
*s = *s+8;
*s &= ~7;
}
do_something_With(b);
*s=*s+8;
do_something_With(b);
*s=*s+8;
do_something_With(b);
There is no guarantee that a perfectly bug free compiler will create the code you expect. The byte array b sent to the do_something_With() function may never get modified by the *s operations. Nothing in the code above says that it should. If you don't optimize your code then you may never see this problem (until someone does optimize or changes compilers or compiler versions). If you use a debugger you may never see this problem (until it is too late).
The compiler doesn't see the connection between s and b, they are two completely separate items. The optimizer may choose not to write *s back to memory because it sees that *s has a number of operations so it can keep that value in a register and only save it to memory at the end (if ever).
There are three basic ways to fix the pointer problem above:
Declare s as volatile.
Use a union.
Use a function or functions whenever changing types.
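A sketch of the third option, using functions at every type change. The names load_le16, store_le16 and round_up_to_8 are hypothetical, and little-endian layout is assumed as in the example above. Every access goes through the byte array itself, so the compiler always sees that b is being read and written.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: read a little-endian 16-bit value from two bytes.
inline std::uint16_t load_le16(const unsigned char* p)
{
    assert(p != nullptr);
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Hypothetical helper: write a 16-bit value as two little-endian bytes.
inline void store_le16(unsigned char* p, std::uint16_t v)
{
    assert(p != nullptr);
    p[0] = static_cast<unsigned char>(v & 0xFF);
    p[1] = static_cast<unsigned char>(v >> 8);
}

// The "if (b[0] & 7)" step from the example: round the 16-bit value stored
// in b up to the next multiple of 8 when it is not already one.
inline void round_up_to_8(unsigned char* b)
{
    std::uint16_t v = load_le16(b);
    if (v & 7) {
        v = static_cast<std::uint16_t>((v + 8) & ~7u);
    }
    store_le16(b, v);
}
```

Each later "*s = *s + 8" step then becomes store_le16(b, load_le16(b) + 8), which modifies b directly before do_something_With(b) is called.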
You should not cast an unsigned char pointer into an unsigned short pointer (or, for that matter, cast a pointer to a smaller data type into a pointer to a larger one). This is because it is assumed that the address will be aligned correctly. A better approach is to shift the bytes into a real unsigned short object, or memcpy to an unsigned short array.
No doubt, you can adjust the compiler settings to get around this limitation, but this is a very subtle thing that will break in the future if the code gets passed around and reused.
Maybe this is a very late solution, but I just want to share it with you. When you want to convert primitives or other types you can use a union. See below:
union CharToStruct {
char charArray[2];
unsigned short value;
};
short toShort(char* value){
CharToStruct cs;
cs.charArray[0] = value[1]; // input is most significant byte first; a little-endian host stores the low byte first
cs.charArray[1] = value[0];
return cs.value;
}
When you create an array with the hex values below and call the toShort function, you will get a short with the value 3.
char array[2];
array[0] = 0x00;
array[1] = 0x03;
short i = toShort(array);
cout << i << endl; // or printf("%hd", i);
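static_cast has a different syntax, plus you need to work with pointers. Note that static_cast cannot convert between unrelated pointer types, so reinterpret_cast is needed here. What you want to do is:
unsigned short *myShort = reinterpret_cast<unsigned short*>(&packetBuffer[1]);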
Did nobody see the input was a string!
/* If it is a string as explicitly stated in the question.
*/
int byte1 = packetBuffer[1] - '0'; // convert 1st byte from char to number.
int byte2 = packetBuffer[2] - '0';
unsigned short result = (byte1 * 256) + byte2;
/* Alternatively if is an array of bytes.
*/
int byte1 = packetBuffer[1];
int byte2 = packetBuffer[2];
unsigned short result = (byte1 * 256) + byte2;
This also avoids the problems with alignment that most of the other solutions may have on certain platforms. Note: a short is at least two bytes. Some systems will give you a memory error if you try to de-reference a short pointer that is not 2-byte aligned (or aligned to whatever sizeof(short) is on your system)!
char packetBuffer[] = {1, 2, 3};
unsigned short myShort = * reinterpret_cast<unsigned short*>(&packetBuffer[1]);
I (had to) do this all the time. Big endian is an obvious problem. What will really get you is incorrect data when the machine dislikes misaligned reads (and writes)!
You may want to write a test case and an assert to see if it reads properly. Then, when run on a big endian machine or, more importantly, a machine that dislikes misaligned reads, an assert error will occur instead of a weird, hard to trace 'bug' ;)
On windows you can use:
unsigned short i = MAKEWORD(lowbyte,hibyte);
I realize this is an old thread, and I can't say that I tried every suggestion made here. I'm just making myself comfortable with MFC, and I was looking for a way to convert a uint to two bytes, and back again at the other end of a socket.
There are a lot of bit shifting examples you can find on the net, but none of them seemed to actually work. A lot of the examples seem overly complicated; I mean we're just talking about grabbing 2 bytes out of a uint, sending them over the wire, and plugging them back into a uint at the other end, right?
This is the solution I finally came up with:
class ByteConverter
{
public:
static void uIntToBytes(unsigned int theUint, char* bytes)
{
unsigned int tInt = theUint;
void *uintConverter = &tInt;
char *theBytes = (char*)uintConverter;
bytes[0] = theBytes[0];
bytes[1] = theBytes[1];
}
static unsigned int bytesToUint(char *bytes)
{
unsigned theUint = 0;
void *uintConverter = &theUint;
char *thebytes = (char*)uintConverter;
thebytes[0] = bytes[0];
thebytes[1] = bytes[1];
return theUint;
}
};
Used like this:
unsigned int theUint;
char bytes[2];
CString msg;
ByteConverter::uIntToBytes(65000,bytes);
theUint = ByteConverter::bytesToUint(bytes);
msg.Format(_T("theUint = %u"), theUint);
AfxMessageBox(msg, MB_ICONINFORMATION | MB_OK);
Hope this helps someone out.
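For comparison, a sketch of the same round trip with a fixed byte order on the wire. It swaps in shifts for the pointer copy used above, and the names uint16_to_bytes and bytes_to_uint16 are hypothetical. The class above copies the first two bytes of the unsigned int in host order, so it only round-trips between machines of the same endianness; the version below sends the most significant byte first on every host.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: split a 16-bit value into two bytes,
// most significant byte first.
inline void uint16_to_bytes(std::uint16_t value, unsigned char* bytes)
{
    assert(bytes != nullptr);
    bytes[0] = static_cast<unsigned char>(value >> 8);
    bytes[1] = static_cast<unsigned char>(value & 0xFF);
}

// Hypothetical helper: rebuild the 16-bit value from two bytes,
// most significant byte first.
inline std::uint16_t bytes_to_uint16(const unsigned char* bytes)
{
    assert(bytes != nullptr);
    return static_cast<std::uint16_t>((bytes[0] << 8) | bytes[1]);
}
```

For 65000 (0xFDE8) the bytes on the wire are 0xFD then 0xE8, and bytes_to_uint16 returns 65000 again on the receiving side.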