Portable conversion from unsigned to signed in C++

I have std::vector<unsigned short int> vals for which I need to invert the order of the bytes (assume 2) and store them as short int. I was doing it as:
std::vector<short int> result;
for (unsigned short int& x : vals) {
    x = ((x << 8) | (x >> 8));
    result.push_back(static_cast<short int>(x));
}
Reading online, I find that static_cast from unsigned to signed has implementation-defined behavior when the value is out of range for the destination type (before C++20). I also found std::bit_cast, which preserves the bits and interprets them in the new type.
Does that mean that using std::bit_cast<short int>(x) above should be preferred over static_cast?
I tried, and both give the same results for me. Is it correct to assume that bit_cast will give the same results to anyone else using my code, while static_cast could give them different results?

If you want to handle bytes, you should not convert them to a non-byte type (especially not short int and friends, because they do not need to be exactly 16 bits).
You should read the bytes as an array of char or std::byte. Then you can swap those values in a safe and portable manner.
Converting those bytes to a numeric type cannot be done in a fully portable way.
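A minimal sketch of that idea, assuming C++17's std::byte and an even-length buffer (the function name is my own):

#include <cstddef> // std::byte, std::size_t
#include <utility> // std::swap
#include <vector>

// Swap each consecutive pair of bytes in place, without ever
// reinterpreting the storage as a wider numeric type.
void swap_byte_pairs(std::vector<std::byte>& buf) {
    for (std::size_t i = 0; i + 1 < buf.size(); i += 2)
        std::swap(buf[i], buf[i + 1]);
}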

Related

Understanding typecasting (pointers)

I am reading Beej's Guide to Network Programming and I am having trouble understanding a function. The function expects a char * pointer, but it dereferences the pointer and casts the result to (unsigned long int) and performs some bitwise operations. Why couldn't we just pass it as an (unsigned int *) instead of an (unsigned char *)? Also, if the parameter were replaced by (void *) and then inside the code we did something like:
*(unsigned long int *)buf[0] << 24
would we get the same result? (Sorry, this is my first time asking a question here, so let me know if any more info is required.)
unsigned long int unpacku32(unsigned char *buf)
{
    return ((unsigned long int)buf[0] << 24) |
           ((unsigned long int)buf[1] << 16) |
           ((unsigned long int)buf[2] <<  8) |
           buf[3];
}
What you're suggesting is not guaranteed to work. Unless buf points to an actual unsigned long, you're attempting to read an object of one type as another which is not allowed (unless you're reading as an unsigned char). There could be further issues if the pointer value you create is not properly aligned for its type.
Then there is also the issue of endianness. Bytes sent over a network are typically sent in big-endian format, i.e. most significant byte first. If your system is little-endian, it will interpret the bytes in the reverse order.
The function you posted demonstrates the proper way of deserializing an unsigned long from a byte buffer in a standard compliant manner.
That would make the result dependent on the endianness of the platform. So we pick out the parts in a defined order to make it platform neutral.
buf[0] is treated as an 8-bit unsigned value. When we write (unsigned long int)buf[0] << 24, the cast tells the compiler to treat it not as an 8-bit value but as an unsigned long (at least 32 bits wide, often 64), so there is room for the shifted bits.
Only buf[0] is shifted by 24; buf[1] and the other elements get their own, smaller shifts.
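A concrete illustration of how the shifted bytes combine (values picked arbitrarily):

// buf = {0x12, 0x34, 0x56, 0x78}
// (unsigned long int)buf[0] << 24  ->  0x12000000
// (unsigned long int)buf[1] << 16  ->  0x00340000
// (unsigned long int)buf[2] <<  8  ->  0x00005600
//                    buf[3]        ->  0x00000078
// OR-ed together                   ->  0x12345678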
If you want to convert, say, the string "aabbccd" to an unsigned long and we don't care about endianness, it can be done like below. Note, though, that the reinterpret_cast here violates strict aliasing (and possibly alignment), so it is technically undefined behavior and merely happens to work on common compilers:
char* str = const_cast<char *>("aabbccd\0");
unsigned long value = *(reinterpret_cast<unsigned long *>(str));
std::cout << value << std::endl;
std::cout << reinterpret_cast<char *>(&value) << std::endl;
It should be pointed out that unsigned long can store at most 8 chars here, because it is a 64-bit integer on this platform.
However, if many platforms are going to use the same data, doing it like this may not be enough, due to endianness. The approach given in your book is, as someone mentioned, platform neutral.
The function expects a char * pointer but it dereferences the pointer and casts the result to (unsigned long int) and performs some bitwise operations.
Actually, what the code does is use the array index operator to pull out the first byte from the buffer, cast that byte to an unsigned long int, and then do some bitwise operations. The pointer that's dereferenced is an unsigned char *; it has nothing to do with long integers.
Why couldn't we just pass it as an (unsigned int *) instead of an (unsigned char *)?
Because it isn't a pointer to any kind of integer. It's a pointer to a buffer of unsigned char, i.e. bytes. Treating a pointer as if it were a pointer to a different type is likely to lead to a violation of the "Strict Aliasing Rule" (which I encourage you to read about).
Also, if the parameter were replaced by (void *) and then inside the code we did something like *(unsigned long int *)buf[0] << 24, would we get the same result?
No. If you define buf as a void*, then buf[0] is a meaningless expression. If buf is defined as, or cast to, an unsigned long int *, then buf[0] is an unsigned long int, not the unsigned char that the algorithm is expecting. There will almost certainly be too many bits set (as many as 64, not 8) and the result of the expression will be invalid.
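For the reverse direction, a serializer in the same spirit (a sketch; the name packu32 is made up here, not taken from the guide) writes the most significant byte first, so the pair round-trips regardless of host endianness:

// Store val into buf in big-endian (network) byte order.
void packu32(unsigned long int val, unsigned char *buf)
{
    buf[0] = (val >> 24) & 0xFF;
    buf[1] = (val >> 16) & 0xFF;
    buf[2] = (val >> 8) & 0xFF;
    buf[3] = val & 0xFF;
}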

How would I put 4 chars into a single int? [closed]

I am working on a replica GM buffer system in C++ to get familiar with bits and such, and it's not bad, but I've run into a problem. How do I push 4 different chars into an int? I am not the best at bitwise stuff; I've never used it, and I've got no idea how to.
In the thing, I have an array of chars of size byteArraySize, and when I call the grab-int function, it takes the bytes from bufferPointer + 4 down to bufferPointer, backwards, to grab the int properly.
I read a bit on bitshifting (lol), and I thought I could bitshift every char's bits into position. I've just got no clue where to start.
Any help is greatly appreciated.
Pedantically, in pure standard C++14 or C++11, you probably cannot.
AFAIK, nothing forbids a hypothetical C++14 implementation from giving all of char, short, unsigned short, int, unsigned int, long, unsigned long, long long, unsigned long long the same size (at least the same internal representation), all of them 64 bits (or 96, or 128) wide and all of sizeof 1. The recent C and C++ standards mandate that long long has at least 64 bits.
IIRC, some weird C implementation on top of some Common Lisp does similar things.
But of course, there is no such C++14 implementation in practice.
In practice, on most implementations, chars are 8-bit bytes (perhaps signed, perhaps unsigned) and ints are often 32-bit words (e.g. std::int32_t), and you obviously could code
inline int pack4chars(char c1, char c2, char c3, char c4) {
    return (int)(((unsigned int)(unsigned char)c1 << 24)
               | ((unsigned int)(unsigned char)c2 << 16)
               | ((unsigned int)(unsigned char)c3 << 8)
               |  (unsigned int)(unsigned char)c4);
}
The cast to (unsigned char) is needed because some implementations have signed chars and others unsigned ones; the additional cast to (unsigned int) keeps the left shift from spilling into the sign bit of int, which would be undefined behavior.
Read also about endianness, serialization, and htonl(3).
Yes, you can pack 4 chars (actually sizeof(int) chars) into an int. Here's how you could do it:
#include <climits> // CHAR_BIT
#include <cstddef> // size_t

unsigned int packChars(unsigned char *c)
{
    unsigned int val = 0;
    for (size_t idx = 0; idx < sizeof(unsigned int); ++idx) {
        val |= static_cast<unsigned int>(c[idx]) << (idx * CHAR_BIT);
    }
    return val;
}
I'm using unsigned types, because bit shifting gets tricky when sign bits are involved. Also note that the code above is intentionally generic in the sizes used: sizeof(unsigned int) gives you the number of char units which fit into an unsigned int, and CHAR_BIT (from <climits>) specifies the number of bits in a char.
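For instance, a hypothetical call (assuming 8-bit chars and a 32-bit unsigned int):

unsigned char bytes[4] = {0x78, 0x56, 0x34, 0x12};
unsigned int v = packChars(bytes); // v == 0x12345678: byte 0 lands in the low-order bits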
First of all, you should be aware that sizeof(int) does not have to be 4 * sizeof(char). The standard only guarantees that sizeof(int) >= sizeof(char) and nothing more.
In fact, int can be the same size as char (or bigger), but you never know unless you check.
One possible solution is to use a union, in which all members start at the same offset in memory. (Strictly speaking, reading a union member other than the one last written is undefined behavior in C++, though the major compilers support it as an extension.)
Example:
union Color
{
std::uint32_t m_rgba;
struct
{
std::uint8_t m_a;
std::uint8_t m_b;
std::uint8_t m_g;
std::uint8_t m_r;
};
};
Color white = { 0xffffffff };
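A quick illustration of how the components map, assuming a little-endian host (on a big-endian one the mapping reverses):

Color c = { 0x11223344 };
// Little endian: the lowest-addressed member holds the least significant
// byte, so c.m_a == 0x44, c.m_b == 0x33, c.m_g == 0x22, c.m_r == 0x11.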

How to cast char array to int at non-aligned position?

Is there a way in C/C++ to cast a char array to an int at any position?
I tried the following, but it automatically aligns to the nearest 32 bits (on a 32-bit architecture) if I try to use pointer arithmetic with non-const offsets:
unsigned char data[8];
data[0] = 0; data[1] = 1; ... data[7] = 7;
int32_t p = 3;
int32_t d1 = *((int*)(data+3)); // = 0x03040506 CORRECT
int32_t d2 = *((int*)(data+p)); // = 0x00010203 WRONG
Update:
As stated in the comments, the input comes in tuples of 3 and I cannot change that. I want to convert 3 values to an int for further processing, and this conversion should be as fast as possible. The solution does not have to be cross-platform: I am working with a very specific compiler and processor, so it can be assumed that it is a 32-bit architecture with big endian. The lowest byte of the result does not matter to me (see above).
My main questions at the moment are: Why does d1 have the correct value but d2 does not? Is this also true for other compilers? Can this behavior be changed?
No, you can't do that in a portable way.
The behaviour encountered when casting a char* to an int* and dereferencing it is undefined in both C and C++ (possibly for the very reasons you've spotted: ints may be aligned on 4-byte boundaries while data is, of course, contiguous).
(The fact that data+3 works but data+p doesn't is possibly due to compile-time vs. runtime evaluation.)
Also note that the signedness of char is not specified in either C or C++, so you should use signed char or unsigned char if you're writing code like this.
Your best bet is to use the bitwise shift operators (>> and <<) together with | and & to assemble char values into an int. Also consider using int32_t in case you build for targets with 16- or 64-bit ints.
There is no way; converting a pointer to a wrongly aligned one is undefined.
You can use memcpy to copy the char array into an int32_t:
int32_t d = 0;
memcpy(&d, data + 3, sizeof d); // copies exactly 4 bytes, since int32_t is exactly 32 bits
Most compilers have built-in functions for memcpy with a constant size argument, so it's likely that this won't produce any runtime overhead.
Even though a cast like you've shown is allowed for correctly aligned pointers, dereferencing such a pointer is a violation of strict aliasing. An object with an effective type of char[] must not be accessed through an lvalue of type int.
In general, type-punning is endianness-dependent, and converting a char array representing RGB colours is probably easier to do in an endianness-agnostic way, something like
int32_t d = (int32_t)data[2] << 16 | (int32_t)data[1] << 8 | data[0];
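For the asker's specific case (big-endian target, 3-byte tuples, low byte of the result irrelevant), a shift-based read at an arbitrary offset p could look like this sketch:

// Unsigned arithmetic avoids shifting into the sign bit; bytes data[p],
// data[p+1] and data[p+2] land in the top three bytes, matching the
// big-endian layout of d1 above, and the low byte stays zero.
int32_t d = (int32_t)((uint32_t)data[p] << 24
                    | (uint32_t)data[p + 1] << 16
                    | (uint32_t)data[p + 2] << 8);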

size of a hex pattern in cpp

I have a hex pattern stored in a variable; how do I know what the size of the hex pattern is?
E.g. --
#define MY_PATTERN 0xFFFF
now I want to know the size of MY_PATTERN, to use somewhere in my code.
sizeof (MY_PATTERN)
this is giving me warning -- "integer conversion resulted in truncation".
How can I fix this ? What is the way I should write it ?
The pattern can increase or decrease in size so I can't hard code it.
Don't do it.
There's no such thing in C++ as a "hex pattern". What you actually use is an integer literal. See paragraph "The type of the literal". Thus, sizeof (0xffff) is equal to sizeof(int). And the bad thing is: the exact size may vary.
From the design point of view, I can't really think of a situation where such a solution is acceptable. You're not even deriving a type from a literal value, which would be suspicious as well, but at least a typesafe solution. Sizes of values are mostly used in operations working with memory buffers directly, like memcpy() or fwrite(). Sizes defined in such indirect ways lead to a very brittle binary interface and maintenance difficulties. What if you compile a program on both x86 and Motorola 68000 machines and want them to interoperate via a network protocol, or want to write some files on the first machine and read them on the other? sizeof(int) is 4 for the first and 2 for the second. It will break.
Instead, explicitly use the exactly sized types, like int8_t, uint32_t, etc. They're defined in the <cstdint> header.
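For example, replacing the macro with a fixed-width constant (a sketch, keeping the name from the question):

#include <cstdint>

constexpr std::uint16_t MY_PATTERN = 0xFFFF; // exactly 2 bytes on every platform
static_assert(sizeof(MY_PATTERN) == 2, "pattern must be 16 bits");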
This will solve your problem:
#define MY_PATTERN 0xFFFF

struct TypeInfo
{
    template<typename T>
    static size_t SizeOfType(T) { return sizeof(T); }
};

int main()
{
    size_t size_of_type = TypeInfo::SizeOfType(MY_PATTERN);
}
As pointed out by Nighthawk441, you can just do:
sizeof(MY_PATTERN);
Just make sure to use a size_t wherever you are getting a warning and that should solve your problem.
You could explicitly typedef various types to hold hex numbers with restricted sizes such that:
typedef unsigned char one_byte_hex;
typedef unsigned short two_byte_hex;
typedef unsigned int four_byte_hex;
one_byte_hex pattern = 0xFF;
two_byte_hex bigger_pattern = 0xFFFF;
four_byte_hex big_pattern = 0xFFFFFFFF;
//sizeof(pattern) == 1
//sizeof(bigger_pattern) == 2
//sizeof(big_pattern) == 4
four_byte_hex new_pattern = static_cast<four_byte_hex>(pattern);
//sizeof(new_pattern) == 4
It would be easier to just treat all hex numbers as unsigned ints regardless of pattern used though.
Alternatively, you could put together a function which checks how many times it can shift the bits of the pattern until it's 0.
size_t sizeof_pattern(unsigned int pattern)
{
    size_t bits = 0;
    size_t bytes = 0;
    unsigned int tmp = pattern;
    while (tmp >> 1 != 0) {
        bits++;
        tmp = tmp >> 1;
    }
    bytes = (bits + 1) / 8; // add 1 to bits to shift the range from 0-31 to 1-32 so we can divide properly; 8 bits per byte
    if ((bits + 1) % 8 != 0) {
        bytes++; // one more byte is needed to store the remaining bits
    }
    return bytes;
}
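For instance, a few values it returns (my own quick illustration):

sizeof_pattern(0xFF);    // 1 byte  (8 significant bits)
sizeof_pattern(0xFFFF);  // 2 bytes (16 significant bits)
sizeof_pattern(0x1FFFF); // 3 bytes (17 significant bits)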

C++: how to cast 2 bytes in an array to an unsigned short

I have been working on a legacy C++ application and am definitely outside of my comfort-zone (a good thing). I was wondering if anyone out there would be so kind as to give me a few pointers (pun intended).
I need to cast 2 bytes in an unsigned char array to an unsigned short. The bytes are consecutive.
For an example of what I am trying to do:
I receive a string from a socket and place it in an unsigned char array. I can ignore the first byte, and then the next 2 bytes should be converted to an unsigned short. This will be on Windows only, so there are no big/little endian issues (that I am aware of).
Here is what I have now (not working obviously):
//packetBuffer is an unsigned char array containing the string "123456789" for testing
//I need to convert bytes 2 and 3 into the short, 2 being the most significant byte
//so I would expect to get 515 (2*256 + 3); instead, all the code I have tried gives me
//either errors or 2 (only converting one byte)
unsigned short myShort;
myShort = static_cast<unsigned short>(packetBuffer[1]);
Well, you are widening the char into a short value. What you want is to interpret two bytes as a short. static_cast cannot cast from unsigned char* to unsigned short*. You have to cast to void*, then to unsigned short*:
unsigned short *p = static_cast<unsigned short*>(static_cast<void*>(&packetBuffer[1]));
Now, you can dereference p and get the short value. But the problem with this approach is that you cast from unsigned char*, to void* and then to some different type. The Standard doesn't guarantee the address remains the same (and in addition, dereferencing that pointer would be undefined behavior). A better approach is to use bit-shifting, which will always work:
unsigned short p = (packetBuffer[1] << 8) | packetBuffer[2];
This is probably well below what you care about, but keep in mind that you could easily get an unaligned access doing this. x86 is forgiving: the fault that an unaligned access causes is handled internally and ends up with a copy and return of the value, so your app won't know any different (though it's significantly slower than an aligned access). If, however, this code will run on a non-x86 platform (you don't mention the target platform, so I'm assuming x86 desktop Windows), then doing this will cause a processor data abort and you'll have to manually copy the data to an aligned address before trying to cast it.
In short, if you're going to be doing this access a lot, you might look at adjusting the code so as not to have unaligned reads, and you'll see a performance benefit.
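If you do have to read potentially unaligned data, a memcpy-based helper (a sketch of my own, not from this thread) sidesteps both the alignment and aliasing concerns, and compilers typically turn it into a single load where that is safe:

#include <cstring>

unsigned short readShort(const unsigned char *p)
{
    unsigned short v;
    std::memcpy(&v, p, sizeof v); // byte order is whatever the host uses
    return v;
}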
unsigned short myShort = *(unsigned short *)&packetBuffer[1];
The bit shift above has a potential bug:
unsigned short p = (packetBuffer[1] << 8) | packetBuffer[2];
Note that in C and C++ the operands are promoted to int before the shift, so on common platforms no bits are actually lost; but on a platform where int is only 16 bits wide, shifting a byte value left by 8 could overflow, leaving you with only packetBuffer[2].
Despite that, this is still preferred to pointers. To make the widening explicit I spend a few extra lines of code; other than the quite-literal zero optimization, it results in the same machine code:
unsigned short p;
p = packetBuffer[1]; p <<= 8; p |= packetBuffer[2];
Or to save some clock cycles and not shift the bits off the end:
unsigned short p;
p = (((unsigned short)packetBuffer[1])<<8) | packetBuffer[2];
You have to be careful with pointers: the optimizer will bite you, as will memory alignment and a long list of other problems. Yes, done right it is faster; done wrong, the bug can linger for a long time and strike when least desired.
Say you were lazy and wanted to do some 16-bit math on an 8-bit array (little endian):
unsigned short *s;
unsigned char b[10];

s = (unsigned short *)&b[0];
if (b[0] & 7)
{
    *s = *s + 8;
    *s &= ~7;
}
do_something_with(b);
*s = *s + 8;
do_something_with(b);
*s = *s + 8;
do_something_with(b);
There is no guarantee that a perfectly bug free compiler will create the code you expect. The byte array b sent to the do_something_with() function may never get modified by the *s operations. Nothing in the code above says that it should. If you don't optimize your code then you may never see this problem (until someone does optimize or changes compilers or compiler versions). If you use a debugger you may never see this problem (until it is too late).
The compiler doesn't see the connection between s and b, they are two completely separate items. The optimizer may choose not to write *s back to memory because it sees that *s has a number of operations so it can keep that value in a register and only save it to memory at the end (if ever).
There are three basic ways to fix the pointer problem above:
Declare s as volatile.
Use a union.
Use a function or functions whenever changing types (as sketched below).
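A minimal sketch of that last option (hypothetical helpers; going through memcpy makes the compiler see the dependency between the byte array and the 16-bit values):

#include <cstring>

// All type changes go through these two functions, so the compiler
// knows the byte array is read and written as 16-bit values.
unsigned short load16(const unsigned char *p)
{
    unsigned short v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

void store16(unsigned char *p, unsigned short v)
{
    std::memcpy(p, &v, sizeof v);
}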
You should not cast an unsigned char pointer into an unsigned short pointer (or, for that matter, cast a pointer to a smaller data type into a pointer to a larger one), because the address is assumed to be aligned correctly. A better approach is to shift the bytes into a real unsigned short object, or to memcpy them into one.
No doubt you can adjust the compiler settings to get around this limitation, but this is a very subtle thing that will break in the future if the code gets passed around and reused.
Maybe this is a very late solution, but I just want to share it with you. When you want to convert primitives or other types you can use a union (note that the byte order below assumes a little-endian host). See below:
union CharToStruct {
    char charArray[2];
    unsigned short value;
};

short toShort(char* value) {
    CharToStruct cs;
    cs.charArray[0] = value[1]; // the most significant byte of the short is not the first byte of the char array
    cs.charArray[1] = value[0];
    return cs.value;
}
When you create an array with the hex values below and call the toShort function, you will get a short value of 3.
char array[2];
array[0] = 0x00;
array[1] = 0x03;
short i = toShort(array);
cout << i << endl; // or printf("%hd", i);
static_cast has a different syntax, and you'd need to work with pointers; but note that static_cast cannot convert an unsigned char* to an unsigned short* directly. You would need reinterpret_cast, which brings back the alignment and aliasing problems discussed above:
unsigned short *myShort = reinterpret_cast<unsigned short*>(&packetBuffer[1]);
Did nobody see the input was a string!
/* If it is a string as explicitly stated in the question.
*/
int byte1 = packetBuffer[1] - '0'; // convert 1st byte from char to number.
int byte2 = packetBuffer[2] - '0';
unsigned short result = (byte1 * 256) + byte2;
/* Alternatively if is an array of bytes.
*/
int byte1 = packetBuffer[1];
int byte2 = packetBuffer[2];
unsigned short result = (byte1 * 256) + byte2;
This also avoids the alignment problems that most of the other solutions may have on certain platforms. Note: a short is at least two bytes. Most systems will give you a memory fault if you try to dereference a short pointer that is not 2-byte aligned (or whatever sizeof(short) is on your system)!
char packetBuffer[] = {1, 2, 3};
unsigned short myShort = * reinterpret_cast<unsigned short*>(&packetBuffer[1]);
I (had to) do this all the time. Big endian is an obvious problem. What will really get you is incorrect data when the machine dislikes misaligned reads (and writes)!
You may want to write a test case with an assert to see if it reads properly, so that when it runs on a big-endian machine, or more importantly on a machine that dislikes misaligned reads, an assert error occurs instead of a weird, hard-to-trace 'bug' ;)
On Windows you can use:
unsigned short i = MAKEWORD(lowbyte,hibyte);
I realize this is an old thread, and I can't say that I tried every suggestion made here. I'm just making myself comfortable with MFC, and I was looking for a way to convert a uint to two bytes, and back again at the other end of a socket.
There are a lot of bit-shifting examples you can find on the net, but none of them seemed to actually work. A lot of the examples seem overly complicated; I mean, we're just talking about grabbing 2 bytes out of a uint, sending them over the wire, and plugging them back into a uint at the other end, right?
This is the solution I finally came up with:
class ByteConverter
{
public:
static void uIntToBytes(unsigned int theUint, char* bytes)
{
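// NOTE: copies only the first two bytes of the uint's object representation,
// so this round-trips only values below 65536 on a little-endian host, and
// the wire format depends on host byte order.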
unsigned int tInt = theUint;
void *uintConverter = &tInt;
char *theBytes = (char*)uintConverter;
bytes[0] = theBytes[0];
bytes[1] = theBytes[1];
}
static unsigned int bytesToUint(char *bytes)
{
unsigned theUint = 0;
void *uintConverter = &theUint;
char *thebytes = (char*)uintConverter;
thebytes[0] = bytes[0];
thebytes[1] = bytes[1];
return theUint;
}
};
Used like this:
unsigned int theUint;
char bytes[2];
CString msg;
ByteConverter::uIntToBytes(65000,bytes);
theUint = ByteConverter::bytesToUint(bytes);
msg.Format(_T("theUint = %d"), theUint);
AfxMessageBox(msg, MB_ICONINFORMATION | MB_OK);
Hope this helps someone out.