Convert four bytes to Integer using C++

I am trying to convert 4 bytes to an integer using C++.
This is my code:
int buffToInteger(char * buffer)
{
int a = (int)(buffer[0] << 24 | buffer[1] << 16 | buffer[2] << 8 | buffer[3]);
return a;
}
The code above works in almost all cases, for example:
When my buffer is: "[\x00, \x00, \x40, \x00]" the code will return 16384 as expected.
But when the buffer is filled with: "[\x00, \x00, \x3e, \xe3]", the code won't work as expected and will return "ffffffe3".
Does anyone know why this happens?

Your buffer contains signed characters. So, actually, buffer[3] == -29, which upon conversion to int gets sign-extended to 0xffffffe3, and in turn (0x3e << 8) | 0xffffffe3 == 0xffffffe3.
You need to ensure your individual buffer bytes are interpreted as unsigned, either by declaring buffer as unsigned char *, or by explicitly casting:
int a = int((unsigned char)(buffer[0]) << 24 |
(unsigned char)(buffer[1]) << 16 |
(unsigned char)(buffer[2]) << 8 |
(unsigned char)(buffer[3]));

In the expression buffer[0] << 24, buffer[0] is promoted to int before the shift is performed (integral promotion is applied to each operand of a shift on its own, so the type of 24 does not matter).
On your system a char is apparently signed, and will then be sign extended when converted to int.

There's an implicit promotion to a signed int in your shifts.
That's because char is (apparently) signed on your platform (the common case) and << promotes its operands to int implicitly. In fact, none of this would work otherwise, because << 8 (and higher) would shift all the bits out of an 8-bit value!
If you're stuck with using a buffer of signed chars this will give you what you want:
#include <iostream>
#include <iomanip>
int buffToInteger(char * buffer)
{
int a = static_cast<int>(static_cast<unsigned char>(buffer[0]) << 24 |
static_cast<unsigned char>(buffer[1]) << 16 |
static_cast<unsigned char>(buffer[2]) << 8 |
static_cast<unsigned char>(buffer[3]));
return a;
}
int main(void) {
char buff[4]={0x0,0x0,0x3e,static_cast<char>(0xe3)};
int a=buffToInteger(buff);
std::cout<<std::hex<<a<<std::endl;
return 0;
}
Be careful about bit shifting on signed values. Promotions don't just add bytes but may convert values.
For example a gotcha here is that you can't use static_cast<unsigned int>(buffer[1]) (etc.) directly because that converts the signed char value to a signed int and then reinterprets that value as an unsigned.
If anyone asks me, all implicit numeric conversions are bad. No program should have so many that they would become a chore. It's a softness in C++, inherited from C, that causes all sorts of problems that far exceed their value.
It's even worse in C++ because they make the already confusing overloading rules even more confusing.

I think this could also be done with memcpy:
#include <cstring>
int buffToInteger(char* buffer)
{
int a;
std::memcpy( &a, buffer, sizeof( int ) );
return a;
}
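This can be faster than the example in the original post, because it just copies the bytes "as is" and there is no need for operations such as bit shifts.
It also doesn't cause any signed-unsigned issues. Note, however, that the bytes are interpreted in the host's native byte order: on a little-endian machine such as x86, the buffer [\x00, \x00, \x3e, \xe3] yields 0xe33e0000 rather than 0x3ee3.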

char buffer[4];
int a;
a = *(int*)&buffer;
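This takes the address of buffer, casts it to a pointer to int and then dereferences it. Note that this violates the strict-aliasing rule and may perform a misaligned read, so it is formally undefined behavior; the result also depends on the host's byte order.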

int buffToInteger(char * buffer)
{
return *reinterpret_cast<int*>(buffer);
}
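This conversion is simple and fast. We only tell the compiler to treat a byte array in memory as a single integer. (Formally this is undefined behavior under the strict-aliasing and alignment rules, even though it often appears to work.)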

Related

Bitwise operations on elements from array of chars

I have made an array of hexadecimal numbers that I would like to add together bitwise. In my program I want to add 0xFF with 0x7F00. Here is my approach:
#include <iostream>
using namespace std;
int main() {
char data[2] = {0xFF, 0x7F};
cout << (data[0] | (data[1] << 8)) << endl;
system("pause");
return 0;
}
I expect the result to be 0x7FFF which is 32767 in decimal, but I get -1 (0xFFFFFFFF in hex).
The problem you're having stems from two facts:
The bitwise operators require integral promotion of both operands.
char can be either signed or unsigned
Promotion will convert values of smaller types (like char or short) to int, and as part of that signed values will be sign-extended. If char is signed, then the value 0xff will be converted to the (32-bit) int value 0xffffffff, which is -1.
It doesn't matter what value you use in the bitwise OR, the result will still be 0xffffffff.
The simple solution is to explicitly use unsigned char (or even better uint8_t) as the type for the array elements:
uint8_t data[2] = {0xFF, 0x7F};

What is *(int*)&data[18] actually doing in this code?

I came across this syntax for reading a BMP file in C++
#include <fstream>
int main() {
std::ifstream in("filename.bmp", std::ifstream::binary);
in.seekg(0, in.end);
std::streamsize size = in.tellg();
in.seekg(0);
unsigned char * data = new unsigned char[size];
in.read((char *)data, size);
int width = *(int*)&data[18];
// omitted remainder for minimal example
delete[] data;
}
and I don't understand what the line
int width = *(int*)&data[18];
is actually doing. Why doesn't a simple cast from unsigned char to int, int width = (int)data[18];, work?
Note
As @user4581301 indicated in the comments, this depends on the implementation and will fail in many instances. And as @NathanOliver- Reinstate Monica and @ChrisMM pointed out, this is undefined behavior and the result is not guaranteed.
According to the bitmap header format, the width of the bitmap in pixels is stored as a signed 32-bit integer beginning at byte offset 18. The syntax
int width = *(int*)&data[18];
reads the bytes at offsets 18 through 21, inclusive (the 19th through 22nd bytes, assuming a 32-bit int) and interprets the result as an integer.
How?
&data[18] gets the address of the unsigned char at index 18
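(int*) casts the address from unsigned char* to int*, so that dereferencing it reads a whole int rather than a single byte (casting the address to a plain int instead would also truncate it on 64-bit architectures)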
*(int*) dereferences the address to get the referred int value
So basically, it takes the address of data[18] and reads the bytes at that address as if they were an integer.
Why doesn't a simple cast to `int` work?
sizeof(data[18]) is 1, because unsigned char is one byte (0-255). A simple cast from unsigned char to int yields a 4-byte int, but it carries only one byte of information from data: the value of data[18] itself. The cast to (int*) followed by a dereference instead reads sizeof(int) bytes (4 on typical systems) starting at that address, that is, the bytes at offsets 18 through 21, inclusive. The size of the pointer &data[18] itself (4 bytes on 32-bit and 8 bytes on 64-bit systems) plays no role here; only the pointed-to type int determines how many bytes are read. This is illustrated by the following example:
#include <iostream>
#include <bitset>
#include <string>
int main() {
unsigned char data[22] = {}; // buffer just large enough for offsets 18-21
// Populate 18-21 with a recognizable pattern for demonstration
std::bitset<8> _bits(std::string("10011010"));
unsigned long bits = _bits.to_ulong();
for (int ii = 18; ii < 22; ii ++) {
data[ii] = static_cast<unsigned char>(bits);
}
std::cout << "data[18] -> 1 byte "
<< std::bitset<32>(data[18]) << std::endl;
std::cout << "*(unsigned short*)&data[18] -> 2 bytes "
<< std::bitset<32>(*(unsigned short*)&data[18]) << std::endl;
std::cout << "*(int*)&data[18] -> 4 bytes "
<< std::bitset<32>(*(int*)&data[18]) << std::endl;
}
data[18] -> 1 byte 00000000000000000000000010011010
*(unsigned short*)&data[18] -> 2 bytes 00000000000000001001101010011010
*(int*)&data[18] -> 4 bytes 10011010100110101001101010011010

C/C++ Converting a 64 bit integer to char array

I have the following simple program that uses a union to convert between a 64 bit integer and its corresponding byte array:
#include <cstdint>
#include <iostream>
using namespace std;
union u
{
uint64_t ui;
char c[sizeof(uint64_t)];
};
int main(int argc, char *argv[])
{
u test;
test.ui = 0x0123456789abcdefLL;
for(unsigned int idx = 0; idx < sizeof(uint64_t); idx++)
{
cout << "test.c[" << idx << "] = 0x" << hex << +test.c[idx] << endl;
}
return 0;
}
What I would expect as output is:
test.c[0] = 0xef
test.c[1] = 0xcd
test.c[2] = 0xab
test.c[3] = 0x89
test.c[4] = 0x67
test.c[5] = 0x45
test.c[6] = 0x23
test.c[7] = 0x1
But what I actually get is:
test.c[0] = 0xffffffef
test.c[1] = 0xffffffcd
test.c[2] = 0xffffffab
test.c[3] = 0xffffff89
test.c[4] = 0x67
test.c[5] = 0x45
test.c[6] = 0x23
test.c[7] = 0x1
I'm seeing this on Ubuntu 14.04 LTS with GCC.
I've been trying to get my head around this for some time now. Why are the first 4 elements of the char array displayed as 32 bit integers, with 0xffffff prepended to them? And why only the first 4, why not all of them?
Interestingly enough, when I use the array to write to a stream (which was the original purpose of the whole thing), the correct values are written. But comparing the array char by char obviously leads to problems, since the first 4 chars are not equal to 0xef, 0xcd, and so on.
Using char is not the right thing to do since it could be signed or unsigned. Use unsigned char.
union u
{
uint64_t ui;
unsigned char c[sizeof(uint64_t)];
};
char gets promoted to an int because of the prepended unary + operator. Since your chars are signed, any element with the highest bit set to 1 is interpreted as a negative number and promoted to an integer with the same negative value. There are a few different ways to solve this:
Drop the +: ... << test.c[idx] << .... This prints the char as a character rather than a number, so it is probably not a good solution.
Declare c as unsigned char. It is still promoted to int, but since its value is never negative, no sign extension occurs.
Explicitly cast test.c[idx] to unsigned char before promoting it: ... << +(unsigned char)test.c[idx] << ...
Set the upper bytes of the integer to zero using bitwise &: ... << (test.c[idx] & 0xFF) << .... The parentheses are required because << binds more tightly than &. This will only display the lowest-order byte no matter how the char is promoted.
Use either unsigned char or use test.c[idx] & 0xff to avoid sign extension when a char value > 0x7f is converted to int.
It comes down to unsigned char vs. signed char and how each is converted to int.
The unary plus causes the char to be promoted to a int (integral promotion). Because you have signed chars the value will be used as such and the other bytes will reflect that.
It is not true that only the four are ints; they all are. You just don't see it in the representation, since leading zeroes are not shown.
Either use unsigned chars or & 0xff for promotion to get the desired result.

>> and << and data types in C++

I have looked over the guide given in this answer, but I still don't understand bit-shifting. In particular, I am confused about how the data types come into play.
The following:
unsigned int a = pow(2,31);
cout << (a << 1);
indeed produces 0, as I expect, because the int is 32 bits, so shifting the 1 left pushes it out entirely.
But the following
unsigned int a = 1;
unsigned char b = (unsigned char)a;
cout << (unsigned int)(b<<8);
produces 256. Why is that? My guess would have been that a char is 8 bit and so moving 1 left 8 places should give zero.
Is there a function/shift that does this? (i.e. evaluates 1<<8 to 0).
Narrow integral values are promoted to int or unsigned int before being used. It's called integral promotion.

Integer into char array

I need to convert an integer value into a char array at the bit level. Let's say int has 4 bytes and I need to split it into 4 chunks of length 1 byte as a char array.
Example:
int a = 22445;
// this is in binary 00000000 00000000 01010111 10101101
...
//and the result I expect
char b[4];
b[0] = 0; //first chunk
b[1] = 0; //second chunk
b[2] = 87; //third chunk - in binary 1010111
b[3] = 173; //fourth chunk - 10101101
I need this conversion to be really fast, if possible without any loops (some tricks with bit operations, perhaps). The goal is thousands of such conversions per second.
I'm not sure if I recommend this, but you can #include <arpa/inet.h> (for htonl) and <stdint.h> (for uint32_t) and write:
*(uint32_t *)b = htonl((uint32_t)a);
(The htonl is to ensure that the integer is in big-endian order before you store it.)
int a = 22445;
char *b = (char *)&a;
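char b2 = *(b+2); // 87 on a big-endian machine (0 on little-endian)
char b3 = *(b+3); // 173 (0xAD, or -83 if char is signed) on big-endian (0 on little-endian)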
Depending on how you want negative numbers represented, you can simply convert to unsigned and then use masks and shifts:
unsigned char b[4];
unsigned ua = a;
b[0] = (ua >> 24) & 0xff;
b[1] = (ua >> 16) & 0xff;
b[2] = (ua >> 8) & 0xff;
b[3] = ua & 0xff;
(Due to the C rules for converting negative numbers to unsigned, this will produce the two's complement representation for negative numbers, which is almost certainly what you want.)
To access the binary representation of any type, you can cast a pointer to a char-pointer:
T x; // anything at all!
// In C++
unsigned char const * const p = reinterpret_cast<unsigned char const *>(&x);
/* In C */
unsigned char const * const p = (unsigned char const *)(&x);
// Example usage:
for (std::size_t i = 0; i != sizeof(T); ++i)
std::printf("Byte %zu is 0x%02X.\n", i, static_cast<unsigned>(p[i]));
That is, you can treat p as the pointer to the first element of an array of type unsigned char[sizeof(T)]. (In your case, T = int.)
I used unsigned char here so that you don't get any sign extension problems when printing the binary value (e.g. through printf in my example). If you want to write the data to a file, you'd use char instead.
You have already accepted an answer, but I will still give mine, which might suit you better (or the same...). This is what I tested with:
int a[3] = {22445, 13, 1208132};
for (int i = 0; i < 3; i++)
{
unsigned char * c = (unsigned char *)&a[i];
cout << (unsigned int)c[0] << endl;
cout << (unsigned int)c[1] << endl;
cout << (unsigned int)c[2] << endl;
cout << (unsigned int)c[3] << endl;
cout << "---" << endl;
}
...and it works for me. Now, I know you requested a char array, but this is equivalent. You also requested that c[0] == 0, c[1] == 0, c[2] == 87, c[3] == 173 for the first case; here the order is reversed (on a little-endian machine such as x86).
Basically, you use the SAME value, you only access it differently.
Why haven't I used htonl(), you might ask?
Well, since performance is an issue, I think you're better off not using it: it seems like a waste of (precious?) cycles to call a function that ensures the bytes are in some order when they may already be in that order on some systems, and when you could instead have modified your code to use a different order if that was not the case.
So instead, you could have checked the order before, and then used different loops (more code, but improved performance) based on what the result of the test was.
Also, if you don't know if your system uses a 2 or 4 byte int, you could check that before, and again use different loops based on the result.
Point is: you will have more code, but you will not waste cycles in a critical area, which is inside the loop.
If you still have performance issues, you could unroll the loop (duplicate code inside the loop, and reduce loop counts) as this will also save you a couple of cycles.
Note that using c[0], c[1], etc. is equivalent to *(c), *(c+1) as far as C++ is concerned.
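// Reading intAsBytes after writing int32 is allowed in C; in C++ it is technically undefined behavior.
typedef union{
unsigned char intAsBytes[4]; // "byte" in the original, i.e. an 8-bit unsigned type
int int32;
}U_INTtoBYTE;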