c++ - store byte[4] in an int - c++

I want to take a byte array with 4 bytes in it, and store it in an int.
For example (non-working code):
unsigned char _bytes[4];
int * combine;
_bytes[0] = 1;
_bytes[1] = 1;
_bytes[2] = 1;
_bytes[3] = 1;
combine = &_bytes[0];
I do not want to use bit shifting to put the bytes in the int, I would like to point at the bytes memory and use them as an int if possible.

In Standard C++ it's not possible to do this reliably. The strict aliasing rule says that when you read through an expression of type int, it must actually designate an int object (or a const int etc.) otherwise it causes undefined behaviour.
However you can do the opposite: declare an int and then fill in the bytes:
int combine;
unsigned char *bytes = reinterpret_cast<unsigned char *>(&combine);
bytes[0] = 1;
bytes[1] = 1;
bytes[2] = 1;
bytes[3] = 1;
std::cout << combine << std::endl;
Of course, which value you get out of this depends on how your system represents integers. If you want your code to use the same mapping on different systems then you can't use memory aliasing; you'd have to use an equation instead.

Related

Divide char* into few variables

I have some char array: char char[8] which containing for example two ints, on first 4 indexes is first int, and on next 4 indexes there is second int.
char array[8] = {0,0,0,1,0,0,0,1};
int a = array[0-3]; // =1;
int b = array[4-8]; // =1;
How to cast this array to two int's?
There can be any other type, not necessarily int, but this is only some example:
I know i can copy this array to two char arrays which size will be 4 and then cast each of array to int. But i think this isn't nice, and breaks the principle of clean code.
If your data has the correct endianness, you can extract blitable types from a byte buffer with memcpy:
int8_t array[8] = {0,0,0,1,0,0,0,1};
int32_t a, b;
memcpy(&a, array + 0, sizeof a);
memcpy(&b, array + 4, sizeof b);
While #Vivek is correct that ntohl can be used to normalize endianness, you have to do that as a second step. Do not play games with pointers as that violates strict aliasing and leads to undefined behavior (in practice, either alignment exceptions or the optimizer discarding large portions of your code as unreachable).
int8_t array[8] = {0,0,0,1,0,0,0,1};
int32_t tmp;
memcpy(&tmp, array + 0, sizeof tmp);
int a = ntohl(tmp);
memcpy(&tmp, array + 4, sizeof tmp);
int b = ntohl(tmp);
Please note that almost all optimizing compilers are smart enough to not call a function when they see memcpy with a small constant count argument.
Let's use a little bit of the C++ algorithms, such as std::accumulate:
#include <numeric>
#include <iostream>
int getTotal(const char* value, int start, int end)
{
return std::accumulate(value + start, value + end, 0,
[](int n, char ch){ return n * 10 + (ch-'0');});
}
int main()
{
char value[8] = {'1','2','3','4','0','0','1','4'};
int total1 = getTotal(value, 0, 4);
int total2 = getTotal(value, 4, 8);
std::cout << total1 << " " << total2;
}
Note the usage of std::accumulate and the lambda function. All we did was have a running total, multiplying each subtotal by 10. The character is translated to a number by simply subtracting '0'.
Live Example
You can type cast the bytes from the array to an int *. Then dereferencing will cause 4 bytes to be read as an int. Then doing an ntohl, will ensure that the bytes in the int are arranged as per the host order.
char array[8] = {0,0,0,1,0,0,0,1};
int a = *((int *)array);
int b = *((int *)&array[4]);
a = ntohl(a);
b = ntohl(b);
This will set a and b to 1 on both little and big endian systems.
If the compiler is set for strict aliasing, memcpy could be used to achieve the same, as follows:
char array[8] = {0,0,0,1,0,0,0,1};
int a, b;
memcpy(&a, array, sizeof(int));
memcpy(&b, array+4, sizeof(int));
a = ntohl(a);
b = ntohl(b);

Convert uint64_t to uint8_t[8]

How can I convert uint64_t to uint8_t[8] without loosing information in C++?
I tried the following:
uint64_t number = 23425432542254234532;
uint8_t result[8];
for(int i = 0; i < 8; i++) {
std::memcpy(result[i], number, 1);
}
You are almost there. Firstly, the literal 23425432542254234532 is too big to fit in uint64_t.
Secondly, as you can see from the documentation, std::memcpy has the following declaration:
void * memcpy ( void * destination, const void * source, size_t num );
As you can see, it takes pointers (addresses) as arguments. Not uint64_t, nor uint8_t. You can easily get the address of the integer using the address-of operator.
Thridly, you are only copying the first byte of the integer into each array element. You would need to increment the input pointer in every iteration. But the loop is unnecessary. You can copy all bytes in one go like this:
std::memcpy(result, &number, sizeof number);
Do realize that the order of the bytes depend on the endianness of the cpu.
First, do you want the conversion to be big-endian, or little-endian? Most of the previous answers are going to start giving you the bytes in the opposite order, and break your program,` as soon as you switch architectures.
If you need to get consistent results, you would want to convert your 64-bit input into big-endian (network) byte order, or perhaps to little-endian. For example, on GNU glib, the function is GUINT64_TO_BE(), but there is an equivalent built-in function for most compilers.
Having done that, there are several alternatives:
Copy with memcpy() or memmove()
This is the method that the language standard guarantees will work, although here I use one function from a third-party library (to convert the argument to big-endian byte order on all platforms). For example:
#include <stdint.h>
#include <stdlib.h>
#include <glib.h>
union eight_bytes {
uint64_t u64;
uint8_t b8[sizeof(uint64_t)];
};
eight_bytes u64_to_eight_bytes( const uint64_t input )
{
eight_bytes result;
const uint64_t big_endian = (uint64_t)GUINT64_TO_BE((guint64)input);
memcpy( &result.b8, &big_endian, sizeof(big_endian) );
return result;
}
On Linux x86_64 with clang++ -std=c++17 -O, this compiles to essentially the instructions:
bswapq %rdi
movq %rdi, %rax
retq
If you wanted the results in little-endian order on all platforms, you could replace GUINT64_TO_BE() with GUINT64_TO_LE() and remove the first instruction, then declare the function inline to remove the third instruction. (Or, if you’re certain that cross-platform compatibility does not matter, you might risk just omitting the normalization.)
So, on a modern, 64-bit compiler, this code is just as efficient as anything else. On another target, it might not be.
Type-Punning
The common way to write this in C would be to declare the union as before, set its uint64_t member, and then read its uint8_t[8] member. This is legal in C.
I personally like it because it allows me to express the entire operation as static single assignments.
However, in C++, it is formally undefined behavior. In practice, all C++ compilers I’m aware of support it for Plain Old Data (the formal term in the language standard), of the same size, with no padding bits, but not for more complicated classes that have virtual function tables and the like. It seems more likely to me that a future version of the Standard will officially support type-punning on POD than that any important compiler will ever break it silently.
The C++ Guidelines Way
Bjarne Stroustrup recommended that, if you are going to type-pun instead of copying, you use reinterpret_cast, such as
uint8_t (&array_of_bytes)[sizeof(uint64_t)] =
*reinterpret_cast<uint8_t(*)[sizeof(uint64_t)]>(
&proper_endian_uint64);
His reasoning was that both an explicit cast and type-punning through a union are undefined behavior, but the cast makes it blatant and unmistakable that you are shooting yourself in the foot on purpose, whereas reading a different union member than the active one can be a very subtle bug.
If I understand correctly you can do this that way for instance:
uint64_t number = 23425432542254234532;
uint8_t *p = (uint8_t *)&number;
//if you need a copy
uint8_t result[8];
for(int i = 0; i < 8; i++) {
result[i] = p[i];
}
When copying memory around between incompatible types, the first thing to be aware of is strict aliasing - you don't want to alias pointers incorrectly. Alignment is also to be considered.
You were almost there, the for is not needed.
uint64_t number = 0x2342543254225423; // trimmed to fit
uint8_t result[sizeof(number)];
std::memcpy(result, &number, sizeof(number));
Note: be aware of the endianness of the platform as well.
Either use a union, or do it with bitwise operations- memcpy is for blocks of memory and might not be the best option here.
uint64_t number = 23425432542254234532;
uint8_t result[8];
for(int i = 0; i < 8; i++) {
result[i] = uint8_t((number >> 8*(7 - i)) & 0xFF);
}
Or, although I'm told this breaks the rules, it works on my compiler:
union
{
uint64_t a;
uint8_t b[8];
};
a = 23425432542254234532;
//Can now read off the value of b
uint8_t copy[8];
for(int i = 0; i < 8; i++)
{
copy[i]= b[i];
}
The packing and unpacking can be done with masks. One more thing to worry about is the byte order. packing and unpacking should use the same byte order. Beware - This is not super efficient implementation and do not come with guarantees on small CPU that are not native 64-bit.
void unpack_uint64(uint64_t number, uint8_t *result) {
result[0] = number & 0x00000000000000FF ; number = number >> 8 ;
result[1] = number & 0x00000000000000FF ; number = number >> 8 ;
result[2] = number & 0x00000000000000FF ; number = number >> 8 ;
result[3] = number & 0x00000000000000FF ; number = number >> 8 ;
result[4] = number & 0x00000000000000FF ; number = number >> 8 ;
result[5] = number & 0x00000000000000FF ; number = number >> 8 ;
result[6] = number & 0x00000000000000FF ; number = number >> 8 ;
result[7] = number & 0x00000000000000FF ;
}
uint64_t pack_uint64(uint8_t *buffer) {
uint64_t value ;
value = buffer[7] ;
value = (value << 8 ) + buffer[6] ;
value = (value << 8 ) + buffer[5] ;
value = (value << 8 ) + buffer[4] ;
value = (value << 8 ) + buffer[3] ;
value = (value << 8 ) + buffer[2] ;
value = (value << 8 ) + buffer[1] ;
value = (value << 8 ) + buffer[0] ;
return value ;
}
#include<cstdint>
#include<iostream>
struct ByteArray
{
uint8_t b[8] = { 0,0,0,0,0,0,0,0 };
};
ByteArray split(uint64_t x)
{
ByteArray pack;
const uint8_t MASK = 0xFF;
for (auto i = 0; i < 7; ++i)
{
pack.b[i] = x & MASK;
x = x >> 8;
}
return pack;
}
int main()
{
uint64_t val_64 = UINT64_MAX;
auto pack = split(val_64);
for (auto i = 0; i < 7; ++i)
{
std::cout << (uint32_t)pack.b[i] << std::endl;
}
system("Pause");
return 0;
}
Although union approach which is addressed by Straw1239 is better and cleaner.Please do care about compiler/platform compatibility with endianness.

Converting 4 bytes in little endian order into an unsigned integer

I have a string of 256*4 bytes of data. These 256* 4 bytes need to be converted into 256 unsigned integers. The order in which they come is little endian, i.e. the first four bytes in the string are the little endian representation of the first integer, the next 4 bytes are the little endian representation of the next integer, and so on.
What is the best way to parse through this data and merge these bytes into unsigned integers? I know I have to use bitshift operators but I don't know in what way.
Hope this helps you
unsigned int arr[256];
char ch[256*4] = "your string";
for(int i = 0,k=0;i<256*4;i+=4,k++)
{
arr[k] = ch[i]|ch[i+1]<<8|ch[i+2]<<16|ch[i+3]<<24;
}
Alternatively, we can use C/C++ casting to interpret a char buffer as an array of unsigned int. This can help get away with shifting and endianness dependency.
#include <stdio.h>
int main()
{
char buf[256*4] = "abcd";
unsigned int *p_int = ( unsigned int * )buf;
unsigned short idx = 0;
unsigned int val = 0;
for( idx = 0; idx < 256; idx++ )
{
val = *p_int++;
printf( "idx = %d, val = %d \n", idx, val );
}
}
This would print out 256 values, the first one is
idx = 0, val = 1684234849
(and all remaining numbers = 0).
As a side note, "abcd" converts to 1684234849 because it's run on X86 (Little Endian), in which "abcd" is 0x64636261 (with 'a' is 0x61, and 'd' is 0x64 - in Little Endian, the LSB is in the smallest address). So 0x64636261 = 1684234849.
Note also, if using C++, reinterpret_cast should be used in this case:
const char *p_buf = "abcd";
const unsigned int *p_int = reinterpret_cast< const unsigned int * >( p_buf );
If your host system is little-endian, just read along 4 bytes, shift properly and copy them to int
char bytes[4] = "....";
int i = bytes[0] | (bytes[1] << 8) | (bytes[2] << 16) | (bytes[3] << 24);
If your host is big-endian, do the same and reverse the bytes in the int, or reverse it on-the-fly while copying with bit-shifting, i.e. just change the indexes of bytes[] from 0-3 to 3-0
But you shouldn't even do that just copy the whole char array to the int array if your PC is in little-endian
#define LEN 256
char bytes[LEN*4] = "blahblahblah";
unsigned int uint[LEN];
memcpy(uint, bytes, sizeof bytes);
That said, the best way is to avoid copying at all and use the same array for both types
union
{
char bytes[LEN*4];
unsigned int uint[LEN];
} myArrays;
// copy data to myArrays.bytes[], do something with those bytes if necessary
// after populating myArrays.bytes[], get the ints by myArrays.uint[i]

Integer into char array

I need to convert integer value into char array on bit layer. Let's say int has 4 bytes and I need to split it into 4 chunks of length 1 byte as char array.
Example:
int a = 22445;
// this is in binary 00000000 00000000 1010111 10101101
...
//and the result I expect
char b[4];
b[0] = 0; //first chunk
b[1] = 0; //second chunk
b[2] = 87; //third chunk - in binary 1010111
b[3] = 173; //fourth chunk - 10101101
I need this conversion make really fast, if possible without any loops (some tricks with bit operations perhaps). The goal is thousands of such conversions in one second.
I'm not sure if I recommend this, but you can #include <stddef.h> and <sys/types.h> and write:
*(u32_t *)b = htonl((u32_t)a);
(The htonl is to ensure that the integer is in big-endian order before you store it.)
int a = 22445;
char *b = (char *)&a;
char b2 = *(b+2); // = 87
char b3 = *(b+3); // = 173
Depending on how you want negative numbers represented, you can simply convert to unsigned and then use masks and shifts:
unsigned char b[4];
unsigned ua = a;
b[0] = (ua >> 24) & 0xff;
b[1] = (ua >> 16) & 0xff;
b[2] = (ua >> 8) & 0xff
b[3] = ua & 0xff;
(Due to the C rules for converting negative numbers to unsigned, this will produce the twos complement representation for negative numbers, which is almost certainly what you want).
To access the binary representation of any type, you can cast a pointer to a char-pointer:
T x; // anything at all!
// In C++
unsigned char const * const p = reinterpret_cast<unsigned char const *>(&x);
/* In C */
unsigned char const * const p = (unsigned char const *)(&x);
// Example usage:
for (std::size_t i = 0; i != sizeof(T); ++i)
std::printf("Byte %u is 0x%02X.\n", p[i]);
That is, you can treat p as the pointer to the first element of an array unsigned char[sizeof(T)]. (In your case, T = int.)
I used unsigned char here so that you don't get any sign extension problems when printing the binary value (e.g. through printf in my example). If you want to write the data to a file, you'd use char instead.
You have already accepted an answer, but I will still give mine, which might suit you better (or the same...). This is what I tested with:
int a[3] = {22445, 13, 1208132};
for (int i = 0; i < 3; i++)
{
unsigned char * c = (unsigned char *)&a[i];
cout << (unsigned int)c[0] << endl;
cout << (unsigned int)c[1] << endl;
cout << (unsigned int)c[2] << endl;
cout << (unsigned int)c[3] << endl;
cout << "---" << endl;
}
...and it works for me. Now I know you requested a char array, but this is equivalent. You also requested that c[0] == 0, c[1] == 0, c[2] == 87, c[3] == 173 for the first case, here the order is reversed.
Basically, you use the SAME value, you only access it differently.
Why haven't I used htonl(), you might ask?
Well since performance is an issue, I think you're better off not using it because it seems like a waste of (precious?) cycles to call a function which ensures that bytes will be in some order, when they could have been in that order already on some systems, and when you could have modified your code to use a different order if that was not the case.
So instead, you could have checked the order before, and then used different loops (more code, but improved performance) based on what the result of the test was.
Also, if you don't know if your system uses a 2 or 4 byte int, you could check that before, and again use different loops based on the result.
Point is: you will have more code, but you will not waste cycles in a critical area, which is inside the loop.
If you still have performance issues, you could unroll the loop (duplicate code inside the loop, and reduce loop counts) as this will also save you a couple of cycles.
Note that using c[0], c[1] etc.. is equivalent to *(c), *(c+1) as far as C++ is concerned.
typedef union{
byte intAsBytes[4];
int int32;
}U_INTtoBYTE;

Store an int in a char array?

I want to store a 4-byte int in a char array... such that the first 4 locations of the char array are the 4 bytes of the int.
Then, I want to pull the int back out of the array...
Also, bonus points if someone can give me code for doing this in a loop... IE writing like 8 ints into a 32 byte array.
int har = 0x01010101;
char a[4];
int har2;
// write har into char such that:
// a[0] == 0x01, a[1] == 0x01, a[2] == 0x01, a[3] == 0x01 etc.....
// then, pull the bytes out of the array such that:
// har2 == har
Thanks guys!
EDIT: Assume int are 4 bytes...
EDIT2: Please don't care about endianness... I will be worrying about endianness. I just want different ways to acheive the above in C/C++. Thanks
EDIT3: If you can't tell, I'm trying to write a serialization class on the low level... so I'm looking for different strategies to serialize some common data types.
Unless you care about byte order and such, memcpy will do the trick:
memcpy(a, &har, sizeof(har));
...
memcpy(&har2, a, sizeof(har2));
Of course, there's no guarantee that sizeof(int)==4 on any particular implementation (and there are real-world implementations for which this is in fact false).
Writing a loop should be trivial from here.
Not the most optimal way, but is endian safe.
int har = 0x01010101;
char a[4];
a[0] = har & 0xff;
a[1] = (har>>8) & 0xff;
a[2] = (har>>16) & 0xff;
a[3] = (har>>24) & 0xff;
#include <stdio.h>
int main(void) {
char a[sizeof(int)];
*((int *) a) = 0x01010101;
printf("%d\n", *((int *) a));
return 0;
}
Keep in mind:
A pointer to an object or incomplete type may be converted to a pointer to a different
object or incomplete type. If the resulting pointer is not correctly aligned for the
pointed-to type, the behavior is undefined.
Note: Accessing a union through an element that wasn't the last one assigned to is undefined behavior.
(assuming a platform where characters are 8bits and ints are 4 bytes)
A bit mask of 0xFF will mask off one character so
char arr[4];
int a = 5;
arr[3] = a & 0xff;
arr[2] = (a & 0xff00) >>8;
arr[1] = (a & 0xff0000) >>16;
arr[0] = (a & 0xff000000)>>24;
would make arr[0] hold the most significant byte and arr[3] hold the least.
edit:Just so you understand the trick & is bit wise 'and' where as && is logical 'and'.
Thanks to the comments about the forgotten shift.
int main() {
typedef union foo {
int x;
char a[4];
} foo;
foo p;
p.x = 0x01010101;
printf("%x ", p.a[0]);
printf("%x ", p.a[1]);
printf("%x ", p.a[2]);
printf("%x ", p.a[3]);
return 0;
}
Bear in mind that the a[0] holds the LSB and a[3] holds the MSB, on a little endian machine.
Don't use unions, Pavel clarifies:
It's U.B., because C++ prohibits
accessing any union member other than
the last one that was written to. In
particular, the compiler is free to
optimize away the assignment to int
member out completely with the code
above, since its value is not
subsequently used (it only sees the
subsequent read for the char[4]
member, and has no obligation to
provide any meaningful value there).
In practice, g++ in particular is
known for pulling such tricks, so this
isn't just theory. On the other hand,
using static_cast<void*> followed by
static_cast<char*> is guaranteed to
work.
– Pavel Minaev
You can also use placement new for this:
void foo (int i) {
char * c = new (&i) char[sizeof(i)];
}
#include <stdint.h>
int main(int argc, char* argv[]) {
/* 8 ints in a loop */
int i;
int* intPtr
int intArr[8] = {1, 2, 3, 4, 5, 6, 7, 8};
char* charArr = malloc(32);
for (i = 0; i < 8; i++) {
intPtr = (int*) &(charArr[i * 4]);
/* ^ ^ ^ ^ */
/* point at | | | */
/* cast as int* | | */
/* Address of | */
/* Location in char array */
*intPtr = intArr[i]; /* write int at location pointed to */
}
/* Read ints out */
for (i = 0; i < 8; i++) {
intPtr = (int*) &(charArr[i * 4]);
intArr[i] = *intPtr;
}
char* myArr = malloc(13);
int myInt;
uint8_t* p8; /* unsigned 8-bit integer */
uint16_t* p16; /* unsigned 16-bit integer */
uint32_t* p32; /* unsigned 32-bit integer */
/* Using sizes other than 4-byte ints, */
/* set all bits in myArr to 1 */
p8 = (uint8_t*) &(myArr[0]);
p16 = (uint16_t*) &(myArr[1]);
p32 = (uint32_t*) &(myArr[5]);
*p8 = 255;
*p16 = 65535;
*p32 = 4294967295;
/* Get the values back out */
p16 = (uint16_t*) &(myArr[1]);
uint16_t my16 = *p16;
/* Put the 16 bit int into a regular int */
myInt = (int) my16;
}
char a[10];
int i=9;
a=boost::lexical_cast<char>(i)
found this is the best way to convert char into int and vice-versa.
alternative to boost::lexical_cast is sprintf.
char temp[5];
temp[0]="h"
temp[1]="e"
temp[2]="l"
temp[3]="l"
temp[5]='\0'
sprintf(temp+4,%d",9)
cout<<temp;
output would be :hell9
union value {
int i;
char bytes[sizof(int)];
};
value v;
v.i = 2;
char* bytes = v.bytes;