C++: Bits in a 64-bit integer

Hello, I have a struct here that is 7 bytes, and I'd like to write it into a 64-bit integer and later extract the struct back out of that integer.
Any ideas on this?
#include "stdafx.h"
struct myStruct
{
unsigned char a;
unsigned char b;
unsigned char b;
unsigned int someNumber;
};
int _tmain(int argc, _TCHAR* argv[])
{
myStruct * m = new myStruct();
m->a = 11;
m->b = 8;
m->c = 12;
m->someNumber = 30;
printf("\n%s\t\t%i\t%i\t%i\t%i\n\n", "struct", m->a, m->b, m->c, m->someNumber);
unsigned long num = 0;
// todo: use bitwise operations from m into num (total of 7 bytes)
printf("%s\t\t%i\n\n", "ulong", num);
m = new myStruct();
// todo: use bitwise operations from num into m;
printf("%s\t\t%i\t%i\t%i\t%i\n\n", "struct", m->a, m->b, m->c, m->someNumber);
return 0;
}

You should do something like this:
#include <cstddef>
#include <cstdint>

class structured_uint64
{
    uint64_t data;
public:
    structured_uint64(uint64_t x = 0) : data(x) {}
    operator uint64_t&() { return data; }

    uint8_t low_byte(size_t n) const { return static_cast<uint8_t>(data >> (n * 8)); }
    void low_byte(size_t n, uint8_t val)
    {
        uint64_t mask = static_cast<uint64_t>(0xff) << (8 * n);
        data = (data & ~mask) | (static_cast<uint64_t>(val) << (8 * n));
    }
    uint32_t hi_word() const { return static_cast<uint32_t>(data >> 24); }
    // et cetera
};
(there is, of course, lots of room for variation on the details of the interface and where among the 64 bits the constituents are placed)
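For illustration, a hypothetical usage sketch with the values from the question (a setter for the 32-bit field is left among the "et cetera" above):
structured_uint64 s;
s.low_byte(0, 11); // a
s.low_byte(1, 8);  // b
s.low_byte(2, 12); // c
// a matching hi_word(uint32_t) setter would place someNumber in bits 24..55

uint64_t raw = s;          // read the whole value via operator uint64_t&
uint8_t a = s.low_byte(0); // a == 11 again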
Using different types to alias the same portion of memory is a generally bad idea. The thing is, it's very valuable for the optimizer to be able to use reasoning like:
"Okay, I've read a uint64_t at the start of this block, and nowhere in the middle does the program write to any uint64_ts, therefore the value must be unchanged!"
which means it will get the wrong answer if you try to change the value of the uint64_t object through a uint32_t reference. And since this depends heavily on which optimizations are possible and actually performed, it is easy to never hit the problem in test cases but still see it in the real program you're trying to write -- and you'll spend forever trying to find the bug because you've convinced yourself it's not this problem.
So, you really should do the insertion/extraction of the fields with bit twiddling (or intrinsics, if profiling shows that this is a performance issue and there are useful ones available) rather than trying to set up a clever struct.
If you really know what you're doing, you can make the aliasing work, I believe. But it should only be done if you really know what you're doing, and that includes knowing relevant rules from the standard inside and out (which I don't, and so I can't advise you on how to make it work). And even then you probably shouldn't do it.
Also, if you intend your integral types to be a specific size, you should really use the correct types. For example, never use unsigned int for an integer that is supposed to be exactly 32 bits. Instead use uint32_t. Not only is it self-documenting, but you won't run into a nasty surprise when you try to build your program in an environment where unsigned int is not 32 bits.
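As a minimal sketch of that bit-twiddling approach with exact-width types (the field layout here is one arbitrary choice, not the only possibility):
#include <cstdint>

// pack three bytes and a 32-bit value into bits 0..55 of a uint64_t
std::uint64_t pack(std::uint8_t a, std::uint8_t b, std::uint8_t c, std::uint32_t n)
{
    return static_cast<std::uint64_t>(a)
         | static_cast<std::uint64_t>(b) << 8
         | static_cast<std::uint64_t>(c) << 16
         | static_cast<std::uint64_t>(n) << 24;
}

void unpack(std::uint64_t v, std::uint8_t& a, std::uint8_t& b, std::uint8_t& c, std::uint32_t& n)
{
    a = static_cast<std::uint8_t>(v);        // truncating casts keep the low bits
    b = static_cast<std::uint8_t>(v >> 8);
    c = static_cast<std::uint8_t>(v >> 16);
    n = static_cast<std::uint32_t>(v >> 24);
}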

Use a union. Each element of a union occupies the same address space. The struct is one element, the unsigned long long is another.
#include <stdio.h>

union data
{
    struct
    {
        unsigned char a;
        unsigned char b;
        unsigned char c;
        unsigned int d;
    } e;
    unsigned long long f;
};

int main()
{
    data dat;
    dat.f = 0xFFFFFFFFFFFFFFFFULL;
    dat.e.a = 1;
    dat.e.b = 2;
    dat.e.c = 3;
    dat.e.d = 4;
    printf("f=%016llX\n", dat.f);
    printf("%02X %02X %02X %08X\n", dat.e.a, dat.e.b, dat.e.c, dat.e.d);
    return 0;
}
Output is below, but note one byte of the original unsigned long long remains. Compilers like to align data such as 4-byte integers on addresses divisible by 4, so the three chars are followed by a pad byte, the integer sits at offset 4, and the struct has a total size of 8.
f=00000004FF030201
01 02 03 00000004
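You can verify that layout directly (a quick hypothetical check, using the union data defined above):
data dat;
printf("offset of d = %u, total size = %u\n",
       (unsigned)((char*)&dat.e.d - (char*)&dat), (unsigned)sizeof dat);
// typically prints: offset of d = 4, total size = 8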
This can be controlled in compiler-dependent fashion. Below is for Microsoft C++:
#include <stdio.h>

#pragma pack(push,1)
union data
{
    struct
    {
        unsigned char a;
        unsigned char b;
        unsigned char c;
        unsigned int d;
    } e;
    unsigned long long f;
};
#pragma pack(pop)

int main()
{
    data dat;
    dat.f = 0xFFFFFFFFFFFFFFFFULL;
    dat.e.a = 1;
    dat.e.b = 2;
    dat.e.c = 3;
    dat.e.d = 4;
    printf("f=%016llX\n", dat.f);
    printf("%02X %02X %02X %08X\n", dat.e.a, dat.e.b, dat.e.c, dat.e.d);
    return 0;
}
Note the struct now occupies seven bytes, and the highest byte of the unsigned long long remains unchanged:
f=FF00000004030201
01 02 03 00000004

Got it.
static unsigned long long compress(unsigned char a, unsigned char b, unsigned char c, unsigned int someNumber)
{
    // unsigned char parameters avoid sign extension when OR-ing
    unsigned long long x = 0;
    x = x | a;
    x = x << 8;
    x = x | b;
    x = x << 8;
    x = x | c;
    x = x << 32;
    x = x | someNumber;
    return x;
}

myStruct * decompress(unsigned long long x)
{
    myStruct * m = new myStruct();
    m->someNumber = x & 0xFFFFFFFF; // masking with & extracts each field
    x = x >> 32;
    m->c = x & 0xFF;
    x = x >> 8;
    m->b = x & 0xFF;
    x = x >> 8;
    m->a = x & 0xFF;
    return m;
}
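A quick round trip with the values from the question (a hypothetical check, not part of the original post):
unsigned long long packed = compress(11, 8, 12, 30);
myStruct * m = decompress(packed);
printf("%i %i %i %i\n", m->a, m->b, m->c, m->someNumber); // 11 8 12 30
delete m;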

Related

Why is the size of the union greater than expected?

#include <iostream>

union dbits {
    double d;
    struct {
        unsigned int M1: 20;
        unsigned int M2: 20;
        unsigned int M3: 12;
        unsigned int E: 11;
        unsigned int s: 1;
    };
};

int main(){
    std::cout << "sizeof(dbits) = " << sizeof(dbits) << '\n';
}
output: sizeof(dbits) = 16, but if
union dbits {
    double d;
    struct {
        unsigned int M1: 12;
        unsigned int M2: 20;
        unsigned int M3: 20;
        unsigned int E: 11;
        unsigned int s: 1;
    };
};
Output: sizeof(dbits) = 8
Why does the size of the union increase?
The bit fields in the two structures have the same total number of bits, so why do the unions have different sizes?
I would like to write it like this:
union dbits {
    double d;
    struct {
        unsigned long long M: 52;
        unsigned int E: 11;
        unsigned int s: 1;
    };
};
But then sizeof(dbits) = 16 rather than 8. Why?
And how convenient is it to use bit fields in a structure like this to pick apart the bits of a double?
Members of a bit field will not cross the boundary of the specified storage type. So
unsigned int M1: 20;
unsigned int M2: 20;
will be two unsigned ints, using 20 out of 32 bits each.
In your second case, 12 + 20 == 32 fits into a single unsigned int.
As for your last case, members with different storage types can never share a unit. So you get one unsigned long long and one unsigned int instead of the single unsigned long long you desired.
You should use uint64_t so you get exact bit counts. unsigned int could be anything from 16 to 128 (or more) bits.
Note: bit fields are highly implementation-defined; this is just the common way it usually works.
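A sketch of that suggestion, giving every field the same 64-bit storage type (bit-field layout is still implementation-defined, so treat this as the common behavior, not a guarantee):
#include <cstdint>
#include <iostream>

union dbits {
    double d;
    struct {
        std::uint64_t M : 52; // all three fields now share one 64-bit unit
        std::uint64_t E : 11;
        std::uint64_t s : 1;
    };
};

int main() {
    std::cout << "sizeof(dbits) = " << sizeof(dbits) << '\n'; // 8 on common ABIs
}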

Convert int bits to float verbatim and print them

I'm trying to just copy the contents of a 32-bit unsigned int to be used as a float. Not converting it, just reinterpreting the integer bits as a float. I'm aware memcpy is the most-suggested option for this. However, when I memcpy from a uint32_t to a float and print out the individual bits, I see they are quite different.
Here is my code snippet:
#include <iostream>
#include <stdint.h>
#include <cstring>

using namespace std;

void print_bits(unsigned n) {
    unsigned i;
    for (i = 1u << 31; i > 0; i /= 2)
        (n & i) ? printf("1") : printf("0");
}

union {
    uint32_t u_int;
    float u_float;
} my_union;

int main()
{
    uint32_t my_int = 0xc6f05705;
    float my_float;

    //Method 1 using memcpy
    memcpy(&my_float, &my_int, sizeof(my_float));

    //Print using function
    print_bits(my_int);
    printf("\n");
    print_bits(my_float); // converts the float's value to unsigned, not its bits

    //Print using printf
    printf("\n%0x\n", my_int);
    printf("%0x\n", my_float); // undefined: %x with a double argument

    //Method 2 using unions
    my_union.u_int = 0xc6f05705;
    printf("union int = %0x\n", my_union.u_int);
    printf("union float = %0x\n", my_union.u_float);
    return 0;
}
Outputs:
11000110111100000101011100000101
11111111111111111000011111010101
c6f05705
400865
union int = c6f05705
union float = 40087b
Can someone explain what's happening? I expected the bits to match. It didn't work with a union either.
You need to change the function print_bits to
#include <stdio.h>
#include <stdint.h>
#include <limits.h>

static inline int is_big_endian(void)
{
    const union
    {
        uint32_t i;
        char c[sizeof(uint32_t)];
    } e = { 0x01000000 };

    return e.c[0];
}

void print_bits(const void *src, unsigned int size)
{
    //Check the order of bytes in memory for this compiler/platform:
    int t, c;
    if (is_big_endian())
    {
        t = 0;
        c = 1;
    }
    else
    {
        t = (int)size - 1;
        c = -1;
    }
    for (; t >= 0 && t <= (int)size - 1; t += c)
    {   //print the bits of each byte from the MSB to the LSB
        unsigned char i;
        unsigned char n = ((const unsigned char *)src)[t];
        for (i = 1u << (CHAR_BIT - 1); i > 0; i /= 2)
        {
            printf("%d", (n & i) != 0);
        }
    }
    printf("\n");
}
and call it like this:
int a = 7;
print_bits(&a, sizeof(a));
that way there won't be any type conversion when you call print_bits and it would work for any struct size.
EDIT: I replaced 7 with CHAR_BIT - 1 because the size of a byte can be different from 8 bits.
EDIT 2: I added support for both little endian and big endian compilers.
Also, as @M.M suggested in the comments, you can use a template to make the function call print_bits(a) instead of print_bits(&a, sizeof(a)).
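That template wrapper might look like this (a sketch, assuming the print_bits above is in scope):
template <typename T>
void print_bits(const T &value)
{
    print_bits(&value, sizeof(value)); // forwards to the two-argument version
}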

Does this correctly combine two unsigned 32-bit integers into one unsigned 64-bit integer in C++?

Does this correctly combine two unsigned 32-bit integers into one unsigned 64-bit integer in C++?
std::uint32_t a = ...
std::uint32_t b = ...
std::uint64_t result = ((std::uint64_t)a << 32) | (std::uint64_t)b;
Is this code valid for all unsigned integer values of a and b?
Actually, I want a unique result value for every possible pair of a and b. The aim is to keep the size of the result minimal (in this case, we can bound it at 64 bits).
Yes, it works as you'd expect (if they are really unsigned).
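Since the mapping is invertible, every (a, b) pair yields a distinct result; a quick hypothetical check:
std::uint32_t a2 = (std::uint32_t)(result >> 32); // recovers a
std::uint32_t b2 = (std::uint32_t)result;         // recovers b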
Another method (endian-dependent, and writing into a uint64_t through a uint32_t pointer violates strict aliasing, as discussed above):
uint32_t a = xxx;
uint32_t b = xxx;
uint64_t result;

uint32_t * p = (uint32_t *)&result;
p[0] = b; // low half on a little-endian machine
p[1] = a; // high half on a little-endian machine
Or maybe better (though in C++, reading a union member other than the one last written is technically undefined):
union u3264
{
    struct
    {
        uint32_t b; // low half on a little-endian machine
        uint32_t a; // high half on a little-endian machine
    };
    uint64_t res;
};

u3264 u;
u.a = xxx;
u.b = yyy;
// 64 bit result in u.res

How to store a 64 bit integer in two 32 bit integers and convert back again

I'm pretty sure it's just a matter of some bitwise operations; I'm just not entirely sure exactly what I should be doing, and all searches just return "64 bit vs 32 bit".
pack (with typedef uint32_t u32; typedef uint64_t u64;):
u32 x, y;
u64 v = ((u64)x) << 32 | y;
unpack:
x = (u32)((v & 0xFFFFFFFF00000000ULL) >> 32);
y = (u32)(v & 0xFFFFFFFFULL);
Or this, if you're not interested in what the two 32-bit numbers mean:
u32 x[2];
u64 z;
memcpy(x, &z, sizeof(z)); // unpack z into x[0] and x[1]
memcpy(&z, x, sizeof(z)); // pack them back into z
Use a union and get rid of the bit operations:
#include <stdint.h> // for int32_t, int64_t

union {
    int64_t big;
    struct {
        int32_t x; // low half on a little-endian machine
        int32_t y;
    };
};
assert(&y == &x + 1); // y sits immediately after x in memory
Simple as that: big consists of both x and y.
I don't know if this is any better than the union or memcpy solutions, but I had to unpack/pack signed 64-bit integers and didn't really want to mask or shift anything, so I ended up simply treating the 64-bit value as two 32-bit values and assigning them directly, like so:
#include <stdio.h>
#include <stdint.h>

void repack(int64_t in)
{
    int32_t a, b;

    printf("input:    %016llx\n", (long long int) in);

    a = ((int32_t *) &in)[0]; // low half on a little-endian machine
    b = ((int32_t *) &in)[1]; // high half on a little-endian machine

    printf("unpacked: %08x %08x\n", b, a);

    ((int32_t *) &in)[0] = a;
    ((int32_t *) &in)[1] = b;

    printf("repacked: %016llx\n\n", (long long int) in);
}
The basic method is as follows:
uint64_t int64;
uint32_t int32_1, int32_2;

int32_1 = int64 & 0xFFFFFFFF;         // low 32 bits
int32_2 = (int64 >> 32) & 0xFFFFFFFF; // high 32 bits
// ...
int64 = int32_1 | ((uint64_t)int32_2 << 32);
Note that your integers must be unsigned, or the operations are undefined.
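A quick worked check of this method (values chosen arbitrarily):
uint64_t v = 0x1122334455667788ULL;
uint32_t lo = v & 0xFFFFFFFF;              // 0x55667788
uint32_t hi = (v >> 32) & 0xFFFFFFFF;      // 0x11223344
uint64_t back = lo | ((uint64_t)hi << 32); // 0x1122334455667788 again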
long long x = 0xFEDCBA9876543210LL; // long long is guaranteed at least 64 bits; plain long is not
cout << hex << "0x" << x << endl;
int a = x;         // keeps the low 32 bits
cout << hex << "0x" << a << endl;
int b = (x >> 32); // keeps the high 32 bits
cout << hex << "0x" << b << endl;
Not sure if this way of doing it is good for portability or anything else, but I use...
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#ifndef UINT32_WIDTH
#define UINT32_WIDTH 32 // in <stdint.h> only since C23, so define a fallback
#endif

typedef struct{
    struct{ // anonymous struct
        uint32_t x;
        uint32_t y;
    };
} ts_point;

typedef struct{
    struct{ // anonymous struct
        uint32_t line;
        uint32_t column;
    };
} ts_position;

bool is_little_endian(void)
{
    uint16_t n = 1; // probe a multi-byte value; a single byte would always match
    return *(uint8_t *)&n == 1;
}

int main(void)
{
    uint32_t x, y;
    uint64_t packed;
    ts_point *point;       // struct offers a "mask" to retrieve data
    ts_position *position; // in an ordered and comprehensible way.

    x = -12;
    y = -23;
    printf("at start: x = %i | y = %i\n", x, y);

    if (is_little_endian()){
        packed = (uint64_t)y << UINT32_WIDTH | (uint64_t)x;
    }else{
        packed = (uint64_t)x << UINT32_WIDTH | (uint64_t)y;
    }
    printf("packed: position = %llu\n", (unsigned long long)packed);

    point = (ts_point*)&packed;
    printf("unpacked: x = %i | y = %i\n", point->x, point->y); // access via pointer
    position = (ts_position*)&packed;
    printf("unpacked: line = %i | column = %i\n", position->line, position->column);
    return 0;
}
I like this way of doing it as it offers lots of readability and can be applied in many ways, i.e. 2x32, 4x16, 8x8, etc.
I'm new at C, so feel free to critique my code and my way of doing things... thanks!

Store an int in a char array?

I want to store a 4-byte int in a char array... such that the first 4 locations of the char array are the 4 bytes of the int.
Then, I want to pull the int back out of the array...
Also, bonus points if someone can give me code for doing this in a loop... i.e. writing 8 ints into a 32-byte array.
int har = 0x01010101;
char a[4];
int har2;
// write har into char such that:
// a[0] == 0x01, a[1] == 0x01, a[2] == 0x01, a[3] == 0x01 etc.....
// then, pull the bytes out of the array such that:
// har2 == har
Thanks guys!
EDIT: Assume ints are 4 bytes...
EDIT2: Please don't care about endianness... I will be worrying about endianness. I just want different ways to achieve the above in C/C++. Thanks
EDIT3: If you can't tell, I'm trying to write a serialization class on the low level... so I'm looking for different strategies to serialize some common data types.
Unless you care about byte order and such, memcpy will do the trick:
memcpy(a, &har, sizeof(har));
...
memcpy(&har2, a, sizeof(har2));
Of course, there's no guarantee that sizeof(int)==4 on any particular implementation (and there are real-world implementations for which this is in fact false).
Writing a loop should be trivial from here.
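For instance, that loop might look like this (a sketch, assuming 4-byte ints as in the question's EDIT):
int vals[8] = {1, 2, 3, 4, 5, 6, 7, 8};
char buf[sizeof(vals)]; // 32 bytes when int is 4 bytes

for (size_t i = 0; i < 8; i++)
    memcpy(buf + i * sizeof(int), &vals[i], sizeof(int)); // write each int

for (size_t i = 0; i < 8; i++)
    memcpy(&vals[i], buf + i * sizeof(int), sizeof(int)); // read each back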
Not the most optimal way, but it is endian-safe.
int har = 0x01010101;
char a[4];
a[0] = har & 0xff;
a[1] = (har>>8) & 0xff;
a[2] = (har>>16) & 0xff;
a[3] = (har>>24) & 0xff;
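Pulling the int back out the same endian-safe way (the reverse step, which the answer above leaves out):
int har2 = (unsigned)(unsigned char)a[0]
         | (unsigned)(unsigned char)a[1] << 8
         | (unsigned)(unsigned char)a[2] << 16
         | (unsigned)(unsigned char)a[3] << 24; // har2 == har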
#include <stdio.h>

int main(void) {
    char a[sizeof(int)];
    *((int *) a) = 0x01010101;
    printf("%d\n", *((int *) a));
    return 0;
}
Keep in mind:
"A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined."
Note: Accessing a union through an element that wasn't the last one assigned to is undefined behavior.
(assuming a platform where characters are 8 bits and ints are 4 bytes)
A bit mask of 0xFF will mask off one byte, so
char arr[4];
int a = 5;
arr[3] = a & 0xff;
arr[2] = (a & 0xff00) >>8;
arr[1] = (a & 0xff0000) >>16;
arr[0] = (a & 0xff000000)>>24;
would make arr[0] hold the most significant byte and arr[3] hold the least.
Edit: just so you understand the trick: & is bitwise 'and', whereas && is logical 'and'.
Thanks to the comments about the forgotten shifts.
#include <stdio.h>

int main() {
    typedef union foo {
        int x;
        char a[4];
    } foo;

    foo p;
    p.x = 0x01010101;

    printf("%x ", p.a[0]);
    printf("%x ", p.a[1]);
    printf("%x ", p.a[2]);
    printf("%x ", p.a[3]);

    return 0;
}
Bear in mind that a[0] holds the LSB and a[3] holds the MSB, on a little endian machine.
Don't use unions, Pavel clarifies:
"It's UB, because C++ prohibits accessing any union member other than the last one that was written to. In particular, the compiler is free to optimize away the assignment to the int member completely with the code above, since its value is not subsequently used (it only sees the subsequent read for the char[4] member, and has no obligation to provide any meaningful value there). In practice, g++ in particular is known for pulling such tricks, so this isn't just theory. On the other hand, using static_cast<void*> followed by static_cast<char*> is guaranteed to work." – Pavel Minaev
You can also use placement new for this:
void foo (int i) {
    // Note: array placement new may add unspecified overhead at the start
    // of the buffer, so this is not guaranteed to alias i's bytes exactly.
    char * c = new (&i) char[sizeof(i)];
}
#include <stdint.h>
#include <stdlib.h>

int main(int argc, char* argv[]) {
    /* 8 ints in a loop */
    int i;
    int* intPtr;
    int intArr[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    char* charArr = malloc(32);

    for (i = 0; i < 8; i++) {
        /* point intPtr at the address of charArr[i * 4], cast as int* */
        intPtr = (int*) &(charArr[i * 4]);
        *intPtr = intArr[i]; /* write int at location pointed to */
    }

    /* Read ints out */
    for (i = 0; i < 8; i++) {
        intPtr = (int*) &(charArr[i * 4]);
        intArr[i] = *intPtr;
    }

    char* myArr = malloc(13);
    int myInt;
    uint8_t* p8;   /* unsigned 8-bit integer */
    uint16_t* p16; /* unsigned 16-bit integer */
    uint32_t* p32; /* unsigned 32-bit integer */

    /* Using sizes other than 4-byte ints, set all bits in myArr to 1.  */
    /* Note: myArr[1] and myArr[5] are misaligned for 16/32-bit access, */
    /* which is undefined behavior on strict-alignment platforms.       */
    p8 = (uint8_t*) &(myArr[0]);
    p16 = (uint16_t*) &(myArr[1]);
    p32 = (uint32_t*) &(myArr[5]);
    *p8 = 255;
    *p16 = 65535;
    *p32 = 4294967295;

    /* Get the values back out */
    p16 = (uint16_t*) &(myArr[1]);
    uint16_t my16 = *p16;

    /* Put the 16-bit int into a regular int */
    myInt = (int) my16;

    free(charArr);
    free(myArr);
    return 0;
}
int i = 9;
char a = boost::lexical_cast<char>(i); // yields '9' (works for single digits only)
I found this is the best way to convert an int into a char and vice versa.
An alternative to boost::lexical_cast is sprintf.
char temp[6]; // four characters + one digit + terminating '\0'
temp[0] = 'h';
temp[1] = 'e';
temp[2] = 'l';
temp[3] = 'l';
sprintf(temp + 4, "%d", 9); // writes '9' and the '\0'
cout << temp;
Output would be: hell9
union value {
    int i;
    char bytes[sizeof(int)];
};

value v;
v.i = 2;
char* bytes = v.bytes;