Reverse a number's bits - C++

Here is a C++ class for reversing bits, from LeetCode discuss: https://leetcode.com/discuss/29324/c-solution-9ms-without-loop-without-calculation
For example, given input 43261596 (represented in binary as 00000010100101000001111010011100), return 964176192 (represented in binary as 00111001011110000010100101000000).
Can anyone explain how it works? Thank you very much!
class Solution {
public:
    uint32_t reverseBits(uint32_t n) {
        struct bs
        {
            unsigned int _00:1; unsigned int _01:1; unsigned int _02:1; unsigned int _03:1;
            unsigned int _04:1; unsigned int _05:1; unsigned int _06:1; unsigned int _07:1;
            unsigned int _08:1; unsigned int _09:1; unsigned int _10:1; unsigned int _11:1;
            unsigned int _12:1; unsigned int _13:1; unsigned int _14:1; unsigned int _15:1;
            unsigned int _16:1; unsigned int _17:1; unsigned int _18:1; unsigned int _19:1;
            unsigned int _20:1; unsigned int _21:1; unsigned int _22:1; unsigned int _23:1;
            unsigned int _24:1; unsigned int _25:1; unsigned int _26:1; unsigned int _27:1;
            unsigned int _28:1; unsigned int _29:1; unsigned int _30:1; unsigned int _31:1;
        } *b = (bs*)&n,
          c =
          {
              b->_31, b->_30, b->_29, b->_28
            , b->_27, b->_26, b->_25, b->_24
            , b->_23, b->_22, b->_21, b->_20
            , b->_19, b->_18, b->_17, b->_16
            , b->_15, b->_14, b->_13, b->_12
            , b->_11, b->_10, b->_09, b->_08
            , b->_07, b->_06, b->_05, b->_04
            , b->_03, b->_02, b->_01, b->_00
          };
        return *(unsigned int *)&c;
    }
};

Consider casting as providing a different layout stencil on memory.
Using this stencil picture, the code lays a stencil of 32 one-bit fields over the memory of an unsigned integer.
So instead of treating the memory as a single uint32_t, it treats the same memory as 32 individual bits.
A pointer to the 32-bit structure is created.
The pointer is assigned to the same memory location as the uint32_t variable.
The pointer will allow different treatment of the memory location.
A temporary variable, of 32-bits (using the structure), is created.
The variable is initialized using an initialization list.
The bit fields in the initialization list are from the original variable, listed in reverse order.
So, in the list:
new bit 0 <-- old bit 31
new bit 1 <-- old bit 30
The foundation of this approach relies on initialization lists.
The author is letting the compiler reverse the bits.
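For comparison, here is a minimal sketch of the same reversal written as an explicit loop (the name reverseBitsLoop is invented for illustration; this is not the posted solution, just an equivalent formulation that makes visible the work the initialization list asks the compiler to do):

uint32_t reverseBitsLoop(uint32_t n) {
    uint32_t result = 0;
    for (int i = 0; i < 32; ++i) {
        result = (result << 1) | (n & 1); // take the lowest bit of n...
        n >>= 1;                          // ...and append it to result
    }
    return result;
}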

The solution uses brute force to reverse the bits.
It declares a bitfield structure (that's when the members are followed by :1) with 32 bit fields of one bit each.
The 32-bit input is then viewed as such a structure, by casting the address of the input to a pointer to the structure. Then c is declared as a variable of that type, initialized by reversing the order of the bits.
Finally, the bitfield represented by c is reinterpreted as an integer and you're done.
The generated assembly is not very interesting, as the gcc explorer shows:
https://goo.gl/KYHDY6

It doesn't convert per se; it just looks at the same memory address differently. It uses the value of the int n, but takes a pointer to that address and typecasts the pointer, and that way you can interpret the number as a struct of 32 individual bits. So through this struct b you have access to the individual bits of the number.
Then, of a new struct c, each bit is bluntly set by putting bit 31 of the number in bit 0 of the output struct c, bit 30 in bit 1, etcetera.
After that, the value at the memory location of the struct is returned.

First of all, the posted code has a small bug. The line
return *(unsigned int *)&c;
will not return an accurate number if sizeof(unsigned int) is not equal to sizeof(uint32_t).
That line should be
return *(uint32_t*)&c;
Coming to the question of how it works, I will try to explain it with a smaller type, a uint8_t.
The function
uint8_t reverseBits(uint8_t n) {
    struct bs
    {
        unsigned int _00:1; unsigned int _01:1; unsigned int _02:1; unsigned int _03:1;
        unsigned int _04:1; unsigned int _05:1; unsigned int _06:1; unsigned int _07:1;
    } *b = (bs*)&n,
      c =
      {
          b->_07, b->_06, b->_05, b->_04
        , b->_03, b->_02, b->_01, b->_00
      };
    return *(uint8_t *)&c;
}
uses a local struct. The local struct is defined as:
struct bs
{
unsigned int _00:1; unsigned int _01:1; unsigned int _02:1; unsigned int _03:1;
unsigned int _04:1; unsigned int _05:1; unsigned int _06:1; unsigned int _07:1;
};
That struct has eight members. Each member of the struct is a bitfield of width 1, so only 8 bits of the object are actually used for data (note that, because the declared type of the bitfields is unsigned int, sizeof(bs) will typically still come out as 4 on common compilers; the code only ever reads back the first byte).
If you separate the definition of the struct and the variables of that type, the function will be:
uint8_t reverseBits(uint8_t n) {
    struct bs
    {
        unsigned int _00:1; unsigned int _01:1; unsigned int _02:1; unsigned int _03:1;
        unsigned int _04:1; unsigned int _05:1; unsigned int _06:1; unsigned int _07:1;
    };
    bs *b = (bs*)&n;
    bs c =
    {
        b->_07, b->_06, b->_05, b->_04
      , b->_03, b->_02, b->_01, b->_00
    };
    return *(uint8_t *)&c;
}
Now, let's say the input to the function is 0xB7, which is 1011 0111 in binary. The line
bs *b = (bs*)&n;
says:
Take the address of n ( &n )
Treat it like it is a pointer of type bs* ( (bs*)&n )
Assign the pointer to a variable. (bs *b =)
By doing that, we are able to pick each bit of n and get their values by using the members of b. At the end of that line,
The value of b->_00 is 1
The value of b->_01 is 0
The value of b->_02 is 1
The value of b->_03 is 1
The value of b->_04 is 0
The value of b->_05 is 1
The value of b->_06 is 1
The value of b->_07 is 1
The statement
bs c =
{
b->_07, b->_06, b->_05, b->_04
, b->_03, b->_02, b->_01, b->_00
};
simply creates c such that the bits of c are reversed from the bits of *b.
The line
return *(uint8_t *)&c;
says:
Take the address of c, whose value is the bit pattern 1110 1101.
Treat it like it is a pointer of type uint8_t*.
Dereference the pointer and return the resulting uint8_t.
That returns a uint8_t whose value is bitwise reversed from the input argument.
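A small test driver for that walkthrough (a sketch; the main function below is not part of the answer, it assumes the uint8_t reverseBits shown above is in scope, and the result relies on the implementation laying out the bitfields from the least significant bit, as typical GCC/Clang builds do):

#include <cstdint>
#include <cstdio>

// reverseBits is the uint8_t version shown above.
int main() {
    uint8_t in  = 0xB7;             // 1011 0111
    uint8_t out = reverseBits(in);  // expected 0xED = 1110 1101
    std::printf("%02x -> %02x\n", in, out);
}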

This isn't exactly obfuscated but a comment or two would assist the innocent. The key is in the middle of the variable declarations, and the first step is to recognize that there is only one line of 'code' here, everything else is variable declarations and initialization.
Between declaration and initialization we find:
} *b = (bs*)&n,
c =
{
This declares a variable 'b' which is a pointer (*) to the struct "bs" just defined. It then casts the address of the function argument 'n', a uint32_t, to the type pointer-to-bs, and assigns it to 'b', effectively creating a union of uint32_t and the bit array bs.
A second variable, an actual struct bs, named "c", is then declared, and it is initialized through the pointer 'b'. b->_31 initializes c._00, and so on.
So after "b" and "c" are created, in that order, there's nothing left to do but return the value of "c".
The author of the code, and the compiler, know that after a struct definition ends, and before the closing ";", variables of that type or of types related to it can be declared, and that's why @Thomas Matthews closes with, "The author is letting the compiler reverse the bits."
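A hedged miniature of that declaration pattern, just to show the grammar (the names Pair, p, pp, and q are invented for illustration):

struct Pair { int first; int second; } p = { 1, 2 },                  // define the type and one object,
                                       *pp = &p,                      // a pointer to that object,
                                       q = { pp->second, pp->first }; // and a second object built through the pointer

The init-declarators run in order, so q can safely be built from p through pp, just as c is built from n through b above.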

Related

Is it dangerous to cast int * to unsigned int *?

I have a variable of type int *alen. I am trying to pass it to a function:
typedef int(__stdcall *Tfnc)(
unsigned int *alen
);
with casting
(*Tfnc)( (unsigned int *)alen )
Can I expect problems if the value is never negative?
Under the C++ standard, what you are doing is undefined behavior. The memory layout of unsigned and signed ints is not guaranteed to be compatible, as far as I know.
On most platforms (which use 2s complement integers), this will not be a problem.
The remaining issue is strict aliasing, where the compiler is free to presume that pointers to one type and pointers to another type are not pointers to the same thing.
typedef int(__stdcall *Tfnc)(
    unsigned int *alen
);

int test() {
    int x = 3;
    Tfnc pf = [](unsigned int* bob) { *bob = 2; return 0; };
    pf((unsigned int*)&x);   // writes to x through an unsigned int*
    return x;                // the compiler may assume x is still 3
}
The above code might be allowed to ignore the modification made to x through the unsigned int*, even on 2's complement hardware.
That is the price of undefined behavior.
No, it won't be a problem, as long as the int value you pass is not negative.
But if the given value is negative, then the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n, where n is the number of bits used to represent the unsigned type).
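As a small illustration of that modulo rule (a sketch, assuming a 32-bit unsigned int):

#include <iostream>

int main() {
    int negative = -4;
    unsigned int converted = (unsigned int)negative;
    // With a 32-bit unsigned int this prints 4294967292, i.e. 2^32 - 4.
    std::cout << converted << '\n';
}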

Sizeof struct is not clear even with padding

In the following struct, I feel the size should be 20, but it's coming out to be 24.
class X {
unsigned int a;
unsigned int b;
double c;
unsigned int d;
};
Why can't the compiler place d in bytes 16-20?
This is so that c remains aligned on an 8-byte boundary in an array of X.
X x[2];
X[0]:
0-3 unsigned int a
4-7 unsigned int b
8-15 double c
16-19 unsigned int d
20-23 [PAD]
X[1]:
24-27 unsigned int a
28-31 unsigned int b
32-39 double c
40-43 unsigned int d
44-47 [PAD]
So, you can see that you need 4 bytes of pad between the array elements. This is captured by adding 4-bytes at the end of the object. If the second array element started at position 20, then you find that x[1].c was not 8 byte aligned.
So, finally, d does start at byte 16 as you expect, but the end of d is not the end of the object.
In POD structs like this, the address of the struct is the address of its first element. The size is the distance in memory between successive elements of an array of type X[]. For the double to be aligned in the second element of a hypothetical array, the first int and the entire struct must be aligned to the same strictness as double. This requires padding at the end.
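One way to see the layout described above is to print the member offsets (a sketch; the exact numbers assume the common ABI with a 4-byte int and 8-byte-aligned double, and the members are public here so offsetof is usable):

#include <cstddef>
#include <iostream>

struct X {
    unsigned int a;
    unsigned int b;
    double c;
    unsigned int d;
};

int main() {
    // On a typical 64-bit ABI this prints offsets 0 4 8 16 and sizeof(X) == 24.
    std::cout << offsetof(X, a) << ' ' << offsetof(X, b) << ' '
              << offsetof(X, c) << ' ' << offsetof(X, d) << '\n';
    std::cout << sizeof(X) << '\n';
}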
Which platform are you running on? On my Ubuntu machine it gives 20!
sizeof(unsigned int) is 4 and sizeof(double) is 8, and that's why the output is 20 on my machine!
This varies from platform to platform!

Limiting structure size by use of :

Why is this piece of code needed?
typedef struct corr_id_{
unsigned int size:8;
unsigned int valueType:8;
unsigned int classId:8;
unsigned int reserved:8;
} CorrId;
I did some investigation around it and found that this way we are limiting the memory consumption to just what we need.
For E.g.
#include <iostream>

typedef struct corr_id_new{
    unsigned int size;
    unsigned int valueType;
    unsigned int classId;
    unsigned int reserved;
} CorrId_NEW;

typedef struct corr_id_{
    unsigned int size:8;
    unsigned int valueType:8;
    unsigned int classId:8;
    unsigned int reserved:8;
} CorrId;

int main(){
    CorrId_NEW Obj1;
    CorrId Obj2;
    std::cout << sizeof(Obj1) << std::endl;
    std::cout << sizeof(Obj2) << std::endl;
}
Output:-
16
4
I want to understand the real use case of such scenarios. Why can't we declare the struct something like this instead:
typedef struct corr_id_new{
unsigned _int8 size;
unsigned _int8 valueType;
unsigned _int8 classId;
unsigned _int8 reserved;
} CorrId_NEW;
Does this have something to do with compiler optimizations? Or, what are the benefits of declaring the structure that way?
I want to understand the real use case of such scenarios?
For example, the structure of the status register of some CPU may look like this:
In order to represent it via structure, you could use bitfield:
struct CSR
{
    unsigned N: 1;
    unsigned Z: 1;
    unsigned C: 1;
    unsigned V: 1;
    unsigned : 20;
    unsigned I: 1;
    unsigned : 2;
    unsigned M: 5;
};
You can see here that the fields are not multiples of 8, so you can't use int8_t or anything similar.
Let's look at a simple scenario:
struct student{
    unsigned int age:8;      // 8 bits is enough to store a student's age (up to 255 years)
    unsigned int roll_no:16; // the max roll_no can be 2^16, which is large enough
    unsigned int classId:4;  // a class ID can be 4 bits long (0-15), as per need
    unsigned int reserved:4; // reserved
};
In the above case, all the work is done in just 32 bits.
But if you used plain integers, it would take 4*32 bits.
If we take age as a 32-bit integer, it can store values in the range 0 to 2^32 - 1. But don't forget that a normal person's age is at most about 100, or 140, or 150 (even for somebody still studying at that age), which needs at most 8 bits to store, so why waste the remaining 24 bits?
You are right, the last structure definition with unsigned _int8 is almost equivalent to the definition using :8. Almost, because byte order can make a difference here, so you might find that the memory layout is reversed in the two cases.
The main purpose of the :8 notation is to allow the use of fractional bytes, as in
struct foo {
    uint32_t a:1;
    uint32_t b:2;
    uint32_t c:3;
    uint32_t d:4;
    uint32_t e:5;
    uint32_t f:6;
    uint32_t g:7;
    uint32_t h:4;
};
To minimize padding, I strongly suggest learning the padding rules yourself; they are not hard to grasp. If you do, you will know that your version with unsigned _int8 does not add any padding. Or, if you don't feel like learning those rules, just use __attribute__((__packed__)) on your struct, but that may introduce a severe performance penalty.
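For instance, a minimal sketch of the difference packing makes (__attribute__((__packed__)) is a GCC/Clang extension, and the exact sizes assume a typical ABI with a 4-byte int):

#include <iostream>

struct Padded {                              // typical layout: 1 + 3 (padding) + 4 = 8 bytes
    unsigned char tag;
    unsigned int value;
};

struct __attribute__((__packed__)) Packed {  // no padding: 1 + 4 = 5 bytes
    unsigned char tag;
    unsigned int value;
};

int main() {
    std::cout << sizeof(Padded) << ' ' << sizeof(Packed) << '\n';
}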
It's often used with pragma pack to create bitfields with labels, e.g.:
#pragma pack(1)
struct eg {
    unsigned int one : 4;
    unsigned int two : 8;
    unsigned int three : 16;
};
Can be cast for whatever purpose to an int32_t, and vice versa. This might be useful when reading serialized data that follows a (language agnostic) protocol -- you extract an int and cast it to a struct eg to match the fields and field sizes defined in the protocol. You could also skip the conversion and just read an int sized chunk into such a struct, point being that the bitfield sizes match the protocol field sizes. This is extremely common in network programming -- if you want to send a packet following the protocol, you just populate your struct, serialize, and transmit.
Note that pragma pack is not standard C but it is recognized by various common compilers. Without pragma pack, however, the compiler is free to place padding between fields, reducing the use value for the purposes described above.
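A hedged sketch of that reading pattern, using memcpy rather than a pointer cast to sidestep alignment and strict-aliasing trouble (the Header fields and the wire bytes here are invented for illustration; the in-word placement of bitfields and the byte order remain implementation- and protocol-specific):

#include <cstdint>
#include <cstring>
#include <iostream>

#pragma pack(push, 1)
struct Header {             // field widths chosen to fill one 32-bit wire word
    uint32_t version : 4;
    uint32_t type    : 8;
    uint32_t length  : 20;
};
#pragma pack(pop)

int main() {
    unsigned char wire[4] = { 0x12, 0x34, 0x56, 0x78 };  // bytes as received from the network
    Header h;
    std::memcpy(&h, wire, sizeof h);  // reinterpret the wire bytes through the struct's layout
    std::cout << h.version << ' ' << h.type << ' ' << h.length << '\n';
}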

Define an enum to be smaller than one byte / Why is this struct larger than one byte?

I'd like to define an enum to be smaller than one byte while maintaining type safety.
Defining an enum as:
enum MyEnum : unsigned char
{
i ,j, k, w
};
I can shrink it to one byte, however I'd like to make it use only 2 bits since I will at most have 4 values in it. Can this be done?
In my struct where I use the enum, the following does not work
struct MyStruct
{
MyEnum mEnum : 2; // This will be 4 bytes in size
};
Thanks!
Update:
The questions comes from this scenario:
enum MyEnum : unsigned char
{
i ,j, k, w
};
struct MyStruct
{
union
{
signed int mXa:3;
unsigned int mXb:3;
};
union
{
signed int mYa:3;
unsigned int mYb:3;
};
MyEnum mEnum:2;
};
sizeof(MyStruct) is showing 9 bytes. Ideally I'd like the struct to be 1 byte in size.
Update for implemented solution:
This struct is one byte and offers the same functionality and type safety:
enum MyEnum :unsigned char
{
i,j,k,w
};
struct MyStruct
{
union
{
struct { MyEnum mEnum:2; char mXa:3; char mXb:3;};
struct { MyEnum mEnum:2; unsigned char mYa:3; unsigned char mYb:3;};
};
};
As per the standard definition, a type's sizeof must be at least 1 byte. This is the smallest addressable unit of memory.
The feature of bitfields you are mentioning allows members of structures to be given smaller sizes, but the struct itself may not be smaller because
It must be of at least 1 byte too
Alignment considerations might need it to be even bigger
Additionally, you may not take the address of bitfield members, since, as said above, a byte is the smallest addressable unit of memory. (You can already see that by sizeof actually returning the number of bytes, not bits, so if you expected less than CHAR_BIT bits, sizeof would not even be able to express it.)
Bitfields can only share space if they use the same underlying type, and any unused bits are actually left unused; if the sum of the bits in an unsigned int bitfield is 3 bits, it still takes 4 bytes total. Since both unions have unsigned int members, each of them is 4 bytes, but since they hold bitfields, they have an alignment of one. So the first union is 4 bytes, the second union is 4 bytes, and then the MyEnum is 1 byte. Since all of those have an alignment of one, no padding is needed, which gives the 9 bytes you observed.
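You can confirm that breakdown by printing the size (a sketch; the 4 + 4 + 1 = 9 result assumes a 4-byte unsigned int and the one-byte alignment described above, so other ABIs may pad differently):

#include <iostream>

enum MyEnum : unsigned char { i, j, k, w };

struct MyStruct {
    union { signed int mXa : 3; unsigned int mXb : 3; };
    union { signed int mYa : 3; unsigned int mYb : 3; };
    MyEnum mEnum : 2;
};

int main() {
    std::cout << sizeof(MyStruct) << '\n';  // 9 on the setup described above
}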
Unfortunately, union doesn't really work with bitfields at all. Bitfields are for integer types only. The smallest I could get your data without a serious redesign is 3 bytes: http://coliru.stacked-crooked.com/view?id=c6ad03c93d7893ca2095fabc7f72ca48-e54ee7a04e4b807da0930236d4cc94dc
enum MyEnum : unsigned char
{
i ,j, k, w
};
union MyUnion
{
signed char ma:3; //char to save memory
unsigned char mb:3;
};
struct MyStruct
{
MyUnion X;
MyUnion Y;
MyEnum mEnum;
}; //this structure is three bytes
In the complete redesign category, you have this: http://coliru.stacked-crooked.com/view?id=58269eef03981e5c219bf86167972906-e54ee7a04e4b807da0930236d4cc94dc
No. C++ defines "char" to be the smallest addressable unit of memory for the platform. You can't address 2 bits.
Bit packing 'Works for me'
#include <iostream>

enum MyEnum : unsigned char
{
    i, j, k, w
};

struct MyStruct
{
    MyEnum mEnum : 2;
    unsigned char val : 6;
};

int main()
{
    std::cout << sizeof(MyStruct);
}
prints out 1. How / what are you measuring?
Edit: Live link
Are you doing something like having a pointer as the next thing in the struct? In that case, you'll have 30 bits of dead space, as pointers must be 4-byte aligned on most 32-bit systems.
Edit: With your updated example, it's the unions that are breaking it:
enum MyEnum : unsigned char
{
i ,j, k, w
};
struct MyStruct
{
unsigned char mXb:3;
unsigned char mYb:3;
MyEnum mEnum:2;
};
Has size 1. I'm not sure how unions and bit packing work together though, so I'm no more help.

Why does the char datatype behave like the int datatype when we assign a negative value to it?

I know that when we assign a negative value to an unsigned datatype, its two's complement gets stored; that is, the value wraps around modulo one more than the maximum value the datatype can store.
To test that, I wrote a program which illustrates it; however, I am not able to understand the behaviour of the char datatype.
#include <iostream>
using namespace std;

template<class T>
void compare(T a, T b)
{
    cout<<dec<<"a:"<<(int)a<<"\tb:"<<(int)b<<endl; //first line
    cout<<hex<<"a:"<<(int)a<<"\tb:"<<(int)b<<endl; //second line
    if(a>b)
        cout<<"a is greater than b"<<endl;
    else
        cout<<"b is greater than a"<<endl;
}

int main()
{
    unsigned short as=2;
    unsigned short bs=-4;
    compare(as,bs);
    unsigned int al = 2;
    unsigned int bl =-4;
    compare(al,bl);
    char ac=2;
    char bc=-4;
    compare(ac,bc);
    int ai =2;
    int bi =-4;
    compare(ai,bi);
}
Output is
a:2 b:65532
a:2 b:fffc
b is greater than a
a:2 b:-4
a:2 b:fffffffc
b is greater than a
a:2 b:-4
a:2 b:fffffffc
a is greater than b
a:2 b:-4
a:2 b:fffffffc
a is greater than b
The compare(...) function is called four times with arguments of different datatypes:
unsigned short - 2 bytes, therefore -4 gets stored as 65532.
unsigned int - 4 bytes; however, since we typecast it to int while outputting, it is shown as -4 in the output, so the printed value is misleading, but the hex output and the logical comparison result show that the internal representation is two's complement.
char - 1 byte, this is where I am getting confused.
int - 4 bytes, signed datatype, nothing unexpected, normal result.
The question I have to ask is: why is char behaving like a signed int?
Even though we are typecasting to int before outputting the first line in the result, why is char showing values similar to int, even though char is 1 byte and int is 4 bytes? unsigned short showed a different value, because its memory requirement is 2 bytes.
unsigned int and int show the same result in the first line because both are 4 bytes, and the compiler gets tricked successfully; that is acceptable.
But why is char also showing the same value, as if its memory layout were the same as that of int?
The logical comparison also shows that char does not behave as an unsigned datatype, but as a signed one. The unsigned datatypes show b as greater than a, while char shows a as greater than b, in line with a signed datatype. Why?
Isn't char a 1-byte unsigned datatype?
This is what I learnt when I did a course on C and C++ during my B.Tech degree.
Any explanation would be helpful.
The compiler used is mingw 2.19.1.
Isn't char a 1-byte unsigned datatype?
Maybe, maybe not. The signedness of char is implementation-defined.
In your current implementation, it is obviously signed.
And in the output from the compare method, you get four bytes shown, because you cast to int for the output, so the char value -4 gets converted to the int value -4.
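If you want to check, or avoid relying on, the implementation's choice, you can ask the type traits and spell the signedness out explicitly (a small sketch, not part of the original answer):

#include <iostream>
#include <type_traits>

int main() {
    // Reports whether plain char is signed on this implementation.
    std::cout << std::boolalpha << std::is_signed<char>::value << '\n';

    // If the signedness matters, say it explicitly in the declaration:
    unsigned char uc = -4;  // always wraps, giving 252 for an 8-bit char
    signed char   sc = -4;  // always -4
    std::cout << (int)uc << ' ' << (int)sc << '\n';
}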