I am doing a little game physics networking project right now, and I am trying to optimize the packets I am sending using this guide:
https://gafferongames.com/post/snapshot_compression/
In the "Optimize Quaternions" section it says:
Don’t always drop the same component due to numerical precision issues. Instead, find the component with the largest absolute value and ENCODE its index using two bits [0,3] (0=x, 1=y, 2=z, 3=w), then send the index of the largest component and the smallest three components over the network
Now my question is, how do I encode an integer down to 2 bits... or have I misunderstood the task?
I know very little about compressing data, but reducing a 4 byte integer (32 bits) down to ONLY 2 bits seems a bit insane to me. Is that even possible, or have I completely misunderstood everything?
EDIT:
Here is some code of what I have so far:
void HavNetConnection::sendBodyPacket(HavNetBodyPacket bp)
{
RakNet::BitStream bsOut;
bsOut.Write((RakNet::MessageID)ID_BODY_PACKET);
float maxAbs = std::abs(bp.rotation(0));
int maxIndex = 0;
for (int i = 1; i < 4; i++)
{
float rotAbs = std::abs(bp.rotation(i));
if (rotAbs > maxAbs) {
maxAbs = rotAbs;
maxIndex = i;
}
}
bsOut.Write(bp.position(0));
bsOut.Write(bp.position(1));
bsOut.Write(bp.position(2));
bsOut.Write(bp.linearVelocity(0));
bsOut.Write(bp.linearVelocity(1));
bsOut.Write(bp.linearVelocity(2));
bsOut.Write(bp.rotation(0));
bsOut.Write(bp.rotation(1));
bsOut.Write(bp.rotation(2));
bsOut.Write(bp.rotation(3));
bsOut.Write(bp.bodyId.toRawInt(bp.bodyId));
bsOut.Write(bp.stepCount);
// Send body packets over UDP (UNRELIABLE), priority could be low.
m_peer->Send(&bsOut, MEDIUM_PRIORITY, UNRELIABLE,
0, RakNet::UNASSIGNED_SYSTEM_ADDRESS, true);
}
The simplest solution to your problem is to use bitfields:
// working type (use your existing Quaternion implementation instead)
struct Quaternion{
float w,x,y,z;
Quaternion(float w_=1.0f, float x_=0.0f, float y_=0.0f, float z_=0.0f) : w(w_), x(x_), y(y_), z(z_) {}
};
struct PacketQuaternion
{
enum LargestElement{
W=0, X=1, Y=2, Z=3,
};
LargestElement le : 2; // 2 bits;
signed int i1 : 9, i2 : 9, i3 : 9; // 9 bits each
PacketQuaternion() : le(W), i1(0), i2(0), i3(0) {}
operator Quaternion() const { // convert packet quaternion to regular quaternion
const float s = 1.0f/float(1<<8); // scale int to [-1, 1]; you could also scale to [-sqrt(.5), sqrt(.5)]
const float f1=s*i1, f2 = s*i2, f3 = s*i3;
// clamp at 0: quantization error can push the sum of squares slightly above 1
const float f0 = std::sqrt(std::fmax(0.0f, 1.0f - f1*f1-f2*f2-f3*f3));
switch(le){
case W: return Quaternion(f0, f1, f2, f3);
case X: return Quaternion(f1, f0, f2, f3);
case Y: return Quaternion(f1, f2, f0, f3);
case Z: return Quaternion(f1, f2, f3, f0);
}
return Quaternion(); // default, can't happen
}
};
If you have a look at the assembler code this generates, you will see a bit of shifting to extract le and i1 to i3 -- essentially the same code you could write manually as well.
Your PacketQuaternion structure will always occupy a whole number of bytes, so (on any non-exotic platform) you will still waste 3 bits (you could just use 10 bits per integer field here, unless you have other use for those bits).
I left out the code to convert from regular quaternion to PacketQuaternion, but that should be relatively simple as well.
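The missing direction (regular quaternion to packed form) could look roughly like the sketch below. It differs from the struct above in a few ways that are assumptions made for illustration: it uses explicit shifts into a std::uint32_t instead of bitfields, 10 bits per component, the [-sqrt(.5), sqrt(.5)] scaling, and the x, y, z, w index order from the quoted article. The names packQuat and unpackQuat are hypothetical.

```cpp
#include <array>
#include <cmath>
#include <cstdint>

// Assumed layout: bits 0..1 = index of the dropped (largest) component,
// then three 10-bit fields for the remaining components in index order.
constexpr float kMaxSmall = 0.70710678f;  // the smallest three lie in [-1/sqrt(2), 1/sqrt(2)]
constexpr int kBits = 10;                 // bits per stored component
constexpr int kScale = (1 << (kBits - 1)) - 1;  // 511

std::uint32_t packQuat(const std::array<float, 4>& q) {
    // Find the component with the largest absolute value.
    int largest = 0;
    for (int i = 1; i < 4; ++i)
        if (std::fabs(q[i]) > std::fabs(q[largest])) largest = i;

    // q and -q describe the same rotation, so flip all signs if needed
    // to make the dropped component non-negative.
    const float sign = q[largest] < 0.0f ? -1.0f : 1.0f;

    std::uint32_t packed = static_cast<std::uint32_t>(largest);
    int shift = 2;
    for (int i = 0; i < 4; ++i) {
        if (i == largest) continue;
        float v = sign * q[i] / kMaxSmall;        // nominally in [-1, 1]
        v = std::fmin(1.0f, std::fmax(-1.0f, v)); // guard against rounding error
        const int n = static_cast<int>(std::lround(v * kScale)) + kScale;  // 0..1022
        packed |= static_cast<std::uint32_t>(n) << shift;
        shift += kBits;
    }
    return packed;
}

std::array<float, 4> unpackQuat(std::uint32_t packed) {
    const int largest = static_cast<int>(packed & 0x3u);
    std::array<float, 4> q{};
    float sumSq = 0.0f;
    int shift = 2;
    for (int i = 0; i < 4; ++i) {
        if (i == largest) continue;
        const int n = static_cast<int>((packed >> shift) & 0x3FFu) - kScale;
        q[i] = static_cast<float>(n) / kScale * kMaxSmall;
        sumSq += q[i] * q[i];
        shift += kBits;
    }
    // Reconstruct the dropped component from the unit-length constraint.
    q[largest] = std::sqrt(std::fmax(0.0f, 1.0f - sumSq));
    return q;
}
```

Because the index and the three fields add up to exactly 32 bits, the result can be written with a single 32-bit write. Note that a quaternion whose largest component was negative comes back negated, which is the same rotation.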
Generally (as always when networking is involved), be extra careful that data is converted correctly in all directions, especially, if different architectures or different compilers are involved!
Also, as others have noted, make sure that network bandwidth really is a bottleneck before optimizing aggressively here.
I'm guessing they want you to fit the 2 bits into some value you are already sending that doesn't need all of the available bits, or to pack several small bit fields into a single int for transmission.
You can do things like this:
// these are going to be used as 2 bit fields,
// so we can only go to 3.
enum addresses
{
x = 0, // 00
y = 1, // 01
z = 2, // 10
w = 3 // 11
};
int val_to_send;
// set the value to send, and shift it 2 bits left.
val_to_send = 1234;
// bit pattern: 0000 0100 1101 0010
// bit shift left by 2 bits
val_to_send = val_to_send << 2;
// bit pattern: 0001 0011 0100 1000
// set the address to the last 2 bits.
// this value is address w (bit pattern 11) for example...
val_to_send |= w;
// bit pattern: 0001 0011 0100 1011
send_value(val_to_send);
On the receive end:
receive_value(&rx_value);
// pick off the address by masking with the low 2 bits
address = rx_value & 0x3;
// address now = 3 (w)
// bit shift right to restore the value
rx_value = rx_value >> 2;
// rx_value = 1234 again.
You can 'pack' bits this way, any number of bits at a time.
int address_list;
// set address to w (11)
address_list = w;
// 0000 0011
// bit shift left by 2 bits
address_list = address_list << 2;
// 0000 1100
// now add address x (00)
address_list |= x;
// 0000 1100
// bit shift left 2 more bits
address_list = address_list << 2;
// 0011 0000
// add the address y (01)
address_list |= y;
// 0011 0001
// bit shift left 2 more bits
address_list = address_list << 2;
// 1100 0100
// add the address z. (10)
address_list |= z;
// 1100 0110
// w x y z are now in the lower byte of 'address_list'
This packs 4 addresses into the lower byte of 'address_list';
You just have to do the unpacking on the other end.
This has some implementation details to work out. You only have 30 bits now for the value, not 32. If the data is a signed int, you have more work to do to avoid shifting the sign bit out to the left, etc.
But, fundamentally, this is how you can stuff bit patterns into data that you are sending.
Obviously this assumes that sending is more expensive than the work of packing bits into bytes and ints, etc. This is often the case, especially where low baud rates are involved, as in serial ports.
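The shifting and masking above can be wrapped in small helpers. This is only a sketch: it assumes an unsigned 32-bit word (which sidesteps the sign-bit issue mentioned above) and a value that fits in 30 bits, and the names packTagged, unpackTag and unpackValue are made up for illustration.

```cpp
#include <cstdint>

constexpr std::uint32_t kTagBits = 2;                      // width of the tag field
constexpr std::uint32_t kTagMask = (1u << kTagBits) - 1u;  // 0b11

// Store a 2-bit tag in the low bits; the value occupies the upper 30 bits.
constexpr std::uint32_t packTagged(std::uint32_t value, std::uint32_t tag) {
    return (value << kTagBits) | (tag & kTagMask);
}

// Recover the tag by masking off the low 2 bits.
constexpr std::uint32_t unpackTag(std::uint32_t word) {
    return word & kTagMask;
}

// Recover the value by shifting the tag back out.
constexpr std::uint32_t unpackValue(std::uint32_t word) {
    return word >> kTagBits;
}
```

With the numbers from the walkthrough, packTagged(1234, 3) yields the bit pattern 0001 0011 0100 1011.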
There are a lot of possible understandings and misunderstandings in play here.
ttemple addressed your technical problem of sending less than a byte.
I want to reiterate the more theoretical points.
This is not done
You originally misunderstood the quoted passage.
The two bits do not say “not sending 2121387”,
they say “not sending the z-component”.
There are exactly four components to choose from and two bits encode exactly four states, so the fit is exact.
This is impossible
If you want to send a 32 bit integer which might take any of the 2^32 possible values,
you need at least 32 bits.
As n bits can represent at most exactly 2^n states,
any smaller amount of bits just will not suffice.
This is kinda possible
Beyond your actual question:
When we relax the requirement that we will always use 2 bits
and have sufficiently strong assumptions
on the probability distribution of the values,
we can get the expected value of the number of bits down.
Ideas like this are used all over the place in the linked article.
Example
Let c be some integer that is 0 almost all the time (97%, say)
and can take any value the rest of the time (3%).
Then we can take one bit to say whether “c is zero”
and need no further bits most of the time.
In the cases where c is not zero,
we spend another 32 bits to encode it regularly.
In total we need 0.97*1+0.03*(1+32) = 1.96 bits on average.
But we need 33 bits sometimes,
which makes this compatible with my earlier assertion of impossibility.
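The scheme in this example can be sketched in a few lines. The function names are hypothetical and std::vector<bool> is only a stand-in for a real bit stream.

```cpp
#include <cstdint>
#include <vector>

// Append c to a bit sequence: one flag bit, followed by 32 payload bits
// only when c is not zero.
void encodeMostlyZero(std::uint32_t c, std::vector<bool>& bits) {
    bits.push_back(c != 0);
    if (c != 0) {
        for (int i = 31; i >= 0; --i)
            bits.push_back(((c >> i) & 1u) != 0);
    }
}

// Expected number of bits per value when c is zero with probability pZero
// and otherwise needs payloadBits extra bits after the flag.
double expectedBits(double pZero, int payloadBits) {
    return pZero * 1.0 + (1.0 - pZero) * (1.0 + payloadBits);
}
```

A zero costs one bit and any other value costs 33, so expectedBits(0.97, 32) reproduces the average worked out above.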
This is complicated
Depending on your background (in math, bit-fiddling etc.) it might just seem like an enormous, unknowable piece of black magic.
(It isn't. You can learn this stuff.)
You do not seem completely lost, and you seem to be a quick learner,
but I agree with Remy Lebeau
that you seem to be out of your depth.
Do you really need to do this?
Or are you optimizing prematurely?
If it runs well enough, let it run.
Concentrate on the important stuff.
I want to be able to merge bytes from two unsigned long parameters, taking exactly half of the bytes, the half that starts with the least significant byte of the second param and the rest of the first param.
For example:
x = 0x89ABCDEF12893456
y = 0x76543210ABCDEF19
result_merged = 0x89ABCDEFABCDEF19
First, I need to check whether the system that I work on is little endian or big endian. I already wrote a function that checks that, called is_big_endian().
Now I know that char *c = (char*) &y will give me the "first" (MSB) or "last" (LSB) byte of y, depending on whether the system is big endian or not.
Now, I do want to use AND(&) bitwise operator to merge x and y bytes, the question is how can I get only half of the bytes, starting from the LSB.
I mean I can use a "for" loop to go over sizeof and then split by 2, but I'm confused about how exactly I should do it.
I also thought about "masking" the bytes, because I already know for sure that the given parameters are "long", which here means 16 hex digits (64 bits), so maybe I can mask them in the following way?
I want to be able to use it on both 32 and 64 bit systems, which means my code is wrong, because I'm using a fixed size of 64 bits here although I don't know what system the code runs on.
I thought about using an array to store all the bits, or maybe shifting?
unsigned long merge_bytes(unsigned long x, unsigned long int y)
{
if (is_big_endian() ==0) {
//little endian system
return (y & 0xFFFFFFFF00000000) | (x & 0xFFFFFFFFFFFF);
}
else
{
return (y & 0x00000000FFFFFFFF) | (x & 0xFFFFFFFFFFFF);
}
}
I have "masked" the right side of the bits if that's a little endian system because the LSB there is the furthest to the left bit.
And did the opposite if this is a big endian system.
any help would be appreciated.
Your code is almost correct. You want this:
merged = (y & 0x00000000ffffffff) | (x & 0xffffffff00000000);
There is no need to distinguish between big and little endian. The high bits of a value are the high bits of the value.
The difference is only the representation in memory.
Example: storage of the value 0x12345678 at memory location 0x0000
Little endian:
Address byte
-------------
0000 78
0001 56
0002 34
0003 12
Big endian:
Address byte
-------------
0000 12
0001 34
0002 56
0003 78
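To cover the asker's wish to support both 32-bit and 64-bit unsigned long, the mask can be derived from the type's size instead of being hard-coded. The template below is a sketch under that assumption; mergeHalves is a made-up name, and no endianness check is involved because the masks operate on values, not on memory.

```cpp
#include <climits>
#include <cstdint>
#include <type_traits>

// Keep the upper half of x and the lower half of y, for any unsigned type T.
template <typename T>
T mergeHalves(T x, T y) {
    static_assert(std::is_unsigned<T>::value, "T must be an unsigned integer type");
    const unsigned halfBits = sizeof(T) * CHAR_BIT / 2;
    // All ones in the lower half, zeros in the upper half.
    const T lowMask = static_cast<T>((T{1} << halfBits) - 1);
    return static_cast<T>((x & static_cast<T>(~lowMask)) | (y & lowMask));
}
```

Called as mergeHalves<unsigned long>(x, y), this adapts to whatever width unsigned long has on the target platform.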
So I have a little piece of code that takes 2 uint8_t's and places them next to each other, and then returns a uint16_t. The point is not adding the 2 variables, but putting them next to each other and creating a uint16_t from them.
The way I expect this to work is that when the first uint8_t is 0, and the second uint8_t is 1, I expect the uint16_t to also be one.
However, this is in my code not the case.
This is my code:
uint8_t *bytes = new uint8_t[2];
bytes[0] = 0;
bytes[1] = 1;
uint16_t out = *((uint16_t*)bytes);
It is supposed to make the bytes uint8_t pointer into a uint16_t pointer, and then take the value. I expect that value to be 1 since x86 is little endian. However it returns 256.
Setting the first byte to 1 and the second byte to 0 makes it work as expected. But I am wondering why I need to switch the bytes around in order for it to work.
Can anyone explain that to me?
Thanks!
There is no uint16_t or compatible object at that address, and so the behaviour of *((uint16_t*)bytes) is undefined.
I expect that value to be 1 since x86 is little endian. However it returns 256.
Even if the program was fixed to have well defined behaviour, your expectation is backwards. In little endian, the least significant byte is stored in the lowest address. Thus 2 byte value 1 is stored as 1, 0 and not 0, 1.
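One common way to get well-defined behaviour when reinterpreting raw bytes is std::memcpy. The sketch below uses hypothetical helper names (readNativeU16, isLittleEndian); the result still depends on the byte order of the machine it runs on.

```cpp
#include <cstdint>
#include <cstring>

// Copy two raw bytes into a uint16_t. Unlike the pointer cast, this is
// well defined; the value depends on the native byte order.
std::uint16_t readNativeU16(const unsigned char* bytes) {
    std::uint16_t value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}

// Detect the native byte order by looking at the first byte of the value 1.
bool isLittleEndian() {
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}
```

On a little-endian machine the bytes {0, 1} read back as 256 and the bytes {1, 0} read back as 1, matching the explanation above.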
Does endianness also affect the order of the bits in the byte or not?
There is no way to access a bit by "address" [1], so there is no concept of endianness for bits. When converting to text, bits are conventionally shown most significant on the left and least significant on the right, just like the digits of decimal numbers. I don't know if this is true in right-to-left writing systems.
[1] You can sort of create "virtual addresses" for bits using bitfields. The order of bitfields, i.e. whether the first bitfield is most or least significant, is implementation defined and not necessarily related to byte endianness at all.
Here is a correct way to set two octets as uint16_t. The result will depend on endianness of the system:
// no need to complicate a simple example with dynamic allocation
// (requires <cstddef> for std::byte, C++17, and <cstdint> for uint16_t)
uint16_t out;
// note that there is an exception in language rules that
// allows accessing any object through narrow (unsigned) char
// or std::byte pointers; thus following is well defined
std::byte* data = reinterpret_cast<std::byte*>(&out);
// std::byte does not convert implicitly from int, so brace-initialize
data[0] = std::byte{1};
data[1] = std::byte{0};
Note that assuming that input is in native endianness is usually not a good choice, especially when compatibility across multiple systems is required, such as when communicating through network, or accessing files that may be shared to other systems.
In these cases, the communication protocol, or the file format typically specify that the data is in specific endianness which may or may not be the same as the native endianness of your target system. De facto standard in network communication is to use big endian. Data in particular endianness can be converted to native endianness using bit shifts, as shown in Frodyne's answer for example.
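As an illustration of that last point, here is a minimal sketch of reading and writing a 32-bit value in big-endian (network) byte order using only shifts. The function names are made up for this example; the code gives the same result regardless of the host's native endianness.

```cpp
#include <cstdint>

// Assemble a 32-bit value from four bytes stored most significant first.
std::uint32_t loadBigEndian32(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
            static_cast<std::uint32_t>(p[3]);
}

// Write a 32-bit value as four bytes, most significant first.
void storeBigEndian32(std::uint32_t v, unsigned char* p) {
    p[0] = static_cast<unsigned char>((v >> 24) & 0xFFu);
    p[1] = static_cast<unsigned char>((v >> 16) & 0xFFu);
    p[2] = static_cast<unsigned char>((v >> 8) & 0xFFu);
    p[3] = static_cast<unsigned char>(v & 0xFFu);
}
```

Because the shifts operate on values rather than on memory layout, no endianness check is needed on either side.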
In a little endian system the least significant bytes are placed first. In other words: The low byte is placed on offset 0, and the high byte on offset 1 (and so on). So this:
uint8_t* bytes = new uint8_t[2];
bytes[0] = 1;
bytes[1] = 0;
uint16_t out = *((uint16_t*)bytes);
Produces the out = 1 result you want.
However, as you can see this is easy to get wrong, so in general I would recommend that instead of trying to place stuff correctly in memory and then cast it around, you do something like this:
uint16_t out = lowByte + (highByte << 8);
That will work on any machine, regardless of endianness.
Edit: Bit shifting explanation added.
x << y means to shift the bits in x y places to the left (>> moves them to the right instead).
If X contains the bit-pattern xxxxxxxx, and Y contains the bit-pattern yyyyyyyy, then (X << 8) produces the pattern: xxxxxxxx00000000, and Y + (X << 8) produces: xxxxxxxxyyyyyyyy.
(And Y + (X<<8) + (Z<<16) produces zzzzzzzzxxxxxxxxyyyyyyyy, etc.)
A single shift to the left is the same as multiplying by 2, so X << 8 is the same as X * 2^8 = X * 256. That means that you can also do: Y + (X*256) + (Z*65536), but I think the shifts are clearer and show the intent better.
Note that again: Endianness does not matter. Shifting 8 bits to the left will always clear the low 8 bits.
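You can read more here: https://en.wikipedia.org/wiki/Bitwise_operation. Note the difference between arithmetic and logical shifts: right-shifting an unsigned value is a logical shift, while right-shifting a negative signed value is implementation-defined in C and in C++ before C++20 (in practice an arithmetic shift on mainstream compilers) and is defined as an arithmetic shift from C++20 on.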
If p is a pointer to some multi-byte value, then:
"Little-endian" means that the byte at p is the least-significant byte, in other words, it contains bits 0-7 of the value.
"Big-endian" means that the byte at p is the most-significant byte, which for a 16-bit value would be bits 8-15.
Since the Intel is little-endian, bytes[0] contains bits 0-7 of the uint16_t value and bytes[1] contains bits 8-15. Since you are trying to set bit 0, you need:
bytes[0] = 1; // Bits 0-7
bytes[1] = 0; // Bits 8-15
Your code works, but you misinterpreted how to read "bytes":
#include <cstdint>
#include <cstddef>
#include <iostream>
int main()
{
uint8_t *in = new uint8_t[2];
in[0] = 3;
in[1] = 1;
uint16_t out = *((uint16_t*)in);
std::cout << "out: " << out << "\n in: " << in[1]*256 + in[0]<< std::endl;
return 0;
}
By the way, you should take care of alignment when casting this way.
One way to think about numbers is in MSB/LSB order,
where the MSB is the highest bit and the LSB is the lowest bit.
On little-endian machines the byte holding the LSB comes first in memory.
For example:
(u)int32: MSB:Bit 31 ... LSB: Bit 0
(u)int16: MSB:Bit 15 ... LSB: Bit 0
(u)int8 : MSB:Bit 7 ... LSB: Bit 0
With your cast to a 16-bit value, the bytes are arranged like this:
16Bit                  <=   8Bit         8Bit
MSB ... LSB                 BYTE[1]      BYTE[0]
Bit15 ... Bit0              Bit7 .. 0    Bit7 .. 0
0000 0001 0000 0000         0000 0001    0000 0000
which is 256 -> correct value.
In the code below, the variable Speed is of type int. How is it stored in two variables of type char? I also don't understand the comment // 16 bits - 2 x 8 bits variables.
Can you explain the type conversion with an example? When I run the code, it shows symbols after the type conversion.
AX12A::turn(unsigned char ID, bool SIDE, int Speed)
{
if (SIDE == LEFT)
{
char Speed_H,Speed_L;
Speed_H = Speed >> 8;
Speed_L = Speed; // 16 bits - 2 x 8 bits variables
}
}
main(){
ax12a.turn(ID,Left,200)
}
It seems like on your platform, a variable of type int is stored on 16 bits and a variable of type char is stored on 8 bits.
This does not always happen, as the C++ standard does not guarantee the size of these types. I made my assumption based on the code and the comment. Use data types of fixed size, such as the ones described here, to make sure this assumption is always going to be true.
Both int and char are integral types. When converting from a larger integral type to a smaller integral type (e.g. int to char), the most significant bits are discarded, and the least significant bits are kept (in this case, you keep the last 8 bits).
Before fully understanding the code, you also need to know about right shift. This simply moves the bits to the right (for the purpose of this answer, it does not matter what is inserted on the left). Therefore, the least significant bit (the rightmost bit) is discarded, and every other bit is moved one place to the right. This is very similar to dropping the last digit in the decimal system, which divides by 10; in binary, each shift divides by 2.
Now, you have your variable Speed, which has 16 bits.
Speed_H = Speed >> 8;
This shifts Speed with 8 bits to the right, and then assigns the 8 least significant bits to Speed_H. This basically means that you will have in Speed_H the 8 most significant bits (the "upper" half of Speed).
Speed_L = Speed;
Simply assigns to Speed_L the least significant 8 bits.
The comment basically states that you split a variable of 16 bits into 2 variables of 8 bits, with the first (most significant) 8 bits being stored in Speed_H and the last (least significant) 8 bits being stored in Speed_L.
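A small sketch of the same split using fixed-width types, assuming a 16-bit unsigned speed value; Bytes and splitU16 are illustrative names, not part of the AX12A library.

```cpp
#include <cstdint>

// The two halves of a 16-bit value.
struct Bytes {
    std::uint8_t high;  // most significant 8 bits
    std::uint8_t low;   // least significant 8 bits
};

Bytes splitU16(std::uint16_t v) {
    Bytes b;
    b.high = static_cast<std::uint8_t>(v >> 8);     // upper half moved down
    b.low = static_cast<std::uint8_t>(v & 0xFFu);   // lower half kept as is
    return b;
}
```

For 200 this gives high = 0 and low = 200. If you print these with std::cout, cast them to int first: char-sized types are streamed as characters, which is a likely reason for the symbols you saw.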
From your code I understand that sizeof(int) = 2 bytes in your case.
Let us take example as shown below.
int my_var = 200;
my_var is allocated 2 bytes of memory because its data type is ‘int’.
The value assigned to my_var is 200.
Note that 200 decimal = 0x00C8 Hexadecimal = 0000 0000 1100 1000 binary
Higher byte 0000 0000 binary is stored in one of the addresses allocated to my_var
And lower byte 1100 1000 is stored in other address depending on endianness.
To know about endianness, check this link
https://www.geeksforgeeks.org/little-and-big-endian-mystery/
In your code :
int Speed = 200;
Speed_H = Speed >> 8;
=> 200 decimal value right shifted 8 times
=> that means 0000 0000 1100 1000 binary value right shifted by 8 bits
=> that means Speed_H = 0000 0000 binary
Speed_L = Speed;
=> Speed_L = 200;
=> Speed_L = 0000 0000 1100 1000 binary
=> Speed_L is of type char so it can accommodate only one byte
=> The value 0000 0000 1100 1000 will be narrowed (in other words "cut-off") to least significant byte and assigned to Speed_L.
=> Speed_L = 1100 1000 binary = 200 decimal
I am learning bare metal programming in c++ and it often involves setting a portion of a 32 bit hardware register address to some combination.
For example for an IO pin, I can set the 15th to 17th bit in a 32 bit address to 001 to mark the pin as an output pin.
I have seen code that does this and I half understand it based on an explanation of another SO question.
# here ra is a physical address
# the 15th to 17th bits are being
# cleared by AND-ing it with a value that is one everywhere
# except in the 15th to 17th bits
ra&=~(7<<12);
Another example is:
# this clears the 21st to 23rd bits of another address
ra&=~(7<<21);
How do I choose the 7 and how do I choose the number of bits to shift left?
I tried this out in python to see if I can figure it out
bin((7<<21)).lstrip('-0b').zfill(32)
'00000000111000000000000000000000'
# this has 8, 9 and 10 as the bits which is wrong
The 7 (base 10) is chosen as its binary representation is 111 (7 in base 2).
As for why it's bits 8, 9 and 10 set it's because you're reading from the wrong direction. Binary, just as normal base 10, counts right to left.
(I'd left this as a comment but reputation isn't high enough.)
If you want to isolate and change some bits in a register, but not all of them, you need to understand that bitwise operations like AND, OR, XOR and NOT operate on a single bit column: bit 3 of each operand determines bit 3 of the result, and no other bits are involved. So here are some bits in binary, represented by letters since each can be either a one or a zero:
jklmnopq
The AND truth table you can look up; anything ANDed with zero is zero, anything ANDed with one is itself:
  jklmnopq
& 01110001
============
  0klm000q
Anything ORed with one is a one; anything ORed with zero is itself:
  jklmnopq
| 01110001
============
  j111nop1
So if you want to isolate and change two bits in this variable/register, say bits 5 and 6, and change them to 0b10 (a 2 in decimal), the common method is to AND them with zero and then OR them with the desired value:
  76543210
  jklmnopq
& 10011111
============
  j00mnopq
  j00mnopq
| 01000000
============
  j10mnopq
You could have ORed bit 6 with a one and ANDed bit 5 with a zero, but that is specific to the value you wanted to change them to. Generically we think "I want to change those bits to a 2", so to use that value 2 you zero the bits first (AND), then force the 2 onto those bits (OR).
In c
x = read_register(blah);
x = (x&(~(3<<5)))|(2<<5);
write_register(blah,x);
lets dig into this (3 << 5)
00000011
00000110 1
00001100 2
00011000 3
00110000 4
01100000 5
76543210
That puts two ones on top of the bits we are interested in, but ANDing with that value isolates those bits and zeroes the others. To zero those two bits and not mess with the other bits in the register, we need to invert the mask.
Using x = ~x inverts the bits, a logical NOT operation:
01100000
10011111
Now we have the mask we want to and with our register as shown way above, zeroing the bits in question while leaving the others alone j00mnopq
Now we need to prep the bits to or (2<<5)
00000010
00000100 1
00001000 2
00010000 3
00100000 4
01000000 5
Giving the bit pattern we want to orr in giving j10mnopq which we write back to the register. Again the j, m, n, ... bits are bits they are either a one or a zero and we dont want to change them so we do this extra masking and shifting work. You may/will sometimes see examples that simply write_register(blah,2<<5); either because they know the state of the other bits, know they are not using those other bits and zero is okay/desired or dont know what they are doing.
x = read_register(blah); //bits are jklmnopq
x = (x&(~(3<<5)))|(2<<5);
z = 3
z = z << 5
z = ~z
x = x & z
z = 2
z = z << 5
x = x | z
z = 3
z = 00000011
z = z << 5
z = 01100000
z = ~z
z = 10011111
x = x & z
x = j00mnopq
z = 2
z = 00000010
z = z << 5
z = 01000000
x = x | z
x = j10mnopq
if you have a 3 bit field then the binary is 0b111 which in decimal is the number 7 or hex 0x7. a 4 bit field 0b1111 which is decimal 15 or hex 0xF, as you get past 7 it is easier to use hex IMO. 6 bit field 0x3F, 7 bit field 0x7F and so on.
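The rule "an n-bit field has a mask of n ones" can be written as a tiny helper. This is a sketch with an invented name, fieldMask, assuming 32-bit registers.

```cpp
#include <cstdint>

// Mask with the lowest `width` bits set, e.g. width 3 -> 0b111 = 7.
// A shift by 32 would be undefined, so full width is handled separately.
constexpr std::uint32_t fieldMask(unsigned width) {
    return width >= 32 ? 0xFFFFFFFFu : (std::uint32_t{1} << width) - 1u;
}
```

Shifting the result left by the field's starting bit, as in fieldMask(3) << 21, gives the value that is then inverted and ANDed with the register.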
You can take this further to be more generic. Say there is a register that controls some function for gpio pins 0 through 15, two bits per pin, starting with bit 0. If you wanted to change the properties for gpio pin 5, that would be bits 10 and 11: 5*2 = 10, and with two bits per pin the field is bit 10 and the next one, 11. Generically you could:
x = (x&(~(0x3<<(pin*2)))) | (value<<(pin*2));
since 2 is a power of 2
x = (x&(~(0x3<<(pin<<1)))) | (value<<(pin<<1));
an optimization the compiler might do for you if pin cannot be reduced to a specific value at compile time.
but if it were 3 bits per field and the fields start aligned with bit zero
x = (x&(~(0x7<<(pin*3)))) | (value<<(pin*3));
which the compiler might do a multiply by 3 but maybe instead just
pinshift = (pin<<1)+pin;
to get the multiply by three. depends on the compiler and instruction set.
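The per-pin expressions above can be folded into one function. This is a sketch operating on a plain value rather than a real hardware register; setField is a hypothetical name, and it assumes the field lies entirely within 32 bits and bitsPerField is less than 32.

```cpp
#include <cstdint>

// Replace field number `index` (each field is `bitsPerField` wide, fields
// start at bit 0) with `value`, leaving all other bits untouched.
std::uint32_t setField(std::uint32_t reg, unsigned index,
                       unsigned bitsPerField, std::uint32_t value) {
    const unsigned shift = index * bitsPerField;
    const std::uint32_t mask = ((std::uint32_t{1} << bitsPerField) - 1u) << shift;
    // Clear the field, then OR in the new value (masked so it cannot spill over).
    return (reg & ~mask) | ((value << shift) & mask);
}
```

For a real register you would read it, pass the value through setField, and write the result back, as in the read_register/write_register example earlier.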
overall this is called a read modify write as you read something, modify some of it, then write back (if you were modifying all of it you wouldnt need to bother with a read and a modify you would write the whole new value). And folks will say masking and shifting to generically cover isolating bits in a variable either for modification purposes or if you wanted to read/see what those two bits above were you would
x = read_register(blah);
x = x >> 5;
x = x & 0x3;
or mask first then shift
x = x & (0x3<<5);
x = x >> 5;
Six of one, half a dozen of the other: both give the same result in general. On some instruction sets one might be more efficient than the other (or they might be equal, whether you AND then shift or shift then AND). One might also make more sense visually to some folks than the other.
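Both orders of extraction, written as functions so they can be compared directly. The names are invented for this sketch, and `mask` is the unshifted field mask (0x3 for a two-bit field).

```cpp
#include <cstdint>

// Shift the field down to bit 0 first, then mask off everything above it.
std::uint32_t getFieldShiftFirst(std::uint32_t reg, unsigned shift, std::uint32_t mask) {
    return (reg >> shift) & mask;
}

// Mask the field in place first, then shift it down to bit 0.
std::uint32_t getFieldMaskFirst(std::uint32_t reg, unsigned shift, std::uint32_t mask) {
    return (reg & (mask << shift)) >> shift;
}
```

For any register value the two functions return the same field value, which is the point made above.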
Although technically this is an endian thing as some processors bit 0 is the most significant bit. In C AFAIK bit 0 is the least significant bit. If/when a manual shows the bits laid out left to right you want your right and left shifts to match that, so as above I showed 76543210 to indicate the documented bits and associated that with jklmnopq and that was the left to right information that mattered to continue the conversation about modifying bits 5 and 6. some documents will use verilog or vhdl style notation 6:5 (meaning bits 6 to 5 inclusive, makes more sense with say 4:2 meaning bits 4,3,2) or [6 downto 5], more likely to just see a visual picture with boxes or lines to show you what bits are what field.
How do I choose the 7
You want to clear three adjacent bits. Three adjacent bits at the bottom of a word is 1+2+4=7.
and how do I choose the number of bits to shift left
You want to clear bits 21-23, not bits 1-3, so you shift left another 20.
Both your examples are wrong. To clear 15-17 you need to shift left 14, and to clear 21-23 you need to shift left 20.
this has 8, 9,and 10 ...
No it doesn't. You're counting from the wrong end.
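A quick way to check which bits a mask covers is to list the set positions, counting from the least significant end. The sketch below uses zero-based indices (bit 0 is the lowest bit), so add 1 to each index if you count the lowest bit as "bit 1" the way this answer does. setBitIndices is a made-up helper name.

```cpp
#include <cstdint>
#include <vector>

// Return the zero-based positions of all set bits, lowest first.
std::vector<int> setBitIndices(std::uint32_t v) {
    std::vector<int> out;
    for (int i = 0; i < 32; ++i) {
        if ((v >> i) & 1u) out.push_back(i);
    }
    return out;
}
```

With zero-based numbering, 7u << 21 covers positions 21, 22 and 23, and 7u << 12 covers 12, 13 and 14.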
If, for example, I have the number 20:
0001 0100
I want to set the highest valued 1 bit, the left-most, to 0.
So
0001 0100
will become
0000 0100
I was wondering which is the most efficient way to achieve this.
Preferably in C++.
I tried subtracting the largest power of two from the original number, like this:
unsigned long long int originalNumber;
unsigned long long int x=originalNumber;
x--;
x |= x >> 1;
x |= x >> 2;
x |= x >> 4;
x |= x >> 8;
x |= x >> 16;
x++;
x >>= 1;
originalNumber ^= x;
but I need something more efficient.
The tricky part is finding the most significant bit, or counting the number of leading zeroes. Everything else can be done more or less trivially: left-shift 1 to that bit's position, then either AND the value with the inverted mask, or subtract 1 from the mask to get all the lower bits and AND the value with that.
The well-known bit hacks site has several implementations for the problem of finding the most significant bit, but it is also worth looking into compiler intrinsics, as all mainstream compilers have an intrinsic for this purpose, which they implement as efficiently as the target architecture will allow (I tested this a few years ago using GCC on x86, came out as single instruction). Which is fastest is impossible to tell without profiling on your target architecture (fewer lines of code, or fewer assembly instructions are not always faster!), but it is a fair assumption that compilers implement these intrinsics not much worse than you'll be able to implement them, and likely faster.
Using an intrinsic with a somewhat intelligible name may also turn out easier to comprehend than some bit hack when you look at it 5 years from now.
Unfortunately, although this is not an uncommon need, there was no standardized function for it in the C or C++ libraries when this was written, at least none that I'm aware of. (C++20 has since added std::countl_zero and std::bit_floor in <bit>.)
For GCC, you're looking for __builtin_clz, Visual Studio calls it _BitScanReverse, and Intel's compiler calls it _bit_scan_reverse.
As an alternative to counting leading zeroes, you may look into what the same Bit Twiddling site has under "Round up to the next power of two", which you would only need to follow up with a right shift by 1 and an AND with the complement of the result (an AND-NOT rather than a true NAND). Skip the initial decrement if exact powers of two must be handled as well. Note that the 5-step implementation given on the site is for 32-bit integers; for 64-bit values you need one additional step (x |= x >> 32).
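A portable sketch of that second idea for 64-bit values, without intrinsics. It smears the highest set bit into all lower positions and then keeps only the bits below it, which is a slight variation on the round-up approach rather than the exact code from the Bit Twiddling site; clearHighestSetBit is an invented name.

```cpp
#include <cstdint>

// Clear the most significant set bit of v (returns 0 for v == 0).
std::uint64_t clearHighestSetBit(std::uint64_t v) {
    std::uint64_t m = v;
    // After these six steps, every bit at or below the highest set bit is 1.
    m |= m >> 1;
    m |= m >> 2;
    m |= m >> 4;
    m |= m >> 8;
    m |= m >> 16;
    m |= m >> 32;
    // m >> 1 covers exactly the bits below the highest set bit.
    return v & (m >> 1);
}
```

Whether this beats an intrinsic on a given target is something only profiling can tell.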
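#include <limits.h>
#include <stdint.h>
uint32_t unsetHighestBit(uint32_t val) {
    // count down with a signed index: an unsigned i would never fail i >= 0
    for(int i = sizeof(uint32_t) * CHAR_BIT - 1; i >= 0; i--) {
        const uint32_t mask = UINT32_C(1) << i; // unsigned 1 avoids overflow at bit 31
        if(val & mask) {
            val &= ~mask;
            break;
        }
    }
    return val;
}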
Explanation
Here we take the size of the type uint32_t, which is 4 bytes. Each byte has 8 bits, so we iterate 32 times starting with i having values 31 to 0.
In each iteration we shift the value 1 by i to the left and then bitwise-and (&) it with our value. If this returns a value != 0, the bit at i is set. Once we find a bit that is set, we bitwise-and (&) our initial value with the bitwise negation (~) of the bit that is set.
For example if we have the number 44, its binary representation would be 0010 1100. The first set bit that we find is bit 5, resulting in the mask 0010 0000. The bitwise negation of this mask is 1101 1111. Now when bitwise and-ing & the initial value with this mask, we get the value 0000 1100.
In C++ with templates
This is an example of how this can be solved in C++ using a template:
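#include <climits>
// T is assumed to be an unsigned integer type
template<typename T> T unsetHighestBit(T val) {
    // CHAR_BIT rather than numeric_limits<char>::digits, which is 7 when char is signed;
    // signed index because an unsigned i would never fail i >= 0
    for(int i = sizeof(T) * CHAR_BIT - 1; i >= 0; i--) {
        const T mask = static_cast<T>(T(1) << i); // shift in T's width, not int's
        if(val & mask) {
            val &= static_cast<T>(~mask);
            break;
        }
    }
    return val;
}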
If you're constrained to 8 bits (as in your example), then just precalculate all possible values in an array (byte[256]) using any algorithm, or just type it in by hand.
Then you just look up the desired value:
x = lookup[originalNumber]
Can't be much faster than that. :-)
UPDATE: so I read the question wrong.
But if using 64-bit values, break the value apart into 8 bytes, maybe by casting it to a byte[8], overlaying it in a union, or something more clever. Then find the first byte that is not zero and do as in my answer above with that particular byte. Not as efficient, I'm afraid, but it is still at most 8 tests (4.5 on average) plus one lookup.
Of course, creating a byte[65536] lookup would roughly double the speed.
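A minimal sketch of the 8-bit lookup table, filled by a loop instead of by hand. buildTable and clearHighestBit8 are names invented for this example.

```cpp
#include <array>
#include <cstdint>

// Precompute, for every 8-bit value, that value with its highest set bit cleared.
std::array<std::uint8_t, 256> buildTable() {
    std::array<std::uint8_t, 256> t{};  // entry 0 stays 0
    for (int v = 1; v < 256; ++v) {
        int top = 7;
        while (((v >> top) & 1) == 0) --top;  // find the highest set bit
        t[v] = static_cast<std::uint8_t>(v & ~(1 << top));
    }
    return t;
}

// Look the answer up; the table is built once on first use.
std::uint8_t clearHighestBit8(std::uint8_t v) {
    static const std::array<std::uint8_t, 256> table = buildTable();
    return table[v];
}
```

For the 64-bit variant described above, the same table would be applied to the first non-zero byte only.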
The following code will turn off the leftmost set bit of a 32-bit unsigned x:
bool found = false;
unsigned int bit;
int bitCounter = 31;
while (!found) {
    bit = x & (1u << bitCounter); // 1u: shifting a signed 1 into bit 31 would overflow
    if (bit != 0) {
        x &= ~(1u << bitCounter);
        found = true;
    }
    else if (bitCounter == 0)
        found = true;
    else
        bitCounter--;
}
I know a method to set the rightmost nonzero bit to 0:
a & (a - 1)
It is from the book Hacker's Delight by Henry S. Warren, Jr.
You can reverse your bits, clear the rightmost set bit, and reverse back. But I do not know an efficient way to reverse the bits in your case.
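The idea can be sketched as follows for 32-bit values. The bit reversal here is a plain loop, chosen for clarity and not for speed, and all function names are made up for the example.

```cpp
#include <cstdint>

// Clear the lowest set bit: a - 1 flips that bit and everything below it.
constexpr std::uint32_t clearLowestSetBit(std::uint32_t a) {
    return a & (a - 1u);
}

// Reverse the order of all 32 bits (simple loop, not optimized).
std::uint32_t reverseBits(std::uint32_t v) {
    std::uint32_t r = 0;
    for (int i = 0; i < 32; ++i) {
        r = (r << 1) | (v & 1u);
        v >>= 1;
    }
    return r;
}

// Reverse, clear the lowest set bit, reverse back: clears the highest set bit.
std::uint32_t clearHighestViaReverse(std::uint32_t v) {
    return reverseBits(clearLowestSetBit(reverseBits(v)));
}
```

This works, but two full reversals are unlikely to be faster than finding the highest bit directly.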