Counting Minimum Number of Bytes Required to Represent a Signed Integer - c++

My task seems simple, I need to calculate the minimum number of bytes required to represent a variable integer (for example if the integer is 5, then I would like to return 1; if the integer is 300, I would like to return 2). Not referring to the data type int which is, as pointed out in comments, always just sizeof(int), I'm referring to a mathematical integer. And I almost have a solution. Here is my code:
int i;
cin >> i;
int length = 0;
while (i != 0) {
i >>= 8;
length++;
}
The problem is that this doesn’t work for any negative numbers (I have not been able to determine why) or some positive numbers where the most significant bit is a 0 (because the sign bit is the bit that makes it one bit larger)... Is there any hints or advice I can get in how to account for those cases?

Stored as a single byte,
Positive numbers are in the range 0x00 to 0x7F
Negative numbers are in the range 0x80 to 0xFF
As 2-bytes,
Positive numbers are in the range 0x0000 to 0x7FFF
Negative numbers are in the range 0x8000 to 0xFFFF
As 4-bytes,
Positive numbers are in the range 0x00000000 to 0x7FFFFFFF
Negative numbers are in the range 0x80000000 to 0xFFFFFFFF
You can use a function like the following to get the minimum size:
int getmin(int64_t i)
{
if(i == (int8_t)(i & 0xFF))
return 1;
if(i == (int16_t)(i & 0xFFFF))
return 2;
if(i == (int32_t)(i & 0xFFFFFFFF))
return 4;
return 8;
}
Then for example, when you see 0x80, translate it to -128. While 7F is translated to 127, and 0x801 should be translated to a positive number.
Note that this will be very difficult and rather pointless, it should be avoided. This doesn't accommodate storage of numbers in triple bytes, for that, you have to make up your own format.

The range of signed numbers possible to store in x bytes in 2's complement is -2^(8*x-1) to 2^(8*x-1)-1. For example, 1 byte can store signed integers from -128 to 127. Your example would incorrectly calculate that only 1 byte is needed to represent 128 (if we are talking about signed numbers), as right shifting by 8 would equal zero, but that last byte is required to know that this is not a negative number.
For handling negatives, turning it into a positive number and subtracting one (because negative numbers can store an extra value) will allow you to right shift.
int i;
cin >> i;
unsigned bytes = 1;
unsigned max = 128;
if (i < 0) {
i = ~i; //-1*i - 1
}
while(max <= i) {
i >>= 8;
bytes++;
}
cout << bytes;
Another option is to use __builtin_clz() if you are using gcc. This returns the leading zeros, which you can then use to determine the minimum number of bytes.

Related

Set all meaningful unset bits of a number

Given an integer n(1≤n≤1018). I need to make all the unset bits in this number as set (i.e. only the bits meaningful for the number, not the padding bits required to fit in an unsigned long long).
My approach: Let the most significant bit be at the position p, then n with all set bits will be 2p+1-1.
My all test cases matched except the one shown below.
Input
288230376151711743
My output
576460752303423487
Expected output
288230376151711743
Code
#include<bits/stdc++.h>
using namespace std;
typedef long long int ll;
int main() {
ll n;
cin >> n;
ll x = log2(n) + 1;
cout << (1ULL << x) - 1;
return 0;
}
The precision of typical double is only about 15 decimal digits.
The value of log2(288230376151711743) is 57.999999999999999994994646087789191106964114967902921472132432244... (calculated using Wolfram Alpha)
Threfore, this value is rounded to 58 and this result in putting a bit 1 to higher digit than expected.
As a general advice, you should avoid using floating-point values as much as possible when dealing with integer values.
You can solve this with shift and or.
uint64_t n = 36757654654;
int i = 1;
while (n & (n + 1) != 0) {
n |= n >> i;
i *= 2;
}
Any set bit will be duplicated to the next lower bit, then pairs of bits will be duplicated 2 bits lower, then quads, bytes, shorts, int until all meaningful bits are set and (n + 1) becomes the next power of 2.
Just hardcoding the maximum of 6 shifts and ors might be faster than the loop.
If you need to do integer arithmetics and count bits, you'd better count them properly, and avoid introducing floating point uncertainty:
unsigned x=0;
for (;n;x++)
n>>=1;
...
(demo)
The good news is that for n<=1E18, x will never reach the number of bits in an unsigned long long. So the rest of you code is not at risk of being UB and you could stick to your minus 1 approach, (although it might in theory not be portable for C++ before C++20) ;-)
Btw, here are more ways to efficiently find the most significant bit, and the simple log2() is not among them.

how to count only 1bits in binary representation of number without loop

Suppose I have two numbers(minimum and maximum) . `
for example (0 and 9999999999)
maximum could be so huge. now I also have some other number. it could be between those minimum and maximum number. Let's say 15. now What I need to do is get all the multiples of 15(15,30,45 and so on, until it reaches the maximum number). and for each these numbers, I have to count how many 1 bits there are in their binary representations. for example, 15 has 4(because it has only 4 1bits).
The problem is, I need a loop in a loop to get the result. first loop is to get all multiples of that specific number(in our example it was 15) and then for each multiple, i need another loop to count only 1bits. My solution takes so much time. Here is how I do it.
unsigned long long int min = 0;
unsigned long long int max = 99999999;
unsigned long long int other_num = 15;
unsigned long long int count = 0;
unsigned long long int other_num_helper = other_num;
while(true){
if(other_num_helper > max) break;
for(int i=0;i<sizeof(int)*4;i++){
int buff = other_num_helper & 1<<i;
if(buff != 0) count++; //if bit is not 0 and is anything else, then it's 1bits.
}
other_num_helper+=other_num;
}
cout<<count<<endl;
Look at the bit patterns for the numbers between 0 and 2^3
000
001
010
011
100
101
110
111
What do you see?
Every bit is one 4 times.
If you generalize, you find that the numbers between 0 and 2^n have n*2^(n-1) bits set in total.
I am sure you can extend this reasoning for arbitrary bounds.
Here's how I do it for a 32 bit number.
std::uint16_t bitcount(
std::uint32_t n
)
{
register std::uint16_t reg;
reg = n - ((n >> 1) & 033333333333)
- ((n >> 2) & 011111111111);
return ((reg + (reg >> 3)) & 030707070707) % 63;
}
And the supporting comments from the program:
Consider a 3 bit number as being 4a + 2b + c. If we shift it right 1 bit, we have 2a + b. Subtracting this from the original gives 2a + b + c. If we right-shift the original 3-bit number by two bits, we get a, and so with another subtraction we have a + b + c, which is the number of bits in the original number.
The first assignment statement in the routine computes 'reg'. Each digit in the octal representation is simply the number of 1’s in the corresponding three bit positions in 'n'.
The last return statement sums these octal digits to produce the final answer. The key idea is to add adjacent pairs of octal digits together and then compute the remainder modulus 63.
This is accomplished by right-shifting 'reg' by three bits, adding it to 'reg' itself and ANDing with a suitable mask. This yields a number in which groups of six adjacent bits (starting from the LSB) contain the number of 1’s among those six positions in n. This number modulo 63 yields the final answer. For 64-bit numbers, we would have to add triples of octal digits and use modulus 1023.

Clear i to 0 bits

I'm working through Cracking the Coding Interview and one of the bit manipulations techniques is as follows:
To clear all bits from i through 0 (inclusive), we take a sequence of all 1s (which is -1) and shift it left by i + 1 bits. This gives us a sequence of 1s (in the most significant bits) followed by i 0 bits.
int clearBitsIthrough0(int num, int i){
int mask = (-1 << (i + 1));
return num & mask;
}
How is -1 a sequence of all 1s?
Assuming you are using C/C++, int represents a signed 32-bit integer represented with two's complement.
-1 by itself is assumed to be of type int, and therefore is equivalent to 0xFFFFFFFF. This is derived as follows:
1 is 0x00000001. Inverting the bits gives 0xFFFFFFFE, and adding one yields the two's complement representation of -1: 0xFFFFFFFF, a sequence of 32 ones.
You have: int mask = (-1 << (i - 1));
Seems you are missing a cast:
int mask = ((int)-1 << (i - 1));

How can you get the j first most significant bits of an integer in C++?

I know that to get the first j least significant bits of an integer you can do the following:
int res = (myInteger & ((1<<j)-1))
Can you do something similar for the most significant bits?
Simply right shift: (Warning, fails when you want 0 bits, but yours fails for all bits)
unsigned dropbits = CHAR_BIT*sizeof(int)-j;
//if you want the high bits moved to low bit position, use this:
ullong res = (ullong)myInteger >> dropbits;
//if you want the high bits in the origonal position, use this:
ullong res = (ullong)myInteger >> dropbits << dropbits;
Important! The cast must be the unsigned version of your type.
It's also good to note that your code for the lowest j bits fails when you ask it for all (32?) bits. As such, it can be easier to doubleshift:
unsigned dropbits = CHAR_BIT*sizeof(int)-j;
ullong res = (ullong)myInteger << dropbits >> dropbits;
See it working here: http://coliru.stacked-crooked.com/a/64eb843b3b255278 and here: http://coliru.stacked-crooked.com/a/29bc40188d852dd3
To get the j highest bits of an integer (or rather an unsigned integer, because bitwise operations in signed integers are a recipe for pain):
unsigned res = myUnsignedInteger & ~(~0u >> j);
~0u consists of only set bits. Shifting that j bits to the right gives us j zero-bits on the left side followed by one-bits, and inverting that gives us j one-bits on the left followed by zeroes, which is the mask we need to isolate the j highest bits of another integer.
Note: This is under the assumption that you want the isolated bits to remain in the same place, which is to say
(0xdeadbeef & ~(~0u >> 12)) == 0xdea00000

algorithm to figure out how many bytes are required to hold an int

sorry for the stupid question, but how would I go about figuring out, mathematically or using c++, how many bytes it would take to store an integer.
If you mean from an information theory point of view, then the easy answer is:
log(number) / log(2)
(It doesn't matter if those are natural, binary, or common logarithms, because of the division by log(2), which calculates the logarithm with base 2.)
This reports the number of bits necessary to store your number.
If you're interested in how much memory is required for the efficient or usual encoding of your number in a specific language or environment, you'll need to do some research. :)
The typical C and C++ ranges for integers are:
char 1 byte
short 2 bytes
int 4 bytes
long 8 bytes
If you're interested in arbitrary-sized integers, special libraries are available, and every library will have its own internal storage mechanism, but they'll typically store numbers via 4- or 8- byte chunks up to the size of the number.
You could find the first power of 2 that's larger than your number, and divide that power by 8, then round the number up to the nearest integer. So for 1000, the power of 2 is 1024 or 2^10; divide 10 by 8 to get 1.25, and round up to 2. You need two bytes to hold 1000!
If you mean "how large is an int" then sizeof(int) is the answer.
If you mean "how small a type can I use to store values of this magnitude" then that's a bit more complex. If you already have the value in integer form, then presumably it fits in 4, 3, 2, or 1 bytes. For unsigned values, if it's 16777216 or over you need 4 bytes, 65536-16777216 requires 3 bytes, 256-65535 needs 2, and 0-255 fits in 1 byte. The formula for this comes from the fact that each byte can hold 8 bits, and each bit holds 2 digits, so 1 byte holds 2^8 values, ie. 256 (but starting at 0, so 0-255). 2 bytes therefore holds 2^16 values, ie. 65536, and so on.
You can generalise that beyond the normal 4 bytes used for a typical int if you like. If you need to accommodate signed integers as well as unsigned, bear in mind that 1 bit is effectively used to store whether it is positive or negative, so the magnitude is 1 power of 2 less.
You can calculate the number of bits you need iteratively from an integer by dividing it by two and discarding the remainder. Each division you can make and still have a non-zero value means you have one more bit of data in use - and every 8 bits you're using means 1 byte.
A quick way of calculating this is to use the shift right function and compare the result against zero.
int value = 23534; // or whatever
int bits = 0;
while (value)
{
value >> 1;
++bits;
}
std::cout << "Bits used = " << bits << std::endl;
std::cout << "Bytes used = " << (bits / 8) + 1 << std::endl;
This is basically the same question as "how many binary digits would it take to store a number x?" All you need is the logarithm.
A n-bit integer can store numbers up to 2n-1. So, given a number x, ceil(log2 x) gets you the number of digits you need.
It's exactly the same thing as figuring out how many decimal digits you need to write a number by hand. For example, log10 123456 = 5.09151220... , so ceil( log10(123456) ) = 6, six digits.
Since nobody put up the simplest code that works yet, I mind as well do it:
unsigned int get_number_of_bytes_needed(unsigned int N) {
unsigned int bytes = 0;
while(N) {
N >>= 8;
++bytes;
};
return bytes;
};
assuming sizeof(long int) = 4.
int nbytes( long int x )
{
unsigned long int n = (unsigned long int) x;
if (n <= 0xFFFF)
{
if (n <= 0xFF) return 1;
else return 2;
}
else
{
if (n <= 0xFFFFFF) return 3;
else return 4;
}
}
The shortest code way to do this is as follows:
int bytes = (int)Math.Log(num, 256) + 1;
The code is small enough to be inlined, which helps offset the "slow" FP code. Also, there are no branches, which can be expensive.
Try this code:
// works for num >= 0
int numberOfBytesForNumber(int num) {
if (num < 0)
return 0;
else if (num == 0)
return 1;
else if (num > 0) {
int n = 0;
while (num != 0) {
num >>= 8;
n++;
}
return n;
}
}
/**
* assumes i is non-negative.
* note that this returns 0 for 0, when perhaps it should be special cased?
*/
int numberOfBytesForNumber(int i) {
int bytes = 0;
int div = 1;
while(i / div) {
bytes++;
div *= 256;
}
if(i % 8 == 0) return bytes;
return bytes + 1;
}
This code runs at 447 million tests / sec on my laptop where i = 1 to 1E9. i is a signed int:
n = (i > 0xffffff || i < 0) ? 4 : (i < 0xffff) ? (i < 0xff) ? 1 : 2 : 3;
Python example: no logs or exponents, just bit shift.
Note: 0 counts as 0 bits and only positive ints are valid.
def bits(num):
"""Return the number of bits required to hold a int value."""
if not isinstance(num, int):
raise TypeError("Argument must be of type int.")
if num < 0:
raise ValueError("Argument cannot be less than 0.")
for i in count(start=0):
if num == 0:
return i
num = num >> 1