This is kind of a curiosity.
I'm studying C++. I was asked to reproduce an infinite loop, for example one that prints a series of powers:
#include <iostream>

int main()
{
    int powerOfTwo = 1;
    while (true)
    {
        powerOfTwo *= 2;
        std::cout << powerOfTwo << std::endl;
    }
}
The result kinda troubled me. With the Python interpreter, for example, I used to get an effectively infinite loop printing a power of two on each iteration (until the IDE stopped it for exceeding the iteration limit, of course). With this C++ program I instead get a series of 0s. But if I change this to a finite loop, that is to say, if I only change the condition to:
(powerOfTwo <= 100)
the code works well, printing 2, 4, 8, ..., 128.
So my question is: why does an infinite loop in C++ work this way? Why does it seem not to evaluate the while body at all?
Edit: I'm using Code::Blocks and compiling with g++.
In the infinite-loop case you see 0 because the int overflows: after 32 iterations the value wraps around to 0, and 0 * 2 == 0.
Look at the first few lines of output. http://ideone.com/zESrn
2
4
8
16
32
64
128
256
512
1024
2048
4096
8192
16384
32768
65536
131072
262144
524288
1048576
2097152
4194304
8388608
16777216
33554432
67108864
134217728
268435456
536870912
1073741824
-2147483648
0
0
0
In Python, integers can hold an arbitrary number of digits. C++ does not work this way; its integers have only a limited precision (normally 32 bits, but this depends on the platform). Multiplication by 2 is implemented by bitwise shifting the integer one bit to the left. What is happening is that you initially have only the first bit of the integer set:
powerOfTwo = 1; // 0x00000001 = 0b00000000000000000000000000000001
After your loop iterates 31 times, the bit will have shifted to the very last position in the integer.
powerOfTwo = -2147483648; // 0x80000000 = 0b10000000000000000000000000000000
On the next multiplication by two, the bit is shifted all the way out of the integer (since it has limited precision), and you end up with zero.
powerOfTwo = 0; // 0x00000000 = 0b00000000000000000000000000000000
From then on, you are stuck, since 0 * 2 is always 0. If you watched your program in "slow motion", you would see an initial burst of powers of 2, followed by an infinite loop of zeroes.
In Python, on the other hand, your code would work as expected - Python integers can expand to hold any arbitrary number of digits, so your single set bit will never "shift off the end" of the integer. The number will simply keep expanding so that the bit is never lost, and you will never wrap back around and get trapped at zero.
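If you want to watch this happen concretely, here is a minimal sketch (my own illustration, not the asker's program) that does the same doubling with a fixed-width unsigned type, so the wraparound is well-defined rather than signed-overflow undefined behavior:

#include <cstdint>
#include <cstdio>

int main() {
    uint32_t powerOfTwo = 1;
    for (int i = 1; powerOfTwo != 0; ++i) {
        powerOfTwo *= 2;  // unsigned arithmetic wraps modulo 2^32, by definition
        std::printf("iteration %2d: %10u (0x%08X)\n",
                    i, (unsigned)powerOfTwo, (unsigned)powerOfTwo);
    }
    // The last line printed is iteration 32: 0 (0x00000000); the single set
    // bit has been shifted out of the 32-bit value, and the loop then stops.
}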
Actually, it prints powers of two until powerOfTwo overflows and becomes 0. Then 0 * 2 = 0, and so on. http://ideone.com/XUuHS
In C++ the int has a limited size, so it is able to keep computing even after the error; the while (true) is what makes this the case.
In C++ you will cause an overflow pretty soon; your int variable won't be able to handle big numbers.
int: 4 bytes signed, handling the range -2,147,483,648 to 2,147,483,647
So, as @freerider said, your compiler may be optimizing the code for you.
I guess you know the data-type concepts of C/C++; you are declaring powerOfTwo as an int, so the range of int applies accordingly. If you want a continuous loop, you could use char as the datatype and, by using data conversion, get an infinite loop for your function.
Carefully examine the output of the program. You don't really get an infinite series of zeroes. You get 32 numbers, followed by an infinite series of zeroes.
The thirty-two numbers are the first thirty-two powers of two:
1
2
4
8
...
(2 raised to the 30th)
(2 raised to the 31st)
0
0
0
The problem is how C represents numbers: as finite quantities. Since your mathematical quantity is no longer representable in the C int, C puts some other number in its place; in particular, it puts the true value modulo 2^32. But 2^32 mod 2^32 is zero, so there you are.
std::bit_width finds the minimum number of bits required to represent an integral number x, as 1 + floor(log2(x)).
Why does std::bit_width return 0 for the value 0? Shouldn't it return 1, since the number of bits required to represent 0 is 1?
Also, I think the 1 in the formula is an offset.
There is a strange bit of history to bit_width.
The function that would eventually become known as bit_width started life as log2, as part of a proposal adding integer power-of-two functions. log2 was specified to produce UB when passed 0.
Because that's how logarithms work.
But then, things changed. The function later became log2p1, and for reasons that are not specified was given a wider contract ("wide contract" in C++ parlance means that more stuff is considered valid input). Specifically, 0 is valid input, and yields the value of 0.
Which is not how logarithms work, but whatever.
As C++20 neared standardization, a name conflict was discovered (PDF). The name log2p1 happens to correspond to the name of an IEEE-754 algorithm, but it's a radically different one. Also, functions in other languages with similar inputs and results use a name like bit_length. So it was renamed to bit_width.
And since it's not pretending to do a logarithm anymore, the behavior at 0 can be whatever we want.
Indeed, the Python function int.bit_length has the exact same behavior. Leading zeros are not considered part of the bit length, and since a value of 0 contains all leading zeros...
Because mathematically it makes sense:
bit_width(x) = log2(round_up_to_nearest_integer_power_of_2(x + 1))
bit_width(0) = log2(round_up_to_nearest_integer_power_of_2(0 + 1))
= log2(1)
= 0
To elaborate on what was said in the comments:
Assume "bit width" means "the least number of bits required to store the (nonnegative integer) number". Intuitively we need at least log2(n) bits, rounding up, so the formula is close to ceil(log2(n)): 255 would require ceil(log2(255)) = ceil(7.99..) = 8 bits. But this doesn't work for powers of 2 (ceil(log2(8)) = 3, while 8 needs 4 bits), so we add a fudge factor of 1 to n, giving ceil(log2(n+1)). This happens to be mathematically equivalent to 1 + floor(log2(n)) for positive n, but log2(0) is not defined, or is defined as something unuseful like negative infinity, in the floor version.
If we use the ceiling formula for 0, we get ceil(log2(0+1)) = 0, the result in question. You can also see that I didn't write out leading zeros in the table below, and as Nicol Bolas points out, 0 is all leading zeros.
n    bin(n)    bit_width(n)
8    1000      4
7    111       3
6    110       3
5    101       3
4    100       3
3    11        2
2    10        2
1    1         1
0              0
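If you want to check the table against the real function, here is a tiny sketch (assuming a C++20 compiler, where std::bit_width lives in <bit>):

#include <bit>
#include <cstdio>

int main() {
    // Prints bit_width for 0..8; the output matches the table above.
    for (unsigned n = 0; n <= 8; ++n)
        std::printf("bit_width(%u) = %d\n", n, (int)std::bit_width(n));
}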
I am finding pow(2,i) where i can range over 0 <= i <= 100000.
Apart from that, I have MOD = 1000000007.
int powers[100001];                      // indices 0..100000, so 100001 elements
powers[0] = 1;
for (i = 1; i <= 100000; ++i)
{
    powers[i] = (powers[i-1] * 2) % MOD; // 2*(MOD-1) still fits in a 32-bit int
}
For i = 100000, won't the power value become greater than MOD?
How do I store the power correctly?
The operation doesn't look feasible to me.
I am getting correct values only up to about i = 70, I guess.
I have to find sum += ar[i] * pow(2,i) and finally print sum % 1000000007, where ar[i] is an additional array with some big numbers up to 10^5.
As long as your modulus value is less than half the capacity of your data type, it will never be exceeded. That's because you take the previous value in the range 0..1000000006, double it, then re-modulo it bringing it back to that same range.
However, I can't guarantee that higher values won't cause you trouble; it's more mathematical analysis than I'm prepared to invest given the simple alternative. You could spend a lot of time analysing, checking and debugging, but it's probably better just to not let the problem occur in the first place.
The alternative? I'd tend to use the pre-generation method (having a program do the gruntwork up front, inserting the pre-generated values into an array easily and speedily accessible from your real program).
With this method, you can use tools that are well tested and known to work with massive values. Since this data is not going to change, it's useless calculating it every time your program starts.
If you want an easy (and efficient) way to do this, the following bash script in conjunction with bc and awk can do this:
#!/usr/bin/bash

bc >nums.txt <<EOF
i = 1;
for (x = 0; x <= 10000; x++) {
    i % 1000000007;
    i = i * 2;
}
EOF

awk 'BEGIN { printf "static int array[] = {" }
     { if (NR % 5 == 1) printf "\n ";
       printf "%s, ", $0;
       next
     }
     END { print "\n};" }' nums.txt
The bc part is the "meat" of the matter, it creates the large powers of two and outputs them modulo the number you provided. The awk part is simply to format them in C-style array elements, five per line.
Just take the output of that and put it into your code and, voila, there you have it, a compile-time-expensed array that you can use for fast lookup.
It takes only a second and a half on my box to generate the array and then you never need to do it again. You also won't have to concern yourself with the vagaries of modulo math :-)
static int array[] = {
1,2,4,8,16,
32,64,128,256,512,
1024,2048,4096,8192,16384,
32768,65536,131072,262144,524288,
1048576,2097152,4194304,8388608,16777216,
33554432,67108864,134217728,268435456,536870912,
73741817,147483634,294967268,589934536,179869065,
359738130,719476260,438952513,877905026,755810045,
511620083,23240159,46480318,92960636,185921272,
371842544,743685088,487370169,974740338,949480669,
898961331,797922655,595845303,191690599,383381198,
766762396,533524785,67049563,134099126,268198252,
536396504,72793001,145586002,291172004,582344008,
164688009,329376018,658752036,317504065,635008130,
270016253,540032506,80065005,160130010,320260020,
640520040,281040073,562080146,124160285,248320570,
:
861508356,723016705,446033403,892066806,784133605,
568267203,136534399,273068798,546137596,92275185,
184550370,369100740,738201480,476402953,952805906,
905611805,
};
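For completeness, here is a minimal sketch of the final sum the asker describes (my own illustration; ar[] is filled with placeholder data, and the running power is computed directly, though indexing a pre-generated array like the one above works just as well). 64-bit intermediates keep every product safely below overflow, since ar[i] <= 10^5 and the power stays below 10^9 + 7:

#include <cstdint>
#include <cstdio>

int main() {
    const int64_t MOD = 1000000007;
    const int N = 100001;                  // i ranges over 0..100000
    static int64_t ar[N];
    for (int i = 0; i < N; ++i)
        ar[i] = i % 100000;                // placeholder data, values up to 1e5

    int64_t p = 1;                         // invariant: p == 2^i % MOD
    int64_t sum = 0;
    for (int i = 0; i < N; ++i) {
        sum = (sum + ar[i] * p) % MOD;     // ar[i]*p < 1e5 * (1e9+7) < 2^63
        p = (p * 2) % MOD;                 // double, then reduce back below MOD
    }
    std::printf("%lld\n", (long long)sum);
}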
Notice that your modulus can be stored in an int: MOD = 1000000007 (decimal) is equivalent to 0b00111011100110101100101000000111 and fits in 32 bits.
i    pow(2,i)     bit representation
0    1            0b00000000000000000000000000000001
1    2            0b00000000000000000000000000000010
2    4            0b00000000000000000000000000000100
3    8            0b00000000000000000000000000001000
...
29   536870912    0b00100000000000000000000000000000
The tricky part starts when pow(2,i) is greater than your MOD = 1000000007. If you know the current pow(2,i) will be greater than your MOD, you can look at what the bits of pow(2,i) % MOD look like:
i    pow(2,i)      pow(2,i)%MOD    bit representation
30   1073741824    73741817        0b00000100011001010011010111111001
31   2147483648    147483634       0b00001000110010100110101111110010
32   4294967296    294967268       0b00010001100101001101011111100100
33   8589934592    589934536       0b00100011001010011010111111001000
So if you have pow(2,i-1) % MOD, you can apply the *2 directly to pow(2,i-1) % MOD, even when the next pow(2,i) would be greater than MOD.
For example, for i = 34 you would use (589934536 * 2) % 1000000007 instead of (8589934592 * 2) % 1000000007, because 8589934592 can't be stored in an int.
Additionally, you can try bit operations instead of multiplication for pow(2,i): the bit operation equivalent to multiplying by 2 is a left shift.
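As a sketch, that shift-based running power looks like this; long long comfortably holds the doubled value before the reduction:

#include <cstdio>

int main() {
    const long long MOD = 1000000007;
    long long p = 1;                      // p == pow(2,i) % MOD
    for (int i = 1; i <= 34; ++i) {
        p = (p << 1) % MOD;               // left shift by one == multiply by 2
        std::printf("2^%d mod MOD = %lld\n", i, p);
    }
    // at i == 34 this prints 179869065, i.e. (589934536 * 2) % 1000000007
}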
After searching the net, I came to know that the bit-wise version of the Sieve of Eratosthenes is pretty efficient.
The problem is I am unable to understand the math/method it is using.
The version that I have been busy with looks like this:
#define MAX 100000000
#define LIM 10000

unsigned flag[MAX>>6] = {0};

#define ifc(n) (flag[n>>6]&(1<<((n>>1)&31)))  //LINE 1
#define isc(n) (flag[n>>6]|=(1<<((n>>1)&31))) //LINE 2

void sieve() {
    unsigned i, j, k;
    for (i = 3; i < LIM; i += 2)
        if (!ifc(i))
            for (j = i*i, k = i<<1; j < LIM*LIM; j += k)
                isc(j);
}
Points that I understood (Please correct me if I am wrong):
Statement in line 1 checks if the number is composite.
Statement in line 2 marks the number 'n' as composite.
The program is storing the value 0 or 1 at a bit of an int. This tends to reduce the memory usage to x/32. (x is the size that would have been used had an int been used to store the 0 or 1 instead of a bit like in my solution above)
Points that are going above my head as of now :
How is the macro in LINE 1 functioning? How does it make sure that the number is composite or not?
How is the macro in LINE 2 setting the bit?
I also came to know that the bitwise sieve is time-wise efficient as well. Is it only because of the use of bitwise operators, or is something else contributing to that as well?
Any ideas or suggestions?
Technically, there is a bug in the code as well:
unsigned flag[MAX>>6]={0};
divides MAX by 64, but if MAX is not an exact multiple of 64, the array is one element short.
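A sizing that rounds up would be safe for any MAX (a one-line sketch of the fix):

unsigned flag[(MAX + 63) >> 6] = {0};  /* ceil(MAX/64), so the last partial word exists */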
Line 1: Let's pick it apart:
(flag[n>>6]&(1<<((n>>1)&31)))
The flag[n>>6] (n >> 6 = n / 64) gives the 32-bit integer that holds the bit value for n / 2.
Since only "Odd" numbers are possible primes, divide n by two: (n>>1).
The 1<<((n>>1)&31) gives us the bit corresponding to n/2 within the 0..31 - (& 31 makes sure that it's "in range").
Finally, use & to combine the value on the left with the value on the right.
So, the result is true if element for n has bit number n modulo 32 set.
The second line is essentially the same concept, just that it uses |= (or equal) to set the bit corresponding to the multiple.
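To tie it together, here is a small runnable sketch of the same packed-bit, odd-only scheme, with functions in place of the macros and the limits shrunk for illustration:

#include <cstdio>
#include <vector>

int main() {
    const unsigned LIMIT = 100;
    // one bit per odd number: word n>>6 holds bit (n>>1) & 31 for the number n
    std::vector<unsigned> flag((LIMIT >> 6) + 1, 0);
    auto is_composite  = [&](unsigned n) { return flag[n >> 6] & (1u << ((n >> 1) & 31)); };
    auto set_composite = [&](unsigned n) { flag[n >> 6] |= (1u << ((n >> 1) & 31)); };

    for (unsigned i = 3; i * i <= LIMIT; i += 2)
        if (!is_composite(i))
            for (unsigned j = i * i; j <= LIMIT; j += 2 * i)  // odd multiples only
                set_composite(j);

    std::printf("2");  // 2 is the only even prime, handled separately
    for (unsigned n = 3; n <= LIMIT; n += 2)
        if (!is_composite(n))
            std::printf(" %u", n);
    std::printf("\n");
}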
I am new to bit manipulation tricks and I wrote a simple program to see the output of doing single-bit shifts on a single number, viz. 2:
#include <iostream>

int main(int argc, char *argv[])
{
    int num = 2;
    do
    {
        std::cout << num << std::endl;
        num = num << 1; // Left shift by 1 bit.
    } while (num != 0);
    return 0;
}
The output of this is the following.
2
4
8
16
32
64
128
256
512
1024
2048
4096
8192
16384
32768
65536
131072
262144
524288
1048576
2097152
4194304
8388608
16777216
33554432
67108864
134217728
268435456
536870912
1073741824
-2147483648
Obviously, continuously bit shifting to the left by 1 bit will eventually result in zero, as it has done above. But why does the computer output a negative number at the very end, just before the loop terminates (once num turns zero)?
However, when I replace int num=2 with unsigned int num=2, I get the same output, except that the last number is this time displayed as positive, i.e. 2147483648 instead of -2147483648.
I am using the gcc compiler on Ubuntu Linux.
That's because int is a signed integer. In the two's-complement representation, the sign of the integer is determined by the upper-most bit.
Once you have shifted the 1 into the highest (sign) bit, it flips negative.
When you use unsigned, there's no sign bit.
0x80000000 = -2147483648 for a signed 32-bit integer.
0x80000000 = 2147483648 for an unsigned 32-bit integer.
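A small sketch showing both readings of the same bit pattern (note that converting an out-of-range value to a signed type is implementation-defined before C++20, and defined as two's-complement wrapping from C++20 on):

#include <cstdint>
#include <cstdio>

int main() {
    uint32_t bits = 0x80000000u;              // only the top bit is set
    std::printf("%u\n", (unsigned)bits);      // 2147483648, read as unsigned
    std::printf("%d\n", (int)(int32_t)bits);  // -2147483648, read as signed
}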
EDIT: Note that, strictly speaking, signed integer overflow is undefined behavior in C/C++. The behavior of GCC in this respect is not completely consistent:
num = num << 1; or num <<= 1; usually behaves as described above.
num += num; or num *= 2; may actually go into an infinite loop on GCC.
Good question! The answer is rather simple though.
The maximum integer value is 2^31 - 1. The 31 (not 32) is there for a reason: the top bit of the integer is used to determine whether the number is positive or negative.
If you keep shifting the bit to the left, you'll eventually hit this sign bit, and the value turns negative.
More information about this: http://en.wikipedia.org/wiki/Signed_number_representations
As soon as the bit reaches the sign bit of the signed int (its most significant bit), the value turns negative.
I am learning C/C++ programming and have encountered the usage of 'bit arrays' or 'bit vectors'. I am not able to understand their purpose. Here are my doubts:
Are they used as boolean flags?
Can one use int arrays instead? (more memory of course, but..)
What's this concept of Bit-Masking?
If bit-masking consists of simple bit operations to get an appropriate flag, how does one program with them? Is it not difficult to do this operation in one's head to see what the flag would be, as opposed to decimal numbers?
I am looking for applications, so that I can understand better. For example:
Q. You are given a file containing integers in the range 1 to 1 million. There are some duplicates, and hence some numbers are missing. Find the fastest way of finding the missing numbers.
For the above question, I have read solutions telling me to use bit arrays. How would one store each integer in a bit?
I think you've got yourself confused between arrays and numbers, specifically what it means to manipulate binary numbers.
I'll go about this by example. Say you have a number of error messages and you want to return them in a return value from a function. Now, you might label your errors 1, 2, 3, 4..., which makes sense to your mind, but then how do you, given just one number, work out which errors have occurred?
Now, try labelling the errors 1, 2, 4, 8, 16... increasing powers of two, basically. Why does this work? Well, when you work in base 2 you are manipulating a number like 00000000, where each digit corresponds to a power of 2 given by its position from the right. So let's say errors 1, 4 and 8 occur. That could be represented as 00001101. In reverse, the first digit = 1*2^0, the third digit = 1*2^2 and the fourth digit = 1*2^3; adding them all up gives you 13.
Now, we are able to test whether such an error has occurred by applying a bitmask. For example, if you wanted to work out whether error 8 has occurred, use the bit representation of 8 = 00001000. Then, in order to extract whether or not that error has occurred, use a binary AND, like so:
  00001101
& 00001000
= 00001000
I'm sure you know how an AND works, or can deduce it from the above: working digit-wise, if both digits are 1 the result is 1, else it is 0.
Now, in C:
#include <stdint.h>

int func(...)
{
    int retval = 0;
    if ( some_test_that_means_an_error )
    {
        retval += 1;
    }
    if ( some_other_test_that_means_an_error )
    {
        retval += 2;
    }
    return retval;
}

int anotherfunc(...)
{
    uint8_t x = func(...);
    /* binary AND with 8 and shift 3 places to the right,
     * so that the resultant expression is either 1 or 0 */
    if ( ( ( x & 0x08 ) >> 3 ) == 1 )
    {
        /* that error occurred */
    }
}
Now, to practicalities. When memory was scarce and protocols didn't have the luxury of verbose XML etc., it was common to delimit a field as being so many bits wide. In such a field, you assign various bits (flags, powers of 2) a certain meaning, apply binary operations to deduce whether they are set, and then operate on them.
I should also add that binary operations are close in idea to the underlying electronics of a computer. Imagine if the bit fields corresponded to the output of various circuits (carrying current or not). By using enough combinations of said circuits, you make... a computer.
Regarding the usage of the bit array:
If you know there are "only" 1 million numbers, you use an array of 1 million bits. In the beginning all bits are zero, and every time you read a number you use that number as an index and set the bit at that index to one (if it isn't one already).
After reading all the numbers, the missing numbers are the indices of the zeros in the array.
For example, if we had only numbers between 0 and 4, the array would look like this in the beginning: 0 0 0 0 0.
If we read the numbers 3, 3, 2,
the array would change like this: read 3 --> 0 0 0 1 0. read 3 (again) --> 0 0 0 1 0. read 2 --> 0 0 1 1 0. Check the indices of the zeroes: 0, 1, 4 - those are the missing numbers.
BTW, of course you can use integers instead of bits, but that may take (depending on the system) 32 times the memory.
Sivan
Bit arrays or bit vectors can be thought of as an array of boolean values. Normally a boolean variable needs at least one byte of storage, but in a bit array/vector only one bit is needed per value.
This gets handy if you have lots of such data, as you save memory at scale.
Another usage is if you have numbers which do not exactly fit into standard variables of 8, 16, 32 or 64 bits in size. This way you could store into a bit vector of 16 bits one number that is 4 bits, one that is 2 bits, and one that is 10 bits in size. Normally you would have to use 3 variables with sizes of 8, 8 and 16 bits, wasting 50% of the storage.
But all these uses are very rare in business applications; they often come into use when interfacing with drivers through pinvoke/interop functions and when doing low-level programming.
Bit arrays or bit vectors are used as a mapping from position to some bit value. Yes, it's basically the same thing as an array of bool, but a typical bool implementation is one to four bytes long and uses too much space.
We can store the same amount of data much more efficiently by using arrays of words, with binary masking operations and shifts to store and retrieve the bits (less overall memory used, fewer memory accesses, fewer cache misses, fewer memory page swaps). The code to access individual bits is still quite straightforward.
There is also some bit-field support built into the C language (you write things like int i:1; to say "only consume one bit"), but it is not available for arrays, and you have less control over the overall result (the details of the implementation depend on the compiler and on alignment issues).
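As a tiny illustration of that built-in bit-field support (field layout and total size are implementation-defined, as noted above):

#include <cstdio>

struct Packed {
    unsigned a : 4;   // 4-bit field
    unsigned b : 2;   // 2-bit field
    unsigned c : 10;  // 10-bit field
};

int main() {
    Packed p{};       // zero-initialize all fields
    p.a = 9; p.b = 3; p.c = 1000;
    std::printf("a=%u b=%u c=%u sizeof=%zu\n",
                (unsigned)p.a, (unsigned)p.b, (unsigned)p.c, sizeof(Packed));
    // typically prints sizeof=4, but the standard doesn't guarantee the layout
}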
Below is a possible way to answer your "search missing numbers" question. I fixed the int size to 32 bits to keep things simple, but it could be written using sizeof(int) to make it portable. And (depending on the compiler and target processor) the code could be made faster by using >> 5 instead of / 32 and & 31 instead of % 32, but that is just to give the idea.
#include <stdio.h>
#include <errno.h>
#include <stdint.h>

int main(){
    /* put all numbers from 0 to 999999 in a file, except 765, 777760 and 777791 */
    {
        printf("writing test file\n");
        int x = 0;
        FILE * f = fopen("testfile.txt", "w");
        for (x = 0; x < 1000000; ++x){
            if (x == 765 || x == 777760 || x == 777791){
                continue;
            }
            fprintf(f, "%d\n", x);
        }
        fprintf(f, "%d\n", 57768); /* this one is a duplicate */
        fclose(f);
    }

    uint32_t bitarray[1000000 / 32] = {0}; /* must be zero-initialized: automatic arrays are not cleared by default */

    /* read the file containing integers in the range [0,1000000) */
    /* any non-number is considered a separator */
    /* the goal is to find the missing numbers */
    printf("Reading test file\n");
    {
        unsigned int x = 0;
        FILE * f = fopen("testfile.txt", "r");
        while (1 == fscanf(f, " %u", &x)){
            bitarray[x / 32] |= 1u << (x % 32);
        }
        fclose(f);
    }

    /* find the missing numbers in bitarray */
    {
        int x = 0;
        for (x = 0; x < (1000000 / 32); ++x){
            uint32_t n = bitarray[x];
            if (n != (uint32_t)-1){
                printf("Missing number(s) between %d and %d [%x]\n",
                       x * 32, (x + 1) * 32, bitarray[x]);
                int b;
                for (b = 0; b < 32; ++b){
                    if (0 == (n & (1u << b))){
                        printf("missing number is %d\n", x * 32 + b);
                    }
                }
            }
        }
    }
    return 0;
}
That is used for bit-flag storage, as well as for parsing the fields of various binary protocols, where a byte is divided into a number of bit-fields. This is widely used in protocols like TCP/IP, up to ASN.1 encodings, OpenPGP packets, and so on.