Binary search doesn't work for large numbers - c++

I am trying to solve the problem of finding the square root of a given number using binary search in C++. It works perfectly for small numbers, but for input >= 2000000000 it doesn't work at all.
code:
int main() {
    int n; cin >> n;
    int l = 0, r = n + 1;
    while (r - l > 1) {
        int m = (r + l) / 2;
        if (m * m <= n) {
            l = m;
        } else {
            r = m;
        }
    }
    cout << l;
    return 0;
}
some tests:
1
1
16
4
but
2000000000000000
-3456735426738
can't understand why...
I tested the same code in Python and it works fine, so it's probably some C++ behaviour I don't know about.

A number n >= 2000000000 surely works, as long as it doesn't reach its type's maximum allowed value (more on that shortly).
Because it seems you're not familiar with data types and their sizes in C and C++, I'll keep it simple.
A type of int is normally 4 bytes (yes, I said "normally" as there are exceptions to this rule - this is a different discussion regarding platforms and their architecture, for now, take the simple explanation that it's 4 bytes in most cases), meaning 32 bits. It can be signed or unsigned.
Minor caveat: when unsigned is not explicitly specified, then it's considered to be signed by default, so int x; would mean that x can take negative values as well
A signed int (signed, meaning it has both, positive and negative numbers, so apart from zero and the maximum negative number, you'd have each value "twice", once with + and one more time with -, hence the terminology of signed) has the following ranges: -2147483648 to +2147483647.
To "increase" the maximum allowed value, you'd need an unsigned int. Its range is 0 to 4294967295.
There are "bigger" types in C and C++, but I think that discussion is slightly more advanced. The short version is this: for a 64-bit integer, if you're using GCC you can use uint64_t; if you're using MSVS you can use either __int64 or uint64_t.
For even larger values, well... it gets really complicated. Python has native support for larger numbers, that is why it works there from the get-go.
You need to check the data types available in C and C++; preferably read up on C17 (the 2017 standard for C, which is the newest released) and C++20 (the 2020 standard for C++). The roadmap says the next standard update for both would be in 2023 (so fingers crossed :) ).
Regarding your code, however, also keep in mind what molbdnilo and ALX23z said in their comments about overflow. Even if you used a sufficiently large data type, there would still be a risk of overflow due to mistakes in your code:
molbdnilo: m * m overflows
ALX23z: Instead of m * m <= n write m < n/m. And inspect better the case when m == n/m
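Putting those two comments together, here is a minimal sketch of the same search that avoids both problems, using long long for the input (assuming your inputs stay within its range) and comparing m <= n / m instead of m * m <= n, so the square is never actually computed:

#include <iostream>
using namespace std;

int main() {
    long long n;                       // int would overflow for n >= 2000000000000000
    cin >> n;
    long long l = 0, r = n + 1;
    while (r - l > 1) {
        long long m = l + (r - l) / 2; // avoids overflow in l + r for huge n
        if (m <= n / m) {              // same test as m * m <= n, but cannot overflow;
            l = m;                     // m >= 1 inside the loop, so n / m is safe
        } else {
            r = m;
        }
    }
    cout << l;
    return 0;
}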

Related

Why does this loop run?

#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main()
{
    std::string qaz{};
    vector<size_t> index;
    cout << "qaz: " << qaz << " length: " << qaz.length() << "\n";
    for (size_t i{0}; i <= (qaz.length() - 2); i++)
    {
        cout << "Entered" << i << "\n";
        cout << "Exited" << i << "\n";
    }
    return 0;
}
Here qaz is an empty string, so qaz.length() == 0 (and thus qaz.length()-2 == -2), and i is initialized to 0, so I expected that we would not enter the loop. But on running it I find that it goes into an infinite loop. Why? Please help me with it.
See docs for size_t:
std::size_t is the unsigned integer type of the result of the sizeof operator
(Emphasis mine.)
Furthermore, string::length returns a size_t too [1].
But even if that were not the case, when comparing signed values to unsigned values, the signed value is converted to unsigned before the comparison, as explained in this answer.
(size_t)0 - 2 will wrap around, as size_t is unsigned and therefore its minimum value is zero, resulting in a large number which is usually [2] either 2^32 - 2 or 2^64 - 2 depending on the processor architecture. Let's go with the latter; then you will get 18,446,744,073,709,551,614 as the result.
Now, looking at the result of 0 <= 18446744073709551614 you can see that zero is clearly less than or equal to roughly 18.4 quintillion, so the loop condition is fulfilled. In fact the loop is not infinite: it will iterate exactly 18,446,744,073,709,551,615 times, but it's true you will probably not want to wait for it to finally reach its finishing point.
The solution is to avoid the underflow by comparing i + y <= x instead of i <= x - y [3], i.e. i + 2 <= qaz.length(). You will then have 2 <= 0, which is false.
1: Technically, it returns an std::allocator<char>::size_type but that is defined as std::size_t.
2: To be exact, it is SIZE_MAX - (2 - 1), i.e. SIZE_MAX - 1 (see limits). In terms of numeric value, it could also be 2^16 - 2 - such as on an ATmega328P microcontroller - or some other value, but on the architectures you get in desktop computers at the current point in time it's most likely one of the two I mentioned. It depends on the width of the std::size_t type: if it's X bits wide, you'd get 2^X - n for (size_t)0 - n, for 0 < n < 2^X. Since C++11 it is, however, guaranteed that std::size_t is no less than 16 bits wide.
3: However, in the unlikely case that your length is very large, specifically at least the number calculated above with 2^X - 2 or larger, this would result in an overflow instead. But in that case your whole logic would be flawed and you'd need a different approach. I think this can't be the case anyway, because std::ssize support means that string lengths would have to have one unused bit to be repurposed as a sign bit, but I think this answer went down various rabbit holes far enough already.
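To see the wraparound concretely, here is a minimal sketch (the exact large value printed depends on the width of size_t on your platform):

#include <iostream>
#include <string>

int main() {
    std::string qaz{};                        // empty, so qaz.length() == 0
    std::cout << qaz.length() - 2 << "\n";    // wraps around: prints SIZE_MAX - 1
    // Fixed loop: move the subtraction to the other side of the comparison.
    for (std::size_t i = 0; i + 2 <= qaz.length(); ++i)
        std::cout << "Entered " << i << "\n"; // never entered: 2 <= 0 is false
    return 0;
}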
length() returns an unsigned value, which cannot be below zero. 0u - 2 wraps around and becomes a very large number.
Use i + 2 <= qaz.length() instead.
The issue is that size_t is unsigned. length() returns the string's size_type, which is unsigned and most likely also size_t. When the string's size is < 2, length() - 2 wraps around to yield a large unsigned value.
Since C++20 there is std::ssize, which returns a signed value. Though you also have to adjust the type of i to get the correct number of iterations when the condition compares against the signed value -2:
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main()
{
    std::string qaz{};
    vector<size_t> index;
    cout << "qaz: " << qaz << " length: " << qaz.length() << "\n";
    for (int i{0}; i <= (std::ssize(qaz) - 2); i++)
    {
        cout << "Entered" << i << "\n";
        cout << "Exited" << i << "\n";
    }
}
Alternatively stay with unsigneds and use i+2 <= qaz.length().

Can n %= m ever return negative value for very large nonnegative n and m?

This question is regarding the modulo operator %. We know that in general a % b returns the remainder when a is divided by b, and that the remainder is greater than or equal to zero and strictly less than b. But does the above hold when a and b are of magnitude 10^9?
I seem to be getting a negative output for the following code for input:
74 41 28
However, changing the final output statement does the trick and the result becomes correct!
#include <iostream>
using namespace std;
#define m 1000000007

int main() {
    int n, k, d;
    cin >> n >> k >> d;
    if (d > n)
        cout << 0 << endl;
    else
    {
        long long *dp1 = new long long[n+1], *dp2 = new long long[n+1];
        // build dp1:
        dp1[0] = 1;
        dp1[1] = 1;
        for (int r = 2; r <= n; r++)
        {
            dp1[r] = (2 * dp1[r-1]) % m;
            if (r >= k+1) dp1[r] -= dp1[r-k-1];
            dp1[r] %= m;
        }
        // build dp2:
        for (int r = 0; r < d; r++) dp2[r] = 0;
        dp2[d] = 1;
        for (int r = d+1; r <= n; r++)
        {
            dp2[r] = ((2*dp2[r-1]) - dp2[r-d] + dp1[r-d]) % m;
            if (r >= k+1) dp2[r] -= dp1[r-k-1];
            dp2[r] %= m;
        }
        cout << dp2[n] << endl;
    }
}
changing the final output statement to:
if(dp2[n]<0) cout<<dp2[n]+m<<endl;
else cout<<dp2[n]<<endl;
does the trick, but why was it required?
By the way, the code is actually my solution to this question
This is a limit imposed by the range of int.
int can only hold values between –2,147,483,648 to 2,147,483,647.
Consider using long long for your m, n, k, d & r variables. If possible use unsigned long long if your calculations should never have a negative value.
long long can hold values from –9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
while unsigned long long can hold values from 0 to 18,446,744,073,709,551,615. (2^64)
The range of positive values is approximately halved in signed types compared to unsigned types, because the most significant bit is used for the sign. When you try to assign a positive value greater than the range of the specified data type, the most significant bit is set and the value gets interpreted as negative.
Well, no, modulo with positive operands does not produce negative results.
However .....
The int type is only guaranteed by the C standards to support values in the range -32767 to 32767, which means your macro m is not necessarily expanding to a literal of type int. It will fit in a long though (which is guaranteed to have a large enough range).
If that's happening (e.g. a compiler that has a 16-bit int type and a 32-bit long type), the results of your modulo operations will be computed as long, and may have values that exceed what an int can represent. Converting such a value to an int (as would be required with statements like dp1[r] %= m if dp1 were a pointer to int) gives undefined behaviour.
Mathematically, there is nothing special about big numbers, but computers only have a limited width to write down numbers in, so when things get too big you get "overflow" errors. A common analogy is the counter of miles traveled on a car dashboard - eventually it will show as all 9s and roll round to 0. Because of the way negative numbers are handled, standard signed integers don't roll round to zero, but to a very large negative number.
You need to switch to larger variable types so that they overflow less quickly - "long int" or "long long int" instead of just "int", the range doubling with each extra bit of width. You can also use unsigned types for a further doubling, since no range is used for negatives.
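To make the failure mode concrete: % itself never turns two nonnegative operands into a negative result, but subtraction steps like dp2[r] -= dp1[r-k-1] can leave a negative value before the final % m, and C++'s % preserves that sign. A minimal sketch of the usual normalization, with norm as an illustrative helper name:

#include <iostream>

const long long M = 1000000007;

// Reduce x into [0, M), even when x has gone negative after a subtraction.
long long norm(long long x) {
    x %= M;             // now x is in (-M, M)
    if (x < 0) x += M;  // shift a negative remainder into range
    return x;
}

int main() {
    long long a = 3, b = 5;
    std::cout << (a - b) % M << "\n";  // prints -2: the sign survives %
    std::cout << norm(a - b) << "\n";  // prints 1000000005
    return 0;
}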

Can't store too lengthy int type

Consider the problem:
It can be shown that for some powers of two in decimal format like:
2^9 = 512
2^89 = 618,970,019,642,690,137,449,562,112
The results end in a string consisting of 1s and 2s. In fact, it can be proven that for every integer R there exists a power of two 2^K with K > 0 whose last R digits consist only of 1s and 2s.
It can be shown clearly in the table below:

R   Smallest K   2^K
1   1            2
2   9            512
3   89           ...112
4   89           ...2112
Using this technique, what then is the sum of all the smallest K values for 1 <= R <= 10?
Proposed solution:
Now this problem isn't that difficult to solve. You can simply do
int temp = power(2, K)
and then, if you can get the length of temp, multiply it with
(100^len)-i or (10^len)-i
// where i determines how many of the last digits you want.
But this temp = power(2, K) gets so much bigger with increasing K that you can't store it in an int, or even in a long int...
So what can be done? And is there any other solution based on bit strings? I guess that might make this problem easier.
Thanks in advance.
No, I doubt there are any solutions based on "strings of bits"; that would be quite inefficient. But there are bignum libraries like GMP, which feature types that are either fixed-size but much bigger than the built-in int types, or of arbitrary size limited only by memory capacity, plus matching sets of math operations, working similarly to software FPU emulation.
Quoting from the GMP reference, with a minor paraphrase:
#include <iostream>
#include <gmpxx.h>
using namespace std;

int main(void)
{
    mpz_class a, b, c;
    a = 1234;
    b = "-5676739826856836954375492356569366529629568926519085610160816539856926459237598";
    c = a + b;
    cout << "sum is " << c << "\n";
    cout << "absolute value is " << abs(c) << "\n";
    return 0;
}
Thanks to C++ operator overloading, it is much easier to use than the ANSI C version.
Since you are only interested in the n least significant digits of your result, you could try to devise an algorithm that only calculates those. Based on the standard algorithm for written multiplication, you can see that the n least significant digits of the product are entirely determined by the n least significant digits of the multiplicands. Based on this it should be possible to create an algorithm that calculates as many digits of 2^K as fit into a long int.
The only problem you might run into is that there may be numbers that end in a matching sequence that is longer that a long int can hold. In that case you can still resort to calculating additional digits using your own algorithm or a library.
Note that this is basically the same thing that big-number libraries do, only your approach might be more efficient, because you do not calculate digits that you are unlikely to need.
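As a sketch of that idea: keeping only the last R digits means working modulo 10^R, so the tail of 2^K can be computed with repeated squaring without ever storing the full number. powmod is an illustrative name, not something from the question; note that for R = 10 the intermediate products no longer fit in 64 bits, so you would need unsigned __int128 (a GCC/Clang extension) or a two-limb multiply.

#include <cstdint>
#include <iostream>

// (base^exp) % mod by repeated squaring.
// Safe as long as mod*mod fits in uint64_t, i.e. mod = 10^R with R <= 9.
uint64_t powmod(uint64_t base, uint64_t exp, uint64_t mod) {
    uint64_t result = 1 % mod;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

int main() {
    // Last 3 digits of 2^89: the table above says they should be 112.
    std::cout << powmod(2, 89, 1000) << "\n";  // prints 112
    return 0;
}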
Try GMP, http://gmplib.org/
It can store a number with any size if it fits in the memory.
Although you might be better off with a less brute-force approach.
You can store binary strings in std::bitset or in std::vector
www.cplusplus.com/reference/bitset/bitset/
I think bitset is your choice. Using full big-number arithmetic for operations on powers of 2 is overkill, though.

Why does C++ output negative numbers when using modulo?

Math:
If you have an equation like this:
x = 3 mod 7
x could be ... -4, 3, 10, 17, ..., or more generally:
x = 3 + k * 7
where k can be any integer. I don't know if a modulo operation is defined in mathematics, but the factor ring certainly is.
Python:
In Python, you will always get non-negative values when you use % with a positive m:
#!/usr/bin/python
# -*- coding: utf-8 -*-
m = 7
for i in xrange(-8, 10 + 1):
    print(i % 7)
Results in:
6 0 1 2 3 4 5 6 0 1 2 3 4 5 6 0 1 2 3
C++:
#include <iostream>
using namespace std;

int main() {
    int m = 7;
    for (int i = -8; i <= 10; i++) {
        cout << (i % m) << endl;
    }
    return 0;
}
Will output:
-1 0 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 0 1 2 3
ISO/IEC 14882:2003(E) - 5.6 Multiplicative operators:
The binary / operator yields the quotient, and the binary % operator
yields the remainder from the division of the first expression by the
second. If the second operand of / or % is zero the behavior is
undefined; otherwise (a/b)*b + a%b is equal to a. If both operands are
nonnegative then the remainder is nonnegative; if not, the sign of the
remainder is implementation-defined 74).
and
74) According to work underway toward the revision of ISO C, the
preferred algorithm for integer division follows the rules defined in
the ISO Fortran standard, ISO/IEC 1539:1991, in which the quotient is
always rounded toward zero.
Source: ISO/IEC 14882:2003(E)
(I couldn't find a free version of ISO/IEC 1539:1991. Does anybody know where to get it from?)
The operation seems to be defined like this: a % b == a - (a / b) * b, where the quotient a / b is rounded toward zero.
Question:
Does it make sense to define it like that?
What are arguments for this specification? Is there a place where the people who create such standards discuss about it? Where I can read something about the reasons why they decided to make it this way?
Most of the time when I use modulo, I want to access elements of a data structure. In this case, I have to make sure that mod returns a non-negative value. So, for this case, it would be good if mod always returned a non-negative value.
(Another usage is the Euclidean algorithm. Since you could make both numbers positive before using this algorithm, the sign of modulo wouldn't matter there.)
Additional material:
See Wikipedia for a long list of what modulo does in different languages.
On x86 (and other processor architectures), integer division and modulo are carried out by a single operation, idiv (div for unsigned values), which produces both quotient and remainder (for word-sized arguments, in AX and DX respectively). This is exposed in the C library function div, which can be optimised by the compiler to a single instruction!
Integer division respects two rules:
Non-integer quotients are rounded towards zero; and
the equation dividend = quotient*divisor + remainder is satisfied by the results.
Accordingly, when dividing a negative number by a positive number, the quotient will be negative (or zero).
So this behaviour can be seen as the result of a chain of local decisions:
Processor instruction set design optimises for the common case (division) over the less common case (modulo);
Consistency (rounding towards zero, and respecting the division equation) is preferred over mathematical correctness;
C prefers efficiency and simplicitly (especially given the tendency to view C as a "high level assembler"); and
C++ prefers compatibility with C.
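A quick check of those two rules, as a minimal sketch:

#include <cassert>
#include <iostream>

int main() {
    int a = -8, b = 7;
    std::cout << a / b << "\n";      // -1: rounded toward zero, not down to -2
    std::cout << a % b << "\n";      // -1: sign chosen so the identity holds
    assert(a / b * b + a % b == a);  // dividend = quotient*divisor + remainder
    return 0;
}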
Back in the day, someone designing the x86 instruction set decided it was right and good to round integer division toward zero rather than round down. (May the fleas of a thousand camels nest in his mother's beard.) To keep some semblance of math-correctness, operator REM, which is pronounced "remainder", had to behave accordingly. DO NOT read this: https://www.ibm.com/support/knowledgecenter/ssw_ibm_i_73/rzatk/REM.htm
I warned you. Later someone doing the C spec decided it would be conforming for a compiler to do it either the right way or the x86 way. Then a committee doing the C++ spec decided to do it the C way. Then later yet, after this question was posted, a C++ committee decided to standardize on the wrong way. Now we are stuck with it. Many a programmer has written the following function or something like it. I have probably done it at least a dozen times.
inline int mod(int a, int b) {int ret = a%b; return ret>=0? ret: ret+b; }
There goes your efficiency.
These days I use essentially the following, with some type_traits stuff thrown in. (Thanks to Clearer for a comment that gave me an idea for an improvement using latter day C++. See below.)
// (earlier version, struck out in the original answer in favour of the one below)
template<class T>
inline T mod(T a, T b) {
    assert(b > 0);
    T ret = a % b;
    return (ret >= 0) ? (ret) : (ret + b);
}
template<>
inline unsigned mod(unsigned a, unsigned b) {
    assert(b > 0);
    return a % b;
}
True fact: I lobbied the Pascal standards committee to do mod the right way until they relented. To my horror, they did integer division the wrong way. So they do not even match.
EDIT: Clearer gave me an idea. I am working on a new one.
#include <cassert>
#include <type_traits>

template<class T1, class T2>
inline T1 mod(T1 a, T2 b) {
    assert(b > 0);
    T1 ret = a % b;
    if constexpr (std::is_unsigned_v<T1>) {
        return ret;
    } else {
        return (ret >= 0) ? (ret) : (ret + b);
    }
}
What are arguments for this specification?
One of the design goals of C++ is to map efficiently to hardware. If the underlying hardware implements division in a way that produces negative remainders, then that's what you'll get if you use % in C++. That's all there is to it really.
Is there a place where the people who create such standards discuss about it?
You will find interesting discussions on comp.lang.c++.moderated and, to a lesser extent, comp.lang.c++
Others have described the why well enough, and unfortunately the question which asks for a solution is marked as a duplicate of this one, so a comprehensive answer on that aspect seems to be missing. There seem to be two commonly used general solutions, plus one special case I would like to include:
// 724 ms
inline int mod1(int a, int b)
{
    const int r = a % b;
    return r < 0 ? r + b : r;
}

// 759 ms
inline int mod2(int a, int b)
{
    return (a % b + b) % b;
}

// 671 ms (see NOTE1!)
inline int mod3(int a, int b)
{
    return (a + b) % b;
}

int main(int argc, char** argv)
{
    volatile int x;
    for (int i = 0; i < 10000000; ++i) {
        for (int j = -argc + 1; j < argc; ++j) {
            x = modX(j, argc);    // substitute mod1, mod2, or mod3 here
            if (x < 0) return -1; // sanity check
        }
    }
}
NOTE1: This is not generally correct (i.e. if a < -b). The reason I included it is because almost every time I find myself taking the modulus of a negative number is when doing math with numbers that are already modded, for example (i1 - i2) % n where the 0 <= iX < n (e.g. indices of a circular buffer).
As always, YMMV with regards to timing.
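For instance, the circular-buffer case from NOTE1, sketched with an illustrative ring_distance helper:

#include <iostream>

// Both indices are already in [0, n), so i1 - i2 can only reach down to
// -(n - 1); adding n once before % n is enough (the mod3 trick above).
int ring_distance(int i1, int i2, int n) {
    return (i1 - i2 + n) % n;
}

int main() {
    const int n = 8;                              // buffer size
    std::cout << ring_distance(2, 6, n) << "\n";  // prints 4 (wrapped around)
    std::cout << ring_distance(6, 2, n) << "\n";  // prints 4
    return 0;
}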

Double precision in C++ (or pow(2, 1000))

I'm working on Project Euler to brush up on my C++ coding skills in preparation for the programming challenge(s) we'll be having this next semester (since they don't let us use Python, boo!).
I'm on #16, and I'm trying to find a way to keep real precision for 2^1000.
For instance:
#include <cstdio>
#include <cmath>

int main(){
    double num = pow(2, 1000);
    printf("%.0f", num);
    return 0;
}
prints
10715086071862673209484250490600018105614050000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Which is missing most of the numbers (from python):
>>> 2**1000
10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376L
Granted, I can write the program with a Python one-liner
sum(int(_) for _ in str(2**1000))
that gives me the result immediately, but I'm trying to find a way to do it in C++. Any pointers? (haha...)
Edit:
Something outside the standard libs is worthless to me - only dead-tree code is allowed in those contests, and I'm probably not going to print out 10,000 lines of external code...
If you just keep track of each digit in a char array, this is easy. Doubling a digit is trivial, and if the result is greater than 10 you just subtract 10 and add a carry to the next digit. Start with a value of 1, loop over the doubling function 1000 times, and you're done. You can predict the number of digits you'll need with ceil(1000*log(2)/log(10)), or just add them dynamically.
Spoiler alert: it appears I have to show the code before anyone will believe me. This is a simple implementation of a bignum with two functions, Double and Display. I didn't make it a class in the interest of simplicity. The digits are stored in a little-endian format, with the least significant digit first.
#include <iostream>
#include <vector>

typedef std::vector<char> bignum;

void Double(bignum & num)
{
    int carry = 0;
    for (bignum::iterator p = num.begin(); p != num.end(); ++p)
    {
        *p *= 2;
        *p += carry;
        carry = (*p >= 10);
        *p -= carry * 10;
    }
    if (carry != 0)
        num.push_back(carry);
}

void Display(bignum & num)
{
    for (bignum::reverse_iterator p = num.rbegin(); p != num.rend(); ++p)
        std::cout << static_cast<int>(*p);
}

int main(int argc, char* argv[])
{
    bignum num;
    num.push_back(1);
    for (int i = 0; i < 1000; ++i)
        Double(num);
    Display(num);
    std::cout << std::endl;
    return 0;
}
You need a bignum library, such as this one.
You probably need a pointer here (pun intended)
In C++ you would need to create your own bigint library in order to do the same as in Python.
C and C++ operate on fundamental data types. You are using a double, which has only 64 bits, to store a 1000-bit number. A double uses 52 bits for the significand (plus one implicit leading bit) and 11 bits for the exponent.
The only solution for you is to either use a library like the bignum mentioned elsewhere or to roll your own.
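You can confirm that layout from the standard library; a minimal sketch, assuming IEEE-754 doubles:

#include <iostream>
#include <limits>

int main() {
    // 53 = 52 stored significand bits plus the implicit leading 1-bit.
    std::cout << std::numeric_limits<double>::digits << "\n";        // prints 53
    std::cout << std::numeric_limits<double>::max_exponent << "\n";  // prints 1024
    return 0;
}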
UPDATE: I just browsed the Euler Problem site and found that Problem 13 is about summing large integers. The iterated method can become very tricky after a short while, so I'd suggest using the code from Problem 13, which you should already have, to solve this, because 2**N = 2**(N-1) + 2**(N-1).
Using bignums is cheating and not a solution. Also, you don't need to compute 2**1000 or anything like that to get to the result. I'll give you a hint:
Take the first few values of 2**N:
1 2 4 8 16 32 64 128 256 ...
Now write down for each number the sum of its digits:
1 2 4 8 7 5 10 11 13 ...
You should notice that (x~=y means x and y have the same sum of digits)
1+1=2, 1+(1+2)=4, 1+(1+2+4)=8, 1+(1+2+4+8)=16~=7, 1+(1+2+4+8+7)=23~=5
Now write a loop.
Project Euler = Think before Compute!
If you want to do this sort of thing on a practical basis, you're looking for an arbitrary precision arithmetic package. There are a number around, including NTL, lip, GMP, and MIRACL.
If you're just after something for Project Euler, you can write your own code for raising to a power. The basic idea is to store your large number in quite a few small pieces, and implement your own carries, borrows, etc., between the pieces.
Isn't pow(2, 1000) just 1 left-shifted 1000 times, essentially? It should have an exact binary representation in a double float. It shouldn't require a bignum library.