count number of set bits in integer - c++

I am studying different bit counting (population count) methods for a given integer. Recently I have been trying to figure out how the following algorithm works:
pop(x)=-sum(x<<i) where i=0:31
I think that after calculating each term we get
x+2*x+4*x+8*x+16*x+..............+2^31*x = 4294967295*x
If we multiply it by -1, we get -4294967295*x, but how does that count the number of bits? Please help me understand this method. Thanks.

I believe you mean
pop(x) = -sum(x <<rot i) where i=0:31
as seen on the cover of the book Hacker's Delight, where the symbol means left-rotation, not left-shift; a left-shift produces the wrong results (and downvotes).
This method works because the rotation makes every binary digit of x appear in every bit position exactly once across the terms, and because of 2's complement.
Take a simpler example. Consider numbers with only 4 binary digits, where the digits can be represented as ABCD, then the summation means:
ABCD // x <<rot 0
+ BCDA // x <<rot 1
+ CDAB // x <<rot 2
+ DABC // x <<rot 3
We note that every column has all of A, B, C, D. Now, ABCD actually means "2³ A + 2² B + 2¹ C + 2⁰ D", so the summation is just:
2³ A + 2² B + 2¹ C + 2⁰ D
+ 2³ B + 2² C + 2¹ D + 2⁰ A
+ 2³ C + 2² D + 2¹ A + 2⁰ B
+ 2³ D + 2² A + 2¹ B + 2⁰ C
= 2³(A+B+C+D) + 2²(B+C+D+A) + 2¹(C+D+A+B) + 2⁰(D+A+B+C)
= (2³ + 2² + 2¹ + 2⁰) × (A + B + C + D)
The (A + B + C + D) is the population count of x and (2³ + 2² + 2¹ + 2⁰) = 0b1111 is -1 in 2's complement, so the summation is the negative of the population count.
The argument can be easily extended to 32-bit numbers.
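As a rough check of the 32-bit case, the rotation sum can be computed directly. This is only a sketch: rotl32 and pop_by_rotation are hypothetical helper names, not from the book, and the arithmetic relies on unsigned 32-bit wrap-around to play the role of 2's complement.

```cpp
#include <cstdint>

// Rotate x left by r bits, 0 <= r < 32 (hypothetical helper name).
std::uint32_t rotl32(std::uint32_t x, unsigned r) {
    return r == 0 ? x : (x << r) | (x >> (32 - r));
}

// pop(x) = -(sum of all 32 left-rotations of x), computed modulo 2^32.
unsigned pop_by_rotation(std::uint32_t x) {
    std::uint32_t sum = 0;
    for (unsigned i = 0; i < 32; ++i) {
        sum += rotl32(x, i);   // wraps around modulo 2^32
    }
    return 0u - sum;           // negate modulo 2^32
}
```

For example, pop_by_rotation(12) and pop_by_rotation(18) both give 2. The loop does 32 rotations and additions, so this illustrates the identity rather than being a fast way to count bits.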

#include <stdio.h>

/* Adds a and b using only bitwise operations:
   (a & b) << 1 is the carry, a ^ b is the sum without carry. */
unsigned int f(unsigned int a, unsigned int b)
{
    return a ? f((a & b) << 1, a ^ b) : b;
}

/* Tests every bit position up to the highest set bit of n. */
int bitcount(unsigned int n)
{
    int tot = 0;
    unsigned int i;
    for (i = 1; i != 0 && i <= n; i = i << 1)
        if (n & i)
            ++tot;
    return tot;
}

/* Clears the lowest set bit each iteration, so it loops once per set bit. */
int bitcount_sparse_ones(unsigned int n)
{
    int tot = 0;
    while (n) {
        ++tot;
        n &= n - 1;
    }
    return tot;
}

int main(void)
{
    int a = 12;
    int b = 18;
    unsigned int c = f(a, b);
    printf("Sum = %u\n", c);
    int CountA = bitcount(a);
    int CountB = bitcount(b);
    int CntA = bitcount_sparse_ones(a);
    int CntB = bitcount_sparse_ones(b);
    printf("CountA = %d and CountB = %d\n", CountA, CountB);
    printf("CntA = %d and CntB = %d\n", CntA, CntB);
    return 0;
}


I encountered the 10^9+7 problem but I can't understand the relation between the distributive properties of mod and my problem

Given 3 numbers a, b, c, compute a^b, b^a and c^x, where x is the absolute difference between b and a, and print each one mod 10^9+7 in ascending order.
I searched the web for how to use the distributive property but didn't understand it, since I am a beginner.
I use very simple for loops, so this problem is a bit hard for me. How can I apply these mod rules to powers computed in loops? If anyone can help I would be very happy.
Note: the time limit is 1 second, which makes it harder.
I tried to take the result mod 10^9+7 on every loop iteration and then multiply it by the original number.
For example, for 2^3: after cin >> a, a would be 2 and num = a, and each iteration does
a = (a % (10^9 + 7)) * num
This works for very small inputs, but for large ones it exceeds the time limit.
#include <iostream>
#include <cmath>
using namespace std;
int main ()
{
    long long a,b,c,one,two,thr;
    long long x;
    long long mod = 1e9+7;
    cin >> a >> b >> c;
    one = a;
    two = b;
    thr = c;
    if (a>=b)
        x = a - b;
    else
        x = b - a;
    // each loop multiplies (exponent - 1) times, so the running time is linear in the exponent
    for(long long i = 0; i < two-1;i++)
        a = ((a % mod) * (one%mod))%mod;
    for(long long j = 0; j < one-1;j++)
        b = ((b % mod) * (two%mod))%mod;
    for(long long k = 0; k < x-1;k++)
        c = ((c % mod) * (thr%mod))%mod;
}
I use very simple for loops [...] this works for very small inputs, but large ones it exceeds time.
There is an algorithm called "exponentiation by squaring" that has logarithmic time complexity, rather than linear.
It works by repeatedly halving the exponent while squaring the base.
Consider, e.g. x^355. Instead of multiplying x 354 times, we can observe that
x^355 = x·x^354 = x·(x^2)^177 = x·x^2·(x^2)^176 = x·x^2·(x^4)^88 = x·x^2·(x^8)^44 = x·x^2·(x^16)^22 = x·x^2·(x^32)^11 = x·x^2·x^32·(x^32)^10 = x·x^2·x^32·(x^64)^5 = x·x^2·x^32·x^64·(x^64)^4 = x·x^2·x^32·x^64·(x^128)^2 = x^1·x^2·x^32·x^64·x^256
That took "only" 12 steps.
To implement it, we only need to be able to perform modular multiplications safely, without overflowing. Given the value of the modulus, a type like std::int64_t is wide enough.
#include <iostream>
#include <cstdint>
#include <limits>
#include <cassert>

namespace modular
{
    auto exponentiation(std::int64_t base, std::int64_t exponent) -> std::int64_t;
}

int main()
{
    std::int64_t a, b, c;
    std::cin >> a >> b >> c;
    auto const x{ b < a ? a - b : b - a };
    std::cout << modular::exponentiation(a, b) << '\n'
              << modular::exponentiation(b, a) << '\n'
              << modular::exponentiation(c, x) << '\n';
    return 0;
}

namespace modular
{
    constexpr std::int64_t M{ 1'000'000'007 };

    // We need the mathematical modulo
    auto from(std::int64_t x)
    {
        static_assert(M > 0);
        x %= M;
        return x < 0 ? x + M : x;
    }

    // It assumes that both a and b are already mod M
    auto multiplication_(std::int64_t a, std::int64_t b)
    {
        assert( 0 <= a and a < M and 0 <= b and b < M );
        assert( b == 0 or a <= std::numeric_limits<std::int64_t>::max() / b );
        return (a * b) % M;
    }

    // Implements exponentiation by squaring
    auto exponentiation(std::int64_t base, std::int64_t exponent) -> std::int64_t
    {
        assert( exponent >= 0 );
        if ( exponent == 0 )  // anything to the power 0 is 1
            return 1;
        auto b{ from(base) };
        std::int64_t x{ 1 };
        while ( exponent > 1 )
        {
            if ( exponent % 2 != 0 )
            {
                x = multiplication_(x, b);
            }
            b = multiplication_(b, b);
            exponent /= 2;
        }
        return multiplication_(b, x);
    }
}
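The problem statement also asks for the three results in ascending order. A minimal sketch of that last step follows, assuming "ascending" means sorting the three residues by value. The names kMod, mod_pow and sorted_powers are hypothetical and not part of the answer above; mod_pow is a compact variant of the same exponentiation by squaring.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

const std::int64_t kMod = 1000000007;

// Exponentiation by squaring, reducing mod kMod after every multiplication.
std::int64_t mod_pow(std::int64_t base, std::int64_t exponent) {
    std::int64_t result = 1;
    base %= kMod;
    if (base < 0) base += kMod;          // keep the residue non-negative
    while (exponent > 0) {
        if (exponent & 1) result = result * base % kMod;
        base = base * base % kMod;       // both factors are < kMod, so no overflow
        exponent >>= 1;
    }
    return result;
}

// Computes a^b, b^a and c^|a-b| mod kMod and returns them sorted ascending.
std::array<std::int64_t, 3> sorted_powers(std::int64_t a, std::int64_t b, std::int64_t c) {
    std::int64_t x = a > b ? a - b : b - a;
    std::array<std::int64_t, 3> r = { mod_pow(a, b), mod_pow(b, a), mod_pow(c, x) };
    std::sort(r.begin(), r.end());
    return r;
}
```

With a = 2, b = 3, c = 5 the three values are 8, 9 and 5, so sorted_powers returns 5, 8, 9.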

How to find fibonacci sums of huge numbers? [duplicate]

I'm solving a CSES problem in which I've to find the sum of first 'n' Fibonacci numbers. The code:
#pragma GCC optimize("Ofast")
#include <cstdio>
#include <iostream>
using namespace std;
int main()
{
    unsigned long long int n;
    scanf("%llu", &n);
    unsigned long long int seq[n];
    seq[0] = 0;
    seq[1] = 1;
    unsigned long long int mod = 1000000000 + 7;
    for (unsigned long long int i = 2; i < n + 1; i++) {
        seq[i] = (seq[i - 1] + seq[i - 2]) % mod;
    }
    cout << seq[n];
}
The problem specifies that the value of n can be up to 10^18, and therefore I have used unsigned long long int for n. The problem also asks for the answer modulo 10^9+7. The code works fine for values of n up to 4 digits but breaks when n approaches the upper limit of 10^18. It gives a (0xC00000FD) error and does not return anything. Please help me understand the problem here and how to deal with it. Any other suggestions would also be appreciated.
When doing modular addition, you need to apply your mod to each value you're adding.
For example, (a + b) % c = (a % c + b % c) % c.
That means in your code:
seq[i] = (seq[i - 1] % mod + seq[i - 2] % mod) % mod;
Otherwise, the addition of seq[i - 1] and seq[i - 2] can overflow if the stored values have not already been reduced.
Read more about modular arithmetic here.
In this problem:
F[i] is the i-th Fibonacci number, MOD = 1e9 + 7, n < 1e18
F[n] % MOD = ?
F[n] = F[n-1] + F[n-2]
If you calculate this with a loop, you get a time limit exceeded verdict.
That's why you need to optimize the solution.
You can calculate F[n] recursively with
F[2*n] = - F[n] * F[n] + 2 * F[n] * F[n+1]
F[2*n+1] = F[n] * F[n] + F[n+1] * F[n+1]
Here is my solution:
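#include <iostream>
using namespace std;
typedef long long ll;
ll MOD = 1e9+7;

// On return: a = F[n] % MOD, b = F[n+1] % MOD
void fib(ll n ,ll &a , ll &b){
    if(n == 0){
        a = 0;
        b = 1;
        return;
    }
    ll x, y;
    if(n % 2 == 1){
        // odd n: step forward once from n-1
        fib(n-1 ,x,y);
        a = y;
        b = (x+y)%MOD;
        return;
    }
    // even n: apply the doubling formulas to n/2
    fib(n/2 , x , y);
    a = (x*(2*y +MOD -x)%MOD)%MOD;
    b = ((x*x)%MOD+(y*y)%MOD)%MOD;
}

int main(){
    ll N , a, b;
    cin >> N;
    fib(N , a, b);
    cout << a;
}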
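The two doubling identities can be cross-checked against the plain recurrence for small n. The sketch below is one possible variant, not the answer's exact code: fib_pair and fib_loop are hypothetical names, and fib_pair always halves n and then shifts by one for odd n instead of recursing on n-1.

```cpp
#include <cstdint>
#include <utility>

const std::uint64_t MOD = 1000000007ULL;

// Returns {F[n] % MOD, F[n+1] % MOD} using the doubling identities.
std::pair<std::uint64_t, std::uint64_t> fib_pair(std::uint64_t n) {
    if (n == 0) return {0, 1};
    std::pair<std::uint64_t, std::uint64_t> half = fib_pair(n / 2);
    std::uint64_t x = half.first;    // F[k],   k = n / 2
    std::uint64_t y = half.second;   // F[k+1]
    std::uint64_t c = x * ((2 * y + MOD - x) % MOD) % MOD;  // F[2k]
    std::uint64_t d = (x * x % MOD + y * y % MOD) % MOD;    // F[2k+1]
    if (n % 2 == 0) return {c, d};
    return {d, (c + d) % MOD};       // odd n: {F[2k+1], F[2k+2]}
}

// Plain O(n) loop, used only as a cross-check for small n.
std::uint64_t fib_loop(std::uint64_t n) {
    std::uint64_t a = 0, b = 1;
    for (std::uint64_t i = 0; i < n; ++i) {
        std::uint64_t t = (a + b) % MOD;
        a = b;
        b = t;
    }
    return a;
}
```

The recursion depth is about log2(n), so n near 10^18 needs roughly 60 calls, whereas fib_loop is only practical for small n.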
I think the problem with this code is that you are creating an array seq[n] of size n on the stack. For large n this exhausts the stack, which shows up as a SEGFAULT on Linux and STATUS_STACK_OVERFLOW (0xc00000fd) on Windows.
Below I give an improved version of your algorithm that uses a fixed amount of memory. For modular addition I use the sum_by_modulo function, which avoids overflow in the (a + b) % m operation; the principle is described here.
#pragma GCC optimize("Ofast")
#include <iostream>
typedef unsigned long long int ullong;

// Returns (a + b) % m without overflowing in the intermediate sum
ullong sum_by_modulo(ullong a, ullong b, ullong m){
    ullong sum;
    a %= m;
    b %= m;
    ullong c = m - a;
    if (b==c)
        sum = 0;
    if (b<c)
        sum = a + b;
    if (b > c)
        sum = b-c;
    return sum;
}

int main()
{
    ullong n;
    ullong t1 = 0, t2 = 1, nextTerm = 0;
    ullong modulo = 1000000000 + 7;
    std::cout << "Enter the number of term: ";
    std::cin >> n;
    for (ullong i = 1; i <= n; ++i)
    {
        // the first two terms are printed as they are
        if(i == 1)
        {
            std::cout << t1 << " ";
            continue;
        }
        if(i == 2)
        {
            std::cout << t2 << " ";
            continue;
        }
        nextTerm = sum_by_modulo(t1, t2, modulo);
        t1 = t2;
        t2 = nextTerm;
        std::cout << nextTerm << " ";
    }
    return 0;
}

Why the function is failing in case of numbers greater than 48 digits?

I am trying to find
(a^b) % mod
where b and mod are up to 10^9, while a can be really large (I have tested up to 48 digits with success),
using this relation
(a^b) % mod = (a%mod)^b % mod
#include <string>
using namespace std;
#define ll long long int

ll powerLL(ll x, ll n,ll MOD)
{
    ll result = 1;
    while (n) {
        if (n & 1)
            result = result * x % MOD;
        n = n / 2;
        x = x * x % MOD;
    }
    return result;
}

ll powerStrings(string sa, string sb,ll MOD)
{
    ll a = 0, b = 0;
    for (size_t i = 0; i < sa.length(); i++)
        a = (a * 10 + (sa[i] - '0')) % MOD;
    for (size_t i = 0; i < sb.length(); i++)
        b = (b * 10 + (sb[i] - '0')) % (MOD - 1);
    return powerLL(a, b,MOD);
}
powerStrings("5109109785634228366587086207094636370893763284000","362323789",354252525) returns 208624800 but it should return 323419500. In this case a is 49 digits
powerStrings("300510498717329829809207642824818434714870652000","362323489",354255221) returns 282740484 , which is correct. In this case a is 48 digits
Is something wrong with the code, or will I have to use another method?
It does not work because it is not mathematically correct.
In general, we have that pow(a, n, m) = pow(a, n % λ(m), m) (with a coprime to m), where λ is the Carmichael function. As a special case, when m is a prime number, λ(m) = m - 1; that situation is also covered by Fermat's little theorem. Reducing the exponent mod (m - 1) is only valid in that special case and does not work in general.
λ(354252525) = 2146980; if I hack that in, the right result comes out (although the base is not actually coprime to the modulus).
In general you would need to compute the Carmichael function for the modulus, which is non-trivial, but feasible for small moduli.
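For small moduli, the Carmichael function can be computed by trial-division factoring. The following is a minimal sketch under that assumption; carmichael, gcd_u64 and lcm_u64 are hypothetical helper names. It uses λ(p^k) = p^(k-1)·(p-1) for odd primes and for 2 and 4, λ(2^k) = 2^(k-2) for k >= 3, and takes the least common multiple over the prime powers of m.

```cpp
#include <cstdint>

std::uint64_t gcd_u64(std::uint64_t a, std::uint64_t b) {
    while (b != 0) {
        std::uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

std::uint64_t lcm_u64(std::uint64_t a, std::uint64_t b) {
    return a / gcd_u64(a, b) * b;
}

// Carmichael function via trial division; intended for moduli around 10^9.
std::uint64_t carmichael(std::uint64_t m) {
    std::uint64_t result = 1;
    for (std::uint64_t p = 2; p * p <= m; ++p) {
        if (m % p != 0) continue;
        std::uint64_t pk = 1;                    // becomes p^k, the full power of p in m
        while (m % p == 0) {
            m /= p;
            pk *= p;
        }
        std::uint64_t lam = pk / p * (p - 1);    // p^(k-1) * (p - 1)
        if (p == 2 && pk >= 8) lam /= 2;         // lambda(2^k) = 2^(k-2) for k >= 3
        result = lcm_u64(result, lam);
    }
    if (m > 1) result = lcm_u64(result, m - 1);  // one leftover prime factor
    return result;
}
```

With this, carmichael(354252525) gives the 2146980 quoted above. Reducing the exponent by this value is still only justified when the base is coprime to the modulus.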

Rounding integer division without logical operators

I want a function
int rounded_division(const int a, const int b) {
    return round(1.0 * a/b);
}
So we have, for example,
rounded_division(3, 2) // = 2
rounded_division(2, 2) // = 1
rounded_division(1, 2) // = 1
rounded_division(0, 2) // = 0
rounded_division(-1, 2) // = -1
rounded_division(-2, 2) // = -1
rounded_division(-3, -2) // = 2
Or in code, where a and b are 32 bit signed integers:
int rounded_division(const int a, const int b) {
    return ((a < 0) ^ (b < 0)) ? ((a - b / 2) / b) : ((a + b / 2) / b);
}
And here comes the tricky part: how can this be implemented efficiently (not using larger 64 bit values) and without logical operators such as ?:, &&, ...? Is it possible at all?
The reason I want to avoid logical operators is that the processor I have to implement this function for has no conditional instructions (more about missing conditional instructions on ARM).
a/b + a%b/(b/2 + b%2) works quite well: it has not failed in over a billion test cases. It meets all of OP's goals: no overflow, no long long, no branching, and it works over the entire range of int whenever a/b is defined.
No 32-bit dependency. If using C99 or later, no implementation-defined behavior restrictions.
int rounded_division(int a, int b) {
    int q = a / b;
    int r = a % b;
    return q + r/(b/2 + b%2);
}
This works with 2's complement, 1s' complement and sign-magnitude as all operations are math ones.
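A small harness can compare this formula with the floating-point reference over a range of inputs. This is only a sketch: reference_rounded_division and count_mismatches are hypothetical names, and the comparison is meant for small operands where 1.0 * a / b is exact enough for std::round.

```cpp
#include <cmath>

// The branch-free formula from above.
int rounded_division(int a, int b) {
    int q = a / b;
    int r = a % b;
    return q + r / (b / 2 + b % 2);
}

// Floating-point reference: round half away from zero.
int reference_rounded_division(int a, int b) {
    return static_cast<int>(std::round(1.0 * a / b));
}

// Counts the pairs (a, b) in [lo, hi] x [lo, hi], b != 0, where the two disagree.
int count_mismatches(int lo, int hi) {
    int mismatches = 0;
    for (int a = lo; a <= hi; ++a) {
        for (int b = lo; b <= hi; ++b) {
            if (b == 0) continue;
            if (rounded_division(a, b) != reference_rounded_division(a, b)) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}
```

The reason it works: b/2 + b%2 equals sign(b) times ceil(|b|/2), and since |r| < |b| the quotient r/(b/2 + b%2) is 0 or ±1, being nonzero exactly when 2|r| >= |b|.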
How about this:
int rounded_division(const int a, const int b) {
    return (a + b/2 + b * ((a^b) >> 31))/b;
}
(a ^ b) >> 31 should evaluate to -1 if a and b have different signs and 0 otherwise, assuming int has 32 bits, the leftmost bit is the sign bit, and right-shifting a negative value is an arithmetic shift.
As pointed out by @chux in his comments, this method is wrong due to integer division. This new version evaluates the same as OP's example, but uses a few more operations.
int rounded_division(const int a, const int b) {
    return (a + b * (1 + 2 * ((a^b) >> 31)) / 2)/b;
}
However, this version still does not take the overflow problem into account.
What about
return ((a + (a*b)/abs(a*b) * b / 2) / b);
Without overflow:
return ((a + ((a/abs(a))*(b/abs(b))) * b / 2) / b);
Note that both expressions divide by zero when a is 0.
This is a rough approach that you may use: apply a mask when a*b < 0.
Please note that I did not test this thoroughly.
int function(int a, int b){
    int tmp = float(a)/b + 0.5;
    int mask = (a*b) >> 31; // arithmetic shift copies the sign bit into every bit
    return tmp - (1 & mask); // minus one if a*b was < 0
}
The following rounded_division_test1() meets OP's requirement of no branching - if one counts sign(int a), nabs(int a), and cmp_le(int a, int b) as non-branching. See here for ideas of how to do sign() without compare operators. These helper functions could be rolled into rounded_division_test1() without explicit calls.
The code demonstrates the correct functionality and is useful for testing various answers. When a/b is defined, this answer does not overflow.
#include <limits.h>
#include <math.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>

// Negative absolute value: cannot overflow, unlike abs(INT_MIN)
int nabs(int a) {
    return (a < 0) * a - (a >= 0) * a;
}

int sign(int a) {
    return (a > 0) - (a < 0);
}

int cmp_le(int a, int b) {
    return (a <= b);
}

int rounded_division_test1(int a, int b) {
    int q = a / b;
    int r = a % b;
    int flag = cmp_le(nabs(r), (nabs(b) / 2 + nabs(b % 2)));
    return q + flag * sign(b) * sign(r);
}

// Alternative that uses long long
int rounded_division_test1LL(int a, int b) {
    int c = (a^b)>>31;
    return (a + (c*2 + 1)*1LL*b/2)/b;
}

// Reference code
int rounded_division(int a, int b) {
    return round(1.0*a/b);
}

// Returns 1 if the candidate disagrees with the reference, else 0
int test(int a, int b) {
    int q0 = rounded_division(a, b);
    //int q1 = function(a,b);
    int q1 = rounded_division_test1(a, b);
    if (q0 != q1) {
        printf("%d %d --> %d %d\n", a, b, q0, q1);
    }
    return q0 != q1;
}

void tests(void) {
    int err = 0;
    int const a[] = { INT_MIN, INT_MIN + 1, INT_MIN + 1, -3, -2, -1, 0, 1, 2, 3,
        INT_MAX - 1, INT_MAX };
    for (unsigned i = 0; i < sizeof a / sizeof a[0]; i++) {
        for (unsigned j = 0; j < sizeof a / sizeof a[0]; j++) {
            if (a[j] == 0) continue;                          // division by zero
            if (a[i] == INT_MIN && a[j] == -1) continue;      // quotient overflows
            err += test(a[i], a[j]);
        }
    }
    printf("Err %d\n", err);
}

int main(void) {
    tests();
    return 0;
}
Let me give my contribution. What about:
int rounded_division(const int a, const int b) {
    return a/b + (2*(a%b))/b;
}
No branches, no logical operators, only arithmetic operators. But it could fail if b is greater than INT_MAX/2 or less than INT_MIN/2.
But if 64 bits are allowed when computing 32-bit results, it will not fail:
int rounded_division(const int a, const int b) {
    return a/b + (2LL*(a%b))/b;
}
Code that I came up with for use on ARM M0 (no floating point, slow divide).
It only uses one divide instruction and no conditionals, but will overflow if numerator + (denominator/2) > INT_MAX.
Cycle count on ARM M0 = 7 cycles + the divide (M0 has no divide instruction, so it is toolchain dependent).
#include <stdint.h>

int32_t Int32_SignOf(int32_t val)
{
    return (+1 | (val >> 31)); // if val < 0 then -1, else +1
}

uint32_t Int32_Abs(int32_t val)
{
    int32_t tmp = val ^ (val >> 31);
    return (tmp - (val >> 31));
    // the following code looks like it should be faster, using subexpression elimination
    // except on arm a bitshift is free when performed with another operation,
    // so it would actually end up being slower
    // tmp = val >> 31;
    // dst = val ^ (tmp);
    // dst -= tmp;
    // return dst;
}

int32_t Int32_DivRound(int32_t numerator, int32_t denominator)
{
    // use the absolute (unsigned) denominator in the fudge value
    // as the divide by 2 then becomes a bitshift
    int32_t sign_num = Int32_SignOf(numerator);
    uint32_t abs_denom = Int32_Abs(denominator);
    return (numerator + sign_num * ((int32_t)(abs_denom / 2u))) / denominator;
}
Since the function seems to be symmetric, how about sign(a/b)*floor(abs(a/b)+0.5), where a/b is the real-valued quotient?

Multiplying integers the long way

I'm trying to create a long int multiplication function. In math, to multiply 2 numbers, for example 123 X 456, I do:
(12 * 10^1 + 3)( 45 * 10^1 + 6) =
(540 * 10^2) + (72 * 10^1) + (135 * 10^1) + 18 = 56088
I created a small program for this algorithm, but it doesn't work correctly.
I don't know where my problem is. Can you help me understand and correct it?
#include <iostream>
using namespace std;

// Number of decimal digits of n (0 for n == 0)
int digits(int n) {
    int digit = 0;
    while (n>0){
        digit++;
        n /= 10;
    }
    return digit;
}

long int longMult(long int a, long int b) {
    long int x,y,w,z;
    int digitA = digits(a);
    int digitB = digits(b);
    if((a==0) || (b==0)) {
        return 0;
    } else if (digitA < 2 || digitB < 2) {
        return a*b;
    } else {
        int powA = digitA / 2;
        int powB = digitB / 2;
        //for first number
        x = a/(10^powA);
        y = a%(10^powA);
        //for second number
        w = b/(10^powB);
        z = b%(10^powB);
        return ( longMult(x,w)*(10^(powA*powB)) + longMult(x,z) +
                 longMult(w,y)*(10^(powA*powB)) + longMult(y,z));
    }
}

int main()
{
    cout << digits(23) << endl; // for test
    cout << longMult(24,24); // must be 576 but output is 96
    return 0;
}
The expression
10^powA
does a bitwise exclusive or, and doesn't raise 10 to the power of powA, as you appear to expect.
You may want to define something like
long powli(int b, long e) {return e?b*powli(b,e-1):1;}
Then instead you can use
powli(10,powA)
Edit: There is also a problem with the way the values are combined:
return ( longMult(x,w)*(10^(powA*powB)) + longMult(x,z) +
         longMult(w,y)*(10^(powA*powB)) + longMult(y,z));
Look into the maths, because multiplying the exponents makes little sense.
Also, the combination of the partial products is wrong, e.g. (10*a + b)(10*c + d) = 10*10*a*c + 10*a*d + 10*b*c + b*d. So check your algebra.
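Putting both fixes together, one possible corrected version is sketched below. It assumes non-negative inputs whose product fits in long. It uses the expansion (x*10^p + y)(w*10^q + z) = x*w*10^(p+q) + x*z*10^p + y*w*10^q + y*z; the locals tenA and tenB are names introduced here for illustration.

```cpp
// Integer power, as suggested above.
long powli(int b, long e) { return e ? b * powli(b, e - 1) : 1; }

// Number of decimal digits of n (0 for n == 0).
int digits(long n) {
    int digit = 0;
    while (n > 0) {
        ++digit;
        n /= 10;
    }
    return digit;
}

long longMult(long a, long b) {
    if (a == 0 || b == 0) return 0;
    int digitA = digits(a);
    int digitB = digits(b);
    if (digitA < 2 || digitB < 2) return a * b;

    long tenA = powli(10, digitA / 2);   // 10^powA
    long tenB = powli(10, digitB / 2);   // 10^powB
    long x = a / tenA, y = a % tenA;     // a = x * tenA + y
    long w = b / tenB, z = b % tenB;     // b = w * tenB + z

    // The exponents add (tenA * tenB), and each cross term keeps its own power of 10.
    return longMult(x, w) * tenA * tenB
         + longMult(x, z) * tenA
         + longMult(y, w) * tenB
         + longMult(y, z);
}
```

With these changes longMult(24, 24) gives 576 and longMult(123, 456) gives 56088.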