Different outputs using different Data Structure - c++

I am asking this question in relation to the following problem : https://practice.geeksforgeeks.org/problems/count-pairs-with-given-sum5022/1
Given an array of N integers, and an integer K, find the number of pairs of elements in the array whose sum is equal to K.
Count pairs with given sum in O(n) time and O(n) space.
Given n = 4, k = 6, arr = [1 5 7 1]
This is part of my code:
#define MOD 1000007
int getPairsCount(int arr[], int n, int k) {
// long long int h[MOD] = {0}; // This is the one I used originally
// but it gave 3 as the answer for the input n = 4, k = 6, arr = [1 5 7 1],
unordered_map<long long, long long> h; // But when using map, it gives correct output as 2
long long int count = 0;
for(int i=0;i<n;i++){
h[arr[i]]+=1;
}
for(int i=0;i<n;i++){
count+=h[k - arr[i]];
if(k == 2*arr[i])count--;
}
return (count/2);
}
};
Can anyone please explain why there is a difference?
MOD was chosen based on the maximum value arr[i] can have (arr[i] <= 10^6).
Even using memset to set all values to 0 didn't work.
So why do a map and an array used as a hash give different results?

Basic debugging: verify that data is what you think it is. This is easier to do with a debugger, but streaming diagnostics also works. Let's look at the evaluation of count+=h[k - arr[i]] over several iterations (using the input from the question).
for(int i=0;i<n;i++){
std::cerr << "count += h[k - arr[i]]\t" // To remind us what we are looking at
"count += h[" << k << " - " << arr[i] << "]\t"
"count += h[" << k - arr[i] << "]\t"
<< count << " += " << h[k - arr[i]] << "\n";
count+=h[k - arr[i]];
if(k == 2*arr[i])count--;
}
Possible output (using the array instead of the unordered map):
count += h[k - arr[i]] count += h[6 - 1] count += h[5] 0 += 1
count += h[k - arr[i]] count += h[6 - 5] count += h[1] 1 += 2
count += h[k - arr[i]] count += h[6 - 7] count += h[-1] 3 += 140720947826640
count += h[k - arr[i]] count += h[6 - 1] count += h[5] 140720947826643 += 1
At this point, the problem should be obvious (at the very least, the iteration where the problem occurs should be obvious). Even though every long long value is a valid key for an unordered_map from long long to something, at least half of those values are invalid indices for an array.
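One way to keep the array as the hash is to check that the complement is a valid index before reading it. The sketch below assumes, as in the question, that every arr[i] lies between 0 and 10^6. The constant name MAX_VALUE and the use of std::vector are choices made for this example, not part of the original code.

```cpp
#include <vector>

// Assumed upper bound on arr[i], taken from the constraint quoted in the question.
const int MAX_VALUE = 1000000;

int getPairsCount(const int arr[], int n, int k) {
    // Frequency table indexed directly by value, zero-initialized on the heap.
    std::vector<long long> h(MAX_VALUE + 1, 0);
    for (int i = 0; i < n; i++) {
        h[arr[i]] += 1;
    }

    long long count = 0;
    for (int i = 0; i < n; i++) {
        long long need = static_cast<long long>(k) - arr[i];
        // Only read the table when the complement is a valid index.
        if (need < 0 || need > MAX_VALUE) {
            continue;
        }
        count += h[need];
        if (need == arr[i]) {
            count--;  // do not pair an element with itself
        }
    }
    return static_cast<int>(count / 2);  // every pair was counted twice
}
```

With the input from the question, the iteration for arr[i] = 7 is skipped instead of reading h[-1].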

Related

Number of steps to reduce a number in binary representation to 1

Given the binary representation of an integer as a string s, return the number of steps to reduce it to 1 under the following rules:
If the current number is even, you have to divide it by 2.
If the current number is odd, you have to add 1 to it.
It is guaranteed that you can always reach one for all test cases.
Step 1) 13 is odd, add 1 and obtain 14.
Step 2) 14 is even, divide by 2 and obtain 7.
Step 3) 7 is odd, add 1 and obtain 8.
Step 4) 8 is even, divide by 2 and obtain 4.
Step 5) 4 is even, divide by 2 and obtain 2.
Step 6) 2 is even, divide by 2 and obtain 1.
My input = 1111011110000011100000110001011011110010111001010111110001
Expected output = 85
My output = 81
For the above input, the output is supposed to be 85, but my output shows 81. For other test cases it
seems to be giving the right answer. I have tried everything I could think of to debug it, but I am stuck.
#include <iostream>
#include <string.h>
#include <vector>
#include <bits/stdc++.h>
using namespace std;
int main()
{
string s =
"1111011110000011100000110001011011110010111001010111110001";
long int count = 0, size;
unsigned long long int dec = 0;
size = s.size();
// cout << s[size - 1] << endl;
for (int i = 0; i < size; i++)
{
// cout << pow(2, size - i - 1) << endl;
if (s[i] == '0')
continue;
// cout<<int(s[i])-48<<endl;
dec += (int(s[i]) - 48) * pow(2, size - 1 - i);
}
// cout << dec << endl;
// dec = 278675673186014705;
while (dec != 1)
{
if (dec % 2 == 0)
dec /= 2;
else
dec += 1;
count += 1;
}
cout << count;
return 0;
}
This line:
pow(2, size - 1 - i)
can lose precision, because pow takes and returns doubles. Adding a double to dec also converts the running total to a double, which cannot hold every integer above 2^53 exactly.
Luckily, for powers of 2 that won't overflow an unsigned long long, we can simply use a bit shift (which is equivalent to pow(2, x)).
Replace that expression with:
1ULL << (size - 1 - i)
So that the line looks like this:
dec += (int(s[i]) - 48) * (1ULL << (size - 1 - i));
And we will get the correct output of 85.
Note: as mentioned by @RSahu, you can remove (int(s[i]) - 48), as the case where s[i] == '0' is already caught by the if statement above. Simply change the line to:
dec += 1ULL<<(size - 1 - i);
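To see where the precision goes, the following sketch converts the same string in two ways: once through double arithmetic, as in the question, and once with integer shifts. The function names viaPow and viaShift are made up for this illustration, and the sketch assumes the string has at most 64 characters and that double is an IEEE 754 64-bit type.

```cpp
#include <cmath>
#include <cstddef>
#include <string>

// Conversion as in the question: every addition goes through double arithmetic.
unsigned long long viaPow(const std::string& s) {
    unsigned long long dec = 0;
    const std::size_t size = s.size();
    for (std::size_t i = 0; i < size; i++) {
        if (s[i] == '1') {
            // Spelled out: dec is converted to double, added, and converted back.
            double term = std::pow(2.0, static_cast<double>(size - 1 - i));
            dec = static_cast<unsigned long long>(static_cast<double>(dec) + term);
        }
    }
    return dec;
}

// Conversion with integer shifts only, so no rounding can occur.
unsigned long long viaShift(const std::string& s) {
    unsigned long long dec = 0;
    const std::size_t size = s.size();
    for (std::size_t i = 0; i < size; i++) {
        if (s[i] == '1') {
            dec += 1ULL << (size - 1 - i);
        }
    }
    return dec;
}
```

For short strings both functions agree. For a 60-character string that starts and ends with 1, the low bit is lost in viaPow, because 2^59 + 1 has no exact double representation.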
The core problem has already been pointed out in the answer by @Ryan Zhang.
I want to offer some suggestions to improve your code and make it easier to debug.
The main function has two parts: the first part converts a string to a number and the second part computes the number of steps to get the number to 1. I suggest creating two helper functions. That will allow you to debug each piece separately.
int main()
{
string s = "1111011110000011100000110001011011110010111001010111110001";
unsigned long long int dec = stringToNumber(s);
cout << "Number: " << dec << endl;
// dec = 278675673186014705;
int count = getStepsTo1(dec);
cout << "Steps to 1: " << count << endl;
return 0;
}
Iterate over the string from right to left using std::string::reverse_iterator. That will obviate the need for size and use of size - i - 1. You can just use i.
unsigned long long stringToNumber(string const& s)
{
size_t i = 0;
unsigned long long num = 0;
for (auto it = s.rbegin(); it != s.rend(); ++it, ++i )
{
if (*it != '0')
{
num += 1ULL << i;
}
}
return num;
}
Here's the other helper function.
int getStepsTo1(unsigned long long num)
{
long int count = 0;
while (num != 1 )
{
if (num % 2 == 0)
num /= 2;
else
num += 1;
count += 1;
}
return count;
}
Working demo: https://ideone.com/yerRfK.

Algorithm: partial grid count problem

In the partial grid count problem, each point of a grid is marked with 1 or 0, and the task is to find the number of subgrids that have a 1 in all four corners.
Each row is stored as a bitset. For every pair of rows a and b, the two rows are combined with the and operation, and count is the number of columns that are 1 in both rows.
Finally, count(count-1)/2 is given as the number of subgrids whose first row is a and whose last row is b.
I don't understand how the formula count(count-1)/2 gives the number of subgrids.
bitset<5> row[5];
row[0] = (1 << 3) + (1 << 0);
row[1] = (1 << 3) + (1 << 2);
row[2] = (1 << 4);
row[3] = (1 << 3) + (1 << 2) + (1 << 0);
row[4] = 0;
int count = 0;
for (int a = 0; a < 4; a++) {
for (int b = a + 1; b < 5; b++) {
int count_row = (row[a] & row[b]).count();
count += count_row;
}
}
count = count * (count - 1) / 2;
The meaning of the formula N*(N-1)/2 is the sum of all numbers from 1 to N.
If you look at a row e.g. 0001110000000000, then there will be sub sets
1, 11, 111, the sum of those is 1+2+3, i.e. the sum of all numbers 1 to 3.
I think that is what the text means.
However, that only counts the subsets which start on the left, not counting the middle single 1 and the right single 1 and the right 11.
So I think I know what the text means - but think it is wrong.
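Another reading of the formula, offered here as a sketch rather than as the source text's exact method: if rows a and b have a 1 in the same count columns, then any 2 of those columns can serve as the left and right edges of a subgrid, and there are count(count-1)/2 ways to choose 2 items out of count. Under that reading the formula is applied once per pair of rows, inside the loop. The function name countSubgrids is hypothetical.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

// Counts subgrids whose four corners are all 1, one pair of rows at a time.
long long countSubgrids(const std::vector<std::bitset<5>>& rows) {
    long long total = 0;
    for (std::size_t a = 0; a < rows.size(); ++a) {
        for (std::size_t b = a + 1; b < rows.size(); ++b) {
            // Columns that hold a 1 in both row a and row b.
            long long common = static_cast<long long>((rows[a] & rows[b]).count());
            // Choose 2 of those columns as the left and right edges.
            total += common * (common - 1) / 2;
        }
    }
    return total;
}
```

For the five rows in the question, only the row pairs (0, 3) and (1, 3) share two columns, so this sketch reports 2 subgrids.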

C++ - Code Optimization

I have a problem:
You are given a sequence, in the form of a string with characters ‘0’, ‘1’, and ‘?’ only. Suppose there are k ‘?’s. Then there are 2^k ways to replace each ‘?’ by a ‘0’ or a ‘1’, giving 2^k different 0-1 sequences (0-1 sequences are sequences with only zeroes and ones).
For each 0-1 sequence, define its number of inversions as the minimum number of adjacent swaps required to sort the sequence in non-decreasing order. In this problem, the sequence is sorted in non-decreasing order precisely when all the zeroes occur before all the ones. For example, the sequence 11010 has 5 inversions. We can sort it by the following moves: 11010 → 11001 → 10101 → 01101 → 01011 → 00111.
Find the sum of the number of inversions of the 2^k sequences, modulo 1000000007 (10^9+7).
For example:
Input: ??01
-> Output: 5
Input: ?0?
-> Output: 3
Here's my code:
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <string>
#include <string.h>
#include <math.h>
using namespace std;
void ProcessSequences(char *input)
{
int c = 0;
/* Count the number of '?' in input sequence
* 1??0 -> 2
*/
for(int i=0;i<strlen(input);i++)
{
if(*(input+i) == '?')
{
c++;
}
}
/* Get all possible combination of '?'
* 1??0
* -> ??
* -> 00, 01, 10, 11
*/
int seqLength = pow(2,c);
// Initialize 2D array of integer
int **sequencelist, **allSequences;
sequencelist = new int*[seqLength];
allSequences = new int*[seqLength];
for(int i=0; i<seqLength; i++){
sequencelist[i] = new int[c];
allSequences[i] = new int[500000];
}
//end initialize
for(int count = 0; count < seqLength; count++)
{
int n = 0;
for(int offset = c-1; offset >= 0; offset--)
{
sequencelist[count][n] = ((count & (1 << offset)) >> offset);
// cout << sequencelist[count][n];
n++;
}
// cout << std::endl;
}
/* Change '?' in former sequence into all possible bits
* 1??0
* ?? -> 00, 01, 10, 11
* -> 1000, 1010, 1100, 1110
*/
for(int d = 0; d<seqLength; d++)
{
int seqCount = 0;
for(int e = 0; e<strlen(input); e++)
{
if(*(input+e) == '1')
{
allSequences[d][e] = 1;
}
else if(*(input+e) == '0')
{
allSequences[d][e] = 0;
}
else
{
allSequences[d][e] = sequencelist[d][seqCount];
seqCount++;
}
}
}
/*
* Sort each sequences to increasing mode
*
*/
// cout<<endl;
int totalNum[seqLength];
for(int i=0; i<seqLength; i++){
int num = 0;
for(int j=0; j<strlen(input); j++){
if(j==strlen(input)-1){
break;
}
if(allSequences[i][j] > allSequences[i][j+1]){
int temp = allSequences[i][j];
allSequences[i][j] = allSequences[i][j+1];
allSequences[i][j+1] = temp;
num++;
j = -1;
}//endif
}//endfor
totalNum[i] = num;
}//endfor
/*
* Sum of all Num of Inversions
*/
int sum = 0;
for(int i=0;i<seqLength;i++){
sum = sum + totalNum[i];
}
// cout<<"Output: "<<endl;
int out = sum%1000000007;
cout<< out <<endl;
} //end of ProcessSequences method
int main()
{
// Get Input
char seq[500000];
// cout << "Input: "<<endl;
cin >> seq;
char *p = &seq[0];
ProcessSequences(p);
return 0;
}
The results were right for small inputs, but for bigger inputs I exceeded the CPU time limit of 1 second. I also exceeded the memory limit. How can I make it faster and use memory more efficiently? What algorithm and what data structure should I use? Thank you.
Dynamic programming is the way to go. Imagine you are adding the last character to all sequences.
If it is 1, then you get XXXXXX1. The number of swaps is the same as it was for every sequence so far.
If it is 0, then you need to know the number of ones already in every sequence. The number of swaps increases by the number of ones, for every sequence.
If it is ?, you just add the two previous cases together.
You need to calculate how many sequences there are for every length and for every number of ones (the number of ones in a sequence cannot be greater than its length, naturally). You start with length 1, which is trivial, and continue with longer lengths. The numbers can get really big, so you should calculate modulo 1000000007 all the time. The program is not in C++, but it should be easy to rewrite (the array should be initialized to 0, int is 32-bit, long is 64-bit).
long Mod(long x)
{
return x % 1000000007;
}
long Calc(string s)
{
int len = s.Length;
long[,] nums = new long[len + 1, len + 1];
long sum = 0;
nums[0, 0] = 1;
for (int i = 0; i < len; ++i)
{
if(s[i] == '?')
{
sum = Mod(sum * 2);
}
for (int j = 0; j <= i; ++j)
{
if (s[i] == '0' || s[i] == '?')
{
nums[i + 1, j] = Mod(nums[i + 1, j] + nums[i, j]);
sum = Mod(sum + j * nums[i, j]);
}
if (s[i] == '1' || s[i] == '?')
{
nums[i + 1, j + 1] = nums[i, j];
}
}
}
return sum;
}
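For readers who want to try this in C++, a direct port might look like the sketch below. It keeps the full table, so it needs memory quadratic in the input length and is only meant for checking small inputs. The names calc and MOD are chosen for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

const long long MOD = 1000000007LL;

long long calc(const std::string& s) {
    const std::size_t len = s.size();
    // nums[i][j]: number of ways to fill the first i characters so that
    // they contain exactly j ones.
    std::vector<std::vector<long long>> nums(len + 1, std::vector<long long>(len + 1, 0));
    long long sum = 0;
    nums[0][0] = 1;
    for (std::size_t i = 0; i < len; ++i) {
        if (s[i] == '?') {
            // Every earlier sequence now exists twice.
            sum = (sum * 2) % MOD;
        }
        for (std::size_t j = 0; j <= i; ++j) {
            if (s[i] == '0' || s[i] == '?') {
                nums[i + 1][j] = (nums[i + 1][j] + nums[i][j]) % MOD;
                // Appending a 0 adds j swaps to each sequence that has j ones.
                sum = (sum + static_cast<long long>(j) * nums[i][j]) % MOD;
            }
            if (s[i] == '1' || s[i] == '?') {
                nums[i + 1][j + 1] = nums[i][j];
            }
        }
    }
    return sum;
}
```

On the two examples from the question, this port returns 5 for ??01 and 3 for ?0?.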
Optimization
The code above is written to be as clear as possible and to show the dynamic programming approach. You do not actually need the array [len+1, len+1]. You calculate column i+1 from column i and never go back, so two columns are enough: old and new. If you dig further, you find that row j of the new column depends only on rows j and j-1 of the old column. So one column is enough if you update the values in the right direction (and do not overwrite values you still need).
The code above uses 64-bit integers. You really need them only in j * nums[i, j]. The nums array contains numbers less than 1000000007, so a 32-bit integer is enough. Even 2*1000000007 fits into a 32-bit signed int, and we can make use of that.
We can optimize the code by nesting the loop inside the conditions instead of the conditions inside the loop. Maybe it is even the more natural approach; the only downside is the repeated code.
The % operator is, like every division, quite expensive. j * nums[i, j] is typically far smaller than the capacity of a 64-bit integer, so we do not have to take the modulo in every step. Just watch the actual value and apply it when needed. The Mod(nums[i + 1, j] + nums[i, j]) can also be optimized, as nums[i + 1, j] + nums[i, j] is always smaller than 2*1000000007.
And finally the optimized code. I switched to C++ because I realized there are differences in what int and long mean, so I would rather make it clear:
long CalcOpt(string s)
{
long len = s.length();
vector<long> nums(len + 1);
long long sum = 0;
nums[0] = 1;
const long mod = 1000000007;
for (long i = 0; i < len; ++i)
{
if (s[i] == '1')
{
for (long j = i + 1; j > 0; --j)
{
nums[j] = nums[j - 1];
}
nums[0] = 0;
}
else if (s[i] == '0')
{
for (long j = 1; j <= i; ++j)
{
sum += (long long)j * nums[j];
if (sum > std::numeric_limits<long long>::max() / 2) { sum %= mod; }
}
}
else
{
sum *= 2;
if (sum > std::numeric_limits<long long>::max() / 2) { sum %= mod; }
for (long j = i + 1; j > 0; --j)
{
sum += (long long)j * nums[j];
if (sum > std::numeric_limits<long long>::max() / 2) { sum %= mod; }
long add = nums[j] + nums[j - 1];
if (add >= mod) { add -= mod; }
nums[j] = add;
}
}
}
return (long)(sum % mod);
}
Simplification
Time limit still exceeded? There is probably a better way to do it. You can either
go back to the beginning and find a mathematically different way to calculate the result
or simplify the actual solution using math
I went the second way. What we are doing in the loop is in fact convolution of two sequences, for example:
0, 0, 0, 1, 4, 6, 4, 1, 0, 0,... and 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,...
0*0 + 0*1 + 0*2 + 1*3 + 4*4 + 6*5 + 4*6 + 1*7 + 0*8...= 80
The first sequence is symmetric and the second is linear. In this case, the sum of the convolution can be calculated from the sum of the first sequence, which is 16 (numSum), and the number from the second sequence corresponding to the center of the first sequence, which is 5 (numMult). numSum*numMult = 16*5 = 80. We can replace the whole loop with one multiplication if we are able to update those numbers in each step, which fortunately seems to be the case.
If s[i] == '0' then numSum does not change and numMult does not change.
If s[i] == '1' then numSum does not change, only numMult increments by 1, as we shift the whole sequence by one position.
If s[i] == '?' we add the original and the shifted sequence together. numSum is multiplied by 2 and numMult increments by 0.5.
The 0.5 is a bit of a problem, as it is not a whole number. But we know that the result is a whole number. Fortunately, in modular arithmetic there exists in this case an inverse of two (= 1/2) as a whole number. It is h = (mod+1)/2. As a reminder, the inverse of 2 is the number h such that h*2 = 1 modulo mod. Implementation-wise it is easier to multiply numMult by 2 and divide numSum by 2, but that is just a detail; we would need the 0.5 anyway. The code:
long CalcOptSimpl(string s)
{
long len = s.length();
long long sum = 0;
const long mod = 1000000007;
long numSum = (mod + 1) / 2;
long long numMult = 0;
for (long i = 0; i < len; ++i)
{
if (s[i] == '1')
{
numMult += 2;
}
else if (s[i] == '0')
{
sum += numSum * numMult;
if (sum > std::numeric_limits<long long>::max() / 4) { sum %= mod; }
}
else
{
sum = sum * 2 + numSum * numMult;
if (sum > std::numeric_limits<long long>::max() / 4) { sum %= mod; }
numSum = (numSum * 2) % mod;
numMult++;
}
}
return (long)(sum % mod);
}
I am pretty sure there exists some simple way to get this code, yet I am still unable to see it. But sometimes path is the goal :-)
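The inverse of two used above can be checked in isolation. The sketch below is only an illustration: the names HALF and halfMod are hypothetical, and x is assumed to be non-negative.

```cpp
const long long MOD = 1000000007LL;

// (MOD + 1) / 2 is a whole number because MOD is odd.
const long long HALF = (MOD + 1) / 2;

// "Divides" x by 2 modulo MOD by multiplying with the inverse of 2.
long long halfMod(long long x) {
    return (x % MOD) * HALF % MOD;
}
```

Doubling the result of halfMod modulo MOD gives back the original residue, including for odd x, which is what makes the 0.5 step in numMult workable.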
If a sequence has N zeros with indexes zero[0], zero[1], ... zero[N - 1], the number of inversions for it is (zero[0] + zero[1] + ... + zero[N - 1]) - (N - 1) * N / 2 (you should be able to prove it).
For example, 11010 has two zeros with indexes 2 and 4, so the number of inversions is 2 + 4 - 1 * 2 / 2 = 5.
For all 2^k sequences, you can calculate the sums of the two parts separately and then subtract the second from the first.
1) The first part is zero[0] + zero[1] + ... + zero[N - 1]. Each 0 in the given sequence contributes index * 2^k and each ? contributes index * 2^(k-1).
2) The second part is (N - 1) * N / 2. You can calculate this using dynamic programming (maybe you should google and learn this first). In short, use f[i][j] to represent the number of sequences with j zeros using the first i characters of the given sequence.
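The zero-index formula can be checked on a single 0-1 sequence against a direct count of the (1 before 0) pairs. This is a sketch for experimenting with the formula only; it does not handle ? characters, and both function names are made up for this example.

```cpp
#include <cstddef>
#include <string>

// Direct count: every 1 that appears before a 0 is one inversion.
long long inversionsByPairs(const std::string& s) {
    long long ones = 0;
    long long inversions = 0;
    for (char c : s) {
        if (c == '1') {
            ones++;
        } else {
            inversions += ones;
        }
    }
    return inversions;
}

// Formula from the answer: (sum of zero indexes) - (N - 1) * N / 2.
long long inversionsByZeroIndexes(const std::string& s) {
    long long indexSum = 0;
    long long zeros = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '0') {
            indexSum += static_cast<long long>(i);
            zeros++;
        }
    }
    return indexSum - (zeros - 1) * zeros / 2;
}
```

Both functions give 5 for 11010, matching the example in the problem statement.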

I tried coding my own simple moving average in C++

I want a function that works.
I believe my logic is correct, thus my (vector out of range error) must be coming from the lack of familiarity and using the code correctly.
I do know that there is long code out there for this fairly simple algorithm.
Please help if you can.
Basically, I take the length as the "moving" window as it loops through j to the end of the size of the vector. This vector is filled with stock prices.
If the length equaled 2, for a 2 day moving average of the numbers 1 2 3 4, I should be able to output 1.5, 2.5, and 3.5. However, I get an out of range error.
The logic is shown in the code. If an expert could help me with this simple moving average function that I am trying to create that would be great! Thanks.
void Analysis::SMA()
{
double length;
cout << "Enter number days for your Simple Moving Average:" << endl;
cin >> length;
double sum = 0;
double a;
while (length >= 2){
vector<double>::iterator it;
for (int j = 0; j < close.size(); j++){
sum = vector1[length + j - 1] + vector1[length + j - 2];
a = sum / length;
vector2.push_back(a);
vector<double>::iterator g;
for (g = vector2.begin(); g != vector2.end(); ++g){
cout << "Your SMA: " << *g;
}
}
}
}
You don't need 3 loops to calculate a moving average over an array of data, you only need 1. You iterate over the array and keep track of the sum of the last n items, and then just adjust it for each new value, adding one value and removing one each time.
For example suppose you have a data set:
4 8 1 6 9
and you want to calculate a moving average with a window size of 3, then you keep a running total like this:
iteration  add  subtract  running-total  output average
0          4    -         4              - (not enough values yet)
1          8    -         12             -
2          1    -         13             13 / 3
3          6    4         15             15 / 3
4          9    8         16             16 / 3
Notice that we add each time, we start subtracting at iteration 3 (for a window size of 3) and start outputting the average at iteration 2 (window size minus 1).
So the code will be something like this:
double runningTotal = 0.0;
int windowSize = 3;
for(int i = 0; i < length; i++)
{
runningTotal += array[i]; // add
if(i >= windowSize)
runningTotal -= array[i - windowSize]; // subtract
if(i >= (windowSize - 1)) // output moving average
cout << "Your SMA: " << runningTotal / (double)windowSize;
}
You can adapt this to use your vector data structure.
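One possible adaptation to std::vector is sketched below. It returns the averages instead of printing them, so the result can be inspected or tested. The function name simpleMovingAverage and its parameters are assumptions made for this example, not names from the question.

```cpp
#include <cstddef>
#include <vector>

// Returns one average per full window, using a single pass and a running total.
std::vector<double> simpleMovingAverage(const std::vector<double>& prices,
                                        std::size_t windowSize) {
    std::vector<double> averages;
    if (windowSize == 0 || prices.size() < windowSize) {
        return averages;  // no full window fits
    }
    double runningTotal = 0.0;
    for (std::size_t i = 0; i < prices.size(); i++) {
        runningTotal += prices[i];                    // add the newest value
        if (i >= windowSize) {
            runningTotal -= prices[i - windowSize];   // drop the oldest value
        }
        if (i + 1 >= windowSize) {
            averages.push_back(runningTotal / static_cast<double>(windowSize));
        }
    }
    return averages;
}
```

For the prices 1 2 3 4 and a window of 2, this yields the three values 1.5, 2.5 and 3.5 that the question expects.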
Within your outermost while loop you never change length, so your function will run forever.
Then, notice that if length is two and close.size() is four, length + j - 1 reaches 4 on the last iteration, while the valid indexes of a four-element vector are 0 to 3. So my psychic debugging skills tell me your vector1 is too short and you index off the end.
This question has been answered but I thought I'd post complete code for people in the future seeking information.
#include <iostream>
#include <vector>
using namespace std;
int main() {
vector<double> vector1 { 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 };
int length; // an integer window size, so that cnt - length is a valid vector index
cout << "Enter number days for your Simple Moving Average:" << endl;
cin >> length;
double sum = 0;
int cnt = 0;
for (int i = 0; i < vector1.size(); i++) {
sum += vector1[i];
cnt++;
if (cnt >= length) {
cout << "Your SMA: " << (sum / (double) length) << endl;
sum -= vector1[cnt - length];
}
}
return 0;
}
This is slightly different from the answer above. A 'cnt' variable is introduced to avoid an additional if statement.

How to reduce complexity of this code

Can anyone please suggest a better algorithm than trying all the combinations for this problem?
Given an array A of N numbers, find the number of distinct pairs (i,
j) such that j >=i and A[i] = A[j].
First line of the input contains number of test cases T. Each test
case has two lines, first line is the number N, followed by a line
consisting of N integers which are the elements of array A.
For each test case print the number of distinct pairs.
Constraints:
1 <= T <= 10
1 <= N <= 10^6
-10^6 <= A[i] <= 10^6 for 0 <= i < N
I think the approach is to first sort the array, then find the frequency of every distinct integer, then add nC2 of all the frequencies, and finally add the length of the array. But unfortunately it gives a wrong answer for some cases, and I do not know which ones. Here is the implementation.
code:
#include <iostream>
#include<cstdio>
#include<algorithm>
using namespace std;
long fun(long a) //to find the aC2 for given a
{
if (a == 1) return 0;
return (a * (a - 1)) / 2;
}
int main()
{
long t, i, j, n, tmp = 0;
long long count;
long ar[1000000];
cin >> t;
while (t--)
{
cin >> n;
for (i = 0; i < n; i++)
{
cin >> ar[i];
}
count = 0;
sort(ar, ar + n);
for (i = 0; i < n - 1; i++)
{
if (ar[i] == ar[i + 1])
{
tmp++;
}
else
{
count += fun(tmp + 1);
tmp = 0;
}
}
if (tmp != 0)
{
count += fun(tmp + 1);
}
cout << count + n << "\n";
}
return 0;
}
Keep a count of how many times each number appears in an array. Then iterate over the result array and add the triangular number for each.
For example(from the source test case):
Input:
3
1 2 1
count array = {0, 2, 1} // no zeroes, two ones, one two
pairs = triangle(0) + triangle(2) + triangle(1)
pairs = 0 + 3 + 1
pairs = 4
Triangle numbers can be computed by (n * n + n) / 2, and the whole thing is O(n).
Edit:
First, there's no need to sort if you're counting frequency. I see what you did with sorting, but if you just keep a separate array of frequencies, it's easier. It takes more space, but since the elements and the array length are both constrained to at most 10^6, the most you'll need is an int[10^6]. This easily fits in the 256MB space requirement given in the challenge. (Whoops, since elements can go negative, you'll need an array twice that size. Still well under the limit, though.)
For the n choose 2 part, the part you had wrong is that it's an n+1 choose 2 problem. Since you can pair each one by itself, you have to add one to n. I know you were adding n at the end, but it's not the same. The difference between n choose 2 and n+1 choose 2 is not one, but n.
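The counting approach can be sketched in C++ as follows. The offset MAX_ABS shifts negative values into valid indexes and is taken from the constraint -10^6 <= A[i] <= 10^6. The function name countEqualPairs is hypothetical, and the sketch assumes every element respects that constraint.

```cpp
#include <vector>

// Assumed bound on |A[i]|, taken from the constraints in the question.
const int MAX_ABS = 1000000;

long long countEqualPairs(const std::vector<int>& a) {
    // freq[v + MAX_ABS] counts how often the value v occurs.
    std::vector<long long> freq(2 * MAX_ABS + 1, 0);
    for (int v : a) {
        freq[v + MAX_ABS]++;
    }
    long long pairs = 0;
    for (long long f : freq) {
        // A value seen f times gives f * (f + 1) / 2 pairs (i, j) with j >= i.
        pairs += f * (f + 1) / 2;
    }
    return pairs;
}
```

Because the frequency table is rebuilt on every call, nothing carries over from one test case to the next. For the sample input 1 2 1 the sketch returns 4.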