Finding a test case which fails the code in the 4-sum problem - C++

We need to find whether there exist four numbers a, b, c and d (all at different indices) in an array whose sum equals a constant k.
The hashing-based solution goes like this: build a hash table whose key is the sum of every pair in the array and whose value is an array of the index pairs producing that sum. Then iterate over every pair in the array and look up the remaining sum in the hash table, while also checking that no index is shared between the two pairs.
While the above solution is fine, a solution I saw on geeksforgeeks.com did this: in the hash table the value is a single pair instead of an array of pairs; it only stores the last pair that produces a given sum. It clearly looks wrong to me, but I still can't find a test case where it fails.
Their code:
// A hashing-based C++ program to find if there are
// four elements with given sum.
#include <bits/stdc++.h>
using namespace std;
// The function finds four elements with given sum X
void findFourElements (int arr[], int n, int X)
{
    // Store sums of all pairs in a hash table
    unordered_map<int, pair<int, int>> mp;
    for (int i = 0; i < n-1; i++)
        for (int j = i+1; j < n; j++)
            mp[arr[i] + arr[j]] = {i, j};   
    // Traverse through all pairs and search
    // for X - (current pair sum).    
    for (int i = 0; i < n-1; i++)
    {
        for (int j = i+1; j < n; j++)
        {
            int sum = arr[i] + arr[j];
  
            // If X - sum is present in hash table,            
            if (mp.find(X - sum) != mp.end())
            {   
                // Making sure that all elements are
                // distinct array elements and an element
                // is not considered more than once.
                pair<int, int> p = mp[X - sum];
                if (p.first != i && p.first != j &&
                        p.second != i && p.second != j)
                {
                    cout << arr[i] << ", " << arr[j] << ", "
                         << arr[p.first] << ", "
                         << arr[p.second];
                    return;
                }
            }
        }
    }
}
  
// Driver program to test above function
int main()
{
    int arr[] = {10, 20, 30, 40, 1, 2};
    int n = sizeof(arr) / sizeof(arr[0]);
    int X = 91;
    findFourElements(arr, n, X);
    return 0;
}
How can I find a test case where this code fails, or, if it is correct, why is it correct?

The algorithm is correct. Consider a quadruple (a, b, c, d) which satisfies the following: (1) arr[a] + arr[b] + arr[c] + arr[d] == k; (2) a < b < c < d.
Clearly, four elements at distinct indices sum to k if and only if such a quadruple (a, b, c, d) exists.
Now consider the pair (a, b). As you mentioned, the program records only the last pair (e, f) (with e < f) that is a complement of (a, b) (i.e. arr[a] + arr[b] + arr[e] + arr[f] == k). The double loop writes pairs in increasing lexicographic order of (first, second), so the last such pair satisfies (e, f) >= (c, d), and in particular e >= c. Therefore a < b < e < f, and (a, b, e, f) is itself a valid quadruple.
Since the second loop traverses all pairs, the pair (a, b) must be visited, and at that point the quadruple is detected. Therefore the algorithm is correct.
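If you also want to probe the claim empirically, a randomized cross-check against brute force is a quick way to hunt for counterexamples. The harness below is a sketch for testing (not code from the question): fourSumHash re-implements the hash-based idea as a boolean so the two can be compared. Small arrays and a small value range maximise sum collisions; consistent "no disagreement found" output agrees with the proof above.
#include <bits/stdc++.h>
using namespace std;

// Boolean variant of the hash-based approach under test.
bool fourSumHash(const vector<int>& a, int X) {
    int n = a.size();
    unordered_map<int, pair<int, int>> mp;
    for (int i = 0; i < n - 1; i++)
        for (int j = i + 1; j < n; j++)
            mp[a[i] + a[j]] = {i, j};
    for (int i = 0; i < n - 1; i++)
        for (int j = i + 1; j < n; j++) {
            auto it = mp.find(X - a[i] - a[j]);
            if (it != mp.end()) {
                auto [p, q] = it->second;
                if (p != i && p != j && q != i && q != j) return true;
            }
        }
    return false;
}

// O(n^4) brute force used as the reference answer.
bool fourSumBrute(const vector<int>& a, int X) {
    int n = a.size();
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            for (int k = j + 1; k < n; k++)
                for (int l = k + 1; l < n; l++)
                    if (a[i] + a[j] + a[k] + a[l] == X) return true;
    return false;
}

int main() {
    mt19937 rng(12345);
    for (int t = 0; t < 100000; t++) {
        int n = 4 + rng() % 5;            // small arrays: 4..8 elements
        vector<int> a(n);
        for (int& v : a) v = rng() % 10;  // small values to force collisions
        int X = rng() % 40;
        if (fourSumHash(a, X) != fourSumBrute(a, X)) {
            for (int v : a) cout << v << ' ';
            cout << "| X = " << X << '\n'; // a counterexample, if one exists
            return 0;
        }
    }
    cout << "no disagreement found\n";
}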

It only stores the last pair that produces a given sum.
Not quite: it stores one pair for each distinct sum; when several pairs share a sum, the later pair overwrites the earlier one (the other answer explains why that is still sufficient). The map is built here:
// Store sums of all pairs in a hash table
unordered_map<int, pair<int, int>> mp;
for (int i = 0; i < n-1; i++)
    for (int j = i+1; j < n; j++)
        mp[arr[i] + arr[j]] = {i, j};
{i, j} is a pair consisting of i as the first value and j as the second.
I think you're confused about what happens here:
pair<int, int> p = mp[X - sum];
if (p.first != i && p.first != j &&
        p.second != i && p.second != j)
They're pulling a pair out of the map; specifically, the pair that they're matching with to form the sum X. They could write:
if (mp[X - sum].first != i && mp[X - sum].first != j &&
        mp[X - sum].second != i && mp[X - sum].second != j)
But that's both ugly and a lot of map lookups, so instead they copy the pair they're concerned with into a local variable, p.
They then make sure that neither of the indices in p is one of the two they're looking at now, i and j. Does that make sense?

Related

Partition for randomised quicksort (with few unique elements)

I've been tasked to write a partition function for a randomised quicksort on arrays with few unique elements (optimising it by using 3 partitions instead of 2). I've tried implementing my version of it and found that it doesn't pass the test cases.
However, using a classmate's version of partition, it seems to work. Conceptually, I don't see the difference between his and mine, and I can't tell what it is in my version that causes it to break. I wrote it with the same concept as his (I think), which involves using counters (j and k) to partition the array into 3 parts.
I would greatly appreciate anybody who could point out why mine doesn't work, and what I should do to minimise the chances of this happening again. I feel like this learning point will be important to me as a developer. Thank you!
For comparison, there are 3 blocks of code below: the snippet directly below is my version of partition, following it is my classmate's version, and last is the actual quicksort algorithm which calls our partition.
My version (Does not work)
vector<int> partition2(vector<int> &a, int l, int r) {
    int x = a[l];
    int j = l;
    int k = r;
    vector<int> m(2);
    // I've tried changing i = l + 1
    for (int i = l; i <= r; i++) {
        if (a[i] < x) {
            swap(a[i], a[j]);
            j++;
        }
        else if (a[i] > x) {
            swap(a[i], a[k]);
            k--;
        }
    }
    // I've tried removing this
    swap(a[l], a[j]);
    m[0] = j - 1;
    m[1] = k + 1;
    return m;
}
My classmate's (which works)
vector<int> partition2(vector<int> &a, int l, int r) {
    int x = a[l];
    int p_l = l;
    int i = l;
    int p_e = r;
    vector<int> m(2);
    while (i <= p_e) {
        if (a[i] < x) {
            swap(a[p_l], a[i]);
            p_l++;
            i++;
        } else if (a[i] == x) {
            i++;
        } else {
            swap(a[i], a[p_e]);
            p_e -= 1;
        }
        m[0] = p_l - 1;
        m[1] = p_e + 1;
    }
    return m;
}
Actual quick sort algorithm
void randomized_quick_sort(vector<int> &a, int l, int r) {
    if (l >= r) {
        return;
    }
    int k = l + rand() % (r - l + 1);
    swap(a[l], a[k]);
    vector<int> m = partition2(a, l, r);
    randomized_quick_sort(a, l, m[0]);
    randomized_quick_sort(a, m[1], r);
}
The difference between the two three-way partition functions is that your code advances i on every pass through the loop, while your classmate's function advances i only when the value at position i is less than or equal to the pivot.
Let's go through an example array. The first value, 3, is the pivot. The letters indicate the positions of the variables after each pass through the loop.
j       k
3 1 5 2 4
i
The next value is smaller: swap it to the left side and advance j:
  j     k
1 3 5 2 4
    i
The next value, 5, is greater, so it goes to the right:
  j   k
1 3 4 2 5
      i
That's the bad move: your i has now skipped over the 4, which must also go to the right part. Your classmate's code does not advance i here and catches the 4 on the next pass.
Your loop has some invariants, things that must be true after every pass:
All items with an index lower than j are smaller than the pivot.
All items with an index greater than k are greater than the pivot.
All items with an index from j to i - 1 are equal to the pivot.
All items from i to k have not yet been processed.
You can also derive the loop conditions from these invariants (a sketch with the fixes applied follows below):
The pivot is the leftmost element by definition, because the quicksort function swaps it there. It must belong to the group of elements that are equal to the pivot, so you can start your loop at l + 1.
All items with an index greater than k are already in the correct part of the array. That means you can stop once i passes k. Going further will needlessly swap elements around inside the "greater than" partition and also move k, which will produce wrong partition boundaries.
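Putting those fixes together, here is a sketch of how your function could look with the invariants respected: start i at l + 1, keep i in place after a swap to the right, and stop once i passes k. Variable names follow your version; this is an illustration, not your classmate's code.
#include <bits/stdc++.h>
using namespace std;

vector<int> partition2(vector<int> &a, int l, int r) {
    int x = a[l];  // pivot, swapped to the front by the caller
    int j = l;     // a[l..j-1] < x
    int k = r;     // a[k+1..r] > x
    int i = l + 1; // a[j..i-1] == x, a[i..k] not yet processed
    while (i <= k) {
        if (a[i] < x) {
            swap(a[i], a[j]);
            j++;
            i++;
        } else if (a[i] > x) {
            swap(a[i], a[k]);
            k--;  // do not advance i: a[i] now holds an unseen element
        } else {
            i++;  // equal to the pivot: it stays in the middle block
        }
    }
    return {j - 1, k + 1}; // same boundaries the quicksort driver expects
}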

Find the number of triples (i, j, k) in array such that A[i] + A[j] = 2 * A[k]

How do I find the number of triples (i, j, k) in an array such that a[i] + a[j] = 2 * a[k]? The complexity should be O(n * log n) or O(n), since n <= 10^5.
Edit 2 (important): abs(a[i]) <= 10^3.
Edit:
i, j, k must all be distinct.
Here is my code, but it's too slow; its complexity is O(n^2 * log n).
#include <bits/stdc++.h>
using namespace std;

// pass by const reference: passing by value would copy the whole array on every call
int binarna(const vector<int>& a, int k){
    int n = a.size();
    int low = 0, high = n - 1;
    bool b = false;
    int mid;
    while (low <= high){
        mid = (low + high) / 2;
        if (a[mid] == k){
            b = true;
            break;
        }
        if (k < a[mid])
            high = mid - 1;
        else
            low = mid + 1;
    }
    if (b)
        return mid;
    else
        return -1;
}
int main()
{
    int n;
    cin >> n;
    vector<int> a(n);
    for (auto& i : a)
        cin >> i;
    sort(a.begin(), a.end());
    int sol = 0;
    for (int i = 0; i < n - 1; ++i){
        for (int j = i + 1; j < n; ++j){
            if ((a[i] + a[j]) % 2)
                continue;
            int k = (a[i] + a[j]) / 2;
            if (binarna(a, k) != -1)
                ++sol;
        }
    }
    cout << sol << '\n';
}
The complexity probably can't be better than O(N²), because when the elements form a single arithmetic progression, every pair (i, j) with j - i even has a suitable element in the middle, so the count itself is O(N²)*.
An O(N²) solution is as follows:
sort the array increasingly;
for every i,
set k = i, and for every j > i,
increment k until 2 * A[k] >= A[i] + A[j];
increment the count if equality is achieved.
For a given i, both j and k increase monotonically up to N, so the total number of operations is O(N - i). This justifies the global O(N²) behaviour, which is optimal (a code sketch follows the footnote below).
*There is a little subtlety here, as you might contradict the argument by claiming: "we can identify that the array forms an arithmetic sequence in time O(N), and from this compute the count in a single go."
But if, instead of a single arithmetic sequence, we have two of them of length N/2, the quadratic behaviour of the count remains even if they are intertwined. And there are at least N ways to intertwine two arithmetic sequences.
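Here is a minimal sketch of that O(N²) scan (not the poster's code). It counts one triple per pair (i, j) that has a suitable middle element, and deliberately glosses over duplicate values (where the middle element could collide with index i or j); the histogram/compression answers below deal with duplicates properly.
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

long long countTriples(vector<int> a) {  // by value: we sort a local copy
    sort(a.begin(), a.end());
    int n = a.size();
    long long count = 0;
    for (int i = 0; i + 1 < n; ++i) {
        int k = i;                       // k only ever moves right for a fixed i
        for (int j = i + 1; j < n; ++j) {
            while (k < n && 2 * a[k] < a[i] + a[j]) ++k;
            if (k < n && 2 * a[k] == a[i] + a[j]) ++count;
        }
    }
    return count;
}

int main() {
    // prints 4: the triples {1,2,3}, {1,3,5}, {2,3,4}, {3,4,5}
    cout << countTriples({1, 2, 3, 4, 5}) << '\n';
}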
If the range of elements is much smaller than their number, it is advantageous to compress the data by means of a histogram.
The triple detection algorithm simplifies a little because the middle value is systematically the average of the two endpoints. Every triple now counts for Hi * Hk * Hj instead of 1. The complexity drops to O(M²), where M is the size of the histogram.
Let's call D the total number of distinct values in the array. If abs(a[i]) <= 10^3, then you can't have more than 2*10^3 + 1 distinct values in the array. It means that if you are a bit smart, the complexity of your algorithm becomes the minimum of O(D^2 * log(D)) and O(N * log(N)), which is far better than O(N^2 * log(N)); and if you use the smarter algorithm suggested by Yves, you get the minimum of O(D^2) and O(N * log(N)).
Obviously the O(N * log(N)) part comes from sorting, and you can't avoid it, but that's OK even for N = 10^5. So how do we reduce N to D in the main part of the algorithm? It is not hard: replace the array of int values with an array of tuples (value, count) (let's call it B). It is easy to build such an array by scanning the original array after it has been sorted. The size of this new array is D (instead of N). Now you apply your algorithm, or Yves's improved algorithm, to this array, but each time you find a triplet (i, j, k) such that
2*B[k].value == B[i].value + B[j].value
you increment your total counter by
totalCount += B[k].count * B[i].count * B[j].count
Why does this work? Consider the original sorted array. When you find a triplet (i, j, k) such that
2*A[k] == A[i] + A[j]
you have actually found 3 ranges of indices for i, j and k such that within each range the values are equal, so you can pick any index from the corresponding range. Simple combinatorics then gives the formula above.
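As a hedged sketch of the whole pipeline: sort, compress into (value, count) pairs, then run the two-pointer scan over the D distinct values. When the two outer values are distinct, the middle value lies strictly between them, so the three ranges never overlap; triples whose three values all come from a single run of equal elements (v + v = 2*v) are not covered by this loop and would need a separate combinatorial term.
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<int> a(n);
    for (auto& v : a) cin >> v;
    sort(a.begin(), a.end());

    // Compress into B = (value, count); B has one entry per distinct value.
    vector<pair<int, long long>> B;
    for (int v : a) {
        if (B.empty() || B.back().first != v) B.push_back({v, 0});
        B.back().second++;
    }

    long long total = 0;
    int D = B.size();
    for (int i = 0; i < D; ++i) {
        int k = i;
        for (int j = i + 1; j < D; ++j) {
            while (k < D && 2 * B[k].first < B[i].first + B[j].first) ++k;
            if (k < D && 2 * B[k].first == B[i].first + B[j].first)
                total += B[i].second * B[k].second * B[j].second;
        }
    }
    cout << total << '\n';
}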

How to find size of largest subset of a sub-sequence equal to a sum

I have this problem from hackerearth
Given an array of N integers, C cards and S sum. Each card can be used
either to increment or decrement an integer in the given array by 1.
Find if there is any subset (after/before using any no. of cards) with
sum S in the given array.
Input Format
First line of input contains an integer T which denotes the no. of
testcases. Each test case has 2 lines of input. First line of each
test case has three integers N (size of the array), S (subset sum) and
C (no. of cards). Second line of each test case has N integers of the
array (a1 to aN) separated by a space.
Constraints
1<=T<=100 1<=N<=100 1<=S<=10000 0<=C<=100 1<=ai<=100
Output Format
Print TRUE if there exists a subset with given sum else print FALSE.
So this is basically a variation of the subset sum problem. But instead of finding out whether a given subset with sum S exists, we need to find the largest subset from index `index` to N-1 that sums to s, and compare its length with our C value to see if it is greater. If it is, then we have enough elements to modify the sum using our C cards, and we print out our answer. Here is my code for that:
#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;

int N, S, C;

int checkSum(int index, int s, vector<int>& a, vector< vector<int> >& dP) {
    if (dP[index][s] != -1)
        return dP[index][s];
    int maxNums = 0; // size of maximum subset array
    for (int i = index; i < N; i++) {
        int newSum = s - a[i];
        int l = 0;
        if (newSum == 0) {
            l = 1;
        } if (newSum > 0) {
            if (i < (N-1)) { // only if we can still fill up sum
                l = checkSum(i + 1, newSum, a, dP);
                if (l > 0) // if it is possible to create this sum
                    l++; // include l in it
            } else {
                // l stays at 0 for there is no subset that can create this sum
            }
        } else {
            // there is no way to create this sum, including this number, so skip it;
            if (i == (N-1))
                break; // don't go to the next level
            // and l stays at 0
        }
        if (l > maxNums) {
            maxNums = l;
        }
    }
    dP[index][s] = maxNums;
    return maxNums;
}

int main() {
    int t;
    cin >> t;
    while (t--) {
        cin >> N >> S >> C;
        vector<int> a(N);
        for (int i = 0; i < N; i++)
            cin >> a[i];
        vector< vector<int> > dP(N, vector<int>(S + C + 2, -1));
        bool possible = false;
        for (int i = 0; i <= C; i++) {
            int l = checkSum(0, S-i, a, dP);
            int m = checkSum(0, S+i, a, dP);
            if ( (l > 0 && l >= i) || (m > 0 && m >= i) ) {
                cout << "TRUE" << endl;
                possible = true;
                break;
            }
        }
        if (!possible)
            cout << "FALSE" << endl;
    }
    return 0;
}
So basically, 0 means it's not possible to create a subset summing to s from the elements index to N-1, and -1 means we haven't computed it yet. Any other value is the size of the largest subset that sums to s. This code isn't passing all the test cases. What's wrong?
You are missing an else on the following line:
} if (newSum > 0) {
This makes your program take an unexpected early break before updating maxNums with l in some cases.
For example, N=1, S=5, C=0, a={5}
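Concretely, the corrected branch chain would look like this (only the else is new; everything else is the asker's code):
if (newSum == 0) {
    l = 1;
} else if (newSum > 0) { // <-- the missing 'else'
    if (i < (N-1)) {
        l = checkSum(i + 1, newSum, a, dP);
        if (l > 0)
            l++;
    }
} else {
    // without the 'else' above, newSum == 0 also reaches this branch
    // and breaks out before maxNums is updated with l
    if (i == (N-1))
        break;
}
if (l > maxNums) {
    maxNums = l;
}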
Potential logic problem
You have limited the number of cards used to not exceed the subset size, while the question never states that you cannot apply multiple cards to the same integer.
I mean l >= i and m >= i in
if ( (l > 0 && l >= i) || (m > 0 && m >= i) ) {
It seems you have a logic flaw.
You need to find the shortest subset (with sum in the range S-C..S+C) and compare its size with C. If the subset is shorter, it is possible to make the needed sum.

Find the kth smallest element in an unsorted array of non-negative integers

Not allowed to modify the array (the array is read-only).
Using constant extra space is allowed.
ex:
A : [2 1 4 3 2]
k : 3
answer : 2
I did it the way shown below. The answer is correct, but it needs to be more memory-efficient.
void insert_sorted(vector<int> &B, int a, int k)
{
    for (int i = 0; i < k; i++)
    {
        if (B[i] >= a)
        {
            for (int j = k-1; j > i; j--)
                B[j] = B[j-1];
            B[i] = a;
            return;
        }
    }
}

int Solution::kthsmallest(const vector<int> &A, int k) {
    vector<int> B;
    for (int i = 0; i < k; i++)
    {
        B.push_back(INT_MAX);
    }
    int l = A.size();
    for (int i = 0; i < l; i++)
    {
        if (B[k-1] >= A[i])
            insert_sorted(B, A[i], k);
    }
    return B[k-1];
}
One possible solution is binary search.
Let A be the input array; we want to find a number b such that exactly k items in A are smaller than b.
Obviously, b must be inside the range [0, max(A)], and we binary search starting from this range.
Suppose we are searching within the range [lo, hi]. Let c = (lo + hi) / 2 be the middle pivot.
There are three cases:
The number of items in A less than c is less than k. In this case the number we search for must be larger than c, so it lies in the range (c, hi].
The number of items in A less than c is larger than k. Similarly, the number we search for lies in the range [lo, c).
The number of items in A less than c equals k. In this case, the answer is the minimum element in A that is greater than or equal to c. This can be found by another linear scan over A.
The complexity is O(n log m), where m is the max element in A.
/* assume k is 0 based, i.e. 0 <= k < n */
int kth_element(const vector<int> &A, int k){
    int lo = 0, hi = *max_element(A.begin(), A.end());
    while (lo <= hi){
        int mid = (lo + hi) / 2;
        int rank_lo = count_if(A.begin(), A.end(), [=](int i){ return i < mid; });
        int rank_hi = count_if(A.begin(), A.end(), [=](int i){ return i <= mid; });
        if (rank_lo <= k && k < rank_hi)
            return mid;
        if (k >= rank_hi)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1; // not reached for a valid k, but keeps the control flow well-defined
}
Although it's not the answer to this particular problem (as it requires a modifiable collection), there is a function called std::nth_element, which rearranges the elements so that the kth element is at position k, and all elements at positions less than k are smaller than or equal to the kth element, where k is an input parameter.
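For illustration, a minimal sketch of std::nth_element applied to a copy of the array (the copy sidesteps the read-only constraint at the cost of O(n) extra space; the array and k are taken from the question's example):
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

int main() {
    const vector<int> A = {2, 1, 4, 3, 2};
    int k = 3;                 // 1-based, as in the question
    vector<int> B = A;         // nth_element mutates, so work on a copy
    nth_element(B.begin(), B.begin() + (k - 1), B.end());
    cout << B[k - 1] << '\n';  // prints 2
}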
The question does not impose any time constraints. An O(nk) solution is fairly simple: iterate over the array k times (at most), discarding one element value (and its duplicates) each time.
int FindKthSmallest(const std::vector<int>& v, int k) {
    // assuming INT_MIN cannot be a value. Could be relaxed by an extra iteration.
    int last_min = INT_MIN;
    while (k > 0) {
        int current_min = INT_MAX;
        for (int x : v) {
            if (x <= last_min) continue;
            current_min = std::min(current_min, x);
        }
        last_min = current_min;
        for (int x : v) {
            if (x == current_min) k--;
        }
    }
    return last_min;
}
Code on ideone: http://ideone.com/RjRIkM
If only constant extra space is allowed, we can use a simple O(n*k) algorithm.
int kth_smallest(const vector<int>& v, int k) {
    // k is 0-based here; values are assumed non-negative (curmin starts at -1)
    int curmin = -1;
    int order = -1;
    while (order < k) { // while the kth element hasn't been reached
        curmin = *min_element(v.begin(), v.end(), [curmin](int a, int b) {
            if (a <= curmin) return false;
            if (b <= curmin) return true;
            return a < b;
        }); // find the minimal number among those not counted yet
        order += count(v.begin(), v.end(), curmin); // count all occurrences of that minimum
    }
    return curmin;
}
online version to play with: http://ideone.com/KNMYxA

Find the number of increasing subsequences

I want to find the number of increasing subsequences in an array, and I came across a binary indexed tree, which provides an O(n log n) solution (O(log n) per query and update).
I can't understand the code used for the BIT:
#include <bits/stdc++.h>
using namespace std;

const int MAX_N = 100005; // bound on the values stored in the tree
int ft[MAX_N];            // Fenwick (binary indexed) tree
int dp[MAX_N];            // dp[i] = increasing subsequences ending at index i
int H[MAX_N];             // the number array; values assumed in [1, MAX_N - 1]
int N;

void madd(int& a, int b)
{
    a += b;
}

// fenwick code: point update and prefix-sum query, both O(log MAX_N)
void update(int i, int x)
{
    for (++i; i < MAX_N; i += i & -i) madd(ft[i], x);
}

int query(int i)
{
    int s = 0;
    for (++i; i > 0; i -= i & -i) madd(s, ft[i]);
    return s;
}

int main()
{
    cin >> N;
    for (int i = 0; i < N; i++) cin >> H[i];
    long long total = 0; // the answer is the sum of all dp[i]
    for (int i = 0; i < N; i++)
    {
        dp[i] = 1 + query(H[i] - 1); // H[i] contains our number array
        update(H[i], dp[i]);
        total += dp[i];
    }
    cout << total << '\n';
}
Please help me to understand it.
Thank you
The idea of the algorithm is rather simple:
Let's create an array f, where f[v] is the number of increasing subsequences whose last element is the value v. Initially it is filled with zeros.
Let's iterate over all elements of the initial array and update the f values. If the current element is h, then we can append it to every increasing subsequence whose last element is less than h, or start a new subsequence containing only this number. That's why dp[i] = sum(f[j]) + 1, where 0 <= j < h.
A BIT can compute a prefix sum of the array and update a single element efficiently, which is exactly what step 2 requires; that's why it is used to store the f values.
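As a tiny worked example: for H = {1, 2, 2}, the loop computes dp[0] = 1 + query(0) = 1, dp[1] = 1 + query(1) = 1 + 1 = 2, and dp[2] = 1 + query(1) = 2 (the query covers values strictly less than 2, so the earlier 2 is not included). The total number of increasing subsequences is the sum of all dp values, here 1 + 2 + 2 = 5: {1}, {2}, {2}, {1, 2}, {1, 2}.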