I am pretty noobie with C++ and am trying to do some HackerRank challenges as a way to work on that.
Right now I am trying to solve Angry Children problem: https://www.hackerrank.com/challenges/angry-children
Basically, it asks you to create a program that, given a set of N integers, finds the smallest possible "unfairness" for a K-length subset of that set. Unfairness is defined as the difference between the max and min of a K-length subset.
The way I'm going about it now is to find all K-length subsets and calculate their unfairness, keeping track of the smallest unfairness.
I wrote the following C++ program that seems to solve the problem correctly:
#include <cmath>
#include <cstdio>
#include <iostream>
using namespace std;
int unfairness = -1;
int N, K, minc, maxc, ufair;
int *candies, *subset;
void check() {
ufair = 0;
minc = subset[0];
maxc = subset[0];
for (int i = 0; i < K; i++) {
minc = min(minc,subset[i]);
maxc = max(maxc, subset[i]);
}
ufair = maxc - minc;
if (ufair < unfairness || unfairness == -1) {
unfairness = ufair;
}
}
void process(int subsetSize, int nextIndex) {
if (subsetSize == K) {
check();
} else {
for (int j = nextIndex; j < N; j++) {
subset[subsetSize] = candies[j];
process(subsetSize + 1, j + 1);
}
}
}
int main() {
cin >> N >> K;
candies = new int[N];
subset = new int[K];
for (int i = 0; i < N; i++)
cin >> candies[i];
process(0, 0);
cout << unfairness << endl;
return 0;
}
The problem is that HackerRank requires the program to come up with a solution within 3 seconds and that my program takes longer than that to find the solution for 12/16 of the test cases. For example, one of the test cases has N = 50 and K = 8; the program takes 8 seconds to find the solution on my machine. What can I do to optimize my algorithm? I am not very experienced with C++.
All you have to do is sort all the numbers in ascending order and then take the minimal a[i + K - 1] - a[i] over all i from 0 to N - K inclusive.
That works because in an optimal subset all the numbers are located consecutively in the sorted array.
One suggestion I'd give is to sort the integer list before selecting subsets. This dramatically reduces the work. In fact, you don't even need to create subsets: simply look at the elements at indices i (starting at 0) and i+K-1; the lowest difference A[i+K-1] - A[i] over all valid i is your answer. So now instead of examining N choose K subsets (which grows combinatorially) you just look at ~N windows (linear time), and sorting (N log N) becomes your performance bottleneck.
Consider the following code to find a peak in an array.
#include<iostream>
#include<chrono>
#include<unistd.h>
using namespace std;
//Linear search solution
int peak(int *A, int len)
{
if(A[0] >= A[1])
return 0;
if(A[len-1] >= A[len-2])
return len-1;
for(int i=1; i < len-1; i=i+1) {
if(A[i] >= A[i-1] && A[i] >= A[i+1])
return i;
}
return -1;
}
int mean(int l, int r) {
return l-1 + (r-l)/2;
}
//Recursive binary search solution
int peak_rec(int *A, int l, int r)
{
// cout << "Called with: " << l << ", " << r << endl;
if(r == l)
return l;
if(r == l+ 1)
return (A[l] >= A[l+1])?l:l+1;
int m = mean(l, r);
if(A[m] >= A[m-1] && A[m] >= A[m+1])
return m;
if(A[m-1] >= A[m])
return peak_rec(A, l, m-1);
else
return peak_rec(A, m+1, r);
}
int main(int argc, char * argv[]) {
int size = 100000000;
int *A = new int[size];
for(int l=0; l < size; l++)
A[l] = l;
chrono::steady_clock::time_point start = chrono::steady_clock::now();
int p = -1;
for(int k=0; k <= size; k ++)
// p = peak(A, size);
p = peak_rec(A, 0, size-1);
chrono::steady_clock::time_point end = chrono::steady_clock::now();
chrono::duration<double> time_span = chrono::duration_cast<chrono::duration<double>>(end - start);
cout << "Peak finding: " << p << ", time in secs: " << time_span.count() << endl;
delete[] A;
return 0;
}
If I compile with -O3 and use the linear search solution (the peak function) it takes:
0.049 seconds
If I use the binary search solution which should be much faster (the peak_rec function), it takes:
5.27 seconds
I tried turning off optimization but this didn't change the situation. I also tried both gcc and clang.
What is going on?
What is going on is that you've tested it in one case of a strictly monotonically increasing function. Your linear search routine has a shortcut that checks the final two entries, so it never even does a linear search. You should test random arrays to get a true sense of the distribution of runtimes.
That happens because your linear search solution has an optimization for sorted arrays like the one you are passing in. The check if(A[len-1] >= A[len-2]) returns from the function before it even enters the search loop, so the complexity is constant for arrays sorted in ascending order. Your binary search, however, does a full search over the array and thus takes much longer. The solution is to fill your array randomly, using a random number generator:
#include <random>

int main() {
std::random_device rd; /* random device to seed the Mersenne Twister generator */
std::mt19937 gen(rd()); /* create a generator with a random seed */
std::uniform_int_distribution<> range(0, 100000000); /* range for the random values (choose whatever you want) */
int size = 100000000;
int *A = new int[size];
for(int l=0; l < size; l++)
A[l] = range(gen); /* fill the array with random values in the range 0 - 100000000 */
[ . . . ]
EDIT:
One very important thing when you fill your array randomly: your function will not work with unsorted arrays, since if the first element is greater than the second, or the last one is greater than the previous one, the function returns even if there is a much greater value somewhere in between. So remove those lines if you expect unsorted arrays (which you should, since searching for a peak element in a sorted array is constant time and there is no point in searching for one).
A problem involves a depth first search in a directed graph to find all the nodes that can be reached from a particular node. The solution given below gives a wrong result on CodeChef, but I cannot find any test case for which it would produce a different result than the usual DFS algorithm.
I know I can directly implement the correct algorithm to get the right result, but I want to learn why my solution is incorrect so that I won't repeat it in the future. Please help me identify what's wrong with this solution. The code is commented to explain my approach.
#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;
typedef long long int lli;
vector <lli> g[1000+5]; // the adjacency list 1 indexed
void dfs(lli j, lli i);
int main(){
lli n, m, k, a, b;
// n = number of nodes
// m = number of relations
// k = multiplication factor
cin >> n >> m >> k;
while(m--){
// a,b means a is dependent upon b (directed graph)
cin >> a >> b;
g[a].push_back(b);
}
for(lli j = 1; j <= n; j++)
for(lli i = 0; i < g[j].size(); i++){
dfs(j, g[j][i]); // adds dependencies of g[j][i]
// to adjacency list of j
}
// ans is the minimum no of nodes dependent on a particular node
lli ans = g[1].size();
for(lli i = 1; i <= n; i++){
if(g[i].size() < ans)
ans = g[i].size();
}
cout << (ans+1)*k <<"\n";
}
void dfs(lli j, lli i){
// adding dependencies of a node to itself
// would result in an infinite loop?
if(i != j){
for(lli k = 0; k < g[i].size(); k++){
// a node is not dependent on itself
if(g[i][k]!=j && find(g[j].begin(), g[j].end(), g[i][k])==g[j].end()){
g[j].push_back(g[i][k]);
dfs(j, g[i][k]);
}
}
}
}
The link for the problem : problem
link for correct solution: correct solution
Your problem is that you are not aware of multi-edges, which are possible with the given problem constraints; otherwise it looks correct. Take a look at this test case:
2 4 1
1 2
1 2
2 1
2 1
Your program will return 3, but there are only 2 vertices!
Having said that, I would like to add that I disagree with the sample solution: it says the running time would be O(N^2), which is not true, because it starts N DFS traversals, each with cost O(N+M), resulting in O(N*(N+M)). With N=10^3 and M=10^6 there is no chance of staying within the time limit of 0.01 seconds!
Actually, this problem can be solved in O(N+M) using algorithms for detecting strongly connected components.
Given this question:
Given an array A of size N, you need to find the number of ordered pairs (i, j) such that i < j and A[i] > A[j].
Input: First line contains one integer, N, the size of the array. Second line contains N space separated integers denoting the elements of the array A.
Output: Print the number of ordered pairs (i, j) such that i < j and A[i] > A[j].
Constraints:
1 ≤ N ≤ 10^6
1 ≤ A[i] ≤ 10^6
Source: hackerearth's merge sort tutorial
I'm encountering problems properly implementing the solution.
This is the code I wrote:
#include <iostream>
using namespace std;
int ar[10000000];
long long counting=0;
void merge(int* ALR, int* L, int left_length, int* R, int right_length) {
int l = 0;
int r = 0;
for (int i = 0; i < left_length + right_length;) {
if (l == left_length)ALR[i++] = R[r++];
else if (r == right_length)ALR[i++] = L[l++];
else if(L[l]>R[r]){
counting+=(left_length-l);
ALR[i++]=L[l++];
}
else ALR[i++]=R[r++];
}
}
void merge_sort(int* ALR, int length) {
if (length == 1)return;
int mid = length / 2;
int* L = new int[mid];
int* R = new int[length - mid];
int k = 0;
for (size_t i = 0; k < mid; i++)L[i] = ALR[k++];
for (size_t i = 0; k < length; i++)R[i] = ALR[k++];
merge_sort(L, mid);
merge_sort(R, length - mid);
merge(ALR, L, mid, R, length - mid);
delete[] L;
delete[] R;
}
int main() {
int t;
cin>> t;
for(int i=0;i<t;i++)cin>> ar[i];
merge_sort(ar, t);
cout<<counting;
return 0;
}
Now the problem is that I'm getting a wrong answer in the 2nd test case ...
The answer should be: 250194527312
The answer I get: 250002372570
Where did it go wrong?
A general principle you should follow is unit testing small bits of code. In this case, you should test the merge function to see whether what you get when you merge is correct. If you had written a test which merges two very small arrays, you would have seen the result come out in descending order, and the inversion count would usually be wrong.
Here's the test case I used for merge-sort inversion counting:
// expect 3 inversions in [1,3,5,2,4,6]
Your actual problem is an easy error to make (flipping the comparator and counting the other branch as an inversion), and I guarantee many experienced programmers would make some equivalent mistake before running their tests. The difference between a novice and a veteran is knowing how to find those mistakes (and structuring tests so that they are found automatically).
I solved this problem but I got TLE (Time Limit Exceeded) on the online judge.
The output of the program is right, but I think the approach can be made more efficient!
the problem :
Given n integer numbers, count the number of ways in which we can choose two elements such that their absolute difference is less than 32.
In a more formal way, count the number of pairs (i, j) (1 ≤ i < j ≤ n) such that |V[i] - V[j]| < 32. |X| is the absolute value of X.
Input
The first line of input contains one integer T, the number of test cases (1 ≤ T ≤ 128).
Each test case begins with an integer n (1 ≤ n ≤ 10,000).
The next line contains n integers (1 ≤ V[i] ≤ 10,000).
Output
For each test case, print the number of pairs on a single line.
My code in C++:
#include <cstdlib>
#include <iostream>
using namespace std;
int main() {
int T,n,i,j,k,count;
int a[10000];
cin>>T;
for(k=0;k<T;k++)
{ count=0;
cin>>n;
for(i=0;i<n;i++)
{
cin>>a[i];
}
for(i=0;i<n;i++)
{
for(j=i;j<n;j++)
{
if(i!=j)
{
if(abs(a[i]-a[j])<32)
count++;
}
}
}
cout<<count<<endl;
}
return 0;
}
I need help how can I solve it in more efficient algorithm ?
Despite my previous (silly) answer, there is no need to sort the data at all. Instead you should count the frequencies of the numbers.
Then all you need to do is keep track of the number of viable numbers to pair with, while iterating over the possible values. Sorry no c++ but java should be readable as well:
int solve (int[] numbers) {
int[] frequencies = new int[10001];
for (int i : numbers) frequencies[i]++;
int solution = 0;
int inRange = 0;
for (int i = 0; i < frequencies.length; i++) {
if (i > 32) inRange -= frequencies[i - 32];
solution += frequencies[i] * inRange;
solution += frequencies[i] * (frequencies[i] - 1) / 2;
inRange += frequencies[i];
}
return solution;
}
#include <bits/stdc++.h>
using namespace std;
int a[10010];
int N;
int search (int x){
int low = 0;
int high = N;
while (low < high)
{
int mid = (low+high)/2;
if (a[mid] >= x) high = mid;
else low = mid+1;
}
return low;
}
int main() {
cin >> N;
for (int i=0 ; i<N ; i++) cin >> a[i];
sort(a,a+N);
long long ans = 0;
for (int i=0 ; i<N ; i++)
{
int t = search(a[i]+32);
ans += (t -i - 1);
}
cout << ans << endl;
return 0;
}
You can sort the numbers and then use a sliding window. Starting with the smallest number, populate a std::deque with the numbers so long as they are no larger than the smallest number + 31. Then, in an outer loop over each number, update the sliding window and add the new size of the sliding window to the counter. The update can be performed in an inner loop, by first popping from the front every number that has fallen out of range of the current number, then pushing to the back every number that is still within 31 of it.
One faster solution would be to first sort the array, then iterate through the sorted array and for each element only visit the elements to the right of it until the difference exceeds 31.
Sorting can probably be done via count sort (since you have 1 ≤ V[i] ≤ 10,000). So you get linear time for the sorting part. It might not be necessary though (maybe quicksort suffices in order to get all the points).
Also, you can do a trick for the inner loop (the "going to the right of the current element" part). Keep in mind that if S[i+k]-S[i]<32, then S[i+k]-S[i+1]<32, where S is the sorted version of V. With this trick the whole algorithm turns linear.
This can be done in a constant number of passes over the data, and is not affected by the value of the "interval" (in your case, 32).
This is done by populating an array where a[i] = a[i-1] + number_of_times_i_appears_in_the_data; informally, a[i] holds the total number of elements that are smaller than or equal to i.
Code (for a single test case):
static const int UPPER_LIMIT = 10001;
static const int K = 32;
int frequencies[UPPER_LIMIT] = {0}; // O(U)
int n;
std::cin >> n;
for (int i = 0; i < n; i++) { // O(n)
int x;
std::cin >> x;
frequencies[x] += 1;
}
for (int i = 1; i < UPPER_LIMIT; i++) { // O(U)
frequencies[i] += frequencies[i-1];
}
int count = 0;
for (int i = 1; i < UPPER_LIMIT; i++) { // O(U)
int low_idx = std::max(i-32, 0);
int number_of_elements_with_value_i = frequencies[i] - frequencies[i-1];
if (number_of_elements_with_value_i == 0) continue;
int number_of_elements_with_value_K_close_to_i =
(frequencies[i-1] - frequencies[low_idx]);
// debug output, not part of the required answer:
// std::cout << "i: " << i << " number_of_elements_with_value_i: " << number_of_elements_with_value_i << " number_of_elements_with_value_K_close_to_i: " << number_of_elements_with_value_K_close_to_i << std::endl;
count += number_of_elements_with_value_i * number_of_elements_with_value_K_close_to_i;
// Finally, add "duplicates" of i, this is basically sum of arithmetic
// progression with d=1, a0=0, n=number_of_elements_with_value_i
count += number_of_elements_with_value_i * (number_of_elements_with_value_i-1) /2;
}
std::cout << count;
Working full example on IDEone.
You can sort and then use break to end the inner loop whenever the difference goes out of range.
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;
int main()
{
int t;
cin>>t;
while(t--){
int n,c=0;
cin>>n;
vector<int> ar(n);
for(int i=0;i<n;i++)
cin>>ar[i];
sort(ar.begin(),ar.end());
for(int i=0;i<n;i++){
for(int j=i+1;j<n;j++){
if(ar[j]-ar[i] < 32)
c++;
else
break;
}
}
cout<<c<<endl;
}
}
Or, you can use a frequency array over the value range, mark the occurrence of each element, and then loop around and check, for each element x, which values y with |x - y| < 32 are present.
A good approach here is to split the numbers into separate buckets:
constexpr int limit = 10000;
constexpr int diff = 32;
constexpr int bucket_num = (limit/diff)+1;
std::array<std::vector<int>,bucket_num> buckets;
cin>>n;
int number;
for(i=0;i<n;i++)
{
cin >> number;
buckets[number/diff].push_back(number%diff);
}
Obviously the numbers that are in the same bucket are close enough to each other to fit the requirement, so we can just count all the pairs:
int result = std::accumulate(buckets.begin(), buckets.end(), 0,
[](int s, vector<int>& v){ return s + (v.size()*(v.size()-1))/2; });
The numbers that are in non-adjacent buckets cannot form any acceptable pairs, so we can just ignore them.
This leaves the last corner case - adjacent buckets - which can be solved in many ways:
for(int i=0;i<bucket_num-1;i++)
if(buckets[i].size() && buckets[i+1].size())
result += adjacent_buckets(buckets[i], buckets[i+1]);
Personally I like the "occurrence frequency" approach on the one bucket scale, but there may be better options:
int adjacent_buckets(const vector<int>& bucket1, const vector<int>& bucket2)
{
std::array<int,diff> pairs{};
for(int number : bucket1)
{
for(int i=0;i<number;i++)
pairs[i]++;
}
return std::accumulate(bucket2.begin(), bucket2.end(), 0,
[&pairs](int s, int n){ return s + pairs[n]; });
}
This function first builds an array of "numbers from lower bucket that are close enough to i", and then sums the values from that array corresponding to the upper bucket numbers.
In general this approach has O(N) complexity, in the best case it will require pretty much only one pass, and overall should be fast enough.
Working Ideone example
This solution is O(N) for reading the N input numbers, plus a constant-cost pass over the fixed value range to process them:
#include <iostream>
using namespace std;
void solve()
{
int a[10001] = {0}, N, n, X32 = 0, ret = 0;
cin >> N;
for (int i=0; i<N; ++i)
{
cin >> n;
a[n]++;
}
for (int i=0; i<10001; ++i)
{
if (i >= 32)
X32 -= a[i-32];
if (a[i])
{
ret += a[i] * X32;
ret += a[i] * (a[i]-1)/2;
X32 += a[i];
}
}
cout << ret << endl;
}
int main()
{
int T;
cin >> T;
for (int i=0 ; i<T ; i++)
solve();
}
run this code on ideone
Solution explanation: a[i] represents how many times i appeared in the input series.
Then you go over the entire array, and X32 keeps track of the number of elements that are within range of i. The only tricky part really is to count properly when some i is repeated multiple times: a[i] * (a[i]-1)/2 pairs among the duplicates. That's it.
You should start by sorting the input.
Then if your inner loop detects the distance grows above 32, you can break from it.
Thanks for everyone's efforts and time spent on this problem.
I appreciate all the attempts to solve it.
After testing the answers on the online judge, I found that the right and most efficient solutions are Stef's answer and AbdullahAhmedAbdelmonem's answer; pavel's solution is also right, but it's exactly the same as Stef's solution in a different language (C++).
Stef's code got an execution time of 358 ms on the Codeforces online judge and was accepted.
AbdullahAhmedAbdelmonem's code got an execution time of 421 ms on the Codeforces online judge and was also accepted.
If either of them adds a detailed explanation of their algorithm, the bounty will go to one of them.
You can try your solution and submit it to the Codeforces online judge at this link after choosing problem E, "Time Limit Exceeded?".
I also found a great, more understandable solution using a frequency array, with complexity O(n).
In this algorithm you only need to consider, for each element inserted into the frequency array, the range of values whose absolute difference from it is less than 32:
begin = element - 31
end = element + 31
and then count the number of pairs in this range for each inserted element in the frequency array:
#include <iostream>
using namespace std;
int main() {
int T,n,i,j,k,b,e,count;
int v[10000];
int freq[10001];
cin>>T;
for(k=0;k<T;k++)
{
count=0;
cin>>n;
for(i=1;i<=10000;i++)
{
freq[i]=0;
}
for(i=0;i<n;i++)
{
cin>>v[i];
}
for(i=0;i<n;i++)
{
count=count+freq[v[i]];
b=v[i]-31;
e=v[i]+31;
if(b<=0)
b=1;
if(e>10000)
e=10000;
for(j=b;j<=e;j++)
{
freq[j]++;
}
}
cout<<count<<endl;
}
return 0;
}
Finally, I think the best approach to this kind of problem is to use a frequency array and count the number of pairs in a specific range, because its time complexity is O(n).
I have n elements stored in an array and a number k, giving (n choose k) possible subsets.
I have to find all the possible combinations of k elements in the array of length n and, for each set (of length k), make some calculations on the chosen elements.
I have written a recursive algorithm (in C++) that works fine, but for large numbers it crashes, running out of heap space.
How can I fix the problem? How can I enumerate all the sets of n choose k for large n and k?
Is there any library for C++ that can help me?
I know the number of combinations explodes, but I would like to write the best code possible in order to handle the biggest numbers I can.
Approximately what are the biggest numbers (n and k) beyond which it becomes unfeasible?
I am only asking for the best algorithm, not for unfeasible space/work.
Here is my code:
#include <iostream>
#include <vector>
using namespace std;
vector<int> people;
vector<int> combination;
void pretty_print(const vector<int>& v)
{
static int count = 0;
cout << "combination no " << (++count) << ": [ ";
for (int i = 0; i < v.size(); ++i) { cout << v[i] << " "; }
cout << "] " << endl;
}
void go(int offset, int k)
{
if (k == 0) {
pretty_print(combination);
return;
}
for (int i = offset; i <= people.size() - k; ++i) {
combination.push_back(people[i]);
go(i+1, k-1);
combination.pop_back();
}
}
int main() {
int n = #, k = #;
for (int i = 0; i < n; ++i) { people.push_back(i+1); }
go(0, k);
return 0;
}
Here is a non-recursive algorithm:
const int n = ###;
const int k = ###;
int currentCombination[k];
for (int i=0; i<k; i++)
currentCombination[i]=i;
currentCombination[k-1] = k-1-1; // start from the real first combination with the last number decremented by 1, since the loop increments it
do
{
if (currentCombination[k-1] == (n-1) ) // if last number is just before overwhelm
{
int i = k-1-1;
while (currentCombination[i] == (n-k+i))
i--;
currentCombination[i]++;
for (int j=(i+1); j<k; j++)
currentCombination[j] = currentCombination[i]+j-i;
}
else
currentCombination[k-1]++;
for (int i=0; i<k; i++)
_tprintf(_T("%d "), currentCombination[i]);
_tprintf(_T("\n"));
} while (! ((currentCombination[0] == (n-1-k+1)) && (currentCombination[k-1] == (n-1))) );
Your recursive algorithm might be blowing the stack. If you make it non-recursive, then that would help, but it probably won't solve the problem if your case is really 100 choose 10. You have two problems. Few, if any, computers in the world have 17+ terabytes of memory. Going through 17 trillion+ iterations to generate all the combinations will take way too long. You need to rethink the problem and either come up with an N choose K case that is more reasonable, or process only a certain subset of the combinations.
You probably do not want to be processing more than a billion or two combinations at the most - and even that will take some time. That translates to around 41 choose 10 to about 44 choose 10. Reducing either N or K will help. Try editing your question and posting the problem you are trying to solve and why you think you need to go through all of the combinations. There may be a way to solve it without going through all of the combinations.
If it turns out you do need to go through all those combinations, then maybe you should look into using a search technique like a genetic algorithm or simulated annealing. Both of these hill climbing search techniques provide the ability to search a large space in a relatively small time for a close to optimal solution, but neither guarantee to find the optimal solution.
You can use next_permutation() from <algorithm> to generate all possible combinations.
Here is some example code:
vector<bool> is_chosen(n, false);
fill(is_chosen.begin() + n - k, is_chosen.end(), true);
do
{
for(int i = 0; i < n; i++)
{
if(is_chosen[i])
cout << some_array[i] << " ";
}
cout << endl;
} while( next_permutation(is_chosen.begin(), is_chosen.end()) );
Don't forget to include <algorithm> (and <vector>).
As I said in a comment, it's not clear what you really want.
If you want to compute (n choose k) for relatively small values, say n,k < 100 or so, you may want to use a recursive method based on Pascal's triangle.
If n,k are large (say n=1000000, k=500000), you may be happy with an approximate result using Stirling's formula for the factorial: (n choose k) = exp(loggamma(n+1)-loggamma(k+1)-loggamma(n-k+1)), computing loggamma(x) via Stirling's formula.
If you want (n choose k) for all or many k but the same n, you can simply iterate over k and use (n choose k+1) = ((n choose k)*(n-k))/(k+1).