Improving the time complexity of a priority queue solution in C++

In the code below, I am getting a timeout for larger vectors, though it works for smaller ones.
long priceCalculate(vector<int> a, long k) {
    long price = 0;
    priority_queue<int> pq(a.begin(), a.end());
    while (--k >= 0) {
        int x = pq.top();
        price = price + x;
        pq.pop();
        pq.push(x - 1);
    }
    return price;
}
I have an array of numbers. I have to add the maximum number to price and then decrement that number by 1, then find the new maximum, and so on. I have to repeat this process k times.
Is there any data structure better suited than a priority queue, with lower time complexity?
Below is the code using vector sort:
struct mclass {
    bool operator()(int x, int y) {
        return x > y;
    }
} compare;

long priceCalculate(vector<int> a, long k) {
    long price = 0;
    sort(a.begin(), a.end(), compare);
    while (--k >= 0) {
        if (a[0] > 0) {
            price = price + a[0];
            a[0] = a[0] - 1;
            sort(a.begin(), a.end(), compare);
        }
    }
    return price;
}
But this also times out on large inputs.

The sorting code has two performance problems:
You are resorting the vector<> in every iteration. Even if your sorting algorithm is insertion sort (which would be best in this case), it still needs to touch every position in the vector before it can declare the vector<> sorted.
To make matters worse, you are sorting the values you want to work with to the front of the vector, requiring the subsequent sort() call to shift almost all elements.
Consequently, you can achieve huge speedups by:
1. Reversing the sort order, so that you are only interacting with the end of the vector<>.
2. Sorting only once, then updating the vector<> by scanning to the right position from the end and inserting the new value there.
You can also take a closer look at what your algorithm is doing: it only ever operates on the tail of the vector<>, which has constant value, removing entries from it and reinserting them, decremented by one, in front of it. With that knowledge you should be able to simplify your algorithm significantly, leading to even bigger speedups. In the end, you can remove that tail from the vector<> entirely: it is completely described by its length and its value, and all its elements can be manipulated in a single operation, as the sketch below shows. Your algorithm should take almost no time at all once you are through optimizing it.
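Concretely, here is a minimal sketch of that last idea (my own code, not part of the original answer; it assumes the priority-queue semantics of the question, where values may keep decreasing below zero). The top of a descending-sorted vector is tracked as a plateau described by its value and width, and the k operations are consumed in whole rounds rather than one at a time:
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

long priceCalculateFast(std::vector<int> a, long k) {
    if (a.empty() || k <= 0) return 0;
    std::sort(a.begin(), a.end(), std::greater<int>());
    long price = 0;
    long level = a[0];          // value shared by the current top plateau
    long width = 1;             // how many elements form the plateau
    std::size_t next = 1;       // first element below the plateau
    while (k > 0) {
        long target = (next < a.size()) ? a[next] : level - 1;
        while (level > target && k >= width) {  // whole rounds of `width` ops
            price += level * width;
            --level;
            k -= width;
        }
        if (k < width) {                        // not enough for a full round
            price += level * k;
            k = 0;
        }
        while (next < a.size() && a[next] == level) {
            ++width;                            // plateau absorbs equal values
            ++next;
        }
    }
    return price;
}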

For the vector solution you should be able to gain performance by avoiding sort inside the loop.
After
a[0] = a[0] - 1;
you can do something like the code below instead of calling sort:
int tmp = 0;
for (size_t j = 1; j < a.size(); ++j) {
    if (a[0] < a[j])
        ++tmp;
    else
        break;
}
swap(a[0], a[tmp]);
This places the decremented value correctly in the sorted vector: since the vector is sorted in descending order from the start, you only need to find the first element that is less than or equal to the decremented value and swap the element just before it with a[0]. This is faster than sort, which has to go through the whole vector.
Examples of the algorithm:
// Vector after decrement
9, 10, 9, 5, 3, 2
   ^ tmp = 1
// Vector after swap
10, 9, 9, 5, 3, 2

// Vector after decrement
9, 10, 10, 5, 3, 2
       ^ tmp = 2
// Vector after swap
10, 10, 9, 5, 3, 2
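Put together, a complete version of this answer's approach might look like the sketch below (my own code; it keeps the descending sort and the a[0] > 0 guard from the OP's version, but stops early instead of burning the remaining iterations):
#include <algorithm>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

long priceCalculateSwap(std::vector<int> a, long k) {
    std::sort(a.begin(), a.end(), std::greater<int>());  // sort only once
    long price = 0;
    while (--k >= 0 && !a.empty() && a[0] > 0) {
        price += a[0];
        a[0] -= 1;
        std::size_t tmp = 0;
        while (tmp + 1 < a.size() && a[0] < a[tmp + 1])
            ++tmp;                     // last position holding a larger value
        std::swap(a[0], a[tmp]);       // restore descending order
    }
    return price;
}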
Performance
I compared my approach with the vector example from the OP:

k = 1000
vector.size = 10000000
vector filled with random numbers in range 0..9999
compiled with g++ -O3

My approach:
real 0.83
user 0.78
sys 0.05

OP's vector approach:
real 119.42
user 119.42
sys 0.04

Related

Efficient algorithm to produce closest triplet from 3 arrays?

I need to implement an algorithm in C++ that, when given three arrays of unequal sizes, produces triplets a,b,c (one element contributed by each array) such that max(a,b,c) - min(a,b,c) is minimized. The algorithm should produce a list of these triplets, in order of size of max(a,b,c)-min(a,b,c). The arrays are sorted.
I've implemented the following algorithm (note that I now use arrays of type double), however it runs excruciatingly slow (even when compiled using GCC with -O3 optimization, and other combinations of optimizations). The dataset (and, therefore, each array) has potentially tens of millions of elements. Is there a faster/more efficient method? A significant speed increase is necessary to accomplish the required task in a reasonable time frame.
void findClosest(vector<double> vec1, vector<double> vec2, vector<double> vec3) {
    // calculate the size of each array
    int len1 = vec1.size();
    int len2 = vec2.size();
    int len3 = vec3.size();
    int iter = 0;
    int iter_bound = min(min(len1, len2), len3);
    while (iter < iter_bound) {
        int i = 0, j = 0, k = 0;
        int res_i = 0, res_j = 0, res_k = 0;
        double diff = numeric_limits<double>::max();   // needs <limits>
        while (i < len1 && j < len2 && k < len3) {
            double minimum = min(min(vec1[i], vec2[j]), vec3[k]);
            double maximum = max(max(vec1[i], vec2[j]), vec3[k]);
            // if the new difference is less than the previous difference,
            // update it and store the resulting indices
            if (maximum - minimum < diff) {
                diff = maximum - minimum;
                res_i = i; res_j = j; res_k = k;
            }
            // advance past the minimum value
            if (vec1[i] == minimum) ++i;
            else if (vec2[j] == minimum) ++j;
            else ++k;
        }
        // "remove" the triplet
        vec1.erase(vec1.begin() + res_i);
        vec2.erase(vec2.begin() + res_j);
        vec3.erase(vec3.begin() + res_k);
        --len1; --len2; --len3;
        ++iter;
    }
}
OK, you're going to need to be clever in a few ways to make this run well.
The first thing that you need is a priority queue, which is usually implemented with a heap. With that, the algorithm in pseudocode is:
Make a priority queue for possible triples, ordered by max - min, then by how close the median is to their average.
Make a pass through all 3 arrays, putting reasonable triples for every element into the priority queue.
While the priority queue is not empty:
    Pull a triple out
    If none of the triple's three elements has been used:
        Add the triple to the output
        Mark its elements as used
    else:
        If you can construct reasonable triples from the unused elements:
            Add them to the queue
Now for this operation to succeed, you need to efficiently find elements that are currently unused. Doing that is easy at first: just keep an array of bools where you mark off the indexes of used values. But once a lot have been taken off, the search for the next unused one gets long.
The trick is to keep a vector of bools for individual elements, a second for whether both elements of a pair have been used, a third for whether all four elements of a quadruple have been used, and so on. When you use an element, mark its individual bool, then go up the hierarchy, marking off the next level whenever the one you're paired with is already marked off, and stopping otherwise. This additional data structure of size 2n requires marking only about 2 bools per element used on average, but allows you to find the next unused index in either direction in at most O(log n) steps, as in the sketch below.
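A minimal sketch of that hierarchy (my own illustration; only the forward search is shown, the backward one is symmetric). levels[0] holds one flag per element, levels[1] one per pair, levels[2] one per quadruple, and so on; a block is flagged once everything under it has been used:
#include <cstddef>
#include <vector>

struct UsedHierarchy {
    std::vector<std::vector<bool>> levels;
    std::size_t n;

    explicit UsedHierarchy(std::size_t count) : n(count) {
        for (std::size_t sz = (count > 0 ? count : 1); ; sz = (sz + 1) / 2) {
            levels.emplace_back(sz, false);
            if (sz == 1) break;
        }
    }

    // Mark element i used; propagate upward while the partner block
    // is also fully used (about 2 flags touched per element on average).
    void markUsed(std::size_t i) {
        for (std::size_t lv = 0; ; ++lv, i /= 2) {
            levels[lv][i] = true;
            if (lv + 1 == levels.size()) break;
            std::size_t sib = i ^ 1;                 // partner at this level
            if (sib < levels[lv].size() && !levels[lv][sib])
                break;                               // partner unused: stop
        }
    }

    // Smallest unused index >= i, or n if there is none: O(log n) steps.
    std::size_t nextUnused(std::size_t i) const {
        std::size_t lv = 0;
        while (true) {
            if (i >= levels[lv].size()) return n;    // ran off the right edge
            if (!levels[lv][i]) break;               // found an unused block
            if (i % 2 == 0) i += 1;                  // try the partner block
            else { i = i / 2 + 1; ++lv; }            // skip past our parent
        }
        while (lv > 0) {                             // descend to the leftmost
            --lv; i *= 2;                            // unused element below
            if (levels[lv][i]) i += 1;
        }
        return i;
    }
};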
The resulting algorithm will be O(n log(n)).

Find out in linear time whether there is a pair in a sorted vector that adds up to a certain value

Given an std::vector of distinct elements sorted in ascending order, I want to develop an algorithm that determines whether there are two elements in the collection whose sum is a certain value, sum.
I've tried two different approaches with their respective trade-offs:
I can scan the whole vector and, for each element in the vector, apply binary search (std::lower_bound) on the vector for searching an element corresponding to the difference between sum and the current element. This is an O(n log n) time solution that requires no additional space.
I can traverse the whole vector and populate an std::unordered_set. Then, I scan the vector and, for each element, I look up in the std::unordered_set for the difference between sum and the current element. Since searching on a hash table runs in constant time on average, this solution runs in linear time. However, this solution requires additional linear space because of the std::unordered_set data structure.
Nevertheless, I'm looking for a solution that runs in linear time and requires no additional linear space. Any ideas? It seems that I'm forced to trade speed for space.
As the std::vector is already sorted and you can calculate the sum of a pair on the fly, you can achieve a linear time solution in the size of the vector with O(1) space.
The following is an STL-like implementation that requires no additional space and runs in linear time:
template<typename BidirIt, typename T>
bool has_pair_sum(BidirIt first, BidirIt last, T sum) {
    if (first == last)
        return false;             // empty range
    for (--last; first != last;) {
        if ((*first + *last) == sum)
            return true;          // pair found
        if ((*first + *last) > sum)
            --last;               // decrease pair sum
        else                      // (*first + *last) < sum (trichotomy)
            ++first;              // increase pair sum
    }
    return false;
}
The idea is to traverse the vector from both ends – front and back – in opposite directions at the same time and calculate the sum of the pair of elements while doing so.
At the very beginning, the pair consists of the elements with the lowest and the highest values, respectively. If the resulting sum is lower than sum, then advance first – the iterator pointing at the left end. Otherwise, move last – the iterator pointing at the right end – backward. This way, the resulting sum progressively approaches sum. If both iterators end up pointing at the same element and no pair whose sum is equal to sum has been found, then there is no such pair.
#include <iostream>
#include <vector>

auto main() -> int {
    std::vector<int> vec{1, 3, 4, 7, 11, 13, 17};
    std::cout << has_pair_sum(vec.begin(), vec.end(), 2) << ' ';
    std::cout << has_pair_sum(vec.begin(), vec.end(), 7) << ' ';
    std::cout << has_pair_sum(vec.begin(), vec.end(), 19) << ' ';
    std::cout << has_pair_sum(vec.begin(), vec.end(), 30) << '\n';
}
The output is:
0 1 0 1
Thanks to the generic nature of the function template has_pair_sum() and since it just requires bidirectional iterators, this solution works with std::list as well:
std::list<int> lst{1, 3, 4, 7, 11, 13, 17};
has_pair_sum(lst.begin(), lst.end(), 2);
I had the same idea as the one in the answer of 眠りネロク, but with a somewhat more readable implementation.
bool has_pair_sum(std::vector<int> v, int sum) {
    if (v.empty())
        return false;
    std::vector<int>::iterator p1 = v.begin();
    std::vector<int>::iterator p2 = v.end();  // past-the-end iterator
    p2--;                                     // now it points to the last element
    while (p1 != p2) {
        if (*p1 + *p2 == sum)
            return true;
        else if (*p1 + *p2 < sum)
            p1++;
        else
            p2--;
    }
    return false;
}
Well, since we are already given a sorted array, we can use the two-pointer approach: keep a left pointer at the start of the array and a right pointer at the end. In each iteration, check whether the values at the two indices add up to the given sum; if yes, return from there. Otherwise we have to decide how to shrink the range, that is, whether to increase the left pointer or decrease the right pointer. So we compare the temporary sum with the given sum. If the temporary sum is greater, we decrease the right pointer: increasing the left pointer could only keep the temporary sum the same or make it larger, never smaller. Similarly, if the temporary sum is less than the given sum, there is no point in decreasing the right pointer, since the temporary sum would only stay the same or decrease, never increase, so we increase the left pointer instead. We repeat this process until we find an equal sum or the pointers cross.
Below is the code for demonstration; let me know if something is not clear.
bool pairSumExists(vector<int> &a, int &sum) {
    if (a.empty())
        return false;
    int len = a.size();
    int left_pointer = 0, right_pointer = len - 1;
    while (left_pointer < right_pointer) {
        if (a[left_pointer] + a[right_pointer] == sum) {
            return true;
        }
        if (a[left_pointer] + a[right_pointer] > sum) {
            --right_pointer;
        } else {  // a[left_pointer] + a[right_pointer] < sum
            ++left_pointer;
        }
    }
    return false;
}

Random generation algorithm in C++

Suppose you need to generate a random permutation of the first N integers. For example, {4, 3, 1, 5, 2} and {3, 1, 4, 2, 5} are legal permutations, but {5, 4, 1, 2, 1} is not, because one number (1) is duplicated and another (3) is missing. This routine is often used in the simulation of algorithms. We assume the existence of a random number generator, RandInt(i,j), that generates an integer between i and j with equal probability. Here is the algorithm:
Fill the array A from A[0] to A[N-1] as follows: To fill A[i], generate random numbers until you get one that is not already in A[0], A[1],…, A[i-1].
Implement this algorithm in C++ and find the complexity. This is my code:
int a;
bool b = false;
A[0] = RandInt(1, n);
for (int i = 1; i < n; i++) {
    do {
        b = false;
        a = RandInt(1, n);
        for (int j = 0; j < i; j++)
            if (A[j] == a)
                b = true;
    } while (b);
    A[i] = a;
}
Is this code correct? And how can I find the complexity of the algorithm? Since RandInt(i,j) generates random numbers, I don't know how many times the do-while loop will repeat.
This algorithm will produce correct results, selecting a permutation uniformly at random from all possible permutations.
The running time is not bounded above by any deterministic function since, as you point out, it could run literally forever. In the best case, this algorithm runs in O(n^2), selecting a random permutation without ever having to repeat a selection (the duplicate check alone costs O(i) for position i). On average, you'd expect to try n/n = 1 time to get the first unique number, n/(n-1) times to get the second, and so on, up to an expected n/1 = n tries for the last one. Adding those together gives n*H(n), where H(n) is the nth harmonic number. It turns out H(n) is Theta(log n), so this algorithm is O(n^2 log n) in the average case.
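Spelled out, once i - 1 slots are filled, a single RandInt draw is new with probability (n - i + 1)/n, so slot i needs n/(n - i + 1) draws in expectation. Summing over all slots gives

\[
\sum_{i=1}^{n} \frac{n}{n-i+1} \;=\; n \sum_{j=1}^{n} \frac{1}{j} \;=\; n\,H(n) \;=\; \Theta(n \log n).
\]

Each draw additionally pays an O(n) scan of the prefix for duplicates, which is where the O(n^2 log n) average case comes from.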
There is a better way to do what you're trying to do: you can start with any permutation and shuffle it into another one using an algorithm that is O(n) in the worst case. The algorithm is the Fisher-Yates algorithm and works as follows:
FisherYates(array[1...n])
1. if n == 1 then return
2. r = random(1, n)    // r may equal 1, so the element can stay in place
3. temp = array[1]
4. array[1] = array[r]
5. array[r] = temp
6. FisherYates(array[2...n])
This is a recursive formulation, but an iterative one is straightforward, as in the sketch below. It calls random exactly n - 1 times, where n is the size of the array at the topmost invocation.
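A minimal iterative sketch (my own code, using <random> in place of the assumed RandInt helper):
#include <numeric>
#include <random>
#include <utility>
#include <vector>

std::vector<int> randomPermutation(int n) {
    std::vector<int> a(n);
    std::iota(a.begin(), a.end(), 1);           // fill with 1, 2, ..., n
    std::mt19937 gen(std::random_device{}());
    for (int i = n - 1; i > 0; --i) {           // n - 1 draws in total
        std::uniform_int_distribution<int> d(0, i);
        std::swap(a[i], a[d(gen)]);             // the element may stay in place
    }
    return a;
}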

How to increase the value of k consecutive elements in a vector in C++?

Suppose we have a vector in C++ of size 8 with elements {0, 1, 1, 0, 0, 0, 1, 1}, and I want to increase each element in a specific portion of the vector by one. For example, if the portion that needs to be increased by 1 is index 0 to 4, then our final result is {1, 2, 2, 1, 1, 0, 1, 1}.
Is it possible to do this in constant time using standard vector methods (like memset in C), without running any loop?
No... and by the way, with memset you don't have a guaranteed constant-time operation either (in most implementations it is just very fast, but still linear in the number of elements).
If you need to do this kind of operation (addition/subtraction of a constant over a range) on a very large vector a lot of times, and you only need the final result, then you can get O(1) per update using a different algorithm:
Step 1: convert the data to its "derivative"
This means replacing each element with its difference from the previous one.
// O(n) in the size of the vector, but done only once
for (int i = v.size() - 1; i > 0; i--) {
    v[i] -= v[i-1];
}
Step 2: do all the interval operations (each in constant time)
With this representation, adding a constant to a range simply means adding it to the first element of the range and subtracting it from the element just past its end. In code:
// intervals contains structures with start/stop/value fields.
// This is O(n) in the **number of intervals**, and does not
// depend on the size of them.
for (auto r : intervals) {
    v[r.start] += r.value;
    if (r.stop + 1 < (int)v.size())   // guard: the range may end at the last element
        v[r.stop + 1] -= r.value;
}
Step 3: collect the results
Finally, you just need to undo the initial processing, getting back to the normal value in each cell by integrating. In code:
// O(n) in the size of the vector, but done only once
for (int i = 1, n = v.size(); i < n; i++) {
    v[i] += v[i-1];
}
Note that both step 1 and 3 (derivation and integration) can be done in parallel on N cores with perfect efficiency if the size is large enough, even if how this is possible may not be obvious at first sight (it wasn't for me, at least).
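Putting the three steps together on the example from the question (a minimal sketch; the range 0..4 matches the example above):
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{0, 1, 1, 0, 0, 0, 1, 1};
    // Step 1: derivative
    for (int i = (int)v.size() - 1; i > 0; i--)
        v[i] -= v[i - 1];
    // Step 2: add 1 to the range [0, 4] in O(1)
    v[0] += 1;
    if (5 < (int)v.size())
        v[5] -= 1;                      // cancel past the end of the range
    // Step 3: integrate
    for (int i = 1, n = (int)v.size(); i < n; i++)
        v[i] += v[i - 1];
    for (int x : v)
        std::cout << x << ' ';          // prints: 1 2 2 1 1 0 1 1
    std::cout << '\n';
}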

How to get intersection of two Arrays

I have two integer arrays
int A[] = {2, 4, 3, 5, 6, 7};
int B[] = {9, 2, 7, 6};
And I have to get the intersection of these arrays, i.e. the output will be: 2, 6, 7.
I am thinking of solving it by saving array A in a data structure, and then I want to compare all the elements till size A or B, and then I will get the intersection.
Now I have a problem: I need to first store the elements of array A in a container.
Shall I do something like
int size = sizeof(A)/sizeof(int);
to get the size? But by doing this I will only get the size; after that I also want to access all the elements and store them in a container.
Here is the code which I am using to find the intersection:
#include"iostream"
using namespace std;
int A[] = {2, 4, 3, 5, 6, 7};
int B[] = {9, 2, 7, 6};
int main()
{
int sizeA = sizeof(A)/sizeof(int);
int sizeB = sizeof(B)/sizeof(int);
int big = (sizeA > sizeB) ? sizeA : sizeB;
int small = (sizeA > sizeB) ? sizeB : sizeA;
for (int i = 0; i <big ;++i)
{
for (int j = 0; j <small ; ++j)
{
if(A[i] == B[j])
{
cout<<"Element is -->"<<A[i]<<endl;
}
}
}
return 0;
}
Just use a hash table:
#include <unordered_set> // needs C++11 or TR1
// ...
unordered_set<int> setOfA(A, A + sizeA);
Then you can check, for every element of B, whether it is also in A:
for (int i = 0; i < sizeB; ++i) {
    if (setOfA.find(B[i]) != setOfA.end()) {
        cout << B[i] << endl;
    }
}
Runtime is expected O(sizeA + sizeB).
You can sort the two arrays
sort(A, A + sizeA);
sort(B, B + sizeB);
and use a merge-like algorithm to find their intersection:
#include <vector>
...
std::vector<int> intersection;
int idA = 0, idB = 0;
while (idA < sizeA && idB < sizeB) {
    if (A[idA] < B[idB]) idA++;
    else if (B[idB] < A[idA]) idB++;
    else {  // A[idA] == B[idB], we have a common element
        intersection.push_back(A[idA]);
        idA++;
        idB++;
    }
}
The time complexity of this part of the code is linear. However, due to the sorting of the arrays, the overall complexity becomes O(n * log n), where n = max(sizeA, sizeB).
The additional memory required for this algorithm is optimal (equal to the size of the intersection).
saving array A in a data structure
Arrays are data structures; there's no need to save A into one.
I want to compare all the elements till size A or B, and then I will get the intersection
This is extremely vague but isn't likely to yield the intersection; notice that you must examine every element in both A and B but "till size A or B" will ignore elements.
What approach should I follow to get the size of an unknown-size array and store it in a container?
It isn't possible to deal with arrays of unknown size in C unless they have some end-of-array sentinel that allows counting the number of elements (as is the case with NUL-terminated character arrays, commonly referred to in C as "strings"). However, the sizes of your arrays are known because their compile-time sizes are known. You can calculate the number of elements in such arrays with a macro:
#define ARRAY_ELEMENT_COUNT(a) (sizeof(a)/sizeof *(a))
...
int *ptr = new sizeof(A);
[Your question was originally tagged [C], and my comments below refer to that]
This isn't valid C -- new is a C++ keyword.
If you wanted to make copies of your arrays, you could simply do it with, e.g.,
int Acopy[ARRAY_ELEMENT_COUNT(A)];
memcpy(Acopy, A, sizeof A);
or, if for some reason you want to put the copy on the heap,
int* pa = malloc(sizeof A);
if (!pa) { /* handle out-of-memory */ }
memcpy(pa, A, sizeof A);
/* After you're done using pa: */
free(pa);
[In C++ you would use new and delete]
However, there's no need to make copies of your arrays in order to find the intersection, unless you need to sort them (see below) but also need to preserve the original order.
There are a few ways to find the intersection of two arrays. If the values fall within the range of 0-63, you can use two unsigned longs and set the bits corresponding to the values in each array, then use & (bitwise "and") to find the intersection. If the values aren't in that range but the difference between the largest and smallest is < 64, you can use the same method but subtract the smallest value from each value to get the bit number. If the range is not that small but the number of distinct values is <= 64, you can maintain a lookup table (array, binary tree, hash table, etc.) that maps the values to bit numbers and a 64-element array that maps bit numbers back to values.
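For illustration, a minimal sketch of the 0..63 case described above, using 64-bit masks and the arrays from the question (the code is my own, not the answer's):
#include <cstdio>

// Build a mask with one bit set per value present (values must be 0..63).
unsigned long long toMask(const int* a, int n) {
    unsigned long long m = 0;
    for (int i = 0; i < n; i++)
        m |= 1ULL << a[i];
    return m;
}

int main() {
    int A[] = {2, 4, 3, 5, 6, 7};
    int B[] = {9, 2, 7, 6};
    unsigned long long common = toMask(A, 6) & toMask(B, 4);  // bitwise AND
    for (int v = 0; v < 64; v++)
        if (common & (1ULL << v))
            printf("%d\n", v);          // prints 2, 6, 7
}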
If your arrays may contain more than 64 distinct values, there are two effective approaches:
1) Sort each array and then compare them element by element to find the common values -- this algorithm resembles a merge sort.
2) Insert the elements of one array into a fast lookup table (hash table, balanced binary tree, etc.), and then look up each element of the other array in the lookup table.
Sort both arrays (e.g., qsort()) and then walk through both arrays one element at a time.
Where there is a match, add it to a third array, sized to match the smaller of the two input arrays (your result array can be no larger than the smaller of the two). Use a negative or other "dummy" value as your terminator.
When walking through input arrays, where one value in the first array is larger than the other, move the index of the second array, and vice versa.
When you're done walking through both arrays, your third array has your answer, up to the terminator value.