I encountered a problem that requires counting the number of points within an interval. The input is a large amount of unsorted points plus two bounds lo and hi (with lo <= hi), and the task is to count the points within [lo, hi]. The problem is that although my code is correct, it is too slow to finish within the given time limit (2200 ms). My code runs in O(n). I would like to ask if there are any faster methods.
#include <iostream>
#include <vector>
using namespace std;

int main() {
    int n, m, c, lo, hi;
    cin >> n >> m;
    vector<int> arr(n);          // avoids the non-standard VLA `int arr[n]`
    for (int i = 0; i < n; i++) {
        cin >> arr[i];
    }
    cin >> lo >> hi;
    c = 0;
    for (int j = 0; j < n; j++) {
        if (arr[j] <= hi && lo <= arr[j]) c++;   // count points inside [lo, hi]
    }
    cout << c << endl;
    return 0;
}
It is impossible to solve this problem in less than O(n) time, because you must look at every input at least once.
However, you might be able to reduce the constant factor: have you considered storing a set of (start, end) intervals rather than a simple array? What is the input size that causes this to be slow?
Edit: upon further testing, it seems the bottleneck is actually the use of cin to read numbers.
Try replacing every instance of cin >> x; with scanf("%d", &x); — for me, this brings the runtime down to about 0.08 seconds.
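For reference, the read loop from the question rewritten with scanf might look roughly like this (just a sketch of the same program with the I/O swapped out):
#include <cstdio>
#include <vector>
using namespace std;

int main() {
    int n, m, c = 0, lo, hi;
    scanf("%d %d", &n, &m);
    vector<int> arr(n);
    for (int i = 0; i < n; i++) {
        scanf("%d", &arr[i]);        // scanf avoids cin's stream overhead
    }
    scanf("%d %d", &lo, &hi);
    for (int j = 0; j < n; j++) {
        if (lo <= arr[j] && arr[j] <= hi) c++;
    }
    printf("%d\n", c);
    return 0;
}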
You can do it faster than O(N) only if you need to do lookups more than once on the same data set:
1. Sort the array (or a copy of it). For lookups you can then use binary search, which is O(log2 N).
2. Instead of a flat array, use something like a binary tree; lookup complexity will be the same as in #1.
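As a sketch of option #1: once the array is sorted, each [lo, hi] query can be answered with two binary searches via std::lower_bound and std::upper_bound, so repeated queries cost O(log N) each. This only pays off if you query the same data more than once; countInRange below is an illustrative helper name:
#include <algorithm>
#include <vector>

// Count elements of a sorted vector that fall inside [lo, hi].
int countInRange(const std::vector<int>& sorted, int lo, int hi) {
    auto first = std::lower_bound(sorted.begin(), sorted.end(), lo);  // first element >= lo
    auto last  = std::upper_bound(sorted.begin(), sorted.end(), hi);  // first element > hi
    return static_cast<int>(last - first);
}

// Usage: sort once, then answer many queries.
// std::sort(arr.begin(), arr.end());
// int c = countInRange(arr, lo, hi);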
I'm getting a memory limit exceeded error for this code, and I can't find a way to resolve it. Even if I use a long long int it gives the same error.
Why is this error happening?
#include<bits/stdc++.h>
#define ll long long int
using namespace std;

int main()
{
    /// 1000000000000 500000000001 -- getting memory limit exceeded for this test case.
    ll n, k;
    cin >> n >> k;
    vector<ll> v;
    vector<ll> arrange;
    for (ll i = 0; i < n; i++)
    {
        v.push_back(i + 1);
    }
    // Arranging vector like 1,3,5,... 2,4,6,...
    for (ll i = 0; i < v.size(); i++)
    {
        if (v[i] % 2 != 0)
        {
            arrange.push_back(v[i]);
        }
    }
    for (ll i = 0; i < v.size(); i++)
    {
        if (v[i] % 2 == 0)
        {
            arrange.push_back(v[i]);
        }
    }
    cout << arrange[k - 1] << endl; // Found the kth number.
    return 0;
}
The provided code solves the coding problem for small values of n and k. However, as you noticed, it fails for large values of n. This is because you are trying to allocate a couple of vectors of 1000000000000 elements, which exceeds the amount of memory available in today's computers.
Hence I'd suggest returning to the original problem you're solving and trying an approach that doesn't need to store all the intermediate values in memory. Since the given code works for small values of n and k, you can use it to check whether the approach without vectors works.
I would suggest the following steps to redesign the approach to the coding problem:
1. Write down the contents of arrange for a small value of n.
2. Write down the matching values of k for each element of arrange.
3. Derive the (mathematical) function that maps k to the matching element in arrange. For this problem this can be done in constant time, so there is no need for loops. Note that this function should work both for even and odd values of k.
4. Test whether your code works by comparing it with your current results.
I would suggest trying the preceding steps first to come up with your own approach. If you cannot find a working solution, please have a look at this approach on Wandbox.
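For what it's worth, here is a minimal sketch of such a constant-time function for the arrangement 1, 3, 5, ..., 2, 4, 6, ... (my own derivation, not necessarily the Wandbox approach; kthArranged is an illustrative name). The first (n + 1) / 2 positions hold the odd numbers and the remaining positions hold the even ones:
#include <iostream>
using namespace std;
typedef long long ll;

// kth element (1-based) of the sequence 1,3,5,...,2,4,6,...,n without building it.
ll kthArranged(ll n, ll k) {
    ll odds = (n + 1) / 2;          // how many odd numbers are in 1..n
    if (k <= odds)
        return 2 * k - 1;           // k-th odd number
    return 2 * (k - odds);          // (k - odds)-th even number
}

int main() {
    ll n, k;
    cin >> n >> k;
    cout << kthArranged(n, k) << endl;  // no vectors, so it works even for n = 1000000000000
    return 0;
}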
Assume long long int is an 8-byte type, which is a commonly valid assumption.
For every entry in the array, you are requesting to allocate 8 bytes.
If you request 1000000000000 items, you are requesting to allocate 8 terabytes of memory.
Moreover, you are using two arrays, so you are requesting more than 8 terabytes in total.
Just use a lower number of items for your arrays and it will work.
Currently I am working on data structure problems, and I have a question in which I have to find the kth largest element in an array. The actual problem is here:
https://www.geeksforgeeks.org/kth-smallestlargest-element-unsorted-array/
I did this question in two different ways: the first using a heap and the second using a map.
My solution using a map:
#include <bits/stdc++.h>
using namespace std;

int main() {
    int t;
    cin >> t;
    while (--t >= 0) {
        int n, k;
        cin >> n;
        vector<int> A(n);
        for (int i = 0; i < n; i++) {
            cin >> A[i];
        }
        cin >> k;
        map<int, int> m;                    // ordered map: key -> frequency
        for (int i = 0; i < n; i++) {
            m[A[i]]++;
        }
        auto it = m.begin();
        for (int i = 1; i <= k - 1; i++) {  // advance to the kth distinct key
            it++;
        }
        cout << it->first << endl;
    }
    return 0;
}
But my map solution is giving Time Limit Exceeded. According to me, the map solution also has a time complexity of O(n + k log(n)), the same as the heap solution.
So why is the map solution giving TLE?
The time complexity for your solution using a map would be O(k + n log(n)). Each insertion into std::map takes O(log(n)) time and you are performing n insertions, so the insertions alone take O(n log(n)) time.
See http://www.cplusplus.com/reference/map/map/insert/ for more information about inserting elements into std::map.
Without seeing your heap version I would guess the map's allocation is the cause of the trouble. Each node needs an allocation, which implies a lock and some additional management.
And you have to follow the pointers internally in the map data structure, which is usually not the case in heaps.
In big O notation this doesn't add anything to the time, but in practice each can slow down the program by a large factor.
A simple algorithmic improvement is the following trick:
* If you only need to print "k" elements, then you only need to remember the "k" biggest elements. That way each insert operation costs at most O(log k) rather than O(log n), and k is presumably much less than n...
So instead of inserting everything into a map that gets bigger and bigger (and slower and slower; you mention a time restriction), change the code to remove the smallest element from the map whenever map.size() exceeds k:
map<int,int> m;
for (int i = 0; i < n; i++) {
    m[A[i]]++;
    if ((int)m.size() > k) {   // <<<<<<<< keep only the k largest keys
        m.erase(m.begin());    // <<<<<<<< drop the current smallest key
    }
}
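For comparison, since the question mentions a heap version without showing it, a typical size-k min-heap sketch (not the asker's actual heap code; kthLargest is an illustrative name) also never stores more than k elements:
#include <queue>
#include <vector>
#include <functional>

// kth largest of A using a min-heap that never holds more than k elements.
int kthLargest(const std::vector<int>& A, int k) {
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;  // min-heap
    for (int x : A) {
        heap.push(x);                  // O(log k) since the heap is trimmed below
        if ((int)heap.size() > k)
            heap.pop();                // discard the smallest of the k+1 elements
    }
    return heap.top();                 // smallest of the k largest = kth largest
}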
I am trying to write a function in C++ using MPFR to calculate multiple values. I am currently using an mpfr array to store those values. It is unknown how many values need to be calculated and stored each time. Here is the function:
void Calculator(mpfr_t x, int v, mpfr_t *Values, int numOfTerms, int mpfr_bits) {
    for (int i = 0; i < numOfTerms; i++) {
        mpfr_init2(Values[i], mpfr_bits);
        mpfr_set(Values[i], x, GMP_RNDN);
        mpfr_div_si(Values[i], Values[i], pow(-1, i + 1) * (i + 1) * pow(v, i + 1), GMP_RNDN);
    }
}
The program itself has a while loop that has a nested for loop that takes these values and does calculations with them. In this way, I don't have to recalculate these values each time within the for loop. When the for loop is finished, I clear the memory with
delete[] Values;
before the while loop starts again, at which point it redeclares the array with
mpfr_t *Values;
Values = new mpfr_t[numOfTerms];
The number of values that need to be stored is calculated by a different function and passed to this function through the variable numOfTerms. The problem is that, for some reason, the array slows down the program tremendously. I am working with very large numbers, so the thought was that recalculating those values each time would be extremely expensive, but this method is significantly slower than just recalculating the values in each iteration of the for loop. Is there an alternative method to this?
EDIT: Instead of redeclaring the array each time, I moved the declaration and the delete[] Values outside of the while loop. Now I am just clearing each element of the array with
for (int i = 0; i < numOfTerms; i++) {
mpfr_clear(Values[i]);
}
inside the while loop before it starts over. The program has gotten noticeably faster but is still much slower than just recalculating each value.
If I understand correctly, inside the while loop you are calling mpfr_init2 (at the beginning of the iteration) and mpfr_clear (at the end of the iteration) on numOfTerms MPFR numbers, where the value of numOfTerms depends on the iteration, and this is what takes most of the time.
To avoid these many memory allocations by mpfr_init2 and deallocations by mpfr_clear, I suggest that you declare the array outside the while loop and initially call the mpfr_init2 outside the while loop. The length of the array (i.e. the number of terms) should be what you think is the maximum number of terms. What can happen is that for some iterations, the chosen number of terms was too small. In such a case, you need to increase the length of the array (this will need a reallocation) and call mpfr_init2 on the new elements. This will be the new length of the array for the remaining iterations, until the array needs to be enlarged again. After the while loop, do the mpfr_clear's.
When you need to enlarge the array, have a good strategy for choosing the new number of elements. Just taking the value of numOfTerms needed for the current iteration may not be a good one, since it may yield many reallocations. For instance, make sure that you get at least an N% increase, and do some tests to choose the best value for N. See Dynamic array for instance; in particular, you may want to use the C++ implementation of dynamic arrays, as mentioned in that Wikipedia article.
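A minimal sketch of that reuse strategy (my own illustration; MpfrBuffer and ensureCapacity are hypothetical names, and the 50% growth factor is just one possible choice): initialize slots once, reuse them across iterations, and only grow when an iteration needs more terms than any previous one.
#include <mpfr.h>
#include <algorithm>

// Hypothetical helper: a reusable buffer of mpfr_t values, each initialized exactly once.
struct MpfrBuffer {
    mpfr_t *data = nullptr;
    int size = 0;                                          // number of initialized slots

    void ensureCapacity(int needed, int mpfr_bits) {
        if (needed <= size) return;
        int newSize = std::max(needed, size + size / 2);   // grow by at least 50%
        mpfr_t *bigger = new mpfr_t[newSize];
        for (int i = 0; i < newSize; i++)
            mpfr_init2(bigger[i], mpfr_bits);              // init new slots once
        for (int i = 0; i < size; i++) {
            mpfr_set(bigger[i], data[i], GMP_RNDN);        // copy old values (optional here)
            mpfr_clear(data[i]);
        }
        delete[] data;
        data = bigger;
        size = newSize;
    }

    ~MpfrBuffer() {
        for (int i = 0; i < size; i++)
            mpfr_clear(data[i]);
        delete[] data;
    }
};

// Usage inside the while loop: buffer.ensureCapacity(numOfTerms, mpfr_bits);
// then overwrite buffer.data[0..numOfTerms-1] with mpfr_set / mpfr_div_si as before,
// with no per-iteration init/clear.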
What would be the efficiency of the following program? It is a for loop which runs a fixed number of times.
for(int i = 0; i < 10; i++ )
{
//do something here, no more loops though.
}
So, what should the efficiency be: O(1) or O(n)?
That entirely depends on what is in the for loop. Also, computational complexity is normally measured in terms of the size n of the input, and I can't see anything in your example that directly or indirectly models, represents, or encodes the size of the input. There is just the constant 10.
Besides, although sometimes the analysis of computational complexity may give unexpected, surprising results, the correct term is not "Big Oh", but rather Big-O.
You can only talk about the complexity with respect to some specific input to the calculation. If you are looping ten times because there are ten "somethings" that you need to do work for, then your complexity is O(N) with respect to those somethings. If you just need to loop 10 times regardless of the number of somethings - and the processing time inside the loop doesn't change with the number of somethings - then your complexity with respect to them is O(1). If there's no "something" for which the order is greater than 1, then it's fair to describe the loop as O(1).
bit of further rambling discussion...
Big-O notation indicates that the time taken for the work to complete can be reasonably approximated by some constant amount of time plus some function of N (the number of somethings in the input) for huge values of N:
O(N) indicates the time is c + xN, where c is a fixed overhead and x is the per-something processing time,
O(log2 N) indicates the time is c + x(log2 N),
O(N^2) indicates the time is c + x(N^2),
O(N!) indicates the time is c + x(N!),
O(N^N) indicates the time is c + x(N^N),
etc.
Again, in your example there's no mention of the number of inputs, and the loop iteration count is fixed. I can see how it's tempting to say it's O(1) even if there are 10 input "somethings", but consider: if you have a function capable of processing an arbitrary number of inputs, then decide you'll only use it in your application with exactly 10 inputs and hard-code that, you clearly haven't changed the performance characteristics of the function - you've just locked in a single point on the time-for-N-inputs curve - and any big-O complexity that was valid before the hardcoding must still be valid afterwards. It's less meaningful and useful, though, as N = 10 is small, and unless you've got a horrific big-O complexity like O(N^N) the constants c and x take on a lot more importance in describing the overall performance than they would for huge values of N (where changes in the big-O complexity generally have much more impact on performance than changes to c or even x - which is of course the whole point of big-O analysis).
Sure, O(1), because nothing here depends on n.
EDIT:
Let the loop body contain some complex action with complexity O(P(n)) in Big-O terms.
If we have a constant number of iterations C, the complexity of the loop will be O(C * P(n)) = O(P(n)).
Otherwise, let the number of iterations be Q(n), depending on n. That makes the complexity of the loop O(Q(n) * P(n)).
I'm just trying to say that when the number of iterations is constant, it does not change the complexity of the whole loop.
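A small illustration of the difference (my own example, with work standing in for an arbitrary O(n) body):
// Assume work(n) is some O(n) operation.
void work(int n) { for (int i = 0; i < n; i++) { /* ... */ } }

void constantIterations(int n) {
    for (int i = 0; i < 10; i++)   // C = 10 iterations
        work(n);                   // total: O(10 * n) = O(n)
}

void dependentIterations(int n) {
    for (int i = 0; i < n; i++)    // Q(n) = n iterations
        work(n);                   // total: O(n * n) = O(n^2)
}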
n in Big-O notation denotes the input size. We can't tell what the complexity is, because we don't know what is happening inside the for loop. For example, maybe there are recursive calls that depend on the input size? In this example the overall complexity is O(n):
void g(int n);    // forward declaration so f can call g

void f(int n)     // input size = n
{
    for (int i = 0; i < 10; i++)
    {
        // do something here, no more loops though.
        g(n);     // O(n)
    }
}

void g(int n)
{
    if (n > 0)
    {
        g(n - 1);
    }
}
I'm intersecting some sets of numbers, and doing this by storing a count of each time I see a number in a map.
I'm finding the performance be very slow.
Details:
- One of the sets has 150,000 numbers in it
- The intersection of that set and another set takes about 300ms the first time, and about 5000ms the second time
- I haven't done any profiling yet, but every time I break into the debugger during the intersection it's in malloc.c!
So, how can I improve this performance? Switch to a different data structure? Some how improve the memory allocation performance of map?
Update:
Is there any way to ask std::map or boost::unordered_map to pre-allocate some space?
Or, are there any tips for using these efficiently?
Update2:
See Fast C++ container like the C# HashSet<T> and Dictionary<K,V>?
Update3:
I benchmarked set_intersection and got horrible results:
(set_intersection) Found 313 values in the intersection, in 11345ms
(set_intersection) Found 309 values in the intersection, in 12332ms
Code:
#include <set>
#include <algorithm>
#include <iterator>
#include <cstdlib>
using namespace std;

int runIntersectionTestAlgo()
{
    set<int> set1;
    set<int> set2;
    set<int> intersection;
    // Create 100,000 values for set1
    for ( int i = 0; i < 100000; i++ )
    {
        int value = 1000000000 + i;
        set1.insert(value);
    }
    // Create 1,000 values for set2
    for ( int i = 0; i < 1000; i++ )
    {
        int random = rand() % 200000 + 1;
        random *= 10;
        int value = 1000000000 + random;
        set2.insert(value);
    }
    set_intersection(set1.begin(), set1.end(), set2.begin(), set2.end(), inserter(intersection, intersection.end()));
    return intersection.size();
}
You should definitely be using preallocated vectors which are way faster. The problem with doing set intersection with stl sets is that each time you move to the next element you're chasing a dynamically allocated pointer, which could easily not be in your CPU caches. With a vector the next element will often be in your cache because it's physically close to the previous element.
The trick with vectors, is that if you don't preallocate the memory for a task like this, it'll perform EVEN WORSE because it'll go on reallocating memory as it resizes itself during your initialization step.
Try something like this instead - it'll be WAY faster.
#include <vector>
#include <algorithm>
#include <iterator>
#include <cstdlib>
using namespace std;

int runIntersectionTestAlgo() {
    vector<int> vector1; vector1.reserve(100000);   // int, not char: the values don't fit in a char
    vector<int> vector2; vector2.reserve(1000);
    // Create 100,000 values for vector1
    for ( int i = 0; i < 100000; i++ ) {
        int value = 1000000000 + i;
        vector1.push_back(value);
    }
    sort(vector1.begin(), vector1.end());
    // Create 1,000 values for vector2
    for ( int i = 0; i < 1000; i++ ) {
        int random = rand() % 200000 + 1;
        random *= 10;
        int value = 1000000000 + random;
        vector2.push_back(value);
    }
    sort(vector2.begin(), vector2.end());
    // Reserve at most 1,000 spots for the intersection
    vector<int> intersection; intersection.reserve(min(vector1.size(), vector2.size()));
    set_intersection(vector1.begin(), vector1.end(), vector2.begin(), vector2.end(), back_inserter(intersection));
    return intersection.size();
}
Without knowing any more about your problem, "check with a good profiler" is the best general advice I can give. Beyond that...
If memory allocation is your problem, switch to some sort of pooled allocator that reduces calls to malloc. Boost has a number of custom allocators that should be compatible with std::allocator<T>. In fact, you may even try this before profiling, if you've already noticed debug-break samples always ending up in malloc.
If your number-space is known to be dense, you can switch to using a vector- or bitset-based implementation, using your numbers as indexes in the vector.
If your number-space is mostly sparse but has some natural clustering (this is a big if), you may switch to a map-of-vectors. Use higher-order bits for map indexing, and lower-order bits for vector indexing. This is functionally very similar to simply using a pooled allocator, but it is likely to give you better caching behavior. This makes sense, since you are providing more information to the machine (clustering is explicit and cache-friendly, rather than a random distribution you'd expect from pool allocation).
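As a rough sketch of the dense-number-space suggestion (my own illustration; denseIntersect is a hypothetical helper, and it assumes all values fall in a known range [0, maxValue) and that the inputs contain no duplicates):
#include <vector>

// Intersect two sets of ints known to lie in [0, maxValue).
std::vector<int> denseIntersect(const std::vector<int>& a,
                                const std::vector<int>& b,
                                int maxValue) {
    std::vector<bool> present(maxValue, false);   // one flag per possible value
    for (int x : a) present[x] = true;
    std::vector<int> result;
    for (int x : b)
        if (present[x]) result.push_back(x);      // no per-node allocation, cache-friendly scan
    return result;
}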
I would second the suggestion to sort them. There are already STL set algorithms that operate on sorted ranges (like set_intersection, set_union, etc.): see set_intersection.
I don't understand why you have to use a map to do intersection. Like people have said, you could put the sets in std::set's, and then use std::set_intersection().
Or you can put them into hash_set's. But then you would have to implement intersection manually: technically you only need to put one of the sets into a hash_set, and then loop through the other one, and test if each element is contained in the hash_set.
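A sketch of that hash-based approach using std::unordered_set, the standardized equivalent of hash_set (my own illustration; it assumes the inputs contain no duplicates):
#include <unordered_set>
#include <vector>

// Hash-based intersection: put one set's values in an unordered_set, probe with the other.
std::vector<int> hashIntersect(const std::vector<int>& a, const std::vector<int>& b) {
    std::unordered_set<int> lookup(a.begin(), a.end());   // average O(1) membership tests
    std::vector<int> out;
    for (int x : b)
        if (lookup.count(x)) out.push_back(x);
    return out;
}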
Intersection with maps is slow; try a hash_map (however, this is not provided in all STL implementations).
Alternatively, sort both maps and intersect them in a merge-sort-like way.
What is your intersection algorithm? Maybe there are some improvements to be made?
Here is an alternate method
I do not know it to be faster or slower, but it could be something to try. Before doing so, I also recommend using a profiler to ensure you really are working on the hotspot. Change the sets of numbers you are intersecting to use std::set<int> instead. Then iterate through the smallest one looking at each value you find. For each value in the smallest set, use the find method to see if the number is present in each of the other sets (for performance, search from smallest to largest).
This is optimised in the case that the number is not found in all of the sets, so if the intersection is relatively small, it may be fast.
Then, store the intersection in std::vector<int> instead - insertion using push_back is also very fast.
Here is another alternate method
Change the sets of numbers to std::vector<int> and use std::sort to sort from smallest to largest. Then use std::binary_search to find the values, using roughly the same method as above. This may be faster than searching a std::set since the array is more tightly packed in memory. Actually, never mind that, you can then just iterate through the values in lock-step, looking at the ones with the same value. Increment only the iterators which are less than the minimum value you saw at the previous step (if the values were different).
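A sketch of that lock-step idea over two sorted vectors (my own illustration; this is essentially what std::set_intersection does for you):
#include <vector>

// Intersect two sorted vectors by advancing whichever index points at the smaller value.
std::vector<int> lockstepIntersect(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> out;
    size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j])       ++i;          // a is behind, advance it
        else if (b[j] < a[i])  ++j;          // b is behind, advance it
        else {                               // equal values: part of the intersection
            out.push_back(a[i]);
            ++i; ++j;
        }
    }
    return out;
}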
It might be your algorithm. As I understand it, you are spinning over each set (which I'm hoping is a std::set) and throwing the values into yet another map. This is doing a lot of work you don't need to do, since the keys of a standard set are already in sorted order. Instead, take a merge-sort-like approach: spin over the iterators, dereferencing each to find the minimum. Count how many sets have that minimum and increment those iterators. If the count was N (the number of sets), add the value to the intersection. Repeat until the first set hits its end (if you compare the sizes before starting, you won't have to check every set's end each time).
Responding to the update: there do exist facilities to speed up memory allocation by pre-reserving space, like boost::pool_alloc. Something like:
std::map<int, int, std::less<int>, boost::pool_allocator< std::pair<int const, int> > > m;
But honestly, malloc is pretty good at what it does; I'd profile before doing anything too extreme.
Look at your algorithms, then choose the proper data type. If you're going to have set-like behaviour, and want to do intersections and the like, std::set is the container to use.
Since its elements are stored in sorted order, insertion may cost you O(log N), but intersection with another (sorted!) std::set can be done in linear time.
I figured something out: if I attach the debugger to either RELEASE or DEBUG builds (e.g. hit F5 in the IDE), then I get horrible times.