I was trying this problem on SPOJ.
First I came up with a fairly trivial O(b log b) algorithm (refer to the problem for what b is). But since the author of the problem gave the constraint b ∈ [0, 10^7], I was not convinced it would pass. Anyway, out of sheer belief I coded it as follows:
#include <stdio.h>
#include <iostream>
#include <algorithm>
#include <cmath>
#include <vector>
#include <cstdlib>
#include <stack>
#include <queue>
#include <string>
#include <cstring>
#define PR(x) cout<<#x"="<<x<<endl
#define READ2(x,y) scanf("%d %d",&x,&y)
#define REP(i,a) for(long long i=0;i<a;i++)
#define READ(x) scanf("%d",&x)
#define PRARR(x,n) for(long long i=0;i<n;i++)printf(#x"[%d]=\t%d\n",i,x[i])
using namespace std;

struct node {
    int val;
    int idx;
};
bool operator<(node a, node b) { return a.val < b.val; }

node contain[10000001];

int main() {
    int mx = 1, count = 1, t, n;
    scanf("%d", &t);
    while (t--) {
        count = 1; mx = 1;
        scanf("%d", &n);
        for (int i = 0; i < n; i++) {
            scanf("%d", &contain[i].val);
            contain[i].idx = i;
        }
        sort(contain, contain + n);                  // sort by value, keeping original positions
        for (int j = 1; j < n; j++) {
            if (contain[j].idx > contain[j - 1].idx) // longest run whose original indices are increasing
                count++;
            else
                count = 1;
            mx = max(count, mx);
        }
        printf("%d\n", n - mx);
    }
}
And it passed in 0.01 s on the SPOJ server (which is known to be slow).
But I soon came up with an O(b) algorithm; the code is given below.
#include <stdio.h>
#include <iostream>
#include <algorithm>
#include <cmath>
#include <vector>
#include <cstdlib>
#include <stack>
#include <queue>
#include <string>
#include <cstring>
#define PR(x) printf(#x"=%d\n",x)
#define READ2(x,y) scanf("%d %d",&x,&y)
#define REP(i,a) for(int i=0;i<a;i++)
#define READ(x) scanf("%d",&x)
#define PRARR(x,n) for(int i=0;i<n;i++)printf(#x"[%d]=\t%d\n",i,x[i])
using namespace std;

int val[1001];
int arr[1001];

int main() {
    int t;
    int n;
    scanf("%d", &t);
    while (t--) {
        scanf("%d", &n);
        int mn = 2 << 29, count = 1, mx = 1;
        for (int i = 0; i < n; i++) {
            scanf("%d", &arr[i]);
            if (arr[i] < mn) { mn = arr[i]; }
        }
        for (int i = 0; i < n; i++) {
            val[arr[i] - mn] = i;
        }
        for (int i = 1; i < n; i++) {
            if (val[i] > val[i - 1]) count++;
            else {
                count = 1;
            }
            if (mx < count) mx = count;
        }
        printf("%d\n", n - mx);
    }
}
But surprisingly it took 0.14 s :O
Now my question is: isn't O(b) better than O(b log b) for b > 2? Then why so much difference in time? One member of the community suggested it may be due to cache misses, since the O(b) code is less localized than the O(b log b) code. But I don't see that causing a difference of 0.10 s, and that for fewer than 1000 runs of the code. (Yes, b is actually less than 1000; I don't know why the problem setter exaggerated so much.)
EDIT: I see all the answers are going towards the hidden constant factors in asymptotic notation that often cause disparities in the running times of algorithms. But if you look at the code you will realize that all I am doing is replacing the call to sort with another traversal of the loop. Now, I am assuming sort accesses each element of the array at least once. Wouldn't that make both programs even closer, if we think in terms of the number of lines that get executed? Besides, yes, my past experience with SPOJ tells me that I/O makes a drastic impact on the running time of a program, but I am using the same I/O routines in both codes.
Big O notation describes how long a function takes as the input size approaches infinity. If you have large enough data sets, O(n) will always beat O(n log n).
In practice, some 'poorer-performing' algorithms are faster because of the other hidden variables in the big O formula. Some more scalable algorithms can be slower. The difference becomes more arbitrary as the input set becomes smaller.
I learned all this the hard way, when I spent hours implementing a scalable solution, and when testing, found that it would only be faster for large data sets.
Edit:
Regarding this specific case, some people have mentioned that the same line of code can vary enormously in performance. This is likely the case here. That means that the 'hidden variables' in the big O formula are very relevant. The better you understand how a computer works on the inside, the more optimization techniques you have up your sleeve.
If you only remember one thing, remember this. Never compare two algorithms' performance by just reading the code. If it's that important, time an actual implementation on realistic data sets.
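For example, a minimal timing sketch using std::chrono; the timed region is a placeholder for whichever algorithm you want to measure on your own data:

#include <chrono>
#include <iostream>

int main() {
    auto start = std::chrono::steady_clock::now();

    // ... run the algorithm under test on a realistic data set ...

    auto stop = std::chrono::steady_clock::now();
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    std::cout << "elapsed: " << ms << " ms\n";
    return 0;
}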
I/O operations (scanf(), printf()) are biasing the result.
Those operations are notoriously slow and show great discrepancies when timed. You should never measure the performance of code that includes any I/O operations, unless those operations are what you are trying to measure.
So remove those calls and try again.
I will also point out that 0.1 s is very small; the 0.1 s difference may simply be the time it takes to load the executable and prepare the code for execution.
Big-O notation isn't a formula that you can plug arbitrary values of n into. It merely describes the growth of the function as n heads to infinity.
This is a more interesting question than one might suspect. The O() concept can be useful, but it is not always as useful as some think. This is particularly true for logarithmic orders. Algebraically, a logarithm really has an order of zero, which is to say that log(n)/n^epsilon tends to zero for any positive epsilon.
More often than we like to think, the log factors in order calculations don't really matter.
However, Kendall Frey is right. For sufficiently large data sets, O(n*log(n)) will eventually lose. It's only that the data set may have to be very large for the logarithmic difference to show.
I looked at your solutions on SPOJ. I noticed that your O(n log n) solution takes 79M of memory while the O(n) one takes a very small amount, shown as 0K. I looked at the other solutions too; most of the fastest ones used a large amount of memory. Now the most obvious reason I can think of is the implementation of the std::sort() function. It is very nicely implemented, which makes your solution amazingly fast. And the O(n) solution, I think, may be slow because of the if() {...} else {...}. Try changing it to the ternary operator and let us know if it makes any difference.
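If you want to try that suggestion, a minimal sketch of the inner loop of your O(n) code rewritten with the conditional operator could look like this (it is a drop-in replacement with the same behaviour, so any speedup would come purely from how the compiler handles it):

for (int i = 1; i < n; i++) {
    count = (val[i] > val[i - 1]) ? count + 1 : 1;
    mx = (mx < count) ? count : mx;
}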
Hope it helps!
Related
I'm getting a memory limit exceeded error for this code, and I can't find a way to resolve it. If I use long long int it gives the same error.
Why is this error happening?
#include <bits/stdc++.h>
#define ll long long int
using namespace std;

int main()
{
    /// 1000000000000 500000000001: getting memory limit exceeded for this test case.
    ll n, k;
    cin >> n >> k;
    vector<ll> v;
    vector<ll> arrange;
    for (ll i = 0; i < n; i++)
    {
        v.push_back(i + 1);
    }
    // Arranging vector like 1,3,5,... 2,4,6,...
    for (ll i = 0; i < v.size(); i++)
    {
        if (v[i] % 2 != 0)
        {
            arrange.push_back(v[i]);
        }
    }
    for (ll i = 0; i < v.size(); i++)
    {
        if (v[i] % 2 == 0)
        {
            arrange.push_back(v[i]);
        }
    }
    cout << arrange[k - 1] << endl; // Found the kth number.
    return 0;
}
The provided code solves the problem for small values of n and k. However, as you noticed, it fails for large values of n. This is because you are trying to allocate a couple of vectors of 1000000000000 elements, which exceeds the amount of memory available in today's computers.
Hence I'd suggest returning to the original problem you're solving and trying an approach that doesn't need to store all the intermediate values in memory. Since the given code works for small values of n and k, you can use it to check whether the approach without vectors works.
I would suggest the following steps to redesign the approach to the coding problem:
Write down the contents of arrange for a small value of n
Write down the matching values of k for each element of arrange
Derive the (mathematical) function that maps k to the matching element in arrange
For this problem this can be done in constant time, so there is no need for loops
Note that this function should work both for even and odd values of k.
Test whether your code works by comparing it with your current results.
I would suggest trying the preceding steps first to come up with your own approach. If you cannot find a working solution, please have a look at this approach on Wandbox.
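For comparison, here is one possible constant-time sketch based on my own reading of the arrangement (odd numbers 1,3,5,... first, then even numbers 2,4,6,...); the linked approach may differ in its details:

#include <iostream>

int main() {
    long long n, k;
    std::cin >> n >> k;
    long long odds = (n + 1) / 2;                    // how many odd numbers lie in 1..n
    long long answer = (k <= odds) ? 2 * k - 1       // k-th odd number
                                   : 2 * (k - odds); // (k - odds)-th even number
    std::cout << answer << '\n';
    return 0;
}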
Assume long long int is an 8-byte type; this is a commonly valid assumption.
For every entry in the array, you are requesting to allocate 8 bytes.
If you request to allocate 1000000000000 items, you are requesting to allocate 8 terabytes of memory.
Moreover, you are using two such vectors, so you are requesting to allocate more than 8 terabytes.
Just use a lower number of items for your arrays and it will work.
What is the time complexity of inserting a string into the set container of the C++ STL?
As I understand it, it should be O(x log n), where x is the length of the string to be inserted and n is the size of the set. Also, copying the string into the set should be linear in the length of the string.
But this code of mine runs instantly.
#include <bits/stdc++.h>
using namespace std;

int main() {
    set<string> c;
    string s(100000, 'a');
    for (int i = 0; i < 100000; i++) {
        c.insert(s);
    }
}
Where am I wrong? Shouldn't the complexity be on the order of 10^10?
You should use the set in some way to reduce the risk of the loop getting optimized away, for example by adding return c.size();.
Also your choice of the number of iterations might be too low. Add a digit to the loop counter and you will see a noticeable run time.
A modern CPU can easily process >2*10^9 ops/s. Assuming your compiler uses memcmp, which is probably hand-vectorized, with a small working set such as yours you're working entirely from the cache and can reach a throughput of up to 512 bytes per comparison (with AVX2). Assuming a moderate rate of 10 cycles per iteration, we can still compare >10^10 bytes/s. So your program should run in <1 s on moderate hardware.
Try this updated code instead:
#include <string>
#include <set>
using namespace std;

int main() {
    set<string> c;
    string s(100000, 'a');
    for (int i = 0; i < 1000000; i++) { // Add a digit here
        c.insert(s);
    }
    return c.size(); // use something from the set
}
With optimization on (-O3) this takes ~5 seconds to run on my system.
In other words: yes, inserting into a binary tree has O(log n) complexity, but comparing a string has O(n) complexity. These n's aren't the same; in the case of the set it is the number of elements, and in the case of the string it is the length of the string.
In your particular case the set has just one element, so insertion is O(1). You get linear complexity O(n) purely from the string comparisons, where n is string_length * number_of_iterations.
I encountered a problem that requires the program to count the number of points within an interval. The problem provides a large number of unsorted points and values lo, hi (with lo <= hi), and it asks to count the points within [lo, hi]. The problem is that although my code is correct, it is too time-consuming to finish within the given time limit (2200 ms). My code finishes this task in O(n). I would like to ask whether there are any faster methods.
#include <iostream>
using namespace std;

int main() {
    int n, m, c, lo, hi;
    cin >> n >> m;
    int arr[n];                 // variable-length array (g++ extension)
    for (int i = 0; i < n; i++) {
        cin >> arr[i];
    }
    cin >> lo >> hi;
    c = 0;
    for (int j = 0; j < n; j++) {
        if (arr[j] <= hi && lo <= arr[j]) c++;
    }
    cout << c << endl;
}
It is impossible to solve this problem in less than O(n) time, because you must look at every input at least once.
However, you might be able to reduce the constant factor: have you considered storing a set of (start, end) intervals rather than a simple array? What is the input size that causes this to be slow?
Edit: upon further testing, it seems the bottleneck is actually the use of cin to read numbers.
Try replacing every instance of cin >> x; with scanf("%d", &x); — for me, this brings the runtime down to about 0.08 seconds.
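A minimal sketch of the reading loop rewritten with scanf (an alternative that is often just as effective is to keep cin and call std::ios::sync_with_stdio(false) once at the start of main):

#include <cstdio>
#include <vector>

int main() {
    int n, m;
    scanf("%d %d", &n, &m);
    std::vector<int> arr(n);
    for (int i = 0; i < n; i++)
        scanf("%d", &arr[i]);
    // ... read lo and hi, then count as before ...
    return 0;
}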
You can do it faster than O(N) only if you need to do lookups more than once on the same data set:
Sort the array (or a copy of it); for lookups you can then use binary search, which is O(log N) (see the sketch below).
Instead of a flat array, use something like a binary search tree; lookup complexity is the same as in #1.
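A minimal sketch of the first option, assuming the data has already been sorted once (countInRange is an illustrative name, not from the original code):

#include <algorithm>
#include <iostream>
#include <vector>

// After one O(N log N) sort, each [lo, hi] query costs only O(log N).
long long countInRange(const std::vector<int>& sorted, int lo, int hi) {
    auto first = std::lower_bound(sorted.begin(), sorted.end(), lo);
    auto last  = std::upper_bound(sorted.begin(), sorted.end(), hi);
    return last - first;
}

int main() {
    std::vector<int> a = {5, 1, 9, 3, 7};
    std::sort(a.begin(), a.end());
    std::cout << countInRange(a, 3, 7) << '\n'; // prints 3 (elements 3, 5, 7)
}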
I was wondering if there is any STL algorithm that produces the same result as the following code:
std::vector<int> data;
std::vector<int> counter(N); // I know in advance that all values in data
                             // are between 0 and N-1
for (int i = 0; i < data.size(); ++i)
    counter[data[i]]++;
This code simply outputs the histogram of my integer data, with pre-defined bin size equal to one.
I know that I should avoid loops as much as I can, as the equivalents using STL algorithms are much better optimized than what the majority of C++ programmers may come up with.
Any suggestions?
Thank you in advance, Giuseppe
Well, you can certainly at least clean up the loop a bit:
for (auto i : data)
    ++count[i];
You could (for example) use std::for_each instead:
std::for_each(data.begin(), data.end(), [&count](int i) { ++count[i]; });
...but that doesn't really look like much (if any) of an improvement to me.
I don't think there's a more efficient way of doing this. You're right about avoiding loops and preferring the STL in most cases, but this mainly applies to bigger, overly complicated loops that are harder to write and maintain, and therefore likely to be suboptimal.
Looking at the problem at the assembly level, the only way to compute this is exactly the way you have it in your example. Since C/C++ loops translate to assembly very efficiently with zero unnecessary overhead, this leaves me believing that no STL function could perform this faster than your algorithm.
There is an STL function called count, but its complexity is linear (O(n)), just like your solution's.
If you really want to squeeze the maximum out of every CPU cycle, then consider using C-style arrays and a separate counter variable. The overhead introduced by vectors is barely even measurable, but if there is any, that's the only opportunity I see for optimization here. Not that I would suggest it, but I'm afraid that's the only way you can get a hair more speed out of this.
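If you do want to try the C-style variant this answer mentions, a minimal sketch (with made-up sample data) might look like this:

#include <cstddef>
#include <vector>

int main() {
    const std::size_t N = 10;
    std::vector<int> data = {1, 3, 3, 7, 0, 1}; // sample values, all in [0, N)
    int counter[N] = {0};                       // plain C array instead of std::vector
    for (std::size_t i = 0; i < data.size(); ++i)
        ++counter[data[i]];
    return counter[3];                          // use the result so it isn't optimized away
}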
If you think about it, in order to count the occurrences of elements in a vector, each element would have to be "visited" at least once, there's no avoiding it.
A simple loop like this is already the most efficient. You can try to unroll it, but that's probably the best you can do. STL or not, I doubt if there's a better algorithm.
You can use for_each and a lambda function. Check this example:
#include <algorithm>
#include <vector>
#include <cstdlib>   // for srand() and rand()
#include <ctime>
#include <iostream>

const int N = 10;
using namespace std;

int main()
{
    srand(time(0));
    std::vector<int> counter(N);
    std::vector<int> data(N);
    generate(data.begin(), data.end(), [] { return rand() % N; });
    for (int i = 0; i < N; i++)
        cout << data[i] << endl;
    cout << endl;
    for_each(data.begin(), data.end(), [&counter](int i) { ++counter[i]; });
    for (int i = 0; i < N; i++)
        cout << counter[i] << endl;
}
I am rotating an array or vector clockwise and counterclockwise in C++. Which is the most efficient way, in terms of time complexity, to do that?
I used the rotate() function, but I want to know whether there are any faster methods than this.
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int main()
{
    vector<int> v;
    for (int i = 0; i < 5; i++)
        v.push_back(i);
    int d = 2;
    rotate(v.begin(), v.begin() + d, v.end());
    return 0;
}
rotate() is a linear time function and that is the best you can do.
However, if you need to do multiple rotates, you can accumulate.
For example, a rotation by 4 followed by a rotation by 5 is the same as a single rotation by 9.
Or in fact, in some applications, you may not even want to actually rotate.
For instance, if you want to rotate by d, you can just make a function that returns v[(i+d) % v.size()] when asked for v[i]. This is a constant-time solution. But like I said, this is application specific.
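A minimal sketch of that idea (rotatedAt is an illustrative name; it leaves the vector untouched and only translates the index):

#include <cstddef>
#include <iostream>
#include <vector>

// Access element i of v as if v had been left-rotated by d positions,
// without moving any elements.
int rotatedAt(const std::vector<int>& v, std::size_t i, std::size_t d) {
    return v[(i + d) % v.size()];
}

int main() {
    std::vector<int> v = {0, 1, 2, 3, 4};
    std::cout << rotatedAt(v, 0, 2) << '\n'; // prints 2, the new front after rotating by 2
}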
General answer for the "can I make XY faster?" kind of question:
Maybe you can. But probably you shouldn't.
std::rotate is designed to be efficient for the average case. That means, if you have a very specific case, it might be possible to have a more performant implementation for that case.
BUT:
Don't bother to search for a more performant implementation for your specific case, because finding that specific implementation will require you to know about the detailed performance of the steps you take and about the optimizations your compiler will perform.
Don't bother to implement it, because you will have to test it and still your coverage of corner cases won't be as good as the tests already performed with the standard library implementation.
Don't use it, because someone will be irritated, wondering why you rolled your own implementation and didn't just use the standard library. And someone else will use it for a case where it is not as performant as the standard implementation.
Don't invest time to improve the performance of a clear piece of code, unless you are 100% sure that it is a performance bottleneck. 100% sure means, you have used a profiler and pinpointed the exact location of the bottleneck.
#include <iostream>
using namespace std;

// Left-rotate the array by one position.
void rotatebyone(int arr[], int n) {
    int temp = arr[0];
    for (int i = 0; i < n - 1; i++)
        arr[i] = arr[i + 1];   // shift every element one place to the left
    arr[n - 1] = temp;         // old first element goes to the end
}

int main()
{
    int arr[] = {2, 3, 4, 5, 6, 7, 8};
    int m = sizeof(arr) / sizeof(arr[0]);
    int d = 1;
    for (int i = 0; i < d; i++) { // apply the single rotation d times
        rotatebyone(arr, m);
    }
    for (int i = 0; i < m; i++) {
        cout << arr[i] << " ";
    }
    return 0;
}