I'm preparing a C++ project in which I have to estimate the big-O complexity of many algorithms and compare it with the theoretical value on a graph. I made a timing function that measures the execution time of an algorithm, but I haven't found a way to estimate the complexity and draw the curve using the time T and the input size N.
Any ideas ?
In a nutshell: if you have defined the theoretical complexity T(n), all you have to do is execute the test x times for a given range of n: n1, ..., nx and measure the time of each test. Then you choose the median nm from your set n1, ..., nx and compute the coefficient c, defined as c = t(nm)/T(nm), where t(nm) is the measured time for the median nm and T(nm) is the theoretical complexity calculated for nm.
Next, for each of your n, compute the coefficient q, the coefficient of consistency between the theoretical and experimental complexity of your algorithm: q(n) = t(n) / (c * T(n)).
Finally you may draw a plot of q(n), which is the asymptote graph, and it should converge asymptotically to 1. If your graph asymptotically stays below 1, the theoretical complexity is overestimated; if it stays above 1, the complexity is underestimated.
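A minimal sketch of this procedure, assuming for illustration that the algorithm under test is an O(n log n) sort and that a single run per input size is enough; runAlgorithm and T are placeholders to replace with your own:
#include <algorithm>
#include <chrono>
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

// Placeholder algorithm under test: sorting n random values, an O(n log n) task.
void runAlgorithm(std::size_t n) {
    static std::mt19937 gen(42);
    std::vector<int> v(n);
    for (int& x : v) x = static_cast<int>(gen());
    std::sort(v.begin(), v.end());
}

// Theoretical complexity T(n); n log n is assumed here to match the placeholder.
double T(std::size_t n) { return n * std::log2(static_cast<double>(n)); }

// Measured time t(n) of a single run, in microseconds.
double t(std::size_t n) {
    auto start = std::chrono::high_resolution_clock::now();
    runAlgorithm(n);
    auto end = std::chrono::high_resolution_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}

int main() {
    std::vector<std::size_t> sizes = {10000, 20000, 30000, 40000, 50000,
                                      60000, 70000, 80000, 90000, 100000};
    std::vector<double> times;
    for (std::size_t n : sizes) times.push_back(t(n));

    // c = t(nm) / T(nm), with nm taken as the middle (median) input size.
    std::size_t m = sizes.size() / 2;
    double c = times[m] / T(sizes[m]);

    // q(n) = t(n) / (c * T(n)); it should hover around 1 if T(n) is the right model.
    for (std::size_t i = 0; i < sizes.size(); ++i)
        std::printf("%zu\t%f\n", sizes[i], times[i] / (c * T(sizes[i])));
}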
Related
I got the task of showing the time taken by the merge sort algorithm theoretically (n log(n)) and practically (by program) on a graph, using different values of n and the time taken.
In the program, I'm printing the time difference between just before calling the function and just after the function returns, in microseconds. I want to know what n log(n) means.
I tried with these values:
n (number of values)   program time (microseconds)   time using n log n formula
10000                  12964                          132877
20000                  24961                          285754
30000                  35905                          446180
40000                  47870                          611508
50000                  88764                          780482
60000                  67848                          952360
70000                  81782                          1.12665e+006
80000                  97739                          1.30302e+006
90000                  111702                         1.48119e+006
100000                 119682                         1.66096e+006
code:
// Time the call with a high-resolution clock
auto start = std::chrono::high_resolution_clock::now();
mergeSort(arr, 0, n - 1);
auto elapsed = std::chrono::high_resolution_clock::now() - start;
// Convert the elapsed duration to whole microseconds
long long microseconds = std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
cout << microseconds << " ";
Graph I got:
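As a side note, the "time using n log n formula" column above appears to be n * log2(n) (for example, 10000 * log2(10000) ≈ 132877). A minimal sketch that reproduces that column:
#include <cmath>
#include <cstdio>

int main() {
    // Reproduce the theoretical column: n * log2(n) for n = 10000 .. 100000
    for (int n = 10000; n <= 100000; n += 10000)
        std::printf("%g ", n * std::log2(static_cast<double>(n)));
    std::printf("\n");
}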
What does time complexity actually mean?
I interpret your question in the following way:
Why is the actual time needed by the program not K*n*log(n) microseconds?
The answer is: Because on modern computers, the same step (such as comparing two numbers) does not need the same time if it is executed multiple times.
If you look at the time needed for 50,000 and 60,000 numbers, you can see that sorting 50,000 numbers actually took more time than sorting 60,000 numbers.
The reason might be some interrupt that occurred while the 50,000 numbers were being sorted; I assume that you'll get a time between the 40,000 and the 60,000 measurements if you run your program a second time.
In other words: External influences (like interrupts) have more impact on the time needed by your program than the program itself.
I got the task of showing the time taken by the merge sort algorithm theoretically (n log(n)) and practically (by program) on a graph by using different values of n and time taken.
I'd take a number of elements to be sorted that takes about one second. Let's say sorting 3 million numbers takes one second; then I would sort 3, 6, 9, 12, ... and 30 million numbers and measure the time.
This reduces the influence of interrupts etc. on the measurement. However, you'll still have some effect of the memory cache in this case.
You can use your existing measurements (especially the 50,000 and the 60,000) to show that for a small number of elements to be sorted, there are other factors that influence the run time.
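A sketch of that idea, repeating each measurement several times per size and keeping the smallest (least disturbed) time; std::sort stands in for mergeSort here, and the run count of 5 is an arbitrary assumption:
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <random>
#include <vector>

// Time one sort of n random values, in microseconds; std::sort is a stand-in for mergeSort.
long long timeOneSort(std::size_t n) {
    static std::mt19937 gen(123);
    std::vector<int> arr(n);
    for (int& x : arr) x = static_cast<int>(gen());
    auto start = std::chrono::high_resolution_clock::now();
    std::sort(arr.begin(), arr.end());
    auto elapsed = std::chrono::high_resolution_clock::now() - start;
    return std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
}

int main() {
    const int runs = 5;  // assumption: 5 repetitions per size
    for (std::size_t n = 3'000'000; n <= 30'000'000; n += 3'000'000) {
        long long best = -1;
        for (int r = 0; r < runs; ++r) {
            long long t = timeOneSort(n);
            if (best < 0 || t < best) best = t;  // keep the least disturbed run
        }
        std::printf("%zu %lld\n", n, best);
    }
}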
Note that a graph of y = x log(x) is surprisingly close to a straight line.
This is because the gradient at any point x is 1 + log(x), which is a slowly growing function of x.
In other words, it's difficult within the bounds of experimental error to distinguish between O(N) and O(N log N).
The fact that the blue line is pretty straight is a reasonable verification that the algorithm is not O(N * N), but really without better statistical analysis and program control set-up, one can't say much else.
The difference between the red and blue line is down to "big O" not concerning itself with proportionality constants and other coefficients.
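One simple check, using the measured values from the question: compute t(n) / (n log2(n)) for each n and see whether it is roughly constant, as it should be for an O(n log n) algorithm (t(n) / n, by contrast, would drift slowly upward):
#include <cmath>
#include <cstdio>

int main() {
    // Measured times (microseconds) from the question, per input size n
    const int n[] = {10000, 20000, 30000, 40000, 50000,
                     60000, 70000, 80000, 90000, 100000};
    const double t[] = {12964, 24961, 35905, 47870, 88764,
                        67848, 81782, 97739, 111702, 119682};
    for (int i = 0; i < 10; ++i) {
        double nlogn = n[i] * std::log2(static_cast<double>(n[i]));
        // A roughly constant ratio is consistent with O(n log n)
        std::printf("n=%6d  t/(n log2 n)=%.4f  t/n=%.4f\n",
                    n[i], t[i] / nlogn, t[i] / n[i]);
    }
}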
The time complexity is the time a program takes to execute, as a function of the problem size.
The problem size is usually expressed as the number of input elements, but some other measures can sometimes be used (e.g. algorithms on matrices of size NxN can be rated in terms of N instead of N²).
The time can effectively be measured in units of time (seconds), but it is often assessed by just counting the number of atomic operations of some kind performed (e.g. the number of comparisons or array accesses).
In fact, for theoretical studies, the exact time is not relevant information because it is not "portable": it strongly depends on the performance of the computer used and also on implementation details.
This is why algorithmicians do not really care about exact figures, but rather about how the time varies with increasing problem sizes. This leads to the concept of asymptotic complexity, which measures the running time up to an unknown proportionality factor, and for mathematical convenience, an approximation of the running time is often used to make the computations tractable.
If you study the complexity by pure benchmarking (timing), you can obtain experimental points, which you could call empirical complexity. But some statistical rigor should be applied.
(Some of the other answers do merge the concepts of complexity and asymptotic complexity, but this is not correct.)
In this discussion of complexity, you can replace time with space and study the memory footprint of the program.
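As an illustration of counting atomic operations instead of seconds, here is a minimal sketch that counts the comparisons performed by a merge sort; treating a comparison as the atomic operation is just one possible choice:
#include <cstdio>
#include <vector>

static long long comparisons = 0;  // the chosen "atomic operation" counter

// Top-down merge sort on the half-open range [lo, hi) that counts element comparisons.
void mergeSort(std::vector<int>& a, int lo, int hi) {
    if (hi - lo < 2) return;
    int mid = (lo + hi) / 2;
    mergeSort(a, lo, mid);
    mergeSort(a, mid, hi);
    std::vector<int> merged;
    merged.reserve(hi - lo);
    int i = lo, j = mid;
    while (i < mid && j < hi) {
        ++comparisons;  // count one comparison
        merged.push_back(a[i] <= a[j] ? a[i++] : a[j++]);
    }
    while (i < mid) merged.push_back(a[i++]);
    while (j < hi) merged.push_back(a[j++]);
    for (std::size_t k = 0; k < merged.size(); ++k) a[lo + k] = merged[k];
}

int main() {
    for (int n = 10000; n <= 100000; n += 10000) {
        comparisons = 0;
        std::vector<int> v(n);
        for (int k = 0; k < n; ++k) v[k] = static_cast<int>((k * 2654435761u) % n);  // scrambled input
        mergeSort(v, 0, n);
        std::printf("n=%6d  comparisons=%lld\n", n, comparisons);
    }
}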
Time complexity has nothing to do with actual time.
It's just a way that helps us to compare different algorithms - which algorithm will run faster.
For example -
In the case of sorting: bubble sort has time complexity O(n^2) and merge sort has time complexity O(n log n). So, with the help of time complexity, we can say that merge sort is much better than bubble sort for sorting things.
Big-O notation was created so that we can have a generalized, machine-independent way of comparing the speed of different algorithms.
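A quick illustration of why that matters: the sketch below just prints n^2 next to n * log2(n) for a few (arbitrarily chosen) sizes, with no machine-dependent timing involved:
#include <cmath>
#include <cstdio>

int main() {
    // Growth of the two operation counts; no timing, so nothing machine-dependent.
    const double sizes[] = {1e3, 1e6, 1e9};
    for (double n : sizes)
        std::printf("n=%.0e  n^2=%.2e  n*log2(n)=%.2e\n", n, n * n, n * std::log2(n));
}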
Can we compare O(m+n) with O(n)? Are both the same, because we need to focus only on the highest-order term?
The complexity of both O(m+n) and O(n) is linear in relation to the input n. In relation to m, the complexity of O(m+n) is linear while O(n) is constant.
So, unless we analyse only the input n and assume m to be constant, we cannot in general simplify O(m+n) to O(n).
Sometimes we may be able to combine two input dimensions into one: For example, if m is number of input strings and n is the maximum length of input string, then we might reframe the premise by analysing complexity in relation to total length of all input strings.
O(m+n) is two-dimensional (it has two parameters, m and n), and you can't reduce it to one dimension without more information about the relationship between m and n.
A concrete example: Many graph algorithms (e.g. depth first search, topological sort) have time complexity O(v + e), where v is the number of vertices and e is the number of edges. You can consider two separate types of graph:
In a dense graph with lots of edges, e is proportional to v². The time complexity of the algorithm on this type of graph is O(v + v²), or O(v²).
In a sparse graph with few edges, e is proportional to v. The time complexity of the algorithm on this type of graph is O(v + v), or O(v).
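A sketch of where the O(v + e) bound comes from, using an iterative depth-first search over an adjacency list; the tiny example graph is only for illustration:
#include <cstdio>
#include <stack>
#include <vector>

// Iterative DFS: every vertex is pushed at most once (O(v)) and every
// adjacency entry is scanned once (O(e)), hence O(v + e) overall.
void dfs(const std::vector<std::vector<int>>& adj, int source) {
    std::vector<bool> visited(adj.size(), false);
    std::stack<int> todo;
    todo.push(source);
    visited[source] = true;
    while (!todo.empty()) {
        int u = todo.top();
        todo.pop();
        std::printf("%d ", u);
        for (int v : adj[u]) {
            if (!visited[v]) {
                visited[v] = true;
                todo.push(v);
            }
        }
    }
}

int main() {
    // Small example graph: 5 vertices, undirected, stored as adjacency lists
    std::vector<std::vector<int>> adj = {{1, 2}, {0, 3}, {0, 3}, {1, 2, 4}, {3}};
    dfs(adj, 0);
    std::printf("\n");
}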
So I've been attempting to analyze a specialized variant of Dijkstra's algorithm that I've been working on. I'm after the worst-case complexity.
The algorithm uses a Fibonacci heap, which in the case of normal Dijkstra would give a running time of O(E + V log V).
However, this implementation needs to do a lookup in the inner loop where we update neighbours. The lookup executes once for every edge and takes logarithmic time, since it searches a data structure that contains all edges. Also, the graph has the restriction that no node has more than 4 neighbours.
O(V log V) is the complexity of the outer loop, but I'm not sure what the worst case is for the inner loop. My thinking is that each edge in the graph is checked, giving O(E) lookups of logarithmic time each, so it should be E log E, which should exceed V log V and give O(E log E) complexity for the algorithm.
Any insight would be awesome!
The amortized complexity of Decrease-Key on a Fibonacci heap is O(1); that is to say, if you have |E| such operations on the Fibonacci heap, their total cost will be O(E). You also have |V| Extract-Min operations, which cost O(log V) each. So the total cost is O(E + V log V).
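Written out as a sketch of that accounting (for the standard version, without the extra per-edge lookup from the question):
\underbrace{|E| \cdot O(1)}_{\text{Decrease-Key, amortized}} + \underbrace{|V| \cdot O(\log V)}_{\text{Extract-Min}} = O(E + V \log V)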
I'm trying to implement the quadratic sieve, and I noticed I need to choose a smoothness bound B to use this algorithm. I found on the web that B also stands for exp((1/2 + o(1))(log n log log n)^(1/2)), but now my problem is o(1). Could you tell me what o(1) stands for?
Let's start with your answer:
The definition of f(n) being o(1) is that lim_{n→∞} f(n) = 0. That means that for all ε > 0 there exists N_ε, depending on ε, such that for all n ≥ N_ε we have |f(n)| ≤ ε.
Or in plain English:
The notation o(1) means "a function that converges to 0."
This is a fantastic resource: http://bigocheatsheet.com
Look at the Notation for asymptotic growth section
The answer can also be found in this duplicate post: Difference between Big-O and Little-O Notation
f ∈ O(g) says, essentially
For at least one choice of a constant k > 0, you can find a constant a such that the inequality f(x) < k g(x) holds for all x > a.
Note that O(g) is the set of all functions for which this condition holds.
f ∈ o(g) says, essentially
For every choice of a constant k > 0, you can find a constant a such that the inequality f(x) < k g(x) holds for all x > a.
O(1) means it takes constant time, unaffected by input size.
o(1) (slightly different!) means the function it represents converges to 0.
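A small illustrative example of the difference (my own example, not from the original question):
f(n) = \frac{1}{\log n} \in o(1) \ \text{(it tends to 0)}, \qquad g(n) = 5 + \frac{1}{n} \in O(1) \ \text{but} \ \notin o(1) \ \text{(bounded, but it tends to 5, not 0)}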
I wouldn't worry too much about the smoothness bound. Write the rest of the much more complicated algorithm first, using a very simple smoothness formula (the first 100,000 primes, or the first n primes where n = c * log(number)). Once the rest of the algorithm is working (and perhaps optimized?), choosing the smoothness bound carefully will actually have a significant effect. That long, complicated formula you gave in the question is the approximate (asymptotic) running time of the quadratic sieve algorithm itself; I'm pretty sure it is unrelated to choosing the smoothness bound.
Has anyone used the Boolean functions of the Boost polygon library?
Boost polygon library
It says that the algorithm is O(n log n) in time complexity, where n = #points.
I input 200000 randomly generated polygons (with 5~8 points each),
but the OR and XOR functions took about half an hour (yes, just calling the function).
The result is correct, but the time consumed is horrible.
Has anyone met this problem?
Although it would always help to post the code that exhibits the described behavior, I assume that each of the i = 1..n polygons has some (unique) crossings with each of the previous i-1 polygons. That implies that the number of points that result from XOR'ing the first n-1 polygons is quadratic in n, so you are effectively requesting an operation of O(#Points * log(#Points)) where #Points is O(n^2); thus the total complexity would be O(n^2 * log(n)).
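A sketch of that accounting, under the (assumed) sweep-line-style bound in which the cost is driven by the number of intersection points P:
P = O(n^2) \ \text{(pairwise crossings)}, \qquad O(P \log P) = O(n^2 \log n^2) = O(n^2 \log n)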