I have the following two pieces of code:
int i=0;
while(i<=1000000000 && i!=-1) {
i++;
}
I think its run-time cost is about 4 billion operations, since in the while condition there are 3 operations: (i<=1000000000), (i!=-1) and &&,
and
int i=0;
while(i!=-1) {
if(i>=1000000000) break;
i++;
}
which I think has a run-time cost of about 3 billion operations, since in the while condition there is 1 operation (i!=-1) and in the if there is 1 operation (i>=1000000000).
But when I run them, the two pieces of code take the same time. Why is that?
I then changed the two pieces of code a little, as follows:
int n = 1000000000;
int i=0;
while(i<=n && i!=-1) {
i++;
}
int n = 1000000000;
int i=0;
while(i!=-1) {
if(i>=n) break;
i++;
}
This time the 3rd code block runs in 2.6 s and the 4th in 3.1 s.
Why did this happen?
What is the time complexity of the four pieces of code?
I use the Dev-C++ IDE.
Time complexity and actual running time are two very different things.
Time complexity only has meaning when we are talking about variable input size. It tells how well an algorithm scales for larger inputs. If we assume that your input is n (or 1000000000 in the first two cases), then all your examples have linear time complexity. Roughly, it means that if you make n twice as large, the running time also doubles.
Actual running time depends on complexity to some extent, but you can't reliably calculate it from complexity alone. The reasons include compiler optimizations, CPU optimizations, OS thread management and many others.
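To see the scaling claim concretely, here is a minimal timing sketch of my own (using std::chrono rather than the OP's Dev-C++ setup; the volatile is only there to stop the compiler from deleting the loop). Assuming nothing else interferes, the measured times should grow roughly linearly with n:

#include <chrono>
#include <iostream>

int main() {
    using namespace std::chrono;
    // Time the first loop for a few sizes; expect roughly linear growth in n.
    for (long long n : {250000000LL, 500000000LL, 1000000000LL}) {
        volatile long long i = 0;   // volatile: keep the compiler from removing the loop
        auto start = steady_clock::now();
        while (i <= n && i != -1) {
            i = i + 1;              // spelled out because ++ on a volatile is deprecated in newer C++
        }
        auto stop = steady_clock::now();
        std::cout << "n = " << n << ": "
                  << duration_cast<milliseconds>(stop - start).count() << " ms\n";
    }
}

The absolute numbers will still swing widely with the compiler and optimization flags, which is exactly why actual running time is hard to predict from complexity alone.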
I think by 'time complexity' you mean the number of primitive operations the computer has to execute. In that case, there is no difference between
while(i<=1000000000 && i!=-1)
and
while(i!=-1) {
if(i>=1000000000) break;
because the operator && is most likely implemented not as 'take the first operand, take the second operand, and perform some operation on them', but as a sequence of conditional jumps:
if not FirstCondition then goto FalseBranch
if not SecondCondition then goto FalseBranch
TrueBranch:
... here is your loop body
FalseBranch:
... here is the code after loop
And that's exactly what you did by hand in the second example.
However, this only makes sense for a specific compiler and specific optimization settings (in a release build your loop will be eliminated entirely by any decent compiler).
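Put back into C++ terms, the short-circuit evaluation described above means the first loop can be read as the following source-level equivalent (a sketch of what && effectively does, not actual compiler output):

// Equivalent form of: while (i <= 1000000000 && i != -1) { i++; }
// The second condition is only evaluated when the first one holds, which is
// the same chain of conditional jumps as the hand-written if/break version.
int i = 0;
while (true) {
    if (!(i <= 1000000000)) break;   // first condition fails -> leave the loop
    if (!(i != -1)) break;           // second condition fails -> leave the loop
    i++;                             // loop body
}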
I think the time complexity of the code below should be O(1), as the worst case can be log base 2 of 1000, or something definite anyway. But I am not sure, since its time does vary with the input, and the given answer is O(n); I am very confused about how they got that. If we increase n, the function gets called fewer times, so is it O(1/n)? Is that even possible?
#define LIMIT 1000
void fun2(int n)
{
    if (n <= 0)
        return;
    if (n > LIMIT)
        return;
    cout << n << " ";
    fun2(2*n);
    cout << n << " ";
}
#define LIMIT 1000, along with the base case of if (n > LIMIT) return;, guarantees O(1) because it puts a ceiling on the number of recursive calls the function can make.
Even if this was #define LIMIT 10e50, it'd still be O(1).
Recall that Big O is concerned with theoretical growth, not with how much work is to be done in practice. If you have a cap on how much the function can grow, regardless of how large that cap may be, it's a constant time operation from a complexity perspective.
Is Big O necessarily a realistic reflection of the work the algorithm does? No. Big O is a scalability heuristic, not the final word on efficiency. All O(1) says here is that once n > LIMIT, you can increase n indefinitely with no additional cost. In the real world, constant factors often matter.
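To make that ceiling concrete, here is a hypothetical instrumented variant (countedFun2 and callCount are my own names, added purely for illustration) that just counts the calls; even the deepest case, starting from n = 1, makes only about log2(LIMIT) calls:

#include <iostream>
using namespace std;

#define LIMIT 1000
static int callCount = 0;        // illustration-only counter

void countedFun2(int n)
{
    ++callCount;                 // count every invocation
    if (n <= 0)
        return;
    if (n > LIMIT)
        return;
    countedFun2(2 * n);          // same doubling recursion as fun2, printing omitted
}

int main()
{
    countedFun2(1);
    cout << "calls for n=1: " << callCount << endl;      // 11 calls: n = 1, 2, 4, ..., 1024
    callCount = 0;
    countedFun2(900);
    cout << "calls for n=900: " << callCount << endl;    // 2 calls: n = 900, then 1800
}

However LIMIT is chosen, that count is a fixed number that stops depending on the caller's n, which is all the O(1) claim says.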
To respond to the individual points you've raised:
I think the time complexity of the code below should be O(1), as the worst case can be log base 2 of 1000, or something definite.
Yep, that's exactly right!
But I am not sure, since its time does vary with the input
You are correct that the runtime varies with the input size. However, that does not necessarily mean that the runtime is not O(1). If an algorithm's runtime is always bounded from above by some constant, regardless of what the input size is, then its runtime is O(1). Stated differently, an O(1) runtime means "without even looking at your input, I can bound how long the algorithm is going to take to run." (Technically that isn't 100% accurate, since big-O notation talks about what happens for sufficiently large inputs, but it's true in this case.)
Here's another example of this:
void sillyFunction(int n) {
for (int i = 0; i < 137 && i < n; i++) {
cout << '*' << endl;
}
}
We can guarantee that the loop will run at most 137 times regardless of what n is. However, for small values of n, we may do less work than this. But the runtime here is still O(1), since we have that bound of "at most 137 iterations."
Here's another example:
void amusingFunction(int n) {
    for (int i = 137; i >= 0 && i >= n; i--) {
        cout << '*' << endl;
    }
}
Again, this loop is guaranteed to run at most 138 times (i counts down from 137 to 0), no matter what n is. Here, though, the work decreases as we increase n, to the point where the loop never runs once n exceeds 137. But since we can bound the total number of loop iterations by a constant without even looking at n, the runtime is O(1).
Here's a trickier example:
void deviousFunction(int n) {
if (n <= 137) {
while (true) { // infinite loop!
cout << '*';
}
}
cout << "Yup." << endl;
}
This function will go into an infinite loop for any n ≤ 137. However, for sufficiently large values of n (namely, when n > 137), the algorithm always terminates immediately. This algorithm therefore has a runtime of O(1): there's a fixed amount of work such that, for any sufficiently large n, the algorithm does at most that much work. (This is highly contrived and I've never seen anything like it in practice, but you get the picture.)
and the given answer is O(n); I am very confused about how they got that.
The runtime bound here of O(n) to me seems incorrect. It's technically not wrong to say the runtime is O(n) because that does provide a correct bound on the runtime, but it's not tight. You should ask whoever gave you this bound to explain their reasoning; perhaps there's a typo in the code or in the explanation?
If we increase n, the function gets called fewer times, so is it O(1/n)? Is that even possible?
As n increases, the number of calls made is nonincreasing, but it doesn't necessarily decrease. For example, fun2(2000) and fun2(10000000) each result in a total of one call being made.
It's not possible for an algorithm to have a runtime of O(1 / n) because all algorithms do at least a constant amount of work, even if that work is "set up the stack frame." A runtime bound of O(1 / n) means that, for sufficiently large n, you would be doing less than one unit of work. So in that sense, there's a difference between "the runtime drops as n gets bigger, to the point where it flattens out at a constant" and "the runtime is O(1 / n)."
I am working on an assignment for school. Essentially, we are analyzing sorting algorithms and their costs on large sets of numbers. We have a best case (already in order), a worst case (reverse order), and an average case (random order). However, for almost all of my sorting algorithms, sorting the worst case takes less time than the average case. After some reading, it definitely seems like branch prediction is causing this: it recognizes the pattern (decreasing order) and executes the code quicker than the big-O analysis alone would suggest.
I've done some research on branch prediction, and while there appears to be ways to optimize it to be faster, I can't find anything on disabling it entirely. Is there a G++ flag I can use? Or a terminal command?
This is an example of my bubble sort algorithm:
void bubble(vector<long> &vector) {
for (int i = 0; i < vector.size() - 1; i++){
for (int j = 0; j < vector.size() - i - 1; j++) {
if (vector[j] > vector[j + 1]) {
long tmp = vector[j];
vector[j] = vector[j+1];
vector[j+1] = tmp;
}
}
}
}
My timing for the average case is almost double that of the worst case.
Big-O notation is all about asymptotic behavior. In other words, it describes what factors become dominant as the problem size gets bigger.
CPU micro-optimizations like prefetching and branch prediction can have large relative effects at smaller sizes. But the nature of an O(n^2) procedure, relative to an O(n) procedure, is that it will become slower once the problem size is big enough.
So don't bother speculating or worrying about the effects of branch prediction. Just make your array bigger. Try sorting an array of 1 million elements. Or 1 billion. I guarantee you: if you're right about one situation being the worst case, it will get slower than the best case. (Hint: you're not right about that.)
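If it helps, here is a rough timing harness along those lines (my own sketch, assuming a C++11 compiler; the bubble sort is a re-typed copy of the one in the question with size_t indices and std::swap, behaviour unchanged). 20000 elements already takes noticeable time for an O(n^2) sort, and you can raise it further:

#include <algorithm>
#include <chrono>
#include <iostream>
#include <random>
#include <vector>

// Same bubble sort as in the question, parameter renamed for clarity.
void bubble(std::vector<long> &v) {
    for (std::size_t i = 0; i + 1 < v.size(); i++) {
        for (std::size_t j = 0; j + i + 1 < v.size(); j++) {
            if (v[j] > v[j + 1])
                std::swap(v[j], v[j + 1]);
        }
    }
}

static double timeSort(std::vector<long> v) {             // takes a copy on purpose: keep the inputs intact
    auto start = std::chrono::steady_clock::now();
    bubble(v);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

int main() {
    const long n = 20000;                                  // raise this to see the O(n^2) growth dominate
    std::vector<long> sorted, reversed, shuffled;
    for (long i = 0; i < n; ++i) {
        sorted.push_back(i);
        reversed.push_back(n - i);
        shuffled.push_back(i);
    }
    std::shuffle(shuffled.begin(), shuffled.end(), std::mt19937(42));

    std::cout << "sorted (best):      " << timeSort(sorted)   << " s\n";
    std::cout << "reversed (\"worst\"): " << timeSort(reversed) << " s\n";
    std::cout << "shuffled (average): " << timeSort(shuffled)  << " s\n";
}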
Both of these algorithms give the same output, but the first one takes nearly double the time (>0.67 s) compared to the second one (0.36 s). How is this possible? Can you tell me the time complexity of both algorithms? If they're the same, why is the running time different?
1st algorithm:
for (int i = 0; i < n; i++) {
    cin >> p[i];
    if (i > 0) {
        if (p[i-1] > p[i]) {
            cout << p[i] << " ";
        }
        else {
            cout << "-1" << " ";
        }
    }
}
2nd algorithm:
for (int i = 0; i < n; i++) {
    cin >> p[i];
}
for (int i = 0; i < n-1; i++) {
    if (p[i] > p[i+1]) {
        cout << p[i] << " ";
    }
    else {
        cout << "-1" << " ";
    }
}
Time complexity in a modern processor can be an almost-useless performance statistic.
In this case we have one algorithm that goes from 0 to n-1, which is O(N), and a second that goes from 0 to n-1 twice; the constant drops out, so it's still O(N). The first algorithm has an extra if statement that will be false exactly once, and a decent compiler will obliterate that. We wind up with the same amount of input, the same amount of output, the same number of array accesses (more or less) and the same number of if (a>b) comparisons.
What the second has that the first doesn't is determinism. One loop determines everything for the second: all of the input is read in the first loop. That means the CPU can see exactly what is going to happen ahead of time, because it has all of the numbers. It knows exactly which way every branch of the if will go, can predict with 100% accuracy, load up the caches, and fill the pipelines so everything is ready ahead of time without missing a beat.
Algorithm 1 can't do that, because the next input is not known until the next iteration of the loop. Unless the input pattern is predictable, it's going to guess which way if (p[i-1] > p[i]) goes, and get it wrong a lot of the time.
Additional reading: Why is it faster to process a sorted array than an unsorted array?
At my university we are learning Big O notation. However, one question that I have about Big O notation is: how do you convert a simple computer algorithm, say a linear search, into a mathematical function, for example 2n^2 + 1?
Here is a simple and non-robust linear search that I have written in C++11. Note: I have left out the header files (iostream) and function parameters just for simplicity. I will just be using basic operators, loops, and data types in order to show the algorithm.
int array[5] = {1,2,3,4,5};
// Variable to hold the value we are searching for
int searchValue;
// Ask the user to enter a search value
cout << "Enter a search value: ";
cin >> searchValue;
// Create a loop to traverse through each element of the array and find
// the search value
for (int i = 0; i < 5; i++)
{
    if (searchValue == array[i])
    {
        cout << "Search Value Found!" << endl;
    }
    else
        // If S.V. not found then print out a message
        cout << "Sorry... Search Value not found" << endl;
}
In conclusion, how do you translate an algorithm into a mathematical function so that we can analyze how efficient it really is using Big O notation? Thanks world.
First, be aware that it's not always possible to analyze the time complexity of an algorithm; there are some whose complexity we do not know, so we have to rely on experimental data.
All of the methods come down to counting the number of operations done. So first, we have to define the cost of basic operations like assignment, memory allocation, and control structures (if, else, for, ...). Here are the costs I will use (working with a different model can give different values):
Assignment takes constant time (e.g. int i = 0;)
Basic arithmetic operations take constant time (+ - * /)
Memory allocation is proportional to the memory allocated: allocating an array of n elements takes linear time.
Conditions take constant time (if, else, else if)
Loops take time proportional to the number of times the body is run.
Basic analysis
The basic analysis of a piece of code is: count the number of operations on each line, sum those costs, done.
int i = 1;
i = i*2;
System.out.println(i);
Here, there is one operation on line 1, one on line 2 and one on line 3. Those operations take constant time: this is O(1).
for(int i = 0; i < N; i++) {
System.out.println(i);
}
For a loop, count the number of operations inside the loop and multiply by the number of times the loop is run. There is one operation inside, which takes constant time. It is run n times -> complexity is n * 1 -> O(n).
for (int i = 0; i < N; i++) {
for (int j = i; j < N; j++) {
System.out.println(i+j);
}
}
This one is trickier because the inner loop starts its iteration at i. Line 3 does 2 operations (addition + print), both of which take constant time, so it takes constant time. Now, how many times line 3 is run depends on the value of i. Enumerate the cases:
When i = 0, j goes from 0 to N-1, so line 3 is run N times.
When i = 1, j goes from 1 to N-1, so line 3 is run N-1 times.
...
Now, summing all of this, we have to evaluate N + (N-1) + (N-2) + ... + 2 + 1. The sum is N*(N+1)/2, which is quadratic, so the complexity is O(n^2).
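As a quick sanity check on that counting argument (a throwaway sketch of mine, not part of the analysis itself), you can count the executions of the inner statement directly and compare against N*(N+1)/2:

#include <iostream>

int main() {
    const long long N = 1000;
    long long count = 0;
    for (long long i = 0; i < N; i++) {
        for (long long j = i; j < N; j++) {
            ++count;                       // stands in for the constant-time line 3
        }
    }
    // Both values print as 500500 for N = 1000.
    std::cout << count << " vs " << N * (N + 1) / 2 << std::endl;
}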
And that's how it works for many cases: count the number of operations, sum all of them, get the result.
Amortized time
An important notion in complexity theory is amortized time. Let's take this example: running operation() n times:
for (int i = 0; i < N; i++) {
operation();
}
If one says that operation takes amortized constant time, it means that running n operations takes linear time in total, even though one particular operation may have taken linear time on its own.
Imagine you have an empty array with room for 1000 elements. Now insert 1000 elements into it. Easy as pie, every insertion takes constant time. Now insert one more element. For that, you have to create a new, bigger array, copy the data from the old array into the new one, and then insert element 1001. The first 1000 insertions took constant time; the last one took linear time. In this case, we say that all insertions take amortized constant time, because the cost of that last insertion was amortized over the others.
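Here is a minimal sketch of that idea (a toy growable array, not a production container); doubling the capacity whenever it runs out is the usual policy that makes the occasional linear-time copy cheap on average:

#include <cstddef>
#include <iostream>

// Toy dynamic array: push_back is amortized O(1) because the capacity doubles
// each time it runs out, so a copy of size k only happens after roughly k cheap pushes.
struct GrowableArray {
    int *data = nullptr;
    std::size_t size = 0;
    std::size_t capacity = 0;

    void push_back(int value) {
        if (size == capacity) {                          // the rare, expensive case
            std::size_t newCap = capacity == 0 ? 1 : capacity * 2;
            int *bigger = new int[newCap];
            for (std::size_t i = 0; i < size; ++i)
                bigger[i] = data[i];                     // linear-time copy
            delete[] data;
            data = bigger;
            capacity = newCap;
        }
        data[size++] = value;                            // the common, constant-time case
    }

    ~GrowableArray() { delete[] data; }
};

int main() {
    GrowableArray a;
    for (int i = 0; i < 1001; ++i)                       // a few of these pushes trigger a copy
        a.push_back(i);
    std::cout << a.size << " elements, capacity " << a.capacity << std::endl;   // 1001 and 1024
}

The doubling is the key design choice: the expensive copies are rare enough that the average cost per insertion stays constant.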
Make assumptions
In some other cases, counting the operations requires making assumptions. A perfect example of this is insertion sort, because it is simple and its running time depends on how the data is ordered.
First, we have to make some more assumptions. Sorting involves two elementary operations: comparing two elements and swapping two elements. Here I will consider both of them to take constant time. Here is the algorithm, where we want to sort array a:
for (int i = 0; i < a.length; i++) {
    int j = i;
    while (j > 0 && a[j] < a[j-1]) {
        swap(a, j, j-1);
        j--;
    }
}
The first loop is easy: no matter what happens inside, it will run n times, so the running time of the algorithm is at least linear. Now, to evaluate the inner loop, we have to make assumptions about how the array is ordered. Usually, we try to determine the best-case, worst-case and average-case running times.
Best case: we never enter the while loop. Is this possible? Yes. If a is already sorted, then a[j] >= a[j-1] no matter what j is, so we never enter the inner loop. The only operations done in this case are the assignment on line 2 and the evaluation of the condition on line 3, both of which take constant time. Because of the first loop, those operations are run n times. So in the best case, insertion sort is linear.
Worst case: we leave the while loop only when we reach the beginning of the array. That is, every element gets swapped all the way down to index 0. This corresponds to an array sorted in reverse order. In this case, the first element is swapped 0 times, element 2 is swapped 1 time, element 3 is swapped 2 times, and so on, up to element n being swapped n-1 times. We already know the result of this sum: worst-case insertion sort is quadratic.
Average case: for the average case, we assume the items are randomly ordered inside the array. If you're interested in the maths, the analysis involves probabilities and you can find the proof in many places. The result is again quadratic.
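To put numbers on those three cases, here is a small instrumented sketch of my own (the function name and the swap counter are mine; the sort is the one above, transcribed to C++) that counts swaps for sorted, reverse-sorted and shuffled input:

#include <algorithm>
#include <iostream>
#include <random>
#include <vector>

// Insertion sort as above, returning the number of swaps it performed.
static long long insertionSortSwaps(std::vector<int> a) {
    long long swaps = 0;
    for (std::size_t i = 0; i < a.size(); i++) {
        std::size_t j = i;
        while (j > 0 && a[j] < a[j-1]) {
            std::swap(a[j], a[j-1]);
            ++swaps;
            j--;
        }
    }
    return swaps;
}

int main() {
    const int n = 1000;
    std::vector<int> sorted, reversed, shuffled;
    for (int i = 0; i < n; ++i) {
        sorted.push_back(i);
        reversed.push_back(n - i);
        shuffled.push_back(i);
    }
    std::shuffle(shuffled.begin(), shuffled.end(), std::mt19937(1));

    std::cout << "sorted:   " << insertionSortSwaps(sorted)   << " swaps\n";   // 0: the best case is linear
    std::cout << "reversed: " << insertionSortSwaps(reversed) << " swaps\n";   // n*(n-1)/2 = 499500: quadratic
    std::cout << "shuffled: " << insertionSortSwaps(shuffled) << " swaps\n";   // about n*(n-1)/4: still quadratic
}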
Conclusion
Those were the basics of analyzing the time complexity of an algorithm. These cases were easy, but some algorithms aren't as nice. For example, you can look at the complexity of the pairing heap data structure, which is much harder to analyze.
I am parsing files around 1MB in size, reading the first 300KB and searching for a number of particular signatures. My strategy is, for each byte, see if the byte is in a map/vector/whatever of bytes that I know might be at the start of a signature, and if so look for the full signature - for this example, assume those leading bytes are x37, x50, and x52. Processing a total of 90 files (9 files 10 times actually), the following code executes in 2.122 seconds:
byte * bp = &buffer[1];
const byte * endp = buffer + bytesRead - 30; // a little buffer for optimization - no signature is that long
//multimap<byte, vector<FileSignature> >::iterator lb, ub;
map<byte, vector<FileSignature> >::iterator findItr;
vector<FileSignature>::iterator intItr;
while (++bp != endp)
{
if (*bp == 0x50 || *bp == 0x52 || *bp == 0x37) // Comparison line
{
findItr = mapSigs.find(*bp);
for (intItr = findItr->second.begin(); intItr != findItr->second.end(); intItr++)
{
bool bMatch = true;
for (UINT i = 1; i < intItr->mSignature.size(); ++i)
{
if (intItr->mSignature[i] != bp[i])
{
bMatch = false;
break;
}
}
if (bMatch)
{
CloseHandle(fileHandle);
return true;
}
}
}
}
However, my initial implementation finishes in a sluggish 84 seconds. The only difference is related to the line labeled "// Comparison line" above:
findItr = mapSigs.find(*bp);
if (findItr != mapSigs.end())
...
A very similar implementation using a vector containing the 3 values also results in extremely slow processing (190 seconds):
if (find(vecFirstChars.begin(), vecFirstChars.end(), *bp) != vecFirstChars.end())
{
findItr = mapSigs.find(*bp);
...
But an implementation accessing the elements of the vector directly performs rather well (8.1 seconds). Not as good as the static comparisons, but still far, far better than the other options:
if (vecFirstChars[0] == *bp || vecFirstChars[1] == *bp || vecFirstChars[2] == *bp)
{
findItr = mapSigs.find(*bp);
...
The fastest implementation so far (inspired by Component 10 below) is the following, clocking in at about 2.0 seconds:
bool validSigs[256] = {0};
validSigs[0x37] = true;
validSigs[0x50] = true;
validSigs[0x52] = true;
while (++bp != endp)
{
if (validSigs[*bp])
{
...
Extending this to use a second validSigs table to check whether the 2nd char is valid as well brings the total run time down to 0.4 seconds.
I feel the other implementations should perform better. Especially the map, which should scale as more signature prefixes are added, and whose searches are O(log(n)) vs O(n). What am I missing? My only shot-in-the-dark guess is that with the static comparisons and (to a lesser extent) the vector indexing, I am getting the values used for the comparison cached in a register or another location that makes the comparison significantly faster than reading from memory. If this is true, can I explicitly tell the compiler that particular values are going to be used often? Are there any other optimizations I can take advantage of for this code that are not apparent?
I am compiling with Visual Studio 2008.
This is simple enough to come down to the number of instructions executed. The vector, map, or lookup table will reside entirely in the CPU's level 1 data cache, so memory access isn't taking up time. As for the lookup table, as long as most bytes don't match a signature prefix, the branch predictor will stop flow control from taking up time. (But the other structures do incur flow-control overhead.)
So quite simply, comparing against each value in the vector in turn requires 3 comparisons. The map is O(log N), but the coefficient (which is ignored by big-O notation) is large due to navigating a linked data structure. The lookup table is O(1) with a small coefficient because access to the structure can be completed by a single machine instruction, and then all that remains is one comparison against zero.
The best way to analyze performance is with a profiler tool such as valgrind/kcachegrind.
The "compare against constants" case compares the byte loaded from memory against 3 constants. This case is going to be extremely easy for the compiler to do things like unrolling or bit optimization on, if it feels like it. The only branches the generated assembly will have here are going to be highly predictable.
For the literal 3-element vector lookup, there is the additional cost of dereferencing the addresses of the vector values.
For the vector loop, the compiler has no idea how big the vector is at this point, so it has to emit a generic loop. This loop has a branch in it, a branch that goes one way 2 times, then the other way. If the computer uses the heuristic "branches go the way they did last time", this results in lots of branch prediction failures.
To verify that theory, try making the branching more predictable: search for each element for up to 100 different input bytes at a time, then search for the next one. That will make naive branch prediction work on the order of 98% of the time, instead of the 33% in your code. I.e., scan 100 (or whatever) characters for signature 0, then 100 (or whatever) characters for signature 1, until you run out of signatures. Then go on to the next block of 100 characters to scan for signatures. I chose 100 because I'm trying to avoid branch prediction failures, and I figure a few percent of branch prediction failures isn't all that bad. :)
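A rough sketch of that blocked scan (blockedScan, handleMatch and the block size are my own illustrative names; it assumes plain pointers into the question's buffer rather than the real FileSignature machinery): each block of bytes is scanned once per signature prefix, so the inner branch almost always falls through the same way.

#include <cstddef>

using byte = unsigned char;

// Blocked scan: instead of testing every prefix at every position, scan a block
// of bytes once per prefix. Within a block the comparison is almost always
// "no match", so the branch stays predictable.
// buffer, length and handleMatch stand in for the question's real data and
// signature-matching code.
void blockedScan(const byte *buffer, std::size_t length,
                 void (*handleMatch)(const byte *pos, byte prefix)) {
    static const byte prefixes[] = { 0x37, 0x50, 0x52 };
    const std::size_t blockSize = 100;                   // ~100 keeps mispredictions rare

    for (std::size_t start = 0; start < length; start += blockSize) {
        std::size_t end = start + blockSize < length ? start + blockSize : length;
        for (byte prefix : prefixes) {                   // one pass over the block per prefix
            for (std::size_t i = start; i < end; ++i) {
                if (buffer[i] == prefix)                 // usually false -> well predicted
                    handleMatch(&buffer[i], prefix);
            }
        }
    }
}

Whether this actually beats the lookup table is something to measure; the point is only to make the branch pattern more regular.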
As for the map solution, well, maps have a high constant overhead, so it being slow is pretty predictable. The main uses of a map are dealing with lookups over large n, and the fact that they are really easy to code against.