This is task-specific code for which I want to know if there's a better way of doing it. People who love logic and coding, please help me out.
This is the question :
Let A be an array of n positive integers. All the elements are distinct.
If A[i] > A[j] and i < j then the pair (i, j) is called a special pair of A. Given n find the number of
special pairs of A.
It's pretty simple and straightforward. Here's the solution I implemented (the logic part):
for (int j = 0; j < nos.size(); j++)
{
    for (int k = j + 1; k < nos.size(); k++) // always maintain condition that i<j and simply compare the numbers.
    {
        if (nos[j] > nos[k])
        {
            spc++; // compute special pair.
        }
    }
}
nos is the vector holding the array for which the special pairs are to be computed. Is there a way to do this using a single loop, or any other logic that saves time and is faster? Thanks in advance; I'm looking to learn more from this.
Also, can you please tell me how I can determine which code is faster without having to execute it? Your input is really welcome.
The main question is to compute the number of special pairs; thus, just the increment of spc is needed.
I think #syam is correct that this can be done in O(N log N) time (and O(N) extra space).
You'd do this with a balanced binary tree where each node has not only a value, but also a count of the number of descendants in its left sub-tree.
To count the special pairs, you'd walk the array from the end to the beginning, and insert each item in the tree. As you insert the item in the tree, you find the number of items in its left sub-tree -- those are the items that are less than it, but were to its right in the array (i.e., each one represents a special pair). Since we only descend through ~log(N) nodes to insert an item, the time to compute the number of items to the left is also O(log N). We also have to update the counts of items to the left approximately log(N)/2 times (again, logarithmic complexity).
That gives O(N log N) time.
Edit for more details: The balancing of the tree is fairly conventional (e.g., AVL- or RB-tree) with the addition of adjusting the count of items to the left as it does rotations to restore balance.
As you insert each item, you descend through the tree to the point where it's going to be inserted. At the root node, you simply record the count of items in its left sub-tree. Then let's say your new item is greater than the root, so you descend to the right. As you do this, you're doing two things: recording your current position, so you know the location of this node relative to the nodes already inserted, and updating the counts in the tree so you'll have an accurate count for later insertions.
So, let's work through a small sample. For the sake of argument, let's assume our input is [6, 12, 5, 9, 7]. So, our first insertion is 7, which becomes the root of our tree, with no descendants, and (obviously) 0 to its left.
Then we insert 9 to its right. Since it's to the right, we don't need to adjust any counts during the descent -- we just increment our count of items to the left. That's it, so we know for 9 we have one special pair ([9,7], though we haven't kept track of that).
Then we insert 5. This is to the left of 7, so as we descend from 7, we increment its count of items to the left to 1. We insert 5, with no items to the left, so it gets a count of 0, and no special pairs.
Then we insert 12. When we hit the root node (7) it has a count of 1 item to the left. We're descending to the right, so we increment again for the root node itself. Then we descend to the right again from 9, so we add one more ( +0 from its left sub-tree), so 12 has three special pairs.
Then we insert 6. We descend left from 7, so we don't add anything from it. We descend right from 5, so we add 1 (again, +0 from its left sub-tree). So it has one special pair.
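If writing the augmented balanced tree by hand feels like too much bookkeeping, the same right-to-left counting idea can also be done with a Fenwick tree (binary indexed tree) over the ranks of the values. This is only a sketch of that alternative, not the tree structure described above; the name countSpecialPairs and the coordinate-compression step are my own choices, and it assumes the values are distinct as the question states:

#include <algorithm>
#include <vector>

// Counts special pairs (inversions) in O(N log N) with a Fenwick tree.
// Assumes the values are distinct, as stated in the question.
long long countSpecialPairs(const std::vector<int>& a)
{
    // Coordinate-compress the values to ranks 1..n so the tree stays small.
    std::vector<int> sorted(a);
    std::sort(sorted.begin(), sorted.end());

    const int n = static_cast<int>(a.size());
    std::vector<int> tree(n + 1, 0);                 // Fenwick tree over ranks

    auto update = [&](int i) {                       // add 1 at rank i
        for (; i <= n; i += i & -i) tree[i]++;
    };
    auto query = [&](int i) {                        // sum over ranks 1..i
        int s = 0;
        for (; i > 0; i -= i & -i) s += tree[i];
        return s;
    };

    long long pairs = 0;
    // Walk the array right to left; for each element, count how many smaller
    // elements have already been inserted (i.e., lie to its right).
    for (int i = n - 1; i >= 0; --i) {
        int rank = static_cast<int>(std::lower_bound(sorted.begin(), sorted.end(), a[i]) - sorted.begin()) + 1;
        pairs += query(rank - 1);                    // strictly smaller, to the right
        update(rank);
    }
    return pairs;
}

For the example above, countSpecialPairs({6, 12, 5, 9, 7}) returns 5: one pair for 6, three for 12 and one for 9, matching the walk-through.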
Even when you need to generate all the special pairs (not just count them) you can expect the tree to improve speed in the average case (i.e., pretty much anything except input sorted in descending order). To generate all the special pairs, we insert each item in the tree as before, then traverse the tree to the left of that item. Where the naive algorithm traversed (and compared to) all the elements to the right in the array to find those that would be special pairs, this only has to traverse the tree to find those that actually are special pairs.
This does have one side effect though: it generates the pairs in a different order. Instead of each pair being generated in the order it occurred in the array, the pairs will be generated in descending order by the second element. For example, given an input like [4,1,2,3], the naive algorithm would produce [[4,1], [4,2], [4,3]], but this will produce [[4,3], [4,2], [4,1]].
You mean can you do better than quadratic runtime? No. To see this, consider the decreasing sequence A = (N, N - 1, ..., 2, 1). For this sequence, all the pairs (i, j) with i < j are special, and there are O(N^2) such pairs. Since you must output every special pair, you need quadratic time to do so.
I don't think you can improve your algorithm. In my opinion it has O(n²) complexity. You can prove that by counting the number of inner-loop iterations your program has to perform for an array of length n. The first pass of the outer loop does n-1 inner iterations, the second n-2, the third n-3, and so on. Summing that up using the formula for the sum of the first n integers (only backwards this time, and not from 1 to n but from 2 to n-1):
(n-1) + (n-2) + (n-3) + ... + 3 + 2 = n*(n-1)/2 - 1. You don't have to loop over the last remaining element, since there is no other element left to compare it with. Actually this is the only improvement I see for your algorithm ;-):
for(int j=0;j<nos.size();j++)
to
for(int j=0;j<nos.size()-1;j++)
Summing up: for large n the expression n*(n-1)/2 - 1 behaves like n², and that's where I believe the O(n²) comes from. Please correct me if I'm wrong.
Edit: the number of elements to be displayed can be user-defined; it defaults to 10 but can be set to a very big number.
I have a file which I parse for words; I then need to count how many times each word appeared in the text and display the 10 words with the highest number of appearances (using C++).
I currently insert each parsed word into a std::map, the word is the key and the number of its appearances is the value. Each time I come across a word that is not part of the std::map I add it with the initial value of 1 and each time I come across a word that is part of the map I add 1 to its current value.
After I am done parsing the file I have a map with all the unique words in the text and the number of their appearances, but the map is sorted by its keys (the words), not by the counts.
At this point I can traverse the std::map and push its words into a priority queue (ordered with the minimum value at the top). Once the priority queue reaches its maximum capacity of 10 words, I check whether the value I am about to insert is bigger than the value at the top; if so I pop the top and insert the new value, and if not I move on to the next value from the std::map.
Because each word appears only once (at this stage), I know for sure that each entry in the priority queue is unique.
My question is: can this be done more efficiently in terms of complexity?
This is Python's collections.Counter, so you could look there for a real-world example. It essentially does the same thing you are doing: get counts by incrementing a dictionary, then heapq.nlargest on the (word, count) pairs. (A priority queue is a heap; I have no idea why they added a Q.)
Consider selecting the m largest/smallest out of N words. This should have a theoretical limit of O(N log m).
You should create the counts in O(N) time with an std::unordered_map. This is important, you don't care about sorting the words alphabetically, so don't use std::map here. If you use std::map, you're already at O(N log N) which is greater than the theoretical limit.
Now, when selecting the top 10, you need pretty much any method that only looks at 10 items at a time. A priority queue with a maximum size is a good option. The important point is that you don't track more than you need to. Your complexity here is O(N log m), which becomes O(N) in the special case where m is small compared to N. But the common mistake would be to include the whole data set when comparing items.
However, check whether m >= N, because if you do need the whole data set, you can just call std::sort. I'm assuming you need the results in order; if you didn't, this case would become really trivial. Also check m == 1, so you can just take the maximum.
In conclusion, except for using the wrong map, I believe you've already met the theoretical limit in terms of big O complexity.
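To make that concrete, here is a minimal sketch of the unordered_map-plus-bounded-min-heap approach; the function name topWords, the parameter m and the assumption that the parsed words are already in a vector are mine, not part of the question:

#include <algorithm>
#include <functional>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Returns the m most frequent words, most frequent first.
std::vector<std::pair<std::string, long long>>
topWords(const std::vector<std::string>& words, std::size_t m = 10)
{
    // O(N) counting with a hash map -- no need for a sorted std::map here.
    std::unordered_map<std::string, long long> counts;
    for (const auto& w : words) ++counts[w];

    // Min-heap of size at most m, ordered by count: the smallest of the
    // current top m sits on top and is evicted when a bigger count shows up.
    using Entry = std::pair<long long, std::string>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (const auto& [word, count] : counts) {
        heap.emplace(count, word);
        if (heap.size() > m) heap.pop();             // keep only the m largest
    }

    // Pop into a vector and reverse so the most frequent word comes first.
    std::vector<std::pair<std::string, long long>> result;
    while (!heap.empty()) {
        result.emplace_back(heap.top().second, heap.top().first);
        heap.pop();
    }
    std::reverse(result.begin(), result.end());
    return result;
}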
Suppose I have an unsorted list such as the one below:
[1, 2, 3, 1, 1, 5, 2, 1]
and I want to return the number of minimum elements (in this case, min = 1), which is 4.
A quick solution is to just find the minimum using some built-in min() function, then iterate over the list again, compare values, and count them up. That's O(2n) time.
But I'm wondering if it's possible to do it in strictly O(n) time - only make one pass through the list. Is there a way to do so?
Remember that big-O notation talks about the way in which a runtime scales, not the absolute runtime. In that sense, an algorithm that makes two passes over an array that each take time O(n) also has runtime O(n) - the runtime will scale linearly as the input size increases. So your two-pass algorithm will work just fine.
A stronger requirement is that you have a one-pass algorithm, in which you get to see each element only once. In that case, you can do this by tracking the smallest number you've seen so far and all the positions where you've seen it. Whenever you see a value,
if that value is bigger than the smallest you've seen, ignore it;
if that value equals the smallest you've seen, add it to the list of positions; and
if that value is smaller than the smallest you've seen, discard your list of all the smallest elements (they weren't actually the smallest) and reset it to a list of just the current position.
This also takes time O(n), but does so in a single pass.
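A minimal sketch of that single pass in C++, assuming we only need the count (which is all the question asks for) rather than the list of positions; the name countMinimum is just illustrative:

#include <vector>

// One pass: track the smallest value seen so far and how often it occurred.
int countMinimum(const std::vector<int>& values)
{
    if (values.empty()) return 0;

    int smallest = values[0];
    int count = 1;
    for (std::size_t i = 1; i < values.size(); ++i) {
        if (values[i] < smallest) {          // new minimum: reset the count
            smallest = values[i];
            count = 1;
        } else if (values[i] == smallest) {  // another copy of the current minimum
            ++count;
        }                                    // larger values are ignored
    }
    return count;
}

For the list above, countMinimum({1, 2, 3, 1, 1, 5, 2, 1}) returns 4.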
Given an integer array like
int numbers[8]={1, 3, 5, 7, 8, 6, 4, 2};
the first half of the array contains odd numbers and the rest (an equal amount) are even. The odd numbers are in ascending order and the even part is in descending order. The rearrangement must not change the relative order of the numbers within each half.
How can I sort them alternately (odd, even, odd, even, ...) with time complexity less than O(n^2) and space complexity O(1)?
For this example, the result would be: {1,8,3,6,5,4,7,2};
I can't use external array storage but temporary variables are acceptable.
I have tried using two pointers (oddPtr, evenPtr) to point to the odd and even numbers separately, and moving evenPtr to insert the even values into the middle of the odd numbers (like insertion sort).
But it takes O(n^2).
UPDATED
As per Dukeling's comment I realized that the solution I proposed is in fact not linear but linearithmic, and even worse, you can't control whether it takes extra memory or not. On second thought I realized you know a lot about the array, so you can implement a more specific, but probably easier, solution.
I will make the assumption that all values in the array are positive. I need this so that I can use negative values as a kind of 'already processed' flag. My idea is the following: iterate over the array from left to right. For each element, if it is already processed (i.e. its value is negative), simply continue with the next one. Otherwise there is a constant-time formula for the position where this element should go:
If the value is odd and its index is i it should move to i*2
If the value is even and its index is i it should move to (i - n/2)*2 + 1
Store this value in a temporary and set the value at the current index of the array to 0. Now, while the position where the value we 'have at hand' should go does not contain 0, swap the value at hand with the value sitting at the position given by the formula above. Also, when you place the value at hand, negate it to mark it as processed. Now we have a new value 'at hand', and again we calculate where it should go according to the formula above. We continue moving values until the value we 'have at hand' should go to the position containing 0. With a little thought you can prove that you will never have a negative ('processed') value at hand and that eventually you will end up at the empty spot of the array.
After you process all the values, iterate once over the array to negate all values and you will have the array you need. The complexity of the algorithm I describe is linear: each value will be 'at hand' no more than once, and you will iterate over each position no more than once.
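Here is a rough sketch of the cycle-following idea described above, assuming (as stated) that all values are strictly positive; the use of std::vector, the lambda for the target formula and the name interleave are just illustrative choices:

#include <vector>

// Sketch of the cycle-following idea: 0 marks the "hole" at the start of a
// cycle and a negative value marks an element that has already been placed.
void interleave(std::vector<int>& a)
{
    const int n = a.size();                 // n is even: first half odd, second half even
    auto target = [n](int i) {              // where the element at index i must go
        return (i < n / 2) ? 2 * i : (i - n / 2) * 2 + 1;
    };

    for (int i = 0; i < n; ++i) {
        if (a[i] <= 0) continue;            // already processed (or the hole)

        int value = a[i];                   // the value "at hand"
        a[i] = 0;                           // open the hole at the cycle start
        int pos = target(i);

        while (a[pos] != 0) {               // follow the cycle until we reach the hole
            int displaced = a[pos];         // the untouched element currently at pos
            a[pos] = -value;                // place the value at hand, mark it processed
            value = displaced;
            pos = target(pos);              // the displaced element came from index pos
        }
        a[pos] = -value;                    // drop the last value into the hole
    }

    for (int& x : a) x = -x;                // one final pass to restore the signs
}

Running it on {1, 3, 5, 7, 8, 6, 4, 2} yields {1, 8, 3, 6, 5, 4, 7, 2}, the expected result.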
Recently, I was given a question: find the minimum number of comparisons needed to search for an element among n given elements, provided they are sorted and the element has more than half (n/2) occurrences.
For example, given the sorted array 1, 1, 2, 2, 2, 2, 2, 7, 11 of size 9, we need to find the minimum comparisons required to find 2 (since it has more than n/2 occurrences, namely 5).
What would be the best algorithm to do so and what would be the worst case Complexity?
Options provided were:
i) O(1)
ii) O(n)
iii) O(log(n))
iv) O(nlog(n))
provided they are sorted ... with more than half (n/2) occurrences
If both of those conditions are guaranteed, you only have to check the single middle element, so the answer is O(1).
There can be two possible interpretations of the question. I'll explain both.
Firstly, if the question guarantees that there is definitely a number which occurs more than n/2 times, then MBo's answer suffices.
However, if there is a chance that no element occurs more than n/2 times, then the complexity is O(log(n)). We cannot merely check the middle (n/2-th) element. For example, in the array 2, 4, 6, 6, 6, 8, 10, the middle element is 6, but it does not occur more than n/2 times. The algorithm for this case goes as follows:
Select the middle element (say x).
Find the index of the first occurrence of x (searching the left sub-array) using binary search (say lIndex).
Find the index of the last occurrence of x (searching the right sub-array) using binary search (say rIndex).
If rIndex - lIndex + 1 > n/2, then the number occurs more than n/2 times. Otherwise, no such number is present.
Since we use binary search to find the number in left and right sub-arrays, the complexity of the above algorithm is O(log(n)).
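A sketch of that check, with std::equal_range standing in for the two hand-written binary searches (it gives the first and last occurrence in O(log n) on a sorted random-access range); the name hasMajority is mine:

#include <algorithm>
#include <vector>

// Returns true and sets `majority` if some element occurs more than n/2 times
// in the sorted array. If such an element exists, it must sit in the middle.
bool hasMajority(const std::vector<int>& a, int& majority)
{
    if (a.empty()) return false;

    const std::size_t n = a.size();
    int candidate = a[n / 2];                               // the middle element

    auto range = std::equal_range(a.begin(), a.end(), candidate);
    std::size_t occurrences = range.second - range.first;   // count of the candidate

    if (occurrences > n / 2) {                              // strictly more than half
        majority = candidate;
        return true;
    }
    return false;
}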
This question is related to
this one, and more precisely to this answer to it.
Here goes: I have a C++/TR1 unordered_set U of unsigned ints (rough cardinality 100-50000, rough value range 0 to 10^6).
Given a cardinality N, I want to as quickly as possible iterate over N random but
unique members of U. There is no typical value for N, but it should
work fast for small N.
In more detail, the notion of "randomness" here is
that two calls should produce somewhat different subsets -- the more different,
the better, but this is not too crucial. I would e.g. be happy with a continuous
(or wrapped-around continuous)
block of N members of U, as long as the start index of the block is random.
Non-continuous at the same cost is better, but the main concern is speed. U changes
mildly, but constantly between calls (ca. 0-10 elements inserted/erased between calls).
How far I've come:
1. Trivial approach: pick a random index i such that (i+N-1) < |U|. Get an iterator it to U.begin(), advance it i times using it++, and then start the actual loop over the subset. Advantage: easy. Disadvantage: waste of ++'es. (A sketch of this approach follows after point 2.)
2. The bucket approach (and this I've "newly" derived from the above link): pick i as above, find the bucket b that the i-th element is in, get a local_iterator lit to U.begin(b), advance lit via lit++ until we hit the i-th element of U, and from then on keep incrementing lit N times. If we hit the end of the bucket, we continue with lit from the beginning of the next bucket. If I want to make it more random I can pick i completely at random and wrap around the buckets.
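For reference, a minimal sketch of approach 1 (the trivial one), written against std::unordered_set and <random> rather than the TR1 equivalents; randomBlock is just an illustrative name:

#include <cstddef>
#include <iterator>
#include <random>
#include <unordered_set>
#include <vector>

// Returns a contiguous block of N elements (in iteration order) starting at a
// random position, as in approach 1.
std::vector<unsigned> randomBlock(const std::unordered_set<unsigned>& U,
                                  std::size_t N, std::mt19937& rng)
{
    std::vector<unsigned> subset;
    if (N > U.size()) return subset;                    // not enough elements

    // Random start index i such that (i + N - 1) < |U|.
    std::uniform_int_distribution<std::size_t> dist(0, U.size() - N);
    const std::size_t i = dist(rng);

    auto it = U.begin();
    std::advance(it, i);                                // the "wasted" ++'es
    for (std::size_t k = 0; k < N; ++k, ++it)
        subset.push_back(*it);
    return subset;
}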
My open questions:
For point 2 above, is it really the case that I cannot somehow get an
iterator into U once I've found the i-th element? This would spare me
the bucket boundary control, etc. For me as quite a
beginner, it seems inconceivable that the standard forward iterator should know how to
continue traversing U when at the i-th item, but when I found the i-th item myself,
it should not be possible to traverse U other than through point 2 above.
What else can I do? Do you know anything even much smarter/more random? If possible, I don't want to get involved in manual
control of bucket sizes, hash functions, and the like, as this is a bit over my head.
Depending on what runtime guarantees you want, there's a famous O(n) algorithm for picking k random elements out of a stream of numbers in one pass. To understand the algorithm, let's see it first for the case where we want to pick just one element out of the set, then we'll generalize it to work for picking k elements. The advantage of this approach is that it doesn't require any advance knowledge of the size of the input set and guarantees provably uniform sampling of elements, which is always pretty nice.
Suppose that we want to pick one element out of the set. To do this, we'll make a pass over all of the elements in the set and at each point will maintain a candidate element that we're planning on returning. As we iterate across the list of elements, we'll update our guess with some probability until at the very end we've chosen a single element with uniform probability. At each point, we will maintain the following invariant:
After seeing k elements, the probability that any of the first k elements is currently chosen as the candidate element is 1 / k.
If we maintain this invariant across the entire array, then after seeing all n elements, each of them has a 1 / n chance of being the candidate element. Thus the candidate element has been sampled with uniformly random probability.
To see how the algorithm works, let's think about what it has to do to maintain the invariant. Suppose that we've just seen the very first element. To maintain the above invariant, we have to choose it with probability 1, so we'll set our initial guess of the candidate element to be the first element.
Now, when we come to the second element, we need to hold the invariant that each element is chosen with probability 1/2, since we've seen two elements. So let's suppose that with probability 1/2 we choose the second element. Then we know the following:
The probability that we've picked the second element is 1/2.
The probability that we've picked the first element is the probability that we chose it the first time around (1) times the probability that we didn't just pick the second element (1/2). This comes out to 1/2 as well.
So at this point the invariant is still maintained! Let's see what happens when we come to the third element. At this point, we need to ensure that each element is picked with probability 1/3. Well, suppose that with probability 1/3 we choose this newly seen third element. Then we know that
The probability that we've picked the third element is 1/3.
The probability that we've picked either of the first two elements is the probability that it was chosen after the first two steps (1/2) times the probability that we didn't choose the third element (2/3). This works out to 1/3.
So again, the invariant holds!
The general pattern here looks like this: After we've seen k elements, each of the elements has a 1/k chance of being picked. When we see the (k + 1)st element, we choose it with probability 1 / (k + 1). This means that it's chosen with probability 1 / (k + 1), and all of the elements before it are chosen with probability equal to the odds that we picked it before (1 / k) and didn't pick the (k + 1)st element this time (k / (k + 1)), which gives those elements each a probability of 1 / (k + 1) of being chosen. Since this maintains the invariant at each step, we've got ourselves a great algorithm:
Choose the first element as the candidate when you see it.
For each successive element, replace the candidate element with that element with probability 1 / k, where k is the number of elements seen so far.
This runs in O(n) time, requires O(1) space, and gives back a uniformly-random element out of the data stream.
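A small generic sketch of this single-candidate version (the template, the name pickRandomElement and the use of <random> are my additions, not part of the answer):

#include <cstddef>
#include <random>

// Keep one candidate; replace it with the k-th element seen with probability 1/k.
template <typename It, typename Rng>
It pickRandomElement(It first, It last, Rng& rng)
{
    It candidate = last;                        // "nothing seen yet"
    std::size_t seen = 0;
    for (It it = first; it != last; ++it) {
        ++seen;
        std::uniform_int_distribution<std::size_t> dist(1, seen);
        if (dist(rng) == 1)                     // probability 1/seen
            candidate = it;
    }
    return candidate;                           // equals last if the range was empty
}

Called on U.begin()/U.end(), this picks one element of U uniformly at random in a single pass over its forward iterators.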
Now, let's see how to scale this up to work if we want to pick k elements out of the set, not just one. The idea is extremely similar to the previous algorithm (which actually ends up being a special case of the more general one). Instead of maintaining one candidate, we maintain k different candidates, stored in an array that we number 1, 2, ..., k. At each point, we maintain this invariant:
After seeing m > k elements, the probability that any of the first m elements is chosen is k / m.
If we scan across the entire array, this means that when we're done, each element has probability k / n of being chosen. Since we're picking k different elements, this means that we sample the elements out of the array uniformly at random.
The algorithm is similar to before. First, choose the first k elements out of the set with probability 1. This means that when we've seen k elements, the probability that any of them have been picked is 1 = k / k and the invariant holds. Now, assume inductively that the invariant holds after m iterations and consider the (m + 1)st iteration. Choose a random number between 1 and (m + 1), inclusive. If we choose a number between 1 and k (inclusive), replace that candidate element with the next element. Otherwise, do not choose the next element. This means that we pick the next element with probability k / (m + 1) as required. The probability that any of the first m elements are chosen is then the probability that they were chosen before (k / m) times the probability that we didn't choose the slot containing that element (m / (m + 1)), which gives a total probability of being chosen of k / (m + 1) as required. By induction, this proves that the algorithm perfectly uniformly and randomly samples k elements out of the set!
Moreover, the runtime is O(n), which is proportional to the size of the set, which is completely independent of the number of elements you want to choose. It also uses only O(k) memory and makes no assumptions whatsoever about the type of the elements being stored.
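And a sketch of the k-element version for the container in the question; the name sampleK, the use of std::mt19937 and returning a std::vector are my choices:

#include <cstddef>
#include <random>
#include <unordered_set>
#include <vector>

// Reservoir sampling: keep the first k elements, then replace a uniformly
// chosen slot with element m+1 with probability k/(m+1).
std::vector<unsigned> sampleK(const std::unordered_set<unsigned>& U,
                              std::size_t k, std::mt19937& rng)
{
    std::vector<unsigned> reservoir;
    reservoir.reserve(k);

    std::size_t seen = 0;
    for (unsigned x : U) {
        ++seen;
        if (reservoir.size() < k) {
            reservoir.push_back(x);                     // first k elements: keep them all
        } else {
            // Pick a slot in [0, seen); x goes into the reservoir only if the
            // slot falls inside it, which happens with probability k/seen.
            std::uniform_int_distribution<std::size_t> dist(0, seen - 1);
            const std::size_t slot = dist(rng);
            if (slot < k)
                reservoir[slot] = x;
        }
    }
    return reservoir;                                   // fewer than k items if |U| < k
}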
Since you're trying to do this for C++, as a shameless self-promotion, I have an implementation of this algorithm (written as an STL algorithm) available here on my personal website. Feel free to use it!
Hope this helps!