I have a question regarding the runtime complexity of Dijkstra's algorithm (see the pseudocode in CLRS, version 3):
DIJKSTRA(G, w, s)
1 INITIALIZE-SINGLE-SOURCE(G, s)
2 S ← ∅
3 Q ← V[G]
4 while Q != ∅
5     do u ← EXTRACT-MIN(Q)
6        S ← S ∪ {u}
7        for each vertex v ∈ Adj[u]
8            do RELAX(u, v, w)
I understand that line 3 is O(V), line 5 is O(V log V) in total, line 7 is O(E) in total, and line 8 implies decrease_key(), so O(log V) for each Relax() operation. But in Relax(), once d[v] > d[u] + weight holds and v is to be relaxed, shouldn't we look up the position of v in the queue Q before we call decrease_key(Q, pos, d[v]) to replace the key at pos with d[v]? Note that this lookup itself costs O(V), so each Relax() should cost O(V), not O(log V), right?
A question regarding space complexity: to compare the vertices in the queue Q, I designed a struct/class vertex with the distance as one member, and implemented operator< to order vertices by their distance. But it seems I have to define a duplicate array dist[] in order to do dist[v] = dist[u] + weight in Relax(). If I do not define the duplicate array, I have to look up the positions of v and u in the queue Q and then obtain and check their distances. Is it supposed to work this way, or is my implementation just not good?
Dijkstra's algorithm (as you wrote it) does not have a runtime complexity until you specify the data structures. You are more or less right in saying that "line 7" accounts for O(E) operations, but let's go through the lines (fortunately, Dijkstra is "easy" to analyze).
Initializing means giving all vertices an infinite distance, except for the source, which has distance 0. Pretty easy; this can be done in O(V).
What is the set S good for? You use it "write only".
You put all elements to a queue. Here be dragons. What is a (priority!) queue? A datastructure with operations add, optionally decreaseKey (needed for Dijkstra), remove (not needed in Dijkstra), extractMin. Depending on the implementation, these operations have certain runtimes. For example, you can build a dumb PQ that is just a (marking) set - then adding and decreasing a key is constant time, but for extracting the minimum, you have to search. The canonical solution in Dijkstra is to use a queue (like a heap) that implements all relevant operations in O(log n). Let's analyze for this case, although technically speaking a Fibonacci-Heap would be better. Don't implement the queue on your own. It's amazing how much you can save by using a real PQ implementation.
You go through the loop n = |V| times.
Every time, you extract the minimum, which is O(log n) per call and O(n log n) in total (over all iterations).
What is the set S good for?
You go through the edges of each vertex at most once, i.e. you touch each edge at most twice, so in total you do whatever happens inside the loop O(E) times.
Relaxing means checking whether you have to decrease a key and doing so. We already know that each such operation can cost O(log V) in the queue (if it's a heap), and we have to do it O(E) times, so it's O(E log V), which dominates the total runtime.
If you take a Fibonacci heap, you can go down to O(V log V + E), but that's academic. Real implementations tune heaps. If you want to know your implementation's performance, analyze the PQ operations. But as I said, it's better to use existing implementations if you don't know exactly what you're doing. Your idea of "looking up a position before calling decreaseKey" tells me you should dig deeper into that topic before you come up with an implementation which effectively takes O(V) per insert (by sorting every time decreaseKey is called) or O(V) per extractMin (by finding the minimum on demand).
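To make the decreaseKey point concrete, here is a minimal sketch, assuming a binary heap that also keeps a position map. The class name IndexedMinHeap, the pos dictionary and the adjacency-list format are illustrative choices of mine, not something from CLRS. Because the heap records where each vertex sits, decrease_key finds it in O(1) and only pays O(log V) for the sift-up, and the distances live in a single dictionary keyed by vertex, so no duplicate array or queue search is needed.

```python
import math


class IndexedMinHeap:
    """Binary min-heap of vertices that remembers each vertex's heap index."""

    def __init__(self, keys):
        self.heap = list(keys)                               # vertices in heap order
        self.key = dict(keys)                                # vertex -> current key
        self.pos = {v: i for i, v in enumerate(self.heap)}   # vertex -> heap index
        for i in reversed(range(len(self.heap) // 2)):
            self._sift_down(i)

    def __len__(self):
        return len(self.heap)

    def _swap(self, i, j):
        h = self.heap
        h[i], h[j] = h[j], h[i]
        self.pos[h[i]] = i
        self.pos[h[j]] = j

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self.key[self.heap[i]] >= self.key[self.heap[parent]]:
                break
            self._swap(i, parent)
            i = parent

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self.key[self.heap[child]] < self.key[self.heap[smallest]]:
                    smallest = child
            if smallest == i:
                return
            self._swap(i, smallest)
            i = smallest

    def extract_min(self):
        top = self.heap[0]
        self._swap(0, len(self.heap) - 1)
        self.heap.pop()
        del self.pos[top]
        if self.heap:
            self._sift_down(0)
        return top

    def decrease_key(self, vertex, new_key):
        self.key[vertex] = new_key
        self._sift_up(self.pos[vertex])    # O(1) position lookup, O(log V) sift


def dijkstra(adj, source):
    # adj: dict vertex -> list of (neighbour, non-negative weight); assumed format
    queue = IndexedMinHeap({v: (0 if v == source else math.inf) for v in adj})
    dist = queue.key                       # the only place distances are stored
    while queue:
        u = queue.extract_min()
        for v, w in adj[u]:
            if v in queue.pos and dist[u] + w < dist[v]:   # RELAX(u, v, w)
                queue.decrease_key(v, dist[u] + w)
    return dist
```

With this layout the total is O(V log V) for the extractions plus O(E log V) for the relaxations, which matches the analysis above.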
I have an array of size n waiting to be sorted. But unlike the ordinary sorting problem, I'm constrained to use a specific comparator, which receives three numbers and tells me the maximum and the minimum of the three. My goal is to use the comparator as few times as possible before the array is completely sorted. What strategy can I use?
Thanks for any help!
Since your three-way comparator can be implemented by three calls to a normal comparator, that means we can't improve on any normal sorting algorithm by a factor of more than 3. A more careful argument shows that, because each three-way comparison gives us log₂ 6 ≈ 2.585 bits of information, we can't improve by a factor of more than that. Intuitively, when sorting with a normal comparator you might compare a <= b and b <= c, and therefore not need to compare a and c anyway; so the possible speedup factor could be as small as 2.
So asymptotically, we're still looking for an O(n log n) algorithm, and the question is how to exploit the comparator to do fewer comparisons by at least a factor 2. The "obvious" thing to try first is modifying an existing comparison-based sorting algorithm; a good candidate is bottom-up heapsort, which does about n log₂ n comparisons in the average case, and 1.5 n log₂ n in the worst case (Wikipedia). This beats the standard quicksort algorithm, which does about 1.39 n log₂ n comparisons in the average case (Wikipedia).
The algorithm works using two basic operations on a heap, "sift down" and "sift up".
The "sift down" operation requires comparing a parent element with its two children, to see if the parent element is greater than or equal to both its children, or if not, which child the parent should be swapped with. We can use the three-way comparator to compare the parent with both children at once.
The "sift up" operation compares a child with its parent, and swaps them if they are out of order; this is then repeated all the way up to the root node. We can use the three-way comparator to compare the child node with its parent and its grandparent at once.
The heapsort algorithm only calls the comparator within those two operations, and for both operations the three-way comparator can be called fewer times by a factor of 2. This isn't necessarily the best you can do, but it starts from a very efficient algorithm, and matches the worst-case speedup factor given by intuition.
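As a rough illustration of the "sift down" half of this idea, here is a minimal sketch using a plain (not bottom-up) heapsort. The Comparator3 class is a hypothetical stand-in for the three-way comparator: it takes three indices, reports which holds the minimum and which the maximum, and counts its calls. When a node has only one child, the sketch passes that child twice, which is an assumption about what the comparator tolerates.

```python
class Comparator3:
    """Stand-in three-way comparator over positions of a list; counts its calls."""

    def __init__(self, data):
        self.data = data
        self.calls = 0

    def __call__(self, i, j, k):
        self.calls += 1
        idx = (i, j, k)
        # On ties, min/max return the first candidate, so the parent wins ties.
        return (min(idx, key=self.data.__getitem__),
                max(idx, key=self.data.__getitem__))


def sift_down(a, cmp3, root, end):
    """Restore the max-heap property below root using one comparator call per level."""
    while 2 * root + 1 < end:
        left = 2 * root + 1
        right = left + 1 if left + 1 < end else left   # no right child: reuse left
        _, largest = cmp3(root, left, right)
        if largest == root:
            return
        a[root], a[largest] = a[largest], a[root]
        root = largest


def heapsort3(a):
    """Sort a in place and return the number of three-way comparator calls."""
    cmp3 = Comparator3(a)
    n = len(a)
    for root in reversed(range(n // 2)):     # build the heap
        sift_down(a, cmp3, root, n)
    for end in range(n - 1, 0, -1):          # repeatedly move the max to the end
        a[0], a[end] = a[end], a[0]
        sift_down(a, cmp3, 0, end)
    return cmp3.calls
```

Each level of a sift-down costs one three-way call here, where an ordinary comparator would need two calls.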
Well, I came up with an idea.
Let's remember how quicksort works:
First, we locate a sort of a median value (pick up (a[0] + a[N-1])/2 if you're too lazy =3).
Then, we divide the array in two on the condition of being less or greater than the median.
At last, we run the algorithm recursively on each of two subarrays
Using your comparator, you can speed up the second phase by a factor of two by processing two values at once:
compare(median, a[2 * i], a[2 * i + 1])
if min is median, both are greater and go to the right subarray
if max is median, both are less and go to the left subarray
if neither is median, min goes left, and max goes right
After that, run recursive part of the algorithm as usual.
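The partition step above could look roughly like this. It is only a sketch: compare3 is a hypothetical stand-in for the given comparator that returns the (min, max) values, and the code assumes we may test a returned value for equality with the pivot to learn whether the pivot was the minimum or the maximum. An odd leftover element is compared against the pivot on its own.

```python
def compare3(a, b, c):
    """Stand-in for the three-way comparator: returns (minimum, maximum)."""
    return min(a, b, c), max(a, b, c)


def partition_pairs(values, pivot):
    """Split values around pivot, spending one comparator call per pair."""
    left, right, calls = [], [], 0
    paired = len(values) - len(values) % 2
    for i in range(0, paired, 2):
        x, y = values[i], values[i + 1]
        lo, hi = compare3(pivot, x, y)
        calls += 1
        if lo == pivot:            # pivot is the minimum: both are >= pivot
            right += [x, y]
        elif hi == pivot:          # pivot is the maximum: both are <= pivot
            left += [x, y]
        else:                      # lo < pivot < hi: one value on each side
            left.append(lo)
            right.append(hi)
    if paired < len(values):       # odd leftover element
        x = values[-1]
        lo, _ = compare3(pivot, x, x)
        calls += 1
        (right if lo == pivot else left).append(x)
    return left, right, calls
```

For 7 values this uses 4 comparator calls, where a pairwise partition would use 7.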
Well, I got a brilliant idea. Using a 4-way mergesort optimized with a loser tree, the number of comparator calls can be reduced to less than 0.5 n log₂ n, by my rough estimate.
"The permutation p of n elements defined by an index permutation p(i) = (i + k) mod n is called the k-rotation." -- Stepanov & McJones
std::rotate has become a well known algorithm thanks to Sean Parent, but how to efficiently implement it for an arbitrary sequence of bits?
By efficient, I mean it minimizes at least two things: i) the number of writes and ii) the worst-case space complexity.
That is, the input should be similar to std::rotate but bit-wise specific, I guess like this:
A pointer to the memory where the bit sequence starts.
Three bit indices: first, middle and last.
The type of the pointer could be any unsigned integer, and presumably the larger the better. (Boost.Dynamic Bitset calls it the "block".)
It's important to note that the indices may all be offset from the start of a block by different amounts.
According to Stepanov and McJones, rotate on random access data can be implemented in n + gcd(n, k) assignments. The algorithm that reverses each subrange followed by reversing the entire range takes 3n assignments. (However, I agree with the comments below that it is effectively 2n assignments.) Since the bits in an array can be accessed randomly, I assume the same optimal bound applies. Each assignment will usually require two reads because of different subrange block offsets but I'm less concerned about reads than writes.
Does an efficient or optimal implementation of this algorithm already exist out in the open source wild?
If not, how would one do it?
I've looked through Hacker's Delight and Volume 4A of Knuth but can't find an algorithm for it.
Using a vector<uint32_t>, for example, it's easy and reasonably efficient to do the fractional-element part of the rotation in one pass yourself (shift_amount%32), and then call std::rotate to do the rest. The fractional part is easy and only operates on adjacent elements, except at the ends, so you only need to remember one partial element while you're working.
If you want to do the whole thing yourself, then you can do the rotation by reversing the order of the entire vector, and then reversing the order of the front and back sections. The trick to doing this efficiently is that when you reverse the whole vector, you don't actually bit-reverse each element -- you just think of them as being in the opposite order. The reversal of the front and back sections is trickier and requires you to remember 4 partial elements while you work.
In terms of writes to memory or cache, both of the above methods make 2N writes. The optimal rotation you refer to in the question takes N, but if you extend it to work with fractional-word rotations, then each write spans two words and it then takes 2N writes. It provides no advantage and I think it would turn out to be complicated.
That said... I'm sure you could get closer to N writes with a fixed amount of register storage by doing m words at a time, but that's a lot of code for a simple rotation, though, and your time (or at least my time :) would be better spent elsewhere.
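Here is a minimal sketch of the first approach (fractional part in one pass, then a whole-word rotation). It makes several simplifying assumptions that are mine, not the question's: the whole array is rotated (first is bit 0 and last is the end of the array), the bit sequence is read with each word written most significant bit first, and a slice assignment stands in for std::rotate.

```python
def rotate_bits_left(words, shift, width=32):
    """Rotate the bit string formed by `words` left by `shift` bits, in place."""
    n = len(words)
    if n == 0:
        return words
    shift %= n * width
    whole, frac = divmod(shift, width)
    mask = (1 << width) - 1
    if frac:
        first = words[0]                     # the one partial element to remember
        for i in range(n - 1):
            # Each word takes its own low bits plus the top bits of its neighbour.
            words[i] = ((words[i] << frac) | (words[i + 1] >> (width - frac))) & mask
        words[n - 1] = ((words[n - 1] << frac) | (first >> (width - frac))) & mask
    if whole:
        words[:] = words[whole:] + words[:whole]   # stand-in for std::rotate
    return words
```

For example, rotating [0x80000001, 0x00000000] left by one bit gives [0x00000002, 0x00000001]: the top bit of the first word wraps around to the bottom of the last word.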
Just as the title, and BTW, it's just out of curiosity and it's not a homework question. It might seem to be trivial for people of CS major. The problem is I would like to find the indices of max value in an array. Basically I have two approaches.
scan over and find the maximum, then scan twice to get the vector of indices
scan over and find the maximum, along this scan construct indices array and abandon if a better one is there.
May I know how I should weigh these two approaches in terms of performance (mainly time complexity, I suppose)? It is hard for me because I have no idea what the worst case would even be for the second approach! It's not a hard problem per se. But I just want to know how to approach this problem, or what I should google for this type of problem to get the answer.
In terms of complexity:
scan over and find the maximum,
then scan twice to get the vector of indices
First scan is O(n).
Second scan is O(n) + k insertions (with k the number of occurrences of the max value)
vector::push_back has amortized complexity of O(1).
so a total O(2 * n + k) which might be simplified to O(n) as k <= n
scan over and find the maximum,
along this scan construct indices array and abandon if a better one is there.
Scan is O(n).
Number of insertions is more complicated to compute.
The number of clears (and the number of elements cleared) is more complicated to compute too. (The complexity of a clear is at most the number of elements removed.)
But both are bounded above by n, so the complexity is at most O(3 * n) = O(n), and it is also at least O(n) (the scan), so it is O(n) too.
So for both methods, complexity is the same: O(n).
For performance timing, as always, you have to measure.
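For reference, the two approaches might be sketched like this (the function names are just illustrative). Both return the same indices; they differ only in how many passes they make and how much temporary storage they touch.

```python
def max_indices_two_pass(values):
    """First pass finds the maximum, second pass collects its indices."""
    if not values:
        return []
    best = max(values)
    return [i for i, v in enumerate(values) if v == best]


def max_indices_one_pass(values):
    """Single pass: restart the index list whenever a larger value shows up."""
    indices = []
    best = None
    for i, v in enumerate(values):
        if best is None or v > best:
            best = v
            indices.clear()        # abandon the indices of the old maximum
            indices.append(i)
        elif v == best:
            indices.append(i)
    return indices
```

Timing both on data that resembles your real input is the only way to tell which one is faster in practice.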
For your first method, you can set a condition to add the index to the array. Whenever the max changes, you need to clear the array. You don't need to iterate twice.
For the second method, the implementation is easier. You just find max the first go. Then you find the indices that match on the second go.
As stated in a previous answer, the complexity is O(n) in both cases, and measurements are needed to compare performance.
However, I would like to add two points:
The first one is that the performance comparison may depend on the compiler, how optimisation is performed.
The second point is more critical: performance may depend on the input array.
For example, consider the corner case 1, 1, 1, ..., 1, 2, i.e. a huge number of 1s followed by a single 2. With your second approach, you will build a huge temporary array of indices, only to end up with an array of one element. It is possible to shrink the memory allocated to this array at the end. However, I don't like the idea of creating a huge, unnecessary temporary vector, independently of the time performance concern. Note that such an array could go through several reallocations, which would impact time performance.
This is why in the general case, without any knowledge on the input, I would prefer your first approach, two scans. The situation could be different if you want to implement a function dedicated to a specific type of data.
I have a graph with 2n vertices where every edge has a defined length. It looks like this: [image of the graph not preserved].
I'm trying to find the length of the shortest path from u to v (smallest sum of edge lengths), with 2 additional restrictions:
The number of blue edges that the path contains is the same as the number of red edges.
The number of black edges that the path contains is not greater than p.
I have come up with an exponential-time algorithm that I think would work. It iterates through all binary combinations of length n - 1 that represent the path starting from u in the following way:
0 is a blue edge
1 is a red edge
There's a black edge whenever
the combination starts with 1. The first edge (from u) is then the first black one on the left.
the combination ends with 0. The last edge (to v) is then the last black one on the right.
adjacent digits are different. That means we went from a blue edge to a red edge (or vice versa), so there's a black one in between.
This algorithm would ignore the paths that don't meet the 2 requirements mentioned earlier and calculate the length for the ones that do, and then find the shortest one. However doing it this way would probably be awfully slow and I'm looking for some tips to come up with a faster algorithm. I suspect it's possible to achieve with dynamic programming, but I don't really know where to start. Any help would be very appreciated. Thanks.
Seems like a dynamic programming problem to me.
Here, v and u are arbitrary nodes.
Source node: s
Target node: t
For a node v whose outgoing edges are (v,u1) [red/blue] and (v,u2) [black]:
D(v,i,k) = min { ((v,u1) is red ? D(u1,i+1,k) : D(u1,i-1,k)) + w(v,u1) ,
                 D(u2,i,k-1) + w(v,u2) }
D(t,0,k) = 0          for 0 <= k <= p
D(v,i,k) = infinity   for k < 0 //note, for any v
D(t,i,k) = infinity   for i != 0
Explanation:
v - the current node
i - #reds_traversed - #blues_traversed
k - #black_edges_left
The stop clauses are at the target node: you end when reaching it, and reaching it is allowed only with i = 0 and with k >= 0 black edges left, i.e. having used at most p black edges.
The recursive call checks at each point "what is better: going through black or going through red/blue?", and chooses the best solution out of both options.
The idea is, D(v,i,k) is the optimal result to go from v to the target (t), #reds-#blues used is i, and you can use up to k black edges.
From this, we can conclude D(s,0,p) is the optimal result to reach the target from the source.
Since |i| <= n, k<=p<=n - the total run time of the algorithm is O(n^3), assuming implemented in Dynamic Programming.
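A memoized version of this recurrence might look like the sketch below. The graph representation (a dict of outgoing edges with a color label) and the function name are assumptions of mine, since the question's picture is not available. The sketch also assumes the graph is acyclic, and it treats the black-edge budget as exhausted once k drops below zero.

```python
from functools import lru_cache
import math


def constrained_shortest_path(edges, source, target, p):
    """edges: dict node -> list of (next_node, weight, color),
    with color in {"red", "blue", "black"} (assumed format)."""

    @lru_cache(maxsize=None)
    def D(v, i, k):
        # v: current node, i: #reds - #blues so far, k: black edges still allowed
        if k < 0:
            return math.inf                # used more than p black edges
        if v == target:
            return 0 if i == 0 else math.inf
        best = math.inf
        for u, w, color in edges.get(v, ()):
            if color == "red":
                cand = D(u, i + 1, k)
            elif color == "blue":
                cand = D(u, i - 1, k)
            else:                          # black edge
                cand = D(u, i, k - 1)
            best = min(best, cand + w)
        return best

    return D(source, 0, p)
```

A result of math.inf means no path satisfies both restrictions.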
Edit: Somehow I looked at the "Finding shortest path" phrase in the question and ignored the "length of" phrase where the original question later clarified intent. So both my answers below store lots of extra data in order to easily backtrack the correct path once you have computed its length. If you don't need to backtrack after computing the length, my crude version can change its first dimension from N to 2 and just store one odd J and one even J, overwriting anything older. My faster version can drop all the complexity of managing J,R interactions and also just store its outer level as [0..1][0..H] None of that changes the time much, but it changes the storage a lot.
To understand my answer, first understand a crude N^3 answer: (I can't figure out whether my actual answer has better worst case than crude N^3 but it has much better average case).
Note that N must be odd, represent that as N=2H+1. (P also must be odd. Just decrement P if given an even P. But reject the input if N is even.)
Store costs using 3 real coordinates and one implied coordinate:
J = column 0 to N
R = count of red edges 0 to H
B = count of black edges 0 to P
S = side odd or even (S is just B%2)
We will compute/store cost[J][R][B] as the lowest cost way to reach column J using exactly R red edges and exactly B black edges. (We also used J-R blue edges, but that fact is redundant).
For convenience write to cost directly but read it through an accessor c(j,r,b) that returns BIG when r<0 || b<0 and returns cost[j][r][b] otherwise.
Then the innermost step is just:
If (S)
cost[J+1][R][B] = red[J]+min( c(J,R-1,B), c(J,R-1,B-1)+black[J] );
else
cost[J+1][R][B] = blue[J]+min( c(J,R,B), c(J,R,B-1)+black[J] );
Initialize cost[0][0][0] to zero and for the super crude version initialize all other cost[0][R][B] to BIG.
You could super crudely just loop through in increasing J sequence and whatever R,B sequence you like computing all of those.
At the end, we can find the answer as:
min( min(cost[N][H][all odd]), black[N]+min(cost[N][H][all even]) )
But half the R values aren't really part of the problem. In the first half any R>J are impossible and in the second half any R<J+H-N are useless. You can easily avoid computing those. With a slightly smarter accessor function, you could avoid using the positions you never computed in the boundary cases of ones you do need to compute.
If any new cost[J][R][B] is not smaller than a cost of the same J, R, and S but lower B, that new cost is useless data. If the last dim of the structure were map instead of array, we could easily compute in a sequence that drops that useless data from both the storage space and the time. But that reduced time is then multiplied by log of the average size (up to P) of those maps. So probably a win on average case, but likely a loss on worst case.
Give a little thought to the data type needed for cost and the value needed for BIG. If some precise value in that data type is both as big as the longest path and as small as half the max value that can be stored in that data type, then that is a trivial choice for BIG. Otherwise you need a more careful choice to avoid any rounding or truncation.
If you followed all that, you probably will understand one of the better ways that I thought was too hard to explain: This will double the element size but cut the element count to less than half. It will get all the benefits of the std::map tweak to the basic design without the log(P) cost. It will cut the average time way down without hurting the time of pathological cases.
Define a struct CB that contains cost and black count. The main storage is a vector<vector<CB>>. The outer vector has one position for every valid J,R combination. Those are in a regular pattern so we could easily compute the position in the vector of a given J,R or the J,R of a given position. But it is faster to keep those incrementally so J and R are implied rather than directly used. The vector should be reserved to its final size, which is approx N^2/4. It may be best if you pre compute the index for H,0
Each inner vector has C,B pairs in strictly increasing B sequence and within each S, strictly decreasing C sequence . Inner vectors are generated one at a time (in a temp vector) then copied to their final location and only read (not modified) after that. Within generation of each inner vector, candidate C,B pairs will be generated in increasing B sequence. So keep the position of bestOdd and bestEven while building the temp vector. Then each candidate is pushed into the vector only if it has a lower C than best (or best doesn't exist yet). We can also treat all B<P+J-N as if B==S so lower C in that range replaces rather than pushing.
The implied (never stored) J,R pairs of the outer vector start with (0,0) (1,0) (1,1) (2,0) and end with (N-1,H-1) (N-1,H) (N,H). It is fastest to work with those indexes incrementally, so while we are computing the vector for implied position J,R, we would have V as the actual position of J,R and U as the actual position of J-1,R and minU as the first position of J-1,? and minV as the first position of J,? and minW as the first position of J+1,?
In the outer loop, we trivially copy minV to minU and minW to both minV and V, and pretty easily compute the new minW and decide whether U starts at minU or minU+1.
The loop inside that advances V up to (but not including) minW, while advancing U each time V is advanced, and in typical positions using the vector at position U-1 and the vector at position U together to compute the vector for position V. But you must cover the special case of U==minU in which you don't use the vector at U-1 and the special case of U==minV in which you use only the vector at U-1.
When combining two vectors, you walk through them in sync by B value, using one, or the other to generate a candidate (see above) based on which B values you encounter.
Concept: Assuming you understand how a value with implied J,R and explicit C,B is stored: Its meaning is that there exists a path to column J at cost C using exactly R red branches and exactly B black branches, and there does not exist a path to column J using exactly R red branches and the same S in which one of C' or B' is better and the other not worse.
Your exponential algorithm is essentially a depth-first search tree, where you keep track of the cost as you descend.
You could make it branch-and-bound by keeping track of the best solution seen so far, and pruning any branches that would go beyond the best so far.
Or, you could make it a breadth-first search, ordered by cost, so as soon as you find any solution, it is among the best.
The way I've done this in the past is depth-first, but with a budget.
I prune any branches that would go beyond the budget.
Then I run it with budget 0.
If it doesn't find any solutions, I run it with budget 1.
I keep incrementing the budget until I get a solution.
This might seem like a lot of repetition, but since each run visits many more nodes than the previous one, the previous runs are not significant.
This is exponential in the cost of the solution, not in the size of the network.
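The budgeted depth-first search can be sketched as follows. This is a simplified illustration: it assumes positive integer edge lengths, a hypothetical adjacency format, and a max_budget cap so the loop stops when no path exists, and it leaves out the color restrictions from the question, which would become extra pruning tests inside dfs.

```python
def budgeted_search(edges, source, target, max_budget):
    """Return the smallest budget for which a path exists, or None.

    edges: dict node -> list of (next_node, positive integer weight)."""

    def dfs(node, spent, budget):
        if spent > budget:
            return False                   # prune: this branch is over budget
        if node == target:
            return True
        return any(dfs(nxt, spent + w, budget) for nxt, w in edges.get(node, ()))

    for budget in range(max_budget + 1):   # budget 0, then 1, then 2, ...
        if dfs(source, 0, budget):
            return budget
    return None
```

The first budget that succeeds is the cost of a cheapest path, because every smaller budget has already failed.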
The common interview problem of determining the missing value in a range from 1 to N has been done a thousand times over. Variations include 2 missing values up to K missing values.
Example problem: Range [1,10] (1 2 4 5 7 8 9 10) = {3,6}
Here is an example of the various solutions:
Easy interview question got harder: given numbers 1..100, find the missing number(s)
My question is that seeing as the simple case of one missing value is of O(n) complexity and that the complexity of the larger cases converge at roughly something larger than O(nlogn):
Couldn't it just be easier to answer the question by saying sort (mergesort) the range and iterate over it observing the missing elements?
This solution should take no more than O(nlogn) and is capable of solving the problem for ranges other than 1-to-N such as 10-to-1000 or -100 to +100 etc...
Is there any reason to believe that the given solutions in the above SO link will be better than the sorting based solution for larger number of missing values?
Note: It seems a lot of the common solutions to this problem, assume an only number theoretic approach. If one is being asked such a question in an S/E interview wouldn't it be prudent to use a more computer science/algorithmic approach, assuming the approach is on par with the number theoretic solution's complexity...
More related links:
https://mathoverflow.net/questions/25374/duplicate-detection-problem
How to tell if an array is a permutation in O(n)?
You are only specifying the time complexity, but the space complexity is also important to consider.
The problem complexity can be specified in term of N (the length of the range) and K (the number of missing elements).
In the question you link, the solution of using equations is O(K) in space (or perhaps a bit more ?), as you need one equation per unknown value.
There is also the preservation point: may you alter the list of known elements ? In a number of cases this is undesirable, in which case any solution involving reordering the elements, or consuming them, must first make a copy, O(N-K) in space.
I cannot see faster than a linear solution: you need to read all known elements (N-K) and output all unknown elements (K). Therefore you cannot get better than O(N) in time.
Let us break down the solutions
Destroying, O(N) space, O(N log N) time: in-place sort
Preserving, O(K) space ?, O(N log N) time: equation system
Preserving, O(N) space, O(N) time: counting sort
Personally, though I find the equation system solution clever, I would probably use either of the sorting solutions. Let's face it: they are much simpler to code, especially the counting sort one!
And as far as time goes, in a real execution, I think the "counting sort" would beat all other solutions hands down.
Note: the counting sort does not require the range to be [0, X), any range will do, as any finite range can be transposed to the [0, X) form by a simple translation.
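The counting-sort idea, including the translation to [0, X), fits in a few lines. This is a sketch with an illustrative function name; it assumes every input value lies within [lo, hi].

```python
def missing_numbers(values, lo, hi):
    """Return the numbers in [lo, hi] that do not occur in values."""
    seen = [False] * (hi - lo + 1)         # one flag per number in the range
    for v in values:
        seen[v - lo] = True                # translate [lo, hi] to [0, hi - lo]
    return [lo + i for i, present in enumerate(seen) if not present]
```

It preserves the input, and uses O(N) time and O(N) space.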
EDIT:
Changed the sort to O(N), one needs to have all the elements available to sort them.
Having had some time to think about the problem, I also have another solution to propose. As noted, when N grows (dramatically) the space required might explode. However, if K is small, then we could change our representation of the list, using intervals:
{4, 5, 3, 1, 7}
can be represented as
[1,1] U [3,5] U [7,7]
In the average case, maintaining a sorted list of intervals is much less costly than maintaining a sorted list of elements, and it's as easy to deduce the missing numbers too.
The time complexity is easy: O(N log N), after all it's basically an insertion sort.
Of course what's really interesting is that there is no need to actually store the list, thus you can feed it with a stream to the algorithm.
On the other hand, I have quite a hard time figuring out the average space complexity. The "final" space occupied is O(K) (at most K+1 intervals), but during the construction there will be much more missing intervals as we introduce the elements in no particular order.
The worst case is easy enough: N/2 intervals (think odd vs even numbers). I cannot however figure out the average case though. My gut feeling is telling me it should be better than O(N), but I am not that trusting.
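To make the interval representation concrete, here is a minimal sketch (class and method names are illustrative). It keeps the interval starts and ends in two sorted lists and merges neighbours as elements arrive. Note the assumption: Python list insertion and deletion are linear, so a balanced tree would be needed to actually get the O(N log N) bound mentioned above.

```python
from bisect import bisect_right


class IntervalSet:
    """Sorted, disjoint, non-adjacent closed intervals of integers."""

    def __init__(self):
        self.starts = []
        self.ends = []

    def add(self, x):
        i = bisect_right(self.starts, x)           # intervals with start <= x
        if i > 0 and self.ends[i - 1] >= x:
            return                                 # x is already covered
        left_adj = i > 0 and self.ends[i - 1] == x - 1
        right_adj = i < len(self.starts) and self.starts[i] == x + 1
        if left_adj and right_adj:                 # x bridges two intervals
            self.ends[i - 1] = self.ends[i]
            del self.starts[i]
            del self.ends[i]
        elif left_adj:
            self.ends[i - 1] = x
        elif right_adj:
            self.starts[i] = x
        else:
            self.starts.insert(i, x)
            self.ends.insert(i, x)

    def intervals(self):
        return list(zip(self.starts, self.ends))

    def missing(self, lo, hi):
        """Numbers in [lo, hi] not covered by any interval."""
        result, prev = [], lo - 1
        for s, e in zip(self.starts, self.ends):
            result.extend(range(prev + 1, s))
            prev = e
        result.extend(range(prev + 1, hi + 1))
        return result
```

Feeding it 4, 5, 3, 1, 7 one at a time yields the intervals [1,1], [3,5], [7,7], and the gaps between them are the missing numbers 2 and 6.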
Whether the given solution is theoretically better than the sorting one depends on N and K. While your solution has complexity of O(N*log(N)), the given solution is O(N*K). I think that the given solution is (same as the sorting solution) able to solve any range [A, B] just by transforming the range [A, B] to [1, N].
What about this?
create your own set containing all the numbers
remove the given set of numbers from your set (no need to sort)
What's left in your set are the missing numbers.
My question is that seeing as the [...] cases converge at roughly
something larger than O(nlogn) [...]
In 2011 (after you posted this question) Caf posted a simple answer that solves the problem in O(n) time and O(k) space [where the array size is n - k].
Importantly, unlike in other solutions, Caf's answer has no hidden memory requirements (using bit arrays, adding numbers to elements, multiplying elements by -1 - these would all require O(log(n)) space).
Note: The question here (and the original question) didn't ask about the streaming version of the problem, and the answer here doesn't handle that case.
Regarding the other answers: I agree that many of the proposed "solutions" to this problem have dubious complexity claims, and if their time complexities aren't better in some way than either:
count sort (O(n) time and space)
compare (heap) sort (O(n*log(n)) time, O(1) space)
...then you may as well just solve the problem by sorting.
However, we can get better complexities (and more importantly, genuinely faster solutions):
Because the numbers are taken from a small, finite range, they can be 'sorted' in linear time.
All we do is initialize an array of 100 booleans, and for each input, set the boolean corresponding to each number in the input, and then step through reporting the unset booleans.
If there are total N elements where each number x is such that 1 <= x <= N then we can solve this in O(nlogn) time complexity and O(1) space complexity.
First sort the array using quicksort or mergesort.
Scan through the sorted array and if the difference between previously scanned number, a and current number, b is equal to 2 (b - a = 2), then the missing number is a+1. This can be extended to condition where (b - a > 2).
Time complexity is O(nlogn)+O(n) almost equal to O(nlogn) when N > 100.
I already answered it HERE
You can also create an array of boolean of the size last_element_in_the_existing_array + 1.
In a for loop mark all the element true that are present in the existing array.
In another for loop print the index of the elements which contains false AKA The missing ones.
Time Complexity: O(last_element_in_the_existing_array)
Space Complexity: O(array.length)
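If the range is known ahead of time (in this case [1,10]), you can XOR all the numbers in the range with all the numbers given to you. Since XOR is commutative and associative, every number that is present cancels out. With a single missing value, what remains is that value. With two missing values, as here, what remains is their XOR, so an extra step is needed to separate them:
(1 2 3 4 5 6 7 8 9 10) XOR (1 2 4 5 7 8 9 10) = 3 XOR 6 = 5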
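A quick way to check what the XOR trick actually yields is the sketch below (the function name is illustrative). It returns the XOR of all missing values, which is the missing value itself only when exactly one number is absent.

```python
from functools import reduce
from operator import xor


def xor_of_missing(values, lo, hi):
    """XOR of every number in [lo, hi] with every given value."""
    return reduce(xor, range(lo, hi + 1), 0) ^ reduce(xor, values, 0)
```

With 3 removed from [1,10] the result is 3. With both 3 and 6 removed the result is 5 (binary 011 XOR 110 = 101), so the two values still have to be separated by another step.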