Most efficient algorithm to get the union of 2 ordered lists - C++

I need to find the union of 2 descending ordered lists (list1 and list2), where the union
would be each element from both lists without duplicates. Assume the list elements are integers. I am
using big O notation to determine the most efficient algorithm to solve this problem. I know the big
O notation for the 1st, but I do not know the big O notation for the 2nd. Can someone tell me the
big O notation of the 2nd algorithm so I can decide which algorithm to implement? If someone knows a
better algorithm than one of these, could you help me understand that as well? Thanks in advance.
Here are my two algorithms...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Algorithm #1: O(N log₂ N)
Starting at the first element of list1,
while (list1 is not at the end of the list) {
    if (the current element in list1 is not in list2)  // binary search -> O(log₂ N)
        add the current element in list1 to list2
    go to the next element in list1
}
list2 is now the union of the 2 lists
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Algorithm #2: O(?)
Starting at the first elements of each list,
LOOP_START:
compare the current elements of the lists
whichever element is greater, put into a 3rd list called list3
go to the next element in the list whose element was just inserted into list3
branch to LOOP_START until either list1 or list2 is at the end of its respective list
insert the remaining elements from either list1 or list2 into list3 (the union)
list3 now contains the union of list1 and list2

Here's my assessment of the situation
Your first algorithm runs in n log n time: you are doing the binary search for every element in the first list, right?
Your second algorithm is not entirely complete: you don't say what to do if the elements in the two lists are equal. However, given the right logic for dealing with equal elements, your second algorithm is like the merge part of merge sort: it will run in linear time (i.e. N). It is optimal, in the sense that you cannot do better: you cannot merge two ordered lists without looking at every element in both lists at least once.

The second is O(n+m) while the first is O(n log(m) + m). Thus the second is significantly better.

With the following algorithm you can have the two lists merged in O(n+m).
[Sorry, I have used Python for simplicity, but the algorithm is the same in every language.]
Note that the algorithm also maintains the items sorted in the result list.
def merge(list1, list2):
    result = []
    i1 = 0
    i2 = 0
    # iterate over the two lists
    while i1 < len(list1) and i2 < len(list2):
        # if the current items are equal, add just one and go to the next two items
        if list1[i1] == list2[i2]:
            result.append(list1[i1])
            i1 += 1
            i2 += 1
        # if the item of list1 is greater than the item of list2, add it and go to the next item of list1
        elif list1[i1] > list2[i2]:
            result.append(list1[i1])
            i1 += 1
        # if the item of list2 is greater than the item of list1, add it and go to the next item of list2
        else:
            result.append(list2[i2])
            i2 += 1
    # add the remaining items of list1
    while i1 < len(list1):
        result.append(list1[i1])
        i1 += 1
    # add the remaining items of list2
    while i2 < len(list2):
        result.append(list2[i2])
        i2 += 1
    return result

print(merge([10, 8, 5, 1], [12, 11, 7, 5, 2]))
Output:
[12, 11, 10, 8, 7, 5, 2, 1]
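Since the question asks about C++, here is a minimal sketch of the same linear merge in C++ (my translation, not part of the original answer). It assumes both vectors are sorted in descending order with no internal duplicates; std::set_union performs exactly this merge in O(N + M).

#include <algorithm>
#include <functional>
#include <iostream>
#include <iterator>
#include <vector>

int main() {
    std::vector<int> list1{10, 8, 5, 1};
    std::vector<int> list2{12, 11, 7, 5, 2};
    std::vector<int> result;
    // std::set_union emits each element once when neither input repeats it;
    // std::greater<int>() matches the descending order of the inputs.
    std::set_union(list1.begin(), list1.end(),
                   list2.begin(), list2.end(),
                   std::back_inserter(result), std::greater<int>());
    for (int v : result) std::cout << v << ' ';  // 12 11 10 8 7 5 2 1
    std::cout << '\n';
}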

Complexity Analysis:
Say the length of list 1 is N and that of list 2 is M.
Algorithm 1:
At the risk of sounding incredible, I would argue that the complexity of this algorithm as written is O(N*M), not O(N log M).
For each element in list 1 (O(N)), we search for it in list 2 (O(log M)). The complexity of this algorithm therefore 'seems' to be O(N log M).
However, we are also inserting elements into list 2. Each new element must be inserted in its proper place so that list 2 remains sorted for further binary search operations. If we are using an array as the data structure, the insertion takes O(M) time.
Hence the order of complexity is O(N*M) for the algorithm as is.
A modification can be made wherein each new element is inserted at the end of list 2 (the list is then no longer ordered) and the binary search runs over index 0 to M-1 rather than the new size minus 1. In this case the complexity is O(N log M), since we carry out N binary searches in a list of length M.
To make the list ordered again, we have to merge the two ordered parts (0 to M-1 and M to newSize-1). This can be done in O(N+M) time (one merge step of merge sort on an array of length N+M). Hence the net time complexity of this algorithm is
O(N log M + N + M)
Space complexity is O(max(N,M)), not counting the original lists' space and only considering the extra space required in list 2.
Algorithm 2:
At each iteration, we move at least one pointer forward. The total distance travelled by both pointers is N + M. Hence the worst-case time complexity is O(N+M), which is better than the 1st algorithm.
However, the space complexity required in this case is larger: O(N+M).

Here is another approach:
Iterate through both lists, and insert all the values into a set.
This will remove all duplicates and the result will be the union of two lists.
Two important notes: you'll lose the order of the numbers, and it takes additional space.
Time complexity: O(n + m)
Space Complexity: O(n + m)
If you need to maintain the order of the result set, use some custom version of a LinkedHashMap.
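In C++ terms, a minimal sketch of this approach could look as follows (std::unordered_set is my stand-in for the unspecified set type; the LinkedHashMap mentioned above is a Java structure):

#include <unordered_set>
#include <vector>

std::unordered_set<int> listUnion(const std::vector<int>& a,
                                  const std::vector<int>& b) {
    std::unordered_set<int> s(a.begin(), a.end());  // O(n) expected inserts
    s.insert(b.begin(), b.end());                   // O(m) expected inserts
    return s;  // duplicates removed; iteration order is unspecified
}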

Note that algorithm 2 will not work if the input lists are not sorted. Sorting them first takes O(m*lg(m) + n*lg(n)).
Alternatively, you can build a hash table on the first list, then for each item from the second list check whether it exists in the hash table. This works in O(m+n).
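A hedged C++ sketch of that hash-table idea (the name hashUnion and the choice of std::unordered_set are mine, not from the answer): the union is the first list plus every element of the second list not found in the table.

#include <unordered_set>
#include <vector>

std::vector<int> hashUnion(const std::vector<int>& list1,
                           const std::vector<int>& list2) {
    // build a hash table on the first list: O(n) expected
    std::unordered_set<int> seen(list1.begin(), list1.end());
    std::vector<int> result(list1.begin(), list1.end());
    for (int v : list2)          // check each item of the second list: O(m)
        if (!seen.count(v))
            result.push_back(v);
    return result;  // the union, but not ordered
}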

There are a few things that need to be specified:
Do the input lists contain duplicates?
Must the result be ordered?
I'll assume that, using std::list, you can cheaply insert at the head or at the tail.
Let's say List 1 has N elements and List 2 has M elements.
Algorithm 1
It iterates over every item of List 1 searching for it in List 2.
Assuming that there may be duplicates and that the result must be ordered, the worst case for the search is when no element of List 1 exists in List 2, hence it's at least:
O(N × M).
To insert an item of List 1 in the right place, you need to iterate List 2 again up to the point of insertion. The worst case is when every item in List 1 is smaller (if List 2 is searched from the beginning) or greater (if List 2 is searched from the end). Since the previous items of List 1 have been inserted in List 2, there are M iterations for the first item, M + 1 for the second, M + 2 for the third, etc., and M + N - 1 iterations for the last item, for an average of M + (N - 1) / 2 per item.
Something like:
N × (M + (N - 1) / 2)
For big-O notation, constant factors don't matter, so:
N × (M + (N - 1))
For big-O notation, non-variable additions don't matter, so:
O(N × (M + N))
Adding to the original O(N × M):
O(N × M) + O(N × (M + N))
O(N × M) + O(N × M + N²)
The second equation is just to make the constant factor elimination evident, e.g. 2 × (N × M), thus:
O(N × (M + N))
O(N² + N × M)
These two are equivalent; use whichever you like the most.
Possible optimizations:
If the result doesn't have to be ordered, insertion can be O(1), hence the worst-case time is:
O(N × M)
Don't just test each List 1 item against List 2 by equality; also test with e.g. greater-than, so that you can stop searching List 2 once List 1's item is greater than List 2's item; this wouldn't reduce the worst case, but it would reduce the average case
Keep the List 2 iterator that points to where List 1's item was found to be greater than List 2's item, to make the sorted insertion O(1); on insertion make sure to keep an iterator that starts at the inserted item, because although List 1 is ordered, it might contain duplicates; with these two, the worst-case time becomes:
O(N × M)
For the next iterations, search for List 1's item in the rest of List 2 with the iterator we kept; this reduces the worst case, because if you reach the end of List 2, you'll just be "removing" duplicates from List 1; with these three, the worst-case time becomes:
O(N + M)
By this point, the only difference between this algorithm and Algorithm 2 is that List 2 is changed to contain the result, instead of creating a new list.
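As a rough illustration of those optimizations, here is a sketch under my own naming (assuming descending std::lists, where list1 may contain duplicates): the kept iterator means List 2 is scanned only once overall.

#include <list>

void unionInto(std::list<int>& list2, const std::list<int>& list1) {
    auto it = list2.begin();  // the iterator we keep between searches
    for (int v : list1) {
        // advance in list2 without restarting from the beginning
        while (it != list2.end() && *it > v) ++it;
        if (it == list2.end() || *it != v)
            it = list2.insert(it, v);  // O(1) sorted insertion before *it
        // it now points at v, so a duplicate of v in list1 is skipped
    }
}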
Algorithm 2
This is the merging of the merge sort.
You'll be walking every element of List 1 and every element of List 2 once, and insertion is always made at the head or tail of the list, hence the worst-case time is:
O(N + M)
If there are duplicates, they're simply discarded. The result is more easily made ordered than not.
Final Notes
If there are no duplicates, insertion can be optimized in both cases. For instance, with doubly-linked lists, we can easily check if the last element in List 1 is greater than the first element in List 2 or vice-versa, and simply concatenate the lists.
This can be further generalized for any tail of List 1 and List 2. For instance, in Algorithm 1, if a List 1's item is not found in List 2, we can concatenate List 2 and the tail of List 1. In Algorithm 2, this is done in the last step.
The worst case, when List 1's items and List 2's items are interleaved, is not reduced, but again the average case is reduced, and in many cases by a big factor that makes a big difference In Real Life™.
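For instance, a minimal sketch of the concatenation shortcut described above with std::list (my reading of the note; the function name is hypothetical):

#include <list>

void concatIfDisjoint(std::list<int>& list1, std::list<int>& list2) {
    // both lists sorted descending: if list1's last (smallest) element is
    // still greater than list2's first (largest), the union is just the
    // concatenation, and splice does it in O(1)
    if (!list1.empty() && !list2.empty() && list1.back() > list2.front())
        list1.splice(list1.end(), list2);
}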
I ignored:
Allocation times
Worst-case space differences between the algorithms
Binary search, because you mentioned lists, not arrays or trees
I hope I didn't make any blatant mistake.

I implemented a TypeScript (JS) based union operation for two arrays of objects in one of my previous projects. The data was too large and the default library functions like underscore or lodash were not performant. After some brain hunting I came up with the binary-search-based algorithm below. Hope it helps someone with performance tuning.
As far as complexity is concerned, each lookup is a binary search, O(log(N)); note, though, that the initial sorts make the overall algorithm O((N+M) log(N+M)).
Basically the code takes two unordered object arrays and a key name to compare, and:
1) sorts the arrays
2) iterates through each element of the first array and deletes it from the second array
3) concatenates the resulting second array onto the first array.
private sortArrays = (arr1: Array<Object>, arr2: Array<Object>, propertyName: string): void => {
    function comparer(a, b) {
        if (a[propertyName] < b[propertyName])
            return -1;
        if (a[propertyName] > b[propertyName])
            return 1;
        return 0;
    }
    arr1.sort(comparer);
    arr2.sort(comparer);
}

private difference = (arr1: Array<Object>, arr2: Array<Object>, propertyName: string): Array<Object> => {
    this.sortArrays(arr1, arr2, propertyName);
    for (var i = 0; i < arr1.length; i++) {
        var obj = { loc: 0 };
        if (this.OptimisedBinarySearch(arr2, arr2.length, obj, arr1[i], propertyName))
            arr2.splice(obj.loc, 1);
    }
    return arr2;
}

private OptimisedBinarySearch = (arr, size, obj, val, propertyName): boolean => {
    var first = 0;
    var last = size - 1;
    var mid;
    if (!arr.length)
        return false;
    while (arr[first][propertyName] <= val[propertyName] && val[propertyName] <= arr[last][propertyName]) {
        mid = first + Math.floor((last - first) / 2);
        if (val[propertyName] == arr[mid][propertyName]) {
            obj.loc = mid;
            return true;
        }
        else if (val[propertyName] < arr[mid][propertyName])
            last = mid - 1;
        else
            first = mid + 1;
    }
    return false;
}

private UnionAll = (arr1, arr2, propertyName): Array<Object> => {
    return arr1.concat(this.difference(arr1, arr2, propertyName));
}

// example
var YourFirstArray = [{x: 1}, {x: 2}, {x: 3}];
var YourSecondArray = [{x: 0}, {x: 1}, {x: 2}, {x: 3}, {x: 4}, {x: 5}];
var keyName = "x";
this.UnionAll(YourFirstArray, YourSecondArray, keyName);

Related

Moving window RMQ performance improvement

Say I have an array of integers A of length N, also I have an integer L <= N.
What I am trying to find is the minimum of the ranges [0, L-1], [1, L], [2, L+1], ..., [N-L, N-1]
(like a moving window of length L from left to right)
My algorithm now is O(N lg N) with O(N lg N) preprocess:
Save all the numbers A[0...L-1] in a multiset S, and also store the numbers in a queue Q in order. The minimum of [0, L-1] is simply the first element of S. O(N lg N)
Pop out the first element of Q, find this element in S and delete it. Then push A[L] in S. The minimum of [1, L] is simply the first element of S. O(lg N)
Repeat step 2 for all possible range, move to next element each iteration. O(N)
Total is O(N lg N).
I wonder if there is any algorithm which can achieve better than this with following requirements:
Preprocess time (if needed) is O(N)
Query time is O(1)
I have done some research on RMQ; the nearest method I found uses a sparse table, which achieves O(1) query time but O(N lg N) preprocess time. Another method, which reduces RMQ to an LCA problem, can meet the requirements but needs some restrictions on the array A.
So is it possible that, with no restriction on A, the requirements can be fulfilled when solving my problem?
Yes, use a deque. We will keep the elements sorted in ascending order, so the first element is always the minimum in [i - L + 1, i] for the current position i. We won't keep the actual elements, but their positions.
d = empty deque
for i = 0 to n-1:
    // get rid of too old elements
    while !d.empty && i - d.front + 1 > L:
        d.pop_front()
    // keep the deque sorted
    while !d.empty && A[d.back] > A[i]:
        d.pop_back()
    d.push_back(i)
    // A[d.front] is the minimum in [i - L + 1, i]
Since every element enters and leaves the deque at most once, this is O(n).
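A C++ rendering of the pseudocode above (windowMinima is my name for it; it assumes 1 <= L <= N and returns the N - L + 1 window minima):

#include <deque>
#include <vector>

std::vector<int> windowMinima(const std::vector<int>& A, int L) {
    std::deque<int> d;  // positions, with values in ascending order
    std::vector<int> mins;
    for (int i = 0; i < (int)A.size(); ++i) {
        // get rid of too old elements
        while (!d.empty() && i - d.front() + 1 > L) d.pop_front();
        // keep the deque sorted
        while (!d.empty() && A[d.back()] > A[i]) d.pop_back();
        d.push_back(i);
        if (i >= L - 1) mins.push_back(A[d.front()]);  // min of [i-L+1, i]
    }
    return mins;
}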

How can I merge, split and query k-th of sorted lists?

Initially I have n elements, they are in n tiles.
I need to support 3 kinds of queries:
merge two tiles into one tile.
split one tile into two tiles. (Formally for a tile of size k, split it into two tiles of size k1 and k2, k=k1+k2, the first tile contains the smallest k1 elements and the second tile contains the rest)
find the k-th smallest element in one tile.
Still assuming there are n queries. What worst-case time complexity can I achieve?
That will not be a complete answer, but some thoughts on what can be done.
My idea is based on skip list.
Let every tile be an indexable sorted skip list.
Splitting is then rather simple: find the k-th element and break every link between the i-th and j-th elements where i > k1 and j <= k1 (there are at most O(log n) such links).
Merging is trickier.
First, assume that we can concatenate two skiplists in O(log n).
Lets say we are merging two tiles T1 and T2.
Compare the first elements, t1 from T1 and t2 from T2. Let's say t1 < t2.
Then, find the last t1' still less than t2 in T1.
We must insert t2 right after t1'. But first, we are looking at the element t1* right after t1' in T1.
Now search for the last t2' still less than t1* in T2.
An entire sequence of elements from T2, starting at t2 and ending at t2', must be inserted between t1' and t1*.
So, we are doing split at t1' and t2', obtaining new lists T1a, T1b, T2a, T2b.
We concatenate T1a, T2a and T1b, obtaining the new list T1*.
We repeat the entire process for T1* and T2b.
In some pseudo-python-code:
#skiplist interface:
# split(list, k) - splits list after the k-th element, returns two lists
# concat(list1, list2) - concatenates two lists, returns the new one
# index(list, k) - returns k-th element from the list
# upper_bound(list, val) - returns the index of the last element less that val
# empty(list) - check if list is empty
def Query(tile, k)
return index(tile, k)
def Split(tile, k)
return split(tile, k)
def Merge(tile1, tile2):
if empty(tile1):
return tile2
if empty(tile2):
return tile1
t1 = index(tile1, 0)
t2 = index(tile2, 0)
if t1 < t2:
#(1)
i1 = upper_bound(tile1, t2)
t1s = index(tile1, i1 + 1)
i2 = upper_bound(tile2, t1s)
t1_head, t1_tail = split(tile1, i1)
t2_head, t2_tail = split(tile2, i2)
head = concat(t1_head, t2_head)
tail = Merge(t1_tail, t2_tail)
return concat(head, tail)
else:
#swap tile1, tile2, do (1)
There are at most O(p) such iterations, where p is the number of interleaved runs in T1 and T2. Every iteration takes O(log n) operations to complete.
As noted by #newbie, there is an example where the sum of the ps equals n log n.
This python script generates such an example for k = log_2 n (plus sign in the output stands for merge):
def f(l):
    if len(l) == 2:
        return "%s+%s" % (l[0], l[1])
    if len(l) == 1:
        return str(l[0])
    l1 = [l[i] for i in range(0, len(l), 2)]
    l2 = [l[i + 1] for i in range(0, len(l), 2)]
    l_str = f(l1)
    r_str = f(l2)
    return "(%s)+(%s)" % (l_str, r_str)

def example(k):
    print(f(list(range(0, 2 ** k))))
For n = 16:
example(4)
Gives us the following queries:
(
(
(0+8)+(4+12)
)
+
(
(2+10)+(6+14)
)
)
+
(
(
(1+9)+(5+13)
)
+
(
(3+11)+(7+15)
)
)
This is a binary tree where, at height j, we are merging 2^(k-j) tiles of size 2^j. The tiles are constructed in such a way that their elements are always interleaved, so for tiles of size q we are doing O(q) splits-concatenations.
However, it still doesn't worsen the overall complexity of O(n log n) for this specific case, as (highly informally speaking) each split-concatenation of the 'small' lists costs less than O(log n), and there are many more 'small' lists than 'big' ones.
I'm not sure if there are worse counterexamples, but for now I think the overall worst case complexity for n queries is somewhere between n log^2 n and n log n.
Look for:
std::merge or std::set_union
std::partition
std::find (or std::find_if)
Linear complexity for 1 and 2.
Depends on your container for 3, linear at worst.
But it's not clear what you're asking exactly. Do you have some code we can look at ?
When I asked this question I didn't know how to solve it; since it seems that it's okay to answer your own question, I'm going to answer it myself :/
First, let's suppose the values in the sorted lists are integers between 1 and n. If not, you may just sort and map them.
Let's build a segment tree for every sorted list; the segment trees are built over the values (1 to n). Every node of a segment tree stores how many numbers fall in its range; let's call this the value of the node.
It seems to require O(n log n) space to store every segment tree, but we can simply drop the nodes whose value is 0, and really allocate these nodes only when their value becomes > 0.
So for a sorted list with only one element, we simply build a chain down to that value, so only O(log n) memory is needed.
int s[SZ];      // value of a node
int ch[SZ][2];  // a node's two children

// build a segment tree containing only the value p, returned via the first argument
// call with something like build(root, 1, n, value);
void build(int& x, int l, int r, int p)
{
    x = /*a new node*/; s[x] = 1;
    if (l == r) return;
    int m = (l + r) >> 1;
    if (p <= m) build(ch[x][0], l, m, p);
    else build(ch[x][1], m + 1, r, p);
}
When we split a segment tree (sorted list), we simply split the two children recursively:
// make a new node t2, split t1 into t1 and t2 so that s[t1] = k
void split(int t1, int& t2, int k)
{
    t2 = /*a new node*/;
    int ls = s[ch[t1][0]];                            // size of t1's left child
    if (k > ls) split(ch[t1][1], ch[t2][1], k - ls);  // split the right child of t1
    else swap(ch[t1][1], ch[t2][1]);                  // the whole right child belongs to t2
    if (k < ls) split(ch[t1][0], ch[t2][0], k);       // split the left child of t1
    s[t2] = s[t1] - k; s[t1] = k;
}
When we merge two sorted lists, we merge their trees node by node:
// merge trees t1 & t2, return the merged segment tree
int merge(int t1, int t2)
{
    if (!t1 || !t2) return t1 ^ t2;  // nothing to merge: return the non-empty one
    ch[t1][0] = merge(ch[t1][0], ch[t2][0]);
    ch[t1][1] = merge(ch[t1][1], ch[t2][1]);
    s[t1] += s[t2]; /*erase t2, it's useless now*/ return t1;
}
It looks very simple, doesn't it? But the total complexity is in fact O(n log n).
Proof:
Let's investigate the total number of allocated segment tree nodes.
Initially we allocate O(n log n) such nodes (O(log n) for each element).
For every split we allocate at most O(log n) more, so the total is still O(n log n): at each level we recursively split only either the left child or the right child of a node.
So the total number of segment tree nodes ever allocated is at most O(n log n).
Now consider merging: except for the 'nothing to merge' case, every call to merge decreases the total number of allocated segment tree nodes by 1 (t2 isn't useful anymore). 'Nothing to merge' is only reached when its parent is really merged, so those calls don't affect the complexity.
The total number of allocated segment tree nodes is O(n log n), and every useful merge decreases it by 1, so the total cost of all merges is O(n log n).
Summing up, we've got the result.
Querying the k-th element is also very simple, and we're done :)
// query the k-th element of segment tree x over the value range [l, r]
int ask(int x, int l, int r, int k)
{
    if (l == r) return l;
    int ls = s[ch[x][0]];  // how many numbers are in the left child
    int m = (l + r) >> 1;
    if (k > ls) return ask(ch[x][1], m + 1, r, k - ls);
    return ask(ch[x][0], l, m, k);
}

Big O notation for duplicate function, C++

What is the Big O notation for the function described in the screenshot?
It would take O(n) to go through all the numbers, but once it finds the numbers and removes them, what would that be? Would the removed part be a constant a? And would the function then have to iterate through the numbers again?
This is what I am thinking for Big O:
T(n) = n + a + (n-a), or something involving having to iterate through (n-a) steps after the first duplicate is found. Then would big O be O(n)?
Big O notation is considering the worst case. Let's say we need to remove all duplicates from the array A=[1..n]. The algorithm will start with the first element and check every remaining element - there are n-1 of them. Since all values happen to be different it won't remove any from the array.
Next, the algorithm selects the second element and checks the remaining n-2 elements in the array. And so on.
When the algorithm arrives at the final element it is done. The total number of comparisons is the sum (n-1) + (n-2) + ... + 2 + 1 + 0. Through the power of maths, this sum becomes (n-1)*n/2, and the dominating term is n^2, so the algorithm is O(n^2).
This algorithm is O(n^2). Because for each element in the array you are iterating over the array and counting the occurrences of that element.
foreach item in array
    count = 0
    foreach other in array
        if item == other
            count += 1
    if count > 1
        remove item
As you see there are two nested loops in this algorithm which results in O(n*n).
Removed items don't affect the worst case. Consider an array containing only unique elements: no element is removed from such an array.
Note: A naive implementation of this algorithm could result in O(n^3) complexity.
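For reference, here is my C++ rendering of the pseudocode above; the in-place erase is exactly what can push a naive implementation to O(n^3), as noted.

#include <vector>

void removeDuplicated(std::vector<int>& a) {
    for (std::size_t i = 0; i < a.size(); ) {
        int count = 0;
        for (int other : a)          // inner O(n) scan
            if (other == a[i]) ++count;
        if (count > 1)
            a.erase(a.begin() + i);  // erase is itself O(n): O(n^3) overall
        else
            ++i;                     // keep the now-unique element
    }
}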
Starting with the first element, you go through all the other elements in the vector; that's n-1 steps. You do that for each of the n elements, so the worst case is (n * (n-1))/2. n steps is the best case (e.g. when all elements are 4).

What do move and key comparison mean in C++?

The following is from a PPT about Insertion Sort in my class:
void insertionSort(DataType theArray[], int n) {
    for (int unsorted = 1; unsorted < n; ++unsorted) {
        DataType nextItem = theArray[unsorted];
        int loc = unsorted;
        for (; (loc > 0) && (theArray[loc-1] > nextItem); --loc)
            theArray[loc] = theArray[loc-1];
        theArray[loc] = nextItem;
    }
}
-
Running time depends on not only the size of the array but also the contents of the array.
Best-case: O(n)
Array is already sorted in ascending order.
Inner loop will not be executed.
>>>> The number of moves: 2*(n-1) → O(n)
>>>> The number of key comparisons: (n-1) → O(n)
Worst-case: O(n²)
Array is in reverse order:
Inner loop is executed p-1 times, for p = 2, 3, ..., n
The number of moves: 2*(n-1) + (1+2+...+n-1) = 2*(n-1) + n*(n-1)/2 → O(n²)
The number of key comparisons: (1+2+...+n-1) = n*(n-1)/2 → O(n²)
Average-case: O(n²)
We have to look at all possible initial data organizations.
So, Insertion Sort is O(n²).
What exactly are move and key comparison? I couldn't find an explanation on Google.
Let me word the algorithm first.
Assume that at a given time the array has two parts: index 0 to loc - 1 is sorted in ascending order, and index loc to n - 1 is unsorted.
Start with the element at loc, find its correct place in the sorted part of the array, and insert it there.
So there are two loops:
The outer loop runs loc from 1 to n; it partitions the array into a sorted and an unsorted part.
The inner loop finds the position of the element at loc within the sorted part of the array (0 to loc - 1).
For the inner loop, to find the correct location, you have to compare the element at loc with, in the worst case, all the elements in the sorted part of the array. This is a key comparison.
To insert, you have to open a gap in the sorted part of the array for the element at loc. This is done by shifting each element in the sorted part to the next position. This is a move.
Move is the number of swaps the algorithm has to perform in order to sort the data, and the keys are the data that are compared.
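To make the two counts concrete, here is the slide's insertion sort instrumented to count key comparisons and moves (the counters and the int element type are my additions):

#include <iostream>

void insertionSortCounted(int theArray[], int n) {
    long comparisons = 0, moves = 0;
    for (int unsorted = 1; unsorted < n; ++unsorted) {
        int nextItem = theArray[unsorted];
        ++moves;  // copying nextItem out is one move
        int loc = unsorted;
        while (loc > 0) {
            ++comparisons;  // key comparison: theArray[loc-1] > nextItem
            if (theArray[loc - 1] > nextItem) {
                theArray[loc] = theArray[loc - 1];
                ++moves;    // shifting an element right is one move
                --loc;
            } else break;
        }
        theArray[loc] = nextItem;
        ++moves;  // copying nextItem back is one move
    }
    // an already-sorted input gives (n-1) comparisons and 2*(n-1) moves
    std::cout << "comparisons: " << comparisons << ", moves: " << moves << '\n';
}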

O(log n) algorithm to find the element having rank i in union of pre-sorted lists

Given two sorted lists, each containing n real numbers, is there a O(log n) time algorithm to compute the element of rank i (where i coresponds to index in increasing order) in the union of the two lists, assuming the elements of the two lists are distinct?
EDIT:
@Ben: This is what I have been doing, but I am still not getting it.
Here is an example:
List A : 1, 3, 5, 7
List B : 2, 4, 6, 8
Find rank(i) = 4.
First step: i/2 = 2.
List A is now: 1, 3
List B is now: 2, 4
Compare A[i] to B[i], i.e.
A[i] is less;
So the lists now become:
A: 3
B: 2, 4
Second step: i/2 = 1.
List A is now: 3
List B is now: 2
Now I have lost the value 4, which is actually the result...
I know I am missing something, but even after close to a day of thinking I can't figure this one out...
Yes:
You know the element lies within either index [0,i] of the first list or [0,i] of the second list. Take element i/2 from each list and compare. Proceed by bisection.
I'm not including any code because this problem sounds a lot like homework.
EDIT: Bisection is the method behind binary search. It works like this:
Assume i = 10 (zero-based indexing; we're looking for the 11th element overall).
On the first step, you know the answer is either in list1(0...10) or list2(0...10). Take a = list1(5) and b = list2(5).
If a > b, then there are 5 elements in list1 which come before a, and at least 6 elements in list2 which come before a. So a is an upper bound on the result. Likewise there are 5 elements in list2 which come before b and fewer than 6 elements in list1 which come before b. So b is a lower bound on the result. Now we know that the result is either in list1(0..5) or list2(5..10). If a < b, then the result is either in list1(5..10) or list2(0..5). And if a == b we have our answer (but the problem said the elements were distinct, therefore a != b).
We just repeat this process, cutting the size of the search space in half at each step. Bisection refers to the fact that we choose the middle element (bisector) out of the range we know includes the result.
So the only difference between this and binary search is that in binary search we compare to a value we're looking for, but here we compare to a value from the other list.
NOTE: this is actually O(log i), which is at least as good as O(log n). Furthermore, for small i (perhaps i < 100), it would actually take fewer operations to merge the first i elements (linear search instead of bisection) because that is so much simpler. When you add in cache behavior and data locality, the linear search may well be faster for i up to several thousand.
Also, if i > n, then rely on the fact that the result has to be toward the end of either list: your initial candidate range in each list is ((i-n)..n).
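Here is a hedged C++ sketch of this bisection (my implementation of the idea, not Ben's code). It assumes both arrays are ascending with distinct elements, as in the question, and k is 1-based; each step discards about k/2 elements, so it runs in O(log i).

#include <algorithm>
#include <vector>

int kthOfUnion(const std::vector<int>& a, const std::vector<int>& b, int k) {
    int i = 0, j = 0;  // number of elements already discarded from a and b
    while (true) {
        if (i == (int)a.size()) return b[j + k - 1];  // a exhausted
        if (j == (int)b.size()) return a[i + k - 1];  // b exhausted
        if (k == 1) return std::min(a[i], b[j]);
        // probe about k/2 ahead in each list (clamped to the list's end)
        int ia = std::min(i + k / 2, (int)a.size()) - 1;
        int jb = std::min(j + k / 2, (int)b.size()) - 1;
        if (a[ia] <= b[jb]) { k -= ia - i + 1; i = ia + 1; }  // drop a[i..ia]
        else                { k -= jb - j + 1; j = jb + 1; }  // drop b[j..jb]
    }
}
// kthOfUnion({1, 3, 5, 7}, {2, 4, 6, 8}, 4) returns 4, the asker's example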
Here is how you do it.
Let the first list be ListX and the second list be ListY. We need to find the right combination of ListX[x] and ListY[y] where x + y = i. Since x, y, i are natural numbers we can immediately constrain our problem domain to x*y. And by using the equations max(x) = len(ListX) and max(y) = len(ListY) we now have a subset of x*y elements in the form [x, y] that we need to search.
What you will do is order those elements like so: [i - max(y), max(y)], [i - max(y) + 1, max(y) - 1], ..., [max(x), i - max(x)]. You will then bisect this list by choosing the middle [x, y] combination. Since the lists are ordered and distinct, you can test ListX[x] < ListY[y]. If true, you bisect the upper half of the [x, y] combinations; if false, you bisect the lower half. You keep bisecting until you find the right combination.
There are a lot of details I left out, but that is the general gist of it. It is indeed O(log(n))!
Edit: As Ben pointed out, this is actually O(log(i)). If we let n = len(ListX) + len(ListY), then we know that i <= n.
When merging two lists, you're going to have to touch every element in both lists. If you don't touch every element, some elements will be left behind. Thus your theoretical lower bound is O(n). So you can't do it that way.
You don't have to sort, since you have two lists that are already sorted, and you can maintain that ordering as part of the merge.
edit: oops, I misread the question. I thought that, given a value, you want to find its rank, not the other way around. If you want to find the rank given a value, then this is how to do it in O(log N):
Yes, you can do this in O(log N), if the list allows O(1) random access (i.e. it's an array and not a linked list).
Binary search on L1
Binary search on L2
Sum the indices
You'd have to work out the math (+1, -1, what to do if the element isn't found, etc.), but that's the idea.
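A minimal sketch of that idea (my code, assuming ascending arrays with distinct elements): the rank of a value in the union is the sum of its offsets in the two lists.

#include <algorithm>
#include <vector>

int rankOfValue(const std::vector<int>& l1, const std::vector<int>& l2, int v) {
    // count the elements smaller than v in each list via binary search
    int r1 = std::lower_bound(l1.begin(), l1.end(), v) - l1.begin();
    int r2 = std::lower_bound(l2.begin(), l2.end(), v) - l2.begin();
    return r1 + r2;  // 0-based rank of v in the union of the two lists
}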