What do move and key comparison mean in C++?

The following is written in a PPT about Insertion Sort in my class:
void insertionSort(DataType theArray[], int n) {
    for (int unsorted = 1; unsorted < n; ++unsorted) {
        DataType nextItem = theArray[unsorted];
        int loc = unsorted;
        for (; (loc > 0) && (theArray[loc - 1] > nextItem); --loc)
            theArray[loc] = theArray[loc - 1];
        theArray[loc] = nextItem;
    }
}
-
Running time depends not only on the size of the array but also on its contents.
Best case: O(n)
  Array is already sorted in ascending order.
  Inner loop will not be executed.
  >>>> The number of moves: 2*(n-1) → O(n)
  >>>> The number of key comparisons: n-1 → O(n)
Worst case: O(n²)
  Array is in reverse order:
  Inner loop is executed p-1 times, for p = 2, 3, …, n
  The number of moves: 2*(n-1) + (1+2+...+(n-1)) = 2*(n-1) + n*(n-1)/2 → O(n²)
  The number of key comparisons: 1+2+...+(n-1) = n*(n-1)/2 → O(n²)
Average case: O(n²)
  We have to look at all possible initial data organizations.
So, Insertion Sort is O(n²).
What exactly are move and key comparison? I couldn't find an explanation on Google.

Let me word the algorithm first.
Assume that at a given time the array has two parts: index 0 to index loc - 1 is sorted in ascending order, and index loc to n - 1 is unsorted.
Start with the element at loc, find its correct place in the sorted part of the array, and insert it there.
So now there are two loops:
The outer loop, running from loc = 1 to loc = n - 1, partitions the array into a sorted and an unsorted part.
The inner loop finds the position of the element at loc within the sorted part of the array (0 to loc - 1).
For the inner loop, to find the correct location, you have to compare the element at loc with, in the worst case, all the elements in the sorted part of the array. Each such comparison is a key comparison.
To insert, you have to create a gap in the sorted part of the array for the element at loc. This is done by shifting each larger element one position to the right. Each such shift (plus copying nextItem out and back in) is a move.
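To make the slide's counts concrete, here is a minimal instrumented sketch (my own illustration, not from the slides; DataType is assumed to be int, and insertionSortCounted is a name chosen here). Running it on a sorted and a reversed 5-element array reproduces the best-case and worst-case formulas above.

#include <iostream>

// Instrumented copy of the slide's code (DataType assumed to be int).
// Copying an array element counts as a move; each evaluation of
// theArray[loc-1] > nextItem counts as a key comparison.
void insertionSortCounted(int theArray[], int n, long &moves, long &comparisons) {
    moves = comparisons = 0;
    for (int unsorted = 1; unsorted < n; ++unsorted) {
        int nextItem = theArray[unsorted];       // move: copy nextItem out
        ++moves;
        int loc = unsorted;
        while (loc > 0) {
            ++comparisons;                       // key comparison
            if (!(theArray[loc - 1] > nextItem)) break;
            theArray[loc] = theArray[loc - 1];   // move: shift right
            ++moves;
            --loc;
        }
        theArray[loc] = nextItem;                // move: copy nextItem back in
        ++moves;
    }
}

int main() {
    int sorted[] = {1, 2, 3, 4, 5};
    int reversed[] = {5, 4, 3, 2, 1};
    long m, c;
    insertionSortCounted(sorted, 5, m, c);
    std::cout << m << " moves, " << c << " comparisons\n";   // 8 = 2*(n-1), 4 = n-1
    insertionSortCounted(reversed, 5, m, c);
    std::cout << m << " moves, " << c << " comparisons\n";   // 18 = 2*(n-1)+n*(n-1)/2, 10 = n*(n-1)/2
}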

Move is the number of shifts the algorithm has to perform in order to sort the data, and the keys are the data items that are compared.

Related

Is the below sorting algorithm O(n)?

Algorithm:
insert element counts in a map
start from the first (smallest) element
if first is present in the map, insert it into the output array (count times), increment first
if first is not in the map, advance to the next number that is present in the map
#include <algorithm>
#include <climits>
#include <unordered_map>
#include <vector>
using namespace std;

vector<int> sort(vector<int>& can) {
    unordered_map<int, int> mp;
    int first = INT_MAX;
    int last = INT_MIN;
    for (auto &n : can) {
        first = min(first, n);
        last = max(last, n);
        mp[n]++;
    }
    vector<int> out;
    while (first <= last) {
        while (mp.find(first) == mp.end()) first++;
        int cnt = mp[first];
        while (cnt--) out.push_back(first);
        first++;
    }
    return out;
}
Complexity: O(max element in array), which is linear, so O(n).
No, it's not O(n). The while loop iterates last - first + 1 times, and this quantity depends on the array's contents, not the array's length.
Usually we use n to mean the length of the array that the algorithm works on. To describe the range (i.e. the difference between the largest and smallest values in the array), we could introduce a different variable r; then the time complexity is O(n + r), because the first loop populating the map iterates O(n) times, the second loop populating the vector iterates O(r) times, and its inner loop, which counts down from cnt, iterates O(n) times in total.
Another more formal way to define n is the "size of the input", typically measured in the number of bits it takes to encode the algorithm's input. Suppose the input is an array of length 2, containing just the numbers 0 and M for some number M. In this case, if the number of bits used to encode the input is n, then the number M can be on the order of O(2^n), and the second loop does that many iterations; so by this formal definition the time complexity is exponential.
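To see the content-dependence concretely, here is a small hedged example driving the snippet above (sort is the function from the question; the values are chosen only for illustration):

// n = 2 elements, yet the scan from first to last performs about
// 100 million iterations: the cost is governed by r = last - first, not n.
vector<int> v = {0, 100000000};
vector<int> out = sort(v);   // returns {0, 100000000} after ~1e8 loop steps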

Finding MIN MAX pairs from array

Given a sorted array of N integers, I need to consider all pairs with different indexes (i != j). I need the maximum of (a[j] + a[i] - 1) and the minimum of (a[j] - a[i] + 1) over all pairs with j > i. Numbers aren't unique, but pairing duplicates is allowed; a number can't pair with itself.
What I'm doing right now :
for (i = 0; i < n; i++)
{
    for (j = i + 1; j < n; j++)
    {
        MAX = max(MAX, a[j] + a[i] - 1);
        MIN = min(MIN, a[j] - a[i] + 1);
    }
}
This gives a time complexity of O(n^2). Is there a way to reduce it to O(n log n) or even less?
To find the max you just need to add the elements at indexes n-1 and n-2, as the array is already sorted and the two biggest elements are at the end of the array. No other element in the array is bigger than these, and hence no other pair's sum can be greater than theirs.
MAX = a[n-1] + a[n-2] - 1;
Time complexity : O(1)
For finding the min, you should look for a pivot in the array. I choose to start from a[0]. If space is not a constraint, create another array of the same size and populate it with the delta values from your pivot.
int[] b = new int[n];
for (int i = 1; i < n; i++)
{
    b[i] = a[i] - a[0];
}
Now the second array holds the delta values from your pivot. All you have to find are the indices of the minimum and next-to-minimum values of array b. Those two are the closest values to each other, and hence their difference is the least.
Time Complexity : O(n) + O(n) = O(n)
Space Complexity : O(n) as a new array of same size has to be created.
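Putting both halves together, here is a minimal C++ sketch of this O(n) idea (my own formulation, not code from the answer: since the array is sorted, the closest pair is always adjacent, so the extra delta array can be skipped; minMaxPairs is a name chosen here):

#include <algorithm>
#include <utility>
#include <vector>

// For an ascending sorted array with n >= 2: O(n) time, O(1) extra space.
std::pair<long, long> minMaxPairs(const std::vector<int>& a) {
    int n = a.size();
    long maxVal = (long)a[n - 1] + a[n - 2] - 1;     // two largest elements sit at the end
    long minVal = (long)a[1] - a[0] + 1;
    for (int i = 1; i + 1 < n; ++i)                  // the closest pair is adjacent in sorted order
        minVal = std::min(minVal, (long)a[i + 1] - a[i] + 1);
    return {minVal, maxVal};
}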

Big O notation for duplicate function, C++

What is the Big O notation for the function described in the screenshot?
It would take O(n) to go through all the numbers, but once it finds the numbers and removes them, what would that be? Would the removed part be some constant a? And would the function then have to iterate through the numbers again?
This is what I am thinking for Big O:
T(n) = n + a + (n - a), or something involving iterating through n - a steps after the first duplicate is found; would big O then be O(n)?
Big O notation considers the worst case. Let's say we need to remove all duplicates from the array A = [1..n]. The algorithm starts with the first element and checks every remaining element - there are n-1 of them. Since all values happen to be different, it won't remove any from the array.
Next, the algorithm selects the second element and checks the remaining n-2 elements in the array. And so on.
When the algorithm arrives at the final element, it is done. The total number of comparisons is the sum (n-1) + (n-2) + ... + 2 + 1 + 0. Through the power of maths, this sum becomes (n-1)*n/2; the dominating term is n^2, so the algorithm is O(n^2).
This algorithm is O(n^2). Because for each element in the array you are iterating over the array and counting the occurrences of that element.
foreach item in array
    count = 0
    foreach other in array
        if item == other
            count += 1
    if count > 1
        remove item
As you see there are two nested loops in this algorithm which results in O(n*n).
Removed items don't affect the worst case. Consider an array containing only unique elements: no element is ever removed from it.
Note: A naive implementation of this algorithm could result in O(n^3) complexity.
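For illustration, a hedged C++ sketch of the pseudocode above (removeDuplicates is a name chosen here; the O(n^3) the note warns about would come from, e.g., restarting the whole scan after every removal):

#include <vector>

// Removes every element that occurs more than once (keeping the last copy),
// following the pseudocode above. Counting is O(n) per element, so the nested
// loops give O(n^2); each vector erase adds an O(n) shift.
void removeDuplicates(std::vector<int>& a) {
    for (std::size_t i = 0; i < a.size(); ) {
        std::size_t count = 0;
        for (std::size_t j = 0; j < a.size(); ++j)
            if (a[j] == a[i])
                ++count;
        if (count > 1)
            a.erase(a.begin() + i);   // removal: do not advance i
        else
            ++i;
    }
}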
Starting with the first element, you go through all the other elements in the vector; that's n-1 steps. You do that for each of the n elements, which gives n*(n-1)/2 for the worst case. Roughly n steps is the best case (all elements are 4).

most efficient algorithm to get union of 2 ordered lists

I need to find the union of 2 descending ordered lists (list1 and list2), where the union would be each element from both lists without duplicates. Assume the list elements are integers. I am using big O notation to determine the most efficient algorithm to solve this problem. I know the big O notation for the 1st, but I do not know the big O notation for the 2nd. Can someone tell me the big O notation of the 2nd algorithm so I can decide which algorithm to implement? If someone knows a better algorithm than one of these, could you help me understand that as well? Thanks in advance.
Here are my two algorithms. . .
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Algorithm #1: O(N * log2 N)
Starting at the first element of list1,
while (list1 is not at the end of the list) {
    if (the current element in list1 is not in list2)   // Binary Search -> O(log2 N)
        add the current element in list1 to list2
    go to the next element in list1
}
list2 is now the union of the 2 lists
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Algorithm #2: O(?)
Starting at the first elements of each list,
LOOP_START:
compare the current elements of the lists
whichever element is greater, put into a 3rd list called list3
go to the next element in the list whose element was just inserted into list3
branch to LOOP_START until either list1 or list2 is at the end of its respective list
insert the remaining elements from either list1 or list2 into list3 (the union)
list3 now contains the union of list1 and list2
Here's my assessment of the situation
Your first algorithm runs in n log n time: you are doing the binary search for every element in the first list, right?
Your second algorithm is not entirely complete: you don't say what to do if the elements in the two lists are equal. However, given the right logic for dealing with equal elements, your second algorithm is like the merge part of merge sort: it will run in linear time (i.e. N). It is optimal, in the sense that you cannot do better: you cannot merge two ordered lists without looking at every element in both lists at least once.
The second is O(n+m) while the first is O(n log(m) + m). Thus the second is significantly better.
With the following algorithm you can have the two lists merged in O(n+m).
[Sorry, I have used Python for simplicity, but the algorithm is the same in every language.]
Note that the algorithm also maintains the items sorted in the result list.
def merge(list1, list2):
    result = []
    i1 = 0
    i2 = 0
    # iterate over the two lists
    while i1 < len(list1) and i2 < len(list2):
        # if the current items are equal, add just one and go to the next two items
        if list1[i1] == list2[i2]:
            result.append(list1[i1])
            i1 += 1
            i2 += 1
        # if the item of list1 is greater than the item of list2, add it and go to next item of list1
        elif list1[i1] > list2[i2]:
            result.append(list1[i1])
            i1 += 1
        # if the item of list2 is greater than the item of list1, add it and go to next item of list2
        else:
            result.append(list2[i2])
            i2 += 1
    # add the remaining items of list1
    while i1 < len(list1):
        result.append(list1[i1])
        i1 += 1
    # add the remaining items of list2
    while i2 < len(list2):
        result.append(list2[i2])
        i2 += 1
    return result

print(merge([10, 8, 5, 1], [12, 11, 7, 5, 2]))
Output:
[12, 11, 10, 8, 7, 5, 2, 1]
Complexity Analysis:
Say the length of list 1 is N and that of list 2 is M.
Algorithm 1:
At the risk of sounding incredible, I would argue that the complexity of this algorithm, as written, is N * M and not N log M.
For each element in list 1 (O(N)), we are searching for it in list 2 (O(log M)). The complexity of this algorithm 'seems' to be O(N log M).
However, we are also inserting elements into list 2. Each new element should be inserted in its proper place so that list 2 remains sorted for further binary search operations. If we are using an array as the data structure, the insertion takes O(M) time.
Hence the order of complexity is O(N*M) for the algorithm as is.
A modification can be made wherein the new element is inserted at the end of list 2 (the list is then no longer ordered) and we carry out the binary search from index 0 to M-1 rather than to the new size minus 1. In this case the complexity shall be O(N log M), since we carry out N binary searches in a list of length M.
To make the list ordered again, we will have to merge the two ordered parts (0 to M-1 and M to newSize-1). This can be done in O(N+M) time (one merge operation in merge sort of array length N+M). Hence the net time complexity of this algorithm shall be
O(NlogM + N + M)
Space complexity is O(max(N, M)), not counting the original lists' space and considering only the extra space required in list 2.
Algorithm 2:
At each iteration, we move at least 1 pointer forward. The total distance traveled by both pointers is N + M. Hence the order of time complexity in the worst case is O(N+M), which is better than the 1st algorithm.
However, the space complexity required in this case is larger (O(N+M)).
Here is another approach:
Iterate through both lists, and insert all the values into a set.
This removes all duplicates, and the result is the union of the two lists.
Two important notes: you'll lose the order of the numbers, and it takes additional space.
Time complexity: O(n + m)
Space Complexity: O(n + m)
If you need to maintain order of the result set, use some custom version of LinkedHashMap.
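For illustration, a minimal C++ sketch of this set-based union (using std::unordered_set; setUnion is a name chosen here, and preserving insertion order, as Java's LinkedHashMap would, needs extra bookkeeping):

#include <unordered_set>
#include <vector>

// Union via a hash set: O(n + m) expected time, O(n + m) space.
// The ordering of the result is unspecified, per the note above.
std::vector<int> setUnion(const std::vector<int>& l1, const std::vector<int>& l2) {
    std::unordered_set<int> seen(l1.begin(), l1.end());
    seen.insert(l2.begin(), l2.end());
    return std::vector<int>(seen.begin(), seen.end());
}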
Actually, algorithm 2 does not work if the input lists are not sorted.
Sorting the arrays first costs O(m*lg(m) + n*lg(n)).
You can build a hash table on the first list, then for each item from the second list, you check if this item exists in the hash table. This works in O(m+n).
There are a few things that need to be specified:
Do the input lists contain duplicates?
Must the result be ordered?
I'll assume that, using std::list, you can cheaply insert at the head or at the tail.
Let's say List 1 has N elements and List 2 has M elements.
Algorithm 1
It iterates over every item of List 1 searching for it in List 2.
Assuming that there may be duplicates and that the result must be ordered, the worst case for the search is that no element in List 1 exists in List 2, hence it's at least:
O(N × M).
To insert the item of List 1 in the right place, you need to iterate List 2 again until the point of insertion. The worst case will be when every item in List 1 is smaller (if List 2 is searched from the beginning) or greater (if List 2 is searched from the end). Since the previous items of List 1 have been inserted into List 2, there will be M iterations for the first item, M + 1 for the second, M + 2 for the third, etc., and M + N - 1 iterations for the last item, for an average of M + (N - 1) / 2 per item.
Something like:
N × (M + (N - 1) / 2)
For big-O notation, constant factors don't matter, so:
N × (M + (N - 1))
For big-O notation, non-variable additions don't matter, so:
O(N × (M + N))
Adding to the original O(N × M):
O(N × M) + O(N × (M + N))
O(N × M) + O(N × M + N²)
The second equation is just to make the constant factor elimination evident, e.g. 2 × (N × M), thus:
O(N × (M + N))
O(N² + N × M)
These two are equivalent; use whichever you like the most.
Possible optimizations:
If the result doesn't have to be ordered, insertion can be O(1), hence the worst-case time is:
O(N × M)
Don't just test each List 1 item against List 2 by equality; also test with e.g. greater-than, so that you can stop searching List 2 as soon as List 1's item is greater than List 2's item. This wouldn't reduce the worst case, but it would reduce the average case.
Keep the List 2 iterator that points to where List 1's item was found to be greater than List 2's item, to make the sorted insertion O(1); on insertion, make sure to keep an iterator that starts at the inserted item, because although List 1 is ordered, it might contain duplicates. With these two, the worst-case time becomes:
O(N × M)
For the next iterations, search for List 1's item in the rest of List 2 with the iterator we kept; this reduces the worst case, because if you reach the end of List 2, you'll just be "removing" duplicates from List 1. With these three, the worst-case time becomes:
O(N + M)
By this point, the only difference between this algorithm and Algorithm 2 is that List 2 is changed to contain the result, instead of creating a new list.
Algorithm 2
This is the merging of the merge sort.
You'll be walking every element of List 1 and every element of List 2 once, and insertion is always made at the head or tail of the list, hence the worst-case time is:
O(N + M)
If there are duplicates, they're simply discarded. The result is more easily made ordered than not.
Final Notes
If there are no duplicates, insertion can be optimized in both cases. For instance, with doubly-linked lists, we can easily check if the last element in List 1 is greater than the first element in List 2 or vice-versa, and simply concatenate the lists.
This can be further generalized for any tail of List 1 and List 2. For instance, in Algorithm 1, if a List 1's item is not found in List 2, we can concatenate List 2 and the tail of List 1. In Algorithm 2, this is done in the last step.
The worst case, when List 1's items and List 2's items are interleaved, is not reduced, but again the average case is reduced, and in many cases by a big factor that makes a big difference In Real Life™.
I ignored:
Allocation times
Worst-case space differences between the algorithms
Binary search, because you mentioned lists, not arrays or trees
I hope I didn't make any blatant mistake.
I implemented a TypeScript (JS) based union operation for 2 arrays of objects in one of my previous projects. The data was too large, and default library functions like underscore or lodash were not performant enough. After some brain hunting I came up with the binary-search-based algorithm below. Hope it helps someone with performance tuning.
As far as complexity is concerned, each lookup is binary-search based, O(log N); the initial sorts make the algorithm O(N log N) overall.
Basically the code takes two unordered object arrays and a key name to compare, and:
1) sorts the arrays
2) iterates through each element of the first array and deletes it from the second array
3) concatenates the resulting second array onto the first array.
private sortArrays = (arr1: Array<Object>, arr2: Array<Object>, propertyName: string): void => {
    function comparer(a, b) {
        if (a[propertyName] < b[propertyName])
            return -1;
        if (a[propertyName] > b[propertyName])
            return 1;
        return 0;
    }
    arr1.sort(comparer);
    arr2.sort(comparer);
}

private difference = (arr1: Array<Object>, arr2: Array<Object>, propertyName: string): Array<Object> => {
    this.sortArrays(arr1, arr2, propertyName);
    for (var i = 0; i < arr1.length; i++) {
        var obj = {
            loc: 0
        };
        if (this.OptimisedBinarySearch(arr2, arr2.length, obj, arr1[i], propertyName))
            arr2.splice(obj.loc, 1);
    }
    return arr2;
}

private OptimisedBinarySearch = (arr, size, obj, val, propertyName): boolean => {
    var first, mid, last;
    first = 0;
    last = size - 1;
    if (!arr.length)
        return false;
    while (arr[first][propertyName] <= val[propertyName] && val[propertyName] <= arr[last][propertyName]) {
        mid = first + Math.floor((last - first) / 2);
        if (val[propertyName] == arr[mid][propertyName]) {
            obj.loc = mid;
            return true;
        }
        else if (val[propertyName] < arr[mid][propertyName])
            last = mid - 1;
        else
            first = mid + 1;
    }
    return false;
}

private UnionAll = (arr1, arr2, propertyName): Array<Object> => {
    return arr1.concat(this.difference(arr1, arr2, propertyName));
}

// example
var YourFirstArray = [{x: 1}, {x: 2}, {x: 3}];
var YourSecondArray = [{x: 0}, {x: 1}, {x: 2}, {x: 3}, {x: 4}, {x: 5}];
var keyName = "x";
this.UnionAll(YourFirstArray, YourSecondArray, keyName);

Finding an element in partially sorted array

I had the following interview question.
There is an n x n array of elements. The array is partially sorted, i.e. the biggest element in row i is smaller than the smallest element in row i+1.
How can you find a given element with complexity O(n)?
Here is my take on this:
Go to row n/2 and start comparing. For example, if you search for 100 and the first number you see is 110, you know it's either in this row or in the rows above. Now you go to row n/4, and so on.
From the comments:
"Isn't it O(n * log n) in total? He has to parse through every row that he reaches per binary search, therefore the number of linear searches is multiplied with the number of rows he will have to scan on average." – Martin Matysiak, 5 mins ago.
I am not sure that is the right solution. Does anyone have something better?
Your solution indeed takes O(n log n) assuming you're searching each row you parse. If you don't search each row, then you can't accurately perform the binary step.
O(n) solution:
Pick the n/2 row, instead of searching the entire row, we simply take the first element of the previous row, and the first element of the next row. O(1).
We know that all elements of the n/2 row must be between these selected values (this is the key observation). If our target value lies in the interval, then search all three rows (3*O(n) = O(n)).
If our value is outside this range, then continue in the binary search manner by selecting n/4 if our value was less than the range, and 3n/4 row if the value was greater, and again comparing against one element of adjacent rows.
Finding the right block of 3 rows will cost O(1) * O(log n), and finding the element will cost O(n).
In total O(log n) + O(n) = O(n).
Here is a simple implementation - since we need O(n) for finding an element within a row anyhow, I left out the bin-search...
void search(int n[][], int el) {
    int minrow = 0, maxrow;
    while (minrow < n.length && el >= n[minrow][0])
        ++minrow;
    minrow = Math.max(0, minrow - 1);
    maxrow = Math.min(n.length - 1, minrow + 1);
    for (int row = minrow; row <= maxrow; ++row) {
        for (int col = 0; col < n[row].length; ++col) {
            if (n[row][col] == el) {
                System.out.printf("found at %d,%d\n", row, col);
            }
        }
    }
}
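For completeness, here is a hedged C++ sketch that keeps the binary search over rows described above (my own illustration; it assumes, like the answer, that comparing against the first elements of adjacent rows is enough to localize the target to a small block of rows):

#include <algorithm>
#include <cstdio>
#include <vector>

// Matrix property: every element of row i is smaller than every element of
// row i+1. Binary search over the rows' first elements: O(log n); scanning
// the candidate row and its neighbors: O(n). Total: O(n).
void search(const std::vector<std::vector<int>>& m, int el) {
    if (m.empty()) return;
    int lo = 0, hi = (int)m.size() - 1;
    while (lo < hi) {                    // find the last row whose first element is <= el
        int mid = (lo + hi + 1) / 2;
        if (m[mid][0] <= el) lo = mid;
        else hi = mid - 1;
    }
    int minrow = std::max(0, lo - 1);
    int maxrow = std::min((int)m.size() - 1, lo + 1);
    for (int row = minrow; row <= maxrow; ++row)
        for (int col = 0; col < (int)m[row].size(); ++col)
            if (m[row][col] == el)
                std::printf("found at %d,%d\n", row, col);
}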