Python : complexity of counting occurrences in columns

Python : complexity of counting occurrences in columns - list

A = [['X', 'X', 'O'],
     ['X', 'X', 'X'],
     ['O', 'X', 'X']]

def count_X():
    for i in range(3):
        total = 0
        for j in range(3):
            if A[i][j] == 'X':
                total += 1
        print(total)
Is there any simpler solution without nested for loops O(n²)?

No, there isn't. First of all, you need to loop over the rows, because you need the count for each row; that's already O(n). Now, each row has n elements and you do not know in advance which ones are 'X', so you have to look at every item, which is again O(n). Since an O(n) loop executes another O(n) loop on each iteration, the overall complexity is O(n²). You cannot reliably get the total for each row without visiting the rows and their elements.
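The code itself can still be made shorter, even though the amount of work stays the same. A minimal sketch, assuming the grid A from the question (the helper names count_x_per_row and count_x_per_column are made up for illustration): list.count hides the inner loop, but it still inspects every element, so the total work remains proportional to the number of cells.

```python
A = [['X', 'X', 'O'],
     ['X', 'X', 'X'],
     ['O', 'X', 'X']]

def count_x_per_row(grid):
    # list.count scans the whole row, so every cell is still visited once.
    return [row.count('X') for row in grid]

def count_x_per_column(grid):
    # zip(*grid) yields the columns as tuples; tuple.count scans each one fully.
    return [col.count('X') for col in zip(*grid)]

print(count_x_per_row(A))
print(count_x_per_column(A))
```

The nested loop has only moved into the built-ins; it has not disappeared.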

Related

Hashing 100 distinct values of range 1 billion

I was recently asked this question in an interview. I have an array of n elements. The array has only 100 distinct values. I need to print the count of occurrence of each number.
1<=n<=10^6
1<=A[i]<=10^12
Expected space complexity was O(k) where k is the number of distinct values in the array.
For example, 1 2 3 2 1 4 3 2 4 2 3 1 2 ; here k is 4.
First I suggested using maps in the STL, but he wanted me to implement my own data structure. Then I suggested using sorted insert for each element, like in a binary search tree, but that would give a time complexity of O(nlogn). He wanted an O(n) solution. I tried to think of a hash function but I could not come up with one. I also thought of a trie data structure, but then I would have to scan each digit of each number, again giving O(nlogn) complexity. What could be a possible approach to solve this?

A hash table doesn't guarantee a worst-case bound better than O(n*k), but it's quite easy to build one that works well here.
First, we need to make some assumption about values probability distribution - let it be uniform (or else we need some specialized hash function).
Next, let's choose hash table size, say, 201 entries (so it will be less than 50% full).
Next, let hash function be just hash(A[i]) = A[i] mod 201.
And then use an open-addressing hash table H[] with 201 entries, each entry being a pair: the key A[i] (or NULL if the slot is empty) and its frequency count.

I think that a hash table is a good solution for this, but I imagine the interviewer was expecting you to build your own hash table.
Here's a solution I came up with in Python. I'm using mod 100 as my hash function and using Separate chaining to deal with collisions.
import random

N = random.randint(1, 10**6)
K = 100
HASH_TABLE_SIZE = 100
distinct = [random.randint(1, 10**12) for _ in range(K)]
numbers = [random.choice(distinct) for _ in range(N)]
hash_table = [[] for _ in range(HASH_TABLE_SIZE)]

def hash(n):
    hash_key = n % HASH_TABLE_SIZE
    bucket = hash_table[hash_key]
    for value in bucket:
        if value[0] == n:
            value[1] += 1
            return
    bucket.append([n, 1])

for number in numbers:
    hash(number)

for bucket in hash_table:
    for value in bucket:
        print('{}: {}'.format(*value))
EDIT
Explaining the code a bit:
My hash table is a 100-element array. Each entry in the array is a list of (number, count) entries. To hash a number, I take its value modulo 100 to find an index into the array. I scan the numbers already in that bucket, and if any of them match the current number, I increment its count. If I don't find the number, I append a new entry to the list with the number and an initial count of 1.
Visually, the array looks sort of like this:
[
    [ [0, 3], [34500, 1] ],
    [ [101, 1] ],
    [],
    [ [1503, 1] ],
    ...
]
Note that at index n, each value stored in the bucket equals n (mod 100). On average, there will be only one value per bucket, since there are up to 100 distinct values and 100 elements in the array.
To print out the final counts, all that's required is to walk through the array and each entry in each bucket and print them out.
EDIT 2
Here's a slightly different implementation that uses Open addressing with linear probing instead. I think I actually prefer this approach.
hash_table = [None] * HASH_TABLE_SIZE

def hash(n):
    hash_key = n % HASH_TABLE_SIZE
    # Probe forward until we find this number or an empty slot.
    while hash_table[hash_key] is not None and hash_table[hash_key][0] != n:
        hash_key = (hash_key + 1) % HASH_TABLE_SIZE
    if hash_table[hash_key] is None:
        hash_table[hash_key] = [n, 1]
    else:
        hash_table[hash_key][1] += 1

for number in numbers:
    hash(number)

for entry in hash_table:
    # Skip slots that were never filled.
    if entry is not None:
        print('{}: {}'.format(*entry))
NOTE: This code will fail if there are actually more than 100 distinct numbers. (It will hang forever trying to find an open spot in the array.) It would be nice to detect that condition (e.g. once you've walked an entire lap around the array) and raise an exception.
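One way to guard against the full-table case mentioned in the note is to bound the probe loop to a single lap. This is only a sketch: the class name CountingTable and the choice of OverflowError are assumptions made for illustration, not part of the original answer.

```python
class CountingTable:
    """Fixed-size open-addressing table with linear probing (illustrative)."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size  # each slot is None or [key, count]

    def add(self, key):
        index = key % self.size
        # Probe at most `size` slots, i.e. one full lap around the array.
        for _ in range(self.size):
            slot = self.slots[index]
            if slot is None:
                self.slots[index] = [key, 1]
                return
            if slot[0] == key:
                slot[1] += 1
                return
            index = (index + 1) % self.size
        # Every slot holds a different key: there is no room for this one.
        raise OverflowError('more distinct values than slots in the table')

    def items(self):
        return [tuple(slot) for slot in self.slots if slot is not None]


table = CountingTable(4)
for n in [1, 5, 9, 1, 13]:
    table.add(n)
print(sorted(table.items()))
```

Adding a fifth distinct value to this four-slot table raises the exception instead of looping forever.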

Actually, you're wrong, the trie would give you O(N) complexity.
One insert/find/erase operation on a trie requires O(L) time, where L is the length of the strings stored in the trie. Fortunately, you only insert numbers no larger than 1 trillion, which means that L is no larger than log(10^12) (the logarithm base depends on the radix you use in the trie; I personally would choose 256 or 65536, depending on what role this structure plays in the overall system).
Summing up, you will need O(N) * O(log(10^12)), which is equal to O(N) by the definition of O().
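To make the trie argument concrete, here is a rough sketch with radix 256. Since 256^5 is about 1.1 * 10^12, five base-256 digits cover every allowed value, so each insert touches exactly five nodes. The function names (new_node, trie_increment, trie_items) are hypothetical, and using a 256-slot list per node is one assumption among several possible layouts.

```python
BASE = 256
DEPTH = 5  # 256**5 > 10**12, so five base-256 digits are enough

def new_node():
    return [None] * BASE

def trie_increment(root, value):
    node = root
    # Walk the four most significant digits, creating nodes as needed.
    for level in range(DEPTH - 1, 0, -1):
        digit = (value >> (8 * level)) & 0xFF
        if node[digit] is None:
            node[digit] = new_node()
        node = node[digit]
    # The last level stores counts instead of child nodes.
    digit = value & 0xFF
    node[digit] = (node[digit] or 0) + 1

def trie_items(node, prefix=0, level=DEPTH - 1):
    # Depth-first walk that rebuilds each value from its digits.
    for digit, child in enumerate(node):
        if child is None:
            continue
        value = (prefix << 8) | digit
        if level == 0:
            yield value, child  # at the last level, child is the count
        else:
            yield from trie_items(child, value, level - 1)

root = new_node()
for number in [1, 2, 3, 2, 1, 4, 3, 2, 4, 2, 3, 1, 2]:
    trie_increment(root, number)
for value, count in trie_items(root):
    print('{}: {}'.format(value, count))
```

Each distinct value adds at most four new nodes, so the memory use is O(k) with a constant factor of about 4 * 256 slots per value.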

Big O notation for duplicate function, C++

What is the Big O notation for the function described in the screenshot?
It would take O(n) to go through all the numbers but once it finds the numbers and removes them what would that be? Would the removed parts be a constant A? and then would the function have to iterate through the numbers again?
This is what I am thinking for Big O
T(n) = n + a + (n-a) or something involving having to iterate through (n-a) number of steps after the first duplicate is found, then would big O be O(n)?

Big O notation is considering the worst case. Let's say we need to remove all duplicates from the array A=[1..n]. The algorithm will start with the first element and check every remaining element - there are n-1 of them. Since all values happen to be different it won't remove any from the array.
Next, the algorithm selects the second element and checks the remaining n-2 elements in the array. And so on.
When the algorithm arrives at the final element it is done. The total number of comparisons is the sum (n-1) + (n-2) + ... + 2 + 1 + 0. This arithmetic series sums to (n-1)*n/2, and the dominating term is n^2, so the algorithm is O(n^2).
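The count can be checked empirically. The sketch below assumes the function removes later duplicates by comparing each element with every element after it, as described above (the original function is only available as a screenshot, so this is a reconstruction, and the name remove_duplicates_counting is made up).

```python
def remove_duplicates_counting(values):
    """Remove later duplicates in place; return the number of comparisons."""
    comparisons = 0
    i = 0
    while i < len(values):
        j = i + 1
        while j < len(values):
            comparisons += 1
            if values[j] == values[i]:
                del values[j]  # the tail shifts left, so j stays put
            else:
                j += 1
        i += 1
    return comparisons

n = 10
print(remove_duplicates_counting(list(range(n))), (n - 1) * n // 2)
print(remove_duplicates_counting([4] * n))
```

With all-distinct input the count matches (n-1)*n/2; with all-equal input a single pass of n-1 comparisons removes everything after the first element.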

This algorithm is O(n^2). Because for each element in the array you are iterating over the array and counting the occurrences of that element.
foreach item in array
    count = 0
    foreach other in array
        if item == other
            count += 1
    if count > 1
        remove item
As you see there are two nested loops in this algorithm which results in O(n*n).
Removed items don't affect the worst case. Consider an array containing only unique elements: no element is removed from it.
Note: A naive implementation of this algorithm could result in O(n^3) complexity.

Starting with the first element, you go through all the other elements in the vector, which is n-1 comparisons. You do that n times, so the worst case is n * (n-1) / 2 comparisons. The best case is about n comparisons, when all elements are equal (for example, all of them are 4).

What is the time complexity of traversing a 2d array

What is the time complexity of traversing (rows ,columns) a two dimensional array?
bool check(int array [9][9])
{
    int num=0;
    for (int i = 0; i < 9; i++) {
        for (int j = 0; j < 9; j++) {
            if (array [i][j] == 0) {
                num++;
            }
        }
    }
    return num;
}
I think each for loop will take square root of n so that nested loops totally take O(n) as traversing all elements, where I am defining n as the total size of the input (in this case 81 elements in array). Is that correct?

As you define n to be the total size of the input, yes the running time of the algorithm you propose will be O(n): you are performing one single operation on each element of the input, for n total operations.
Where the confusion is arising from this question is that by convention, multi-dimensional arrays are not referred to by their total size but rather by each of their dimensions separately. So rather than viewing array as being of size n (81) it would be considered to be an array of size p x q (9 x 9). That would give you a running time of O(pq). Or, if we limit it to square arrays with both dimensions r, O(r^2).
All are correct, which is why it's important to give a clear definition of your variables up front when talking about time complexity. Otherwise, when you use n to mean the total size when most people would assume that n would be a single dimension, you are inviting a lot of confusion.

The time complexity will be O(n*m), where n is the number of arrays (the 1st dimension) and m is the maximum size of each inner array (the 2nd dimension).

For any algorithm of the form
for (1..n) {
    for (1..m) {
        doSomething();
    }
}
The average, best and worst case time complexity is O(n x m). In your case if n=m, it becomes O(n^2)

The time complexity is O(N), which means its time complexity is linear.
Let's look at the concept of time complexity. When we state a time complexity in Big O notation, we mean how the graph of run time versus N looks in the worst case.
For the given nested loops, the size of the data is 9*9 = 81. No matter what operation you perform inside the inner for loop, the loops will not execute more than 9*9 = 81 times. If the size of the array were [10][10], the loops would execute no more than 100 times.
If you plot the execution time of the code against the number of inputs, the graph will be linear.

The Time complexity is derived by how many times your code is going to do lookup of an element in the data structure to deduce the result. It does not matter whether it is 1-D, 2-D or n-D array. If you access an element not more than once for an n-D array to deduce the solution, the complexity is linear O(N), where N = N1 * N2 * ... *Nn
Let's understand this with a real-world example of two different hotels having N rooms each. You need to search for your friend in the hotel.
In the first scenario, the hotel has 100 rooms on a single (ground) floor. You need to visit 100 rooms in the worst case to find your friend, so the complexity is linear, i.e. O(N) with N = 100.
In second scenario the hotel has 4 floors having 25 rooms each. In the worst case you have to visit 25*4=100 rooms (ignore the accessing time/process between floors), hence complexity is again linear.

A 2-d array arr[i][j] can be traversed by a single loop also, where the loop will run for (i × j) times.
Consider n = (i×j), then the time complexity for traversing a 2-d array is O(n).
Thanks to coder2design.com
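A minimal sketch of the single-loop traversal, assuming a rectangular grid (every row has the same length) and a made-up function name: the flat index k is mapped back to a row and column with divmod, and the loop body runs exactly rows * cols times.

```python
def count_zeros_single_loop(grid):
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    zeros = 0
    # One loop over all rows * cols cells; divmod recovers the 2-D position.
    for k in range(rows * cols):
        i, j = divmod(k, cols)
        if grid[i][j] == 0:
            zeros += 1
    return zeros

board = [[0, 1, 2],
         [3, 0, 0]]
print(count_zeros_single_loop(board))
```

Whether the traversal is written as one loop or two nested loops, the number of cell visits is the same.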

why is Insertion sort best case big O complexity O(n)?

Following is my insertion sort code:
void InsertionSort(vector<int> & ioList)
{
    int n = ioList.size();
    for (int i = 1 ; i < n ; ++i)
    {
        for (int j = 0 ; j <= i ; ++j)
        {
            //Shift elements if needed(insert at correct loc)
            if (ioList[j] > ioList[i])
            {
                int temp = ioList[j];
                ioList[j] = ioList[i];
                ioList[i] = temp;
            }
        }
    }
}
The average complexity of the algorithm is O(n^2).
From my understanding of big O notation, this is because we run two loops in this case (the outer one n-1 times and the inner one 1, 2, ..., n-1 times, i.e. n(n-1)/2 times in total), and thus the resulting asymptotic complexity of the algorithm is O(n^2).
Now I have read that best case is the case when the input array is already sorted.
And the big O complexity of the algorithm is O(n) in such a case. But I fail to understand how this is possible as in both cases (average and best case) we have to run the loops the same number of times and have to compare the elements. The only thing that is avoided is the shifting of elements.
So does complexity calculation also involve a component of this swapping operation?

Yes, this is because your implementation is incorrect. The inner loop should count backward from i-1 down to 0, and it should terminate as soon as it finds an element ioList[j] that is already smaller than ioList[i].
It is because of that termination criterion that the algorithm performs in O(n) time in the best case:
If the input list is already sorted, the inner loop will terminate immediately for any i, i.e. the number of computational steps performed ends up being proportional to the number of times the outer loop is performed, i.e. O(n).
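The effect of the termination criterion can be seen by counting comparisons. The following is a sketch in Python rather than C++, with the inner loop scanning backward from i-1 and stopping at the first element that is not greater than the one being inserted; the comparison counter is added purely for illustration.

```python
def insertion_sort(items):
    """Sort in place; return the number of element comparisons made."""
    comparisons = 0
    for i in range(1, len(items)):
        current = items[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if items[j] <= current:
                break  # insertion point found, stop scanning
            items[j + 1] = items[j]  # shift the larger element right
            j -= 1
        items[j + 1] = current
    return comparisons

n = 10
print(insertion_sort(list(range(n))))         # already sorted input
print(insertion_sort(list(range(n, 0, -1))))  # reversed input
```

On sorted input every outer iteration does a single comparison, giving n-1 in total; on reversed input the inner loop runs all the way down each time, giving n(n-1)/2.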

Your implementation of "insertion sort" is poor.
In your inner loop, you should not scan all the way up to i-1 swapping each element greater than ioList[i]. Instead, you should scan backwards from i-1 until you find the correct place to insert the new element (that is, until you find an element less than or equal to the new element), and insert it there. If the input is already sorted, then the correct insertion point is always found immediately, and so the inner loop does not execute i-1 times, it only executes once.
Your sort is also worse than insertion sort on average, since you always do i+1 operations for each iteration of the outer loop -- some of those ops are just a comparison, and some are a comparison followed by a swap. An insertion sort only needs to do on average half that, since for random/average input, the correct insertion point is half way through the initial sorted segment. It's also possible to avoid swaps, so that each operation is a comparison plus a copy.

All possible combinations of length 8 in a 2d array

I've been trying to solve a problem in combinations. I have a 6x6 matrix and I'm trying to find all combinations of length 8 in the matrix.
I have to move from neighbor to neighbor, starting from each (row, column) position. I wrote a recursive program which generates the combinations, but the problem is that it generates a lot of duplicates as well and hence is inefficient. I would like to know how I could avoid calculating duplicates and save time.
int a[6][6]={{1,2,3,4,5,6},
             {8,9,1,2,3,4},
             {5,6,7,8,9,1},
             {2,3,4,5,6,7},
             {8,9,1,2,3,4},
             {5,6,7,8,9,1}};

void genSeq(int row,int col,int length,int combi)
{
    if(length==8)
    {
        printf("%d\n",combi);
        return;
    }
    combi = (combi * 10) + a[row][col];
    if((row-1)>=0)
        genSeq(row-1,col,length+1,combi);
    if((col-1)>=0)
        genSeq(row,col-1,length+1,combi);
    if((row+1)<6)
        genSeq(row+1,col,length+1,combi);
    if((col+1)<6)
        genSeq(row,col+1,length+1,combi);
    if((row+1)<6&&(col+1)<6)
        genSeq(row+1,col+1,length+1,combi);
    if((row-1)>=0&&(col+1)<6)
        genSeq(row-1,col+1,length+1,combi);
    if((row+1)<6&&(col-1)>=0)
        genSeq(row+1,col-1,length+1,combi);
    if((row-1)>=0&&(col-1)>=0)
        genSeq(row-1,col-1,length+1,combi);
}
I was also thinking of writing a dynamic program, basically recursion with memoization. Is it a better choice? If yes, then I'm not clear how to implement it in recursion. Have I really hit a dead end with this approach?
Thank you
Edit
Eg result
12121212,12121218,12121219,12121211,12121213.
The restrictions are that you have to move to a neighbor from any point, and you have to start from each position in the matrix, i.e. each (row, col). You can move one step at a time: right, left, up, down, or to either diagonal position. Check the if conditions.
For example:
if you're in (0,0) you can move to either (1,0) or (1,1) or (0,1), i.e. three neighbors.
if you're in (2,2) you can move to eight neighbors.
and so on...

To eliminate duplicates you can convert the 8-digit sequences into 8-digit integers and put them in a hash table.
Memoization might be a good idea. You can memoize for each cell in the matrix all possible combinations of length 2-7 that can be achieved from it. Going backwards: first generate for each cell all sequences of 2 digits. Then based on that of 3 digits etc.
UPDATE: code in Python
# original matrix
lst = [
    [1,2,3,4,5,6],
    [8,9,1,2,3,4],
    [5,6,7,8,9,1],
    [2,3,4,5,6,7],
    [8,9,1,2,3,4],
    [5,6,7,8,9,1]]

# working matrix; wrk[i][j] contains the set of all possible paths of the
# current length that start at lst[i][j] (its digit is the leading digit)
wrk = [[set() for i in range(6)] for j in range(6)]

# for the first (0th) iteration initialize with single step paths
for i in range(0, 6):
    for j in range(0, 6):
        wrk[i][j].add(lst[i][j])

# run iterations 1 through 7
for k in range(1,8):
    # create new empty wrk matrix for the next iteration
    nw = [[set() for i in range(6)] for j in range(6)]

    for i in range(0, 6):
        for j in range(0, 6):
            # the next gen. wrk[i][j] is going to be based on the current wrk paths of its neighbors
            ns = set()
            if i > 0:
                for p in wrk[i-1][j]:
                    ns.add(10**k * lst[i][j] + p)
            if i < 5:
                for p in wrk[i+1][j]:
                    ns.add(10**k * lst[i][j] + p)
            if j > 0:
                for p in wrk[i][j-1]:
                    ns.add(10**k * lst[i][j] + p)
            if j < 5:
                for p in wrk[i][j+1]:
                    ns.add(10**k * lst[i][j] + p)
            nw[i][j] = ns
    wrk = nw

# now build final set to eliminate duplicates
result = set()
for i in range(0, 6):
    for j in range(0, 6):
        result |= wrk[i][j]

print(len(result))
print(result)
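The code above only steps to the four orthogonal neighbors, while the question also allows diagonal moves. A possible variant, sketched under the assumptions that the grid is square, every entry is a single digit, and cells may be revisited (as in the question's recursion), loops over a table of eight offsets instead. The names NEIGHBORS and all_sequences are made up for this sketch.

```python
lst = [
    [1, 2, 3, 4, 5, 6],
    [8, 9, 1, 2, 3, 4],
    [5, 6, 7, 8, 9, 1],
    [2, 3, 4, 5, 6, 7],
    [8, 9, 1, 2, 3, 4],
    [5, 6, 7, 8, 9, 1]]

# Row/column offsets of the eight surrounding cells.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1),
             (0, -1),           (0, 1),
             (1, -1),  (1, 0),  (1, 1)]

def all_sequences(grid, length):
    size = len(grid)
    # paths[i][j]: numbers spelled by walks of the current length starting at (i, j)
    paths = [[{grid[i][j]} for j in range(size)] for i in range(size)]
    for k in range(1, length):
        scale = 10 ** k
        nxt = [[set() for _ in range(size)] for _ in range(size)]
        for i in range(size):
            for j in range(size):
                for di, dj in NEIGHBORS:
                    ni, nj = i + di, j + dj
                    if 0 <= ni < size and 0 <= nj < size:
                        # Prepend this cell's digit to every walk from the neighbor.
                        for p in paths[ni][nj]:
                            nxt[i][j].add(scale * grid[i][j] + p)
        paths = nxt
    # Merge the per-cell sets; the union removes duplicates across cells.
    result = set()
    for row in paths:
        for cell in row:
            result |= cell
    return result

print(len(all_sequences(lst, 3)))
```

The call above uses length 3 to keep the run short; passing 8 gives the sequences the question asks for, at the cost of much larger intermediate sets.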

There are LOTS of ways to do this. Going through every combination is a perfectly reasonable first approach. It all depends on your requirements. If your matrix is small, and this operation isn't time sensitive, then there's no problem.
I'm not really an algorithms guy, but I'm sure there are really clever ways of doing this that someone will post after me.
Also, in Java when using CamelCase, method names should start with a lowercase character.

int a[6][6]={{1,2,3,4,5,6},
             {8,9,1,2,3,4},
             {5,6,7,8,9,1},
             {2,3,4,5,6,7},
             {8,9,1,2,3,4},
             {5,6,7,8,9,1}};
By length, do you mean combinations of matrix elements whose sum is 8, i.e. elements that sum to 8 within a row and together with elements of other rows? For example, from row 1 = { {2,6}, {3,5} }, then row 1 elements combined with row 2, and so on. Is that what you are expecting?

You can treat your matrix as a one-dimensional array; the layout doesn't matter here (just place the rows one after another). For a one-dimensional array you can write a function like this (assuming you need to print the combinations):
f(i, n) prints all combinations of length n using elements a[i] ... a[last].
It should skip k elements starting at a[i] (for every possible k), print a[i + k], and make the recursive call f(i + k + 1, n - 1).
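A rough sketch of f(i, n) as a Python generator, under the assumption that plain order-preserving combinations are wanted (this ignores the neighbor restriction from the question). The name combinations_from and the extra chosen parameter are illustrative additions.

```python
def combinations_from(a, i, n, chosen=()):
    """Yield every n-element combination of a[i:], preserving order."""
    if n == 0:
        yield chosen
        return
    # Pick a[k] as the next element; stop while n elements still remain.
    for k in range(i, len(a) - n + 1):
        yield from combinations_from(a, k + 1, n - 1, chosen + (a[k],))

matrix = [[1, 2],
          [3, 4]]
flat = [x for row in matrix for x in row]  # rows placed one after another
for combo in combinations_from(flat, 0, 2):
    print(combo)
```

For the 6x6 matrix and length 8 this enumerates C(36, 8) combinations, roughly 30 million, so it is better consumed lazily than collected into a list.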
