An Optimum 2D Data Structure - c++

I've given this a lot of thought but haven't really been able to come up with something.
Suppose I want an m x n collection of elements that is sortable by any column and any row in under O(m*n), and that also supports inserting or deleting a row in O(m+n) or less. Is it possible?
What I've come up with is a linked grid: the nodes are stored in a vector so that I have indices for them, and the first row and column are indexed so that I never need to traverse the list in any one direction. With my method I've achieved the above complexity, but I was wondering whether it is possible to reduce it further by a non-constant factor.
Example for sortability:
1 100 25 34
2 20 15 16
3 165 1 27
Sorted by 3rd row:
25 1 34 100
15 2 16 20
1 3 27 165
Sorting THAT by 1st column:
1 3 27 165
15 2 16 20
25 1 34 100

I would create two index arrays, one for the columns, and one for the rows. So for your data
1 100 25 34
2 20 15 16
3 165 1 27
You create two arrays:
cols = [0, 1, 2, 3]
rows = [0, 1, 2]
Then when you want to sort the matrix by the 3rd row, you keep the original matrix intact, but just change the indices array accordingly:
cols = [2, 0, 3, 1]
rows = [0, 1, 2]
The trick now is to access your matrix through one indirection. So instead of accessing it with m[r][c], you access it with m[rows[r]][cols[c]]. You also have to use m[rows[r]][cols[c]] when you perform the reordering of the rows/cols arrays.
This way sorting is O(n log n) (O(m log m) when reordering rows), and access is O(1).
For the data structure, I would use an array with links to another array:
+-+
|0| -> [0 1 2 3 4]
|1| -> [0 1 2 3 4]
|2| -> [0 1 2 3 4]
+-+
To insert a row, just append it at the last position and update the rows index array accordingly, with the correct position. E.g. when rows was [0, 1, 2] and you want to insert the new row at the front, rows becomes [3, 0, 1, 2]. Appending the row copies its n elements, and inserting into rows shifts at most m entries, so insertion of a row is O(m+n).
To insert a column, you also add it as the last element (of every row) and update cols accordingly. That is m element writes plus at most n shifts in cols, so inserting a column is also O(m+n).
Deletion is also O(m+n): replace the row/column you want to delete with the last one, remove its index from the index array, and change the index entry that referred to the last row/column so that it points to the slot the last one was moved into.

Just to add to martinus and Mike's answers: what you need is, in essence, pivoting, which is what they suggest and a very well known technique used in pretty much any numeric algorithm involving matrices. For example, you can run a quick search for "LU decomposition with partial pivoting" and "LU decomposition with full pivoting". The additional vectors that store the permutations are called the "pivots".

If I were handed this problem, I'd create row and column remapping vectors. E.G. to sort rows, I'd determine row order as normal, but instead of copying rows, I'd just change the row remapping vector.
It would look something like this:
// These need to be set up elsewhere.
size_t nRows, nCols;
std::vector<T> data;   // nRows * nCols elements, stored row by row
// Remapping vectors. Initially a straight-through mapping.
std::vector<size_t> rowMapping(nRows), colMapping(nCols);
for (size_t y = 0; y < nRows; ++y)
    rowMapping[y] = y;
for (size_t x = 0; x < nCols; ++x)
    colMapping[x] = x;
// Then you read data(row, col) with
T value = data[rowMapping[row] * nCols + colMapping[col]];
P.S. A small optimization would be to store pointers (to the start of each row) in rowMapping instead of indices. This would let you write T value = rowMapping[row][colMapping[col]];. However, you would have to recalculate the pointers every time the dimensions of data change, which could be error-prone.

You can use a hash table and insert (i,j) -> node, where (i,j) is a 2-tuple of integers. You can write your own custom class that defines an Equals method and a GetHash() method for that ... or Python gives it to you free of charge.
Now ... what exactly do you mean - sortable by a row or a column? Give an example with values please!

Perhaps by creating a small database for it?
Database sorting algorithms are probably better than reinventing the wheel. MySQL would do. To gain performance, the table can be created in memory. Then you can index columns as in a usual table, let the database engine do the dirty work (ordering and such), and just harvest the results.

Related

How do I get all combinations of elements in a list?

Given a list of X elements, how can I get all pairs, triples, ... (Y) combinations of these elements?
Y is the size of the required combinations. E.g. if Y = 2, I need to get all of the possible pairs.
I must not return the same combination twice (e.g. [a, b] and [b, a] are the same combination).
Take a copy of the list.
If the list is empty, there are no combinations.
To get all combinations of size one, look at each element in turn.
To get all combinations of size n+1, first remove the first element. Then get all combinations of size n of the rest of the list and add that first element to each of them. Then get all combinations of size n+1 of the rest of the list, without adding the first element.
And then you are done.
You can get fancy and merely pretend to copy/remove elements for optimization sake.
You can iterate t from 2 to Y and create an array A of size X filled with X-t 0s at the front and t 1s at the back, then use the code below:
do {
    // 1s in array A now correspond to a valid combination
} while (std::next_permutation(A, A + X));
The loop stops when all combinations of size t have been iterated.
next_permutation is in the header <algorithm>. It reorders the array into the next lexicographically greater permutation, or returns false if the array is already the lexicographically greatest permutation. Its complexity is O(n), but you need to iterate through the array once anyway to output a combination, so it isn't a problem. The total complexity of the whole process is bounded by O(2^n * n).
So here is an example pseudo code
D[X] = {1,2,3,4}, Y = 3   // the input
For t = 2,3,..,Y
    A[X] = {0,...,0,1,...,1}   // X - t 0s and t 1s
    Do
        For j = 0,1,...,X-1
            if A[j] == 1
                output D[j]
            end if
        end for
        output newline
    While next_permutation(A,A+X)
end for
The output will look like this:
3 4
2 4
2 3
1 4
1 3
1 2
2 3 4
1 3 4
1 2 4
1 2 3

Determine all square sub matrices of a given NxN matrix in C++

Given an NxN square matrix, I would like to determine all possible square sub matrices by removing an equal number of rows and columns.
To determine all possible 2x2 matrices I need 4 nested loops. Similarly, for 3x3 matrices I need 6 nested loops, and so on. Is there a way to write C++ code so that the code for the loops is generated dynamically? I have checked some answers related to code generation in C++, but most of them use Python, which I don't know. So, is it possible to write code that generates code in C++?
If I get what you are saying, you mean you require M loops to choose M rows and M loops to choose M columns for an M x M sub matrix, 1 <= M <= N.
You don't need 2*M loops to do this. No need to dynamically generate code with an ever-increasing number of loops!
Essentially, you need to "combine" all possible combinations of i_{1}, i_{2}, ..., i_{M} and j_{1}, j_{2}, ..., j_{M} such that 1 <= i_{1} < i_{2} < ... < i_{M} <= N (and similarly for j)
If you have all possible combinations of all such i_{1}, ..., i_{M} you are essentially done.
Say for example you are working with a 10 x 10 matrix and you require 4 x 4 sub matrices.
Suppose you select rows {1, 2, 3, 4} and columns {1, 2, 3, 4} initially. Next select columns {1, 2, 3, 5}, next {1, 2, 3, 6}, and so on till {1, 2, 3, 10}. Next select {1, 2, 4, 5}, next {1, 2, 4, 6}, and so on till you reach {7, 8, 9, 10}. This is one way to generate all ("10 choose 4") combinations in sequence.
Go ahead, write a function that generates this sequence and you are done. It can take as input M, N, current combination (as an array of M values) and return the next combination.
Use this function to step through the row combinations and, for each of them, through all the column combinations.
I have put this a little loosely. If something is not clear I can edit to update my answer.
Edit:
I will be assuming loop index starts from 0 (the C++ way!). To elaborate the algorithm further, given one combination as input the next combination can be generated by treating the combination as a "counter" of sorts (except that no digit repeats).
Disclaimer : I have not run or tested the below snippet of code. But the idea is there for you to see. Also, I don't use C++ anymore. Bear with me for any mistakes.
// Requires M <= N as input (N as in N x N matrix).
// Advances arr (M strictly increasing indices in [0, N)) to the next combination
// in lexicographic order. Returns false if arr was already the last combination.
bool nextCombination( int *currentCombination, int M, int N ) {
    int *arr = currentCombination;
    for( int i = M - 1; i >= 0; i-- ) {
        if( arr[i] < N - M + i ) {
            arr[i]++;
            // Reset every position to the right of i to the smallest valid value.
            for( int j = i + 1; j < M; j++ ) {
                arr[j] = arr[j - 1] + 1;
            }
            return true;
        }
    }
    return false; // arr was {N - M, ..., N - 1}
}
// Initialization: int arr[4] = {0, 1, 2, 3};
nextCombination( arr, 4, 10 );
// arr = [0, 1, 2, 4]
// nextCombination returns false once the last combination ({6, 7, 8, 9} here) has been reached.
Edit:
Actually I want to check the singularity of all possible sub matrices. My approach is to compute all sub matrices and then find their determinants. However, after computing the determinants of the 2x2 matrices, I'll store them and use them while computing the determinants of the 3x3 matrices, and so on. Can you suggest a better approach? I have no space or time constraints. – vineel
A straightforward approach using what you suggest is to index the determinants by the rows-columns combination that makes up a sub matrix. First, store the determinants of the 1 x 1 sub matrices (basically the entries themselves) in a hash map.
So the hash map would look like this for the 10 x 10 case
{
"0-0" : arr_{0, 0},
"0-1" : arr_{0, 1},
.
.
.
"1-0" : arr_{1, 0},
"1-1" : arr_{1, 1},
.
.
.
"9-9" : arr_{9, 9}
}
When M = 2, you can calculate the determinant using the usual formula (the determinants for 1 x 1 sub matrices having been initialized) and then add it to the hash map. The hash string for a 2 x 2 sub matrix would look something like 1:3-2:8, where the row indices in the original 10 x 10 matrix are 1, 3 and the column indices are 2, 8. In general, for an m x m sub matrix, the determinant can be computed (by cofactor expansion) from the necessary, already computed (m - 1) x (m - 1) determinants; each of these is a simple hash map lookup. Again, add the determinant to the hash map once calculated.
Of course, you may need to slightly modify the nextCombination() function - it currently assumes row and column indices run from 0 to N - 1.
On another note, since all sub matrices are to be processed starting from 1 x 1, you don't need something like a nextCombination() function. Given a 2 x 2 sub matrix, you just need to select one more row and column to form a 3 x 3 sub matrix. So you need to select one row index (that's not among the row indices that make up the 2 x 2 sub matrix) and similarly one column index. But doing this for every 2 x 2 sub matrix will generate duplicate 3 x 3 sub matrices, so you need some way to eliminate duplicates. One way to avoid them is to choose only a row/column whose index is greater than the highest row/column index already in the sub matrix.
Again I have loosely defined the idea. You can build upon it.

C++ Find all combinations of elements in multiple arrays in Breadth-first search manner

I've got multiple arrays and want to find all combinations of the elements in these arrays (one element from each array). Each element also carries a weight, and these arrays are sorted in decreasing order of weight. I've got an array of weights that mirrors each array of values. I want my search to find the combinations from the greatest weight to the lowest weight.
However, each element in an array has a weight associated with it so I want to run my search with those with the highest weight first.
Example:
arr0 = [A, B, C, D]
arr0_weight = [11, 7, 4, 3]
arr1 = [W, X, Y]
arr1_weight = [10, 9, 4]
Thus, the ideal output would be:
AW (11+10=21)
AX (11+9=20)
BW (7+10=17)
BX (7+9=16)
AY (11+4=15)
...
If I did just a for loop like this:
for (int i = 0; i < sizeof(arr0)/sizeof(arr0[0]); i++) {
    for (int j = 0; j < sizeof(arr1)/sizeof(arr1[0]); j++) {
        cout << arr0[i] << arr1[j] << endl;
    }
}
I would get:
AW (11+10=21)
AX (11+9=20)
AY (11+4=15)
BW (7+10=17)
BX (7+9=16)
BY (7+4=11)
Which isn't what I want because 17 > 15 and 16 > 15.
Also, what's a good way to do this for n arrays? If I don't know how many arrays I will have, and their size might not all be the same?
I've looked into putting the values into vectors but I can't find a way to do what I want (a sorted Cartesian product). Any help? Pseudo-code is fine if you don't have time - I'm just really stuck.
Thanks so much.
Your question is about algorithm, not C++.
You want to sort all tuples in Cartesian product from heaviest to lightest.
The easiest way is to find all tuples and sort them by their weight.
If you need sequential access, you should do the following. Since the weight of a tuple is the sum of the weights of its elements, I think greediness is optimal here. Let's move to an arbitrary number of arrays of arbitrary sizes. Create a set of indices. Initially, it contains zeros; the first tuple it represents is obviously the heaviest. Find one of the indices to increment: choose the index that loses the least weight, i.e. the one with the smallest difference to its next element. Don't forget to keep track of exhausted arrays. When all arrays are exhausted, you're done.
To implement it in C++, you should employ vector<pair<element_t, weight_t>> for the input data and set<pair<weight_difference_t, index_t>> as the set of indices. All types are probably integers, but I used custom type names to show which data goes where. You should also know how pair is compared.

How to do a set difference, except without eliminating repeated elements

I am trying to do the following in Matlab. Take two lists of numbers, possibly containing repeated elements, and subtract one set from the other set.
Ex: A=[1 1 2 4]; B=[1 2 4];
Desired result would be A-B=C=[1]
Or, another example, E=[3 3 5 5]; F=[3 3 5];
Desired result would be E-F=G=[5]
I wish I could do this using Matlab's set operations, but their function setdiff does not respect the repeated elements in the matrices. I appreciate that this is correct from a strict set theory standpoint, but would nevertheless like to tackle problems like: "I have 3 apples and 4 oranges, and you take 2 apples and 1 orange, how many of each do I have left." My range of possible values in these sets is in the thousands, so building a large matrix for tallying elements and then subtracting matrices does not seem feasible for speed reasons. I will have to do thousands of these calculations with thousands of set elements during a gui menu operation.
Example of what I would like to avoid for tackling the second example above:
E=[0 0 2 0 2]; F=[0 0 2 0 1];
G=E-F=[0 0 0 0 1];
Thanks for your help!
This can be done with the accumarray command.
A = [1 1 2 4]';
B = [1 2 4]'; % <-make these column vectors
X = accumarray(A, 1);
Y = accumarray(B, 1);
This will produce the output
X = [2 1 0 1]'
and
Y = [1 1 0 1]'
Where X(i) represents the number of occurrences of the number i in vector A, and Y(i) represents the number of occurrences of the number i in vector B.
Then you can just take X - Y.
One caveat: if the maximum values of A and B are different, the outputs from accumarray will have different lengths. In that case, pad the shorter one with zeros to the length of the longer one, or pass the output size explicitly, e.g. accumarray(A, 1, [n 1]) with n = max([A; B]).
I just want to improve on Prototoast's answer.
In order to avoid pitfalls involving non-positive numbers in A or B use hist:
A = [-10 0 1 1 2 4];
B = [1 2 4];
We need the minimum and maximum values in the union of A and B:
U = [A,B];
range_ = min(U):max(U);
So that we can use hist to give us same length vectors:
a = hist(A,range_)
b = hist(B,range_)
Now you need to subtract the histograms:
r = a-b
If you wish the set difference operator to be symmetric, use:
r = abs(a-b)
The following will give you the items in A \ B (\ here is your modified set difference), each repeated as many times as it survives the subtraction:
C = repelem(range_, max(r,0))
Hope this helps.

Position of elements in vector

I have several elements in a vector that are read from cin, and then I perform some calculations on the vector and the order of its elements changes. The problem is that I need to print the original positions of the vector elements after the calculations. I don't know how to explain this well, so I'll give an example:
10 1 100 1000
Here 10 is the 1st element, 1 is the 2nd, 100 is the 3rd, etc. After the calculations the vector changes into:
100 10 1 1000
so I should print
3 1 2 4
because 100 is the 3rd element of the input, 10 is the 1st etc. etc.
I tried with an array[1000] (because there aren't numbers larger than 1000 in the input), but it won't work because there can be multiple numbers with the same value, like:
10 10 10 100
and the output can be 1 2 3 4 or 2 3 1 4 or 3 1 2 4 etc., but here I need to output 1 2 3 4 because it's the 'smallest'.
I tried with an array f[1001] and f[10] = 1, f[100] = 2, f[1] = 3 if the numbers from the input are 10 100 1. But when there are multiple numbers with the same value, like 10 10 100, my idea doesn't work. Please help me in any possible way.
Sounds like you need to store both the value and the initial position. You should be able to do this with an array of structs:
struct UserInput
{
    unsigned int initialPosition;
    int userInputValue;
};
int main()
{
    UserInput theUserInput[100];
    // increment a counter, starting at 1, and place it in
    // "initialPosition" in the struct as user input is read
}
I'll leave the rest up to you... as it is after all homework :) good luck.
Use an associative array if you know what it is.
Use linear search to determine the index if the number of input is limited.
Consider using log10 (or strlen) to transform the 1, 10, 100, 1000, etc. into 0, 1, 2, 3, etc.
From your description, for an example such as:
10(3) 10(2) 10(1) 100(4)
what we have to output is 1 2 3 4, instead of 3 2 1 4.
So I don't think your requirement is just to print the initial positions directly. You have to make the position sequence as small as possible.
Following is my solution:
Use a direct-mapping hash table to store all the initial positions of each element. The initial positions of the same element are kept sorted. So if you want to output the smallest position sequence, you only need to read the initial positions of that element from first to last.
The detailed implementation is left to you, since it's a homework.