Divide an ndarray by an ndarray element-wise - python-2.7

I have this ndarray (not matrix):
mx = np.array([[10,25,33],[3,1,5],[50,50,52]])
[[10 25 33]
[ 3 1 5]
[50 50 52]]
and I want to get an ndarray of shares by dividing every element by the sum of its column. So the result of this operation:
[[10/63 25/76 33/90]
[ 3/63 1/76 5/90]
[50/63 50/76 52/90]]
I can do
np.true_divide(mx,mx.sum(axis=0))
Are there some built-in functions to calculate shares, or something like that?

The problem is related to how integer division behaves differently between Python 2 and Python 3. If you start with a float array, it works fine. There is also np.true_divide(), which you mention in the question.
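A minimal sketch of the float-cast fix (the column sums 63, 76, 90 come from the example above):

```python
import numpy as np

mx = np.array([[10, 25, 33],
               [3,  1,  5],
               [50, 50, 52]])

# On Python 2 an int/int array division truncates, so cast to float
# first (or use np.true_divide); broadcasting then divides each
# column by its own sum.
shares = mx.astype(float) / mx.sum(axis=0)

print(shares[:, 0])        # first column: 10/63, 3/63, 50/63
print(shares.sum(axis=0))  # every column of shares sums to 1
```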

Find the same numbers between [a,b] intervals

Suppose I have 3 arrays of consecutive numbers:
a = [1, 2, 3]
b = [2, 3, 4]
c = [3, 4]
Then the same number that appears in all 3 arrays is 3.
My algorithm is to use two nested for loops to check for common elements and push them into another array (let's call it d). Then
d = [2, 3] (d = a overlap b)
And use it again on arrays d and c => the final result is 1, because only one number appears in all 3 arrays.
e = [3] (e = c overlap d) => e.length = 1
Other than that, if there is only 1 array, the algorithm should return its length, since all of its numbers appear in it. But I think my algorithm above would take too long, because the arrays can contain up to 10^5 numbers. So, any idea of a better algorithm?
Yes, since these are ranges, you basically want to calculate the intersection of the ranges. This means you can calculate the maximum m of all the first elements of the lists, and the minimum n of all the last elements of the lists. All the numbers between m and n (both inclusive) are then members of all lists. If m > n, then there are no common numbers.
You do not need to calculate the overlap by enumerating over the first list and checking whether its elements are members of the last list. Since these are consecutive numbers, we can easily find out what the overlap is.
In short, the overlap of [a, ..., b] and [c, ..., d] is [ max(a,c), ..., min(b,d) ], there is no need to check the elements in between.
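The answer above can be sketched in a few lines of Python (assuming each input list really is a run of consecutive integers):

```python
def common_count(arrays):
    # Intersection of consecutive ranges: take the maximum m of the
    # first elements and the minimum n of the last elements; if m > n
    # the intersection is empty.
    m = max(a[0] for a in arrays)
    n = min(a[-1] for a in arrays)
    return max(0, n - m + 1)

print(common_count([[1, 2, 3], [2, 3, 4], [3, 4]]))  # -> 1 (only 3 is shared)
print(common_count([[5, 6, 7]]))                     # -> 3 (one array returns its length)
```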

Map Eigen replicate Matrix

I am trying to bring code from Matlab to C++. There is some information related to my case in the KDE Eigen Forums.
What I try to achieve is related to Matlab's meshgrid, for which the solution given over there is
X = RowVectorXd::LinSpaced(3, 1, 3).replicate(5, 1);
Y = VectorXd::LinSpaced(5, 10, 14).replicate(1, 3);
i.e., .replicate the vectors the amount of the other dimension. In my case I have two existing (n x 1) vectors and want to create a (n^2, 2) matrix which contains all combinations of vector elements, that is:
[1 3 6]^T and [7 8]^T ==> [1 7, 3 7, 6 7, 1 8, 3 8, 6 8]^T
where ^T just means transposed, lines are comma-separated. (In my case the vectors use floats, but that shouldn't matter).
The first column of the matrix [1 3 6 1 3 6]^T is easily created by Eigen's .replicate function. However, I struggle to create the second column [7 7 7 8 8 8]^T.
My idea was to use .replicate in the other dimension (obtaining a matrix) and then use a row-wise Eigen::Map to bring it to a linear (vector) view (as suggested in the docs), but I understand the resulting compiler error to mean that Eigen::Map doesn't work with an Eigen::Replicate type.
#include <Eigen/Core>
using namespace Eigen;

int main()
{
    MatrixXd reptest1(1, 5);
    reptest1 << 1, 2, 3, 4, 5;
    auto result2 = reptest1.replicate(2, 1); // a 2 x 5 replicate expression
    auto result3 = Map<Matrix<double, 1, Dynamic, Eigen::RowMajor> >(result2); // this doesn't work
    return 0;
}
VS2017 complains: error C2440: '<function-style-cast>': cannot convert from 'Eigen::Replicate<Derived,-1,-1>' to 'Eigen::Map<Eigen::Matrix<double,1,-1,1,1,-1>,0,Eigen::Stride<0,0>>'
GCC also complains: no matching function for call (I can't copy & paste the exact message, as it is on another machine).
Am I doing this too complicated? Should using Map work?
Map can only work on actual matrices, not on expressions. So replace auto result2 with MatrixXd result2, and you're done. See the common pitfalls page in the Eigen docs.
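For comparison, the combination matrix the question asks for can be sketched with NumPy's tile/repeat (a NumPy analogue, not Eigen code):

```python
import numpy as np

x = np.array([1.0, 3.0, 6.0])
y = np.array([7.0, 8.0])

# tile repeats the whole vector; repeat repeats each element:
first = np.tile(x, len(y))      # [1. 3. 6. 1. 3. 6.]
second = np.repeat(y, len(x))   # [7. 7. 7. 8. 8. 8.]
combos = np.column_stack([first, second])  # shape (6, 2)
print(combos)
```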

Size, Length conversion?

(I know I ask a lot of questions about this!)
Basically, I'm trying to convert some code from Matlab to C++ and I've come across this:
n = sum(size(blocks)) - len;
Now I have calculated the sum of the vector, and I have the length, but I do not know what size does, because in C++ .size() returns the number of elements in a vector.
Any ideas? (Not asking for code)!
In MATLAB, size returns a vector of all the dimensions of a vector (or matrix). So if blocks is a 4x2 matrix, then sum(size(blocks)) will return 6. If the number of dimensions is 2 or less, the result always contains 2 elements, i.e. a column vector of length 5 would return [5 1], and a row vector of the same length would return [1 5].
It's a bit odd to see sum(size(?)). Often you see prod instead of sum, which multiplies all the dimensions together.
Anyway, hope that answers your question to satisfaction =)
d = size(X) returns the sizes of each dimension of array X in a vector d.
Let's say you have d = size(rand(2,3,4)); this would return d = [2 3 4]. Basically it gives you the size of each dimension of that array.
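If it helps to see the behaviour outside MATLAB, the NumPy equivalent of size is shape (a sketch; blocks here is a made-up 4x2 array matching the answer above):

```python
import numpy as np

blocks = np.zeros((4, 2))   # a 4x2 matrix, as in the answer above
n = sum(blocks.shape)       # MATLAB: sum(size(blocks)) -> 4 + 2 = 6
print(n)  # -> 6
```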

How to do a set difference, except without eliminating repeated elements

I am trying to do the following in Matlab. Take two lists of numbers, possibly containing repeated elements, and subtract one set from the other set.
Ex: A=[1 1 2 4]; B=[1 2 4];
Desired result would be A-B=C=[1]
Or, another example, E=[3 3 5 5]; F=[3 3 5];
Desired result would be E-F=G=[5]
I wish I could do this using Matlab's set operations, but their function setdiff does not respect the repeated elements in the matrices. I appreciate that this is correct from a strict set theory standpoint, but would nevertheless like to tackle problems like: "I have 3 apples and 4 oranges, and you take 2 apples and 1 orange, how many of each do I have left." My range of possible values in these sets is in the thousands, so building a large matrix for tallying elements and then subtracting matrices does not seem feasible for speed reasons. I will have to do thousands of these calculations with thousands of set elements during a gui menu operation.
Example of what I would like to avoid for tackling the second example above:
E=[0 0 2 0 2]; F=[0 0 2 0 1];
G=E-F=[0 0 0 0 1];
Thanks for your help!
This can be done with the accumarray command.
A = [1 1 2 4]';
B = [1 2 4]'; % <-make these column vectors
X = accumarray(A, 1);
Y = accumarray(B, 1);
This will produce the output
X = [2 1 0 1]'
and
Y = [1 1 0 1]'
Where X(i) represents the number of incidents of the number i, in vector A, and Y(i) represents the number of incidents of number i in vector B.
Then you can just take X - Y.
One caveat: if the maximum values of A and B are different, the outputs from accumarray will have different lengths. If that is the case, you can just assign each output to a subset of a vector of zeros sized to the larger vector.
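The same counting idea can be sketched in Python with collections.Counter, which implements exactly this multiset subtraction (negative counts are dropped automatically):

```python
from collections import Counter

def multiset_diff(a, b):
    # Counter subtraction keeps repeated elements and discards
    # anything whose count goes to zero or below.
    return sorted((Counter(a) - Counter(b)).elements())

print(multiset_diff([1, 1, 2, 4], [1, 2, 4]))  # -> [1]
print(multiset_diff([3, 3, 5, 5], [3, 3, 5]))  # -> [5]
```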
I just want to improve on Prototoast's answer.
In order to avoid pitfalls involving non-positive numbers in A or B use hist:
A = [-10 0 1 1 2 4];
B = [1 2 4];
We need the minimum and maximum values in the union of A and B:
U = [A,B];
range_ = min(U):max(U);
So that we can use hist to give us same length vectors:
a = hist(A,range_)
b = hist(B,range_)
Now you need to subtract the histograms:
r = a-b
If you wish the set difference operator to be symmetric, then use:
r = abs(a-b)
The following will give you which items are in A \ B (\ here is your modified set difference):
C = range_(logical(r))
Hope this helps.
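A NumPy sketch of the same hist idea, for reference (np.histogram plays the role of MATLAB's hist here):

```python
import numpy as np

A = np.array([-10, 0, 1, 1, 2, 4])
B = np.array([1, 2, 4])

# One histogram bin per integer value over the shared range, so
# negative values are handled too.
lo = min(A.min(), B.min())
hi = max(A.max(), B.max())
bins = np.arange(lo, hi + 2)

a, _ = np.histogram(A, bins)
b, _ = np.histogram(B, bins)
r = a - b

C = bins[:-1][r != 0]   # values whose counts differ
print(C)  # -> [-10   0   1]
```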

An Optimum 2D Data Structure

I've given this a lot of thought but haven't really been able to come up with something.
Suppose I want an m x n collection of elements, sortable by any column and any row in under O(m*n), and also the ability to insert or delete a row in O(m+n) or less... is it possible?
What I've come up with is a linked grid, where the nodes are inserted into a vector so I have indices for them, and I've indexed the first row and column to remove the need to traverse the list in any one direction. With my method I've achieved the above complexity, but I was wondering whether it is possible to reduce that further by a non-constant factor.
Example for sortability:
1 100 25 34
2 20 15 16
3 165 1 27
Sorted by 3rd row:
25 1 34 100
15 2 16 20
1 3 27 165
Sorting THAT by 1st column:
1 3 27 165
15 2 16 20
25 1 34 100
I would create two index arrays, one for the columns, and one for the rows. So for your data
1 100 25 34
2 20 15 16
3 165 1 27
You create two arrays:
cols = [0, 1, 2, 3]
rows = [0, 1, 2]
Then when you want to sort the matrix by the 3rd row, you keep the original matrix intact and just change the index arrays accordingly:
cols = [2, 0, 3, 1]
rows = [0, 1, 2]
The trick now is to access your matrix with one indirection. So instead of accessing it with m[x][y] you access it by m[cols[x]][rows[y]]. You also have to use m[cols[x]][rows[y]] when you perform the reordering of the rows/cols array.
This way sorting is O(n*log(n)), and access is O(1).
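A quick Python sketch of this index-array idea, using the example matrix from the question (the rows/cols names follow the answer above):

```python
m = [[1, 100, 25, 34],
     [2,  20, 15, 16],
     [3, 165,  1, 27]]

rows = list(range(3))
cols = list(range(4))

# "Sort by the 3rd row": reorder only the column indices by the
# values in row 2; the matrix itself never moves.
cols.sort(key=lambda c: m[2][c])
print(cols)  # -> [2, 0, 3, 1]

def at(x, y):
    # O(1) access through one level of indirection.
    return m[rows[y]][cols[x]]

print([at(x, 0) for x in range(4)])  # first row as viewed -> [25, 1, 34, 100]
```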
For the data structure, I would use an array with links to another array:
+-+
|0| -> [0 1 2 3 4]
|1| -> [0 1 2 3 4]
|2| -> [0 1 2 3 4]
+-+
To insert a row, just insert it at the last position and update the rows index array accordingly, with the correct position. E.g. when rows was [0, 1, 2] and you want to insert the new row at the front, rows becomes [3, 0, 1, 2]. This way insertion of a row is O(n).
To insert a column, you also add it as the last element, and update cols accordingly. Inserting a column is O(m), row is O(n).
Deletion is also O(n) or O(m), here you just replace the column/row you want to delete with the last one, and then remove the index from the index array.
Just to add to martinus and Mike's answers: what you need is, in essence, pivoting, which is what they suggest and a very well known technique used in pretty much any numeric algorithm involving matrices. For example, you can run a quick search for "LU decomposition with partial pivoting" and "LU decomposition with full pivoting". The additional vectors that store the permutations are called the "pivots".
If I were handed this problem, I'd create row and column remapping vectors. E.G. to sort rows, I'd determine row order as normal, but instead of copying rows, I'd just change the row remapping vector.
It would look something like this:
// These need to be set up elsewhere.
size_t nRows, nCols;
std::vector<T> data;

// Remapping vectors. Initially a straight-through mapping.
std::vector<size_t> rowMapping(nRows), colMapping(nCols);
for (size_t y = 0; y < nRows; ++y)
    rowMapping[y] = y;
for (size_t x = 0; x < nCols; ++x)
    colMapping[x] = x;

// Then you read data(row, col) with:
T value = data[rowMapping[row] * nCols + colMapping[col]];
P.S. A small optimization would be to store pointers in rowMapping instead of indices. That would let you write T value = rowMapping[row][colMapping[col]]; however, you would have to recalculate the pointers every time the dimensions of data change, which could be error-prone.
You can use a hash table and insert (i,j) -> node where (i,j) is a 2-tuple containing 2 integers. You can write your own custom class which defines Equals method and a GetHash() method for that ... or Python gives it to you free of charge.
Now ... what exactly do you mean - sortable by a row or a column? Give an example with values please!
Perhaps by creating a small database for it?
Database sorting algorithms are probably better than reinventing the wheel. MySQL would do. To gain performance, the table can be created in memory. Then you can index the columns as in a usual table and let the database engine do the dirty work (ordering and such), and you just harvest the results.