regexp-like library for matrix pattern search - regex

Is there a library (in any language) that can search for patterns in matrices the way regular expressions work for strings? Something like regular expressions for matrices, or any matrix pattern search method?

If you're not averse to using J, you can find out whether two matrices are equal by using the -: (match) operator. For example:
X =: 4 3 $ i.12
X
0 1 2
3 4 5
6 7 8
9 10 11
Y =: 4 3 $ (1+i.12)
Y
1 2 3
4 5 6
7 8 9
10 11 12
X -: X
1
X -: Y
0
One nice feature of the match operator is that you can use it to compare arrays of arbitrary dimension; if A is a 3x3x4 array and B is a 2x1 array, then A-:B returns 0.
To find out whether a matrix is a submatrix of another matrix, you can use the E. (member of interval) operator like so:
X =: 2 2 $ 1 2 4 5
X
1 2
4 5
Y =: 4 3 $ (1+i.12)
Y
1 2 3
4 5 6
7 8 9
10 11 12
X E. Y
1 0 0
0 0 0
0 0 0
0 0 0
The 1 at the top left of the result signifies that the part of Y that is equal to X has the given pixel as its upper left-hand corner. The reason for this is that there may be several overlapping copies of X embedded in Y, and only flagging the one pixel lets you see the location of every matching tile.
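For readers who do not use J, the behaviour described above can be approximated in plain Python. This is a minimal sketch, not a port of J's E.: the function name find_submatrix is made up for illustration, matrices are assumed to be non-empty lists of equal-length lists, and the search is the naive window-by-window comparison.

```python
def find_submatrix(pattern, matrix):
    """Return a 0/1 mask shaped like matrix; 1 marks the top-left corner of a match."""
    p_rows, p_cols = len(pattern), len(pattern[0])
    m_rows, m_cols = len(matrix), len(matrix[0])
    mask = [[0] * m_cols for _ in range(m_rows)]
    for i in range(m_rows - p_rows + 1):
        for j in range(m_cols - p_cols + 1):
            # Compare each pattern row with the matching slice of the window.
            if all(matrix[i + di][j:j + p_cols] == pattern[di]
                   for di in range(p_rows)):
                mask[i][j] = 1
    return mask


X = [[1, 2], [4, 5]]
Y = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
for row in find_submatrix(X, Y):
    print(*row)
```

With the X and Y from the J session, only the top-left cell of the mask is set, and overlapping matches would each get their own 1.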

I found two things: gawk and a perl script.
It's a different problem because string regular expression tools (e.g., sed, grep) work line-by-line on one-dimensional strings.
Unless your matrices are one-dimensional (basically vectors), these programs and the algorithms they use won't work.
Good luck!

Just search rows of the pattern in each row of the input matrix using Aho-Corasick (time O(matrix size)). The result should be small enough to quickly join it into the final result.
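The two-step idea above (match pattern rows inside matrix rows, then join the row hits vertically) can be sketched in Python. Note the assumption: this sketch swaps Aho-Corasick for a dictionary lookup of row slices, so it does not reach the O(matrix size) bound; it only illustrates the row-then-join structure. The name find_pattern_by_rows is hypothetical.

```python
def find_pattern_by_rows(pattern, matrix):
    """Return (row, col) top-left corners where pattern occurs in matrix."""
    p_rows, p_cols = len(pattern), len(pattern[0])
    m_rows, m_cols = len(matrix), len(matrix[0])
    width = m_cols - p_cols + 1
    if width <= 0 or m_rows < p_rows:
        return []

    # Step 1: give every distinct pattern row a small integer id.
    row_ids = {}
    pattern_ids = [row_ids.setdefault(tuple(r), len(row_ids)) for r in pattern]

    # labels[i][j] is the id of the pattern row found at matrix[i][j:j+p_cols],
    # or -1 if no pattern row starts there. (Aho-Corasick would compute this
    # in one pass per row; a dict lookup per slice stands in for it here.)
    labels = [[row_ids.get(tuple(row[j:j + p_cols]), -1) for j in range(width)]
              for row in matrix]

    # Step 2: join. Look for the id sequence of the pattern down each column.
    hits = []
    for j in range(width):
        column = [labels[i][j] for i in range(m_rows)]
        for i in range(m_rows - p_rows + 1):
            if column[i:i + p_rows] == pattern_ids:
                hits.append((i, j))
    return hits


Y = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
print(find_pattern_by_rows([[1, 2], [4, 5]], Y))
```

The vertical join is itself a one-dimensional string search over row ids, so a linear-time matcher could replace the slice comparison in step 2.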

I don't think anything quite like regular expressions exists for dimensions higher than 1, but if you want to match an exact pattern rather than a class of patterns, I suggest reading up on convolution (or rather cross-correlation).
The reason is that there are many highly optimized library functions (e.g. IPP) that do this faster than you are likely to achieve on your own. This method also scales to higher dimensions.
Note that this won't directly give you a "match", but rather a "peak" in a correlation map. A value equal to the sum of squared coefficients of the pattern you are searching for marks a candidate match; other windows can produce the same value, so a candidate should still be compared against the pattern.
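A minimal pure-Python sketch of the correlation idea, under stated assumptions: it uses plain nested loops rather than an optimized library such as IPP, the names cross_correlate and exact_matches are made up for illustration, and it works on integer matrices. In raw (unnormalized) cross-correlation a window that equals the pattern always scores the sum of squared pattern coefficients, but other windows can reach or exceed that score, so the sketch verifies each candidate.

```python
def cross_correlate(pattern, image):
    """Valid-mode 2D cross-correlation: one score per window position."""
    p_rows, p_cols = len(pattern), len(pattern[0])
    rows, cols = len(image), len(image[0])
    return [[sum(image[i + di][j + dj] * pattern[di][dj]
                 for di in range(p_rows) for dj in range(p_cols))
             for j in range(cols - p_cols + 1)]
            for i in range(rows - p_rows + 1)]


def exact_matches(pattern, image):
    """Windows scoring the pattern's energy, then verified element by element."""
    p_rows, p_cols = len(pattern), len(pattern[0])
    energy = sum(v * v for row in pattern for v in row)
    corr = cross_correlate(pattern, image)
    candidates = [(i, j) for i, row in enumerate(corr)
                  for j, score in enumerate(row) if score == energy]
    # The energy test alone is not sufficient, so compare the actual values.
    return [(i, j) for i, j in candidates
            if all(image[i + di][j:j + p_cols] == pattern[di]
                   for di in range(p_rows))]


X = [[1, 2], [4, 5]]
Y = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
print(cross_correlate(X, Y)[0])   # scores for the first row of windows
print(exact_matches(X, Y))
```

Here the matching window at the top left scores 46 (1 + 4 + 16 + 25), while its right-hand neighbour scores 58, which shows why the largest value in a raw correlation map is not necessarily the match.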

Does Eigen "Sparse matrix format" example contains error?

Eigen 3.3.7 documentation for SparseMatrix
http://eigen.tuxfamily.org/dox/group__TutorialSparse.html
seems to contain an error in Sparse matrix format section:
This storage scheme is better explained on an example. The following matrix
0 3 0 0 0
22 0 0 0 17
7 5 0 1 0
0 0 0 0 0
0 0 14 0 8
and one of its possible sparse, column major representation:
Values: 22 7 _ 3 5 14 _ _ 1 _ 17 8
InnerIndices: 1 2 _ 0 2 4 _ _ 2 _ 1 4
OuterStarts: 0 3 5 8 10 12
InnerNNZs: 2 2 1 1 2
If 14 is moved from the third column to the second (i.e. its indices changed from [4,2] to [4,1]), then the first two arrays, Values and InnerIndices, make sense. OuterStarts doesn't seem to be correct for either position of 14, while InnerNNZs makes sense for 14 being the [4,2] element of the matrix, but is then inconsistent with the Values array.
Is this example incorrect or am I missing something?
In general, what is the best way of figuring out Eigen, besides examining the source code? I normally look at tests and examples, but building most benchmark and tests for sparse matrices results in compilation errors (were these tests written for older version of Eigen and not updated for version 3?)...
The key is that the user is supposed to reserve at least as many entries per column as they need. In this example the user only reserved 2 entries for the second column, so if you were to try to add another entry to that column, it would probably require an expensive reallocation, or at least a complicated shift to "steal" an unused entry from another column. (I have no idea how this is implemented.)
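One way to check the quoted example is to decode the four arrays by hand. The sketch below is plain Python, not Eigen's API, and the function name decode_column_major is hypothetical. It assumes the reading that column j occupies the slots from OuterStarts[j] up to OuterStarts[j+1], of which only the first InnerNNZs[j] are in use and the rest (shown as _ in the documentation, None here) are reserved free space.

```python
values        = [22, 7, None, 3, 5, 14, None, None, 1, None, 17, 8]
inner_indices = [1, 2, None, 0, 2, 4, None, None, 2, None, 1, 4]
outer_starts  = [0, 3, 5, 8, 10, 12]
inner_nnzs    = [2, 2, 1, 1, 2]


def decode_column_major(values, inner_indices, outer_starts, inner_nnzs, n_rows):
    """Rebuild the dense matrix from the uncompressed column-major arrays."""
    n_cols = len(inner_nnzs)
    dense = [[0] * n_cols for _ in range(n_rows)]
    for col in range(n_cols):
        start = outer_starts[col]
        # Only the first inner_nnzs[col] slots of the column hold entries.
        for k in range(start, start + inner_nnzs[col]):
            dense[inner_indices[k]][col] = values[k]
    return dense


dense = decode_column_major(values, inner_indices, outer_starts, inner_nnzs, 5)
for row in dense:
    print(*row)
```

Under that reading the arrays reproduce the matrix from the question: the second column owns slots 3 and 4 (both used), and 14 sits in slot 5, the first of three slots belonging to the third column.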
Upon a cursory look at the documentation you linked to, I didn't see anything about moving entries like you're trying to do. I'm not sure that Eigen supports such an operation. (Correct me if I'm wrong.) I'm also not sure why you would want to do that.
Your final question is probably too broad. I'm not an expert at Eigen, but it seems like a mature, powerful, and well-documented library. If you have any specific problems compiling examples, you should post them here or on an Eigen specific forum. Many people at scicomp.SE are well-versed in Eigen and are accommodating.

Sed: Substitute letters on certain positions

I have a file with the structure:
N1H3O1 C2H2
C1H4 H201
C1H1N1 N1H3
C2N1O1P1H3 P5
What I am trying to do is to count the sum of coefficients in each of the formulae. Thus, the desired output is:
1+3+1 5 2+2 4
1+4 5 2+1 3
1+1+1 3 3+1 4
2+1+1+1+3 8 5 5
What I did is a simple replacement of each letter with "+" and then deleting the first " +".
I however would like to know how to do it in a more proper way in sed, using branch and flow operators.
The problem with your input is the 0 which is used instead of O, which might make it difficult to design a regular expression for it, which you can see here:
([^A-Z]+)*([0-9]+)
Other than that, you might be able to capture the numbers by simply adding ([^A-Z]+).
However, you may not want to do this task with regular expressions alone: apart from that 0, your data is well structured, so a short script could handle it.

bash sort so that results are in numeric order as well as string order

Let's say I was comparing two adjacent lines to each other after running sort -u on a file. I find they both match for n characters from the left, then begin to disagree at some point, and where the disagreement begins, the first line has a digit "0" to "9" while the second line has a non-digit. I want the two lines to swap positions. Why do I want this? Because the digit in the first line means it contains a longer number and needs to go behind the other, so that these lines, regardless of the digit value, will rearrange from this:
xxxx-xxxx-xxxxxxx.xxxxxxx.xxxx.DD-xx.x.x.x
xxxx-xxxx-xxxxxxx.xxxxxxx.xxxx.D-xx.x.x.x
to this:
xxxx-xxxx-xxxxxxx.xxxxxxx.xxxx.D-xx.x.x.x
xxxx-xxxx-xxxxxxx.xxxxxxx.xxxx.DD-xx.x.x.x
And this:
1
10
11
12
13
14
15
2
3
4
5
6
7
8
9
becomes this:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
because it forces numeric values with the same number of digits to be compared with each other, as those grouped from the left with more digits are moved behind those with fewer digits.
My logic might break down at some point, but until I can code it, I can't check the results returned. So does anybody know how to do this in bash?
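sort -g (general numeric sort) should do the trick for the purely numeric list; sort -n works there as well. For lines where the number is embedded in a longer string, as in the first example, GNU sort's -V (version sort) compares each run of digits by value.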
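The ordering asked for in the question, where runs of digits are compared as numbers and everything else as text, is often called a natural sort. A minimal Python sketch of it follows; it is an illustration of the ordering, not what sort does internally, and natural_key is a made-up name.

```python
import re


def natural_key(line):
    """Split a line into text and digit runs; digit runs compare as integers."""
    parts = re.split(r"(\d+)", line)
    # re.split with a capturing group alternates text, digits, text, ...
    # so odd positions are always digit runs and even positions are text.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


numbers = ["1", "10", "11", "12", "2", "3", "9"]
print(sorted(numbers, key=natural_key))

names = ["pkg.15-a.b", "pkg.5-a.b"]
print(sorted(names, key=natural_key))
```

Because text and numbers always sit at the same positions in every key, the comparison never mixes strings with integers.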

MATLAB IF VALUE LESS THAN

Hi, I am writing some MATLAB code at the moment. I am trying to compare the values in a list to the number 10 and, if the value is less than 10, add 1 to the total. However, I cannot seem to get the code right. My code so far:
tot = 0
for i=1:n
if(x(i)<10)
tot = +1
else
y=0;
end
end
tot
The value I get for tot is always 1 and never increases. Can someone help edit this or, if not, provide a solution to the problem?
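The loop in the question never gets past 1 because tot = +1 assigns the value +1 on every pass; to accumulate, it needs tot = tot + 1. That said, I agree with the other answer that one should avoid for loops for this. There is a faster solution: since the asker is only interested in the count, not the values themselves, there is no need to index back into the array.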
Given:
a = [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
Computing numbers less than 10 (you could put any number here)
answer = sum(a<10);
Good luck!
In languages like MATLAB and R, you really should not use for loops like this, even as an exercise. Each variable can be a vector, and operations can occur on the whole vector at once, rather than element-by-element.
Given:
x = [ 1 2 3 4 11 12 13 14 15 16 ]
To generate a list of all x less than 10 you would say:
x(x<10)
So to count them:
total = length(x(x<10))
No loop needed or wanted!
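For comparison outside MATLAB, the same two approaches can be sketched in Python. This is only an illustration with made-up variable names: the first version is the corrected loop, which accumulates instead of assigning, and the second counts the comparison results directly.

```python
x = [1, 2, 3, 4, 11, 12, 13, 14, 15, 16]

# Loop version: add to the running total rather than overwriting it.
total_loop = 0
for value in x:
    if value < 10:
        total_loop += 1

# Whole-list version: each comparison is True (1) or False (0), so sum counts them.
total = sum(value < 10 for value in x)

print(total_loop, total)
```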

C++ Algorithm: Pick n numbers from 2D Matrix based on certain conditions

I have a 2D matrix of size [3][x] filled with numbers. I want to pick say x numbers from this matrix based on the condition
exactly one number from each column.
up to a Max of 'm' numbers from each row (total of all the 3 rows should be x numbers and 3m > x)
I want to find the least possible sum of these selected x numbers.
I was able to pick the numbers based on iterative approach of finding the 'x' small numbers based on above conditions from the matrix. But my answer is not optimal.
E.g.:
5 9 . . . .
6 15 . . . .
7 19 . . . .
Let's say 5 is picked initially (so 6 and 7 cannot be picked now). Later on we try to pick 9, but if the m elements of row 0 are used up we will have to pick 15. Now our solution will be 5+15 = 20, but we could have used 6+9 = 15 as the optimal solution.
I am trying to optimize my solution and looking for better algorithms. Can someone provide me some good idea for optimal solution?
The problem reminds me of this one: http://projecteuler.net/problem=345
The Hungarian algorithm might work: http://en.wikipedia.org/wiki/Hungarian_algorithm
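As a way to check candidate answers on small inputs, here is a sketch in Python that uses a different technique from the Hungarian algorithm: a dynamic program over the columns that tracks how many picks each row has used so far. The function name min_column_pick_sum is hypothetical, and the matrix is assumed to be given as a list of rows. For three rows there are at most (m+1)^2 reachable states per column, because the third count follows from the other two.

```python
def min_column_pick_sum(matrix, m):
    """Least sum when picking one number per column and at most m per row.

    Returns None if the row limit makes a full selection impossible.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    # best maps a tuple of per-row pick counts to the cheapest sum reaching it.
    best = {(0,) * n_rows: 0}
    for col in range(n_cols):
        nxt = {}
        for counts, cost in best.items():
            for r in range(n_rows):
                if counts[r] == m:
                    continue  # row r has no picks left
                new_counts = counts[:r] + (counts[r] + 1,) + counts[r + 1:]
                new_cost = cost + matrix[r][col]
                if new_cost < nxt.get(new_counts, float("inf")):
                    nxt[new_counts] = new_cost
        best = nxt
    return min(best.values()) if best else None


# The two visible columns from the question, with m = 1.
example = [[5, 9],
           [6, 15],
           [7, 19]]
print(min_column_pick_sum(example, 1))
```

On the two-column example this finds 6 + 9 = 15, the selection the greedy approach in the question misses.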