Matlab: how to work with sparse keys to access sparse data? - regex

I am trying to access the sparse mlf with the keys such as BEpos and BEneg where one key per line. Now the problem is that most commands are not meant to deal with too large input: bin2dec requires clean binary numbers without spaces but the regexp hack fails to too many rows -- and so on.
How to work with sparse keys to access sparse data?
Example
K>> mlf=sparse([],[],[],2^31,1);
BEpos=Cg(pos,:)
BEpos =
(1,1) 1
(2,3) 1
(2,4) 1
K>> mlf(bin2dec(num2str(BEpos)))=1
Error using bin2dec (line 36)
Binary string must be 52 bits or less.
K>> num2str(BEpos)
ans =
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
K>> bin2dec(num2str('1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0'))
Error using bin2dec (line 36)
Binary string must be 52 bits or less.
K>> regexprep(num2str(BEpos),'[^\w'']','')
Error using regexprep
The 'STRING' input must be a one-dimensional array
of char or cell arrays of strings.
Manually works
K>> mlf(bin2dec('1000000000000000000000000000000'))
ans =
All zero sparse: 1-by-1

Consider a different approach using manual binary to decimal conversions:
pows = pow2(size(BEpos,2)-1 : -1 : 0);
inds = uint32(BEpos*pows.')
I haven't benchmarked this, but it might work faster than bin2dec and cell arrays.
How it works
This is pretty simple: the powers of 2 are calculated and stored in pows (assuming the MSB is in the leftmost position). Then they are multiplied by the bits in the matching positions and summed to produce the corresponding decimal values.

Try to index with this:
inds = uint32( bin2dec(cellstr(num2str(BEpos,'%d'))) );

Related

Retrieve multiple ArrayFire subarrays from min/max data points

I have an array with sections of touching values in it. For example:
0 0 1 0 0 0 0 0 0 0
0 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 2 2 2 0 0
0 0 0 0 0 0 0 2 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 3 0 0 0 0 0 0 0
0 0 3 0 0 0 0 0 0 0
0 0 3 0 0 0 0 0 0 0
from this, I created a set of af::arrays: minX, maxX, minY, maxY. These define the box that encloses each group.
so for this example:
minX would be: [1,5,2] // 1 for label(1), 5 for label(2) and 2 for label(3)
maxX would be: [3,7,2] // 3 for label(1), 7 for label(2) and 2 for label(3)
minY would be: [0,3,7] // 0 for label(1), 3 for label(2) and 7 for label(3)
maxY would be: [1,4,9] // 1 for label(1), 4 for label(2) and 9 for label(3)
So if you take the i'th element from each of those arrays, you can get the upperleft/lowerright bounds of a box that encloses the corresponding label.
I would like use these values to pull out subarrays from this larger array. My goal is to put these values enclosed in the boxes into a flat list. In GPU memory, I also have calculated how many entries I would need for each box using the max/min X/Y values. So in this example - the result of the flat list should be:
result=[0 1 0 1 1 1 2 2 2 0 0 2 3 3 3]
where the first 6 entries are from the box
______
|0 1 0 |
|1 1 1 |
------
the second 6 entries are from the box
______
|2 2 2 |
|0 0 2 |
------
and the final three entries are from the box
___
| 3 |
| 3 |
| 3 |
---
I cannot figure out how to index into this af::array with min/max values in memory that resides on the GPU (and do not want to transfer them to the CPU). I was trying to see if gfor/seq would work for me, but it appears that af::seq cannot use array data, and everything I have tried with using af::index i could not get to work for me either.
I am able to change how I represent min/max (I could store indices for upper left/lower right) but my main goal is to do this efficiently on the GPU without moving data back and forth between the GPU and CPU.
How can this be achieved efficiently with ArrayFire?
Thank you for your help
How did you get there so far? which language are you using?
I guess you could be tiling the results to 3rd dimensions to handle each regions separately and end up with min/max vectors in GPU memory.

What is the 5x5 equivalent of the 3x3 emboss kernel?

-2 -1 0
-1 1 1
0 1 2
This is 3x3 emboss kernel. How should I write this in 5x5?
As I understand, these filters take directional differences (see the wikipidea page).
We can decompose you filter into directions
0 -1 0 0 0 0 -2 0 0
0 0 0 -1 0 1 0 0 0
0 1 0 0 0 0 0 0 2
So, I think you can expand it over these 3 directions giving emphasis
0 0 -1 0 0 0 0 0 0 0 -2 0 0 0 0
0 0 -1 0 0 0 0 0 0 0 0 -2 0 0 0
0 0 0 0 0 -1 -1 0 1 1 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0 0 2 0
0 0 1 0 0 0 0 0 0 0 0 0 0 0 2
So, the final kernel would be
-2 0 -1 0 0
0 -2 -1 0 0
-1 -1 1 1 1
0 0 1 2 0
0 0 1 0 2
May be you can also try interpolating filter coefficients marked as x
-2 x -1 0 0
x -2 -1 0 0
-1 -1 1 1 1
0 0 1 2 x
0 0 1 x 2
The simple solution to fitting any lower-dimensional convolution kernel into a higher-dimensional matrix of the same rank is to surround it by zero weights. This is especially true when you're dealing with a concept like embossing, which is arguably more interested in immediate vector of change than the rate at which it is changing. That is, for this embossing matrix,
You could equivalently use this in 5 x 5:
Granted, this will get you a different visual effect than anything with any part of the matrix filled in; but sometimes, especially with edge-detection, immediate clarity is more important. We aren't always displaying it. If this were something like a Guassian blur kernel, having a greater range could improve the effect, but embossing isn't that different conceptually from Sobel-Feldman and it may be better to keep it tight.

sed to remove certain spaces in the middle

I have a text file like this:
6.2341 -0.4024 -2.0936 Cl 0 0 0 0 0 0 0 0 0 0 0 0
0.1148 -3.7525 1.0392 S 0 0 0 0 0 0 0 0 0 0 0 0
-2.5441 -0.8745 1.3714 F 0 0 0 0 0 0 0 0 0 0 0 0
The format is: columns 1 to 10, 11 to 20, 21 to 30 are x,y,z coordinates in (10.4) format, i.e. length=10, 4 digits after the decimal point; column 31 is always a space; columns 32 to 32 are the atom type; the remaining columns are not important.
However, for some unknown reason, the atom type field is right-shifted by two columns, like this:
6.2341 -0.4024 -2.0936 Cl 0 0 0 0 0 0 0 0 0 0 0 0
0.1148 -3.7525 1.0392 S 0 0 0 0 0 0 0 0 0 0 0 0
-2.5441 -0.8745 1.3714 F 0 0 0 0 0 0 0 0 0 0 0 0
How to use the sed command and regular expression to match these lines and delete the two extra spaces?
sed -r 's/(.{30}) /\1/' will do the trick.
Group the first 30 characters, match two additional spaces, replace the whole with the grouped characters.
If you don't mind using neither sed nor regular expressions you can just use cut to remove the 2 offending characters:
$ cut --complement -c31,32 file
6.2341 -0.4024 -2.0936 Cl 0 0 0 0 0 0 0 0 0 0 0 0
0.1148 -3.7525 1.0392 S 0 0 0 0 0 0 0 0 0 0 0 0
-2.5441 -0.8745 1.3714 F 0 0 0 0 0 0 0 0 0 0 0 0

Setting the number of edges in igraph_erdos_renyi_game c++

I would like to create a directed network graph using igraph c++ in which each node is randomly connected to exactly n other distinct nodes in the network (i.e. excluding connections to itself, and loops/multiple edges to the same loop). I was thinking of using the method igraph_erdos_renyi_game, but for some reason I do not get the desired degree distribution. In particular, if I set the arguments like this:
igraph_erdos_renyi_game(&g, IGRAPH_ERDOS_RENYI_GNM, n, m, true, false);
with n = 5, m = 1, I get this Adjacency Matrix (e.g.):
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
1 0 0 0 0
0 0 0 0 0
If I use igraph_k_regular_game(&g, n, m, true, false) instead, I get exactly the desired outcome, namely (e.g.):
0 0 0 0 1
1 0 0 0 0
0 0 0 0 1
0 1 0 0 0
0 0 0 1 0
So to summarize, I would like for each node to have 1 edge to n randomly selected agents. Am I misinterpreting the way the Erdos-Renyi method works, or am I passing the wrong arguments?
m is the total number of edges in erdos_renyi_game(). So just use k_regular_game(), it does exactly what you want.

Replace space with semicolon when more than one with regex

I am trying for about 2 hours, and I'm not sure whether what I want to do even works.
I have a large file with some data that looks like
43034452 LONGSHIRTPAIETTE 17.30
27.90
0110
COLOR : : : : :
: : :
-11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
43034453 LONG SHIRT PAI ETTE 16.40
25.90
0110
COLOR : : : : :
: : :
-3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
43034454 BASIC 4.99
8.90
0110
COLOR : : : : :
: : :
-5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
(The file has 36k rows.)
What I want to do is to get this whole thing clean.
In the end, the rows should look like
43034452;LONGSHIRTPAIETTE;17.30;27.90;0110
43034453;LONG SHIRT PAI ETTE;16.40;25.90;0110
43034454;BASIC;4.99;8.90;0110
So there is a lot of data that I don't need. I'm using Notepad++ to do my regex.
My regex string looks like ([0-9]*)\s{6,}([A-Z]*)\s*([0-9\.]*)\s*([0-9\.]*)\s*([0-9]*) at the moment.
This brings me the first number followed by 6 spaces. (It has to be like this because some rows start with FF and FF are not letters. It's some kind of sign that I can't identify but if I let Notepad++ show all signs I see FF.)
So as a result I get
\1: 43034452
\2: LONGSHIRTPAIETTE
\3: 17.30
\4: 27.90
\5: 0110
like expected, but on the next row it stops on the space. If I add \s to the pattern, then it also selects all spaces after the word part. And I obviously can't say "only one space", can I?
So my question is, can I use regex to get a selection like the one I want?
If so, what am I doing wrong?
Try this:
([0-9]+)\s{6,}((?:[A-Z]+\ )+)\s*([0-9\.]+)\s+([0-9\.]+)\s+([0-9]+)
Note a few things:
Tightening the *s to + where this is appropriate, so you're enforcing some characters in those columns, or actual whitespace
The use of a non-capturing group to repeat one or more instances of a word then a space.
Use the below regex
([0-9]*)\s{6,}([A-Z]+(?:\s+[A-Z]+)*)\s*([0-9\.]*)\s*([0-9\.]*)\s*([0-9]*).*?(?=\n\S|$)
and then replace the match with \1;\2;\3;\4;\5
Don't forget to enable the DOTALL modifier s.
DEMO
Your approach is correct.. just replace * with + (more than one) in your regex.
/([0-9]+)\s{6,}([A-Z ]+)\s+([0-9\.]+)\s+([0-9\.]+)\s+([0-9]+)/g
See the DEMO.