OpenCV - Extracting arrows from chart - c++

I'm trying to use OpenCV on a flow chart to extract its structure (image 1). All images are computer generated. I can extract the blocks fine and remove them from the image, so that only the arrows are left (image 3).
The issue is that I'm unsure how to extract the connections: when I apply HoughLinesP, a large number of lines is generated for each arrow (image 2). Does anyone know a method of extracting the lines such that I get only one 'line' for each arrow?

Since the image has no noise, using the Hough transform is not optimal.
I would binarize the image so that I get all non-white pixels from the image.
Then, using a matched filter, I would find vertical and horizontal line segments.
Arrows in any of the four directions can be found in a similar fashion.
A filter for a line 2 pixels thick could look something like this:
0 0 0 0 0 0
0 0 0 0 0 0
1 1 1 1 1 1
1 1 1 1 1 1
0 0 0 0 0 0
0 0 0 0 0 0
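
A minimal sketch of the matched-filter idea, written in plain Python so it runs without OpenCV. The function name matched_filter and the toy image are hypothetical. One assumption goes beyond the answer: the kernel's 0 entries are scored as -1, so that a window only reaches the maximum score when the pixels around the line are actually empty.

```python
# The 6x6 kernel from the answer: a horizontal line, 2 pixels thick.
KERNEL = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]


def matched_filter(image, kernel):
    """Return top-left (row, col) of every window that matches the kernel exactly.

    image is a binarized picture: 1 for a non-white pixel, 0 for white.
    Assumption: kernel zeros are weighted -1 so extra pixels lower the score.
    """
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    weights = [[1 if k else -1 for k in row] for row in kernel]
    best = sum(sum(row) for row in kernel)  # score of a perfect match
    hits = []
    for r in range(ih - kh + 1):
        for c in range(iw - kw + 1):
            score = sum(
                weights[i][j] * image[r + i][c + j]
                for i in range(kh)
                for j in range(kw)
            )
            if score == best:
                hits.append((r, c))
    return hits


# Toy 8x10 image with one horizontal bar on rows 3-4, columns 1-8.
image = [[0] * 10 for _ in range(8)]
for row in (3, 4):
    for col in range(1, 9):
        image[row][col] = 1

hits = matched_filter(image, KERNEL)
print(hits)
```

Consecutive hits on the same row belong to one segment, so merging them (smallest to largest column) gives a single line per arrow shaft. Transposing the kernel gives the vertical case.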


TCL: How the regex for every line should look like?

In TCL, in output I have something like this:
ABBAA 1 BAABA 1 DNS3 0 0 200 300 400 500 0 0
ABBAA 1 BAABA 1 DNS1 0 0 200 300 400 500 0 0
ABBAA 1 BAABA 1 DNS7 0 0 200 300 400 500 0 0
ABBAB 1 BAABB 1 DNS5 0 0 200 300 400 500 0 0
ABBAB 1 BAABB 1 DNS3 0 0 200 300 400 500 0 0
I would like to sort this table-like dataset in ascending order by the DNS column (the fifth field, index 4 counting from 0; so the first row will be the one with DNS1UP1, then DNS2UP2, etc.). I figured that regexp would be the easiest method, by looking for strings with "DNS.." in them. But my method doesn't work exactly how I thought, because it matches only one line or no line at all.
My method:
regexp "ABB.*DNS1.*?\N"
ABB - match beginning of new line
.* - every character between ABB and DNS..
DNS1 - match the main looking for word
.* - every character between DNS... and new line symbol
?\n - non-greedy occurrence of new line
Where am I wrong?
If you have a list of lines in such a regular format, you can just lsort them… with the right options. In particular, -dictionary is good for mixed text/numbers and -index 4 lets you choose the column to sort by.
set sortedLines [lsort -index 4 -dictionary $unsortedLines]
The only possible reasonable use of regexp in this would have been in preparing the data for the sort, but that string which you provided is already sortable (assuming you've done a split $data "\n" on it to actually convert it into a list of lines and are not just using a big ol' string).
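
For readers who want to check the expected order outside Tcl, here is a rough Python sketch of the same idea: split the text into lines, then sort on field index 4 with a "dictionary"-style key that compares embedded numbers numerically. The helper name natural_key is hypothetical, and the key only approximates what lsort -dictionary does.

```python
import re

data = """ABBAA 1 BAABA 1 DNS3 0 0 200 300 400 500 0 0
ABBAA 1 BAABA 1 DNS1 0 0 200 300 400 500 0 0
ABBAA 1 BAABA 1 DNS7 0 0 200 300 400 500 0 0
ABBAB 1 BAABB 1 DNS5 0 0 200 300 400 500 0 0
ABBAB 1 BAABB 1 DNS3 0 0 200 300 400 500 0 0"""


def natural_key(text):
    """Split text into alternating text and number chunks; numbers compare as ints."""
    return [int(p) if p.isdigit() else p.lower() for p in re.split(r"(\d+)", text)]


# Equivalent of: split $data "\n", then sort on field index 4.
unsorted_lines = data.split("\n")
sorted_lines = sorted(unsorted_lines, key=lambda line: natural_key(line.split()[4]))

for line in sorted_lines:
    print(line)
```

Python's sort is stable, so the two DNS3 rows keep their original relative order.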

When Drawing lines with VBO, how do I specify indices?

I was having trouble drawing a list of lines. Do indices play a role? If they do, what would their ordering be in the index buffer?
I could not find a reliable example online.
Yes, they definitely play a role. Say I wish to draw the following 5 lines:
0---1
|\  |
| \ |
|  \|
2---3
For GL_LINES each pair of indices specifies the first and the last point. The content of the indices buffer would be:
0 1 1 3 3 2 2 0 0 3
For GL_LINE_STRIP I specify the vertices in the order in which I want them joined by lines. The content of the indices buffer would be:
0 1 3 2 0 3
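
To see that the two index buffers describe the same five lines, here is a small sketch that expands each buffer into the segments the GPU would draw. It does not call OpenGL; the function names are hypothetical and only mimic how GL_LINES and GL_LINE_STRIP consume indices.

```python
def segments_gl_lines(indices):
    """GL_LINES: every consecutive, non-overlapping pair is one segment."""
    return [(indices[i], indices[i + 1]) for i in range(0, len(indices) - 1, 2)]


def segments_gl_line_strip(indices):
    """GL_LINE_STRIP: each index is joined to the one before it."""
    return [(indices[i], indices[i + 1]) for i in range(len(indices) - 1)]


lines_buffer = [0, 1, 1, 3, 3, 2, 2, 0, 0, 3]
strip_buffer = [0, 1, 3, 2, 0, 3]

print(segments_gl_lines(lines_buffer))
print(segments_gl_line_strip(strip_buffer))
```

Both calls give the segments 0-1, 1-3, 3-2, 2-0 and 0-3; the strip needs 6 indices where the plain line list needs 10.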

Notepad++ search combination in lines

I am looking for a specific combination in a txt file that contains multiple lines (Notepad ++). The structure of a line I am looking for is as follows:
xxxxxx N N -1 -1 -1 N (end line)
So I first have an identifier of 6 or more characters, followed by 6 numbers (N) spaced by a tab. N can be values 1, 0 or -1.
I am looking for those lines that contain '-1' in position 3, 4 and 5. The other positions can take any of the 3 values.
I have searched online and applied searches such as:
\t-?\t-?\t-1\t-1\t-1\t-?
\t?.\t?.\t-1\t-1\t-1\t?.
t?.\t?.\t-1\t-1\t-1\t?.\n
\t-1\t-1\t-1\t?.\n
Yet, the last N in the line is not taken into account, so that if its value is 0 for example, that line will not be selected.
What is the way to write this search? I understand Notepad ++ is written in C++.
Can you try to follow this pattern?:
^([a-zA-Z0-9]{6,})\s*(-1|0|1)\s*(-1|0|1)\s*((-1\s*?){3})\s*(-1|0|1)\s?
https://regex101.com/r/yM5xD3/2
Explanation:
^: Start of the line.
([a-zA-Z0-9]{6,}): Any alphanumeric character, six or more times.
\s*: Space/tab/newline, zero or more times.
(-1|0|1): One of those numbers.
\s*: ...
(-1|0|1): One of those numbers.
((-1\s*?){3}): -1 followed by zero or more space/tab/newline characters, three times in a row. (The '?' makes the match lazy, so the regex takes as few \s characters as possible.)
\s*: ...
(-1|0|1): ...
And the last \s?: Matches zero or one space/tab/newline character.
You can try the following regex:
^[a-zA-Z0-9]+\t(-1|0|1)\t(-1|0|1)\t[\-][1]\t[\-][1]\t[\-][1]\t(-1|0|1)$
I tried on the following sample and it worked for me.
xxxxxx 1 1 -1 -1 -1 1
xxxxxx 0 1 -1 -1 -1 0
test12 -1 1 -1 1 -1 0
xxxxxx 1 1 -1 -1 -1 0
test13 0 1 -1 -1 1 -1
Hope it helps.
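
A quick way to sanity-check the tab-separated pattern outside the editor is to run it over the sample lines. This sketch uses Python's re module, which is not the regex engine Notepad++ uses, so treat it as a check of the pattern's logic only; [\-][1] is written as the equivalent -1, and the sample is assumed to be tab-separated as described in the question.

```python
import re

# Same pattern as in the answer, with [\-][1] simplified to -1.
PATTERN = re.compile(r"^[a-zA-Z0-9]+\t(-1|0|1)\t(-1|0|1)\t-1\t-1\t-1\t(-1|0|1)$")

sample = [
    "xxxxxx\t1\t1\t-1\t-1\t-1\t1",
    "xxxxxx\t0\t1\t-1\t-1\t-1\t0",
    "test12\t-1\t1\t-1\t1\t-1\t0",
    "xxxxxx\t1\t1\t-1\t-1\t-1\t0",
    "test13\t0\t1\t-1\t-1\t1\t-1",
]

# Keep only lines with -1 in positions 3, 4 and 5.
matches = [line for line in sample if PATTERN.match(line)]

for line in matches:
    print(line)
```

The two "test" lines are rejected because one of positions 3 to 5 holds a 1; the last column is matched whatever its value.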

Stata: looping over observations

My data set looks like this
x1
1
0
0
1
0
0
1
1
In this data set, the values following a 1 belong to the same group as that 1. For example, the first two zeros belong to group 1, the second two zeros belong to group 2, and so on. I would like to get a final output similar to this. Note that the gap between two 1s is arbitrary:
x1 x2
1 1
0 1
0 1
1 2
0 2
0 2
1 3
1 4
I think I need to write a loop that goes over the observations. But I cannot figure out the logical statements that will accomplish this.
Either
gen x2 = sum(x1)
or
gen x2 = sum(x1 == 1)
is sufficient. As usual, there is an implicit loop over observations here, but you don't need an explicit loop.
In detail, sum() here is a cumulative or running sum. In your case, the first solution is simple and adequate. The reason for mentioning the second solution is because it's more general: we can tag the first observation in each block or spell with 1 and then create a running sum to form blocks of 1s, 2s, and so forth.
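
For readers without Stata at hand, the running-sum logic can be illustrated in Python. This is only a sketch of what the cumulative sum does to the example column, not a replacement for the Stata one-liner.

```python
from itertools import accumulate

x1 = [1, 0, 0, 1, 0, 0, 1, 1]

# First form: running sum of the values themselves.
x2 = list(accumulate(x1))

# Second, more general form: running sum of a 0/1 tag marking block starts.
x2_general = list(accumulate(int(v == 1) for v in x1))

for a, b in zip(x1, x2):
    print(a, b)
```

Each 1 bumps the counter and the zeros after it inherit the current value, which is why no explicit loop over observations is needed.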

Divide a large image into two non overlapping images whose union is the large image

Given a large image composed of smaller images and stored as a matrix, I need to find a boundary dividing the large image into two parts (not necessarily equal, but preferably nearly equal) without cutting through a smaller image.
Each small image is represented by a single integer in the larger image matrix.
Ex:
1 1 2 2 2
1 1 2 2 2
3 3 3 4 4
3 3 3 4 4
is the large image matrix composed of 4 small images.
I need to find one such boundary to separate it into two smaller images such that their sizes don't differ by a very large amount.
This is my solution:
1. Start from considering the 1st row.
2. Using binary search find the start of a boundary. In above example it will be like
1 1 | 2 2 2
1 1 2 2 2
3 3 3 4 4
3 3 3 4 4
3. Proceed down for as long as the dividing line does not cut through an image. If the end of the large image is reached, stop.
1 1 | 2 2 2
1 1 | 2 2 2
3 3 3 4 4
3 3 3 4 4
4. Repeat steps 1, 2 and 3 on the remaining rows, and draw a horizontal line from the old dividing line to the new one.
1 1 | 2 2 2
1 1 | 2 2 2
--
3 3 3 4 4
3 3 3 4 4
1 1 | 2 2 2
1 1 | 2 2 2
-----
3 3 3 | 4 4
3 3 3 | 4 4
End of large image...Stop.
Of course, if no vertical line can be found in step 2, we can look for a horizontal line first in a similar way, as in the case of:
1 1 1 1 1
1 1 1 1 1
--
3 3 3 2 2
3 3 3 2 2
and then proceed.
How can I improve on this solution?
Are there better solutions and will my algorithm fail anytime?
I will be coding in C++. A heuristic/ greedy solution will be nice as well.
If the image is large enough for this to make sense, you could use local differences to guide your boundary selection.
Here is an example implemented in MATLAB for simplicity, but you will get the picture.
Suppose we create an image similar to the one you defined:
img = [ ones(20,20), 2*ones(20,30); ones(10,20), 2*ones(10,30); 3*ones(20,30), 4*ones(20,20)]
This command creates a 50x50 image, with a 20x30 sub-image 1, a 30x30 sub-image 2, a 30x20 sub-image 3 and a 20x20 sub-image 4, as depicted graphically below:
Ideally you would like to get the boundaries between these "trays" representing the values 1 to 4. One way to do so is to shift the image one pixel left/right and one pixel top/bottom and subtract it from the original. This will produce another image with non-zero values only at the boundary positions.
See for example in MATLAB:
mask=((img-shift(img,1) + img-shift(img',1)')~=0);
This will create a mask by adding the difference of the right-shifted image and the original with the difference of the bottom-shifted image and the original, and, finally, by comparing the result with zero (zero values will be all pixel values except in boundaries). Function shift just shifts values of a matrix right or left. There is no need to put the code here since I just want to show the concept.
So you will end up with the following mask image:
This mask has been cropped by one pixel at the right and bottom, since the previous subtractions produce a border that is not needed.
In this image, true values (white pixels) are on the last pixel of the previous image, i.e. image 1 ends at the 1st boundary and image 2 begins at the next pixel, so image 1 is bounded by x=20 and y=30, and so on for the other sub-images.
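
The shift-and-compare idea can be sketched in plain Python as follows. The function names are hypothetical. One deliberate difference from the MATLAB line above: the horizontal and vertical comparisons are combined with a logical "or" instead of being added, which is my own choice to avoid the two differences cancelling each other out.

```python
def build_image():
    """Rebuild the 50x50 example: labels 1 and 2 on top, 3 and 4 below."""
    top = [[1] * 20 + [2] * 30 for _ in range(30)]
    bottom = [[3] * 30 + [4] * 20 for _ in range(20)]
    return top + bottom


def boundary_mask(img):
    """True where a pixel differs from its right or bottom neighbour.

    The result is one pixel smaller in each direction, which matches
    the cropping described in the answer.
    """
    h, w = len(img), len(img[0])
    return [
        [
            img[r][c] != img[r][c + 1] or img[r][c] != img[r + 1][c]
            for c in range(w - 1)
        ]
        for r in range(h - 1)
    ]


img = build_image()
mask = boundary_mask(img)

# Columns (0-based) where the first row crosses a boundary.
print([c for c, flag in enumerate(mask[0]) if flag])
```

With 0-based indexing the first row reports column 19, which is the x=20 boundary of sub-image 1 in the 1-based numbering used above. Row 29 of the mask is entirely true, marking the y=30 boundary.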