Representation of a simple undirected graph

Representation of a simple undirected graph - c++

I need your expertise:
I am about to implement a graph class in c++ and thinking about the right representation. The graphs are simple and undirected. Number of vertices get for now just up to 1000 but maybe higher in the future. Number of edges up to 200k and maybe higher. Each vertex got a color (int) and an id (int). Edges transport no more information than connecting to vertices.
I store the graph and just need to access if x and y are connected or not - this I need very often.
After initialising i never remove or add new vertices or edges (N = Number of vertices and M=number of edges given from the start)!
The one representation which is already available to me:
An adjacency list rolled out into just one long list. Along with this representation goes an array with starting indices for each vertex. Storage O(2M) and check if edge between x and y in an average of O(n/m)
A representation I thought of:
the idea is to, instead of rolling out the adjacency list into one array, do it with the adjacency matrix. So storage O(N^2)? Yes but I want to store an edge in one bit except of one byte.(actually 2 bits symmetricallywise)
Example: Say N=8, then create an vector<uint8_t> of length 8 (64 bit). Init each entry on 0. If there is an edge between vertex 3 and vertex 5, then add pow(2,5) to the entry of my vector belonging to vertex 3 and symmetrically. So there is a 1 in the entry of vertex 3 at position of vertex 5 exactly when there is an edge between 3 and 5. After inserting my graph into this data structure I think one should be able to access neighborhood in constant time by just a binary operation: Are 3 and 5 connected? Yes if v[3] ^ pow(2,5) == 0. When there are more vertices than 8, then every vertex needs to get more than one entry in the vector and I need to perform one modulo and one division operation for accessing the correct spot.
What do you think of the second solution - is it maybe already known and in use?
Am I wrong by thinking about an access of O(1)?
Is it to much effort for no real performance improvement?
The reason for loading both representations in one big list is due to cache improvements I was told.
I am happy to get some feedback on this idea. I might be way off - pls be kind in that case :D

A 1000x1000 matrix with 200,000 edges will be quite sparse. Since the graph is undirected, the edges in the matrix will be written twice:
VerticeA -> VerticeB and VerticeB -> VerticeA
You will end up filling up 40% of the matrix, the rest will be empty.
Edges
The best approach I can think of here is to use a 2D vector of booleans:
std::vector<std::vector<bool>> matrix(1000, std::vector<bool>(1000, false));
The lookup will take O(1) time and std::vector<bool> saves space by using a single bit for each boolean value. You will end up using 1Mbit or ~128kB (125 kB) of memory.
The storage is not necessarily an array of bool values, but the library implementation may optimize storage so that each value is stored in a single bit.
This will allow you to check for an edge like this:
if( matrix[3][5] )
{
// vertice 3 and 5 are connected
}
else
{
// vertice 3 and 5 are not connected
}
Vertices
If the id values of the vertices form a continuous range of ints (e.g. 0,1,2,3,...,999) then you could store the color information in a std::vector<int> which has O(1) access time:
std::vector<int> colors(1000);
This would use up memory equal to:
1000 * sizeof(int) = 4000 B ~ 4 kB (3.9 kB)
On the other hand, if the id values don't form a continuous range of ints it might be a better idea to use a std::unordered_map<int, int> which will on average give you O(1) lookup time.
std::unordered_map<int, int> map;
So e.g. to store and look up the color of vertice 4:
map[4] = 5; // assign color 5 to vertice 4
std::cout << map[4]; // prints 5
The amount of memory used by std::unordered_map<int, int> will be:
1000 * 2 * sizeof(int) = 8000 B ~ 8 kB (7.81 kB)
All together, for edges:
Type
Memory
Access time
std::vector<std::vector<bool>>
125 kB
O(1)
and for vertices:
Type
Memory
Access time
std::vector<int>
3.9 kB
O(1)
std::unordered_map<int, int>
7.8 kB
O(1) on avg.

If you go for a bit matrix then the memory usage is O(V^2), so ~1Mb bits or 128KB, of which slightly less than half really are duplicates.
If you make an array of the edges O(E) and another array of index into the edges from the vertexes to the first of its edge you use 200K*sizeof(int) or 800KB which is much more, half of it is also duplicates (A-B and B-A are the same) which here actually could be saved. Same if you know (or can template you out of it) that the number of vertexes can be stored in an uint16_t half can be saved again.
To save half you just check which of the Vertexes has the lower number and checks its edges.
To find out when to stop looking you use the index on the next Vertex.
So with your numbers it is fine or even good to use a bit matrix.
The first problem comes when (V^2)/8 > (E*4) though the binary search in the Edge algorithm would still be much slower than checking a bit. That would occur if we set E = V * 200 (1000 Vertexes vs 200K edges)
V*V/8 > V*200*4
V/8 > 200*4
V > 200*4*8 = 6400
That would be 5120000 ~ 5MB easily fits into a L3 cache nowadays. If the connectivity (here average number of connections per vertex) is higher than 200 so much the better.
Checking the edges will also cost lg2(connectivity)*K(mispredicts) which gets rather steep. checking the bit matrix would be O(1).
You would need to measure, among others when the bit matrix breaks the L3 significantly while the Edge list still fits L3 and when it spills over in virtual memory.
In other words with a high connectivity the bit matrix should beat the Edge list with a much lower connectivity or much higher number of vertexes the Edge list might be faster.

Related

Linear interpolation of two vector arrays with different lengths

I have two curves. One handdrawn and one is a smoothed version of the handdrawn.
The data of each curve is stored in 2 seperate vector arrays.
Time Delta is also stored in the handdrawn curve vector, so i can replay the drawing process and so that it looks natural.
Now i need to transfer the Time Delta from Curve 1 (Raw input) to Curve 2 (already smoothed curve).
Sometimes the size of the first vector is larger and sometimes smaller than the second vector.
(Depends on the input draw speed)
So my question is: How do i fill vector PenSmoot.time with the correct values?
Case 1: Input vector is larger
PenInput.time[0] = 0 PenSmoot.time[0] = 0
PenInput.time[1] = 5 PenSmoot.time[1] = ?
PenInput.time[2] = 12 PenSmoot.time[2] = ?
PenInput.time[3] = 2 PenSmoot.time[3] = ?
PenInput.time[4] = 50 PenSmoot.time[4] = ?
PenInput.time[5] = 100
PenInput.time[6] = 20
PenInput.time[7] = 3
PenInput.time[8] = 9
PenInput.time[9] = 33
Case 2: Input vector is smaller
PenInput.time[0] = 0 PenSmoot.time[0] = 0
PenInput.time[1] = 5 PenSmoot.time[1] = ?
PenInput.time[2] = 12 PenSmoot.time[2] = ?
PenInput.time[3] = 2 PenSmoot.time[3] = ?
PenInput.time[4] = 50 PenSmoot.time[4] = ?
PenSmoot.time[5] = ?
PenSmoot.time[6] = ?
PenSmoot.time[7] = ?
PenSmoot.time[8] = ?
PenSmoot.time[9] = ?
Simplyfied representation:
PenInput holds the whole data of a drawn curve (Raw Input)
PenInput.x // X coordinate)
PenInput.y // Y coordinate)
PenInput.pressure // The pressure of the pen)
PenInput.timetotl // Total elapsed time)
PenInput.timepart // Time fragments)
PenSmoot holds the data of the massaged (smoothed,evenly distributed) curve of PenInput
PenSmoot.x // X coordinate)
PenSmoot.y // Y coordinate)
PenSmoot.pressure // Unknown - The pressure of the pen)
PenSmoot.timetotl // Unknown - Total elapsed time)
PenSmoot.timepart // Unknown - Time fragments)
This is the struct that i have.
struct Pencil
{
sf::VertexArray vertices;
std::vector<int> pressure;
std::vector<sf::Int32> timetotl;
std::vector<sf::Int32> timepart;
};

[This answer has been extensively revised based on editing to the question.]
Okay, it seems to me that you just about need to interpolate the time stamps in parallel with the points.
I'm going to guess that the incoming data is something on the order of an array of points (e.g., X, Y coordinates) and an array of time deltas with the same number of each, so time-delta N tells you the time it took to get from point N-1 to point N.
When you interpolate the points, you're probably going to want to do it intelligently. For example, in the shape shown in the question, we have what look like two nearly straight lines, one with positive slope, and the other with negative slope. According to the picture, that's composed of 263 points. We could reduce that to three points and still have a fairly reasonable representation of the original shape by choosing the two end-points plus one point where the two lines meet.
We probably don't need to go quite that far though. Especially taking time into account, we'd probably want to use at least 7 points for the output--one for each end-point of each colored segment. That would give us 6 straight line segments. Let's say those are at points 0, 30, 140, 180, 200, 250, and 263.
We'd then use exactly the same segmentation on the time deltas. Add up the deltas from 0 to 30 to get an average speed for the first segment. Add up the deltas for 31 through 140 to get an average speed for the second segment (and so on to the end).
Increasing the number of points works out roughly the same way. We need to look at exactly which input points were used to create a pair of output points. For a simplistic example, let's assume we produced output that was precisely double the number of input points. We'd then interpolate time deltas exactly halfway between each pair of input points.
In the case shown in the question, we start with unevenly distributed inputs, but produce evenly distributed outputs. So the second output point might be an average of the first four input points. The next output point might be an average of three input points (and so on). In many cases, it's likely that neither end-point of a segment in the output corresponds precisely to any point in the input.
That's fine too. We interpolate between two points of the input to figure out the time hack for the starting point of the output segment. Likewise for the ending point. Then we can compute the total time it should have taken to travel between them based on the time delta between the points.
If you want to get fancy, you could use a higher order interpolation instead of linear. That does require more input points per interpolation, but it looks like you probably have plenty to do something like a quadratic or cubic interpolation (in most cases). This is likely to make the most differences at transitions--places the "pen" was accelerating or decelerating quickly. In such an place, linear interpolation can give somewhat misleading results (though, given the number of points you seem to be working with, it may not make enough difference to notice).
As an illustration, let's consider a straight line. We're going to start from 5 input points, and produce 7 output points.
So, the input points are [0, 2, 7, 10, 15], and the associated time deltas are [0, 1, 4, 8, 3].
So, out total distance traveled is 16, and we want our output points to be evenly distributed. So, the distance between output points will be 16/7 = (roughly) 2.29.
So, obviously the first output point and time are both 0. The second output point is 2.29. To compute the output time, we take the entirety of the time to the first input point (0->2), plus .29 / (7-2) * (4-1). That interpolated section gives 1.37, so our first output time delta is 2.37.
The next output point should be at a distance of 4.58. Since the second input segment goes from 2 to 7, our entire second output segment will lie within the second input segment. So, we take 2.29 / (7-2), telling use that this output segment occupies .458 of the input segment. We then multiply that by the time for the second input segment to get the time delta for the second output segment: .458 * (4-1) = 1.374.
[...and it continues on the same way until we reach the end.]

Histogram Binning of Gradient Vectors

I am working on a project that has a small component requiring the comparison of distributions over image gradients. Assume I have computed the image gradients in the x and y directions using a Sobel filter and have for each pixel a 2-vector. Obviously getting the magnitude and direction is reasonably trivial and is as follows:
However, what is not clear to me is how to bin these two components in to a two dimensional histogram for an arbitrary number of bins.
I had considered something along these lines(written in browser):
//Assuming normalised magnitudes.
//Histogram dimensions are bins * bins.
int getHistIdx(float mag, float dir, int bins) {
const int magInt = reinterpret_cast<int>(mag);
const int dirInt = reinterpret_cast<int>(dir);
const int magMod = reinterpret_cast<int>(static_cast<float>(1.0));
const int dirMod = reinterpret_cast<int>(static_cast<float>(TWO_PI));
const int idxMag = (magInt % magMod) & bins
const int idxDir = (dirInt % dirMod) & bins;
return idxMag * bins + idxDir;
}
However, I suspect that the mod operation will introduce a lot of incorrect overlap, i.e. completely different gradients getting placed in to the same bin.
Any insight in to this problem would be very much appreciated.
I would like to avoid using any off the shelf libraries as I want to keep this project as dependency light as possible. Also I intend to implement this in CUDA.

This is more of a what is an histogram question? rather than one of your tags. Two things:
In a 2D plain two directions equal by modulation of 2pi are in fact the same - so it makes sense to modulate.
I see no practical or logical reason of modulating the norms.
Next, you say you want a "two dimensional histogram", but return a single number. A 2D histogram, and what would make sense in your context, is a 3D plot - the plane is theta/R, 2 indexed, while the 3D axis is the "count".
So first suggestion, return
return Pair<int,int>(idxMag,idxDir);
Then you can make a 2D histogram, or 2 2D histograms.
Regarding the "number of bins"
this is use case dependent. You need to define the number of bins you want (maybe different for theta and R). Maybe just some constant 10 bins? Maybe it should depend on the amount of vectors? In any case, you need a function that receives either the number of vectors, or the total set of vectors, and returns the number of bins for each axis. This could be a constant (10 bins) initially, and you can play with it. Once you decide on the number of bins:
Determine the bins
For a bounded case such as 0<theta<2 pi, this is easy. Divide the interval equally into the number of bins, assuming a flat distribution. Your modulation actually handles this well - if you would have actually modulated by 2*pi, which you didn't. You would still need to determine the bin bounds though.
For R this gets trickier, as this is unbounded. Two options here, but both rely on the same tactic - choose a maximal bin. Either arbitrarily (Say R=10), so any vector longer than that is placed in the "longer than max" bin. The rest is divided equally (for example, though you could choose other distributions). Another option is for the longest vector to determine the edge of the maximal bin.
Getting the index
Once you have the bins, you need to search the magnitude/direction of the current vector in your bins. If bins are pairs representing min/max of bin (and maybe an index), say in a linked list, then it would be something like (for mag for example):
bin = histogram.first;
while ( mag > bin.min ) bin = bin.next;
magIdx = bin.index;
If the bin does not hold the index you can just use a counter and increase it in the while. Also, for the magnitude the final bin should hold "infinity" or some large number as a limit. Note this has nothing to do with modulation, though that would work for your direction - as you have coded. I don't see how this makes sense for the norm.
Bottom line though, you have to think a bit about what you want. In any case all the "objects" here are trivial enough to write yourself, or even use small arrays.

I think you should arrange your bins in a square array, and then bin by vx and vy independently.
If your gradients are reasonably even you just need to scan the data first to accumulate the min and max in x and y, and then split the gradients evenly.
If the gradients are very unevenly distributed, you might want to sort the (eg) vx first and arrange that the boundaries between each bin exactly evenly divides the values.
An intermediate solution might be to obtain the min and max ignoring the (eg) 10% most extreme values.

Finding the most efficient data structure to create an index file

I have a video file, which consists of many successive frames of binary data. Each frame has also a unique timestamp (which is NOT its sequential number in file, but rather a value, provided by the camera at the recording time). On the other hand, I've got an API function which retrieves that frame based on the sequential number of that frame. To make things a bit more complicated - I have a player, who is provided with the timestamp, and should get the binary data for that frame.
Another sad thing here: timestamps are NOT sequential. They can be sequential, but it is not guaranteed, as a wraparound may occur around max unsigned short size.
So a sequence of timestamps could be either
54567, 54568, ... , 65535, 65536 , ... or
54567, 54568, ..., 65535, 0, 1, ...
So it might look like the following:
Frame 0
timestamp 54567
binary data
........
Frame 1
timestamp 54569
binary data
........
Frame 2
timestamp 54579
binary data
.
.
.
Frame n
timestamp m
binary data
0 <= n <= 65536 (MAX_UNSIGNED_SHORT)
0 <= m <= MAX_UNSIGNED_INT
The clip player API should be able to get the binary frame by the timestamp. However, internally, I can get the frame only by its frame sequential number. So if I am asked for timestamp m, I need to iterate over n frames, to find the frame with timestamp m.
To optimize it, I chose to create an index file, which would give me a match between timestamp and the frame sequential number. And here is my question:
Currently my index file is composed of binary pairs of size 2*sizeof(unsigned int), which contains timestamp and frame sequential number. Player later on creates from that file stl map with key==timestamp, value==frame sequential number.
Is there any way to do it more efficiently? Should I create my index file as a dump of some data structure, so it could later be loaded into memory by the clip player while opening the clip, so I would have an O(1) access to frames? Do you have other suggestions?
UPD:
I have updated the names and requirements (timestamps are not necessarily sequential, and frames num bounded by MAX_UNSIGNED_SHORT value). Also wanted to thank everyone who already took the time and gave an answer. The interpolation search is an interesting idea, although I never tried it myself. I guess the question would be the delta between O(1) and O(log log N) in runtime.

It would seem that we should be able to make the following assumptions:
a) the video file itself will not be modified after it is created
b) the player may want to find successive frames i.e. when it is doing normal playback
c) the player may want to find random frames i.e. when it is doing FF, REW or skip by or to chapter
Given this, why not just do a HashMap associating the Frame Id and the Frame Index? You can create that once, the player can read it and then can do an easy and time bounded look up of the requested Frame.

There are a series of tradeoffs to make here.
Your index file is already a dump of a data structure: an array. If you don't plan on often inserting or deleting frames, and keep this array in a sorted order, it's easy to do a binary search (using std::binary_search) on the array. Insertion and deletion take O(N), but searching is still O(log N). The array will occupy less space in memory, and will be faster to read and write from your index file.
If you're doing a lot of inserting and removing frames, then coverting to a std::map structure will give you better performance. If the number of frames is large, or you want to store more metadata with them, you might want to look at a B-tree structure, or just use an embedded database like Sqlite or BerkeleyDB. Both of these implement B-tree indexing and are well-tested pieces of code.

Simply store the frame data in an array where indices represent frame numbers. Then create a hash map from camera indices to frame numbers. You can get the frame belonging to either a frame number or camera index in O(1) while barely using more memory than your current approach.
Alternatively, you can maintain an array, indexed by frame number, that stores a (camera index, data) pair and perform a O(log n) binary search on it when you need to access it by camera index. This takes advantage of the fact that the camera indices are sorted.
In C++'s standard library, hash maps are available as std::unordered_map (if you compiler/STL supports them, which might not be the case since they have only recently been added to the C++ standard), although the tree-based std::map (with O(log n) lookup) is probably good enough for this purpose.
A binary search implementation is available as std::binary_search.

Filtering 1bpp images

I'm looking to filter a 1 bit per pixel image using a 3x3 filter: for each input pixel, the corresponding output pixel is set to 1 if the weighted sum of the pixels surrounding it (with weights determined by the filter) exceeds some threshold.
I was hoping that this would be more efficient than converting to 8 bpp and then filtering that, but I can't think of a good way to do it. A naive method is to keep track of nine pointers to bytes (three consecutive rows and also pointers to either side of the current byte in each row, for calculating the output for the first and last bits in these bytes) and for each input pixel compute
sum = filter[0] * (lastRowPtr & aMask > 0) + filter[1] * (lastRowPtr & bMask > 0) + ... + filter[8] * (nextRowPtr & hMask > 0),
with extra faff for bits at the edge of a byte. However, this is slow and seems really ugly. You're not gaining any parallelism from the fact that you've got eight pixels in each byte and instead are having to do tonnes of extra work masking things.
Are there any good sources for how to best do this sort of thing? A solution to this particular problem would be amazing, but I'd be happy being pointed to any examples of efficient image processing on 1bpp images in C/C++. I'd like to replace some more 8 bpp stuff with 1 bpp algorithms in future to avoid image conversions and copying, so any general resouces on this would be appreciated.

I found a number of years ago that unpacking the bits to bytes, doing the filter, then packing the bytes back to bits was faster than working with the bits directly. It seems counter-intuitive because it's 3 loops instead of 1, but the simplicity of each loop more than made up for it.
I can't guarantee that it's still the fastest; compilers and especially processors are prone to change. However simplifying each loop not only makes it easier to optimize, it makes it easier to read. That's got to be worth something.
A further advantage to unpacking to a separate buffer is that it gives you flexibility for what you do at the edges. By making the buffer 2 bytes larger than the input, you unpack starting at byte 1 then set byte 0 and n to whatever you like and the filtering loop doesn't have to worry about boundary conditions at all.

Look into separable filters. Among other things, they allow massive parallelism in the cases where they work.
For example, in your 3x3 sample-weight-and-filter case:
Sample 1x3 (horizontal) pixels into a buffer. This can be done in isolation for each pixel, so a 1024x1024 image can run 1024^2 simultaneous tasks, all of which perform 3 samples.
Sample 3x1 (vertical) pixels from the buffer. Again, this can be done on every pixel simultaneously.
Use the contents of the buffer to cull pixels from the original texture.
The advantage to this approach, mathematically, is that it cuts the number of sample operations from n^2 to 2n, although it requires a buffer of equal size to the source (if you're already performing a copy, that can be used as the buffer; you just can't modify the original source for step 2). In order to keep memory use at 2n, you can perform steps 2 and 3 together (this is a bit tricky and not entirely pleasant); if memory isn't an issue, you can spend 3n on two buffers (source, hblur, vblur).
Because each operation is working in complete isolation from an immutable source, you can perform the filter on every pixel simultaneously if you have enough cores. Or, in a more realistic scenario, you can take advantage of paging and caching to load and process a single column or row. This is convenient when working with odd strides, padding at the end of a row, etc. The second round of samples (vertical) may screw with your cache, but at the very worst, one round will be cache-friendly and you've cut processing from exponential to linear.
Now, I've yet to touch on the case of storing data in bits specifically. That does make things slightly more complicated, but not terribly much so. Assuming you can use a rolling window, something like:
d = s[x-1] + s[x] + s[x+1]
works. Interestingly, if you were to rotate the image 90 degrees during the output of step 1 (trivial, sample from (y,x) when reading), you can get away with loading at most two horizontally adjacent bytes for any sample, and only a single byte something like 75% of the time. This plays a little less friendly with cache during the read, but greatly simplifies the algorithm (enough that it may regain the loss).
Pseudo-code:
buffer source, dest, vbuf, hbuf;
for_each (y, x) // Loop over each row, then each column. Generally works better wrt paging
{
hbuf(x, y) = (source(y, x-1) + source(y, x) + source(y, x+1)) / 3 // swap x and y to spin 90 degrees
}
for_each (y, x)
{
vbuf(x, 1-y) = (hbuf(y, x-1) + hbuf(y, x) + hbuf(y, x+1)) / 3 // 1-y to reverse the 90 degree spin
}
for_each (y, x)
{
dest(x, y) = threshold(hbuf(x, y))
}
Accessing bits within the bytes (source(x, y) indicates access/sample) is relatively simple to do, but kind of a pain to write out here, so is left to the reader. The principle, particularly implemented in this fashion (with the 90 degree rotation), only requires 2 passes of n samples each, and always samples from immediately adjacent bits/bytes (never requiring you to calculate the position of the bit in the next row). All in all, it's massively faster and simpler than any alternative.

Rather than expanding the entire image to 1 bit/byte (or 8bpp, essentially, as you noted), you can simply expand the current window - read the first byte of the first row, shift and mask, then read out the three bits you need; do the same for the other two rows. Then, for the next window, you simply discard the left column and fetch one more bit from each row. The logic and code to do this right isn't as easy as simply expanding the entire image, but it'll take a lot less memory.
As a middle ground, you could just expand the three rows you're currently working on. Probably easier to code that way.

Extracting segments from a list of 8-connected pixels

Current situation: I'm trying to extract segments from an image. Thanks to openCV's findContours() method, I now have a list of 8-connected point for every contours. However, these lists are not directly usable, because they contain a lot of duplicates.
The problem: Given a list of 8-connected points, which can contain duplicates, extract segments from it.
Possible solutions:
At first, I used openCV's approxPolyDP() method. However, the results are pretty bad... Here is the zoomed contours:
Here is the result of approxPolyDP(): (9 segments! Some overlap)
but what I want is more like:
It's bad because approxPolyDP() can convert something that "looks like several segments" in "several segments". However, what I have is a list of points that tend to iterate several times over themselves.
For example, if my points are:
0 1 2 3 4 5 6 7 8
9
Then, the list of point will be 0 1 2 3 4 5 6 7 8 7 6 5 4 3 2 1 9... And if the number of points become large (>100) then the segments extracted by approxPolyDP() are unfortunately not duplicates (i.e : they overlap each other, but are not strictly equal, so I can't just say "remove duplicates", as opposed to pixels for example)
Perhaps, I've got a solution, but it's pretty long (though interesting). First of all, for all 8-connected list, I create a sparse matrix (for efficiency) and set the matrix values to 1 if the pixel belongs to the list. Then, I create a graph, with nodes corresponding to pixels, and edges between neighbouring pixels. This also means that I add all the missing edges between pixels (complexity small, possible because of the sparse matrix). Then I remove all possible "squares" (4 neighbouring nodes), and this is possible because I am already working on pretty thin contours. Then I can launch a minimal spanning tree algorithm. And finally, I can approximate every branch of the tree with openCV's approxPolyDP()
To sum up: I've got a tedious method, that I've not yet implemented as it seems error-prone. However, I ask you, people at Stack Overflow: are there other existing methods, possibly with good implementations?
Edit: To clarify, once I have a tree, I can extract "branches" (branches start at leaves or nodes linked to 3 or more other nodes) Then, the algorithm in openCV's approxPolyDP() is the Ramer–Douglas–Peucker algorithm, and here is the Wikipedia picture of what it does:
With this picture, it is easy to understand why it fails when points may be duplicates of each other
Another edit: In my method, there is something that may be interesting to note. When you consider points located in a grid (like pixels), then generally, the minimal spanning tree algorithm is not useful because there are many possible minimal trees
X-X-X-X
|
X-X-X-X
is fundamentally very different from
X-X-X-X
| | | |
X X X X
but both are minimal spanning trees
However, in my case, my nodes rarely form clusters because they are supposed to be contours, and there is already a thinning algorithm that runs beforehand in the findContours().
Answer to Tomalak's comment:
If DP algorithm returns 4 segments (the segment from the point 2 to the center being there twice) I would be happy! Of course, with good parameters, I can get to a state where "by chance" I have identical segments, and I can remove duplicates. However, clearly, the algorithm is not designed for it.
Here is a real example with far too many segments:

Using Mathematica 8, I created a morphological graph from the list of white pixels in the image. It is working fine on your first image:
Create the morphological graph:
graph = MorphologicalGraph[binaryimage];
Then you can query the graph properties that are of interest to you.
This gives the names of the vertex in the graph:
vertex = VertexList[graph]
The list of the edges:
EdgeList[graph]
And that gives the positions of the vertex:
pos = PropertyValue[{graph, #}, VertexCoordinates] & /# vertex
This is what the results look like for the first image:
In[21]:= vertex = VertexList[graph]
Out[21]= {1, 3, 2, 4, 5, 6, 7, 9, 8, 10}
In[22]:= EdgeList[graph]
Out[22]= {1 \[UndirectedEdge] 3, 2 \[UndirectedEdge] 4, 3 \[UndirectedEdge] 4,
3 \[UndirectedEdge] 5, 4 \[UndirectedEdge] 6, 6 \[UndirectedEdge] 7,
6 \[UndirectedEdge] 9, 8 \[UndirectedEdge] 9, 9 \[UndirectedEdge] 10}
In[26]:= pos = PropertyValue[{graph, #}, VertexCoordinates] & /# vertex
Out[26]= {{54.5, 191.5}, {98.5, 149.5}, {42.5, 185.5},
{91.5, 138.5}, {132.5, 119.5}, {157.5, 72.5},
{168.5, 65.5}, {125.5, 52.5}, {114.5, 53.5},
{120.5, 29.5}}
Given the documentation, http://reference.wolfram.com/mathematica/ref/MorphologicalGraph.html, the command MorphologicalGraph first computes the skeleton by morphological thinning:
skeleton = Thinning[binaryimage, Method -> "Morphological"]
Then the vertex are detected; they are the branch points and the end points:
verteximage = ImageAdd[
MorphologicalTransform[skeleton, "SkeletonEndPoints"],
MorphologicalTransform[skeleton, "SkeletonBranchPoints"]]
And then the vertex are linked after analysis of their connectivity.
For example, one could start by breaking the structure around the vertex and then look for the connected components, revealing the edges of the graph:
comp = MorphologicalComponents[
ImageSubtract[
skeleton,
Dilation[vertices, CrossMatrix[1]]]];
Colorize[comp]
The devil is in the details, but that sounds like a solid starting point if you wish to develop your own implementation.

Try math morphology. First you need to dilate or close your image to fill holes.
cvDilate(pimg, pimg, NULL, 3);
cvErode(pimg, pimg, NULL);
I got this image
The next step should be applying thinning algorithm. Unfortunately it's not implemented in OpenCV (MATLAB has bwmorph with thin argument). For example with MATLAB I refined the image to this one:
However OpenCV has all needed basic morphological operations to implement thinning (cvMorphologyEx, cvCreateStructuringElementEx, etc).
Another idea.
They say that distance transform seems to be very useful in such tasks. May be so.
Consider cvDistTransform function. It creates to an image like that:
Then using something like cvAdaptiveThreshold:
That's skeleton. I guess you can iterate over all connected white pixels, find curves and filter out small segments.

I've implemented a similar algorithm before, and I did it in a sort of incremental least-squares fashion. It worked fairly well. The pseudocode is somewhat like:
L = empty set of line segments
for each white pixel p
line = new line containing only p
C = empty set of points
P = set of all neighboring pixels of p
while P is not empty
n = first point in P
add n to C
remove n from P
line' = line with n added to it
perform a least squares fit of line'
if MSE(line) < max_mse and d(line, n) < max_distance
line = line'
add all neighbors of n that are not in C to P
if size(line) > min_num_points
add line to L
where MSE(line) is the mean-square-error of the line (sum over all points in the line of the squared distance to the best fitting line) and d(line,n) is the distance from point n to the line. Good values for max_distance seem to be a pixel or so and max_mse seems to be much less, and will depend on the average size of the line segments in your image. 0.1 or 0.2 pixels have worked in fairly large images for me.
I had been using this on actual images pre-processed with the Canny operator, so the only results I have are of that. Here's the result of the above algorithm on an image:
It's possible to make the algorithm fast, too. The C++ implementation I have (closed source enforced by my job, sorry, else I would give it to you) processed the above image in about 20 milliseconds. That includes application of the Canny operator for edge detection, so it should be even faster in your case.

You can start by extraction straight lines from your contours image using HoughLinesP which is provided with openCV:
HoughLinesP(InputArray image, OutputArray lines, double rho, double theta, int threshold, double minLineLength = 0, double maxLineGap = 0)
If you choose threshold = 1 and minLineLenght small, you can even obtain all single elements. Be careful though, since it yields many results in case you have many edge pixels.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js