I have a binary image and I'm looking for a robust way to find the lines in the shape and the topology (how the lines connect).
I have experimented in matlab (although what I'm asking for is which methods to use).
I've tried skeletonizing the binary image and then using the Hough transform. It works sometimes, but it is not a robust solution; I struggled with boundary disturbance.
Could anyone point me towards which methods to use here (and in what order)?
Binary file for testing
Frankly speaking, I've been monitoring this question for some time in the hope of seeing a useful answer. The task itself does not seem to be really complex (and I'll try to prove that), but an elegant solution still eludes me.
Some time ago I solved a similar task, and it looks like your initial example is easily traceable with my very basic home-grown solution:
It is just a simple scan (1), identification of vertical and horizontal lines (2), and further analysis of the more complex areas (3). Having analysed all areas (3), it is not that hard to find the intersection points and optimise them as well (4).
The result is pretty rough, but it confirms the feasibility of this approach.
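Roughly, the scan step (1) and line identification (2) can be done along these lines. This is only an illustrative C++ sketch; the row-major buffer layout, the step size and the minimum run length are my assumptions, not part of the original solution:

#include <cstdint>
#include <vector>

// One detected run of foreground pixels on a scan line.
struct Run { int start, end; };

// Find runs of foreground (non-zero) pixels of at least min_len on one line.
std::vector<Run> findRuns(const std::vector<uint8_t>& line, int min_len) {
    std::vector<Run> runs;
    int start = -1;
    for (int i = 0; i <= (int)line.size(); ++i) {
        bool fg = i < (int)line.size() && line[i] != 0;
        if (fg && start < 0) start = i;                 // run begins
        if (!fg && start >= 0) {                        // run ends
            if (i - start >= min_len) runs.push_back({start, i});
            start = -1;
        }
    }
    return runs;
}

// Scan every step-th row and column of a row-major binary image (steps 1-2).
void scanImage(const std::vector<uint8_t>& img, int w, int h,
               int step, int min_len) {
    for (int y = 0; y < h; y += step) {                 // horizontal scan lines
        std::vector<uint8_t> row(img.begin() + y * w, img.begin() + (y + 1) * w);
        auto runs = findRuns(row, min_len);             // candidate horizontal segments
        // ... merge runs from neighbouring scan lines into line segments (step 3)
    }
    for (int x = 0; x < w; x += step) {                 // vertical scan lines
        std::vector<uint8_t> col(h);
        for (int y = 0; y < h; ++y) col[y] = img[y * w + x];
        auto runs = findRuns(col, min_len);             // candidate vertical segments
        // ... same merging for vertical segments
    }
}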
I do understand this is a bit far from Matlab, but I just want to highlight several important points:
skeletonization will potentially break the initial geometry
further analysis of the skeleton seems to be a bit tricky and unreliable
with a bit of enhanced quality, your images can be traced with a much simpler approach
BTW, in my approach different operations can be performed in parallel. The scan step is adjustable, and even with a reduced number of scans the result is pretty good:
With more steps, intersection points can be identified more precisely:
I came to the conclusion that it is really important to use all the information provided by the initial image. Simplifications, etc. will remove valuable facts and increase the overall complexity of the task.
Update
Would this approach work if the figure did not have a majority of vertical and horizontal lines?
Those steps are pretty independent, so there is no strict requirement to have vertical or horizontal lines. Naturally, it is a more complex task to identify intersections, and some additional tweaking is required in order to enhance accuracy.
It is easy to see that there are some significant errors introduced by the vertical lines at the beginning and at the end of the shape. A very straightforward optimisation gives us better results:
This is indeed a tricky problem: going from a binary image to a graph (i.e. topology). Fundamentally, it involves crossing from the discrete world of pixels and 2D image data to an abstract data structure of nodes and connectivity...
But what can provide the "glue" between the two? Quite an open question, I'm afraid, requiring a complex interpretation of visual data.
Fortunately, someone else has shared a good attempt in Python here: http://planet.lengrand.fr/?post_id=267
(This obviously assumes a full Python installation with NetworkX and any other dependencies. I did this on a Mac with Homebrew via a few calls such as: > brew install opencv; pip install networkx; brew install graph-tool; brew install graphviz. The IPython notebook below also makes use of http://scikit-image.org and http://mahotas.readthedocs.org/en/latest/ - so quite a heady cocktail of computer vision and image processing code! Finally: you'll of course need IPython installed...)
Here is an example (which first loads the downloaded notebook from above - all run from: > ipython --pylab):
%run C8Skeleton_to_graph-01.ipynb
import scipy.io as sio # Need this to load matlab files...
mat = sio.loadmat('bw.mat')
img = np.zeros((30,70),np.uint8) # Buffered image border
img[5:25,5:65] = mat['BW'] # Insert matrix data into middle
skeleton = mh.thin(img) # Do skeletonization...
graph = nx.MultiGraph() # Graph we'll create
C8_Skeleton_To_Graph_01(graph, skeleton) # Do it!
figure(1)
subplot(211)
plt.imshow(img,plt.get_cmap('gray'), vmin=0, vmax=1, origin='upper');
subplot(212)
plt.imshow(skeleton,plt.get_cmap('gray'), vmin=0, vmax=1, origin='upper');
figure(2)
nx.draw(graph)
Skeletonization of your originally provided Matlab data (with a buffer around the edges):
results in the following graph:
Notice the graph topology and layout correspond with the structure of the thinned image - including the "spurs" created at the end. This is an ongoing problem/research area with such an approach...
EDIT: but this could possibly be solved by removing stray arcs (those leading to leaf nodes with degree == 1) from the graph, e.g.
remove = [node for node, degree in dict(graph.degree()).items() if degree == 1] # dict() keeps this working on NetworkX 2.x as well
graph.remove_nodes_from(remove)
nx.draw(graph)
It seems that dlib needs a loss layer that dictates how the layers most distant from the input layer are treated. I cannot find any documentation on the loss layers, but it seems that there is no way to have just a summation layer.
Summing up all the values of the last layer would be exactly what I need for the regression, though (see also: https://deeplearning4j.org/linear-regression).
I was thinking along the lines of writing a custom loss layer but could not find information about this, either.
So, have I overlooked a corresponding layer here, or is there a way to get what I need?
The loss layers in dlib are listed in the menu on dlib's machine learning page. Look for the words "loss layers". There is plenty of documentation.
The current released version of dlib doesn't include a regression loss. However, if you get the current code from github you can use the new loss_mean_squared layer to do regression. See: https://github.com/davisking/dlib/blob/master/dlib/dnn/loss_abstract.h
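For illustration, a minimal sketch of a regression net using that layer (the hidden layer size and learning rate are arbitrary assumptions on my part; the loss_abstract.h link above is the authoritative interface):

#include <dlib/dnn.h>
#include <vector>
using namespace dlib;

// A tiny regression network: one hidden layer, one linear output,
// trained with mean squared error via loss_mean_squared.
using net_type = loss_mean_squared<fc<1, relu<fc<10, input<matrix<float>>>>>>;

int main() {
    std::vector<matrix<float>> samples; // each sample is a column vector
    std::vector<float> labels;          // one real-valued target per sample
    // ... fill samples/labels ...

    net_type net;
    dnn_trainer<net_type> trainer(net);
    trainer.set_learning_rate(0.01);
    trainer.train(samples, labels);     // minimizes mean squared error
}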
I'm trying to implement a program that will take a scanned (possibly rotated) document, like an ID card, detect its type based on two or more image templates, and normalize it (de-rotate and resize it so that it matches the template). Everything will be scanned, so luckily perspective is not a problem.
I have already tried a number of approaches with no success:
I tried using OpenCV's features2d to detect the template and findHomography to normalize it, but it fails extremely often. If I take a template, change it a little (other data/photo on the ID card) and rotate it by ~40 degrees, it usually fails, no matter what configuration of detectors, descriptors, and matchers I use (the general shape of this pipeline is sketched below).
I also tried http://manpages.ubuntu.com/manpages/gutsy/man1/unpaper.1.html, which is a de-rotation tool, and then tried to do normal matching, but unpaper doesn't work well with rotation angles greater than 20 degrees.
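For reference, the general shape of that features2d + findHomography pipeline is something like this (an illustrative sketch only; the detector choice and parameters are assumptions):

#include <opencv2/opencv.hpp>
#include <vector>
using namespace cv;

// Sketch: match a template against a scan and estimate the homography.
Mat alignToTemplate(const Mat& templ, const Mat& scan) {
    Ptr<ORB> orb = ORB::create(2000);               // feature count is an assumption
    std::vector<KeyPoint> kpT, kpS;
    Mat desT, desS;
    orb->detectAndCompute(templ, noArray(), kpT, desT);
    orb->detectAndCompute(scan, noArray(), kpS, desS);

    BFMatcher matcher(NORM_HAMMING, /*crossCheck=*/true);
    std::vector<DMatch> matches;
    matcher.match(desT, desS, matches);

    std::vector<Point2f> ptsT, ptsS;                // needs at least 4 matches
    for (const auto& m : matches) {
        ptsT.push_back(kpT[m.queryIdx].pt);
        ptsS.push_back(kpS[m.trainIdx].pt);
    }
    // RANSAC discards the outlier matches that often break this approach.
    Mat H = findHomography(ptsS, ptsT, RANSAC, 3.0);

    Mat normalized;
    warpPerspective(scan, normalized, H, templ.size());
    return normalized;
}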
If there's a ready-made solution that would be really great; a commercial library (preferably C/C++, or a command-line tool) is also an option. I hate to admit it, but I fail miserably when I try to understand computer vision papers, so linking them unfortunately won't help me.
Thank you very much for your help!
I recently saw the virtual mirror concept on YouTube; I tried it out and researched it. It seems that the creators have used augmented reality so that people can see the output on their screens. While researching, I found out that we usually identify a pattern onto which a 3D image is superimposed.
Question 1: How are they able to superimpose the jewellery and track the face of the person without identifying any pattern?
I also checked various libraries that I could use to make a program similar to the one they show. It seems to me that a lot of people are using Android phones and iPhones and making apps that use augmented reality.
Question 2: Is there any way that I can use C++ to make a program that uses augmented reality?
Oh, and the most important thing, the link to the application is provided below:
http://www.boutiqueaccessories.com.au/virtual-mirror/w1/i1001664/
Do try it out. It's a good experience. :D
I'm not able to actually try the live demo, but the linked video suggests that they either use some simplified pattern recognition (getting the person's outline), or they simply track you based on the initial image (with your position/texture being determined by the outline being shown).
Following the video, it's easy to see that there's no real/advanced AR behind this. The images are simply overlaid or hidden (e.g. when it loses track of one ear because you're looking to the side) and they're not transformed (no perspective change or resizing happening). They definitely seem to track the head (or features like the ears, neck, etc.). Depending on your background and surroundings, that's actually a rather trivial task.
Question 2: Sure! There are lots of premade toolkits out there, but you could just as well use a general image processing library such as OpenCV to do the math. Augmented reality usually uses some kind of pattern (e.g. a card or page with a known pattern) to determine the correct position and transformation for the contents to be added to the image. There are also approaches using the device's orientation and perspective changes in camera images to determine depth/position (I really like this demo).
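As a rough illustration of the pattern-based approach in OpenCV (a sketch only; it uses a chessboard as the known pattern, and the pattern size and unit square size are assumptions):

#include <opencv2/opencv.hpp>
#include <vector>
using namespace cv;

// Sketch: estimate the camera pose from a known planar pattern
// (here a chessboard; marker-based AR works the same way in principle).
bool estimatePose(const Mat& frame, const Mat& cameraMatrix, const Mat& distCoeffs,
                  Mat& rvec, Mat& tvec) {
    Size patternSize(9, 6);                       // inner corner count; an assumption
    std::vector<Point2f> corners;
    if (!findChessboardCorners(frame, patternSize, corners))
        return false;                             // pattern not visible

    // 3D coordinates of the pattern corners in its own plane (z = 0),
    // measured in "one square" units.
    std::vector<Point3f> objectPoints;
    for (int y = 0; y < patternSize.height; ++y)
        for (int x = 0; x < patternSize.width; ++x)
            objectPoints.emplace_back((float)x, (float)y, 0.0f);

    // rvec/tvec give the transformation used to render overlays in place.
    return solvePnP(objectPoints, corners, cameraMatrix, distCoeffs, rvec, tvec);
}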
I have to process a lot of scanned IDs and I need to extract photos from them for further processing.
Here's a fictional example:
The problem is that the scans are not perfectly aligned (rotated up to 10 degrees). So I need to find their position, rotate them and cut out the photo. This turned out to be a lot harder than I originally thought.
I checked OpenCV, and the only thing I found was rectangle detection, but it didn't give me good results: the detected rectangle does not always match well enough on my samples. Also, its image matching algorithm works only for non-rotated images, since it's just a brute-force comparison.
So I thought about using ARToolKit (an augmented reality lib), because I know that it's able to locate a given marker on an image very precisely. But it seems that the markers have to be very simple, so I can't use a constant part of the document for this purpose (please correct me if I'm wrong). Also, I found it super hard to compile on Ubuntu 11.10.
OCR - I haven't tried this one yet, and before I start my research I'd be thankful for any suggestions on what to look for.
I'm looking for a C (preferably) or C++ solution. Python is an option too.
If you don't find another ideal solution, one method I ended up using for OCR preprocessing in the past was to convert the source images to PPM and use unpaper in Ubuntu. You can attempt to deskew the image based on whichever sides you specify as having clearly-defined edges, and there is an option to bypass the filters that would normally be applied to black and white text. You probably don't want those for images.
Example for images skewed no more than 15 degrees, using the bottom and right edges to detect rotation:
unpaper -n -dn bottom,right -dr 15 input.ppm output.ppm
unpaper was written in C, if the source is any help to you.
What kind of debugging is available for image processing/computer vision/computer graphics applications in C++? What do you use to track errors/partial results of your method?
What I have found so far is just one tool for online and one for offline debugging:
bmd: attaches to a running process and enables you to view a block of memory as an image
imdebug: enables printf-style debugging of images
Both are quite outdated and not really what I would expect.
What would seem useful for offline debugging is some kind of image logging: let's say a set of commands which enable you to write images together with text (probably in the form of HTML, maybe hierarchical), easy to switch off at both compile time and run time, and as unobtrusive as possible.
The output could look like this (output from our simple tool):
http://tsh.plankton.tk/htmldebug/d8egf100-RF-SVM-RBF_AC-LINEAR_DB.html
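A stripped-down sketch of what I mean (hypothetical names throughout; it writes plain BMP files so that no image library is needed and browsers can display the result):

#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Write a row-major 8-bit RGB buffer as a 24-bit BMP.
void writeBmp(const std::string& file,
              const std::vector<uint8_t>& rgb, int w, int h) {
    int rowSize = (3 * w + 3) & ~3;                   // rows padded to 4 bytes
    uint32_t dataSize = rowSize * h;
    uint8_t header[54] = {'B', 'M'};
    auto put32 = [&](int off, uint32_t v) {
        for (int i = 0; i < 4; ++i) header[off + i] = v >> (8 * i);
    };
    put32(2, 54 + dataSize);                          // total file size
    put32(10, 54);                                    // offset of pixel data
    put32(14, 40);                                    // BITMAPINFOHEADER size
    put32(18, w); put32(22, h);
    header[26] = 1; header[28] = 24;                  // 1 plane, 24 bpp
    put32(34, dataSize);
    std::ofstream out(file, std::ios::binary);
    out.write(reinterpret_cast<const char*>(header), 54);
    std::vector<uint8_t> row(rowSize, 0);
    for (int y = h - 1; y >= 0; --y) {                // BMP rows are bottom-up
        for (int x = 0; x < w; ++x) {                 // and stored as BGR
            row[3 * x + 0] = rgb[3 * (y * w + x) + 2];
            row[3 * x + 1] = rgb[3 * (y * w + x) + 1];
            row[3 * x + 2] = rgb[3 * (y * w + x) + 0];
        }
        out.write(reinterpret_cast<const char*>(row.data()), rowSize);
    }
}

// Minimal HTML image log: text and images interleaved in one page.
class HtmlLogger {
public:
    explicit HtmlLogger(const std::string& path) : html_(path) {
        html_ << "<html><body>\n";
    }
    ~HtmlLogger() { html_ << "</body></html>\n"; }
    void text(const std::string& msg) { html_ << "<p>" << msg << "</p>\n"; }
    void image(const std::string& name,
               const std::vector<uint8_t>& rgb, int w, int h) {
        writeBmp(name + ".bmp", rgb, w, h);
        html_ << "<p>" << name << "<br><img src=\"" << name << ".bmp\"></p>\n";
    }
private:
    std::ofstream html_;
};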
Are you aware of some code that goes in this direction?
I would be grateful for any hints.
Coming from a ray tracing perspective, maybe some of those visual methods are also useful to you (it is one of my plans to write a short paper about such techniques):
Surface Normal Visualization. Helps to find surface discontinuities. (no image handy, the look is very much reminiscent of normal maps)
color <- rgb (normal.x*0.5+0.5, normal.y*0.5+0.5, normal.z*0.5+0.5)
Distance Visualization. Helps to find surface discontinuities and errors in finding a nearest point. (image taken from an abandoned ray tracer of mine)
color <- (intersection.z-min)/range, ...
Bounding Volume Traversal Visualization. Helps to visualize a bounding volume hierarchy or other hierarchical structures, and to see the traversal hotspots, like a code profiler (e.g. for Kd-trees). (tbp of http://ompf.org/forum coined the term Kd-vision.)
color <- number_of_traversal_steps/f
Bounding Box Visualization (image from picogen or so, some years ago). Helps to verify the partitioning.
color <- const
Stereo. Maybe useful in your case to get a real stereoscopic impression. I must admit I never used this for debugging, but when I think about it, it could prove really useful when implementing new types of 3D primitives and trees (image from gladius, which was an attempt to unify realtime and non-realtime ray tracing).
You just render two images from slightly shifted positions, focusing on the same point.
Hit-or-not visualization. May help to find epsilon errors. (image taken from metatrace)
if (hit) color = const_a;
else color = const_b;
Some hybrid of several techniques.
Linear interpolation: lerp(debug_a, debug_b)
Interlacing: if(y%2==0) debug_a else debug_b
Any combination of ideas, for example the color-tone from Bounding Box Visualization, but with actual scene-intersection and lighting applied
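To make these concrete, several of the modes above can sit behind a single debug switch. A rough C++ sketch, with made-up minimal types that stand in for whatever your tracer uses:

#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Color { float r, g, b; };

struct Hit {
    bool  valid;            // did the ray hit anything?
    Vec3  normal;           // surface normal at the hit point, in [-1,1]
    float depth;            // distance along the ray
    int   traversalSteps;   // e.g. Kd-tree nodes visited
};

enum class DebugView { Normals, Depth, Traversal, HitOrNot };

// One switch that maps an intersection to a diagnostic colour,
// following the visualizations listed above.
Color debugShade(const Hit& h, DebugView view,
                 float minDepth, float depthRange, float maxSteps) {
    switch (view) {
    case DebugView::Normals:     // remap [-1,1] to [0,1]
        return { h.normal.x * 0.5f + 0.5f,
                 h.normal.y * 0.5f + 0.5f,
                 h.normal.z * 0.5f + 0.5f };
    case DebugView::Depth: {     // distance visualization
        float d = (h.depth - minDepth) / depthRange;
        return { d, d, d };
    }
    case DebugView::Traversal: { // traversal hotspots, profiler-style
        float t = std::min(h.traversalSteps / maxSteps, 1.0f);
        return { t, 0.0f, 1.0f - t };
    }
    case DebugView::HitOrNot:    // epsilon errors show up as speckle
        return h.valid ? Color{0, 1, 0} : Color{1, 0, 0};
    }
    return { 0, 0, 0 };
}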
You may find some more glitches and debugging imagery on http://phresnel.org , http://phresnel.deviantart.com , http://picogen.deviantart.com , and maybe http://greenhybrid.deviantart.com (an old account).
Generally, I prefer to dump the byte array of the currently processed image as raw data triplets and run ImageMagick to create a PNG from it, numbered e.g. img01.png. This way I can trace the algorithms very easily. ImageMagick is run from a function in the program using a system call. This makes it possible to debug without using any external libs for image formats.
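A sketch of that routine (the buffer layout and file naming are assumptions):

#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <vector>

// Dump the current image as raw RGB triplets and let ImageMagick turn it
// into a numbered PNG, so no image-format library is linked in.
void dumpDebugImage(const std::vector<unsigned char>& rgb, int w, int h) {
    static int counter = 0;
    char raw[64], png[64];
    std::snprintf(raw, sizeof(raw), "img%02d.raw", counter);
    std::snprintf(png, sizeof(png), "img%02d.png", counter);
    ++counter;

    std::ofstream out(raw, std::ios::binary);
    out.write(reinterpret_cast<const char*>(rgb.data()), rgb.size());
    out.close();

    // ImageMagick's "convert" can read headerless RGB if told the geometry.
    char cmd[256];
    std::snprintf(cmd, sizeof(cmd),
                  "convert -size %dx%d -depth 8 rgb:%s %s", w, h, raw, png);
    std::system(cmd);
}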
Another option, if you are using Qt, is to work with QImage and call img.save("img01.png") from time to time, the way printf is used for debugging.
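In Qt that looks roughly like this (assuming a tightly packed RGB buffer):

#include <QImage>
#include <QString>

// Wrap the working buffer in a QImage and save it, printf-style.
void dumpQtImage(const unsigned char* rgb, int w, int h, int n) {
    QImage img(rgb, w, h, 3 * w, QImage::Format_RGB888);
    img.save(QString("img%1.png").arg(n, 2, 10, QChar('0')));
}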
It's a bit primitive compared to what you are looking for, but I have done what you suggested in your OP using standard logging and by writing image files. Typically, the logging and signal-export processes and staging live in unit tests.
Signals are given identifiers (often the input filename), which may be augmented (often with the process name or stage).
For the development of processors, it's quite handy.
Adding HTML for the messages would be simple. In that context, you could produce viewable HTML output easily - you would not need to generate any HTML yourself, just use HTML template files and then insert the messages.
I would just do it myself (as I've done multiple times already for multiple signal types) if you get no good referrals.
In Qt Creator you can watch image modifications while stepping through the code in the normal C++ debugger; see e.g. http://labs.qt.nokia.com/2010/04/22/peek-and-poke-vol-3/