Description of the "issue":
I want to use keypoints (a.k.a. tie points) between two successive images from an Apple smartphone using ARKit but I can't find these.
I can find 3D values in the world reference frame from the ARPointCloud (rawFeaturePoints), but I cannot find the 2D values (i.e. in the image reference frame) for each image of the pair where they were actually detected (probably using some modified SIFT detector or whatever algorithm... in fact I'd like to know which algorithm is used as well).
Question:
Do you know in which object they are stored or how I can retrieve them?
I'd like to reproject them onto the images taken by the camera in other software (Python with scikit-image, or even OpenCV) to do some processing.
Whatever algorithm ARKit uses to generate feature points is internal and private to ARKit. As such, any intermediary results are equally hidden from public API, and both the algorithm and results are subject to change between iOS releases or across different devices.
Related
Is there any method to create a polygon (not a rectangle) around an object in an image for object recognition?
Please refer to the following images: the result I am looking for, and the original image.
I am not looking for bounding rectangles like this. I know the concepts of transfer learning, using pre-trained models for object recognition, and other object detection concepts.
The main aim is object detection, but giving results not as a bounding box but as a tighter-fitting polygon instead. Links to some resources or papers would be helpful.
Here is a very simple (and a bit hacky) idea, but it might help: take a per-pixel scene labeling algorithm, e.g. SegNet, and then turn the resulting segmented image into a binary image, where the white pixels are the class of interest (in your example, white for cars and black for the rest). Now compute edges. You can add those edges to the original image to obtain a result similar to what you want.
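For illustration, a minimal OpenCV sketch of that idea could look like the following; the file names and the class id of the object of interest are assumptions, so adapt them to whatever your segmentation network outputs.

```python
# A minimal sketch of the idea above, assuming you already have a per-pixel
# segmentation result saved as "segmentation.png" (e.g. from SegNet) and the
# original photo as "original.png". The class id for "car" (here 1) is an
# assumption -- use whatever label your network assigns.
import cv2
import numpy as np

seg = cv2.imread("segmentation.png", cv2.IMREAD_GRAYSCALE)
img = cv2.imread("original.png")

# Binarize: white for the class of interest, black for everything else
mask = np.where(seg == 1, 255, 0).astype(np.uint8)

# Compute the edges of the binary mask
edges = cv2.Canny(mask, 100, 200)

# Draw those edges onto the original image in red
img[edges > 0] = (0, 0, 255)
cv2.imwrite("outlined.png", img)
```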
What you want is called image segmentation, which is different from object detection. The best performing methods for common object classes (e.g. cars, bikes, people, dogs, ...) do this using trained CNNs, usually called semantic segmentation networks. This will, in theory, give you regions in your image corresponding to the object you want. After that you can fit an enclosing polygon using what is called the convex hull.
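As a rough sketch of that last step (assuming a binary mask, white for the object, produced by a segmentation network, and OpenCV 4.x where findContours returns two values):

```python
# Hedged sketch: fit an enclosing polygon (convex hull) around a segmented region.
import cv2

mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)   # white = object of interest
img = cv2.imread("original.png")

# Find the contours of the segmented region and keep the largest one
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
largest = max(contours, key=cv2.contourArea)

# The convex hull is the enclosing polygon; draw it on the original image
hull = cv2.convexHull(largest)
cv2.drawContours(img, [hull], -1, (0, 255, 0), 2)
cv2.imwrite("polygon.png", img)
```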
I'm working with SDK 1.8 and I'm getting the depth stream from the Kinect. Now, I want to hold a paper of A4 size in front of the camera and want to get the co-ordinates of the corners of that paper so I can project an image onto it.
How can I detect the corners of the paper and get the co-ordinates? Does Kinect SDK 1.8 provide that option?
Thanks
Kinect SDK 1.8 does not provide this feature itself (to my knowledge). Depending on the language you use for coding, there are most certainly libraries that allow such an operation if you break it into steps.
OpenCV for example is quite useful in image-processing. When I once worked with the Kinect for object recognition, I used AForge with C#.
I recommend approaching the challenge as follows:
Edge Detection:
You will apply an edge detection algorithm such as the Canny filter to the image. First you will probably - depending on the library - transform your depth picture into a greyscale picture. The resulting image will be greyscale as well, and the intensity of a pixel correlates with the probability of it belonging to an edge. Using a threshold, you then binarize this picture to black/white.
Hough Transformation: this is used to get the position and parameters of a line within an image, which allows further calculation. The Hough transformation is VERY sensitive to its parameters and you will spend a lot of time tuning them to get good results.
Calculation of edge points: Assuming that your Hough transformation was successful, you can now calculate all intersections of the given lines, which will yield the points that you are looking for.
All of these steps (especially Edge Detection and Hough Transformation) have been asked/answered/discussed in this forum.
If you provide code, intermediate results, or further questions, you can get a more detailed answer.
p.s.
I remember that the Kinect was not that accurate and that noise was an issue. Therefore you might consider applying a filter before doing these operations.
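To make the pipeline concrete, here is a rough OpenCV sketch of the steps above (noise filter, Canny, Hough, line intersections). It assumes the depth frame has already been exported as an 8-bit grayscale image named "depth.png"; all thresholds are placeholders you will need to tune.

```python
# Hedged sketch: corner candidates of a sheet of paper from a depth image.
import itertools
import cv2
import numpy as np

gray = cv2.imread("depth.png", cv2.IMREAD_GRAYSCALE)

# The Kinect is noisy, so filter first (see the p.s. above)
gray = cv2.medianBlur(gray, 5)

# Edge detection (Canny already outputs a binary edge map)
edges = cv2.Canny(gray, 50, 150)

# Hough transformation: returns (rho, theta) line parameters; very parameter-sensitive
lines = cv2.HoughLines(edges, 1, np.pi / 180, threshold=120)
lines = [] if lines is None else lines

def intersection(l1, l2):
    """Intersect two lines given in (rho, theta) form; returns None if (near-)parallel."""
    (r1, t1), (r2, t2) = l1[0], l2[0]
    a = np.array([[np.cos(t1), np.sin(t1)], [np.cos(t2), np.sin(t2)]])
    b = np.array([r1, r2])
    if abs(np.linalg.det(a)) < 1e-6:
        return None
    x, y = np.linalg.solve(a, b)
    return int(x), int(y)

# Candidate corners of the sheet = pairwise intersections of the detected lines
corners = [p for l1, l2 in itertools.combinations(lines, 2)
           if (p := intersection(l1, l2)) is not None]
print(corners)
```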
I need to implement a simple image search in my app using TensorFlow.
The requirements are these:
The dataset contains around a million images, all of the same size, each containing one unique object and only that object.
The search parameter is an image taken with a phone camera of some object that is potentially in the dataset.
I've managed to extract the image from the camera picture and straighten it to rectangular form and as a result, a reverse-search image indexer like TinEye was able to find a match.
Now I want to reproduce that indexer by using TensorFlow to create a model based on my dataset (making each image's file name a unique index).
Could anyone point me to tutorials/code that would explain how to achieve such a thing without diving too much into computer vision terminology?
Much appreciated!
The Wikipedia article on TinEye says that Perceptual Hashing will yield results similar to TinEye's. They reference this detailed description of the algorithm. But TinEye refuses to comment.
The biggest issue with the Perceptual Hashing approach is that while it's efficient for identifying the same image (subject to skews, contrast changes, etc.), it's not great at identifying a completely different image of the same object (e.g. the front of a car vs. the side of a car).
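If you still want to try it, a small sketch using the Python imagehash package (which implements a DCT-based pHash in the spirit of the algorithm referenced above) might look like this; the file names and the distance threshold are placeholders.

```python
# Hedged sketch: compare two images with a perceptual (DCT) hash.
from PIL import Image
import imagehash

h1 = imagehash.phash(Image.open("query.jpg"))
h2 = imagehash.phash(Image.open("candidate.jpg"))

# Subtracting two hashes gives their Hamming distance (0 = identical)
distance = h1 - h2
print("hamming distance:", distance)
if distance <= 10:          # threshold is application-dependent
    print("probably the same image")
```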
TensorFlow has great support for deep neural nets which might give you better results. Here's a high level description of how you might use a deep neural net in TensorFlow to solve this problem:
Start with a pre-trained NN (such as GoogLeNet) or train one yourself on a dataset like ImageNet. Now we're given a new picture we're trying to identify. Feed that into the NN. Look at the activations of a fairly deep layer in the NN. This vector of activations is like a 'fingerprint' for the image. Find the picture in your database with the closest fingerprint. If it's sufficiently close, it's probably the same object.
The intuition behind this approach is that unlike Perceptual Hashing, the NN is building up a high-level representation of the image including identifying edges, shapes, and important colors. For example, the fingerprint of an apple might include information about its circular shape, red color, and even its small stem.
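A hedged sketch of that fingerprint idea, using a pre-trained InceptionV3 from tf.keras (the pooled activations of the last convolutional block act as the feature vector); the file names and the use of cosine similarity are my assumptions, not a prescribed recipe:

```python
# Hedged sketch: image "fingerprints" from a pre-trained CNN + nearest neighbor.
import numpy as np
import tensorflow as tf

model = tf.keras.applications.InceptionV3(weights="imagenet",
                                          include_top=False, pooling="avg")

def fingerprint(path):
    img = tf.keras.preprocessing.image.load_img(path, target_size=(299, 299))
    x = tf.keras.preprocessing.image.img_to_array(img)[np.newaxis]
    x = tf.keras.applications.inception_v3.preprocess_input(x)
    return model.predict(x)[0]            # 2048-dim activation vector

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Index the dataset once, then compare a new photo against every fingerprint
database = {"img_001.jpg": fingerprint("img_001.jpg")}   # ... one entry per image
query = fingerprint("camera_photo.jpg")
best = max(database, key=lambda name: cosine(query, database[name]))
print("closest match:", best)
```

For a million images you would replace the linear scan with an approximate nearest-neighbor index, but the fingerprinting step stays the same.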
You could also try something like this 2012 paper on image retrieval which uses a slew of hand-picked features such as SIFT, regional color moments and object contour fragments. This is probably a lot more work and it's not what TensorFlow is best at.
UPDATE
OP has provided an example pair of images from his application:
Here are the results of using the demo on the pHash.org website on that pair of similar images as well as on a pair of completely dissimilar images.
Comparing the two images provided by the OP:
RADISH (radial hash): pHash determined your images are not similar with PCC = 0.518013
DCT hash: pHash determined your images are not similar with hamming distance = 32.000000.
Marr/Mexican hat wavelet: pHash determined your images are not similar with normalized hamming distance = 0.480903.
Comparing one of his images with a random image from my machine:
RADISH (radial hash): pHash determined your images are not similar with PCC = 0.690619.
DCT hash: pHash determined your images are not similar with hamming distance = 27.000000.
Marr/Mexican hat wavelet: pHash determined your images are not similar with normalized hamming distance = 0.519097.
Conclusion
We'll have to test more images to really know. But so far pHash does not seem to be doing very well. With the default thresholds it doesn't consider the similar images to be similar. And for one algorithm, it actually considers a completely random image to be more similar.
https://github.com/wuzhenyusjtu/VisualSearchServer
It is a simple implementation of similar-image search using TensorFlow and the InceptionV3 model. The code implements two components: a server that handles image search, and a simple indexer that does nearest-neighbor matching based on the extracted pool3 features.
Specifically, my question is that every consecutive frame has a different number of points, and KNN/SVM cannot be applied unless I have the same number of points for each frame. So how can I apply ML to 3D frames that differ in size? My PLY output file consists of x, y, z coordinates for each point, with more than 10,000 points per frame.
You can use open3d to downsample the points to a fixed number for all point clouds and then use deep learning libraries for classification or segmentation. PointNet developed by the Stanford AI Lab is one of the best algorithms for this.
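A minimal sketch of that first step (forcing every frame to the same number of points) might look like the following; it reads the .ply files with open3d and uses simple random sub-/re-sampling, and the target size of 4096 points is an arbitrary choice.

```python
# Hedged sketch: load point clouds and resample them to a fixed size.
import numpy as np
import open3d as o3d

N_POINTS = 4096   # arbitrary fixed size, pick what your model needs

def load_fixed_size(path, n=N_POINTS):
    pts = np.asarray(o3d.io.read_point_cloud(path).points)   # (num_points, 3)
    # Sample without replacement if the frame is large enough, otherwise repeat points
    idx = np.random.choice(len(pts), n, replace=len(pts) < n)
    return pts[idx]

frame = load_fixed_size("frame_0001.ply")
print(frame.shape)   # (4096, 3), the same for every frame
```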
If you have 10,000 points per point cloud, that's pretty decent data precision for a 3D object. As a 3D artist, not a scientist, I would try to find a hack: if your second point cloud has, say, 10,065 points, I would just randomly ignore the extra 65 points so the clouds match in length (sum all your point counts and divide by the number of frames to get a reference value). But that might damage your data (depending on how much the counts vary).
If I had to use raw scan data, I would use a strong geometry-processing library like the C++ Point Cloud Library: http://pointclouds.org/
and its Python bindings: http://ns50.pointclouds.org/news/2013/02/07/python-bindings-for-the-point-cloud-library/
Or 3D software? (Or TensorFlow?)
You can extract global descriptors from each point cloud and train a machine learning algorithm like an SVM or an ANN with them.
There are a lot of different global descriptors, here you can take a look at a few of them: PCL Descriptors
Once you have them, train a machine learning algorithm like the ones shown in Python Machine Learning Classification.
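As a hedged sketch of that last step: once each point cloud has been reduced to a single global descriptor vector (e.g. exported from PCL), training an SVM with scikit-learn is straightforward. The descriptor and label files here are placeholders for whatever you export.

```python
# Hedged sketch: classify point clouds from precomputed global descriptors.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X = np.load("descriptors.npy")   # shape: (n_frames, descriptor_length)
y = np.load("labels.npy")        # shape: (n_frames,)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = SVC(kernel="rbf").fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```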
I am new to OpenCV. I would like to know if we can compare two images (one image made in Photoshop, i.e. the source image, and the other one taken from the camera) and find out whether they are the same or not.
I tried to compare the images using template matching. It does not work. Can you tell me what are the other procedures which we can use for this kind of comparison?
Comparison of images can be done in different ways depending on which purpose you have in mind:
If you just want to compare whether two images are approximately equal (with a few luminance differences), but with the same perspective and camera view, you can simply compute a pixel-to-pixel squared difference, per color band. If the sum of squares over the two images is smaller than a threshold the images match, otherwise not (see the sketch below).
If one image is a black-and-white variant of the other, conversion of the color image is needed (see e.g. http://www.johndcook.com/blog/2009/08/24/algorithms-convert-color-grayscale). Afterwards simply perform the step above.
If one image is a subimage of the other, you need to perform registration of the two images. This means determining the scale, possible rotation and XY-translation that is necessary to lay the subimage on the larger image (for methods to register images, see: Pluim, J.P.W., Maintz, J.B.A., Viergever, M.A., Mutual-information-based registration of medical images: a survey, IEEE Transactions on Medical Imaging, 2003, Volume 22, Issue 8, pp. 986-1004).
If you have perspective differences, you need an algorithm for deskewing one image to match the other as well as possible. For ways of doing deskewing look for example in http://javaanpr.sourceforge.net/anpr.pdf from page 15 and onwards.
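As a quick illustration of the first case, here is a tiny sketch of the squared-difference check; the file names and the threshold are placeholders, and both images are assumed to have the same dimensions.

```python
# Hedged sketch: per-pixel sum-of-squared-differences comparison.
import cv2
import numpy as np

a = cv2.imread("reference.png").astype(np.float32)
b = cv2.imread("camera.png").astype(np.float32)   # must be the same size as "reference.png"

# Sum of squared differences over all pixels and color bands
ssd = np.sum((a - b) ** 2)
print("SSD:", ssd)
print("match" if ssd < 1e6 else "no match")   # threshold must be tuned to your data
```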
Good luck!
You should try SIFT. You apply SIFT to your marker (the image saved in memory) and you get some descriptors (points that can be robustly recognized). Then you can use the FAST algorithm on the camera frames in order to find the corresponding keypoints of the marker in the camera image.
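A hedged sketch of the SIFT part with OpenCV (it requires OpenCV >= 4.4, where SIFT lives in the main module; this uses brute-force matching rather than FAST, and the 0.75 ratio-test value follows Lowe's paper):

```python
# Hedged sketch: match a stored marker against a camera frame with SIFT descriptors.
import cv2

marker = cv2.imread("marker.png", cv2.IMREAD_GRAYSCALE)
frame = cv2.imread("camera_frame.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(marker, None)
kp2, des2 = sift.detectAndCompute(frame, None)

# Brute-force matching with Lowe's ratio test to drop ambiguous matches
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(f"{len(good)} good matches")
```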
You have many threads about this topic:
How to get a rectangle around the target object using the features extracted by SIFT in OpenCV
How to search the image for an object with SIFT and OpenCV?
OpenCV - Object matching using SURF descriptors and BruteForceMatcher
Good luck