Conditional Random Fields

Conditional Random Fields - c++

Is there a training and optimization algorithm for 2-D (two-dimensional) conditional random fields (CRFs) suited to image classification?
Has anyone used the CRF package in R (http://crf.r-forge.r-project.org/html/CRF-package.html) for image classification? I would like to see a working code example.
Thanks.

Look up Markov Random Fields. Here's a paper you might be interested in: Patrick Pérez, Markov Random Fields and Images (1998).
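To make the MRF suggestion a bit more concrete, here is a minimal sketch of labeling a 2-D pixel grid with iterated conditional modes (ICM) under a Potts smoothness prior. It covers inference only, with fixed potentials; it does not train anything. The function name `icm_labels`, the `unary` cost layout and the `beta` weight are illustrative assumptions and do not come from the R package or the paper.

```python
def icm_labels(unary, beta=1.0, n_iters=5):
    """Label a 2-D grid with iterated conditional modes (ICM).

    unary[r][c][k] is the (assumed, precomputed) cost of giving label k
    to pixel (r, c). beta is the Potts penalty paid for every 4-connected
    neighbour that carries a different label.
    """
    rows, cols = len(unary), len(unary[0])
    n_labels = len(unary[0][0])
    # Start from the cheapest label per pixel, ignoring neighbours.
    labels = [
        [min(range(n_labels), key=lambda k: unary[r][c][k]) for c in range(cols)]
        for r in range(rows)
    ]
    for _ in range(n_iters):
        changed = False
        for r in range(rows):
            for c in range(cols):
                neighbours = [
                    labels[rr][cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < rows and 0 <= cc < cols
                ]
                # Local energy: data cost plus a penalty per disagreeing neighbour.
                best = min(
                    range(n_labels),
                    key=lambda k: unary[r][c][k]
                    + beta * sum(1 for n in neighbours if n != k),
                )
                if best != labels[r][c]:
                    labels[r][c] = best
                    changed = True
        if not changed:
            break  # reached a local minimum of the energy
    return labels


# A 3x3 image where only the centre pixel prefers label 1.
unary = [[[0.0, 1.0] for _ in range(3)] for _ in range(3)]
unary[1][1] = [1.0, 0.0]
print(icm_labels(unary, beta=1.0))
```

ICM only finds a local minimum of the energy, so treat it as a baseline rather than a recommendation.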

I do not think a CRF will work on its own. Image classification has to cope with scaling and affine transformations, so the key to accurate image classification is the preprocessing, not the classification algorithm.

Classification of imagery usually involves bag-of-words features, feature pooling and similar techniques, whereas a conditional random field is for labeling sequential data, so it might not be appropriate to use a CRF in this scenario.

Related

Dealing with an imbalanced dataset for multi-label classification

In my case, I have 33 labels per sample. The label tensor for a given image looks like [0,0,1,0,1,1,1,0,0,0,0,0…...33]. Some labels have very few samples and others have many. I want to predict the regression values. What would be the best approach to improve the prediction? I would like to apply a data balancing technique, but so far I have only found balancing techniques for multi-class problems. I would be grateful if you could share anything you know about my problem, or any other idea to improve the performance. Thanks in advance.

When using a single model to regress multiple values, it is usually beneficial to preprocess the values being predicted so that they are in roughly the same range.
Look, for example, at the way detection models predict (regress) bounding box coordinates: the values are scaled and the net predicts only corrections.
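One way to bring several regression targets into the same range is per-output standardization, sketched below. The helper names (`fit_target_scaler`, `scale_targets`, `unscale_predictions`) are made up for illustration; the idea is to train on the scaled targets and map the model's outputs back afterwards.

```python
from statistics import mean, pstdev


def fit_target_scaler(targets):
    """Return one (mean, std) pair per output column of `targets`."""
    stats = []
    for column in zip(*targets):
        mu = mean(column)
        sigma = pstdev(column) or 1.0  # avoid dividing by zero for constant columns
        stats.append((mu, sigma))
    return stats


def scale_targets(targets, stats):
    """Standardize every output so all columns share a similar range."""
    return [[(v - mu) / sigma for v, (mu, sigma) in zip(row, stats)] for row in targets]


def unscale_predictions(predictions, stats):
    """Map model outputs back to the original units of each output."""
    return [[v * sigma + mu for v, (mu, sigma) in zip(row, stats)] for row in predictions]


# Two outputs on very different scales.
targets = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
stats = fit_target_scaler(targets)
scaled = scale_targets(targets, stats)        # train the model on these
restored = unscale_predictions(scaled, stats)  # apply this to model outputs
```

Fit the scaler on the training split only, and reuse the same `stats` for validation and test data.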

Using AutoML to evaluate the hyperparameters of the Word2Vec algorithm

Is it possible with AutoML (from H2O) to use only the Word2Vec algorithm and try out different values for the parameters to find out which parameter settings give me the most accurate vectors for my data set? I don't want AutoML to apply the DeepLearning, GBM, etc. algorithms to my dataset, only the Word2Vec algorithm. How do I do that?
So far I have only managed to build a Word2Vec model with H2O.
I would like to test different settings of the Word2Vec hyperparameters with AutoML to evaluate which settings are optimal.

The Word2Vec algorithm is a data transformation algorithm (converting rows of text to a matrix), not a supervised machine learning algorithm (which is what AutoML and all the algorithms inside of it do).
The typical way to use Word2Vec is to apply it to your text data so that the data can be used to train a supervised ML algorithm. From here you can run any supervised algorithm (GLM, Random Forest, GBM, etc.) on this transformed dataset -- or my recommendation is to just pass the transformed data to AutoML, so it can find the best algorithm for you.
You will have to try out different settings for Word2Vec manually and see how well they do, given some particular supervised learning algorithm that you want to apply to your problem. Hopefully that clears up the confusion.
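The manual search can be organized as a plain grid search, roughly as sketched below. Here `train_embedding` and `score_downstream` are hypothetical callables you would write yourself (for example, one that fits Word2Vec with the given settings and one that returns the validation score of a fixed supervised model trained on the resulting vectors). The parameter names in the example grid are placeholders, not a statement about H2O's API.

```python
from itertools import product


def grid_search(param_grid, train_embedding, score_downstream):
    """Try every parameter combination and keep the best downstream score.

    param_grid maps a parameter name to the list of values to try.
    Returns (best_params, best_score, all_results).
    """
    names = sorted(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        embedding = train_embedding(**params)   # hypothetical: fit Word2Vec here
        score = score_downstream(embedding)     # hypothetical: higher is better
        results.append((params, score))
    best_params, best_score = max(results, key=lambda item: item[1])
    return best_params, best_score, results


# Example grid; the names are placeholders for whatever your trainer accepts.
example_grid = {"vec_size": [50, 100, 200], "window": [3, 5]}
```

Keep the supervised model and the validation split fixed across runs, otherwise the scores of different Word2Vec settings are not comparable.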

similarity measure scikit-learn document classification

I am doing some work in document classification with scikit-learn. For this purpose, I represent my documents in a tf-idf matrix and feed a Random Forest classifier with this information, which works perfectly well. I was just wondering which similarity measure is used by the classifier (cosine, Euclidean, etc.) and how I can change it. I haven't found any parameters or information in the documentation.
Thanks in advance!

As with most supervised learning algorithms, Random Forest classifiers do not use a similarity measure; they work directly on the features supplied to them. So the decision trees are built based on the terms in your tf-idf vectors.
If you want to use similarity then you will have to compute a similarity matrix for your documents and use this as your features.
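For illustration, a document-to-document cosine similarity matrix can be computed from the tf-idf rows as in the sketch below. It uses plain Python lists so that it stays self-contained; in practice scikit-learn's `sklearn.metrics.pairwise.cosine_similarity` does the same job on sparse tf-idf matrices. The function name `cosine_similarity_matrix` is just a label chosen here.

```python
import math


def cosine_similarity_matrix(vectors):
    """Return sims where sims[i][j] is the cosine similarity of rows i and j."""
    norms = [math.sqrt(sum(v * v for v in vec)) for vec in vectors]
    sims = []
    for a, norm_a in zip(vectors, norms):
        row = []
        for b, norm_b in zip(vectors, norms):
            dot = sum(x * y for x, y in zip(a, b))
            # An all-zero document has no direction; treat it as similarity 0.
            row.append(dot / (norm_a * norm_b) if norm_a and norm_b else 0.0)
        sims.append(row)
    return sims


# Toy tf-idf rows for three documents over a two-term vocabulary.
tfidf_rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
similarity_features = cosine_similarity_matrix(tfidf_rows)
```

Row i of the result then serves as the feature vector for document i. For unseen documents you would compute their similarity to the training documents, so that the columns stay the same.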

Building a simple image search using TensorFlow

I need to implement a simple image search in my app using TensorFlow.
The requirements are these:
The dataset contains around a million images, all of the same size, each containing one unique object and only that object.
The search parameter is an image taken with a phone camera of some object that is potentially in the dataset.
I've managed to extract the image from the camera picture and straighten it to rectangular form and as a result, a reverse-search image indexer like TinEye was able to find a match.
Now I want to reproduce that indexer by using TensorFlow to create a model based on my data-set (make each image's file name a unique index).
Could anyone point me to tutorials/code that would explain how to achieve such thing without diving too much into computer vision terminology?
Much appreciated!

The Wikipedia article on TinEye says that Perceptual Hashing will yield results similar to TinEye's. They reference this detailed description of the algorithm. But TinEye refuses to comment.
The biggest issue with the Perceptual Hashing approach is that while it's efficient for identifying the same image (subject to skews, contrast changes, etc.), it's not great at identifying a completely different image of the same object (e.g. the front of a car vs. the side of a car).
TensorFlow has great support for deep neural nets which might give you better results. Here's a high level description of how you might use a deep neural net in TensorFlow to solve this problem:
Start with a pre-trained NN (such as GoogLeNet) or train one yourself on a dataset like ImageNet.
Now we're given a new picture we're trying to identify. Feed that into the NN.
Look at the activations of a fairly deep layer in the NN. This vector of activations is like a 'fingerprint' for the image.
Find the picture in your database with the closest fingerprint. If it's sufficiently close, it's probably the same object.
The intuition behind this approach is that unlike Perceptual Hashing, the NN is building up a high-level representation of the image including identifying edges, shapes, and important colors. For example, the fingerprint of an apple might include information about its circular shape, red color, and even its small stem.
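The last step of that recipe, finding the closest fingerprint and checking that it is close enough, could look roughly like the sketch below. It assumes the fingerprints have already been extracted by the network and stored per file name. `find_closest`, the toy vectors and the `max_distance` cutoff are illustrative assumptions; a sensible cutoff has to be tuned on your own data.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_closest(query, index, max_distance):
    """Return (name, distance) of the nearest fingerprint in `index`.

    index maps an image file name to its fingerprint vector. If even the
    nearest fingerprint is farther than max_distance, name is None.
    """
    best_name, best_dist = None, float("inf")
    for name, fingerprint in index.items():
        dist = euclidean(query, fingerprint)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist <= max_distance:
        return best_name, best_dist
    return None, best_dist


# Toy 3-dimensional fingerprints; real ones have hundreds or thousands of values.
index = {"apple.jpg": [1.0, 0.0, 0.0], "car.jpg": [0.0, 1.0, 0.0]}
print(find_closest([0.9, 0.1, 0.0], index, max_distance=0.5)[0])  # apple.jpg
```

A linear scan like this is fine for experiments, but with around a million images you would probably want an approximate nearest-neighbour index instead.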
You could also try something like this 2012 paper on image retrieval which uses a slew of hand-picked features such as SIFT, regional color moments and object contour fragments. This is probably a lot more work and it's not what TensorFlow is best at.
UPDATE
OP has provided an example pair of images from his application:
Here are the results of using the demo on the pHash.org website on that pair of similar images as well as on a pair of completely dissimilar images.
Comparing the two images provided by the OP:
RADISH (radial hash): pHash determined your images are not similar with PCC = 0.518013
DCT hash: pHash determined your images are not similar with hamming distance = 32.000000.
Marr/Mexican hat wavelet: pHash determined your images are not similar with normalized hamming distance = 0.480903.
Comparing one of his images with a random image from my machine:
RADISH (radial hash): pHash determined your images are not similar with PCC = 0.690619.
DCT hash: pHash determined your images are not similar with hamming distance = 27.000000.
Marr/Mexican hat wavelet: pHash determined your images are not similar with normalized hamming distance = 0.519097.
Conclusion
We'll have to test more images to really know. But so far pHash does not seem to be doing very well. With the default thresholds it doesn't consider the similar images to be similar. And for one algorithm, it actually considers a completely random image to be more similar.
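For readers unfamiliar with how such hash comparisons work, here is a minimal sketch using an average hash, which is a simpler relative of the DCT hash and not the algorithm pHash implements. It assumes the image has already been reduced to a small grayscale grid (8x8 is a common choice). The 0.25 threshold is an arbitrary example value, not pHash's default.

```python
def average_hash(pixels):
    """Hash a small grayscale grid: one bit per pixel, set if above the mean."""
    flat = [p for row in pixels for p in row]
    avg = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > avg else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Number of bit positions in which the two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def is_similar(hash_a, hash_b, n_bits, threshold=0.25):
    """Compare the normalized Hamming distance to an (assumed) threshold."""
    return hamming_distance(hash_a, hash_b) / n_bits <= threshold


# 2x2 toy "images": the second is a brighter copy, the third is inverted.
original = average_hash([[10, 200], [10, 200]])
brighter = average_hash([[20, 210], [20, 210]])
inverted = average_hash([[200, 10], [200, 10]])
print(hamming_distance(original, brighter))  # 0
print(hamming_distance(original, inverted))  # 4
```

A uniform brightness change leaves the hash untouched here, while a structurally different image flips many bits. That matches the observation that hashing works best for near-copies of the same image.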

https://github.com/wuzhenyusjtu/VisualSearchServer
It is a simple implementation of similar-image search using TensorFlow and the InceptionV3 model. The code implements two components: a server that handles image search, and a simple indexer that does nearest-neighbor matching based on the extracted pool3 features.

classifying a weighted feature vector

I want to give weights to the features of a data set before using them in a classification algorithm like KNN or J48, but I don't know how to evaluate a weighted feature vector.
Does any classification algorithm accept weights as input instead of just '0' and '1'?
In particular, are any of Weka's ready-made classification functions capable of working with weights (not 0 and 1 as filters)?

In most situations, you can just scale the data set according to your weights. This is trivial to prove for Minkowski distances such as Euclidean distance.
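A quick numeric check of that claim for Euclidean distance: multiplying feature i by sqrt(w_i) and then taking the ordinary distance gives the same value as the weighted distance sqrt(sum_i w_i (x_i - y_i)^2). The function names and example numbers below are illustrative only.

```python
import math


def weighted_euclidean(x, y, weights):
    """Euclidean distance where feature i contributes with weight w_i."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def scale_features(x, weights):
    """Scale each feature by sqrt(w_i) so plain Euclidean distance is weighted."""
    return [a * math.sqrt(w) for a, w in zip(x, weights)]


def euclidean(x, y):
    """Ordinary (unweighted) Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


x, y = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
weights = [4.0, 0.25, 1.0]

direct = weighted_euclidean(x, y, weights)
via_scaling = euclidean(scale_features(x, weights), scale_features(y, weights))
assert math.isclose(direct, via_scaling)  # both equal sqrt(40)
```

So a distance-based classifier such as KNN can be given feature weights simply by rescaling the columns before training, with no special support in the algorithm.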

Not all of Weka's classification algorithms support weights, but some do.
You need to set the weight information after loading your dataset; see the example code in the Weka wiki. I remember that Weka's J48 decision tree supports weights in the developer version, but I cannot find a reference. There is a patch, though.
This search for feature weights in the Weka wiki may help.
I suggest adding the weights to your data set and training on that data.
