Is there a publication that explains how they evaluate how "sure" the automatic system is for the label it assigns? I understand part of the labelling process is done by humans but I'm interested in how they evaluate the confidence of the prediction.
I suggest you read the Ground Truth FAQ page, as it addresses some of your concerns.
Q: How does Amazon SageMaker Ground Truth help with increasing the accuracy of my training datasets?
A: Amazon SageMaker Ground Truth offers the following features to help you increase the accuracy of data labeling performed by humans:
(a) Annotation consolidation: This counteracts the error/bias of individual workers by sending each data object to multiple workers and then consolidating their responses (called “annotations”) into a single label. It then takes their annotations and compares them using an annotation consolidation algorithm. This algorithm first detects outlier annotations that are disregarded. It then performs a weighted consolidation of the annotations, assigning higher weights to more reliable annotations. The output is a single label for each object.
(b) Annotation interface best practices: These are features of the annotation interfaces that enable workers to perform their tasks more accurately. Human workers are prone to error and bias, and well-designed interfaces improve worker accuracy. One best practice is to display brief instructions along with good and bad label examples in a fixed side panel. Another best practice is to darken the area outside of the bounding box boundary when workers are drawing the bounding box on an image.
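To make the "weighted consolidation" idea in (a) more concrete, here is a minimal sketch of a weighted vote that also returns a confidence score. This is not Ground Truth's actual algorithm, which the FAQ does not spell out; the function name `consolidate`, the worker IDs, and the per-worker weights are all assumptions made for illustration, and the outlier-detection step is left out.

```python
from collections import defaultdict


def consolidate(annotations, worker_weights):
    """Weighted vote over one data object.

    annotations: {worker_id: label} (hypothetical input format)
    worker_weights: {worker_id: reliability weight}; unknown workers get 1.0
    Returns (winning_label, confidence), where confidence is the share of
    the total weight that voted for the winning label.
    """
    totals = defaultdict(float)
    for worker, label in annotations.items():
        totals[label] += worker_weights.get(worker, 1.0)
    best = max(totals, key=totals.get)
    confidence = totals[best] / sum(totals.values())
    return best, confidence


label, confidence = consolidate(
    {"w1": "cat", "w2": "cat", "w3": "dog"},
    {"w1": 0.9, "w2": 0.6, "w3": 0.5},
)
```

In this toy version, the confidence is simply the fraction of reliability weight behind the chosen label, so a unanimous vote gives 1.0 and a split vote gives something lower.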
For reinforcement learning experiments, I often run independent repetitions for each hyperparameter setting. Ideally, I would visualize the average of these repetitions (per setting), including confidence intervals around the mean learning curve. I suppose many RL researchers have this issue.
I run my hyperparameter experiments with Ray Tune, which automatically visualizes each independent run in TensorBoard (which is very useful). It would be really helpful if I could automatically aggregate the results over the repetitions (with confidence intervals), and then compare the different hyperparameter settings (and plot them for papers). I could not find any method in Tune/TensorBoard to do this, nor an integration with another framework that can do this.
As an example, I would ideally get a curve like the one below, but directly in TensorBoard.
I suppose more people will have this issue, and was curious whether anyone knows a package or quick solution to go from Ray Tune output to the above figure (without coding it manually).
Thanks a lot!
Best regards,
Thomas
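Not a Tune/TensorBoard feature, but the aggregation step itself is small. Below is a minimal sketch, assuming each repetition has already been exported as a list of metric values of equal length (how you get them out of the Ray Tune logs is not shown). The function name `mean_with_ci` is made up for this example, and the interval uses a normal approximation (z = 1.96 for roughly 95%), which is only a rough choice when there are few repetitions.

```python
import math
import statistics


def mean_with_ci(curves, z=1.96):
    """Aggregate repeated runs of one hyperparameter setting.

    curves: list of runs, each a list of metric values per training step
            (all runs must have the same length, at least two runs).
    Returns three lists (mean, lower, upper), one value per step.
    """
    n_runs = len(curves)
    mean, lower, upper = [], [], []
    for values in zip(*curves):  # iterate over steps
        m = statistics.mean(values)
        # standard error of the mean from the sample standard deviation
        sem = statistics.stdev(values) / math.sqrt(n_runs)
        mean.append(m)
        lower.append(m - z * sem)
        upper.append(m + z * sem)
    return mean, lower, upper


runs = [[1.0, 2.0], [3.0, 4.0]]  # two repetitions, two steps each
mean, lower, upper = mean_with_ci(runs)
```

The three lists can then be plotted as a line with a shaded band, one band per hyperparameter setting.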
I am working on a project where I need to take a picture of a surface using my phone and then analyze the surface for defects and marks.
I want to take the image and then send it to the cloud for analysis.
Does Amazon Rekognition provide such a service to analyze the defects I want to study?
Or would I need to write custom code using OpenCV or something similar?
While Amazon Rekognition can detect faces and objects, it has no idea what is meant by a "defect".
Imagine if you had 10 people lined up and showed them a picture, asking them if they could see a defect. Would they all agree? They'd probably ask you what you mean by a defect and how bad something has to look before it could be considered a defect.
Similarly, you would need to train a system on what is a valid defect and what is not a defect.
This is a good use case for Amazon SageMaker. You would need to provide LOTS of sample images of defects and not-defects. They should be shot from many different angles in many different lighting situations, similar to the images you would want to test.
It would then build a model that could be used for detecting 'defects' in supplied images. You could even put the model into an AWS DeepLens unit to do the processing locally.
Please note, however, that you need to provide a large number of images (hundreds is good, thousands is better) to be able to train it to correctly detect 'defects'.
In a computer vision project, the image I want to process can be partitioned into "zones" containing multiple products of the same kind.
Provided that I can retrieve image information of all the possible kinds of product, I need to detect which kind is present in each zone, without the need to detect the position of each single product. In summary, I need to recognize "sets of products".
As additional info, the products do not have a rigid shape, they are not oriented in the same manner, and the luminosity changes (so I am basically searching for shape-, orientation- and luminosity-invariant approaches).
The reliable info I can exploit is that the products logos - or parts of them - are often visible and the products are quite colorful.
I would like to know about possible approaches that exploit the fact that I know the zones partition and approaches that do not exploit it.
Edit: I didn't make this clear: this is for the possible future development of an application.
I am looking into individual facial recognition for an application, but an essential part of this seems to be a fairly large training set of images for each individual to be recognized.
Is it important for the images to be taken at different times in different environments, or could several images captured over a few seconds with a handheld camera possibly provide the necessary variations for a good training set?
(This isn't for human facial recognition, by the way, so existing tools and databases won't really help too much. I'm aware that 2D image recognition can not necessarily be applied to all species; let's just assume that it does work in my use case.)
This paper may answer some of your questions:
http://uran.donetsk.ua/~masters/2011/frt/dyrul/library/article8.pdf
From the pattern classification point of view, a usual problem in face recognition is having a plethora of classes and only a few, possibly only one, training sample(s) per class. For this reason, more sophisticated classifiers are not needed but a nearest-neighbour classifier is used.
While I'm not an expert on the subject, it appears to be a common problem to have only one image per person as a training sample and one that has been solved with at least some level of accuracy in controlled lighting/positional situations.
To specifically answer your question, a training set that had multiple images of each person with little or no variation ("several images captured over a few seconds with a handheld camera"), would not be as valuable as one that had more variation (e.g. different facial expressions, lighting, backgrounds).
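For illustration, the nearest-neighbour approach quoted above can be sketched in a few lines. This is a minimal sketch, assuming each image has already been reduced to a numeric feature vector (the feature extraction is not shown); the names `nearest_neighbour`, `gallery`, and the individual IDs are made up for this example.

```python
import math


def nearest_neighbour(query, gallery):
    """Return the identity whose stored sample is closest to the query.

    gallery: {identity: feature_vector}, with as little as one training
             sample per class, as described in the quoted passage.
    Distance is plain Euclidean distance between feature vectors.
    """
    return min(gallery, key=lambda name: math.dist(query, gallery[name]))


gallery = {
    "individual_a": [0.1, 0.9],
    "individual_b": [0.8, 0.2],
}
match = nearest_neighbour([0.7, 0.3], gallery)
```

With only one stored sample per individual, any change in lighting or pose moves the query vector away from that sample, which is why more varied training images tend to help.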
I was given a project on vehicle type identification with neural network and that is how I came to know the awesomeness of neural technology.
I am a beginner in this field, but I have sufficient materials to learn it. I just want to know some good places to start for this project specifically, as my biggest problem is that I don't have very much time. I would really appreciate any help. Most importantly, I want to learn how to match patterns with images (in my case, vehicles).
I'd also like to know if Python is a good language to start this in, as I'm most comfortable with it.
I have some images of cars as input and I need to classify those cars by their model.
E.g.: Audi A4, Audi A6, Audi A8, etc.
You didn't say whether you can use an existing framework or need to implement the solution from scratch, but either way Python is an excellent language for coding neural networks.
If you can use a framework, check out Theano, which is written in Python and is the most complete neural network framework available in any language:
http://www.deeplearning.net/software/theano/
If you need to write your implementation from scratch, look at the book 'Machine Learning, An Algorithmic Perspective' by Stephen Marsland. It contains example Python code for implementing a basic multilayered neural network.
As for how to proceed, you'll want to convert your images into 1-D input vectors. Don't worry about losing the 2-D information; the network will learn 'receptive fields' on its own that extract 2-D features. Normalize the pixel intensities to a -1 to 1 range (or better yet, zero mean with a standard deviation of 1). If the images are already centered and normalized to roughly the same size, then a simple feed-forward network should be sufficient. If the cars vary wildly in angle or distance from the camera, you may need to use a convolutional neural network, but that's much more complex to implement (there are examples in the Theano documentation). For a basic feed-forward network, try using two hidden layers, each with anywhere from 0.5 to 1.5 times as many units as there are pixels.
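As a rough illustration of the preprocessing step, here is a minimal sketch using NumPy (an assumption; any array library would do). The function names `to_input_vector` and `hidden_layer_range` are made up for this example, and the image is assumed to be a 2-D grayscale array.

```python
import numpy as np


def to_input_vector(image):
    """Flatten a 2-D image into a 1-D vector with zero mean and unit std."""
    flat = np.asarray(image, dtype=np.float64).ravel()
    centered = flat - flat.mean()
    std = flat.std()
    # A constant image has zero std; return the centered (all-zero) vector.
    return centered if std == 0 else centered / std


def hidden_layer_range(n_pixels):
    """Rule of thumb from the text: 0.5 to 1.5 times the number of pixels."""
    return int(0.5 * n_pixels), int(1.5 * n_pixels)


image = [[0, 255], [255, 0]]  # tiny 2x2 grayscale example
x = to_input_vector(image)
low, high = hidden_layer_range(32 * 32)
```

In practice you may prefer to compute the mean and standard deviation over the whole training set rather than per image; the per-image version here is just the simplest variant.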
Break your dataset into separate training, validation, and testing sets (perhaps with a 0.6, 0.2, 0.2 ratio respectively) and make sure each image only appears in one set. Train ONLY on the training set, and don't use any regularization until you're getting close to 100% of the training instances correct. You can use the validation set to monitor progress on instances that you're not training on. Performance should be worse on the validation set than the training set. Stop training when the performance on the validation set stops improving. Once you've accomplished this you can try different regularization constants and choose the one that results in the best validation set performance. The test set will tell you how well your final result is performing (but don't change anything based on test set results, or you risk overfitting to that too!).
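The split and the stopping rule can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: `split_dataset` and `early_stopping_epoch` are names made up for this example, the items are assumed to be unique (for instance file paths), and the `patience` parameter is an addition of mine, since "stops improving" needs some tolerance for noisy validation curves.

```python
import random


def split_dataset(items, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle once, then cut into disjoint train/validation/test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(ratios[0] * len(items))
    n_val = round(ratios[1] * len(items))
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # whatever remains
    return train, val, test


def early_stopping_epoch(val_errors, patience=3):
    """Return the epoch with the best validation error seen before stopping.

    Stops scanning once `patience` epochs have passed without improvement.
    """
    best_epoch, best_error = 0, float("inf")
    for epoch, error in enumerate(val_errors):
        if error < best_error:
            best_epoch, best_error = epoch, error
        elif epoch - best_epoch >= patience:
            break
    return best_epoch


train, val, test = split_dataset(range(10))
stop_at = early_stopping_epoch([0.5, 0.4, 0.3, 0.35, 0.36, 0.37, 0.1])
```

Because the list is shuffled once and then sliced, each item lands in exactly one of the three sets. In the example, training would be stopped before the late dip to 0.1 is ever seen, which is the usual trade-off with a small patience value.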
If your car images are very complex and varied and you cannot get a basic feed-forward net to perform well, you might consider using 'deep learning'. That is, add more layers and pre-train them using unsupervised training. There's a detailed tutorial on how to do this here (though all the code examples are in MATLAB/Octave):
http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial
Again, that adds a lot of complexity. Try it with a basic feed-forward NN first.