Train a model to draw bounding boxes on certain objects in an image? - computer-vision

Is it possible to use GCP Machine Learning products to train a model to draw bounding boxes on certain objects in an image? I'd like to be able to feed labeled images and have it predict where that label would belong.

I think you are looking for something like this, where the Tensorflow machine learning library is used:
https://cloud.google.com/solutions/creating-object-detection-application-tensorflow
A note:
When you say that you want to be able to feed labeled images and have it predict where that label would belong, i assume you mean where that object is present in the image in terms of the bounding box coordinates. If so then the library should take care of that for you, your job is just to train the network with your labeled images.

Related

How to find logos on a website screenshot

I'm looking for a way to check if a given logo appears on a screenshot of a webpage. So basically, I need to be able to find a small predefined image on a larger image that may or may not contain the smaller image. A match could be of a different scale, somewhat different colors. I need to judge occurrence similarity as well. Need some pointers for what to look at, I've never worked with computer vision before.
Simplest yet not simple way to do it is a normal CNN trained on augmented dataset of the logos.
Trying to keep the answer short, Just make a cnn in tensorflow and train your model on tons images of logos with labels on each training image, It's a simple task and a not-very-crafty CNN must be able to get your work done.
CNN- Convolutional Neural Network
Reference : https://etasr.com/index.php/ETASR/article/view/3919

How to use CNN-LSTMs to classify image sequences for multiple bounding boxes in a video stream?

I am working on a pytorch project, where I’m using a webcam video stream. An object detector is used to find objects within the frame and each box is given an id by a tracker. Then, I want to analyse each bounding box with a CNN-LSTM and classify it (binary classification) based on the previous frame sequence of that box (for the last 5 frames). I want the program to run a close to real-time as possible.
Currently I am stuck with the CNN-LSTM part of my problem - the detector and tracker are working quite well already.
I am a little bit clueless on how to approach this task. Here are the questions I have:
1) How does inferencing work in this case? Do I have to save np arrays for each bounding box containing the last 5 frames, then add the current frame and delete the oldest one? Then use the model for each bounding box that is in the current frame. This way sounds very slow and inefficient. Is there a faster or easier way?
2) Do you have any tipps for creating the dataset? I have a couple of videos with bounding boxes and labels. Should I loop through the videos and save save each frame sequence for each bounding box in a new folder, together with a csv that contains the label? I have never worked with an CNN-LSTM, so I don’t know how to load the data for training.
3) Would it be possible to use the extracted features of the CNN in parallel? As mentioned above, The extracted features should be used by the LSTM for a binary classification problem. The classification is only needed for the current frame. I would like to use an additional classifier (8 classes) based on the extracted CNN features, also only for the current frame. For this classifier, the LSTM is not needed.
Since my explaining propably is very confusing, the following image hopefully helps with understanding what I want to build:
Architecture
This is the architecture I want to use. Is this possible using Pytorch? So far, I only worked with CNNs and LSTM seperately. Any help is apprechiated :)

How to separate same label data points by specifying marker/ symbol for the data point when visualising using Embedding Projector

I have a trained network that is giving me features of 2048 dimensions. I want to visualize them using t-sne plot through a Tensorboard embedding projector. Each of the data points belongs to some 10 different labels so this distinction I can do by specifying the different colors to different labels. However, the catch is, within 1 label, these features belong to 2 categories, one is the RGB image feature and the other is the infrared image feature. So now I need to show some marker like 'x' or 'o' which is available in Scatter plot different for RGB and infrared features.
However, I couldn't find this feature to specify marker anywhere in Embedding Projector documentation. Currently, I am using Pytorch's embedding projector using Summary writer but I don't mind switching to Tensorflow if it has such functionality.
Kindly help me if anybody knows if this can be done in a Tensorboard embedding projector.
Thanks in advance

Real-time object tracking in OpenCV

I have written an object classification program using BoW clustering and SVM classification algorithms. The program runs successfully. Now that I can classify the objects, I want to track them in real time by drawing a bounding rectangle/circle around them. I have researched and came with the following ideas.
1) Use homography by using the train set images from the train data directory. But the problem with this approach is, the train image should be exactly same as the test image. Since I'm not detecting specific objects, the test images are closely related to the train images but not essentially an exact match. In homography we find a known object in a test scene. Please correct me if I am wrong about homography.
2) Use feature tracking. Im planning to extract the features computed by SIFT in the test images which are similar to the train images and then track them by drawing a bounding rectangle/circle. But the issue here is how do I know which features are from the object and which features are from the environment? Is there any member function in SVM class which can return the key points or region of interest used to classify the object?
Thank you

How to make motion history image for presentation into one single image?

I am working on a project with gesture recognition. Now I want to prepare a presentation in which I can only show images. I have a series of images defining a gesture, and I want to show them in a single image just like motion history images are shown in literature.
My question is simple, which functions in opencv can I use to make a motion history image using lets say 10 or more images defining the motion of hand.
As an example I have the following image, and I want to show hand's location (opacity directly dependent on time reference).
I tried using GIMP to merge layers with different opacity to do the same thing, however the output is not good.
You could use cv::updateMotionHistory
Actually OpenCV also demonstrates the usage in samples/c/motempl.c