GCP Video Intelligence API Object Tracking - google-cloud-platform

I've used the Video Intelligence API to do object tracking on video.
According to the product page [1], it recognizes more than 20,000 objects, places, and actions in stored and streaming video.
I have a question: is there any document that lists which kinds of objects it can or cannot recognize?
This is my first question. Thank you.
[1] https://cloud.google.com/video-intelligence

This GCP documentation enumerates the categories of things the Cloud Video Intelligence API can detect, analyze, track, transcribe, and recognize: https://cloud.google.com/video-intelligence/docs/how-to
Among the things listed there are faces, people, shot changes, explicit content, objects, logos, and text. The Cloud Video Intelligence API models are pre-trained. If there are objects the API can't recognize, you can train your own custom models using AutoML Video Intelligence. To get started with AutoML Video Intelligence, see this GCP documentation: https://cloud.google.com/video-intelligence/automl/docs/beginners-guide
As for which objects cannot be recognized, there is no document that lists them. The only limits stated in the Cloud Video Intelligence API documentation concern video size, per-request limits, and video length. GCP documentation: https://cloud.google.com/video-intelligence/quotas
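Since there is no published list, one practical way to see what the API recognizes in your own footage is to request OBJECT_TRACKING and tally the labels that come back. Below is a minimal sketch using only the standard library. It builds the JSON body for the REST `videos:annotate` method and counts labels in a response. The helper names and the sample response are made up for illustration, and the response shape is assumed to follow the camelCase JSON fields (`annotationResults`, `objectAnnotations`, `entity.description`). The sketch does not send the request or handle authentication.

```python
import json
from collections import Counter

# REST endpoint for the annotate method (the request is not sent in this sketch).
ANNOTATE_URL = "https://videointelligence.googleapis.com/v1/videos:annotate"


def build_object_tracking_request(input_uri: str) -> str:
    """Return the JSON body asking for object tracking on a Cloud Storage video."""
    return json.dumps({"inputUri": input_uri, "features": ["OBJECT_TRACKING"]})


def summarize_tracked_objects(response: dict) -> Counter:
    """Count how many tracked objects were returned per label."""
    counts = Counter()
    for result in response.get("annotationResults", []):
        for annotation in result.get("objectAnnotations", []):
            label = annotation.get("entity", {}).get("description", "<unknown>")
            counts[label] += 1
    return counts


# Hand-written sample in the assumed response shape, not real API output.
sample_response = {
    "annotationResults": [
        {
            "objectAnnotations": [
                {"entity": {"description": "cat"}, "confidence": 0.91},
                {"entity": {"description": "dog"}, "confidence": 0.84},
                {"entity": {"description": "cat"}, "confidence": 0.77},
            ]
        }
    ]
}

if __name__ == "__main__":
    print(build_object_tracking_request("gs://my-bucket/my-video.mp4"))
    print(summarize_tracked_objects(sample_response))
```

Running this over a handful of representative videos gives you an empirical list of labels for your domain. Anything that never shows up is a candidate for a custom AutoML model.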

Related

How are Speech-to-Text and Video Intelligence SPEECH_TRANSCRIPTION related?

My goal is to process several videos using a speech-to-text model.
Google confusingly has two products that seem to do the same thing.
What are the major differences between these offerings?
Google Cloud Speech-to-Text: https://cloud.google.com/speech-to-text/docs/basics
Speech-to-Text has an "enhanced video" model for interpreting the audio.
Google Video Intelligence: https://cloud.google.com/video-intelligence/docs/feature-speech-transcription
VI has the option to request a SPEECH_TRANSCRIPTION feature.
The main difference between the two is the input they accept. The Speech-to-Text API only accepts audio input, while Video Intelligence accepts video input.
As you mention in your question, Speech-to-Text has an "enhanced video" model. This means it has a model designed to transcribe audio that originated from video files, that is, the original file was a video that was then converted to audio. As seen in this tutorial, the video was converted to audio before transcribing it.
I suggest using the Video Intelligence API if you would like to transcribe the speech in a video directly, without converting it to audio first. You can follow this tutorial on how to transcribe speech using the Video Intelligence API.
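For illustration, here is a rough sketch of what a SPEECH_TRANSCRIPTION request to the Video Intelligence REST API looks like and how the transcripts could be pulled out of the response. It only builds and parses dictionaries, it does not call the service. The function names and the sample response are hypothetical, and the field names (`videoContext.speechTranscriptionConfig`, `speechTranscriptions`, `alternatives`, `transcript`) are assumed to match the JSON form of the API.

```python
import json


def build_transcription_request(input_uri: str, language_code: str = "en-US") -> dict:
    """Return the request body for transcribing speech directly from a video file."""
    return {
        "inputUri": input_uri,
        "features": ["SPEECH_TRANSCRIPTION"],
        "videoContext": {
            "speechTranscriptionConfig": {
                "languageCode": language_code,
                "enableAutomaticPunctuation": True,
            }
        },
    }


def best_transcripts(response: dict) -> list:
    """Collect the first (highest-ranked) alternative of each transcription."""
    lines = []
    for result in response.get("annotationResults", []):
        for transcription in result.get("speechTranscriptions", []):
            alternatives = transcription.get("alternatives", [])
            if alternatives and alternatives[0].get("transcript"):
                lines.append(alternatives[0]["transcript"].strip())
    return lines


# Hand-written sample in the assumed response shape, not real API output.
sample_response = {
    "annotationResults": [
        {
            "speechTranscriptions": [
                {"alternatives": [{"transcript": " Hello world.", "confidence": 0.9}]},
                {"alternatives": [{}]},  # empty alternatives are skipped
            ]
        }
    ]
}

if __name__ == "__main__":
    print(json.dumps(build_transcription_request("gs://my-bucket/talk.mp4"), indent=2))
    print(best_transcripts(sample_response))
```

With Speech-to-Text you would instead extract the audio track yourself and send that, which is the extra step the Video Intelligence route avoids.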

Google AutoML Vision API and Google Vision API Custom Algorithm

I am looking at the Google AutoML Vision API and the Google Vision API. I know that with the Google AutoML Vision API you get a custom model, because you train ML models on your own images and define your own labels. And when using the Google Vision API, you are using a pretrained model...
However, I am wondering whether it is possible to use my own algorithm (one that I created, not one provided by Google) with the Vision / AutoML Vision API instead? ...
Sure, you can definitely deploy your own ML algorithm on Google Cloud without being tied to the Vision or AutoML API.
Two approaches that I have used many times for this same use case:
Serverless approach, if your model is relatively light in terms of computational requirements: deploy your own custom cloud function. More info here.
To be more specific, you just call your cloud function, passing your image directly (base64-encoded or as a pointer to a storage location). The function then automatically allocates all required resources, runs your custom algorithm to process the image and/or run inference, sends the results back, and vanishes (all resources released, no more running costs). Neat :)
Google AI Platform. More info here.
Use AI Platform to train your machine learning models at scale, to host your trained model in the cloud, and to use your model to make predictions about new data.
When in doubt, go for AI Platform, as the whole pipeline is nicely lined up for any of your custom code/models. It is well suited to production deployment as well.
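To make the serverless option a bit more concrete, here is a minimal, framework-agnostic sketch of the handler logic: decode a base64 image from the request payload, run your own algorithm on it, and return a result with a status code. The payload key `image_base64`, the function names, and the placeholder algorithm are all assumptions for illustration. In a real cloud function you would wrap `handle_request` in the HTTP entry point of whatever runtime you deploy to.

```python
import base64


def run_custom_algorithm(image_bytes: bytes) -> dict:
    """Hypothetical placeholder for your own model or algorithm."""
    return {"num_bytes": len(image_bytes)}


def handle_request(payload: dict) -> tuple:
    """Process one request payload and return (response_body, http_status)."""
    encoded = payload.get("image_base64")
    if not encoded:
        return {"error": "missing 'image_base64'"}, 400
    try:
        # validate=True rejects characters outside the base64 alphabet.
        image_bytes = base64.b64decode(encoded, validate=True)
    except (ValueError, TypeError):
        return {"error": "invalid base64 data"}, 400
    return run_custom_algorithm(image_bytes), 200


if __name__ == "__main__":
    demo = {"image_base64": base64.b64encode(b"fake image bytes").decode("ascii")}
    print(handle_request(demo))
```

Keeping the logic in a plain function like this makes it easy to unit test locally before deploying, and the same function can be reused if you later move the model to AI Platform.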

Exporting Google cloud video intelligence model

I've trained a video classification model using Google's Video Intelligence platform. I now want to download the model to run predictions on-prem for security purposes, but I don't see any way of exporting the model. Is there any way to do so?
You are right: as of today, AutoML Video Intelligence is in beta and there is no way to export your model.
I would advise you to watch the Release Notes for updates on the product.

Google AutoML Video Intelligence Tools?

I'm using AutoML Video Intelligence and it's very tedious. I was wondering whether there is an easier way to create datasets for object tracking. Is there an easy way to get the time and position of the bounding box?
You can probably find answers to these questions in the GCP documentation for the AutoML Video Intelligence product.
At the least, the object tracking process is well explained there, both for the GCP console UI and for HTTP calls to the AutoML REST API.
The documentation also includes examples showing how to specify video segment positions in prediction requests.
Consider editing your question to add details about your use case so that it can be answered more precisely.

Is it necessary to have data in the form of images to use Google Vision API for object identification?

I'm planning to use the Google Vision API to identify and count objects in a video. I can only get the data as good-quality video. So, I want to know whether I can use video, or whether I will have to train the model with images only?