I'm using AutoML Video Intelligence and creating datasets for object tracking is very tedious, so I was wondering if there was an easier way to build them. Is there an easy way to get the time and position of each bounding box?
I'm pretty sure you can find answers to these questions in the GCP documentation, in particular for the AutoML Video Intelligence product.
The object tracking process, at least, is nicely explained in terms of implementation, both with the GCP Console UI and by constructing HTTP calls to the Cloud AutoML REST API.
Furthermore, you can find an example showing how to handle video segment positioning for the relevant prediction requests.
You can also edit your question and extend it with details about your use case so that the solution can be addressed more precisely.
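As for getting the time and position of each box with less clicking, the import CSV can be generated programmatically instead of drawing everything in the UI. Below is a minimal sketch; the exact column layout (video URI, label, instance id, time offset, relative box coordinates) is an assumption you should verify against the AutoML Video Object Tracking data-preparation docs, and the bucket paths are placeholders:

```python
import csv

# Sketch: build an import CSV for an AutoML Video object tracking dataset.
# NOTE: the assumed row layout is
#   video_uri,label,instance_id,time_offset_s,x_min,y_min,,,x_max,y_max,,
# (relative coordinates in 0.0-1.0) -- verify it against the current docs
# before importing.

annotations = [
    # (gcs_video_uri, label, instance_id, time_offset_s, x_min, y_min, x_max, y_max)
    ("gs://my-bucket/videos/train_01.mp4", "car", "1", 12.90, 0.10, 0.20, 0.35, 0.60),
    ("gs://my-bucket/videos/train_01.mp4", "car", "1", 13.10, 0.12, 0.21, 0.37, 0.61),
]

with open("object_tracking_import.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for uri, label, instance_id, t, x_min, y_min, x_max, y_max in annotations:
        writer.writerow([uri, label, instance_id, f"{t:.2f}",
                         x_min, y_min, "", "", x_max, y_max, "", ""])
```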
I've used the Video Intelligence API to do object tracking on video.
According to the documentation [1], it recognizes more than 20,000 objects, places, and actions in stored and streaming video.
I have a question: is there any document that shows which kinds of objects can or can't be recognized?
It's my first question. Thank you.
[1] https://cloud.google.com/video-intelligence
This GCP documentation enumerates the categories that the Cloud Video Intelligence API can detect, analyze, track, transcribe and recognize: https://cloud.google.com/video-intelligence/docs/how-to
Among the things listed in the GCP documentation that the Cloud Video Intelligence API can detect, track and recognize are faces, people, shot changes, explicit content, objects, logos and text. The Cloud Video Intelligence API models are already pre-trained; if there are objects it can't recognize, you can train your own custom models using AutoML Video Intelligence. To get started with AutoML Video Intelligence, you can refer to this GCP documentation: https://cloud.google.com/video-intelligence/automl/docs/beginners-guide
As for the limits on which objects the Cloud Video Intelligence API can recognize, there is no document that states which objects are not recognizable. The only limits in the Cloud Video Intelligence API documentation are in terms of video size per request and length. GCP documentation: https://cloud.google.com/video-intelligence/quotas
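If you want to see in practice which entities the pre-trained model picks up in your own footage, you can run label detection and print every recognized entity. A minimal sketch with the google-cloud-videointelligence Python client (2.x request style); the bucket path is a placeholder:

```python
from google.cloud import videointelligence

# Run label detection on a video in Cloud Storage and print every entity the
# pre-trained model recognized in the whole-video segments.
client = videointelligence.VideoIntelligenceServiceClient()
operation = client.annotate_video(
    request={
        "features": [videointelligence.Feature.LABEL_DETECTION],
        "input_uri": "gs://my-bucket/sample-video.mp4",  # placeholder path
    }
)
result = operation.result(timeout=300)

for label in result.annotation_results[0].segment_label_annotations:
    confidences = [segment.confidence for segment in label.segments]
    print(f"{label.entity.description}: max confidence {max(confidences):.2f}")
```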
So I have been collecting numerous text descriptions of articles, where each description is structured differently. Now I have to "create" an algorithm that extracts the title of each article for me, which is a hard task. I have come across Google's Natural Language ML offering and it seems to be able to create one for me.
Unfortunately, I am not really able to find out exactly how I can use it,
so my question is: how precisely can I set it up? Additionally, it would be helpful to know whether Firebase has such a service, since I am planning to build a Firebase project.
Thanks in advance for any help!
Unfortunately, models created using Google AutoML Natural Language are not exportable to TensorFlow Lite (mobile models). Based on your use case you will need a model for text classification; the provided link has a sample of how this kind of model works. You can follow this tutorial to train a custom model on the data that you have so it can identify the titles of your articles.
Once training is done you can:
Deploy it to Firebase
Download the model to your device and perform testing.
You can find detailed instructions, from training the model to testing it on your device, for either iOS or Android.
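Since the model can't be exported to TensorFlow Lite, one option is to call the cloud-hosted model from a backend (for example a Cloud Function that your Firebase app calls). A rough sketch with the google-cloud-automl Python client; the project ID, model ID and input text are placeholders:

```python
from google.cloud import automl

# Sketch: classify one article description with a trained AutoML Natural
# Language model. PROJECT_ID and MODEL_ID are placeholders for your own values.
PROJECT_ID = "my-project"
MODEL_ID = "TCN1234567890"

prediction_client = automl.PredictionServiceClient()
model_name = automl.AutoMlClient.model_path(PROJECT_ID, "us-central1", MODEL_ID)

payload = automl.ExamplePayload(
    text_snippet=automl.TextSnippet(
        content="Some article description text ...", mime_type="text/plain"
    )
)

response = prediction_client.predict(name=model_name, payload=payload)
for result in response.payload:
    print(f"{result.display_name}: {result.classification.score:.2f}")
```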
We are automating the process of our deep learning project. Images are automatically uploaded to a dataset in AutoML Vision (Object Detection) in the Google Cloud Platform. A couple of team members regularly annotate the uploaded images using the provided annotation tool in the web UI. We need to measure the productivity of our team members by counting the annotations each of them makes. I haven't found an efficient solution yet. I would appreciate it if you could share your ideas.
There is no feature to identify who annotated which images; however, one approach is to split the work between your team members and distribute the labels that each one should annotate. Then you can simply count the number of annotations for each label. For instance, following this guide, you could give Baked Goods and Cheese to one collaborator and Salad and Seafood to another, and so on, so that you can check the totals in the UI. Moreover, the label statistics can give you more details about the annotations for each label (and hence for each team member); note that statistics are only available in the AutoML Vision Object Detection UI.
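If you prefer to tally the counts outside the UI, a rough option is to export the dataset and count bounding-box rows per label, then roll them up per team member. A sketch, assuming a CSV export where each row is one bounding box with the label in the third column; verify the actual export layout for your dataset before relying on the indices:

```python
import csv
from collections import Counter

# Sketch: count exported bounding-box annotations per label and roll them up
# per team member, following the label-per-collaborator split described above.

LABEL_OWNER = {  # label -> team member, per the agreed split (placeholder names)
    "Baked Goods": "alice",
    "Cheese": "alice",
    "Salad": "bob",
    "Seafood": "bob",
}

per_member = Counter()
with open("exported_annotations.csv", newline="") as f:
    for row in csv.reader(f):
        if len(row) < 3:
            continue
        label = row[2]  # assumed column position of the label
        per_member[LABEL_OWNER.get(label, "unassigned")] += 1

for member, count in per_member.most_common():
    print(f"{member}: {count} annotations")
```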
An automated approach, in case you are interested, is the Human Labeling Service; according to the documentation, it is currently only available by email because of the Coronavirus (COVID-19) measures.
If the recommendations above don't fit your needs, you can always file a Feature Request asking for the desired functionality and add the required details.
I want to develop a chatbot like application which gives response to input questions using Google Cloud Platform.
Naturally, Dialogflow is suited for such applications. But due to business conditions, I cannot use Dialogflow.
An alternative could be AutoML Natural Language, where I do not need much machine learning expertise.
AutoML Natural Language requires documents which are labelled. These documents can be used for training a model.
My example document:
What is cost of Swiss tour?
Estimate of Switzerland tour?
I would use a label such as Switzerland_Cost for this document.
Now, in my application I would have a mapping between Labels and Responses.
During Prediction, when I give an input question to the trained model, I would get a predicted label. I can then use this label to return the mapped response.
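Roughly, the mapping I have in mind looks like the sketch below; predict_label is a placeholder for the actual AutoML Natural Language predict call, and the extra label, responses and confidence threshold are just examples:

```python
# Sketch of the label -> response mapping. predict_label() is a hypothetical
# helper that wraps the AutoML NL predict call and returns the top label with
# its confidence score.

RESPONSES = {
    "Switzerland_Cost": "A Swiss tour typically costs ...",
    "Switzerland_Itinerary": "A typical Swiss itinerary is ...",  # assumed extra label
}

FALLBACK = "Sorry, I didn't understand that. Could you rephrase?"
MIN_CONFIDENCE = 0.6  # assumed threshold; tune it on your own validation data


def predict_label(question):
    # Placeholder: replace with a call to the trained AutoML NL model.
    return "Switzerland_Cost", 0.9


def answer(question):
    label, score = predict_label(question)
    if score < MIN_CONFIDENCE or label not in RESPONSES:
        return FALLBACK
    return RESPONSES[label]


print(answer("What is cost of Swiss tour?"))
```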
Is there a better approach to my scenario?
I'm from the AutoML team. This seems like a good approach to me. People use AutoML NL for intent detection, which is pretty much aligned with what you're trying to do here.
I am working on a speech recognition task that involves detecting children's speaking ability and its improvement over time...
I'd like to use the Google Cloud Speech-to-Text API for the ASR part of the detection. Then I would use the transcripts of the different measurements to estimate the progress.
But! The sentence-level autocorrect of the Google Speech API consistently rewrites the earlier part of the spoken sentence...
Is there a way to disable the autocorrect of this ASR?
I can't bypass this problem with the "speechContext", "single_utterance" or "maxAlternatives" options.
"single_utterance" may work with words, but it corrects the misspells..
Any advice in this field?
If you use streaming instead of batch recognition, you should receive a response as soon as each part of the audio is transcribed; it does not wait for the rest of the sentence. You can then just store the first answer provided by the stream and ignore the later corrections.
This means that you don't have to wait until isFinal=True.
For a quick and dirty example of what I mean, go to the Speech API page and run the streaming test with the developer tools open. There you'll see the streaming data received as the words are being spoken.
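In code, the same idea looks roughly like the sketch below with the google-cloud-speech Python client: with interim_results=True you can keep the early hypothesis for each word and ignore the later "corrected" final results. The encoding, sample rate, language and file path are placeholders for your own setup:

```python
from google.cloud import speech

client = speech.SpeechClient()

streaming_config = speech.StreamingRecognitionConfig(
    config=speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,   # placeholder: match your recordings
        language_code="en-US",     # placeholder: use the children's language
    ),
    interim_results=True,          # emit hypotheses before sentence-level rewrites
)


def request_stream(path, chunk_size=4096):
    # Feed the audio in small chunks, the way a microphone stream would.
    with open(path, "rb") as audio:
        while chunk := audio.read(chunk_size):
            yield speech.StreamingRecognizeRequest(audio_content=chunk)


responses = client.streaming_recognize(streaming_config, request_stream("speech.raw"))

for response in responses:
    for result in response.results:
        # Interim results arrive as the words are spoken; is_final marks the
        # rewritten, "autocorrected" version that you may want to ignore.
        print(result.is_final, result.alternatives[0].transcript)
```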