I want to use the Google Cloud Vision API in my Android app to detect whether an uploaded picture is mainly food or not. The problem is that the response JSON is rather big and confusing. It says a lot about the picture but doesn't say what the picture as a whole shows (food or something else). I contacted the support team but didn't get an answer.
What you really want is custom classification, not raw Cloud Vision annotations.
Either use AutoML (https://cloud.google.com/automl/) or reinvent the wheel like I did: https://stackoverflow.com/a/55880316/322020
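As a rough illustration of the "own wheel" route (not necessarily what the linked answer does), one simple option is to reduce the label annotations to a single yes/no decision. The sketch below assumes you requested label detection and already have the per-image response as a dict with a `labelAnnotations` list of `description`/`score` entries. The helper name `is_mainly_food`, the `FOOD_LABELS` set, and the 0.7 threshold are all hypothetical and would need tuning on your own pictures.

```python
# Hypothetical helper: collapse Cloud Vision label annotations into "food or not".
# FOOD_LABELS and min_score are assumptions, not values defined by the API.
FOOD_LABELS = {"food", "dish", "cuisine", "meal", "ingredient"}


def is_mainly_food(response, min_score=0.7):
    """Return True if any food-like label reaches the score threshold."""
    labels = response.get("labelAnnotations", [])
    return any(
        label.get("description", "").lower() in FOOD_LABELS
        and label.get("score", 0.0) >= min_score
        for label in labels
    )


# Made-up sample shaped like a per-image label detection response.
sample = {
    "labelAnnotations": [
        {"description": "Food", "score": 0.97},
        {"description": "Tableware", "score": 0.88},
    ]
}
print(is_mainly_food(sample))
```

If a plain keyword check like this turns out to be too crude, that is the point at which training a custom classifier becomes the more reliable choice.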
I recently found an OCR tool called PaddleOCR. Has anyone used it, and how does its performance compare to the Google Cloud Vision API?
PaddleOCR describes itself as an industry-level open-source OCR engine, so I tested a few images with both it and Google Cloud Vision.
Generally speaking, commercial APIs like Google Cloud and Azure are expected to work better than open-source OCR engines, and they do, but in some scenarios the gap is small.
If the text is clear and flat, both work great. The main difference is the result format. The Google API gives you rich content, including block, paragraph, and word location information. PaddleOCR only returns results per text line (transcriptions and locations).
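To make the format difference concrete, here is a minimal sketch that flattens both outputs into the same `(text, box)` shape so they can be compared side by side. It assumes the Vision result is the `fullTextAnnotation` part of the JSON response (pages, blocks, paragraphs, words, symbols) and that PaddleOCR returns one `[box, (text, score)]` entry per text line, as in its 2.x releases. Other versions may nest the result differently, and the function names and sample data are hypothetical.

```python
def vision_words(full_text_annotation):
    """Flatten Vision's page > block > paragraph > word hierarchy into (text, box) pairs."""
    out = []
    for page in full_text_annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                for word in paragraph.get("words", []):
                    # A word's text is spread over its symbols.
                    text = "".join(s.get("text", "") for s in word.get("symbols", []))
                    vertices = word.get("boundingBox", {}).get("vertices", [])
                    box = [(v.get("x", 0), v.get("y", 0)) for v in vertices]
                    out.append((text, box))
    return out


def paddle_lines(lines):
    """Convert line-level [box, (text, score)] entries into (text, box) pairs."""
    return [(text, [tuple(pt) for pt in box]) for box, (text, _score) in lines]


# Made-up samples in the two assumed formats.
vision_sample = {
    "pages": [{"blocks": [{"paragraphs": [{"words": [
        {"symbols": [{"text": "H"}, {"text": "i"}],
         "boundingBox": {"vertices": [{"x": 1, "y": 2}, {"x": 9, "y": 2},
                                      {"x": 9, "y": 8}, {"x": 1, "y": 8}]}},
    ]}]}]}]
}
paddle_sample = [
    [[[10, 10], [100, 10], [100, 30], [10, 30]], ("Hello world", 0.98)],
]

print(vision_words(vision_sample))
print(paddle_lines(paddle_sample))
```

Note that the Vision side yields one entry per word while the PaddleOCR side yields one per line, which is exactly the granularity gap described above.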
If your test images are more complicated (curved text, handwriting, or blur), the commercial APIs will probably work better than the open-source engine. However, when they cannot meet your needs, you can try training a new model with PaddleOCR.
Here are some visualization images (not reproduced here): test1 and test2 for PaddleOCR, and test1 and test2 for the Google Cloud Vision API.
So I have been collecting numerous text descriptions of articles, where each description is structured differently. Now I need to "create" an algorithm that extracts the title of the article for me, which is a hard task. I came across Google ML natural language, and it seems to be able to create one for me.
Unfortunately, I can't figure out exactly how to use it,
so my question is: how exactly can I set it up? Additionally, it would be helpful to know whether Firebase has such a service, since I am planning to build a Firebase project.
Thanks in advance for any help!
Unfortunately, models created using Google AutoML Natural Language are not exportable to TensorFlow Lite (mobile models). Based on your use case, you will need a text classification model; the provided link has a sample of how this model works. You can follow this tutorial to train a custom model on your data so that it can identify whether the title of an article is a hard task or not.
Once training is done, you can:
Deploy it to Firebase.
Download the model to your device and test it.
You can find detailed instructions, from training the model to testing it on your device, for either iOS or Android.
I have a text in Japanese that I'm turning into an MP3 with the Google Cloud Text-to-Speech functionality.
I also want word timestamps for the MP3 that Google returns.
Google Speech-to-Text offers this functionality, but when I submit the files I get from TTS to STT, the result is not always good.
What is the best way to get word timestamps for the TTS MP3?
Google Cloud Speech-to-Text is an ML-based service, so it's expected that the results are not always as "good" as you might expect; it has its limitations.
What I can suggest is to look at the relevant documentation on this topic, such as the best practices, the guide, and the basics page. Additionally, you could look through the issues in their issue tracker, for example this issue, for more information. If you find a reproducible issue with the service, you can report it there so that their team is aware of it.
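One alternative that may be worth checking, instead of round-tripping the audio through Speech-to-Text: the v1beta1 Text-to-Speech API documents a time-pointing option (`enable_time_pointing` with `SSML_MARK`) that returns a timestamp for each SSML `<mark>` tag in the input. Whether it works for your Japanese voice is an assumption you would need to verify. The sketch below only builds the marked SSML. It assumes the text is already split into words (Japanese segmentation is a separate problem), and the helper name `build_marked_ssml` and the `w0`, `w1`, ... mark names are hypothetical.

```python
from xml.sax.saxutils import escape


def build_marked_ssml(words):
    """Wrap pre-segmented words in SSML, placing a <mark> tag before each word."""
    parts = ["<speak>"]
    for i, word in enumerate(words):
        # escape() protects &, < and > so the SSML stays well-formed.
        parts.append(f'<mark name="w{i}"/>{escape(word)}')
    parts.append("</speak>")
    return "".join(parts)


# Assumed segmentation of a short Japanese sentence.
ssml = build_marked_ssml(["今日", "は", "いい", "天気", "です"])
print(ssml)
```

If the timepoints come back as expected, each mark name maps to the start time of the word that follows it, so the index in `w{i}` links a timestamp back to its word.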
I want to be able to select similar images using Google Cloud Vision AI from a set of images that I provide.
It seems the Web Detection feature can show similar images across the web, but I want to search across user-provided images, or even just within a particular website (not across the entire web).
Is this possible?
There is no built-in feature that allows you to do that in the Google Cloud Vision API, but what you can do is fetch the URLs of the matching images detected across the web and filter for the ones you are interested in. You can follow this tutorial.
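The filtering step could look roughly like the sketch below. It assumes you already have the `webDetection` part of the JSON response as a dict, with `fullMatchingImages`, `partialMatchingImages`, and `visuallySimilarImages` lists of `{"url": ...}` entries. The helper name `filter_by_domain` and the sample URLs are hypothetical.

```python
from urllib.parse import urlparse


def filter_by_domain(web_detection, domain):
    """Keep only image URLs hosted on `domain` or one of its subdomains."""
    keys = ("fullMatchingImages", "partialMatchingImages", "visuallySimilarImages")
    urls = []
    for key in keys:
        for image in web_detection.get(key, []):
            url = image.get("url", "")
            host = urlparse(url).hostname or ""
            if host == domain or host.endswith("." + domain):
                urls.append(url)
    return urls


# Made-up sample shaped like a webDetection result.
web_sample = {
    "fullMatchingImages": [{"url": "https://example.com/img/a.jpg"}],
    "visuallySimilarImages": [
        {"url": "https://cdn.example.com/img/b.jpg"},
        {"url": "https://other.org/c.jpg"},
    ],
}
print(filter_by_domain(web_sample, "example.com"))
```

Matching on the parsed hostname rather than a substring of the URL avoids false hits such as `notexample.com` or a path that merely contains the domain name.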
I hope that helps
You could try out Vision API's ProductSearch: https://cloud.google.com/vision/product-search/docs/
See this answer: https://stackoverflow.com/a/58402071/11201290
I want to develop a chatbot-like application that responds to input questions using Google Cloud Platform.
Naturally, Dialogflow is well suited to such applications. But due to business constraints, I cannot use Dialogflow.
An alternative could be AutoML Natural Language, where I do not need much machine learning expertise.
AutoML Natural Language requires labelled documents, which are used to train a model.
My example document:
What is cost of Swiss tour?
Estimate of Switzerland tour?
I would use a label such as Switzerland_Cost for this document.
Now, in my application I would have a mapping between labels and responses.
During prediction, when I give an input question to the trained model, I would get a predicted label. I can then use this label to return the mapped response.
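The label-to-response lookup could be sketched as follows. This assumes the model's prediction has already been reduced to a list of `(label, score)` pairs. The `RESPONSES` text, the fallback message, and the 0.6 confidence threshold are all hypothetical placeholders.

```python
# Hypothetical mapping from predicted labels to canned responses.
RESPONSES = {
    "Switzerland_Cost": "Here is our price list for Switzerland tours.",
}
FALLBACK = "Sorry, I did not understand the question."


def respond(predictions, min_confidence=0.6):
    """Return the response mapped to the top label, or a fallback if unsure."""
    if not predictions:
        return FALLBACK
    label, score = max(predictions, key=lambda p: p[1])
    if score < min_confidence:
        return FALLBACK
    return RESPONSES.get(label, FALLBACK)


print(respond([("Switzerland_Cost", 0.91), ("Other", 0.05)]))
```

The fallback matters in practice: a classifier always returns some label, so without a threshold the bot would answer unrelated questions with whichever response scored highest.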
Is there a better approach to my scenario?
I'm from the AutoML team. This seems like a good approach to me. People use AutoML NL for intent detection, which is closely aligned with what you are trying to do here.