I extracted form data using the Google Cloud Vision API but have the following problem - computer-vision

Google Cloud Vision draws boxes/sections seemingly at random, as in the figure below, when it runs document_text_detection.
Because of this, I do not get a consistent pattern for extracting the information I need with regular expressions.
Image 1
A different form with the same section gets different box annotations.
Question 1:
Can I control the box annotations?
Question 2:
Can I supply my own box annotations in the API request?
Question 3:
Is there an alternative solution for my case?

Related

Getting word timestamps for TTS

I have a text in Japanese that I'm turning into an mp3 with the Google Cloud Text-to-Speech functionality.
I also want to have word timestamps for the mp3 that gets returned by Google.
Google Speech to Text offers this functionality but when I submit the files I get from TTS to STT, the result is not always good.
What is the best way to also get word timestamps for the TTS mp3?
Google Cloud Speech-to-Text is an ML-based service, so it is expected that the results will not always be as good as you hope; it has its limitations.
What I can suggest is to look at the relevant documentation on this topic, such as the best practices, the guide, and the basics page. Additionally, you could look at the issues in their issue tracker, for example this issue, for more information. If you find a reproducible issue with the service, you can report it there so their team is aware of it.

Google Cloud Vision - How to check logs that include original image uploaded or results of processed request

We are trying to look into the details of Google Cloud Vision transactions. We are interested in the Cloud Vision requests where the returned result is unsatisfactory (e.g. empty JSON). In general, we want to know what input was received and what GCV did with it.
I had assumed this would be auto-logged?
It seems that the default logging solution does not provide much information about the value of the transaction other than the time or error type. (Is there a way to dig deeper into the log?)
Is there a way to log (or somehow view the uploaded url of) the original image that the service received and/or the results of the processed request?
Could you provide an example of how to retrieve the detected results and/or the input image, say, for "DOCUMENT_TEXT_DETECTION"?
Can you be a bit more specific? Which Google Cloud Vision service are you trying to use (Image Classification, Object Detection)? Are you using the GCP console (i.e. the UI), the API, ...? Which kind of information do you want to get?
In any case, you can use advanced logs to have a look at your Google Cloud Vision logs. For instance, you can use the following filter to see the error logs:
protoPayload.serviceName="vision.googleapis.com"
severity>=ERROR
Or remove the second line to get all the logs related to Cloud Vision. You can then click on "Expand" to see all the information about the job.
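The filter above can also be assembled in code. This is a minimal Python sketch, not an official client call: the helper name `build_vision_log_filter` is hypothetical, and only the two filter lines come from the answer. Separate lines in a logging filter are combined with an implicit AND, so leaving out the severity line returns every Cloud Vision entry.

```python
from typing import Optional


def build_vision_log_filter(min_severity: Optional[str] = "ERROR") -> str:
    """Build the log filter text from the answer above.

    Hypothetical helper: pass None to drop the severity line and
    match all Cloud Vision log entries.
    """
    lines = ['protoPayload.serviceName="vision.googleapis.com"']
    if min_severity is not None:
        lines.append(f"severity>={min_severity}")
    # One comparison per line; the lines are ANDed together implicitly.
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_vision_log_filter())      # errors only
    print(build_vision_log_filter(None))  # all Cloud Vision entries
```

The resulting string can be pasted into the advanced logs filter box as-is.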

Google AutoML Video Intelligence Tools?

I'm using AutoML Video Intelligence and building datasets for it is very tedious. Is there an easier way to create datasets for object tracking, such as an easy way to get the time and position of each box?
I'm pretty sure you can find answers to these questions in the GCP documentation, in particular the pages on the AutoML Video Intelligence product.
At the least, the object tracking process is well explained in terms of implementation, either through the GCP console UI or by constructing HTTP calls to the AutoML REST API.
Furthermore, you can find an example showing how to handle video segment positioning for the relevant prediction requests.
You can also edit the question and add details about your use case so that a solution can address it precisely.

Google Cloud Vision AI how to select similar images from range of provided images

I want to be able to select similar images using Google Cloud Vision AI from a set of images that I provide.
It seems the Web Detection feature can show similar images from across the web, but I want to search across user-provided images or even just within a particular website (not across the entire web).
Is this possible to do?
There is no built-in feature for that in the Google Cloud Vision API, but what you can do is fetch the URLs of the matching images detected across the web and filter for the ones you are interested in. You can follow this tutorial.
I hope that helps
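The filtering step described in this answer could look roughly like the Python sketch below. It assumes the matching image URLs have already been collected from the Web Detection response into a plain list; the function name `filter_urls_by_site` and the example URLs are hypothetical, and nothing here calls the Vision API.

```python
from urllib.parse import urlparse


def filter_urls_by_site(urls, site):
    """Keep only URLs whose host is `site` or one of its subdomains."""
    site = site.lower()
    kept = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        # Match the exact host or a subdomain, but not lookalike hosts
        # such as "notexample.com".
        if host == site or host.endswith("." + site):
            kept.append(url)
    return kept


if __name__ == "__main__":
    # Hypothetical URLs standing in for Web Detection matches.
    matches = [
        "https://example.com/a.jpg",
        "https://cdn.example.com/b.jpg",
        "https://other.org/c.jpg",
    ]
    print(filter_urls_by_site(matches, "example.com"))
```

Matching on the parsed hostname rather than on a substring of the URL avoids keeping pages that merely mention the site name in their path.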
You could try out Vision API's ProductSearch: https://cloud.google.com/vision/product-search/docs/
See this answer: https://stackoverflow.com/a/58402071/11201290

Is it possible for the Azure Custom Vision API to detect multiple objects in a single image

I am new to Azure Cognitive Services. I want to detect multiple objects in a single image. Is this possible with the Custom Vision API?
Any help is appreciated. Thank you.
You should be able to with the Object Detection part of Custom Vision. Simply give it images containing multiple objects to train on and it should start detecting each of them.
For example, I was playing with it a while ago to see if it could detect red and white wines. After sending a few images containing both to train on, I started getting detections for both.
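To work with several detections from one image, a small post-processing step can help. The Python sketch below assumes each prediction is a dict with `tagName`, `probability`, and `boundingBox` keys, which is my understanding of the Custom Vision prediction response and should be checked against the actual output. The helper name `group_detections`, the threshold, and the sample values are hypothetical.

```python
from collections import defaultdict


def group_detections(predictions, min_probability=0.5):
    """Group bounding boxes by tag, dropping low-confidence predictions."""
    grouped = defaultdict(list)
    for prediction in predictions:
        if prediction["probability"] >= min_probability:
            grouped[prediction["tagName"]].append(prediction["boundingBox"])
    return dict(grouped)


if __name__ == "__main__":
    # Made-up predictions for a single image containing both wines.
    sample = [
        {"tagName": "red wine", "probability": 0.91,
         "boundingBox": {"left": 0.10, "top": 0.20, "width": 0.15, "height": 0.50}},
        {"tagName": "white wine", "probability": 0.84,
         "boundingBox": {"left": 0.55, "top": 0.22, "width": 0.14, "height": 0.48}},
        {"tagName": "red wine", "probability": 0.12,
         "boundingBox": {"left": 0.70, "top": 0.60, "width": 0.05, "height": 0.10}},
    ]
    for tag, boxes in group_detections(sample).items():
        print(tag, len(boxes))
```

Object detection models usually return many low-probability boxes, so some threshold is needed; 0.5 here is only a starting point.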