How to disable sentence-level auto correction in Google Cloud Speech-to-Text API - google-cloud-platform

I am working on a speech recognition task, which involves the detection of children's speaking capability, improvement over time...
I'd like to use the Google Cloud Speech to Text API for the ASR part of the detection. Then I would use the transcripts of different measurements to estimate the advancement.
But! The sentence level autocorrect of Google Speech API consistently rewrites the previous limb of the spoken sentence...
Is there a way to disable the autocorrect of this ASR?
I can't bypass this problem with the "speechContext", "single_utterance" or "maxAlternatives" options.
"single_utterance" may work with words, but it corrects the misspells..
Any advice in this field?

If you use streaming instead of batch recognize, you should receive an answer as soon as that part of the audio is transcribed, it does not wait for the rest of the sentence. You should then just store the first answer provided by the stream, not the further corrections.
This means that you don't have to wait until isFinal=True.
For a quick and dirty example of what I mean, go tho the speech API page, and run the streaming test with the developer tools open. There you'll see the streaming data received as the words are being spoken:

Related

Getting word timestamps for TTS

I have a text in japanese that I'm turning into an mp3 with the Google Cloud Text to Speech functionality.
I also want to have word timestamps for the mp3 that gets returned by Google.
Google Speech to Text offers this functionality but when I submit the files I get from TTS to STT, the result is not always good.
What is the best way to also get word timestamps for the TTS mp3?
Google Cloud Speech-to-Text it's a ML based service, so it's expected that the results are not always as "good" as you may expect them, it has it's limitations.
What I could suggest is to take a look at their relevant documentation about this topic like the best practices, the guide and the basics page that talk about it. Additionally, you could take a look at the issues within their issue tracker platform, like for example this issue for additional information on it and even if you find a reproducible issue within the service you can publish it there, so their team can be aware of it.

Google AutoML Video Intelligence Tools?

I'm using AutoML Video Intelligence and it's very tedious and I was wondering if there was an easier way to create Datasets for the object tracking. An easy way to get the time and position of the box?
I'm pretty sure that you can find the answers on the mentioned questions reading GCP knowledge base documentation in particular about AutoML Video Intelligence product.
At least Object tracking process is nicely explained in terms of implementation with either GCP console UI or constructing HTTP calls to Cloud REST AutoML API.
Furthermore, you can find example tutoring the way how to handle video segments positioning for the relevant prediction requests.
You can adjust initial question, extending it with a certain details about your use case in order to preciously address the solution.

Google Speech Transcription

If the Google Speech API transcribes audio at near real time, and my latency to the actual server is only 50ms, why do I receive my final streaming transcription result after 1.6s? Shouldn’t I receive it in only a couple hundred milliseconds?
You can use API Keys for authentication. Bear in mind with using the API Keys you lose the ability to track from whom the requests are being made and it might be easier for others to discover the key.
Also, you might want to check this other question: how to speed up google cloud speech

Google Speech streaming recognition slow response time

What is the fastest expected response time of the Google Speech API with streaming audio data? I am sending an audio stream to the API and am receiving the interim results with a 2000ms delay, of which I was hoping I could drop to below 1000ms. I have tested different sampling rates and different voice models.
I'm afraid that response time can't be measured or guaranteed because of the nature of the service. We don't know what is done under the hood, in fact there is no SLA for response time even though there is SLA for availability.
Something that can help you is working on building a good request:
Reducing 100-miliseconds frame size, for example, could ensure a good tradeoff between latency and efficiency.
Following Best Practices will help you to make a clean request so that the latency can be reduced.
You may want to check following links on specific uses cases to know how they addressed latency issues:
Realtime audio streaming to Google Speech engine
How to speed up google cloud speech
25s Latency in Google Speech to Text
If you really care about response time you'd better use Kaldi-based service on your own infrastructure. Something like https://github.com/alumae/kaldi-gstreamer-server together with https://github.com/Kaljurand/dictate.js
Google Cloud Speech itself works pretty fast, you can check how quick your microphone gets transcribed https://cloud.google.com/speech-to-text/.
You may probably experience buffering issue on your side, the tool you are using may buffer data before sending(buffer flush) to underlying device(stream).
You can find out how to decrease output buffer of that tool to lower values e.g. 2Kb, so data will reach Node app and Google service faster. Google recommends to send data that equals to 100ms buffer size.

Getting data from local running java app to google cloud app and back

I wanted to dive into the world of distributed systems, cloud computing, IoT, etc., and I gotta be honest, I imagined everything being a little more intuitive than it finally turned out.
I had a tiny testing architecture in mind, that I'd like to set up with Google Clouds and their services, but I am kinda stuck since I can't get my head around some concepts.
What I basically wanted to do (as a first step) is writing a simple java application that would run locally on my computer. This application should just generate random numbers and send those numbers somehow to the google cloud. On the cloud I wanted to define another java application that would manipulate those random numbers in some kind of way (it doesn't matter actually). Afterwards, the output should somehow get back to me of course. And actually, at the moment, I don't even care about how exactly. It could be somehow back to my local app (with some kind of listener, would that be possible?). But it could also simply store the results somewhere on the google cloud? Or maybe upload them to my google drive?
I guess you already noticed that - at some points - I don't even know what i want exactly, since I'm not sure of what is possible, and what not.
Could you provide me some help to get this set up?
The most important questions for me right now are:
Do I need to use a pubsub system, where my generated numbers are sent
to, and which then forwards this to the cloud app, that transforms my
data?
How do I get my data from the local app to the cloud services?
Would my data transforming app run on Google Dataflow?
Above I wrote "as a first step"... because later I would also like to send config files (for example in json format, or xml) to the cloud, and the
cloud application should transform those config files... if I get the
first scenario running the I guess this woul also be no problem
right?
Those are just a few of the questions that are on my mind currently. The most important ones I guess.
It would be a big help. Sorry, if the questions are not very precise, but I really need some kind of pointing into the right direction.
Thank you in advance!
I think it would be good to read up on some of the technologies you mention here:
Google Cloud Pubsub: Pub/Sub enables you to publish messages to a topic, and consume them in another place in the (Google) Cloud. You can see some different examples of publishers and consumers in the link. In your case you could for example write a Java application that writes random numbers to the Pub/Sub queue, where they will sit for 7 days to be consumed by another component (for example, Google Cloud Dataflow). To get started developing, you can find the SDKs here (there is a Java SDK).
Google Cloud Dataflow is managed service running Apache Beam pipelines to process your data at scale. You can learn about the different concepts here and get started designing your pipeline here. I suggest taking a look at some examples first though, which will make it more easy to grasp what is actually going on. Dataflow has a PubSub connector, so in your application you will be able to read from the topic you created before. In Dataflow you can for example multiply all your random numbers and write them to a certain sink (for example Google Cloud Storage, or even BigQuery or PubSub again).
Google Cloud Storage: is a cloud storage where you can put files, for example the output of your Dataflow pipeline. You will be able to manually download the files using the Cloud Console UI, or you can use one of the SDKs to download the output programmatically.
Hope this gives you an overview and some pointers to start. Whenever you are ready and have a more concrete use case in mind, you can start looking at some more components.