Google Speech streaming recognition slow response time - google-cloud-platform

What is the fastest expected response time of the Google Speech API with streaming audio data? I am sending an audio stream to the API and receiving interim results with a 2000ms delay, which I was hoping to drop below 1000ms. I have tested different sampling rates and different voice models.

I'm afraid that response time can't be measured or guaranteed because of the nature of the service. We don't know what is done under the hood; in fact, there is no SLA for response time, even though there is an SLA for availability.
Something that can help you is working on building a good request:
Using a 100-millisecond frame size, for example, gives a good tradeoff between latency and efficiency.
Following Best Practices will help you to make a clean request so that the latency can be reduced.
You may want to check the following links on specific use cases to see how others addressed latency issues:
Realtime audio streaming to Google Speech engine
How to speed up google cloud speech
25s Latency in Google Speech to Text

If you really care about response time, you'd be better off running a Kaldi-based service on your own infrastructure. Something like https://github.com/alumae/kaldi-gstreamer-server together with https://github.com/Kaljurand/dictate.js

Google Cloud Speech itself works quite fast; you can check how quickly your microphone input gets transcribed at https://cloud.google.com/speech-to-text/.
You are probably experiencing a buffering issue on your side: the tool you are using may buffer data before sending (flushing) it to the underlying device (stream).
Find out how to decrease that tool's output buffer to a lower value, e.g. 2KB, so data reaches your Node app and the Google service faster. Google recommends sending chunks that correspond to a 100ms buffer size.
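As a rough illustration of that recommendation, here is a minimal Python sketch (assuming the google-cloud-speech client library) that streams audio in ~100ms chunks; at 16kHz, 16-bit mono, that is 3200 bytes per chunk. The file path is a placeholder:

    from google.cloud import speech

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )
    streaming_config = speech.StreamingRecognitionConfig(
        config=config, interim_results=True
    )

    CHUNK_BYTES = 3200  # 100ms of 16kHz, 16-bit mono audio

    def requests_from(audio_file):
        # Send audio in 100ms chunks rather than letting a large buffer fill up.
        while chunk := audio_file.read(CHUNK_BYTES):
            yield speech.StreamingRecognizeRequest(audio_content=chunk)

    with open("audio.raw", "rb") as f:  # placeholder path to raw PCM audio
        for response in client.streaming_recognize(streaming_config, requests_from(f)):
            for result in response.results:
                print(result.is_final, result.alternatives[0].transcript)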

Related

Google Speech Transcription

If the Google Speech API transcribes audio in near real time, and my latency to the actual server is only 50ms, why do I receive my final streaming transcription result after 1.6s? Shouldn't I receive it in only a couple hundred milliseconds?
You can use API keys for authentication. Bear in mind that with API keys you lose the ability to track who is making the requests, and it might be easier for others to discover the key.
Also, you might want to check this other question: how to speed up google cloud speech

Efficient Google PubSub Publishing

The docs for PubSub state that the max payload after decoding is 10MB. My question is whether it is advantageous to compress the payload at the publisher before publishing, to increase data throughput.
This could be especially helpful if the payload compresses well, e.g. a JSON-formatted payload.
If you are looking for efficiency on PubSub, I would first concentrate on using the best API, and that's the gRPC one. If you are using the client libraries then the chances are high that they use gRPC anyway. Why gRPC?
gRPC is binary, so your payload doesn't need to go through hoops to be encoded
REST needs to base64-encode the payload, making it bigger and adding an extra encoding step
Second, I would try to batch messages if possible, lowering the number of calls and eliminating some latency.
And last I would look at compression, but that means you need to explicitly decompress at the subscriber, so your application code gets more complex. If all your workloads are on Google Cloud Platform I wouldn't bother with compression. If your workload is outside of GCP you might consider it, but testing would make sense.
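To illustrate that tradeoff, here is a minimal Python sketch assuming the google-cloud-pubsub client library; the project/topic names are placeholders, and the "encoding" attribute is just a naming convention assumed here (not a PubSub feature) so subscribers know to decompress:

    import zlib
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path("my-project", "my-topic")  # placeholders

    payload = b'{"id": 1, "readings": [0, 0, 0, 0]}' * 1000  # repetitive JSON compresses well
    data = zlib.compress(payload)
    print(f"{len(payload)} bytes -> {len(data)} bytes compressed")

    # Subscriber side would need the matching logic, e.g.:
    #   body = zlib.decompress(msg.data) if msg.attributes.get("encoding") == "zlib" else msg.data
    publisher.publish(topic_path, data=data, encoding="zlib")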
An alternative to compression, if your schema is stable, is serializing the payload with Protocol Buffers (protobuf).
To conclude, I would:
Make sure you're using gRPC
Batch where possible (see the sketch below)
Only compress when needed and after benchmarking (implies extra logic in your application)
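Here is a minimal sketch of publisher-side batching, assuming the Python google-cloud-pubsub client; the batch thresholds and names are illustrative, not recommendations:

    from google.cloud import pubsub_v1

    # Messages are buffered and sent together once any threshold is hit.
    batch_settings = pubsub_v1.types.BatchSettings(
        max_messages=100,       # up to 100 messages per batch
        max_bytes=1024 * 1024,  # or 1MB of data
        max_latency=0.05,       # or 50ms, whichever comes first
    )
    publisher = pubsub_v1.PublisherClient(batch_settings=batch_settings)
    topic_path = publisher.topic_path("my-project", "my-topic")  # placeholders

    futures = [publisher.publish(topic_path, f"message {i}".encode()) for i in range(1000)]
    for future in futures:
        future.result()  # block until the batch containing each message is sent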

How to disable sentence-level auto correction in Google Cloud Speech-to-Text API

I am working on a speech recognition task, which involves the detection of children's speaking capability, improvement over time...
I'd like to use the Google Cloud Speech to Text API for the ASR part of the detection. Then I would use the transcripts of different measurements to estimate the advancement.
But! The sentence-level autocorrect of the Google Speech API consistently rewrites the earlier part of the spoken sentence...
Is there a way to disable the autocorrect of this ASR?
I can't bypass this problem with the "speechContext", "single_utterance" or "maxAlternatives" options.
"single_utterance" may work with words, but it corrects the misspells..
Any advice in this field?
If you use streaming instead of batch recognition, you should receive an answer as soon as that part of the audio is transcribed; it does not wait for the rest of the sentence. You should then just store the first answer provided by the stream, not the later corrections.
This means that you don't have to wait until isFinal=True.
For a quick and dirty example of what I mean, go to the Speech API page and run the streaming test with the developer tools open. There you'll see the streaming data received as the words are being spoken.
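In code, that amounts to keeping the first interim hypothesis instead of the final, auto-corrected one. A minimal sketch, assuming a responses iterator returned by the Python client's streaming_recognize call with interim_results enabled (as in the streaming sketch earlier on this page):

    def first_hypothesis(responses):
        # Keep the first transcript seen for the utterance; ignore later rewrites.
        first = None
        for response in responses:
            for result in response.results:
                transcript = result.alternatives[0].transcript
                if first is None:
                    first = transcript  # uncorrected first guess
                if result.is_final:
                    print("final (auto-corrected):", transcript)
        return first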

High latency issue of online prediction

I've deployed a linear model for classification on Google Machine Learning Engine and want to predict new data using online prediction.
When I called the API using the Google API client library, it took around 0.5s to get the response for a request with only one instance. I expected the latency to be less than 10 microseconds (because the model is quite simple), and 0.5s was way too long. I also tried making predictions for the new data offline using the predict_proba method. It took 8.2s to score more than 100,000 instances, which is much faster than using Google ML Engine. Is there a way I can reduce the latency of online prediction? The model and the server which sent the request are hosted in the same region.
I want to make predictions in real time (the response is returned immediately after the API gets the request). Is Google ML Engine suitable for this purpose?
Some more info would be helpful:
Can you measure the network latency from the machine accessing the service to GCP? Latency will be lowest if you are calling from a Compute Engine instance in the same region where you deployed the model.
Can you post your calling code?
Is this the latency to the first request or to every request?
To answer your final question: yes, Cloud ML Engine is designed to support a high rate of queries per second.
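To separate one-off connection and auth overhead from steady-state serving latency, it can help to time a few consecutive requests. A minimal sketch, assuming the Python google-api-python-client with default application credentials; project, model, and instance shape are placeholders:

    import time
    from googleapiclient import discovery

    service = discovery.build("ml", "v1")
    name = "projects/my-project/models/my_model"  # placeholders

    body = {"instances": [[0.1, 0.2, 0.3]]}  # shape depends on your model
    for i in range(5):
        start = time.time()
        service.projects().predict(name=name, body=body).execute()
        # The first call usually includes connection/auth overhead.
        print(f"request {i}: {(time.time() - start) * 1000:.0f} ms")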

mechanical turk architecture for streaming an endless lists of tasks

How should we architect a solution that uses Amazon Mechanical Turk API to process a stream of tasks instead of a single batch of bulk tasks?
Here's more info:
Our app receives a stream of about 1,000 photos and videos per day. Each picture or video contains 6-8 numbers (it's the serial number of an electronic device) that need to be transcribed, along with a "certainty level" for the transcription (e.g. "Certain", "Uncertain", "Can't Read"). The transcription will take under 10 seconds per image and under 20 seconds per video and will require minimal skill or training.
Our app will get uploads of these images continuously throughout the day and we want to turn them into numbers within a few minutes. The ideal solution would be for us to upload new tasks every minute (under 20 per minute during peak periods) and download results every minute too.
Two questions:
To ensure a good balance of fast turnaround time, accuracy, and cost effectiveness, should we submit one task at a time, or is it better to batch tasks? If batching, what variables should we consider when setting a batch size?
Are there libraries or hosted services that wrap the MTurk API to more easily handle use-cases like ours where HIT generation is streaming and ongoing rather than one-time?
Apologies for the newbie questions, we're new to Mechanical Turk.
Streaming tasks one at a time to Turk
You can stream tasks individually through Mechanical Turk's API by using the CreateHIT operation. Every time you receive an image in your app, call CreateHIT to immediately send the task to Turk.
You can also setup notifications through the api, so you can be alerted as soon as a task is completed. Turk Notification API Docs
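For illustration, a minimal sketch of per-image HIT creation using boto3's MTurk client (the modern SDK equivalent of CreateHIT); the question XML template, reward, and timing values are placeholder assumptions:

    import boto3

    mturk = boto3.client("mturk", region_name="us-east-1")

    # ExternalQuestion/HTMLQuestion XML that displays the image; hypothetical template.
    with open("transcribe_serial.xml") as f:
        question_template = f.read()

    def publish_task(image_url):
        # Called once per uploaded image, streaming tasks one at a time.
        return mturk.create_hit(
            Title="Transcribe the serial number in this image",
            Description="Type the 6-8 digit serial number and your certainty level",
            Reward="0.05",
            MaxAssignments=1,
            LifetimeInSeconds=3600,
            AssignmentDurationInSeconds=120,
            Question=question_template.replace("${IMAGE_URL}", image_url),
        )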
Batching vs Streaming
As for batching vs. streaming, you're better off streaming to achieve a good balance of turnaround time and cost. Batching won't drive down costs much, and improving accuracy largely depends on vetting, reviewing, and tracking worker performance, either manually or through automated processes.
Libraries and Services
Most libraries offer all of the operations available in the API, so you can just Google or search GitHub for a library in your programming language. (We use the Ruby library rturk.)
A good list of companies that offer hosted solutions can be found under the Metaplatforms section of an answer on Quora to the question: What are some crowdsourcing services similar to Amazon Mechanical Turk? (Disclaimer: my company, Houdini, is one of the solutions listed there.)