Google Speech Transcription - google-cloud-platform

Google Speech Transcription - google-cloud-platform

If the Google Speech API transcribes audio at near real time, and my latency to the actual server is only 50ms, why do I receive my final streaming transcription result after 1.6s? Shouldn’t I receive it in only a couple hundred milliseconds?

You can use API Keys for authentication. Bear in mind with using the API Keys you lose the ability to track from whom the requests are being made and it might be easier for others to discover the key.
Also, you might want to check this other question: how to speed up google cloud speech

Related

Creating serverless audio processing with Google Cloud Storage and Google Speech to Text

I'm trying to create a small app that will allow me to translate audio to text via the Google Speech to Text services. I'd like to bypass the need for heavy processing and leverage as many cloud tools as possible to have audio streamed to the text to speech service. I've been able to get the streaming process to work, however, I have to relay the data to my server first and this creates an expense I'd like to cut out. There are a few questions that would help solve my problem in a cost effective way!
Can I created a signed URL for a Google Text To Speech streaming session?
Can I leverage the cloud and Cloud Functions to trigger processing by the text to speech service and then retrieve real time updates?
Can I get a signed URL that links to a copy of the audio streamed to the Google text to speech service?

Google Speech streaming recognition slow response time

What is the fastest expected response time of the Google Speech API with streaming audio data? I am sending an audio stream to the API and am receiving the interim results with a 2000ms delay, of which I was hoping I could drop to below 1000ms. I have tested different sampling rates and different voice models.

I'm afraid that response time can't be measured or guaranteed because of the nature of the service. We don't know what is done under the hood, in fact there is no SLA for response time even though there is SLA for availability.
Something that can help you is working on building a good request:
Reducing 100-miliseconds frame size, for example, could ensure a good tradeoff between latency and efficiency.
Following Best Practices will help you to make a clean request so that the latency can be reduced.
You may want to check following links on specific uses cases to know how they addressed latency issues:
Realtime audio streaming to Google Speech engine
How to speed up google cloud speech
25s Latency in Google Speech to Text

If you really care about response time you'd better use Kaldi-based service on your own infrastructure. Something like https://github.com/alumae/kaldi-gstreamer-server together with https://github.com/Kaljurand/dictate.js

Google Cloud Speech itself works pretty fast, you can check how quick your microphone gets transcribed https://cloud.google.com/speech-to-text/.
You may probably experience buffering issue on your side, the tool you are using may buffer data before sending(buffer flush) to underlying device(stream).
You can find out how to decrease output buffer of that tool to lower values e.g. 2Kb, so data will reach Node app and Google service faster. Google recommends to send data that equals to 100ms buffer size.

How to disable sentence-level auto correction in Google Cloud Speech-to-Text API

I am working on a speech recognition task, which involves the detection of children's speaking capability, improvement over time...
I'd like to use the Google Cloud Speech to Text API for the ASR part of the detection. Then I would use the transcripts of different measurements to estimate the advancement.
But! The sentence level autocorrect of Google Speech API consistently rewrites the previous limb of the spoken sentence...
Is there a way to disable the autocorrect of this ASR?
I can't bypass this problem with the "speechContext", "single_utterance" or "maxAlternatives" options.
"single_utterance" may work with words, but it corrects the misspells..
Any advice in this field?

If you use streaming instead of batch recognize, you should receive an answer as soon as that part of the audio is transcribed, it does not wait for the rest of the sentence. You should then just store the first answer provided by the stream, not the further corrections.
This means that you don't have to wait until isFinal=True.
For a quick and dirty example of what I mean, go tho the speech API page, and run the streaming test with the developer tools open. There you'll see the streaming data received as the words are being spoken:

Google Tag Manager clickstream to Amazon

So the questions has more to do with what services should i be using to have the efficient performance.
Context and goal:
So what i trying to do exactly is use tag manager custom HTML so after each Universal Analytics tag (event or pageview) send to my own EC2 server a HTTP request with a similar payload to what is send to Google Analytics.
What i think, planned and researched so far:
At this moment i have two big options,
Use Kinesis AWS which seems like a great idea but the problem is that it only drops the information in one redshift table and i would like to have at least 4 o 5 so i can differentiate pageviews from events etc ... My solution to this would be to divide from the server side each request to a separated stream.
The other option is to use Spark + Kafka. (Here is a detail explanation)
I know at some point this means im making a parallel Google Analytics with everything that implies. I still need to decide what information (im refering to which parameters as for example the source and medium) i should send, how to format it correctly, and how to process it correctly.
Questions and debate points:
Which options is more efficient and easiest to set up?
Send this information directly from the server of the page/app or send it from the user side making it do requests as i explained before.
Does anyone did something like this in the past? Any personal recommendations?

You'd definitely benefit from Google Analytics custom task feature instead of custom HTML. More on this from Simo Ahava. Also, Google Big Query is quite a popular destination for streaming hit data since it allows many 'on the fly computations such as sessionalization and there are many ready-to-use cases for BQ.

Getting data from local running java app to google cloud app and back

I wanted to dive into the world of distributed systems, cloud computing, IoT, etc., and I gotta be honest, I imagined everything being a little more intuitive than it finally turned out.
I had a tiny testing architecture in mind, that I'd like to set up with Google Clouds and their services, but I am kinda stuck since I can't get my head around some concepts.
What I basically wanted to do (as a first step) is writing a simple java application that would run locally on my computer. This application should just generate random numbers and send those numbers somehow to the google cloud. On the cloud I wanted to define another java application that would manipulate those random numbers in some kind of way (it doesn't matter actually). Afterwards, the output should somehow get back to me of course. And actually, at the moment, I don't even care about how exactly. It could be somehow back to my local app (with some kind of listener, would that be possible?). But it could also simply store the results somewhere on the google cloud? Or maybe upload them to my google drive?
I guess you already noticed that - at some points - I don't even know what i want exactly, since I'm not sure of what is possible, and what not.
Could you provide me some help to get this set up?
The most important questions for me right now are:
Do I need to use a pubsub system, where my generated numbers are sent
to, and which then forwards this to the cloud app, that transforms my
data?
How do I get my data from the local app to the cloud services?
Would my data transforming app run on Google Dataflow?
Above I wrote "as a first step"... because later I would also like to send config files (for example in json format, or xml) to the cloud, and the
cloud application should transform those config files... if I get the
first scenario running the I guess this woul also be no problem
right?
Those are just a few of the questions that are on my mind currently. The most important ones I guess.
It would be a big help. Sorry, if the questions are not very precise, but I really need some kind of pointing into the right direction.
Thank you in advance!

I think it would be good to read up on some of the technologies you mention here:
Google Cloud Pubsub: Pub/Sub enables you to publish messages to a topic, and consume them in another place in the (Google) Cloud. You can see some different examples of publishers and consumers in the link. In your case you could for example write a Java application that writes random numbers to the Pub/Sub queue, where they will sit for 7 days to be consumed by another component (for example, Google Cloud Dataflow). To get started developing, you can find the SDKs here (there is a Java SDK).
Google Cloud Dataflow is managed service running Apache Beam pipelines to process your data at scale. You can learn about the different concepts here and get started designing your pipeline here. I suggest taking a look at some examples first though, which will make it more easy to grasp what is actually going on. Dataflow has a PubSub connector, so in your application you will be able to read from the topic you created before. In Dataflow you can for example multiply all your random numbers and write them to a certain sink (for example Google Cloud Storage, or even BigQuery or PubSub again).
Google Cloud Storage: is a cloud storage where you can put files, for example the output of your Dataflow pipeline. You will be able to manually download the files using the Cloud Console UI, or you can use one of the SDKs to download the output programmatically.
Hope this gives you an overview and some pointers to start. Whenever you are ready and have a more concrete use case in mind, you can start looking at some more components.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js