I am trying to do domain adaptation with text data to improve the speech-to-text results of Google Cloud Speech-to-Text.
I have already done this with the Azure and AWS speech-to-text systems: there you simply feed the service a huge text corpus with domain-specific language, and you usually get better results afterwards.
For the Google Speech-to-Text system I have not found anything like that. What I did find is this tutorial: https://cloud.google.com/speech-to-text/docs/speech-adaptation
Sadly, this only allows very specific adaptations (manually adding words that should be recognized better).
I have tried running keyword extraction on my text corpus and putting the extracted words into the speech_contexts=[{"phrases": []}] parameter, but this didn't change my results.
Is there any way to train the Google Speech-to-Text service (its language model) with a large text corpus for domain adaptation?
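For reference, the speech_contexts attempt described above can be sketched as a plain v1 "recognize" request body. This is only a sketch of the request shape; the keyword list, boost value, and bucket URI are illustrative assumptions, not values from the question:

```python
# Sketch: feed extracted domain keywords into the speechContexts field of a
# Speech-to-Text v1 "recognize" request body. The keywords, audio URI, and
# boost value below are hypothetical placeholders.

def build_recognize_body(keywords, audio_uri, boost=10.0, max_phrase_len=100):
    # The API caps individual phrase length, so skip anything too long.
    phrases = [k for k in keywords if 0 < len(k) <= max_phrase_len]
    return {
        "config": {
            "languageCode": "en-US",
            "speechContexts": [{"phrases": phrases, "boost": boost}],
        },
        "audio": {"uri": audio_uri},
    }

body = build_recognize_body(
    ["angioplasty", "stent", "catheter"],  # hypothetical domain keywords
    "gs://my-bucket/recording.flac",       # hypothetical audio location
)
```

Note that speech adaptation biases recognition toward the supplied phrases at request time; it does not retrain the underlying language model the way a corpus upload does on Azure or AWS.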
Related
I need to process a large quantity of multipage PDFs (around 23,000 documents averaging 30 pages each) into text. Since the documents are typewritten and scanned, I want to use OCR to avoid character-recognition mistakes. The problem is that the estimated running time in R (using the Tesseract package) is crazy. Is there an online service provider that can be used for this task?
N.B. I had a look at both Amazon Web Services and Google Cloud, but it is extremely difficult for me to understand how to use them, especially how to automate the whole process.
I have a text in Japanese that I'm turning into an MP3 with the Google Cloud Text-to-Speech functionality.
I also want word timestamps for the MP3 that Google returns.
Google Speech-to-Text offers this functionality, but when I submit the files I get from TTS to STT, the result is not always good.
What is the best way to also get word timestamps for the TTS MP3?
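For the STT side of this round trip, word timestamps are requested via the enableWordTimeOffsets flag, and each alternative's words then carry start and end times. A minimal sketch of the request config and the response parsing follows; the sample response is hand-made purely to illustrate the shape (real times come back as strings like "1.300s"):

```python
# Sketch: request word timestamps from Speech-to-Text and pull them out of
# the parsed JSON response. The sample response is fabricated to show the
# shape only.

def build_config(language_code="ja-JP"):
    return {
        "languageCode": language_code,
        "enableWordTimeOffsets": True,  # ask for per-word start/end times
    }

def extract_word_times(response):
    words = []
    for result in response.get("results", []):
        top = result["alternatives"][0]  # best hypothesis
        for w in top.get("words", []):
            words.append((w["word"], w["startTime"], w["endTime"]))
    return words

sample = {"results": [{"alternatives": [{
    "transcript": "konnichiwa",
    "words": [{"word": "konnichiwa", "startTime": "0s", "endTime": "0.900s"}],
}]}]}
times = extract_word_times(sample)
```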
Google Cloud Speech-to-Text is an ML-based service, so it's expected that the results are not always as "good" as you might hope; it has its limitations.
What I would suggest is taking a look at the relevant documentation on this topic, such as the best practices, the guide, and the basics pages. Additionally, you could look through their issue tracker (for example, this issue) for more information, and if you find a reproducible problem with the service you can report it there so their team is aware of it.
I'm trying to create a small app that will let me transcribe audio to text via the Google Speech-to-Text service. I'd like to bypass the need for heavy processing and leverage as many cloud tools as possible to stream the audio to the speech-to-text service. I've been able to get the streaming process to work; however, I have to relay the data through my server first, and this creates an expense I'd like to cut out. A few questions that would help solve my problem cost-effectively:
Can I create a signed URL for a Google Speech-to-Text streaming session?
Can I leverage Cloud Functions to trigger processing by the speech-to-text service and then retrieve real-time updates?
Can I get a signed URL that links to a copy of the audio streamed to the Google Speech-to-Text service?
I have an audio file and I have an exact transcript of that audio file. I would like to be able to get the timestamps of each word in that specific transcript.
I don't want timestamps for the non-accurate recognized speech. I can already do that, and it is useful, but it's not quite good enough due to the mistakes in the speech recognition.
Does anyone know if this is possible with Google speech recognition?
It is not possible with Google speech recognition. What you describe is forced alignment, so you have to use other services; open-source tools for this exist as well.
Can we upload training data (as a .txt file) using Python code in Dialogflow or Google Cloud Platform, using the detect intent and agent APIs? If so, please share your insights.
You can look at using a PUT request to add additional training data to your intents; however, there is no direct option to upload a text file. Generally Dialogflow does a very good job of interpreting the user's intent from just a handful of training samples, making it feasible to type each one in manually or copy and paste. Since it uses machine learning to match similar phrases, it shouldn't be necessary to upload a large text file.
Yes, for training phrases you can upload one .txt file (one phrase per line) or multiple .txt files in a zipped archive (there's a limit of 10 files).
There's more on this here in the docs.