Facebook Messenger bot - speech to text - facebook-graph-api

Hi :) I built a Facebook Messenger bot that receives messages through a webhook on my server.
Recently I added a cool feature: voice recognition.
However, Facebook delivers the audio attachment as a URL to an MP4 file, and none of the familiar speech-to-text APIs (Google Speech, Watson, etc.) support MP4, so I have to convert it to FLAC before I can get the transcript.
This takes me about 6-8 seconds for 5 seconds of audio.
Is there any speech-to-text API that supports MP4? Or, alternatively, is there any way to get FLAC audio from Facebook?
Thanks!
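The MP4-to-FLAC conversion described above is commonly done with ffmpeg. The sketch below assumes ffmpeg is installed and on PATH; the helper names, the 16 kHz mono target, and writing FLAC to stdout instead of a temporary file are illustrative choices, not something from the question. ffmpeg can read an http(s) input directly, so passing the attachment URL may save a separate download step, though I have not measured how much of the 6-8 seconds that recovers.

```python
import subprocess


def build_ffmpeg_flac_command(source, dest="pipe:1", sample_rate=16000):
    """Build an ffmpeg argument list that converts an MP4 source to mono FLAC.

    `source` may be a local path or an http(s) URL (e.g. the attachment URL
    from the webhook). The default dest "pipe:1" writes FLAC to stdout.
    """
    return [
        "ffmpeg",
        "-loglevel", "error",     # keep stderr quiet unless something fails
        "-i", source,             # input: MP4 file or URL
        "-vn",                    # ignore any video stream
        "-ac", "1",               # downmix to mono, the safe choice for speech APIs
        "-ar", str(sample_rate),  # resample (16 kHz here is an assumption)
        "-c:a", "flac",           # encode the audio as FLAC
        "-f", "flac",             # output container, required when writing to a pipe
        dest,
    ]


def mp4_to_flac_bytes(source, sample_rate=16000):
    """Run ffmpeg and return the FLAC bytes without touching the disk."""
    cmd = build_ffmpeg_flac_command(source, "pipe:1", sample_rate)
    result = subprocess.run(cmd, check=True, capture_output=True)
    return result.stdout
```

The returned bytes can then be base64-encoded and sent to the speech API of your choice.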

Amazon Lex supports up to 15 seconds of speech input and converts it to text. https://aws.amazon.com/lex/faqs/

Related

Is Google Speech-to-Text available offline?

I would like to leverage Google's Speech to text service for a desktop app, but I would like it to be offline. Is this possible?
They have on-prem solutions, but can it run offline so that no data is sent?
https://cloud.google.com/speech-to-text#all-features
Google's Speech-to-Text API only works through the cloud; it cannot work offline, because the Speech-to-Text and Text-to-Speech APIs are accessed through REST or RPC calls.
Speech-to-Text On-Prem lets you deploy the Speech-to-Text API in a container on a GKE cluster, but that doesn't mean you can run it on your local desktop.
Google speech-to-text is available offline only for English on some devices. If you want it to work offline for other languages, you have to install that specific language on your device; otherwise it won't work.
Basically, Google speech recognition requires internet access to make REST and RPC calls. With a working internet connection, it works in every supported language. In offline mode, it only works for the languages installed on the device, most likely English.

Google Cloud Speech-to-Text in C++

I want to use the Google Speech-to-Text API in my C++ Qt/MinGW application, but I had problems building the C++ client library for it. So I implemented a simple REST client for the Speech-to-Text API, and it recognizes small audio files. Now I want to use an audio stream, and there is no REST API for streaming. So my question is: how can I implement a client for streaming speech recognition without pulling in all the Google libraries and their dependencies? I haven't found any examples for the gRPC API, or even documentation about how streaming speech recognition works over gRPC.
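As a sketch of how the streaming protocol is shaped: `StreamingRecognize` on the `google.cloud.speech.v1.Speech` service is a bidirectional gRPC stream, and the first request message should carry only the streaming configuration while every later message carries only a slice of raw audio. The Python below uses plain dicts whose keys mirror the `StreamingRecognizeRequest` proto fields; the dicts, helper names, LINEAR16 encoding, and 100 ms chunk size are assumptions for illustration. In C++ the same ordering applies if you generate stubs from googleapis' `google/cloud/speech/v1/cloud_speech.proto` with `protoc` and the gRPC C++ plugin, which avoids the full client library but still needs gRPC and protobuf.

```python
from typing import Dict, Iterator


def chunk_bytes_for_ms(sample_rate_hz: int, ms: int, bytes_per_sample: int = 2) -> int:
    """Bytes in `ms` milliseconds of mono PCM (LINEAR16 uses 2 bytes per sample)."""
    return sample_rate_hz * ms // 1000 * bytes_per_sample


def streaming_requests(audio: bytes, sample_rate_hz: int = 16000,
                       language_code: str = "en-US",
                       chunk_ms: int = 100) -> Iterator[Dict]:
    """Yield request-shaped dicts in the order StreamingRecognize expects.

    Message 1: streaming_config only. Messages 2..n: audio_content only.
    """
    yield {
        "streaming_config": {
            "config": {
                "encoding": "LINEAR16",
                "sample_rate_hertz": sample_rate_hz,
                "language_code": language_code,
            },
            # Ask for partial hypotheses while audio is still arriving.
            "interim_results": True,
        }
    }
    step = chunk_bytes_for_ms(sample_rate_hz, chunk_ms)
    for start in range(0, len(audio), step):
        yield {"audio_content": audio[start:start + step]}
```

With a real stub, you would pass an iterator like this (built from protobuf messages instead of dicts) to the streaming call and read `StreamingRecognizeResponse` messages from the returned stream as they arrive.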

How to link/sync a complete contact list with the Google Speech API (REST)

I am currently working on speech-to-text conversion using the Google Speech REST API. The program works and gives me the text of the given speech. My use case is to convert a person's spoken name into text, e.g. "Rohan Chawhan".
What I have observed:
When I compared the results of Google Assistant (on a phone, Android/iOS) and the Google Speech REST API (on a Linux PC), here is what I found:
- When the phone and Gmail contacts are NOT synced:
Both Google Assistant and the Speech API show the same, incorrect text ("Rohan Chauhan"). This is probably because "Rohan Chauhan" is more common than "Rohan Chawhan" in India.
- When the phone or Gmail contacts are synced:
Google Assistant detects the name correctly ("Rohan Chawhan") if it is present in the contacts, whereas the Speech REST API still shows the same incorrect text as above ("Rohan Chauhan").
What I am looking for:
Is there a way I can sync/upload/link a contact list/database/table of names to the Google Speech API?
Yes, you can use phrase hints;
see the Google documentation:
https://cloud.google.com/speech-to-text/docs/basics#phrase-hints
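A minimal sketch of what phrase hints look like in a v1 `speech:recognize` JSON body, assuming the contact names are already available as a Python list: the names go into `config.speechContexts[].phrases`. The helper name, the `en-IN` language code, and the LINEAR16/16 kHz settings are assumptions for illustration; the API also limits how many phrases and characters a request may carry, so a large contact list may need trimming.

```python
import base64
import json


def recognize_body_with_hints(audio: bytes, contact_names,
                              language_code: str = "en-IN",
                              sample_rate_hz: int = 16000) -> dict:
    """Build a speech:recognize body that biases recognition toward contact names."""
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language_code,
            # Phrase hints: spellings the recognizer should prefer.
            "speechContexts": [{"phrases": list(contact_names)}],
        },
        # Inline audio is base64-encoded in the JSON body.
        "audio": {"content": base64.b64encode(audio).decode("ascii")},
    }


body = recognize_body_with_hints(b"\x00\x01", ["Rohan Chawhan"])
payload = json.dumps(body)  # POST this to the v1 speech:recognize endpoint
```

Unlike the Assistant's contact sync, this is per request: the names have to be sent with every call.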

Test Google Speech API with audio file

I want to see if Google Speech API will be accurate enough for my purposes. I have an audio file I want to test it with, but the demo on the main page only lets you record from a microphone. Is there a way to test Google's speech processing with an audio file without having to learn the API first?
No, you will have to use the API if you wish to upload a file.
The steps for making the API request are described here, and they are fairly straightforward. The same page also details how to set up your account, enable billing, and get the access token for the request.
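For a quick test with a local file, the request boils down to one POST with the audio base64-encoded in the JSON body. The sketch below uses only the Python standard library and assumes you already have an OAuth access token (for example from `gcloud auth print-access-token`); the helper names and the FLAC/`en-US` defaults are assumptions. The synchronous endpoint is meant for short clips of about a minute or less; longer audio goes through the long-running recognition method instead.

```python
import base64
import json
import urllib.request

# v1 synchronous recognition endpoint.
RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(audio: bytes, access_token: str,
                            encoding: str = "FLAC",
                            language_code: str = "en-US") -> urllib.request.Request:
    """Build (but do not send) a speech:recognize POST for inline audio bytes."""
    body = {
        # For FLAC and WAV the sample rate can be read from the file header.
        "config": {"encoding": encoding, "languageCode": language_code},
        "audio": {"content": base64.b64encode(audio).decode("ascii")},
    }
    return urllib.request.Request(
        RECOGNIZE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + access_token,
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


def transcribe_file(path: str, access_token: str) -> list:
    """Send a local audio file and return the top transcript of each result."""
    with open(path, "rb") as f:
        req = build_recognize_request(f.read(), access_token)
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    return [r["alternatives"][0]["transcript"]
            for r in result.get("results", []) if r.get("alternatives")]
```

Usage would be something like `transcribe_file("sample.flac", token)`, which returns a list of transcript strings.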

Bypassing speech recognition in Amazon AVS

As I understand AVS, you send an audio clip to the API, which runs speech recognition on it, interprets the resulting text, and gives you a result based on what you asked.
What I want to do is make a kind of CLI version of Alexa, where you type what you would normally say out loud to an Amazon Echo.
So I'm wondering whether there is some way to bypass the speech recognition step using some Amazon API, so I can just send the text.
I thought about implementing the AI myself, but it would be nice to use all the available skills for Alexa.
No chance.
For your own skills you can do that by calling them directly; in the end it's a simple HTTPS call with a JSON payload. But it's not possible for other skills unless the owner publishes them as an HTTP endpoint.
You also have to handle the user sessions, etc.
For a "CLI Echo", have a look at the various bot frameworks. Most companies with an Alexa app also have a documented REST backend that you can use directly; see Twitter, Facebook, etc.
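For the route of calling your own skill directly, here is a sketch of the JSON envelope and the session bookkeeping that goes with it. Field names follow the Alexa Skills Kit request/response format, but `SkillSession`, the intent and slot names, and the UUID-based IDs are hypothetical. Note that this bypasses Alexa's language understanding as well as speech recognition: your CLI has to map typed text to an intent name and slot values itself. A real request also carries a `context` object, and HTTPS skill endpoints are expected to verify Alexa's request signature, so this most plausibly works when you invoke your own Lambda function or run the handler locally.

```python
import uuid
from datetime import datetime, timezone
from typing import Dict, Optional


class SkillSession:
    """Tracks one simulated session for direct calls to your own skill."""

    def __init__(self, application_id: str, user_id: str, locale: str = "en-US"):
        self.application_id = application_id
        self.user_id = user_id
        self.locale = locale
        self._reset()

    def _reset(self) -> None:
        # Hypothetical ID format; real Alexa IDs look different.
        self.session_id = "session-" + str(uuid.uuid4())
        self.attributes: Dict = {}
        self.is_new = True

    def intent_request(self, intent_name: str,
                       slots: Optional[Dict[str, str]] = None) -> Dict:
        """Build an IntentRequest envelope; the caller chooses intent and slots."""
        envelope = {
            "version": "1.0",
            "session": {
                "new": self.is_new,
                "sessionId": self.session_id,
                "application": {"applicationId": self.application_id},
                "attributes": dict(self.attributes),  # echo back stored state
                "user": {"userId": self.user_id},
            },
            "request": {
                "type": "IntentRequest",
                "requestId": "request-" + str(uuid.uuid4()),
                "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
                "locale": self.locale,
                "intent": {
                    "name": intent_name,
                    "slots": {k: {"name": k, "value": v}
                              for k, v in (slots or {}).items()},
                },
            },
        }
        self.is_new = False
        return envelope

    def handle_response(self, response: Dict) -> str:
        """Keep the skill's sessionAttributes and return its plain-text reply."""
        self.attributes = response.get("sessionAttributes") or {}
        body = response.get("response", {})
        text = body.get("outputSpeech", {}).get("text", "")
        if body.get("shouldEndSession"):
            self._reset()  # the next request starts a fresh session
        return text
```

A CLI loop would read a line, pick an intent (by keyword matching or a menu), send `intent_request(...)` to the skill, and print whatever `handle_response` returns.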