Amazon Transcribe/Lex with feedback during the call - amazon-web-services

I am new to AWS services, and we want to build a simple demo that detects a special word and then [1] triggers an action and [2] responds (as speech during the call).
For example, if the user says "Help", I want to reply "OK" and run an operation (AWS Lambda).
We're using Twilio, and Twilio should stream the audio.
As I understand it, I have two options, Amazon Lex and Amazon Transcribe: Lex is for bots, while Transcribe only transcribes the speech and can't take part in the conversation.
So the questions are:
Which services should I use to trigger an action when the special word is recognized AND to take part in the conversation?
Can I stream the call directly to an AWS service via Twilio?
Edit
To be more clear: the communication will be between two people in real time, and I want to interject during their call. When someone says "Help", I want to add a bot voice to the conversation that says "OK", for example:
[Person 1]: Hi, how are you
[Person 2]: HELP ...
[BOT]: OK (like a third person in a conference call..).

I am not fully clear on the interaction taking place with the user, before they interject with help. Are they listening to a bot, media file, TTS, or communicating with another person in real time?
For realtime analysis, you would need to use Twilio Media Streams, which streams the voice conversation to a service that could then convert the speech to text in near real time, looking for keywords, and then programmatically perform some action based on those keywords.
An example of using Twilio Media Streams with Lex:
Use Amazon Lex as a conversational interface with Twilio Media Streams
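The keyword-spotting step described above can be sketched independently of any speech service. This is a minimal illustration, assuming the speech-to-text stage already delivers transcript fragments as plain strings; the function name watch_for_keyword and the callback are hypothetical, not part of Twilio or AWS.

```python
import re
from typing import Callable, Iterable, List


def watch_for_keyword(
    transcripts: Iterable[str],
    keyword: str,
    on_match: Callable[[str], None],
) -> int:
    """Call on_match for every fragment that contains keyword as a whole word.

    Returns the number of fragments that matched.
    """
    # Whole-word, case-insensitive match so "HELP" matches but "helpful" does not.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = 0
    for fragment in transcripts:
        if pattern.search(fragment):
            on_match(fragment)  # e.g. invoke a Lambda or queue a spoken "OK"
            hits += 1
    return hits


if __name__ == "__main__":
    replies: List[str] = []
    fragments = ["Hi, how are you", "HELP ...", "that was helpful"]
    count = watch_for_keyword(fragments, "help", lambda text: replies.append("OK"))
    print(count, replies)
```

In a real call the fragments would arrive incrementally from the transcription service, and the callback would be where the action and the spoken reply are triggered.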

Related

Aws lex fulfillment with aws lambda

I have a problem playing an audio message from an AWS Lex code hook. Is there any option to return an audio file instead of a text response in content? If you have any ideas, please share them.
Amazon Lex does not talk. If you want speaking functionality, look at using Amazon Polly, which is a service that turns text into lifelike speech.
Amazon Lex uses Polly to deliver audio responses.
You'll find the output voice setting under the general settings tab of your Lex bot in the Amazon Lex Console.
Programmatically, you need to invoke the PostContent method instead of PostText. The PostContent method accepts an audio stream and in turn returns an audio stream.
This page from the Developer Guide describes the main points to consider when sending and receiving voice streams to and from the Lex runtime API.
Amazon Lex Developer Guide | PostContent
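As a rough sketch of the PostContent call described above, the helper below sends audio bytes and reads back the synthesized reply. It assumes a boto3 "lex-runtime" client is passed in; the helper name send_audio_to_lex and the default content type (one of the PCM formats listed in the PostContent documentation) are choices made for this illustration.

```python
from typing import Any, Dict


def send_audio_to_lex(
    client: Any,
    bot_name: str,
    bot_alias: str,
    user_id: str,
    audio: bytes,
    content_type: str = "audio/l16; rate=16000; channels=1",
) -> Dict[str, Any]:
    """Send caller audio to Lex via PostContent and return transcript plus reply audio."""
    response = client.post_content(
        botName=bot_name,
        botAlias=bot_alias,
        userId=user_id,            # any stable identifier for the caller
        contentType=content_type,  # format of the audio being sent
        accept="audio/mpeg",       # ask Lex to answer with synthesized speech
        inputStream=audio,
    )
    return {
        "transcript": response.get("inputTranscript"),
        "audio": response["audioStream"].read(),  # streaming body of the reply
    }


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   client = boto3.client("lex-runtime")
#   result = send_audio_to_lex(client, "MyBot", "prod", "caller-1", pcm_bytes)
```

Passing the client in rather than creating it inside the helper keeps the function easy to exercise with a stub.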

Get User Input From Lambda in AWS Connect

I was wondering if anybody has ever experimented with this issue I'm having and could give me any input on the subject.
As it stands right now, I'm trying to see if there is a way to grab a user's input through Amazon Connect. I understand that there is already a "Get User Input" block in the GUI that I could use; unfortunately, it does not offer the fine-grained control I am looking for over requests to and responses from Lex.
Right now I am able to Post Content to Lex and get responses just fine, as well as output speech using Amazon Polly via my Lambda. This works great for things that do not require a user to have to give feedback for a question.
For example if a client asks
"What time is my appointment?"
and we give back
"Your appointment is for X at X time, would you like an email with
this confirmation?"
I want to be able to capture what the user says back within that same lambda.
So the interaction would go like so:
User asks a question.
Lambda POSTs it to Lex and gets a response
Amazon Polly says the response - i.e: 'Would you like an email to confirm?'
Lambda then picks up whether the user says yes or no - POSTs info to Lex
Gets response and outputs voice through Polly.
If anybody has any information on this please let me know, thank you!
Why make implementing an IVR system with Amazon Connect so complicated? I have built a complete automated IVR system for one of my biggest US banking clients. Use the procedure below to achieve what you want.
Build a complete interactive Lex bot (so that you can avoid calling Amazon Polly and the Lex PostContent API yourself). It is advisable for each bot to have only one intent.
In Connect, use the "Get User Input" node to map the Lex bot you created earlier, with the question to be asked: "What time is my appointment?". Once this question has been played, control passes completely to Lex; once the intent has been fulfilled on the Lex side, control returns to Connect.
Refer to AWS contact center for a clearer idea.

AWS Lex storage of audio

I’ve created a Lex bot that is integrated with an Amazon Connect work flow. The bot is invoked when the user calls the phone number specified in the Connect instance, and the bot itself invokes a Lambda function for initialisation & validation and fulfilment. The bot asks several questions that require the caller to provide simple responses. It all works OK, so far so good. I would like to add a final question that asks the caller for their comments. This could be any spoken text, including non-English words. I would like to be able to capture this Comment slot value as an audio stream or file, perhaps for storage in S3, with the goal of emailing a call centre administrator and providing the audio file as an MP3 or WAV attachment. Is there any way of doing this in Lex?
I’ve seen mention of ‘User utterance storage’ here: https://aws.amazon.com/blogs/contact-center/amazon-connect-with-amazon-lex-press-or-say-input/, but there’s no such setting visible in my Lex console.
I’m aware that Connect can be configured to store a recording in S3, but I need to be able to access the recording for the current phone call from within the Lambda function in order to attach it to an email. Any advice on how to achieve this, or suggestions for a workaround, would be much appreciated.
Thanks
Amazon Connect call recording can only record conversations once an agent accepts the call. Currently Connect cannot record voice in the Contact Flows. So in regards to getting the raw audio from Connect, that is not possible.
However, it looks like you can get it from Lex if you develop an external application (it could be a Lambda function) that gets utterances: https://docs.aws.amazon.com/lex/latest/dg/API_GetUtterancesView.html
I also do not see the option to enable or disable user utterance storage in Lex, but this makes me think that by default, all are recorded: https://docs.aws.amazon.com/lex/latest/dg/API_DeleteUtterances.html
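A minimal sketch of calling GetUtterancesView from Python follows, assuming a boto3 "lex-models" client is passed in; the helper name list_utterances is hypothetical. Note that this operation returns utterance text and counts rather than audio, so it may not cover the MP3/WAV attachment requirement on its own.

```python
from typing import Any, Dict, List


def list_utterances(
    client: Any,
    bot_name: str,
    bot_versions: List[str],
    status: str = "Detected",
) -> List[Dict[str, Any]]:
    """Flatten the GetUtterancesView response into one row per utterance."""
    response = client.get_utterances_view(
        botName=bot_name,
        botVersions=bot_versions,
        statusType=status,  # "Detected" or "Missed"
    )
    rows: List[Dict[str, Any]] = []
    # The response groups utterances by bot version.
    for version in response.get("utterances", []):
        for item in version.get("utterances", []):
            rows.append(
                {
                    "version": version.get("botVersion"),
                    "text": item.get("utteranceString"),
                    "count": item.get("count"),
                }
            )
    return rows


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   rows = list_utterances(boto3.client("lex-models"), "MyBot", ["$LATEST"])
```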

how to speed up google cloud speech

I am using a microphone which records sound through a browser, converts it into a file and sends the file to a Java server. Then, my Java server sends the file to the Cloud Speech API and gives me the transcription. The problem is that the transcription takes very long (around 3.7 sec for 2 sec of dialog).
So I would like to speed up the transcription. The first thing to do is to stream the data (so that the transcription starts at the beginning of the recording). The problem is that I don't really understand the API. For instance, if I want to transcribe my audio stream from the source (browser/microphone), I need to use some kind of JS API, but I can't find anything I can use in a browser (we can't use Node like this, can we?).
Otherwise, I need to stream my data from my JS to my Java server (not sure how to do it without breaking the data...) and then push it through streamingRecognizeFile from there: https://github.com/GoogleCloudPlatform/java-docs-samples/blob/master/speech/cloud-client/src/main/java/com/example/speech/Recognize.java
But it takes a file as the input, so how am I supposed to use it? I cannot really tell the system whether or not I have finished recording... How will it understand that it is the end of the transcription?
I would like to create something in my web browser just like the google demo there :
https://cloud.google.com/speech/
I think there is some fundamental stuff I do not understand about how to use the streaming API. If someone can explain a bit how I should proceed with this, it would be awesome.
Thank you.
Google states that "Speech-to-Text typically processes audio faster than real-time, processing 30 seconds of audio in 15 seconds on average" [1]. You can use the Google APIs Explorer to test exactly how long each of your requests takes [2].
To speed up the transcription, you may try adding recognition metadata to your request [3]. You can provide phrase hints if you know the context of the speech [4], or use enhanced models, which are a special set of machine learning models [5]. All these suggestions would improve accuracy and might also affect transcription speed.
When using streaming recognition, you can set the singleUtterance option to true in the config. This detects when the user pauses speaking and ends the recognition. Otherwise, the streaming request continues until the content limit, which is 1 minute of audio for a streaming request [6].
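On the question of how to stream without a complete file: a streaming request is fed small audio chunks one after another, and the end of the audio is signalled simply by the chunk source running out. The sketch below shows only that chunking step in plain Python; the name rechunk and the 3200-byte default (100 ms of 16 kHz, 16-bit mono audio) are assumptions for illustration, and each yielded chunk would still need to be wrapped in the client library's streaming request type.

```python
from typing import Iterable, Iterator


def rechunk(blocks: Iterable[bytes], chunk_size: int = 3200) -> Iterator[bytes]:
    """Regroup arbitrarily sized audio blocks into fixed-size chunks.

    The final chunk may be shorter; the generator finishing marks end of audio.
    """
    buffer = bytearray()
    for block in blocks:
        buffer.extend(block)
        # Emit as many full chunks as the buffer currently holds.
        while len(buffer) >= chunk_size:
            yield bytes(buffer[:chunk_size])
            del buffer[:chunk_size]
    if buffer:
        yield bytes(buffer)  # flush whatever is left when the source ends


if __name__ == "__main__":
    incoming = [b"abc", b"defg", b"h"]  # stand-in for blocks arriving from the browser
    print(list(rechunk(incoming, chunk_size=3)))
```

Because the chunks are produced lazily, transcription can begin while the user is still speaking instead of waiting for the whole recording.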

Is there a way to find out what Amazon Lex hears?

I have been doing a bit of experimentation with Amazon Lex but I can't get voice to work in the console at all.
I'm using the Flower bot demo with the associated Python Lambda function connected and working with text on Chrome browser running on a Mac (10.13.1).
I am able to log any text entered into the test bot on the console from the Lambda function along with the rest of the event.
By going to the monitoring tab of the bot in the console I can see utterances from previous days (there seems to be a one day delay on utterances appearing, whether missed or detected, no idea why…).
I made a bunch of attempts to use voice yesterday that appear in the utterance table as a single blank entry with a count of 13 now that it is the next day. I'm not sure if this means that audio isn't getting to Lex or if Lex can't understand me.
I'm a native English speaker with a generic American accent (very few people can identify where I'm from more specifically than the U.S.) and Siri has no trouble understanding me.
My suspicion is that something is either blocking or garbling the audio before it gets to Lex but I don't know how to find what Lex is hearing to check that.
Are there troubleshooting tools I haven't found yet? Is there a way to get a live feed of what is being fed to a bot under test? (All I see for the test bot is the inspect response section, nothing for inspecting the request.)
Regarding the one day delay in appearance of utterances, according to AWS documentation:
Utterance statistics are generated once a day, generally in the
evening. You can see the utterance that was not recognized, how many
times it was heard, and the last date and time that the utterance was
heard. It can take up to 24 hours for missed utterances to appear in
the console.
In addition to @sid8491's answer, you can get the message that Lex parsed from your speech in the response it returns. This is in the field data.inputTranscript when using the Node SDK.
CoffeeScript example:
AWS = require 'aws-sdk'

# Lex runtime client; the awsLex* values are supplied by the application
lexruntime = new AWS.LexRuntime
  accessKeyId: awsLexAccessKey
  secretAccessKey: awsLexSecretAccessKey
  region: awsLexRegion
  endpoint: "https://runtime.lex.us-east-1.amazonaws.com"

params =
  botAlias: awsLexAlias
  botName: awsLexBot
  # userId is required by PostContent; awsLexUserId is a hypothetical variable
  userId: awsLexUserId
  contentType: 'audio/x-l16; sample-rate=16000; channels=1'
  inputStream: speechData
  accept: 'audio/mpeg'

# Send the audio and log what Lex transcribed from it
lexruntime.postContent params, (err, data) ->
  if err?
    log.error err
  else
    log.debug "Lex heard: #{data.inputTranscript}"
Go to the Monitoring tab of your bot in the Amazon Lex console and click "Utterances"; there you can find lists of "Missed" and "Detected" utterances. From the missed utterances table, you can add them to any intent.