Recursive Alexa Response with DynamoDB

So I am basically trying to tell an interactive story with Alexa. However, I am not sure how to edit an intent response while it is being spoken by Alexa, in a way that keeps the response updating while Alexa tells the story.
In my scenario Alexa is just a daemon fetching strings from DynamoDB: while new story lines are being generated, she is supposed to read them and then speak them as soon as they're processed. The problem is that Alexa seems to need a completed string as the return value for its response.
Here an example:
User: Alexa, tell me a story.
Alexa: (checks DynamoDB table for a new sentence, if found) says sentence
Other Device: updates story
Alexa: (checks DynamoDB table for a new sentence, if found) says sentence
...
This would keep going until the other device writes an end-signifier to the DynamoDB table, making Alexa respond with a final "This is how the story ends" output.
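For context, the kind of DynamoDB lookup I have in mind might look roughly like this with boto3 (the table layout, with a storyId partition key, a numeric sentenceId sort key, a text attribute and an isEnd flag, is just a hypothetical example):

```python
import boto3
from boto3.dynamodb.conditions import Key

# Hypothetical table: partition key "storyId", numeric sort key "sentenceId",
# attributes "text" (the sentence) and "isEnd" (the end-signifier).
table = boto3.resource("dynamodb").Table("StorySentences")

def fetch_next_sentence(story_id, last_sentence_id):
    """Return the oldest sentence newer than last_sentence_id, or None if nothing new yet."""
    result = table.query(
        KeyConditionExpression=Key("storyId").eq(story_id)
        & Key("sentenceId").gt(last_sentence_id),
        ScanIndexForward=True,  # oldest unread sentence first
        Limit=1,
    )
    items = result.get("Items", [])
    return items[0] if items else None
```

The open question is how to keep calling something like this and speaking each result without the user having to say anything in between.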
Does anyone have experience with such a model or an idea of how to solve it? If possible, I do not want the user to interact more than once per story.
I am thinking of a solution where I would 'fake' user intents by producing JSON strings myself and pushing them through the speechlet to request the new story sentences, hidden from the user... Anyhow, I am not sure whether this is even possible, not to mention what a messy solution it would be. :D
Thanks in advance! :)

The Alexa skills programming model was definitely not designed for this. As you can tell, there is no way for a skill to know when a text-to-speech utterance has finished, in order to determine when to send the next one.
Alexa also puts a restriction on how long a skill may take to respond, which I believe is somewhere in the 5-8 second range, so that is another complication.
The only way I can think of to accomplish what you want would be to use the GameEngine interface and its input handlers to call back into your skill after you send each TTS. The only catch is that you have to time the responses and hope no extra delays happen.
Today, the GameEngine interface requires you to declare support for Echo Buttons, but that is just for metadata purposes. You can certainly use GameEngine-based skills without buttons.
Have a look at the GameEngine interface docs and the built-in timed out recognizer, which handles the timeout from an input handler.
When you start your story, you'll start an input handler. Send your TTS and set a timeout on the input handler of however long you expect Alexa to take to say it. Your skill will receive an event when the timeout expires, and you can continue the story from there.
You’ll probably want to experiment with setting the playBehavior to ENQUEUE.
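To make that concrete, here is a rough sketch of what such a response could look like, built as raw response JSON (a Python dict) rather than through any particular SDK; the timeout value and the event name are placeholders you would tune to your TTS length:

```python
def build_story_chunk_response(sentence, timeout_ms=10000):
    """Speak one story sentence and start an input handler whose only purpose is to
    time out after roughly the TTS duration, so the skill gets called back."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {
                "type": "PlainText",
                "text": sentence,
                "playBehavior": "ENQUEUE",
            },
            "shouldEndSession": False,
            "directives": [
                {
                    "type": "GameEngine.StartInputHandler",
                    "timeout": timeout_ms,           # tune to the expected TTS duration
                    "recognizers": {},               # no button recognizers needed
                    "events": {
                        "ttsFinished": {             # arbitrary event name
                            "meets": ["timed out"],  # built-in timed out recognizer
                            "reports": "history",
                            "shouldEndInputHandler": True,
                        }
                    },
                }
            ],
        },
    }
```

When the timeout fires, your skill receives a GameEngine.InputHandlerEvent request, which is where you would fetch the next sentence from DynamoDB and repeat the cycle (or send the final "this is how the story ends" response).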

Related

Google Action showing Invocation Error instead of triggering Fallback intent

The scenario:
I have a Google Action that is used to deliver voice surveys. It is controlled by Dialogflow ES and has two main intents: a welcome intent and a fallback intent. The welcome intent is used to detect the name of the survey that the user would like to open, which is stored in a parameter called "surveyname". "Surveyname" is then passed to our webhook, where the survey is opened, the user is welcomed, and the initial question is asked. All other subsequent interactions are picked up by the fallback intent, which calls our webhook; the webhook controls the flow of the survey and provides the Google Action with the subsequent questions. The subsequent interactions could include any phrase, as I could have a survey asking any question on any topic.
The problem:
Up until very recently, my Google Action worked perfectly fine, but I have encountered a problem where the Google Assistant app will occasionally forcefully leave the action and exit the conversation. For example, a user may input "yoga", and Google Assistant will leave the conversation and do a Google search for yoga. When I test this phrase on the "Test" page of the Actions console I can see no request or response body, only "Invocation Error", along with the message "You cannot use standard Google Assistant features in the Simulator. If you want to try them, use Google Assistant on your phone or other compatible devices." When I test in the "Try it now" box in Dialogflow ES itself, I can see the correct fallback intent, webhook request, and response. But I cannot see the phrase that was said to the Google Assistant app in the "History" tab of Dialogflow ES; it looks like it never made it that far. This suggests to me that the problem lies with either Google Assistant or the action itself, rather than Dialogflow.
The current (less than ideal) workaround:
I understand that fallback intents have a lower priority than regular intents. I believe there is an internal tussle going on between the fallback intent and Google's implicit invocation. My current temporary solution is to create a new intent called ActiveSurvey and, with this custom intent, hope to capture some of the input phrases that are being missed by the fallback intent. This appears to work somewhat, but I can't ever hope to capture all input this way, as the user could quite literally say absolutely anything. Considering it used to work, in my mind this shouldn't be necessary.
The questions:
Why has this happened now?
Is there some setting I'm missing that has caused this to happen?
Or is the design of the action incorrect?
Any help is much appreciated, thank you.
Starting in October 2020, and expanding further in January 2021, Google began implementing a feature called no-match yielding, although this was not listed in the documentation until February 2021.
Under no-match yielding, Google will close the Action and handle it itself if both of these conditions are true:
You are handling it through a Fallback Intent in Dialogflow or through a No Match Intent in Actions Builder, and
The phrase is one that the Assistant can handle itself
The workaround for this under Dialogflow is to have an Intent with a single training phrase consisting of a parameter that matches the @sys.any System Entity type, and to use this Intent (and the parameter) for your processing instead of a Fallback Intent.
Fallback Intents should only be used in cases where the user input cannot be routinely handled (i.e. you want to say that you didn't understand, or it is an error).
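As a sketch of what the webhook side of that workaround could look like, assuming a catch-all intent named ActiveSurvey with a single @sys.any parameter named userresponse (both names are just examples), using the Dialogflow ES fulfillment request/response format:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def webhook():
    body = request.get_json()
    intent = body["queryResult"]["intent"]["displayName"]
    # "userresponse" is the hypothetical @sys.any parameter on the catch-all intent.
    user_text = body["queryResult"]["parameters"].get("userresponse", "")

    if intent == "ActiveSurvey":
        reply = handle_survey_answer(user_text)  # your existing survey flow
    else:
        reply = "Sorry, I didn't catch that."

    return jsonify({"fulfillmentText": reply})

def handle_survey_answer(text):
    # Placeholder for the survey logic that previously sat behind the Fallback Intent.
    return "You said: {}. Here is the next question...".format(text)
```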

Follow-up questions in AWS Lex

I'm trying to create a chatbot using Amazon Lex to display results from a database. The designed conversational flow is to show 10 results at first, and then provide an option for the user to "See more results?", which would be a Yes/No question. This would give an additional 10 results from the database.
I have searched through the documentation and forums on the internet to find a way to add this follow-up Yes/No question, and have been unsuccessful.
I'm relatively new to Lex and am unable to model this conversational flow.
Can someone explain this/direct me to the right documentation?
Any help/links are highly appreciated.
You can create your own Yes/No custom slot type in the Lex Console.
I've built one as an example: I named the slot type affirmation and then restricted a list of synonyms to resolve to either "Yes" or "No" values.
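If you prefer to script the slot type rather than click through the console, something along these lines should be equivalent using the Lex (V1) model-building API via boto3; the synonym lists here are only illustrative:

```python
import boto3

lex_models = boto3.client("lex-models")  # Lex V1 model-building API

lex_models.put_slot_type(
    name="affirmation",
    description="Yes/No answers with common synonyms",
    valueSelectionStrategy="TOP_RESOLUTION",  # resolve synonyms to their parent value
    enumerationValues=[
        {"value": "Yes", "synonyms": ["yes", "yeah", "yep", "sure", "ok", "definitely"]},
        {"value": "No", "synonyms": ["no", "nope", "nah", "not really", "negative"]},
    ],
)
```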
This allows a user to respond naturally in many different ways and the bot will respond appropriately. All you have to do is build your Lambda handling of any slot that uses this slot type to look for either "Yes" or "No".
You can easily monitor this slot too to log any input that was not on your synonym list in order to expand your list and improve your bot's recognition of affirmations and negations.
I even built a parser for this slot in Lambda to be able to correctly recognize emojis (thumbs up/down, smiley face, sad face, etc.) as positive or negative answers to these types of questions in my bot.
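The emoji handling can be a small pre-processing step on the raw inputTranscript in your Lambda; a minimal version (with the emoji sets trimmed down here) might look like this:

```python
# Hypothetical pre-processing run on the raw inputTranscript alongside slot resolution.
EMOJI_YES = {"👍", "🙂", "😀", "😊", "👌"}
EMOJI_NO = {"👎", "🙁", "😞", "😠"}

def affirmation_from_emoji(input_transcript):
    """Return 'Yes' or 'No' if the utterance is just an emoji we recognize, else None."""
    text = input_transcript.strip()
    if text in EMOJI_YES:
        return "Yes"
    if text in EMOJI_NO:
        return "No"
    return None
```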
It might be surprising that Lex doesn't have this built-in like Alexa, but it's not hard to build and you can customize it easily which you cannot do with built-in slot types.
Anyway, after making this slot type, you can create multiple slots that use it within one intent.
Let's say you create a slot called 'moreResults' and another called 'resultsFeedback'. Both would be set to use this 'affirmation' slot type to detect Yes/No responses.
Then when you elicit either of these slots in the conversation (via ElicitSlot), you can phrase the question specifically for each slot, and in your Lambda you can check on the next request whether the slot was filled with 'Yes' or 'No'.
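Putting that together, a trimmed-down Lambda sketch using the Lex V1 event/response format might look like this (the intent and slot names follow the example above, and the actual database paging is left out):

```python
def elicit_slot(intent_name, slots, slot_to_elicit, message):
    """Ask the user to fill a specific slot (Lex V1 'ElicitSlot' dialog action)."""
    return {
        "dialogAction": {
            "type": "ElicitSlot",
            "intentName": intent_name,
            "slots": slots,
            "slotToElicit": slot_to_elicit,
            "message": {"contentType": "PlainText", "content": message},
        }
    }

def close(message):
    """Finish the intent with a final message (Lex V1 'Close' dialog action)."""
    return {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": {"contentType": "PlainText", "content": message},
        }
    }

def lambda_handler(event, context):
    intent_name = event["currentIntent"]["name"]
    slots = event["currentIntent"]["slots"]

    if slots.get("moreResults") is None:
        # First pass: show the first 10 results, then ask the Yes/No question.
        return elicit_slot(intent_name, slots, "moreResults",
                           "Here are the first 10 results: ... See more results?")

    if slots["moreResults"] == "Yes":
        return close("Here are the next 10 results: ...")

    return close("Okay, let me know if there is anything else you need.")
```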

Does Alexa use machine learning to learn new utterances that trigger your skill's intents?

I have an important question. At the moment I am writing my last essay before starting my bachelor thesis. It is about voice apps, which of course includes Alexa skills.
I need some information about the word tolerance of utterances, and I have not been able to find anything on the internet yet. Does Alexa only recognize the utterances typed in by the developer, or does Alexa use machine learning, like Google Assistant, to learn new utterances? It is really important for my essay, so I would be very happy if you could help me with this question.
Thank you!
Alexa also recognizes sentences that are slightly different from the utterances you defined. But whether your intent is matched also depends on how many intents you have and how similar they are.
What happens on Amazon's side is behind the scenes, and I don't think they use machine learning to get the utterance-to-intent mapping right, because you would somehow need to train the algorithm on which phrase-to-intent connections were right and which were wrong.
In their documentation they suggest using as many utterances as possible:
It is better to provide too many samples than to provide too few
https://developer.amazon.com/de/docs/custom-skills/best-practices-for-sample-utterances-and-custom-slot-type-values.html
It would be too difficult to develop an Alexa app if you needed to configure every possible variation of an intent. Alexa learns from the phrases that you provide for an intent and uses machine learning to recognize not just the utterances you have configured but also subtle variations of them.
You can easily verify this by setting up a basic Alexa app and testing it in the online simulator.
Based on what I saw when testing the skill on an Echo device and not only in the online simulator (they are very different, so be sure to test the skill with a real device, because the behaviour is completely different between the simulator and the Echo), I think that yes, Alexa uses ML to take what you have said and "force" the understanding into something that you have put into the slot.
This is strange behaviour, because yes, you can say something different to fill the slot, but there is no guarantee that Alexa will correctly understand what you said and trigger the correct slot.
You can try this out simply by putting some random or made-up word into the slot values. If you say something similar to that word to Alexa, even though the word doesn't exist, you will get a match; but if you say something completely different, there is no guarantee that the intent will be triggered.
(E.g. if you put the word "blues" into the slot, even if you say "blue", Alexa tries to force her understanding into "blues". Or even better, put a completely random string like "asdajhfjkak" into the slot, say something similar to Alexa, and you will get a match.)

Turn-based Alexa Skill

I'm trying to dive into building an Alexa skill, and I have a specific game in mind that involves Alexa and the user taking turns counting up. Alexa would begin by saying "one" and the user would then say "two". Alexa would in turn make sure the given number is correct before saying the next number. I'm having a hard time understanding where to start. From what I've read, it seems like each user input links to an intent. Is that the only way of going about this? Sorry if this question isn't very clear due to my lack of understanding.
For this example, you can create an intent with a slot of type AMAZON.NUMBER. That way, no matter what number a user says, it will invoke the same intent.
Additionally, you can keep track of the user's progress across the session using sessionAttributes and handle the same intent conditionally that way.
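A minimal sketch of such a handler, using the raw Alexa request/response JSON rather than a particular SDK (the CountIntent intent name and the "number" slot name are just examples, and a real handler would also route on request type and handle the launch request that starts the game):

```python
def handle_count_intent(event):
    """Hypothetical handler for a CountIntent whose 'number' slot uses AMAZON.NUMBER."""
    session_attrs = event.get("session", {}).get("attributes") or {}
    # Alexa opened the game by saying "one", so the first number we expect is 2.
    expected = int(session_attrs.get("expectedNumber", 2))

    slot_value = event["request"]["intent"]["slots"]["number"].get("value")

    if slot_value is not None and int(slot_value) == expected:
        alexa_says = expected + 1                          # Alexa takes her turn...
        session_attrs["expectedNumber"] = alexa_says + 1   # ...and expects the number after that
        speech, end_session = str(alexa_says), False
    else:
        speech = "That's not right, the next number was {}. Game over!".format(expected)
        end_session = True

    return {
        "version": "1.0",
        "sessionAttributes": session_attrs,
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": end_session,
        },
    }
```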

Can Amazon Alexa Skills Kit (ASK) detect where it was interrupted (if it was)?

I want to write an Alexa skill that reads a list of items out to me, lets me interrupt when I want, and has the backend know where in the list it was interrupted.
For example:
Me: Find me a news story about pigs.
Alexa: I found 4 news stories about pigs. The first is titled 'James the pig goes to Mexico', the second is titled 'Pig Escapes Local Farm' [I interrupt]
Me: Tell me about that.
Alexa: The article is by James Watson, is dated today, and reads, "Johnny the Potbelly Pig found a hole in the fence and..."
I can't find anything to indicate that my code can know where an interruption occurs. Am I missing it?
I believe you are correct: the ASK does not provide any way to know when you were interrupted. However, this is all happening in real time, so you could figure it out by observing the amount of time that passes between sending the first ASK 'tell' (i.e. where you call context.success(response)) and receiving the "Tell me about that" intent.
Note that the time it takes to read in en-US could be different than for en-GB, so you'll have to do separate calibrations. Also, you might have to add some pauses to your speech text to improve accuracy, since there will of course be some variability in the results due to processing times.
If you are using a service like AWS Lambda or Google App Engine that adds extra latency when there are no warm instances available, then you will probably need to take that into account as well.
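A very rough sketch of that timing heuristic (the function names and the words-per-second rate are made up and would need the per-locale calibration mentioned above):

```python
import time

WORDS_PER_SECOND = 2.5  # rough TTS rate; calibrate per locale (e.g. en-US vs en-GB)

def start_reading(titles, session_attributes):
    """Store the start time and a per-title duration estimate in the session attributes."""
    session_attributes["readingStartedAt"] = time.time()
    session_attributes["titleDurations"] = [
        len(title.split()) / WORDS_PER_SECOND for title in titles
    ]

def guess_interrupted_item(session_attributes, extra_latency=0.5):
    """Estimate which title was being read when the follow-up intent arrived."""
    elapsed = time.time() - session_attributes["readingStartedAt"] - extra_latency
    for index, duration in enumerate(session_attributes["titleDurations"]):
        elapsed -= duration
        if elapsed <= 0:
            return index
    return len(session_attributes["titleDurations"]) - 1
```

You would call start_reading just before returning the response that lists the stories, and guess_interrupted_item in the handler for the "Tell me about that" intent.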