Echo Spot sometimes takes minutes to start playing a video - amazon-web-services

I'm currently developing a custom skill for the Echo Spot. I'm using AWS Lambda functions in .NET Core with the Alexa.NET SDK. One of the intents lets Alexa play a video; the videos are hosted in an S3 bucket. Sometimes (randomly - once right after opening the skill, once after the 4th or 5th video), Alexa immediately understands the command but takes ages to play the video. According to the CloudWatch logs, the command is parsed and the Lambda function executed within a few hundred milliseconds, but the video starts playing with a long delay (up to two minutes).
REPORT RequestId: xyz Duration: 366.44 ms Billed Duration: 400 ms Memory Size: 576 MB Max Memory Used: 79 MB
The videos returned by the Lambda function are rather short (5-15 seconds), in case that affects the issue. The Wi-Fi itself is stable with more than 30 Mbit/s available, and the Echo Spot is not too far away from the router.
We've tried different video encodings (MP4, H.264, ...), different audio codecs, sample rates and frame rates - the issue remains. Any clues as to what could cause this? We've read the recommendations for videos and applied all the recommended settings.
Can I somehow access the device's logs to see if there's another issue with the video?

It turns out that videos are streamed when they are combined with a plain-text output speech. If your output speech is empty, the Echo Spot will download the whole video and only start playing once it has loaded completely. Hence, I recommend adding a speech reply to all of your video responses to ensure smooth loading of the video.
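For illustration, here is a minimal sketch of the response shape this implies, written as a bare Lambda handler in Python (the question uses Alexa.NET, but the JSON structure of the response is the same regardless of SDK); the video URL and speech text are placeholders:

    def handler(event, context):
        # Combine a VideoApp.Launch directive with a non-empty plain-text
        # outputSpeech so the Echo Spot streams the video instead of
        # downloading it completely before playback starts.
        return {
            "version": "1.0",
            "response": {
                "outputSpeech": {
                    "type": "PlainText",
                    "text": "Here is your video.",  # placeholder speech
                },
                "directives": [
                    {
                        "type": "VideoApp.Launch",
                        "videoItem": {
                            # placeholder S3 URL
                            "source": "https://example-bucket.s3.amazonaws.com/clip.mp4",
                        },
                    }
                ],
            },
        }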

Related

Record real time Audio from browser and stream to Amazon S3 for storage

I want to record audio from my browser and live stream it for storage in Amazon S3. I cannot wait until the recording is finished because the client can close the browser, so I would like to store what has been spoken so far (or the nearest 5-10 seconds).
The issue is that multipart upload does not support chunks smaller than 5 MiB, and the audio files will for the most part be less than 5 MiB.
Ideally I would like to send a chunk every 5 seconds, so that whatever has been said in the last 5 seconds gets uploaded.
Can this be supported by S3, or should I use another AWS service to hold the recording parts first? I've heard about Kinesis streams but am not sure whether they can serve the purpose.
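For context, a rough sketch of the multipart flow this runs into, using boto3 (bucket and key names are made up): S3 accepts small parts when they are uploaded, but completing the upload fails if any part other than the last is below 5 MiB.

    import boto3

    s3 = boto3.client("s3")
    bucket, key = "my-recordings", "session-123.webm"  # placeholders

    upload = s3.create_multipart_upload(Bucket=bucket, Key=key)
    parts = []

    def upload_chunk(data: bytes, part_number: int) -> None:
        # Each 5-second audio chunk is far below the 5 MiB minimum part size.
        resp = s3.upload_part(
            Bucket=bucket, Key=key, UploadId=upload["UploadId"],
            PartNumber=part_number, Body=data,
        )
        parts.append({"ETag": resp["ETag"], "PartNumber": part_number})

    # Completing the upload raises EntityTooSmall if any part except the
    # last one is smaller than 5 MiB:
    # s3.complete_multipart_upload(Bucket=bucket, Key=key,
    #     UploadId=upload["UploadId"], MultipartUpload={"Parts": parts})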

Amazon Transcribe Medical - Sample Rate Not Supported

I have a handful of files with a sample rate of 8000 Hz or 11025 Hz.
Amazon's own documentation indicates that the valid MediaSampleRateHertz is between 8000 and 48000 (inclusive, judging by the examples).
However, running a Transcribe Medical job (both via boto3 and directly on the service in the AWS console) returns failures with these reasons:
The audio sample rate 8000 Hz is not supported. Change the audio sample rate of your media file and try your request again.
The audio sample rate 11025 Hz is not supported. Change the audio sample rate of your media file and try your request again.
This happens both when specifying the sample rate and when leaving AWS to determine it (which it does correctly).
Where am I going wrong?
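For reference, a hedged reconstruction of the boto3 call in question (job name, bucket, and file URI are placeholders):

    import boto3

    transcribe = boto3.client("transcribe")

    transcribe.start_medical_transcription_job(
        MedicalTranscriptionJobName="call-8khz-test",     # placeholder
        LanguageCode="en-US",
        MediaSampleRateHertz=8000,                        # also tried omitting this so AWS detects it
        MediaFormat="wav",
        Media={"MediaFileUri": "s3://my-audio-bucket/call-8khz.wav"},  # placeholder
        OutputBucketName="my-transcripts-bucket",         # placeholder
        Specialty="PRIMARYCARE",
        Type="CONVERSATION",
    )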

Cloud Run 503 error due to high CPU usage

I just implemented Cloud Run to process/encode video for my mobile application. I have recently been getting an unexplained 503 error: POST 503 Google-Cloud-Tasks: The request failed because the HTTP connection to the instance had an error.
My process starts when a user uploads a video to Cloud Storage; a function is then triggered and sends the video source path to Cloud Tasks to be enqueued for encoding. Finally, Cloud Run downloads the video, processes it via ffmpeg, and uploads everything to a separate bucket (all downloaded temp files are deleted).
I know video encoding is a CPU-heavy task, but my application only allows videos of up to ~3 minutes to be encoded (usually around 100 MB). It works perfectly fine for shorter videos, but ones on the longer end trigger the 503 error after processing for 2+ minutes.
My instances are only used for video encoding and only allow 1 concurrent request per instance. Here are my service's settings:
CPU - 2 vCPU
Memory - 2 GiB
Concurrency - 1
Request Timeout - 900 seconds (15 minutes)
The documentation states that this happens because of heavy CPU tasks, so it's clear it is caused by processing the heavier files, but I'm unsure what I can do to fix this given that these are already the maximum settings. Is it possible to put a cap on the CPU so it doesn't go overboard? Or is Cloud Run not a good fit for this kind of task?
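For illustration, a rough sketch of the encode step described above (bucket and object names are placeholders), with ffmpeg's thread count pinned to the instance's 2 vCPUs as one possible way to keep CPU usage in check; this is an assumption about the pipeline, not the asker's actual code:

    import os
    import subprocess
    import tempfile

    from google.cloud import storage

    def encode(source_bucket: str, source_path: str, dest_bucket: str) -> None:
        client = storage.Client()
        with tempfile.TemporaryDirectory() as tmp:
            src = os.path.join(tmp, "input.mp4")
            dst = os.path.join(tmp, "output.mp4")

            # Download the uploaded video from Cloud Storage.
            client.bucket(source_bucket).blob(source_path).download_to_filename(src)

            # Re-encode; -threads 2 keeps ffmpeg within the instance's 2 vCPUs.
            subprocess.run(
                ["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-preset", "fast",
                 "-threads", "2", dst],
                check=True,
            )

            # Upload the encoded file to the separate destination bucket; the
            # temp files are removed when the TemporaryDirectory exits.
            client.bucket(dest_bucket).blob(source_path).upload_from_filename(dst)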

Face recognition in streaming video returns just one frame for each second

I'm trying to run face recognition on a live stream via the Amazon Rekognition and Kinesis services. I've configured a Kinesis video stream for the input video, a stream processor for recognition, and a Kinesis data stream to get results from the stream processor. Everything works, but I'm getting just one frame for each second of the stream.
I calculate the frame timestamps as described here:
https://docs.aws.amazon.com/rekognition/latest/dg/streaming-video-kinesis-output.html
by adding the ProducerTimestamp and FrameOffsetInSeconds field values together, and I get timestamps that differ by one second.
For instance:
1528993313.0310001
1528993314.0310001
I use the demo app for video streaming from the Java Producer SDK:
https://github.com/awslabs/amazon-kinesis-video-streams-producer-sdk-java.git
The total duration of the data from the stream processor is correct and equals the video file duration, but as I said, I get just one frame for each second.
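For reference, a small sketch (in Python with boto3) of how the timestamps above are derived from the stream processor's output records; the field names follow the output format documented at the link above:

    import json
    import boto3

    kinesis = boto3.client("kinesis")

    def frame_timestamps(shard_iterator: str):
        """Yield one absolute timestamp per analysed frame."""
        while shard_iterator:
            resp = kinesis.get_records(ShardIterator=shard_iterator, Limit=100)
            for record in resp["Records"]:
                result = json.loads(record["Data"])
                fragment = result["InputInformation"]["KinesisVideo"]
                # Per the docs: frame time = ProducerTimestamp + FrameOffsetInSeconds.
                yield fragment["ProducerTimestamp"] + fragment["FrameOffsetInSeconds"]
            shard_iterator = resp.get("NextShardIterator")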
Answered my own question after some further research: as of today, Rekognition streaming video has a restriction and analyses just one frame per second.

How to speed up Google Cloud Speech

I am using a microphone which records sound through a browser, converts it into a file, and sends the file to a Java server. Then my Java server sends the file to the Cloud Speech API and gives me the transcription. The problem is that the transcription takes very long (around 3.7 s for 2 s of dialog).
So I would like to speed up the transcription. The first thing to do is to stream the data (so the transcription starts at the beginning of the recording). The problem is that I don't really understand the API. For instance, if I want to transcribe my audio stream from the source (browser/microphone), I need to use some kind of JS API, but I can't find anything I can use in a browser (we can't use Node like that, can we?).
Otherwise I need to stream my data from my JS to my Java server (not sure how to do that without breaking the data...) and then push it through streamingRecognizeFile from here: https://github.com/GoogleCloudPlatform/java-docs-samples/blob/master/speech/cloud-client/src/main/java/com/example/speech/Recognize.java
But it takes a file as input, so how am I supposed to use it? I cannot really tell the system whether or not I have finished recording... How will it know that it has reached the end of the transcription?
I would like to create something in my web browser just like the Google demo here:
https://cloud.google.com/speech/
I think there is something fundamental I do not understand about how to use the streaming API. If someone could explain a bit how I should proceed with this, it would be awesome.
Thank you.
Google "Speech-to-Text typically processes audio faster than real-time, processing 30 seconds of audio in 15 seconds on average" [1]. You can use Google APIs Explorer to test exactly how long your each request would take [2].
To speed up transcription, you may try adding recognition metadata to your request [3]. You can provide phrase hints if you are aware of the context of the speech [4], or use enhanced models to apply a special set of machine learning models [5]. All of these suggestions improve accuracy and may affect transcription speed.
When using streaming recognition, you can set the singleUtterance option to True in the config. This detects when the user pauses speaking and stops the recognition. Otherwise, the streaming request continues until it reaches the content limit, which is 1 minute of audio for a streaming request [6].
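As a rough sketch of that last point, using the google-cloud-speech Python client (the encoding, sample rate, and language are assumptions; the chunks would come from whatever pipes the browser audio to the server):

    from google.cloud import speech

    client = speech.SpeechClient()

    streaming_config = speech.StreamingRecognitionConfig(
        config=speech.RecognitionConfig(
            encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=16000,
            language_code="en-US",
        ),
        # Stop recognition automatically when the speaker pauses.
        single_utterance=True,
        interim_results=True,
    )

    def transcribe_chunks(chunks):
        """chunks: an iterable of raw LINEAR16 audio byte strings."""
        requests = (speech.StreamingRecognizeRequest(audio_content=c) for c in chunks)
        for response in client.streaming_recognize(streaming_config, requests):
            for result in response.results:
                if result.is_final:
                    print(result.alternatives[0].transcript)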