How to get audio data in a specific format in real time from a Twilio MediaStreamTrack? - google-cloud-platform

I am using Twilio Programmable Video and trying to pipe a remote participant's audio in real time to the Google Cloud Media Translation client.
There is sample code showing how to use the Google Cloud Media Translation client with a microphone here.
Instead of using a microphone and node-record-lpcm16, I want to pipe what I am getting from Twilio's AudioTrack to the Google Cloud Media Translation client. According to this doc:
Tracks represent the individual audio, data, and video media streams that are shared within a Room.
Also, according to this doc, an AudioTrack contains an audio MediaStreamTrack. I am guessing this can be used to extract the audio and pipe it somewhere else.
What's the best way of tackling this problem?

Twilio developer evangelist here.
With the MediaStreamTrack you can compose it back into a MediaStream object and then pass it to a MediaRecorder. Once you start the MediaRecorder it will fire dataavailable events, each carrying a chunk of audio in the WebM format. You can then pipe those chunks elsewhere to do the translation. I wrote a blog post on recording using the MediaRecorder, which should give you a better idea of how the MediaRecorder works, but you will have to complete the work to stream the audio chunks to the server to be translated.
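The steps above can be sketched roughly as follows. This is a minimal browser-side sketch, not a complete solution: `recordTrack`, `sendChunk`, and the 250 ms timeslice are names and values made up for illustration, and it assumes the browser is able to record `audio/webm`.

```typescript
// Sketch only: forwards recorded audio chunks from one track to a callback.
// `sendChunk` is a hypothetical callback (for example, a WebSocket send).
function recordTrack(
  track: MediaStreamTrack,
  sendChunk: (chunk: Blob) => void,
  timesliceMs: number = 250
): MediaRecorder {
  // Compose the single track back into a MediaStream.
  const stream = new MediaStream([track]);
  // The constructor throws if the browser cannot record this MIME type.
  const recorder = new MediaRecorder(stream, { mimeType: "audio/webm" });
  recorder.ondataavailable = (event: BlobEvent) => {
    // Skip empty chunks; forward everything else in arrival order.
    if (event.data.size > 0) sendChunk(event.data);
  };
  // Ask for a dataavailable event roughly every `timesliceMs` milliseconds.
  recorder.start(timesliceMs);
  return recorder; // the caller stops it with recorder.stop()
}
```

With Twilio Video, the `track` argument would presumably be the `mediaStreamTrack` property of the remote AudioTrack. Note that the chunks are WebM, so the receiving side still has to handle or convert that format before translation.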

Related

How do I stream audio files to my Icecast server running on an EC2 instance?

I am trying to loop audio from my Icecast server 24/7.
I have seen examples where people talk about storing their audio files on the EC2 instance or in an S3 bucket.
Do I also need a source client running on my EC2 Instance to be able to stream audio to the server? Or is there a way to play static files from Icecast?
Icecast and SHOUTcast servers work by passing a live audio stream from a source on to the users. You need something to produce a single audio stream in realtime from those source files.
The flow looks something like this:
Basically, you'll need to do everything you would in a normal radio studio, but automated. You'll stream the files from your bucket, play them to a raw audio stream, send that stream to your encoder to be compressed with the codec, and then send it to your streaming servers for distribution.
You can't simply push your audio files as-is to the Icecast server, for a few reasons:
Stream must be realtime: The server doesn't really know or care about the timing of the stream. It takes the data it's given and sends that off to the client. Therefore, if you push data faster than realtime, the server will attempt to deliver it to the client at this faster rate. Some clients will attempt to buffer this fast stream, but most will put backpressure on the stream, which closes the TCP window until the client eventually falls far enough behind that the server drops the connection.
Consistent format is required: Chances are, your source files have varying sample rates, channel counts, and even codecs. Most clients are unable to take a change in sample rate or channel count mid-stream. I don't know of any client that supports a codec change mid-stream. (Theoretically possible with Ogg and Matroska/WebM, but yeah... not worth messing with.)
Stream should be free of ID3 tags and other file format cruft: If you simply PUT your files directly to your Icecast server, the output stream will contain more than just the audio data. At a minimum, you'd want to remove all that. Depending on your container format, you'll need to deal with timestamps as well.
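To make the realtime requirement concrete, here is a small sketch of the pacing arithmetic a source client could use. It assumes a constant-bitrate stream, and `realtimeDelayMs` is a hypothetical helper, not part of Icecast or any library.

```typescript
// Returns how many milliseconds to wait before sending more data, so that the
// bytes sent so far never run ahead of their playback time.
// Assumption: constant bitrate, given in kilobits per second.
function realtimeDelayMs(
  bytesSent: number,
  bitrateKbps: number,
  elapsedMs: number
): number {
  // bits divided by kilobits-per-second gives milliseconds of audio.
  const audioMs = (bytesSent * 8) / bitrateKbps;
  // If we are behind realtime there is nothing to wait for.
  return Math.max(0, audioMs - elapsedMs);
}
```

For example, 16,000 bytes at 128 kbps is one second of audio, so if only 400 ms have passed since the stream started, the sender should wait another 600 ms before pushing more.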
Solutions
There are a handful of ways to solve this:
Radio automation software: Many folks simply run something like RadioDJ on cloud-based servers. If you already have a radio station that uses automation, this might be a good solution. It can be expensive though, and not as flexible. You could even go as low as VLC or something for playout, but then you wouldn't have music transitions and what not.
Custom playout script (recommended): I use a browser engine, such as Chromium, and script my channels with normal JavaScript. From there, I take the output stream and pass it off to FFmpeg to encode and send to the streaming servers. This works really well, as I can do all my work in a language everybody knows, and I have easy access to data on cloud-hosted services. I can use the Web Audio API to mix and blend audio based on what's happening in realtime. As an alternative, there is Liquidsoap, but I do not recommend it these days as its language is difficult to deal with and it is not as flexible as a browser engine.
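As a rough illustration of the "pass it off to FFmpeg" step, the sketch below builds an FFmpeg argument list. It is an assumption, not the setup described above, that the playout script hands FFmpeg raw 16-bit stereo PCM at 44.1 kHz on stdin and that the FFmpeg build includes libmp3lame. `ffmpegIcecastArgs`, `IcecastTarget`, and the host, mount, and password values are placeholders.

```typescript
// Hypothetical description of where the encoded stream should go.
interface IcecastTarget {
  host: string;
  port: number;
  mount: string;
  password: string;
}

// Builds FFmpeg arguments that read raw PCM from stdin, encode it as MP3,
// and send it to an Icecast mount using FFmpeg's icecast:// protocol.
function ffmpegIcecastArgs(target: IcecastTarget, bitrateKbps: number = 128): string[] {
  const password = encodeURIComponent(target.password);
  const url = `icecast://source:${password}@${target.host}:${target.port}/${target.mount}`;
  return [
    // One fixed input format, so the output stream never changes mid-stream.
    "-f", "s16le", "-ar", "44100", "-ac", "2",
    "-i", "pipe:0",
    // Encode to constant-bitrate MP3.
    "-c:a", "libmp3lame", "-b:a", `${bitrateKbps}k`,
    "-content_type", "audio/mpeg",
    "-f", "mp3",
    url,
  ];
}
```

These arguments would be handed to an `ffmpeg` process whose stdin receives the PCM. Because the PCM arrives in realtime from the playout engine, the encoded stream reaches the server in realtime too.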

Stream Amazon Connect audio in real time with KVS

I have a contact flow in Amazon Connect with customer audio streaming enabled. I get the customer audio stream in KVS and, using the Java examples provided by AWS, can read bytes from the stream and convert them to an audio file once the call is completed.
But I want to stream the audio in a web page for real-time monitoring, just like the real-time monitoring AWS provides in the built-in CCP.
I get the stream ARN and other contact data. How can I use that stream for real-time monitoring/streaming?
Any heads up will be appreciated.
You're going to want to use a WebRTC client in the browser page where you want to monitor and control the stream. AWS provides a WebRTC SDK for Kinesis Video Streams that can be used for this. The SDK documentation can be found here, which includes a link to samples and config details on GitHub.

Web LiveStreaming WebRTC and Sockets (Flask Backend)

I want to build a live streaming app.
My thought process:
1. Get the video/audio data from navigator.mediaDevices.getUserMedia(constraints); [client-streamer]
2. Create rooms using sockets (Socket.IO or WebSockets from Flask) [backend]
3. Send the data from step 1 to the room members using sockets.
4. Display the media on the client side.
Is that correct? How should I do it?
How do I broadcast data to specific room members and not to everyone? (Flask)
How do I continuously send data from the streamer -> server -> room members? The stream returned by getUserMedia in step 1 is an object; where is the data?
Any other, better ideas would be great! Thanks.
I need to implement the server-side by myself without help from libraries that will do the work for me.
Implementing a streaming platform is not trivial. Unfortunately, it is not as simple as emitting chunks received from the MediaRecorder with ondataavailable and forwarding them to users using a WebSocket server. That approach is neither scalable, efficient, nor reliable.
Below are some strategies you can try for different types of scenarios:
P2P: If you want to have simple peer-to-peer streaming, you can use WebRTC to achieve that with a simple socket.io server for signaling purposes.
Conference: Here things start to get more complicated. You will need a media server if you want to be somewhat scalable. One approach is to route your stream to the users using an SFU or MCU. This will take care of forwarding/processing media to different peers efficiently.
Broadcast: Here things are also non-trivial. Common WebRTC-based architectures either ingest the WebRTC stream and forward it to an HLS server, which makes your stream chunks available to clients through a CDN, or perform RTP forwarding of the WebRTC stream, convert it to RTMP using something like FFmpeg, and deliver it through YouTube Live or Twitch to leverage their infrastructure.
Be aware that the last 2 items are resource-intensive and will certainly not be cheap to maintain.
Below are some open source projects that could help you along the way:
Janus
MediaSoup
AntMedia
Jitsi
Good luck!
Explaining all this is far beyond the scope of a Stack Overflow answer.
Here are a few hints:
You need to use the MediaRecorder API to capture compressed data from your gUM (getUserMedia) stream. MediaRecorder support is inconsistent between makes and models of browser, though.
It kicks a Blob into its ondataavailable handler every so often.
They're compressed as a webm data stream.
You can push those Blobs to a server with socket.io, and the server can turn around and push them to whatever clients you want to.
Playing the webm on the clients is tricky. You may, on some makes and models of browsers, be able to feed the webm stream to the Media Source API using appendBuffer(). But some browsers cannot consume the webm streams.
These webm streams are useless to a player without all their Blob data in order. You can't just start sending a new client the Blobs of the stream when they sign in; you have to restart the MediaRecorder.
(You may be able to make it work without a MediaRecorder restart if you send the first few k bytes of the stream to each new client before sending the current Blob. Extracting those bytes is an intricate programming job involving the ebml package to parse the webm stream and extract the prologue. I have not proven this concept.)
Because getting all this to work -- originator -- server -- viewer is such a pain in the xxx neck, you may want to investigate using something like mediasoup instead. It uses WebRTC transport rather than socket.io, and works cross-platform.
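The relay step in the hints above (the server pushes each Blob only to the clients in one room) comes down to a per-room registry of connections. The sketch below shows that bookkeeping independent of any socket library. `RoomRelay` and `Send` are hypothetical names, and each `Send` callback stands in for whatever "send to this client" call your socket layer provides.

```typescript
// Stand-in for a per-client send function supplied by the socket layer.
type Send = (chunk: Uint8Array) => void;

class RoomRelay {
  // room name -> (client id -> send function); no prototype, so any key is safe.
  private rooms: Record<string, Record<string, Send>> = Object.create(null);

  join(room: string, clientId: string, send: Send): void {
    let members = this.rooms[room];
    if (!members) {
      members = Object.create(null) as Record<string, Send>;
      this.rooms[room] = members;
    }
    members[clientId] = send;
  }

  leave(room: string, clientId: string): void {
    const members = this.rooms[room];
    if (!members) return;
    delete members[clientId];
    // Drop empty rooms so the registry does not grow forever.
    if (Object.keys(members).length === 0) delete this.rooms[room];
  }

  // Forwards a chunk to every member of `room` except the sender.
  // Returns the number of clients the chunk was delivered to.
  broadcast(room: string, fromClientId: string, chunk: Uint8Array): number {
    const members = this.rooms[room];
    if (!members) return 0;
    let delivered = 0;
    for (const id of Object.keys(members)) {
      const send = members[id];
      if (send && id !== fromClientId) {
        send(chunk);
        delivered++;
      }
    }
    return delivered;
  }
}
```

This only covers routing. It does nothing about the ordering and late-joiner problems described above, which are the hard part.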

stream audio from browser to WebRTC native C++ application

I managed to run the WebRTC peerconnection example, but it is not running in the browser.
I'm trying to find a way to stream both video and audio from browser to my native program.
Is there any way?
It can be done. WebRTC is designed to work in a peer-to-peer manner between two WebRTC agents (typically a Web Browser). Your native program needs to become the second peer.
If you need to rely on open source components a good starting point is:
OpenSSL for the DTLS key exchange.
libsrtp to encrypt the RTP packets.
ffmpeg to decode the PCM audio from the browser (libvpx if you need to do video).
You'll also need to handle the ICE negotiation, which requires processing STUN messages, and to extract the media payloads from the RTP packets. All these steps come after you've determined a signalling method to exchange the SDP offer and answer between your app and the browser.
As you've probably realised, starting from scratch is a major task. There are probably some commercial libraries that will do the job and save you a lot of pain.
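To give an idea of what extracting the media payload involves, here is a minimal sketch of parsing the RTP header layout from RFC 3550. It is written in TypeScript for brevity rather than C++, and it assumes the packet has already been decrypted from SRTP (for example by libsrtp). `parseRtp` and `RtpPacket` are hypothetical names.

```typescript
interface RtpPacket {
  version: number;
  marker: boolean;
  payloadType: number;
  sequenceNumber: number;
  timestamp: number;
  ssrc: number;
  payload: Uint8Array;
}

// Parses the RTP header and returns the header fields plus the payload bytes.
function parseRtp(packet: Uint8Array): RtpPacket {
  if (packet.length < 12) throw new Error("RTP packet shorter than fixed header");
  // DataView reads multi-byte fields as big-endian (network order) by default.
  const view = new DataView(packet.buffer, packet.byteOffset, packet.byteLength);
  const first = view.getUint8(0);
  const version = first >> 6;
  if (version !== 2) throw new Error("unsupported RTP version");
  const hasPadding = (first & 0x20) !== 0;
  const hasExtension = (first & 0x10) !== 0;
  const csrcCount = first & 0x0f;
  const second = view.getUint8(1);

  // Skip the 12-byte fixed header and the CSRC list (4 bytes per entry).
  let offset = 12 + 4 * csrcCount;
  if (hasExtension) {
    if (offset + 4 > packet.length) throw new Error("truncated header extension");
    // The length field counts 32-bit words after the 4-byte extension header.
    offset += 4 + 4 * view.getUint16(offset + 2);
  }
  let end = packet.length;
  // When padding is present, the last byte holds the number of padding bytes.
  if (hasPadding) end -= view.getUint8(packet.length - 1);
  if (offset > end) throw new Error("malformed RTP packet");

  return {
    version,
    marker: (second & 0x80) !== 0,
    payloadType: second & 0x7f,
    sequenceNumber: view.getUint16(2),
    timestamp: view.getUint32(4),
    ssrc: view.getUint32(8),
    payload: packet.subarray(offset, end),
  };
}
```

The returned `payload` is still codec data, so it has to go through a decoder before it is usable audio, and the sequence numbers are what you would use to reorder packets and detect loss.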
If that doesn't scare you and you do still want to make an attempt using open source components this example "may" help. The sample is doing the reverse of what you've asked and is sending a video stream to Chrome rather than receiving an audio stream. The useful aspect is the connection negotiation. The sample program is able to get RTP packets flowing which is often the main problem.
The example is also using Windows Media Foundation which is Windows specific. It also has lots of shortcuts particularly with the RTP and STUN packet processing.

Video streaming through web service and rendering - Any Issues?

We have a web service that sends the video content in the response as binary (in different formats asx, asf, ram, mpeg, mpg, mpe, qt, mov, avi, movie, wmv, smil, mp4, mxf, gxf, flv, 3gp, f4v, mj2, omf, dv, vob).
Do you see any issue with performance, if I have an intermediate application which makes a request to web service to retrieve video content and render in browser?
Thanks
As long as the web service returns binary data directly, there will be no performance hit. If this is an XML or SOAP web service that wraps the whole thing in a SOAP envelope and base64-encodes it to make it all text, then you will not be able to play it directly and it will have a big impact on bandwidth, CPU, and memory.
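The bandwidth cost of base64 is easy to estimate: every 3 bytes of input become 4 characters of output. A small sketch of that arithmetic, where `base64EncodedSize` is a hypothetical helper and standard padded base64 is assumed:

```typescript
// Size in bytes of a payload after standard base64 encoding with padding.
// Every group of up to 3 input bytes becomes exactly 4 output characters.
function base64EncodedSize(rawBytes: number): number {
  return 4 * Math.ceil(rawBytes / 3);
}
```

So a 3,000,000-byte video grows to 4,000,000 characters, about a third more, before counting the surrounding envelope or the cost of decoding it again on the receiving side.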
Also note that by serving the video directly instead of using a true streaming protocol, the user will only be able to seek within the portion downloaded so far. A streaming protocol like RTSP, RTMP, or one of the many varieties of HTTP streaming allows seeking to any part of the file and downloading only the part the user seeks to.