My goal is to create a website that can stream audio data from the microphone to the backend for processing and real-time responses (e.g. real-time transcription). Currently, my project has a React.js frontend with a Flask backend (all my preprocessing is in Python), and I found this great tutorial on Medium about this specific task:
https://medium.com/google-cloud/building-a-client-side-web-app-which-streams-audio-from-a-browser-microphone-to-a-server-part-ii-df20ddb47d4e
I have managed to replicate the frontend code. The relevant code for this task is:
const socketio = io('http://localhost:5000');
// ... some other code ...
navigator.getUserMedia({
    audio: true
}, function (stream) {
    // 5)
    recordAudio = RecordRTC(stream, {
        type: 'audio',
        // 6)
        mimeType: 'audio/webm',
        sampleRate: 44100,
        // used by StereoAudioRecorder
        // accepts values in the range 22050 to 96000;
        // let us force 16 kHz recording:
        desiredSampRate: 16000,
        // MediaStreamRecorder, StereoAudioRecorder, WebAssemblyRecorder
        // CanvasRecorder, GifRecorder, WhammyRecorder
        recorderType: StereoAudioRecorder,
        // Dialogflow / STT requires mono audio
        numberOfAudioChannels: 1,
        timeSlice: 100,
        ondataavailable: function (blob) {
            // 3)
            // making use of socket.io-stream for bi-directional
            // streaming, create a stream
            var stream = ss.createStream();
            // stream directly to the server
            // (it will be temporarily stored locally)
            ss(socketio).emit('stream', stream, {
                name: 'stream.wav',
                size: blob.size
            });
            // pipe the audio blob to the read stream
            ss.createBlobReadStream(blob).pipe(stream);
            console.log("Sent some data hopefully");
        }
    });
    recordAudio.startRecording();
}, function (error) {
    console.error('getUserMedia error:', error);
});
Now, my Flask backend is able to accept a connection from the frontend, but it never sees any of the emits carrying the streamed audio data. Basically, my goal is to replicate the next part of the tutorial:
https://medium.com/google-cloud/building-a-web-server-which-receives-a-browser-microphone-stream-and-uses-dialogflow-or-the-speech-62b47499fc71
which creates an Express server and does some NLP tasks. My goal is to run the stream through Google Cloud Speech-to-Text on the Flask backend and emit the transcription results in real time to the React frontend. Google does have a tutorial for both Node.js and Python, located here:
https://cloud.google.com/speech-to-text/docs/streaming-recognize#speech-streaming-mic-recognize-python
There, the Python code wraps PyAudio in a MicrophoneStream that acts as a stream/generator and passes it into the Google Cloud client:
with MicrophoneStream(RATE, CHUNK) as stream:
    audio_generator = stream.generator()
    requests = (
        speech.StreamingRecognizeRequest(audio_content=content)
        for content in audio_generator
    )
    responses = client.streaming_recognize(streaming_config, requests)

    # Now, put the transcription responses to use.
    listen_print_loop(responses)
My question is: how can I have Flask accept the BlobReadStream data from the frontend and turn it into a Python generator that I can feed into the Google Cloud client? One approach I have thought about is using async or threads: one thread builds a queue of blobs (like in the Google Cloud tutorial) while another thread streams them through Google Cloud.
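To make that idea concrete, here is a rough, untested sketch of what I have in mind on the Flask side. It assumes Flask-SocketIO and the google-cloud-speech client, and that the browser emits plain binary chunks on a 'stream' event rather than going through socket.io-stream; the encoding settings, event names, and queue/worker layout are my own guesses, not something from the tutorial.

import queue

from flask import Flask
from flask_socketio import SocketIO
from google.cloud import speech

app = Flask(__name__)
socketio = SocketIO(app, cors_allowed_origins="*")

# Thread-safe queue bridging the socket handlers and the gRPC request generator.
audio_chunks = queue.Queue()

@socketio.on('stream')
def handle_stream(chunk):
    # Each emit from the browser carries one blob of audio bytes.
    audio_chunks.put(chunk)

def request_generator():
    # Block until chunks arrive; a None sentinel ends the stream.
    while True:
        chunk = audio_chunks.get()
        if chunk is None:
            return
        yield speech.StreamingRecognizeRequest(audio_content=chunk)

def transcribe_worker():
    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        # Assumes raw 16 kHz mono LINEAR16; the WAV/webm container produced
        # by RecordRTC may need stripping or a different encoding value.
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )
    streaming_config = speech.StreamingRecognitionConfig(
        config=config, interim_results=True)
    responses = client.streaming_recognize(streaming_config, request_generator())
    for response in responses:
        for result in response.results:
            # Push interim/final transcripts back to the React frontend.
            socketio.emit('transcript', result.alternatives[0].transcript)

@socketio.on('connect')
def on_connect():
    # One background worker per connection is enough for this sketch.
    socketio.start_background_task(transcribe_worker)

if __name__ == '__main__':
    socketio.run(app, port=5000)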
I want to stream the microphone audio from the web browser to AWS S3.
This is what I got working:
this.recorder = new window.MediaRecorder(...);
this.recorder.addEventListener('dataavailable', (e) => {
    this.chunks.push(e.data);
});
and then, when the user clicks stop, I upload the chunks as new Blob(this.chunks, { type: 'audio/wav' }) in multiple parts to AWS S3.
The problem is that if the recording is 2-3 hours long, the upload can take exceptionally long, and the user might close the browser before it finishes.
Is there a way to stream the web audio directly to S3 while the recording is still going on?
Things I tried but couldn't get a working example for:
Kinesis Video Streams: it looks like it is only for real-time streaming between multiple clients, and I would have to write my own client that then saves the stream to S3.
Kinesis Data Firehose: I couldn't find any data producer that runs in the browser.
I even looked into AWS Lex and AWS IVS, but I think they are over-engineering for my use case.
Any help will be appreciated.
You can set the timeslice parameter when calling start() on the MediaRecorder. The MediaRecorder will then emit chunks which roughly match the length of the timeslice parameter.
You could upload those chunks using S3's multipart upload feature as you already mentioned.
Please note that you need a library like extendable-media-recorder if you want to record a WAV file since no browser supports that out of the box.
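For reference, this is roughly how the multipart upload flow looks with boto3 on the Python side; it is only a sketch, the bucket and key names are placeholders, and from the browser you would go through presigned URLs or the AWS JS SDK instead. Note that every part except the last must be at least 5 MB.

import boto3

s3 = boto3.client("s3")

def stream_chunks_to_s3(chunks, bucket="my-recordings-bucket", key="session/audio.wav"):
    """Upload an iterable of byte chunks as a single S3 object via multipart upload."""
    mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
    parts = []
    for part_number, data in enumerate(chunks, start=1):
        # Every part except the last one must be at least 5 MB,
        # so small MediaRecorder chunks need to be buffered first.
        resp = s3.upload_part(
            Bucket=bucket, Key=key,
            PartNumber=part_number,
            UploadId=mpu["UploadId"],
            Body=data,
        )
        parts.append({"ETag": resp["ETag"], "PartNumber": part_number})
    # Once recording stops, close the upload so S3 assembles the object.
    s3.complete_multipart_upload(
        Bucket=bucket, Key=key,
        UploadId=mpu["UploadId"],
        MultipartUpload={"Parts": parts},
    )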
I am trying to connect lots of IoT devices to an Event Hub and save their messages to Blob storage (and also an SQL database). I want to do this with Python (and I am not sure whether this is a recommended practice). The documentation about Python was confusing. I tried a few examples, but they create entries in Blob storage that seem to be irrelevant.
Things like this:
Objavro.codecnullavro.schema\EC{"type":"record","name":"EventData","namespace":"Microsoft.ServiceBus.Messaging","fields":[{"name":"SequenceNumber","type":"long"}...
which is not what I send. How can I solve this?
You could use the azure-eventhub Python SDK, which is available on PyPI, to send messages to Event Hubs.
And there is a send sample showing how to send messages:
import os
from azure.eventhub import EventHubProducerClient, EventData

# Placeholders: set these to your own connection string and event hub name.
CONNECTION_STR = os.environ["EVENT_HUB_CONN_STR"]
EVENTHUB_NAME = os.environ["EVENT_HUB_NAME"]

producer = EventHubProducerClient.from_connection_string(
    conn_str=CONNECTION_STR,
    eventhub_name=EVENTHUB_NAME
)
with producer:
    event_data_batch = producer.create_batch()
    event_data_batch.add(EventData('Single message'))
    producer.send_batch(event_data_batch)
I'm interested in this part: "The documentation about python was confusing. I tried a few examples but they create an entry to blob storage but the entries seem to be irrelevant."
Could you share your code with me? I'm wondering what the input/output for Event Hub and Storage Blob is and how the data processing flow works.
By the way, for Azure Storage Blob Python SDK usage, you could check the repo and the blob samples for more information.
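For example, a minimal upload with the v12 azure-storage-blob package looks roughly like this (the connection string, container, and blob names below are placeholders):

from azure.storage.blob import BlobServiceClient

# Placeholders: your storage connection string, container, and blob name.
service = BlobServiceClient.from_connection_string("<storage-connection-string>")
blob_client = service.get_blob_client(container="mycontainer", blob="telemetry.json")

# Upload some bytes, overwriting any existing blob with the same name.
blob_client.upload_blob(b'{"deviceId": "sensor-1", "temperature": 21.5}', overwrite=True)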
This is the connection setup for sending new messages to an Event Hub using kafka-python. If you are already using Kafka and want to switch to Event Hubs, you only have to change this connection configuration.
import json
import ssl
from kafka import KafkaProducer

KAFKA_HOST = "{your_eventhub}.servicebus.windows.net:9093"
KAFKA_ENDPOINT = "Endpoint=sb://{your_eventhub}.servicebus.windows.net/;SharedAccessKeyName=RootSendAccessKey;SharedAccessKey={youraccesskey}"
context = ssl.create_default_context()
context.options |= ssl.OP_NO_TLSv1
context.options |= ssl.OP_NO_TLSv1_1
producer = KafkaProducer(
    bootstrap_servers=KAFKA_HOST, connections_max_idle_ms=5400000,
    security_protocol='SASL_SSL', sasl_mechanism='PLAIN',
    sasl_plain_username='$ConnectionString', sasl_plain_password=KAFKA_ENDPOINT,
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
    api_version=(0, 10), retries=5, ssl_context=context)
You can find KAFKA_HOST and KAFKA_ENDPOINT in your Azure console.
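Once the producer is configured, sending works like with any other Kafka broker; with the Event Hubs Kafka endpoint, the topic name corresponds to the event hub name ("my-eventhub" below is just a placeholder):

# The Kafka topic name corresponds to the event hub name.
producer.send("my-eventhub", {"deviceId": "sensor-1", "temperature": 21.5})
producer.flush()  # make sure the message actually leaves the client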
I'm a newbie in GCP.
While I'm reading the document of google speech api, it says that "Asynchronous Recognition (REST and gRPC) sends audio data to the Speech API and initiates a Long Running Operation. Using this operation, you can periodically poll for recognition results."
But what does "a Long Running Operation" actually means? And what's the difference between the process of synchronous & asynchronous recognition?
I've searched on the internet and found an answer about this: https://www.quora.com/What-is-the-difference-between-synchronous-and-asynchronous-speech-recognition
But I still can't get the idea. Can anyone explain more specifically?
I'd really appreciate your answer :)
Asynchronous cloud requests usually return an ID indicating that the request has been enqueued for processing; later you can use that ID to check the status and retrieve the results when it is done.
Synchronous requests return the results as part of the response, but they may block for longer amounts of time.
You can use the gcloud command line tool to try both. Sync requests work for audio shorter than 60 seconds:
gcloud ml speech recognize AUDIO_FILE ...
and async for audio that is longer than 60 seconds:
gcloud ml speech recognize-long-running AUDIO_FILE ...
The latter will return an OPERATION_ID instead of a transcript; later you can run
gcloud ml speech operations describe OPERATION_ID
to obtain results.
TIP: You can add --log-http flag to see what API requests gcloud is making to get more insight into what is going on at api level.
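If you prefer Python, the same sync/async distinction with the google-cloud-speech client looks roughly like this (a rough sketch; the GCS URI and audio settings below are just placeholders):

from google.cloud import speech

client = speech.SpeechClient()
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)
audio = speech.RecognitionAudio(uri="gs://my-bucket/audio.raw")  # placeholder URI

# Synchronous: blocks until the transcript comes back (audio < 60 s).
response = client.recognize(config=config, audio=audio)

# Asynchronous: returns a long-running operation immediately; the client
# polls it for you until the transcript is ready (audio can be much longer).
operation = client.long_running_recognize(config=config, audio=audio)
response = operation.result(timeout=300)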
I am aware of the limitations listed here, but I need some clarification on the quota limits.
I'm using the Node.js library to make a simple asynchronous speech-to-text API call, using a .raw file stored in my bucket.
After the request is done, when I check the API Manager traffic, the daily request counter has increased by 50 to 100 requests.
I am not using any kind of request libraries or other frameworks, just the code from the gcloud docs.
var file = URL.bucket + "audio.raw"; // require from upload.
speech.startRecognition(file, config).then((data) => {
    var operation = data[0];
    operation.on('complete', function(transcript) {
        console.log(transcript);
    });
});
I believe this has to do with the operation.on call, which registers a listener for the operation to complete, but continues to poll the service until it does finally finish.
I think, based on looking at some code, that you can change the interval at which the service will poll using the longrunning.initialRetryDelayMillis setting, which should reduce the number of requests you see in your quota consumption.
Some places to look:
Speech client constructor: https://github.com/GoogleCloudPlatform/google-cloud-node/blob/master/packages/speech/src/index.js#L70
GAX Speech Client's constructor: https://github.com/GoogleCloudPlatform/google-cloud-node/blob/master/packages/speech/src/v1/speech_client.js#L67
Operations client: https://github.com/googleapis/gax-nodejs/blob/master/lib/operations_client.js#L74
GAX's Operation: https://github.com/googleapis/gax-nodejs/blob/master/lib/longrunning.js#L304
I have the following setup:
The ember frontend is connected to a websocket server
The backend pushes records (real-time data) via websocket to the clients, as stringified JSON
The client receives the data and must now update the store with this newly received data
The problem I have is that I do not know how to process the raw JSON data to make it compatible with what is in the store. I can of course parse the JSON (JSON.parse), but that is only part of what the REST adapter does.
When doing a normal REST request, more or less what happens is that:
server generates reply -> REST adapter converts it -> it gets pushed to the store
But now, since I am not using the REST adapter to process this data (because this is not a request triggered on the client side, but a notification coming from the server side), I do not know how to trigger the normal processing that the REST adapter performs.
How can I trigger the REST adapter programmatically? Can I pass it the stringified JSON coming from the websockets server?
Is it possible to hook the REST adapter to a generic websockets callback, where the only thing I have is the stringified JSON coming from the websockets server?
This is the code that I have (inspired by web2py):
function connect_websocket() {
    console.log('connect_websocket > connecting to server');
    var callback = function(e) {
        var data = JSON.parse(e.data);
        console.log('Data received > data=%o', data);
        // TODO:
        // - process the data as the REST adapter would, and push new / updated records to the store
        // - handle record deletes too (how?)
    };
    if (!$.web2py.web2py_websocket('ws://127.0.0.1:8888/realtime/mygroup', callback)) {
        alert('html5 websocket not supported by your browser, try Google Chrome');
    }
}
I have taken a look at EmberSockets, but as far as I understand it does not offer a generic way of updating records in the store, only a very specialized way of updating properties in the controllers (which also requires a lot of configuration).
What I am looking for is a generic way of triggering the Ember REST adapter from a websockets server. Is there such a thing?