Routing audio from microphone to network using Qt 6.4.x - C++

With Qt 6.4.x (Windows), how can I capture microphone audio, repackage it, and forward the repackaged audio to a QUdpSocket?
The repackaging involves converting the captured audio from its typical 16-bit little-endian format to 24-bit big-endian, where each outgoing packet has a constant-size payload that is potentially different from the payload size delivered by the microphone. I am not sure, but I think I need to replace the QAudioSink with a QAudioDecoder, as its description indicates:
The QAudioDecoder class is a high level class for decoding audio media files. It is similar to the QMediaPlayer class except that audio is provided back through this API rather than routed directly to audio hardware.
I have a partially working example that sends synthesized audio directly to the speakers. This functionality is based on the 'Audio Output Example' that ships with Qt 6 (my modified example sends a generated sine-wave tone to the speakers).
Also in this RtpWorker thread, using the 'Audio Source Example' for inspiration, I was able to capture and intercept audio packets from the microphone, but I do not know how to send these packets (repackaged as described above) to a UDP socket in fixed-size datagrams; for now I just log the captured packets. I think I need an intermediate circular buffer, whose write side is filled with captured microphone audio while its read side is drained by a QAudioSink or QAudioDecoder in pull mode.
Per my comment above, I think I might instead need to send the packets to a QAudioDevice so I can handle the packaging and sending over the network myself. The sketch below shows roughly what I have in mind.
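To make the intent concrete, here is a minimal, untested sketch of the capture/repackage/send path (Qt 6 Multimedia; kPayloadSize, the destination address, and the port are placeholder values):

#include <QAudioFormat>
#include <QAudioSource>
#include <QCoreApplication>
#include <QHostAddress>
#include <QMediaDevices>
#include <QUdpSocket>

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);

    QAudioFormat fmt;
    fmt.setSampleRate(48000);
    fmt.setChannelCount(1);
    fmt.setSampleFormat(QAudioFormat::Int16);   // 16-bit LE from the mic

    QAudioSource source(QMediaDevices::defaultAudioInput(), fmt);
    QUdpSocket socket;
    QByteArray pending;                  // intermediate buffer between capture and send
    const int kPayloadSize = 960;        // fixed datagram payload; placeholder value

    QIODevice *io = source.start();      // pull mode: read whenever data arrives
    QObject::connect(io, &QIODevice::readyRead, [&] {
        const QByteArray raw = io->readAll();
        // Repackage: each 16-bit LE sample becomes 3 bytes, big-endian.
        for (int i = 0; i + 1 < raw.size(); i += 2) {
            const quint16 u = quint8(raw[i]) | (quint16(quint8(raw[i + 1])) << 8);
            const qint32 w = qint32(qint16(u)) * 256;   // sign-extend, scale to 24 bits
            pending.append(char((w >> 16) & 0xFF));
            pending.append(char((w >> 8) & 0xFF));
            pending.append(char(w & 0xFF));
        }
        // Send only complete fixed-size payloads; keep the remainder buffered.
        while (pending.size() >= kPayloadSize) {
            socket.writeDatagram(pending.constData(), kPayloadSize,
                                 QHostAddress::LocalHost, 5004);
            pending.remove(0, kPayloadSize);
        }
    });
    return app.exec();
}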
My code is contained in two attachments on QTBUG-108383.
It would be great if someone could point to some useful examples that try to do something similar.

Try running it on macOS or Linux; it seems to be a Windows bug.

Related

embed video stream with custom metadata

I have an optical system that provides a UDP video stream.
From device specification FAQ:
Both a single metadata (KLV) stream and compressed video (H.264) with metadata (KLV) are available on the Ethernet link. Compressed video and metadata are coupled in the same stream, compliant with the STANAG 4609 standard. Each encoded video stream is encapsulated with the associated metadata within an MPEG-TS single program stream over Ethernet UDP/IP. The video and metadata are synchronized through the use of timestamps.
There are also other devices that provide data about the state of an aircraft (velocity, coordinates, etc.). This data should be displayed on a client GUI alongside the video, and of course it has to be synchronized with the current video frame.
One of the approaches I thought of is to embed this data into the video stream, but I am not sure whether that is possible, or whether I should use a protocol other than UDP for this purpose.
Is it possible/reasonable to use such an approach? Is the ffmpeg library suitable in this case?
If not, what are the other ways to synchronize data with a video frame?
Latency is crucial, although bandwidth is limited to 2-5 Mbps.
It seems to be possible using ffmpeg: an AVPacket can be given additional data using the function av_packet_add_side_data, which takes a preallocated buffer, a size, and a type from the AVPacketSideDataType enum.
However, I am not sure for now which enum value of AVPacketSideDataType can be used for custom user-provided binary data.
Something similar that might be used for my needs:
How do I encode KLV packets to an H.264 video using libav*
The quote sounds like you have a transport stream containing two elementary streams (the H.264 video in one, and the KLV data in another). The transport stream is sent over UDP (or TCP, or it is just a file, whatever you want - it is mostly independent of the transport).
There is a discussion of implementing this kind of thing in the Motion Imagery Handbook (which you can download from the MISB part of the NSG Registry at https://nsgreg.nga.mil/misb.jsp - it is towards the bottom of the Non-cited Standards Documents table) and in detail in ST 1402 (which you can find in the same table). I'm avoiding direct links because the versions change - just look for whatever is current.
The short version is that you can embed the timestamp in the video (see ST 0603 and ST 0604), and then correlate that to the metadata timestamp (Precision Time Stamp, see ST 0601). You don't want to do that at the AVPacket level though. Instead, you need to put side data into AVFrame, with the AV_FRAME_DATA_SEI_UNREGISTERED key (https://ffmpeg.org/doxygen/trunk/group__lavu__frame.html#ggae01fa7e427274293aacdf2adc17076bca4f2dcaee18e5ffed8ff4ab1cc3b326aa). You will need a fairly recent FFmpeg version.
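For illustration, here is a minimal sketch of attaching a metadata blob to a frame as an unregistered SEI message before it goes to the encoder. It assumes FFmpeg >= 4.4; the UUID value and the helper name are made-up examples:

extern "C" {
#include <libavutil/frame.h>
}
#include <cstdint>
#include <cstring>

// SEI unregistered payloads start with a 16-byte UUID identifying the data;
// this value is an arbitrary example, not a registered identifier.
static const uint8_t kUuid[16] = {
    0x7f, 0x1c, 0x9a, 0x42, 0x33, 0x55, 0x47, 0x0b,
    0x91, 0x06, 0x2d, 0xe4, 0xc1, 0x5e, 0x8a, 0x70
};

bool attachMetadata(AVFrame *frame, const uint8_t *klv, size_t klvSize)
{
    // Encoders that support it (e.g. libx264) emit this side data
    // as an unregistered SEI NAL unit in the bitstream.
    AVFrameSideData *sd = av_frame_new_side_data(
        frame, AV_FRAME_DATA_SEI_UNREGISTERED, 16 + klvSize);
    if (!sd)
        return false;
    memcpy(sd->data, kUuid, 16);
    memcpy(sd->data + 16, klv, klvSize);
    return true;
}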
Note: if all you want to do is see the UDP data stream - video on one side, and decoded KLV on the other, then you might like to check out the jMISB Viewer application: https://github.com/WestRidgeSystems/jmisb
It also provides an example of encoding (generator example). Disclaimer: I contribute to the project.

VoIP: How to capture the live audio/video streaming bytes from the camera in Qt Multimedia?

The intention here is to capture the audio + video bytes from the camera, then optimise them with an appropriate Qt class (suggestions welcome) and send them over TCP to a server. The server sends those bytes back to another client to be played. This is how we intend to establish basic VoIP (Voice/Video over Internet Protocol).
I checked many Qt APIs, but couldn't find one that provides a ready-made utility for this. Some Qt forums suggest using third-party libraries, and other SO questions don't address my specific issue.
I don't want to capture those bytes into a temporary file first and then read them from there, as that is not efficient compared to getting the bytes in memory.
Questions:
Are there any APIs available in Qt that allow capturing the live streaming bytes?
If not, then what are the alternatives in C++ for cross platforms?
Found this in the Qt documentation; it seems to match your case.
The QCamera class provides an interface for system camera devices. QCamera can be used with QCameraViewfinder for viewfinder display, QMediaRecorder for video recording and QCameraImageCapture for image taking. You can use QCameraInfo to list available cameras and choose which one to use.
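A minimal sketch following that documentation, using the Qt 5-era classes it names plus QVideoProbe to tap the live frame bytes in memory (probing is backend-dependent, so setSource() may return false on some platforms):

#include <QApplication>
#include <QCamera>
#include <QCameraInfo>
#include <QCameraViewfinder>
#include <QDebug>
#include <QVideoFrame>
#include <QVideoProbe>

int main(int argc, char *argv[])
{
    QApplication app(argc, argv);

    // QCameraInfo lists the available cameras; use the default one here.
    QCamera camera(QCameraInfo::defaultCamera());

    QCameraViewfinder viewfinder;
    camera.setViewfinder(&viewfinder);
    viewfinder.show();

    // Tap the live frames without any temporary file.
    QVideoProbe probe;
    if (probe.setSource(&camera)) {
        QObject::connect(&probe, &QVideoProbe::videoFrameProbed,
                         [](const QVideoFrame &frame) {
            // The mapped frame bytes could be encoded and sent over TCP here.
            qDebug() << "captured frame:" << frame.size();
        });
    }

    camera.start();
    return app.exec();
}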

Full quality MP3 streaming via webRTC

I'm interested in webRTC's ability to P2P-livestream MP3 audio from a user's machine. The only example that I found is this: https://webrtc-mp3-stream.herokuapp.com/ from this article http://servicelab.org/2013/07/24/streaming-audio-between-browsers-with-webrtc-and-webaudio/
But, as you can see, the audio quality on the receiving side is pretty poor (45 kb/sec). Is there any way to get full-quality MP3 streaming, plus the ability to manipulate the stream's data (like adjusting frequencies with an equalizer) on each user's side?
If that's impossible through webRTC, are there any other Flash-plugin or plugin-less options for this?
Edit: I also stumbled upon these 'shoutcast kinda' guys http://unltd.fm/, who declare that they use webRTC to deliver top-quality radio broadcasting, including streaming MP3. If they do, then how?
WebRTC supports two audio codecs: OPUS (max bitrate 510 kbit/s) and G.711. Stick with OPUS; it is modern and more promising, introduced in 2012.
The main files in webrtc-mp3-stream are outdated by two years (Jul 18, 2013). I couldn't find an OPUS preference in the code, so the demo possibly runs via G.711.
The webrtc-mp3-stream demo does the encoding job (MP3 as a media source), then transmits the data over UDP/TCP via WebRTC. I do not think you need to decode it back to MP3 on the receiving side; that would be overkill. Just try to enable OPUS to bring the webrtc-mp3-stream code up to date.
Please refer to 'Is there a way to choose codecs in WebRTC PeerConnection?' for how to enable OPUS and see the difference.
I'm the founder of unltd.fm.
igorpavlov is right, but I can't comment on his answer. We also use the OPUS (stereo / 48 kHz) codec over WebRTC.
Decoding the MP3 (or any other audio format) using Web Audio and then encoding it in OPUS is the way to go. You "just" need to force the SDP negotiation to use OPUS.
You should have sent us an email; you would have saved your 50 points ;)
You can increase the quality of a stream by setting the SDP to stereo and increasing the maxaveragebitrate:
// Create the answer, then extend the Opus fmtp parameters with stereo
// and a higher target bitrate before applying the local description.
let answer = await peer.conn.createAnswer(offerOptions);
answer.sdp = answer.sdp.replace('useinbandfec=1', 'useinbandfec=1; stereo=1; maxaveragebitrate=510000');
await peer.conn.setLocalDescription(answer);
This should output a SDP string which looks like this:
a=fmtp:111 minptime=10;useinbandfec=1; stereo=1; maxaveragebitrate=510000
This gives a potential maximum bitrate of 510 kb/s for stereo, i.e. 255 kb/s per channel. The actual bitrate depends on the speed of your network and the strength of your signal.
You can read more about the other available SDP attributes at: https://www.rfc-editor.org/rfc/rfc7587

Reading audio stream to output device

I was curious whether there is a way to read the data that is being sent to an audio output. My end goal is to capture that audio and then send it over serial for audio processing. I'm using a Windows computer.
What seems to make this more difficult is that I'm not reading captured microphone input, but rather the streamed speaker output.
Can anybody help me out?
A more or less easy way is to take advantage of the Stereo Mix device, where available. This gives you an audio capture device that exposes the device's mixed-down audio output. You can read from this device as if it were a real audio input device, such as Line In or a microphone, using standard and well-documented APIs or audio libraries.
Other options are more sophisticated and require both hooking into the system and a deeper understanding of the internals: you either hook the audio APIs to intercept what applications send to the audio outputs, or you install a virtual audio device that applications use and from which the data is then available to you.
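As a concrete illustration of the standard-API route, here is a rough sketch of capturing the speaker mix on Windows via WASAPI loopback, a stock alternative the answer doesn't name that works without a Stereo Mix device (error handling omitted for brevity):

#include <audioclient.h>
#include <mmdeviceapi.h>
#include <windows.h>

int main()
{
    CoInitializeEx(nullptr, COINIT_MULTITHREADED);

    IMMDeviceEnumerator *enumerator = nullptr;
    CoCreateInstance(__uuidof(MMDeviceEnumerator), nullptr, CLSCTX_ALL,
                     __uuidof(IMMDeviceEnumerator), (void **)&enumerator);

    // Loopback reads from the default *render* endpoint (the speakers).
    IMMDevice *device = nullptr;
    enumerator->GetDefaultAudioEndpoint(eRender, eConsole, &device);

    IAudioClient *client = nullptr;
    device->Activate(__uuidof(IAudioClient), CLSCTX_ALL, nullptr,
                     (void **)&client);

    WAVEFORMATEX *format = nullptr;
    client->GetMixFormat(&format);
    client->Initialize(AUDCLNT_SHAREMODE_SHARED, AUDCLNT_STREAMFLAGS_LOOPBACK,
                       10000000 /* 1 s buffer in 100-ns units */, 0,
                       format, nullptr);

    IAudioCaptureClient *capture = nullptr;
    client->GetService(__uuidof(IAudioCaptureClient), (void **)&capture);
    client->Start();

    for (;;) {          // simple poll loop; real code would use events
        UINT32 frames = 0;
        capture->GetNextPacketSize(&frames);
        while (frames != 0) {
            BYTE *data = nullptr;
            DWORD flags = 0;
            capture->GetBuffer(&data, &frames, &flags, nullptr, nullptr);
            // 'data' holds 'frames' frames of the speaker mix; forward
            // them over the serial link here.
            capture->ReleaseBuffer(frames);
            capture->GetNextPacketSize(&frames);
        }
        Sleep(5);
    }
}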

streaming video to and from multiple sources

I wanted to get some ideas on how some of you would approach this problem.
I've got a robot that runs Linux and uses a webcam (with a v4l2 driver) as one of its sensors. I've written a control panel with gtkmm. Both the server and the client are written in C++. The server is the robot; the client is the "control panel". The image analysis happens on the robot, and I'd like to stream the video from the camera back to the control panel for two reasons:
A) for fun
B) to overlay image analysis results
So my question is: what are some good ways to stream video from the webcam to the control panel while giving priority to the robot code that processes it? I'm not interested in writing my own video compression scheme and putting it through the existing network port; a new network port dedicated to video data would be best, I think. The second part of the problem is how to display video in gtkmm. The video data arrives asynchronously, and I don't have control over main() in gtkmm, so I think that would be tricky.
I'm open to using things like vlc, gstreamer or any other general compression libraries I don't know about.
thanks!
EDIT:
The robot has a 1 GHz processor running a desktop-like version of Linux, but no X11.
GStreamer solves nearly all of this for you with very little effort, and it also integrates nicely with the GLib event system. GStreamer includes V4L source plugins, GTK+ output widgets, various filters to resize/encode/decode the video, and best of all, network sinks and sources to move the data between machines.
For prototyping, you can use the 'gst-launch' tool to assemble video pipelines and test them; then it's fairly simple to create the pipelines programmatically in your code, as in the sketch below. Search for 'GStreamer network streaming' to see examples of people doing this with webcams and the like.
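For example, a minimal sketch built on gst_parse_launch() (GStreamer 1.x C API; the encoder settings, host, and port are placeholders):

#include <gst/gst.h>

int main(int argc, char *argv[])
{
    gst_init(&argc, &argv);

    // The same description you would hand to gst-launch-1.0 on the robot:
    // webcam -> low-latency H.264 -> RTP -> UDP to the control panel.
    GError *error = nullptr;
    GstElement *pipeline = gst_parse_launch(
        "v4l2src ! videoconvert ! x264enc tune=zerolatency bitrate=512 "
        "! rtph264pay ! udpsink host=192.168.1.10 port=5000", &error);
    if (!pipeline) {
        g_printerr("Pipeline error: %s\n", error->message);
        return 1;
    }

    gst_element_set_state(pipeline, GST_STATE_PLAYING);

    // Block until an error or end-of-stream message arrives on the bus.
    GstBus *bus = gst_element_get_bus(pipeline);
    GstMessage *msg = gst_bus_timed_pop_filtered(bus, GST_CLOCK_TIME_NONE,
        (GstMessageType)(GST_MESSAGE_ERROR | GST_MESSAGE_EOS));
    if (msg)
        gst_message_unref(msg);

    gst_element_set_state(pipeline, GST_STATE_NULL);
    gst_object_unref(bus);
    gst_object_unref(pipeline);
    return 0;
}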
I'm not sure about the actual technologies used, but this can end up being a huge synchronization ***** if you want to avoid dropped frames. I was streaming a video to a file and the network at the same time. What I eventually ended up doing was using a big circular buffer with three pointers: one write and two read. There were three control threads (and some additional encoding threads): one writing to the buffer, which would pause if it reached a point in the buffer not yet read by both of the others, and two reader threads that would read from the buffer and write to the file/network (and pause if they got ahead of the producer). Since everything was written and read as frames, sync overhead could be kept to a minimum. A sketch of this buffer follows at the end of this answer.
My producer was a transcoder (from another file source), but in your case you may want the camera to produce whole frames in whatever format it normally does, and only do the transcoding (with something like ffmpeg) for the server, while the robot processes the image.
Your problem is a bit more complex, though, since the robot needs real-time feedback and so can't pause and wait for the streaming server to catch up. So you might want to get frames to the control system as fast as possible, and buffer some of them up in a separate circular buffer for streaming to the "control panel". Certain codecs handle dropped frames better than others, so if the network gets behind you can start overwriting frames at the end of the buffer (taking care they're not being read).
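A minimal sketch of the one-writer/two-reader ring buffer described above; the Frame type, capacity, and blocking policy are illustrative, and for the robot case the writer would overwrite old frames instead of pausing:

#include <algorithm>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <vector>

using Frame = std::vector<uint8_t>;

class FrameRing {
public:
    explicit FrameRing(size_t capacity) : buf_(capacity) {}

    // Writer: blocks while the slowest reader is still a full lap behind.
    void push(Frame f) {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] {
            return write_ - std::min(read_[0], read_[1]) < buf_.size();
        });
        buf_[write_ % buf_.size()] = std::move(f);
        ++write_;
        cv_.notify_all();
    }

    // Reader id 0 (file) or 1 (network): blocks until a frame is ready.
    Frame pop(int id) {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return read_[id] < write_; });
        Frame f = buf_[read_[id] % buf_.size()];   // copy: both readers see it
        ++read_[id];
        cv_.notify_all();
        return f;
    }

private:
    std::vector<Frame> buf_;
    size_t write_ = 0;
    size_t read_[2] = {0, 0};
    std::mutex m_;
    std::condition_variable cv_;
};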
When you say 'a new video port' and then start talking about vlc/gstreamer, I find it hard to work out what you want. Obviously these software packages will assist in streaming and compressing via a number of protocols, but clearly you'll need a 'network port', not a 'video port', to send the stream.
If what you really mean is sending display output via a wireless video/TV feed, that's another matter; however, you'll need advice from hardware experts rather than software experts on that.
Moving on: I've done plenty of streaming over MMS/UDP protocols, and vlc handles it very well (as both server and client). However, it's designed for desktops and may not be as lightweight as you want. Something like gstreamer, mencoder, or ffmpeg, on the other hand, is going to be better, I think. What kind of CPU does the robot have? You'll need a bit of grunt if you're planning real-time compression.
On the client side I think you'll find a number of widgets to handle video in GTK. I would look into that before worrying about interface details.