I'm trying to play two WAV files at the same time using ALSA. Note that the WAV files have different sample rates. This works in the sense that the audio streams are mixed and sent to the audio chip (I'm developing on an embedded Linux device), but one stream is played a couple of times faster than normal, so I suspect a resampling problem.
I have a default device with the dmix plugin enabled in /etc/asound.conf and have set its sample rate to 44100 Hz. But to my understanding, ALSA resamples all streams internally to 48 kHz and mixes them before downsampling again to my desired output rate, in my case 44.1 kHz.
Is this correct?
When using alsa-lib to play audio files, do I need to set all parameters for that specific WAV file?
For example, for an 8000 Hz, mono, 16-bit file:
snd_pcm_hw_params_set_rate() to 8000 Hz
snd_pcm_hw_params_set_format() to 16-bit LE/BE, signed/unsigned
snd_pcm_hw_params_set_channels() to 1 (mono)
Does this change the hardware settings for the device or only for this specific audio stream?
Any clarification would be appreciated.
EDIT:
I might have misinterpreted the following: [ALSA]
When software mixing is enabled, ALSA is forced to resample everything to the same frequency (48000 by default when supported). dmix uses a poor resampling algorithm which produces noticeable sound quality loss.
So to be clear: if I change the rate of the dmix device in asound.conf to 44100, everything should automatically be resampled to 44100 and mixed?
Thus the incorrect speed of one of my two mixed audio files is probably caused by incorrect stream settings in my alsa-lib code?
Because if I play one WAV file at a time, both streams sound correct.
It's only when the first one is playing and I mix the second one into the stream at the same time that the speed of the first WAV file changes. Note that the hw settings are the same at that moment. Why does setting hw parameters on (and playing) stream 2 change something in stream 1?
ALSA does not resample everything to a fixed 48 kHz rate.
A dmix device uses a fixed sample rate and format, but all the devices using it typically use the plug plugin to enable automatic conversions.
When using alsa-lib, you must set all parameters that are important to you; for any parameters not explicitly set, alsa-lib chooses a somewhat random value.
Different streams can use different parameters.
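For example, a minimal sketch of configuring one playback stream (the device name, file format and error handling are illustrative; each snd_pcm_t handle carries its own parameters, so nothing changes for other streams or for the hardware globally):

#include <alsa/asoundlib.h>

/* Open the dmix-backed default device for ONE stream and configure it for an
 * 8000 Hz, mono, 16-bit signed little-endian file. The plug layer converts to
 * whatever rate/format the dmix slave is fixed at; other streams are unaffected. */
static snd_pcm_t *open_stream_8k_mono(void)
{
    snd_pcm_t *pcm = NULL;
    if (snd_pcm_open(&pcm, "default", SND_PCM_STREAM_PLAYBACK, 0) < 0)
        return NULL;

    snd_pcm_hw_params_t *hw;
    snd_pcm_hw_params_alloca(&hw);
    snd_pcm_hw_params_any(pcm, hw);

    snd_pcm_hw_params_set_access(pcm, hw, SND_PCM_ACCESS_RW_INTERLEAVED);
    snd_pcm_hw_params_set_format(pcm, hw, SND_PCM_FORMAT_S16_LE);
    snd_pcm_hw_params_set_channels(pcm, hw, 1);

    unsigned int rate = 8000;                     /* the rate of THIS file */
    snd_pcm_hw_params_set_rate_near(pcm, hw, &rate, NULL);

    if (snd_pcm_hw_params(pcm, hw) < 0) {         /* applies to this handle only */
        snd_pcm_close(pcm);
        return NULL;
    }
    return pcm;  /* then snd_pcm_writei() the file's frames and snd_pcm_drain() */
}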
With Qt 6.4.x (Windows), how can I capture microphone audio, repackage it, and forward the repackaged audio to a QUdpSocket?
The repackaging will involve converting the captured audio from its typical 16-bit little-endian format to a 24-bit big-endian format, where each packet has a constant-size payload, potentially different in size from what comes from the microphone. I am not sure, but somehow I think I need to replace the QAudioSink with a QAudioDecoder, as its description indicates:
The QAudioDecoder class is a high level class for decoding audio media files. It is similar to the QMediaPlayer class except that audio is provided back through this API rather than routed directly to audio hardware.
I have a partially working example that, among other things, sends synthesized audio directly to the speaker. This functionality is based on the 'Audio Output Example' that ships with Qt 6 (my modified example sends a sine-wave tone to the speakers).
Also, in this RtpWorker thread, using the 'Audio Source Example' for inspiration, I was able to capture and intercept audio packets from the microphone, but I do not know how to send these packets (repackaged as above) to a UDP socket in fixed-size datagrams; for now I just log them. I think I need an intermediate circular buffer: the write side fills it with captured microphone audio, while the read side is called by a QAudioSink or QAudioDecoder in pull mode.
Per my comment above I think I might need to send them to a QAudioDevice so I can handle the packaging and sending over the network myself.
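For illustration, the repackaging step I have in mind might look roughly like this (untested sketch; the sample rate, datagram size, address and port are placeholder values):

#include <QAudioFormat>
#include <QAudioSource>
#include <QUdpSocket>
#include <QHostAddress>
#include <QIODevice>
#include <QByteArray>
#include <QObject>

// Hypothetical sketch: capture 16-bit LE audio, repackage as 24-bit BE,
// send fixed-size datagrams. Not the code attached to QTBUG-108383.
class CaptureForwarder : public QObject {
public:
    explicit CaptureForwarder(QObject *parent = nullptr) : QObject(parent) {
        QAudioFormat fmt;
        fmt.setSampleRate(48000);                     // placeholder
        fmt.setChannelCount(1);                       // placeholder
        fmt.setSampleFormat(QAudioFormat::Int16);     // 16-bit capture
        m_source = new QAudioSource(fmt, this);
        m_io = m_source->start();                     // pull from this QIODevice
        connect(m_io, &QIODevice::readyRead, this, &CaptureForwarder::onReadyRead);
    }

private:
    void onReadyRead() {
        const QByteArray pcm16 = m_io->readAll();
        const auto *samples = reinterpret_cast<const qint16 *>(pcm16.constData());
        const int n = pcm16.size() / 2;
        for (int i = 0; i < n; ++i) {
            const qint32 s24 = qint32(samples[i]) << 8;   // widen 16-bit to 24-bit
            m_fifo.append(char((s24 >> 16) & 0xff));      // big-endian byte order
            m_fifo.append(char((s24 >> 8) & 0xff));
            m_fifo.append(char(s24 & 0xff));
        }
        // Cut constant-size payloads out of the FIFO, independent of capture size.
        const int kDatagramSize = 960;                    // placeholder constant
        while (m_fifo.size() >= kDatagramSize) {
            m_udp.writeDatagram(m_fifo.left(kDatagramSize),
                                QHostAddress("192.0.2.1"), 5004); // placeholders
            m_fifo.remove(0, kDatagramSize);
        }
    }

    QAudioSource *m_source = nullptr;
    QIODevice *m_io = nullptr;
    QByteArray m_fifo;
    QUdpSocket m_udp;
};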
My code is contained in two attachments to QTBUG-108383.
It would be great if someone could point to some useful examples that try to do something similar.
Try running it on macOS or Linux; it seems to be a Windows bug.
I've been stuck on this problem for weeks now and Google is no help, so hopefully some here can help me.
I am programming a software sound mixer in C++, getting audio packets from the network and Windows microphones, mixing them together as PCM, and then sending them back out over the network and to speakers/USB headsets. This works. I have a working setup using the PortAudio library to handle the interface with Windows. However, my supervisors think the latency could be reduced between this software and our system, so in an attempt to lower latency (and better handle USB headset disconnects) I'm now rewriting the Windows interface layer to directly use WASAPI. I can eliminate some buffers and callbacks doing this, and theoretically use the super low latency interface if that's still not fast enough for the higher ups.
I have it only partially working now, and the partially part is what is killing me here. Our system has the speaker and headphones as three separate mono audio streams. The speaker is mono, and the headset is combined from two streams to be stereo. I'm outputting this to windows as two streams, one for a device of the user's choice for speaker, and one of another device of the user's choice for headset. For testing, they're both outputting to the default general stereo mix on my system.
I can hear the speaker perfectly fine, but I cannot hear the headset, no matter what I try. Both use the same code path, and both go through a WMF resampler to convert to 2-channel audio at the sample rate Windows wants, yet I only ever hear the speaker stream, never the headset.
It's not an exclusive mode problem: I'm using shared mode on all streams, and I've even specifically tried cutting down the streams to only the headset, in case one was stomping the other or something, and still the headset has no audio output.
It's not a mixer problem upstream, as I haven't modified any code from when it worked with PortAudio streams. I can see the audio passing through the mixer and to the output via my debug visualizers.
I can see the data going into the buffer I get from the system, when the system calls back to ask for audio. I should be hearing something, static even, but I'm getting nothing. (At one point, I bypassed the ring buffer entirely and put random numbers directly into the buffer in the callback and I still got no sound.)
What am I doing wrong here? It seems like Windows itself is the problem or something, but I don't have the expertise on Windows APIs to know what, and I'm apparently the most expert for this stuff in my company. I haven't even looked yet as to why the microphone input isn't working, and I've been stuck on this for weeks now. If anyone has any suggestions, it'd be much appreciated.
Check the resampled streams: output the stereo stream to the speaker, and output the mono stream to the headset.
Use IAudioClient::IsFormatSupported to check the supported formats for the headset.
Verify your code using an mp3 file. Use two media players to play different files with different devices simultaneously.
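A rough sketch of that IsFormatSupported check against the default render endpoint (device selection, the requested format and the error handling are placeholders):

#include <windows.h>
#include <mmdeviceapi.h>
#include <audioclient.h>
#include <cstdio>

// Ask WASAPI whether the shared-mode engine accepts the format we intend to
// feed the headset stream with; if not, it reports the closest match.
void CheckHeadsetFormat()
{
    CoInitializeEx(nullptr, COINIT_MULTITHREADED);

    IMMDeviceEnumerator *enumr = nullptr;
    CoCreateInstance(__uuidof(MMDeviceEnumerator), nullptr, CLSCTX_ALL,
                     __uuidof(IMMDeviceEnumerator), (void **)&enumr);

    IMMDevice *device = nullptr;                 // placeholder: default endpoint,
    enumr->GetDefaultAudioEndpoint(eRender, eConsole, &device);  // not the user's pick

    IAudioClient *client = nullptr;
    device->Activate(__uuidof(IAudioClient), CLSCTX_ALL, nullptr, (void **)&client);

    WAVEFORMATEX want = {};
    want.wFormatTag = WAVE_FORMAT_PCM;
    want.nChannels = 2;                          // assumption: resampler output
    want.nSamplesPerSec = 48000;                 // assumption
    want.wBitsPerSample = 16;
    want.nBlockAlign = want.nChannels * want.wBitsPerSample / 8;
    want.nAvgBytesPerSec = want.nSamplesPerSec * want.nBlockAlign;

    WAVEFORMATEX *closest = nullptr;
    HRESULT hr = client->IsFormatSupported(AUDCLNT_SHAREMODE_SHARED, &want, &closest);
    if (hr == S_OK)
        printf("format accepted as-is\n");
    else if (hr == S_FALSE)
        printf("not accepted; engine suggests %lu Hz, %u channels\n",
               closest->nSamplesPerSec, closest->nChannels);
    else
        printf("IsFormatSupported failed: 0x%08lx\n", (unsigned long)hr);

    if (closest) CoTaskMemFree(closest);
    client->Release(); device->Release(); enumr->Release();
    CoUninitialize();
}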
I'm trying to get and store the data from an IP camera and I'd appreciate some high level advice as to the best way to do this.
So far I've successfully initiated an RTSP conversation with the camera and have it sending me UDP packets with RTP payloads. But I'm unsure where to go from here.
I'm very happy to do the work, I'd just appreciate some pointers / a high level overview of the steps so I can deconstruct the project!
There is no direct answer to the OP's question here, as it is a bit broad, and without further information about what the OP intends to do with the data it is difficult to give a precise answer. What I can do here is suggest steps that may be taken and problems to consider.
OP had stated:
So far I've successfully initiated an RTSP conversation with the camera and have it sending me UDP packets with RTP payloads. But I'm unsure where to go from here.
Now that you have established communication with the camera and are able to receive data packets from the video stream, it is a matter of understanding what the RTP payloads are and how to interpret that data. At this point you will have to do your research on the RTP protocol, which is a network protocol. Once you have written your structures and functions to work with this protocol, it is a matter of breaking the UDP packets down into useful bytes of information. Normally, when it comes to processing graphic, video or audio data, whether from a file directly or from a stream object, it is accompanied by some type of header information. Next, it is a matter of understanding this header information, which normally takes the form of a structure describing what type of content the file or stream holds, so that you know how many bytes of information to extract from it.
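As a concrete illustration of that first step: the RTP fixed header defined in RFC 3550 is 12 bytes (version, padding/extension flags, CSRC count, marker, payload type, sequence number, timestamp, SSRC), and the payload starts after the header plus any CSRC entries. A minimal POSIX sketch, with the port number as a placeholder (header extensions are not handled here):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>

// Receive one UDP datagram and pick apart the 12-byte RTP fixed header (RFC 3550).
// Port 5004 is only an example; use whatever ports were negotiated via RTSP SETUP.
int main()
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    sockaddr_in addr = {};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(5004);
    bind(sock, (sockaddr *)&addr, sizeof(addr));

    uint8_t buf[2048];
    ssize_t n = recvfrom(sock, buf, sizeof(buf), 0, nullptr, nullptr);
    if (n < 12) { close(sock); return 1; }            // too short to be RTP

    unsigned version      = buf[0] >> 6;              // should be 2
    unsigned csrc_count   = buf[0] & 0x0f;
    unsigned marker       = (buf[1] >> 7) & 0x01;     // often marks the last packet of a frame
    unsigned payload_type = buf[1] & 0x7f;            // e.g. a dynamic PT for H.264
    unsigned seq          = (buf[2] << 8) | buf[3];
    uint32_t timestamp    = (uint32_t(buf[4]) << 24) | (uint32_t(buf[5]) << 16) |
                            (uint32_t(buf[6]) << 8)  |  uint32_t(buf[7]);
    uint32_t ssrc         = (uint32_t(buf[8]) << 24) | (uint32_t(buf[9]) << 16) |
                            (uint32_t(buf[10]) << 8) |  uint32_t(buf[11]);

    size_t header_len = 12 + 4 * csrc_count;          // payload begins here
    printf("v=%u pt=%u seq=%u ts=%u ssrc=%08x m=%u payload=%zd bytes\n",
           version, payload_type, seq, (unsigned)timestamp, (unsigned)ssrc, marker,
           (ssize_t)(n - (ssize_t)header_len));

    // buf + header_len .. buf + n is the RTP payload, e.g. an H.264 NAL unit or
    // fragment that still has to be depacketized (RFC 6184) before it is a decodable frame.
    close(sock);
    return 0;
}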
I know it's not going to just be a case of saving the RTP payload directly to a file, but what other steps are involved?
The steps involved may vary depending on your needs and what you intend to do with the information. Are you trying to write the general properties of the video content to a file, such as its compression type, its audio/video codec types, its resolution and frame rate, its byte rate, etc.? Or are you trying to write the actual video content itself to a file that your application will use for playback or editing? This all depends on your intentions.
Is the data compressed, in which case I have to decompress it?
At this point, once you have successfully interpreted the RTP protocol and parsed the data packets by understanding their header information and saving it into a proper structure, it is a matter of using that header information to determine what is actually within the stream. For example, according to the PDF you supplied about the camera's properties, the video compression can be one of two types, H.264 or MJPEG; you will have to determine which from the information provided in the header, and from there either branch your code to read and parse each type of compression, or accept only the one you are willing to work with and disregard the other. Next is the audio compression, if you are concerned about the audio; the types available are AAC (encoding only), G.711 A-Law and G.711 U-Law, and the same mechanism applies. Once you are able to get past the audio and video compression, you will then need vital information about the video itself, such as the resolution and frame rate (buffer sizes) stored in the header, so that you know how many bytes to read from the stream and how far to move your pointer through it. Notice that for resolution and frame rate there are different acceptable formats for each type of compression being used:
H.264
1920 x 1080 (2.1 MP) @ 30 fps (1080p)
1280 x 720 @ 60 fps (720p)
720 x 480/576 @ 30/25 fps (D1)
704 x 480/576 @ 30/20 fps (4CIF)
352 x 240/288 @ 30/25 fps (CIF)
MJPEG
720 x 480/576 @ 30/25 fps (D1)
704 x 480/576 @ 30/25 fps (4CIF)
352 x 240/288 @ 30/25 fps (CIF)
That covers resolution and frame rate, but the next thing to consider is that you are working with a video stream, so the above may not apply directly in your case. According to the video-stream capabilities listed for the camera, these are the types available that you will have to take into account:
Single-stream H.264 up to 1080p (1920 x 1080) @ 30 fps
Dual-stream H.264 and MJPEG
H.264: Primary stream programmable up to 1280 x 720 @ 25/20 fps
MJPEG: Secondary stream programmable up to 720 x 576 @ 25/20 fps
With these different types available for your camera, you have to take all of them into consideration. How you do so depends on the intentions of your application and what you intend to do with the information: you can write your program to accept all of these types, or only one type with a specific format. That is up to you.
Do I have to do any other modifications?
I don't think you would have any modifications to make unless your application intends to modify the video or audio data itself. If your application just reads the file for simple playback, then the answer is no, as long as all the appropriate information was saved properly and your parser for your custom file structure is able to read the file's contents and parse the data appropriately for general playback.
Where can I learn about what I'll need to do specific to this camera?
I don't think you need much more information about the camera itself; the PDF you linked in your question already gives you enough to go on. What you need from here is information and documentation about the specific protocols, packet types, compression and stream types, and a general search for these should suffice.
UDP
Do a Google search for c++ programming UDP Sockets for either Linux or Winsock.
RTP
Do a Google search for c++ programming RTP Packets
Video Compression
Do a Google search for both H.264 & MJPEG compression and structure information on stream objects.
Audio Compression
Do a Google search for each of AAC(encoding only), G.711 A-Law, G.711 U-Law if you are interested in the audio as well.
From there, once you have the valid specifications for these data structures as a stream object and have acquired the appropriate header information to determine which type and format the video content is saved in, you should be able to parse the data packets appropriately. How you save them or write them to a file depends on your intentions.
I have provided this as a guideline to help lead you in the right direction, in the same manner that a chemist, physicist, or engineer would approach a typical problem.
The general steps follow a scientific approach to the problem at hand. They are typically:
Assessing the Situation
Create either a Hypothesis or a Thesis about the Situation.
Gather the Known Facts
Determine the Unknowns
Draft a Model that Shows a Relationship Between the Known and Unknowns.
Perform both Research and Experimentation
Record or Log Events and Data
Analyze the Data
Draw a Conclusion
In the case of writing a software application the concept is similar, but the approach may vary: not all of the steps above may be needed, and some additional steps may be required. One such step in the application development cycle that is not found in the scientific approach is debugging. But the general guideline still applies. If you keep to this type of strategy, I am sure you will have the confidence to gather what you need and to use it, step by step, to achieve your goals.
I'm trying to get and store the data from a Cisco IPC camera, and I'd appreciate some high level advice as to the best way to do this.
You can probably use openRTSP to do this, which can also output to a file. With this approach you would have to write NO code. Implementing RTP, RTSP and RTCP correctly is complex and a lot of work. Should you have requirements that openRTSP doesn't meet, you can use the live555 libraries for RTSP/RTP/RTCP and write some minimal code to do something with the received video. The mailing list is very responsive provided that you ask "good" questions, and make sure you read the FAQ first.
I know it's not going to just be a case of saving the RTP payload directly to a file, but what other steps are involved?
You don't need to know this if you use openRTSP. If you use the live555 libraries directly, you'll be passed entire video frames that you would then have to decode or write to a file yourself, depending on what you want to achieve. If you DO need/want to know about RTP and RTP payload formats, read the corresponding RFCs, e.g. RFC 2326, RFC 3550, RFC 6184.
Is the data compressed, in which case I have to decompress it?
Generally you want to store compressed media in a file, and use media player software to decode it on playback (Otherwise you end up with huge files).
Where can I learn about what I'll need to do specific to this camera?
If you just want to save the video you ideally don't need to know anything about the camera, other than what standards it implements (which you already do).
I'm writing a program similar to StreamMyGame with the difference of the client being free and more importantly, open source, so I can port it to other devices (in my case an OpenPandora), or even make an html5 or flash client.
Because the objective of the program is to stream video games, latency should be reduced to a minimum.
Right now I can capture video of Direct3D 9 games at a fixed frame rate, encode it using libx264, dump it to disk, and send input remotely, but I'm stumped at sending the video, and eventually the audio, over the network.
I don't want to implement one approach only to discover that it introduces several seconds of delay, and I don't care how it is done as long as it is done.
Off the top of my head I can think of several ways:
My current way, encode video with libx264 and audio with lame or as ac3 and send them with live555 as a RTSP feed, though the library is not playing nice with MSVC and I’m still trying to understand its functioning.
Have the ffmpeg library do all the grunt work, where it encodes and sends (I guess I'll have to use ffserver to get an idea on how to do it)
Same but using libvlc, perhaps hurting encoding configurability in the process.
Using several pipes with independent programs (i.e., piping data to x264.exe or ffmpeg.exe)
Use other libraries such as pjsip or JRTPLIB that might simplify the process.
The hard way: sending video and audio through a UDP channel and figuring out how to synchronize everything at the client (though the reason to use RTSP is to avoid this).
Your way, if I didn't think of something.
The second option would really be the best, as it would reduce the number of libraries (integrating swscale, libx264, the audio codec and the sender library), simplify development and bring more codec variety (CELT looks promising), but I worry about latency, as it might have a longer pipeline.
100 ms would already be too much, especially when you consider you might be adding another 150 ms of latency when it is used over broadband.
Do any of you have experience with these libraries? Would you recommend switching to ffmpeg, keeping on wrestling with live555, or something else (even if I didn't mention it)?
I had very good results streaming large blocks of data with low latency using the UDT4 library. But first I would suggest checking ffmpeg's network capabilities, so that you have a native solution for all operations.
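On the ffmpeg side: libavformat can mux the x264 output straight to an MPEG-TS-over-UDP URL, so the whole send path stays inside ffmpeg. A rough sketch (the destination URL and stream parameters are placeholders, get_encoded_packet() is a stand-in for your libx264 output, and error checking is omitted):

extern "C" {
#include <libavformat/avformat.h>
}

// Sketch: push already-encoded H.264 packets out as MPEG-TS over UDP.
// get_encoded_packet() is a hypothetical stand-in for the encoder output.
static bool get_encoded_packet(AVPacket *pkt) { (void)pkt; return false; }

int main()
{
    const char *url = "udp://192.0.2.10:1234";      // placeholder client address

    avformat_network_init();

    AVFormatContext *ctx = nullptr;
    avformat_alloc_output_context2(&ctx, nullptr, "mpegts", url);

    AVStream *video = avformat_new_stream(ctx, nullptr);
    video->codecpar->codec_type = AVMEDIA_TYPE_VIDEO;
    video->codecpar->codec_id   = AV_CODEC_ID_H264;
    video->codecpar->width      = 1280;              // assumed capture size
    video->codecpar->height     = 720;

    avio_open(&ctx->pb, url, AVIO_FLAG_WRITE);
    avformat_write_header(ctx, nullptr);

    AVPacket *pkt = av_packet_alloc();
    while (get_encoded_packet(pkt)) {                // fills pkt->data/size/pts/dts
        pkt->stream_index = video->index;
        av_interleaved_write_frame(ctx, pkt);        // muxes and sends over UDP
        av_packet_unref(pkt);
    }

    av_write_trailer(ctx);
    av_packet_free(&pkt);
    avio_closep(&ctx->pb);
    avformat_free_context(ctx);
    avformat_network_deinit();
    return 0;
}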
I'm looking for a free, portable C or C++ library which allows me to play mono sound samples on specific channels in a 5.1 setup. For example, the sound should be played on the front-left speaker while all other speakers remain silent. Is there any library capable of doing this?
I had a look at OpenAL. However, I can only specify the position from which the sound should come, but it seems to me that I cannot say something like "use only the front left channel to play this sound".
Any hints are welcome!
I had a look at OpenAL. However, I can only specify the position from which the sound should come, but it seems to me that I cannot say something like "use only the front left channel to play this sound".
I don't think this is quite true. I think you can do it with OpenAL, although it's not trivial. OpenAL only does the positional stuff if you feed it mono data. If you give it stereo or higher, it plays the data the way it was provided. However, you're only guaranteed stereo support. You'll need to check whether the 5.1-channel format extension is available on your system (AL_FORMAT_51CHN16). If so, then I think you feed your sound to the channel you want and feed zeroes to all the other channels when you buffer the samples. Note that you need hardware support for this on the sound card; a "generic software" device won't cut it.
See this discussion from the OpenAL mailing list.
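A rough sketch of that approach, assuming the AL_EXT_MCFORMATS extension is present and an OpenAL device/context has already been created (the 5.1 channel order FL, FR, FC, LFE, RL, RR is an assumption based on that discussion):

#include <AL/al.h>
#include <cstdint>
#include <vector>

// Put a mono 16-bit sample on the front-left channel of a 5.1 buffer and keep
// the other five channels silent. Assumes an ALC device/context is already current,
// and assumes the channel order FL, FR, FC, LFE, RL, RR.
void PlayFrontLeft(const int16_t *mono, size_t frames, int sampleRate)
{
    if (!alIsExtensionPresent("AL_EXT_MCFORMATS"))
        return;                                        // no 5.1 buffer formats available

    ALenum fmt51 = alGetEnumValue("AL_FORMAT_51CHN16");

    std::vector<int16_t> interleaved(frames * 6, 0);   // zeroes = silence
    for (size_t i = 0; i < frames; ++i)
        interleaved[i * 6 + 0] = mono[i];              // slot 0 = front left

    ALuint buffer = 0, source = 0;
    alGenBuffers(1, &buffer);
    alBufferData(buffer, fmt51, interleaved.data(),
                 (ALsizei)(interleaved.size() * sizeof(int16_t)), sampleRate);

    alGenSources(1, &source);
    alSourcei(source, AL_BUFFER, (ALint)buffer);
    alSourcePlay(source);                              // multi-channel data bypasses 3D panning
}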
Alternatively, PortAudio is open source, cross-platform, and supports multi-channel output. You still have to interleave the data, so if you're sending a sound to a single channel, you have to send zeroes to all the others. You'll also still need to do some checking when opening a stream on a device to make sure the device supports 6 channels of output.
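Along the same lines, a minimal PortAudio sketch of the zero-interleaving (device capability checks omitted; channel index 0 is assumed to map to front left in the host's 5.1 layout):

#include <portaudio.h>
#include <cstring>

// Callback: copy the queued mono samples into channel 0 of each 6-channel
// frame and leave the other channels at zero.
struct MonoFeed { const float *data; unsigned long pos, len; };

static int callback(const void *, void *output, unsigned long frames,
                    const PaStreamCallbackTimeInfo *, PaStreamCallbackFlags, void *user)
{
    auto *feed = static_cast<MonoFeed *>(user);
    auto *out = static_cast<float *>(output);
    std::memset(out, 0, frames * 6 * sizeof(float));        // silence on all channels
    for (unsigned long i = 0; i < frames && feed->pos < feed->len; ++i, ++feed->pos)
        out[i * 6 + 0] = feed->data[feed->pos];             // front left only
    return feed->pos < feed->len ? paContinue : paComplete;
}

// Usage sketch: open the default device with 6 output channels.
void PlayFrontLeft(MonoFeed *feed, double sampleRate)
{
    Pa_Initialize();
    PaStream *stream = nullptr;
    Pa_OpenDefaultStream(&stream, 0 /*in*/, 6 /*out*/, paFloat32,
                         sampleRate, paFramesPerBufferUnspecified, callback, feed);
    Pa_StartStream(stream);
    while (Pa_IsStreamActive(stream) == 1)
        Pa_Sleep(50);
    Pa_CloseStream(stream);
    Pa_Terminate();
}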
A long time ago I used RtAudio. I cannot say whether this lib can do what you want to achieve, but maybe it helps.
http://fmod.org could do the trick too
I use the BASS Audio Library http://www.un4seen.com for all my audio, sound and music projects. I am very happy with it.
BASS is an audio library to provide developers with powerful and efficient sample, stream (MP3, MP2, MP1, OGG, WAV, AIFF, custom generated, and more via add-ons), MOD music (XM, IT, S3M, MOD, MTM, UMX), MO3 music (MP3/OGG compressed MODs), and recording functions. All in a tiny DLL, under 100KB* in size. C/C++, Delphi, Visual Basic, MASM, .Net and other APIs are available. BASS is available for the Windows, Mac, Win64, WinCE, Linux, and iOS platforms.
I have never used it to play different samples in a 5.1 configuration, but according to their own documentation it should be possible.
Main features
Samples Support for WAV/AIFF/MP3/MP2/MP1/OGG and custom generated samples
Sample streams Stream any sample data in 8/16/32 bit, with both "push" and "pull" systems
File streams MP3/MP2/MP1/OGG/WAV/AIFF file streaming
Internet file streaming Stream data from HTTP and FTP servers (inc. Shoutcast, Icecast & Icecast2), with IDN and proxy server support and adjustable buffering
Custom file streaming Stream data from anywhere using any delivery method, with both "push" and "pull" systems
Multi-channel Support for more than plain stereo, including multi-channel OGG/WAV/AIFF files
...
Multiple outputs Simultaneously use multiple soundcards, and move channels between them
Speaker assignment Assign streams and MOD musics to specific speakers to take advantage of hardware capable of more than plain stereo (up to 4 separate stereo outputs with a 7.1 soundcard)
3D sound Play samples/streams/musics in any 3D position
Licensing
BASS is free for non-commercial use. If you are a non-commercial entity (eg. an individual) and you are not making any money from your product (through sales, advertising, etc), then you can use BASS in it for free. Otherwise, one of the following licences will be required.
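For the speaker-assignment feature mentioned above, a rough sketch of what the BASS calls might look like (the file name is a placeholder, and this is based on the documented API rather than a tested 5.1 setup):

#include <windows.h>
#include "bass.h"

// Play a mono sample file on the front-left speaker only, using BASS's
// speaker-assignment flags. "beep.wav" is a placeholder file name.
int main()
{
    if (!BASS_Init(-1, 44100, 0, 0, NULL))        // -1 = default output device
        return 1;

    HSTREAM stream = BASS_StreamCreateFile(FALSE, "beep.wav", 0, 0,
                                           BASS_SPEAKER_FRONTLEFT);
    if (stream) {
        BASS_ChannelPlay(stream, FALSE);
        while (BASS_ChannelIsActive(stream) == BASS_ACTIVE_PLAYING)
            Sleep(10);                            // Windows Sleep; adjust per platform
        BASS_StreamFree(stream);
    }

    BASS_Free();
    return 0;
}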