I'm currently building a project on C++ using Visual Studio on Windows 8. This application captures video from camera and triggers some virtual animations in real-time, with some sounds being played along with the animations.
The user has the option to record the experience in video and sound. I already am able to record video, now I want to create a audio track of the sounds that are being played by the application, to later fuse both video and audio files.
So, which is the best way to record audio output from an application in windows?
Let me stress that I do NOT want to record audio from any input devices (such as a microphone), only from the application itself.
Best regards.
There is no recording of application output. If you generate audio on your own, you make a copy for the recording purposes, mix if you have multiple sources, and then use one of the APIs to produce a file depending on your preferences: directly writing a WAV file, Windows Media audio files (ASF/WMA), DriectShow, Media Foundation, third party libraries.
Real playback audio data is being mixed and sent for further playback. Sometimes you can enable loopback recording to capture fully mixed output (not just of specific application through) as if it is a capture from realtime audio input device.
Related
I'm right now reading the microsoft documentation about drivers and core audio apis. At the moment I'm still confuse which way to go to achieve what I need.
I have an audio application which is Standalone and coded with framework JUCE in C++. And I need to build a Windows solution that would capture the audio stream that is going to an audio endpoint device to use it as an input of my audio application.
This stream must have an unaltered volume: always 1.0 (no matter if the hardware volume is changed or muted).
I must be able to choose between the different endpoint devices, for exemple if I have an external soundcard that is plugged, my audio application should be able to intercept and copy the stream that is going to that external soundcard, or do the same for the stream that is going to the built-in speakers.
The idea is to capture the output streams before they are modified by hardware volume modifications, and make a copy of them routed to my application without changing the output routing and behaviour.
The microsoft documentation is very furnished, but even if the WASAPI provides a lot of ways to capture and stream from audio endpoint devices, I'm not sure it is possible to get an unaltered volume, as it will always capture what's exactly coming out of the speakers.
This is why I don't know If I can implement a feature directly in my audio application that will get the streams I want with WASAPIs or if I have to code a proper Audio Driver that would make a copy of the streams I want for my application to be able to use these streams.
The documentations I refer to:
Audio Drivers design guide
Core Audio APIs / WASAPI
Thanks for the help,
Best,
Maxime
Sometimes the volume control is implemented in software, and sometimes it is implemented in hardware. You can call IAudioEndpointVolume::QueryHardwareSupport to see if the volume control for the audio endpoint you're working with is implemented in hardware or software.
Sometimes the audio loopback is implemented in software, and sometimes it is implemented in hardware. There is no API to tell which.
If the audio loopback is implemented in software, and the volume control is implemented in hardware, then you will get back the data you want.
If the audio loopback is implemented in hardware, or the volume control is implemented in software, the the audio data you get back has already had the volume adjustment applied.
What does your application do with the audio data it receives? The primary use case for audio loopback data is echo cancelation, where you usually WANT the volume to be applied.
I'm a beginner when it comes to programming and I wanted to do a personal project in C++ to develop my skills. The project I had in mind involves playing audio on my laptop (running Windows 10), analyzing it, and sending data to an arduino that will change the color and brightness of LED lights in sync with the audio that's playing. I would like it so that I can simply, for example, just play a song on Spotify or a music video on Youtube etc. and the program will get data from that audio stream as an input. Elsewhere I've seen programs use audio from recorded WAV files or streams from a microphone as input, but not what I have in mind. I want to use this program for parties, so using a microphone as a workaround wouldn't be ideal.
Is this even possible? And if so how should I approach this problem? Are there certain APIs I should look to or what? If the program gets audio as the input, would I still be able to play music on something like a bluetooth speaker as well? Or can it only send data to one place at a time?
My roommate who is much better at programming than me accomplished this on Mac using Swift, and while I don't have a Mac, would using Linux instead make this easier?
Modern windows has “Stereo Mix” recording device, just for that. Here’s how to enable: https://technicalustad.com/enable-stereo-mix-in-windows-10/
After that setup, in your C++ program use any recording API you want.
Here’s a sample that does what you ask for, opens a recording device, starts recording, and sends audio samples to the class provided in the argument: https://learn.microsoft.com/en-us/windows/win32/coreaudio/capturing-a-stream You probably want to trade CPU time for latency for your application, i.e. don’t Sleep for hnsActualDuration/REFTIMES_PER_MILLISEC/2, change into Sleep( 0 ) or Sleep( 1 )
In my Android app, I use Android NDK to play music by doing the following:
extract audio samples from an OGG file using the Vorbis library
process the audio samples
redirect the processed samples to the audio output using the Oboe library
In order to avoid underruns, I do the first 2 steps in a separate thread, to extract and process the sound a little bit in advance (it adds a bit of latency, but this is not a problem for my app). That solution works great on every device I've tested so far.
But for some reason, when I pair my device with a bluetooth speaker and when I play music, I have what seems to be underruns on some devices like Samsung S7 or Nokia 1 (but not on every device).
This bug looks so random to me that I don't know where to start. It acts like the bluetooth connection is using quite a lot of CPU so my app doesn't have enough resource to run properly.
Does anyone have experienced something similar? Should I do anything in my code to handle the bluetooth connection so it doesn't use CPU (for example to avoid audio resampling)?
Thanks for your help.
Android + Bluetooth audio is a world of pain. The major thing to appreciate about Bluetooth is the audio sink runs at a rate independent of other audio devices, which is why the native mediaplayer will do things like display video in accordance with whatever rate the attached audio device consumes samples, essentially slaving itself to the clock of the BT audio device. If you want to drive the speed from Android (i.e. SystemClock timebase) you'll need to use a timestretching AudioTrack. (This can be done, but driver support is erratic and overall system stability tanks).
Firstly, you want to eliminate the devices themselves being problems. Can you play the ogg files in a media player to a Bluetooth speaker from the S7 or Nokia 1 without problems? If so, it's your code!
It sounds to me like the speaker is consuming samples faster than the device is producing them, for whatever reason. Basically check your callbacks to make sure whenever the audio subsystem requests more data you are actually providing it. Be sure to drive your decoding pipeline according to the callbacks being made and not the system clock or any other assumptions about timing.
Finally, Bluetooth audio, at least A2DP, as opposed to directly streaming MP3, is going to require some processing to recompress the audio as it is sent out, but those devices should have plenty of headroom for this, maybe even special DSPs. I've done it with 1080P video playback at the same time before, and it starts to fall apart with two videos at once!
I have a Windows native desktop app (C++/Delphi), and I'm successfully using Directshow to display live video in it from a 'local' video capture device.
The next thing I want to do is display video from a 'remote' capture device, streamed over the LAN.
To stream the video, I guess I can use something like Expression Encoder or VLC, but I'm not sure what's the easiest way to receive/decode the streamed video. Inserting an ActiveX VLC or Flash player might be one option (although the licensing may be an issue then), but I was wondering if there's any way to achieve this with Directshow...
Application needs to run on XP, and the video decoding should ideally be royalty free.
Suggestions, please!
Using Directshow for receiving and displaying your video can work, but the simplicity, "openness" and performances will depend on the video format and streaming method you'll be using.
A lot of open/free source filters exist for RTSP (e.g. based on live555), but you may also find that creating your own source filter is a better fit.
The best solution won't be the same for H264 diffusion through RTP/RTSP and for MJPEG diffusion through simple UDP.
I would like to know is there any way I create an application which can intercept all the audio that is being played back on the computer, so I can process the audio (apply some effect) and then pass it further to the Windows audio subsystem?
I just had a glimpse in Vista/7 WASAPI, there is this sAPO:
http://www.microsoft.com/whdc/device/audio/sysfx.mspx
but it seems that I cannot create my sAPO and install it anywhere I like - I need a WHQL drivers for this.
Is there any universal way to do that?
I have an experience with DirectSound but I haven't seen any useful info about intercepting the audio streams.
If you're loading a custom sAPO, you're globally affecting the sound for a system. This is going to require signing. From this article:
The audio engine does not load
unsigned sAPOs into the audio
processing graph. So while you are
testing your sAPO, you must disable
the protected process for Audiodg.exe.
To disable the protected process, set
the value of the
DisableProtectedAudioDG registry key
to '1'.