Can I directly interact with audio endpoints on Windows? - C++

I am trying to write a pro music/audio processing application, and I would like to be able to interact with the audio inputs/outputs at a very low level - ideally something allowing me to apply effects to the audio inputs and play the result back in real time, similar to programs like Logic, Ableton, etc.
I have written a pretty basic program that detects audio endpoint devices and can change their volumes using the MMDevice interface, but this is nowhere near the functionality I would like.
I have learned from the Microsoft docs that the four Core Audio APIs are:
MMDevice
WASAPI
DeviceTopology
EndpointVolume
but it doesn't seem like any of these have the capabilities that I need. I'm thinking that I will need to be able to interact with the speakers at the level of setting the position of the membrane at a given time.
Is this even possible? If so, what can I use to do this?

The Windows Audio Session API (WASAPI) is the best bet for this purpose. It lets you interact with audio endpoints and set up audio streams, which are streams of data that you can send or receive in real time. A good example is here.
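For illustration, here is a minimal sketch of opening a shared-mode render stream on the default output endpoint (error handling and COM cleanup are omitted, and the 10 ms buffer duration is just an example):

    // Sketch: open a shared-mode WASAPI render stream on the default
    // output device. Error handling and cleanup are omitted for brevity.
    #include <windows.h>
    #include <mmdeviceapi.h>
    #include <audioclient.h>

    int main() {
        CoInitializeEx(nullptr, COINIT_MULTITHREADED);

        IMMDeviceEnumerator* enumerator = nullptr;
        CoCreateInstance(__uuidof(MMDeviceEnumerator), nullptr, CLSCTX_ALL,
                         __uuidof(IMMDeviceEnumerator), (void**)&enumerator);

        IMMDevice* device = nullptr;
        enumerator->GetDefaultAudioEndpoint(eRender, eConsole, &device);

        IAudioClient* client = nullptr;
        device->Activate(__uuidof(IAudioClient), CLSCTX_ALL, nullptr,
                         (void**)&client);

        WAVEFORMATEX* format = nullptr;
        client->GetMixFormat(&format);  // the shared-mode engine format

        // 10 ms buffer; REFERENCE_TIME is in 100-nanosecond units.
        client->Initialize(AUDCLNT_SHAREMODE_SHARED, 0, 100000, 0, format, nullptr);

        IAudioRenderClient* render = nullptr;
        client->GetService(__uuidof(IAudioRenderClient), (void**)&render);

        client->Start();
        // From here, loop on render->GetBuffer / render->ReleaseBuffer,
        // writing your processed samples into each period's buffer.
    }

For the lowest latency (closer to what Logic or Ableton achieve), the same IAudioClient can instead be initialized in exclusive mode, which bypasses the system mixer.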

Related

Windows Audio Driver vs. WASAPI

I'm currently reading the Microsoft documentation about drivers and the Core Audio APIs. At the moment I'm still confused about which way to go to achieve what I need.
I have a standalone audio application written in C++ with the JUCE framework, and I need to build a Windows solution that captures the audio stream going to an audio endpoint device so it can be used as an input to my application.
This stream must have an unaltered volume: always 1.0 (regardless of whether the hardware volume is changed or muted).
I must be able to choose between the different endpoint devices: for example, if an external sound card is plugged in, my audio application should be able to intercept and copy the stream going to that external sound card, or do the same for the stream going to the built-in speakers.
The idea is to capture the output streams before they are modified by hardware volume modifications, and make a copy of them routed to my application without changing the output routing and behaviour.
The Microsoft documentation is extensive, but even though WASAPI provides many ways to capture and stream from audio endpoint devices, I'm not sure it is possible to get the unaltered volume, as it always captures exactly what is coming out of the speakers.
This is why I don't know if I can implement this feature directly in my audio application using WASAPI, or if I have to write a proper audio driver that makes a copy of the streams I want so that my application can use them.
The documentation I refer to:
Audio Drivers design guide
Core Audio APIs / WASAPI
Thanks for the help,
Best,
Maxime
Sometimes the volume control is implemented in software, and sometimes it is implemented in hardware. You can call IAudioEndpointVolume::QueryHardwareSupport to see if the volume control for the audio endpoint you're working with is implemented in hardware or software.
Sometimes the audio loopback is implemented in software, and sometimes it is implemented in hardware. There is no API to tell which.
If the audio loopback is implemented in software, and the volume control is implemented in hardware, then you will get back the data you want.
If the audio loopback is implemented in hardware, or the volume control is implemented in software, then the audio data you get back has already had the volume adjustment applied.
What does your application do with the audio data it receives? The primary use case for audio loopback data is echo cancellation, where you usually WANT the volume to be applied.
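To make the checks above concrete, here is a sketch (error handling omitted) that queries hardware support for the volume control on an endpoint and then opens a loopback capture stream on it; the device would come from the usual IMMDeviceEnumerator selection:

    // Sketch: check whether the endpoint's volume control is implemented
    // in hardware, then open a WASAPI loopback capture stream on it.
    #include <windows.h>
    #include <mmdeviceapi.h>
    #include <audioclient.h>
    #include <endpointvolume.h>
    #include <cstdio>

    void CheckAndCapture(IMMDevice* device) {
        IAudioEndpointVolume* volume = nullptr;
        device->Activate(__uuidof(IAudioEndpointVolume), CLSCTX_ALL, nullptr,
                         (void**)&volume);

        DWORD support = 0;
        volume->QueryHardwareSupport(&support);
        if (support & ENDPOINT_HARDWARE_SUPPORT_VOLUME)
            std::printf("volume control is implemented in hardware\n");
        else
            std::printf("volume control is implemented in software\n");

        // Loopback capture: a capture client opened on a *render*
        // endpoint, initialized with the loopback flag.
        IAudioClient* client = nullptr;
        device->Activate(__uuidof(IAudioClient), CLSCTX_ALL, nullptr,
                         (void**)&client);

        WAVEFORMATEX* format = nullptr;
        client->GetMixFormat(&format);
        client->Initialize(AUDCLNT_SHAREMODE_SHARED, AUDCLNT_STREAMFLAGS_LOOPBACK,
                           100000, 0, format, nullptr);

        IAudioCaptureClient* capture = nullptr;
        client->GetService(__uuidof(IAudioCaptureClient), (void**)&capture);
        client->Start();
        // Loop on capture->GetBuffer / capture->ReleaseBuffer to read the
        // mixed output; per the caveats above, the volume may already be applied.
    }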

capture audio from specific program [duplicate]

I am taking my first dive into the WASAPI system on Windows, and I do not know if what I want is even possible with the Windows API.
I am attempting to write a program that will record the sound from various programs and break each into a separate recorded track/audio file. From the research I have done, I know the unit I need to record is the individual audio sessions being rendered to an endpoint, and the normal way of recording is to take the render endpoint and perform a loopback. However, from what I have read so far on MSDN, the only interaction with sessions I can have is through IAudioSessionControl, and that does not provide me with a way to get a copy of the stream for a session.
Am I missing something that would allow me to do this with WASAPI (or some other Windows API) and get the individual sessions (or individual streams) before they are mixed together to form the endpoint stream, or is this an impossible goal?
The mixing takes place inside the API (WASAPI), and you don't have access to the buffers of other audio clients, especially since they don't exist in the context of the current process in the first place. Perhaps your best option (not a good one, but there are no better alternatives) would be to hook the API calls and intercept the data on its way to WASAPI, if the task in question permits dirty tricks like this.
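For reference, session-level access through WASAPI really does stop at enumeration and control; a sketch (error handling omitted) of walking the sessions on the default render endpoint shows that IAudioSessionControl exposes state and metadata but no audio buffers:

    // Sketch: enumerate the audio sessions on the default render endpoint.
    // Nothing on IAudioSessionControl returns sample data.
    #include <windows.h>
    #include <mmdeviceapi.h>
    #include <audiopolicy.h>

    void ListSessions() {
        CoInitializeEx(nullptr, COINIT_MULTITHREADED);

        IMMDeviceEnumerator* enumerator = nullptr;
        CoCreateInstance(__uuidof(MMDeviceEnumerator), nullptr, CLSCTX_ALL,
                         __uuidof(IMMDeviceEnumerator), (void**)&enumerator);

        IMMDevice* device = nullptr;
        enumerator->GetDefaultAudioEndpoint(eRender, eConsole, &device);

        IAudioSessionManager2* manager = nullptr;
        device->Activate(__uuidof(IAudioSessionManager2), CLSCTX_ALL, nullptr,
                         (void**)&manager);

        IAudioSessionEnumerator* sessions = nullptr;
        manager->GetSessionEnumerator(&sessions);

        int count = 0;
        sessions->GetCount(&count);
        for (int i = 0; i < count; ++i) {
            IAudioSessionControl* session = nullptr;
            sessions->GetSession(i, &session);
            AudioSessionState state;
            session->GetState(&state);  // active / inactive / expired
            session->Release();
        }
    }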

Create audio buffer from application's audio interface

Using PortAudio, how can I access running applications' audio interface so that I can capture the audio they produce in real time? The goal would then be to send this audio as UDP packets to a server.
I've had a look at PortAudio's code samples but can't find anything similar.
Maybe PortAudio is not the right library for me?
I'm working mainly on Mac OS.
Core Audio does not have the sort of functionality you're looking for. Processes are sandboxed/isolated from one another.
You could probably achieve this using library injection, but there are a number of complications. OS X has added System Integrity Protection (SIP), which blocks such injection. If you're willing to disable SIP (which is dangerous! proceed at your own risk!), you could try something like mach_inject and intercept the target process's calls to Core Audio. But you'd never be able to ship something like this, since asking users to disable SIP is not reasonable.

Kernel Streaming User Mode driver

I want to write a Kernel Streaming driver (that operates in user mode). I've been doing some research, and it looks like for USB devices I'll want to use AVStream, and for other devices I'll use PortCls. I can communicate with AVStream (I think) through some of the functionality defined in KSProxy. However, I have not found a way to communicate with PortCls. Can anybody offer some insight?
Also, I don't know a whole lot about Kernel Streaming. I just know what I have researched, so any advice would be helpful.
The first functionality I want to implement is device management. I want to be able to manage devices (e.g., get device properties and take control of devices). I'm not sure of the best way to do this. Also, I want to be able to choose which device I use for audio streaming.
Also, eventually I want to be able to support playback and recording, but I'm sure that by the time I'm ready to implement that, I will know which path to take.

Is there a simple and direct way of using audio as an output for a program?

I want to try some C and C++ programming with audio processing, such as synthesizers, chorus, delay, etc., but I only know how to work with a console as output. Instead of a console application, I wish to have a window capable of sending an audio signal to the speakers, running my code in the background and working with it in a way similar to printf: every time I call the "output function", it sends a sample value to the speakers (or sound card), indicating the current oscillator position. This output operation could be executed every time it is requested, or at the end of a built-in loop. Doing all this at a high sample rate would be just great.
I think I could do all this using an AudioWorker in the Web Audio API, plus a flexible GUI on an HTML5 canvas, but I'm new to this API and I'm not sure whether the resulting sound quality is good enough.
Thanks in advance.
Edit: I use Windows 8.1, but any answer for other platform is welcome.
Edit2: Any programming languages other than C, C++ or JavaScript suggestions are also welcome.
I do lots of sound synthesis with the Web Audio API and I think it sounds great. JavaScript is really all you need. Well, if you want to use audio files, you need a web server to serve those audio files, but the audio synthesis all happens in JavaScript.
It doesn't matter so much what OS you use, but different browsers have different levels of support for the Web Audio API. Chrome tends to have the best support, and Internet Explorer definitely has the worst support.
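For the C/C++ route the question originally asked about, the legacy waveOut interface (winmm) on Windows is about the closest thing to the printf-style model described: you compute samples, put them in a buffer, and hand the buffer to the device. A minimal sketch playing one second of a 440 Hz sine (the format, amplitude, and blocking wait are illustrative):

    // Sketch: blocking, buffer-at-a-time playback with the legacy waveOut API.
    // Each computed sample is the "oscillator position" from the question.
    #include <windows.h>
    #include <mmsystem.h>
    #include <cmath>
    #include <vector>
    #pragma comment(lib, "winmm.lib")

    int main() {
        const int rate = 44100;
        WAVEFORMATEX fmt = {};
        fmt.wFormatTag = WAVE_FORMAT_PCM;
        fmt.nChannels = 1;
        fmt.nSamplesPerSec = rate;
        fmt.wBitsPerSample = 16;
        fmt.nBlockAlign = fmt.nChannels * fmt.wBitsPerSample / 8;
        fmt.nAvgBytesPerSec = rate * fmt.nBlockAlign;

        HWAVEOUT out = nullptr;
        waveOutOpen(&out, WAVE_MAPPER, &fmt, 0, 0, CALLBACK_NULL);

        // One second of a 440 Hz sine, computed sample by sample.
        std::vector<short> samples(rate);
        for (int i = 0; i < rate; ++i)
            samples[i] = (short)(10000 * std::sin(2.0 * 3.14159265 * 440.0 * i / rate));

        WAVEHDR hdr = {};
        hdr.lpData = (LPSTR)samples.data();
        hdr.dwBufferLength = (DWORD)(samples.size() * sizeof(short));
        waveOutPrepareHeader(out, &hdr, sizeof(hdr));
        waveOutWrite(out, &hdr, sizeof(hdr));

        Sleep(1100);  // crude wait for playback to finish
        waveOutUnprepareHeader(out, &hdr, sizeof(hdr));
        waveOutClose(out);
    }

For continuous synthesis you would queue several smaller WAVEHDR buffers and refill each one as it completes, which approximates the "output a sample whenever requested" loop at the cost of some buffering latency.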