Detect audio output + suggestion for a speech synthesis library - c++

I would like to detect if my PC is playing any kind of audio (music/movie anything).
I wrote an app which notifies me of my Twitter updates. Now I would like to add speech synthesis to it, but I only want to read out the messages if there is no music or movie playing.
Although I'm using C#, I don't mind doing the detection in C++ and then integrating it.
So the questions are:
1) How can I detect audio output?
2) What is the best free speech synthesis library for windows?

After some time looking through MSDN, I have found the solution.
Using loopback recording, you can listen to whatever is being sent to the audio output device.
http://msdn.microsoft.com/en-gb/library/windows/desktop/dd316551(v=vs.85).aspx
This link also refers to an example of how to capture the stream:
http://msdn.microsoft.com/en-gb/library/windows/desktop/dd370800(v=vs.85).aspx
There you can get the buffer data, as shown in the example, by calling:
pCaptureClient->GetBuffer(...)
All you have to do then is check the value of those bytes: if they are all zeros, nothing is playing.
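Here is a rough sketch of that check, assuming pCaptureClient is the IAudioCaptureClient from the MSDN capture example, initialized with AUDCLNT_STREAMFLAGS_LOOPBACK, and frameSize is the nBlockAlign of the mix format (error handling omitted):

    #include <audioclient.h>

    // Returns true if the next available loopback packet contains only silence.
    bool IsBufferSilent(IAudioCaptureClient *pCaptureClient, UINT32 frameSize)
    {
        BYTE  *pData = NULL;
        UINT32 framesAvailable = 0;
        DWORD  flags = 0;

        HRESULT hr = pCaptureClient->GetBuffer(&pData, &framesAvailable, &flags, NULL, NULL);
        if (FAILED(hr) || framesAvailable == 0)
            return true;                                // nothing captured at all

        // The engine may flag the packet as silent, which saves us the scan.
        bool silent = (flags & AUDCLNT_BUFFERFLAGS_SILENT) != 0;
        if (!silent)
        {
            silent = true;
            for (UINT32 i = 0; i < framesAvailable * frameSize; ++i)
            {
                if (pData[i] != 0) { silent = false; break; }
            }
        }

        pCaptureClient->ReleaseBuffer(framesAvailable);
        return silent;
    }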
For the speech synthesis I used the .NET SpeechSynthesizer class:
http://msdn.microsoft.com/en-us/library/system.speech.synthesis.speechsynthesizer.aspx
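Since the rest of the thread is C++-oriented: the native equivalent, which System.Speech wraps, is SAPI's ISpVoice interface. A minimal sketch (COM error handling omitted):

    #include <sapi.h>

    int wmain()
    {
        ::CoInitialize(NULL);

        ISpVoice *pVoice = NULL;
        HRESULT hr = ::CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL,
                                        IID_ISpVoice, (void **)&pVoice);
        if (SUCCEEDED(hr))
        {
            // Speaks synchronously with the default voice.
            pVoice->Speak(L"You have a new tweet.", SPF_DEFAULT, NULL);
            pVoice->Release();
        }

        ::CoUninitialize();
        return 0;
    }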

Related

How to send audio data playing on PC to C++ program as input

I'm a beginner when it comes to programming and I wanted to do a personal project in C++ to develop my skills. The project I have in mind involves playing audio on my laptop (running Windows 10), analyzing it, and sending data to an Arduino that will change the color and brightness of LED lights in sync with the audio that's playing. I would like to be able to simply play a song on Spotify or a music video on YouTube, for example, and have the program take that audio stream as its input. Elsewhere I've seen programs use audio from recorded WAV files or streams from a microphone as input, but that's not what I have in mind. I want to use this program for parties, so using a microphone as a workaround wouldn't be ideal.
Is this even possible? And if so, how should I approach the problem? Are there certain APIs I should look at? If the program takes the audio as input, would I still be able to play the music on something like a Bluetooth speaker as well, or can the audio only be sent to one place at a time?
My roommate, who is a much better programmer than I am, accomplished this on a Mac using Swift. I don't have a Mac; would using Linux instead make this easier?
Modern Windows has a "Stereo Mix" recording device just for that. Here's how to enable it: https://technicalustad.com/enable-stereo-mix-in-windows-10/
After that setup, use any recording API you want in your C++ program.
Here's a sample that does what you ask for: it opens a recording device, starts recording, and sends the audio samples to the class provided as an argument: https://learn.microsoft.com/en-us/windows/win32/coreaudio/capturing-a-stream For your application you probably want to trade CPU time for lower latency, i.e. instead of sleeping for hnsActualDuration / REFTIMES_PER_MILLISEC / 2, use Sleep( 0 ) or Sleep( 1 ).
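For illustration, here is roughly what that sample's main loop looks like with the shortened sleep (identifiers such as pCaptureClient and pMySink are the ones used in the sample; error checks trimmed):

    // Capture loop: pull every pending packet, then yield briefly.
    while (bDone == FALSE)
    {
        Sleep(1);   // instead of hnsActualDuration / REFTIMES_PER_MILLISEC / 2

        UINT32 packetLength = 0;
        hr = pCaptureClient->GetNextPacketSize(&packetLength);

        while (packetLength != 0)
        {
            BYTE  *pData;
            UINT32 numFramesAvailable;
            DWORD  flags;

            // Get the next packet from the shared capture buffer.
            hr = pCaptureClient->GetBuffer(&pData, &numFramesAvailable, &flags, NULL, NULL);

            if (flags & AUDCLNT_BUFFERFLAGS_SILENT)
                pData = NULL;   // treat this packet as silence

            // Hand the samples to your analysis / LED code.
            hr = pMySink->CopyData(pData, numFramesAvailable, &bDone);

            hr = pCaptureClient->ReleaseBuffer(numFramesAvailable);
            hr = pCaptureClient->GetNextPacketSize(&packetLength);
        }
    }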

Pi 3 USB Microphone; accessing it programmatically in C/C++

I am doing a speech processing project with a Raspberry Pi 3 (running Raspbian) using a USB microphone. I can see the microphone show up as a selectable audio device for the Pi, and it produces/captures sound perfectly.
I cannot figure out how to use this in my code. I have done a ton of research on this and found some tutorials, but nothing that makes sense. I come from more of a hardware background and have done something like this with microcontrollers, where I hook up an actual mic and convert the analog signal to digital on I/O pins; I am so frustrated with this that I am about to pump the data over from an Arduino using a mic and A/D conversion.
My questions:
1) How do I access a USB data stream or USB device in C or C++? My Linux abilities are not the best. Do I open a serial connection, or open a file stream on "/dev/USB/...."? Can you provide a code example?
2) Regardless of the fidelity of the USB mic input, I want to know how to access that input in C/C++. I have been looking at ALSA but cannot really get my head around its complexity. Is there something that gives me access to a raw input signal on the USB port that I can process (to extract frequency, amplitude, etc.)?
I have already gone through a lot of the similar posts on here; I am really stuck on this one. I'm looking to understand what is going on from the OS perspective; I'll use whatever library is suggested, but I want to understand how it works.
Thanks!
An update:
I did all of my code in C with some .sh scripts. I went ahead and figured out how to use the ALSA asoundlib (asound.h specifically). As of now, I am able to generate and record sound via a USB mic/headset with my Pi 3. Doing so is rather arduous, but link (1) below is useful.
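For anyone who lands here, a minimal asoundlib capture loop might look like the following (the device name "plughw:1,0" and the 16 kHz mono S16_LE format are just example assumptions; build with -lasound):

    #include <alsa/asoundlib.h>
    #include <stdio.h>

    int main(void)
    {
        snd_pcm_t *pcm;
        if (snd_pcm_open(&pcm, "plughw:1,0", SND_PCM_STREAM_CAPTURE, 0) < 0) {
            fprintf(stderr, "cannot open capture device\n");
            return 1;
        }

        /* Simple blocking setup: signed 16-bit LE, 1 channel, 16000 Hz, 0.5 s latency. */
        snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_LE,
                           SND_PCM_ACCESS_RW_INTERLEAVED,
                           1, 16000, 1, 500000);

        short buf[1600];                        /* 100 ms of audio per read */
        for (int i = 0; i < 50; ++i) {          /* capture about 5 seconds */
            snd_pcm_sframes_t n = snd_pcm_readi(pcm, buf, 1600);
            if (n < 0)
                n = snd_pcm_recover(pcm, n, 0); /* recover from overruns */
            /* buf[0..n-1] now holds raw samples: amplitude, FFT, etc. go here. */
        }

        snd_pcm_close(pcm);
        return 0;
    }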
For my project, I also found a CMU tutorial/repo for their PocketSphinx audio-recognition device at link (2), with a video at link (3). That project uses the ALSA asoundlib as well and was a great help to me. It takes a while to download, and you need to crawl through its .sh scripts to figure out its gcc linking, but I am now able to give audio cues which are interpreted by my Pi 3 and pushed to the speaker output and GPIO pins.
Links:
(1) http://www.alsa-project.org/alsa-doc/alsa-lib/_2test_2pcm_8c-example.html
(2) https://wolfpaulus.com/embedded/raspberrypi2-sr/
(3) https://www.youtube.com/watch?v=5kp5qpwVh_8

Grabber for splitting in UWP

I need your advice. I'd like to develop an app for audio/video splitting with a Metro interface.
Usually I use DirectShow for this with the following scheme: create a grabber, add it to the DirectShow graph, capture the audio/video streams with it, and pass them to my AVStream drivers for splitting. But in the new program I want to use Media Foundation and build it as a UWP app.
Here is how I see my new app: it must have a Metro interface for common controls (choice of sources, setting parameters, changing modes, etc.). I'd like to use the MediaCapture class for capturing the streams and rendering them too; I don't see any problems there, since MSDN has many samples for it. But I have no idea how to insert a grabber between the source and the renderer.
The grabber will perform these operations:
Receive the input stream from MediaCapture.
Convert the stream: YUV -> RGB, adding effects, etc.
Send the output stream for rendering (MediaCapture) and to my AVStream driver for splitting to other apps (Skype, Adobe Flash Player, Edge, ...).
How to make the grabber? On MSDN I found three ways:
Sample Grabber Sink (https://msdn.microsoft.com/en-us/library/windows/desktop/hh184779(v=vs.85).aspx). No problem receiving/controlling/sending the stream in an MF DLL, but I don't know how to link that DLL with MediaCapture.
Source Reader (https://msdn.microsoft.com/en-us/library/windows/desktop/dd940436(v=vs.85).aspx). The same problems, plus the Source Reader doesn't work for playback.
A custom MFT? In any case, MediaCapture can connect to an MFT via AddEffectAsync().
My environment: MS Windows 10, MS Visual Studio Community 2015.
Thank you for any ideas.
This question and UWP are no longer relevant for me. I found the following:
"Some apps can work intensively in background, for example it maybe video converting, online financial data processing and more.
Now UWP application will suspended when it go offscreen."
https://wpdev.uservoice.com/forums/110705-universal-windows-platform/suggestions/9950598-exclude-suspending-in-desktop
So if the user minimizes the program window, the program's video stream stops.

Record directshow audio device to file

I've stumbled through some code to enumerate my microphone devices (with some help), and am able to grab the "friendly name" and "clsid" information from each device.
I've done some tinkering with GraphEdit (graphedt.exe) to try to figure out how I can take audio from DirectShow and write it to a file (I'm not currently concerned about the format; WAV should be fine), but can't seem to find the right combination.
One of the articles I've read linked to this Windows SDK sample, but when I examined the code I ended up pretty confused about how to use it, i.e. setting the output file or specifying which audio capture device to use.
I also came across a CodeGuru article that has a nicely featured audio recorder, but it does not have an interface for selecting the audio device, and I can't seem to find where it statically picks which recording device to use.
I think I'd be most interested in figuring out how to use the Windows SDK sample, but any explanation on either of the two approaches would be fantastic.
Edit: I should mention that my knowledge and ability as a Win32 COM programmer are very low on the scale, so if this is easy, just explain it to me like I'm five, please.
Recording audio to a file with DirectShow requires you to build the right filter graph, as you have already figured out. The parts are:
The capture device itself, which you instantiate via its moniker (not via CLSID!); it typically delivers PCM
A multiplexer component that converts the stream into the container format
The File Writer filter, which takes a file-compatible stream and writes it to a file
The tricky part is #2, since there is no standard component available. The Windows SDK samples, however, contain the missing piece: the WavDest Filter Sample. Once you build it and make it ready for use, you can build a graph that records from the device into a .WAV file.
Your graph chains these three parts together (audio capture -> WavDest -> File Writer), and it's easy to build programmatically as well.
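A rough sketch of that construction (COM error handling omitted; CLSID_WavDest below is a placeholder, since the real GUID is defined in the WavDest sample's sources and the filter must be built and registered first):

    #include <dshow.h>

    // Placeholder: take the actual GUID from the WavDest sample's sources.
    static const GUID CLSID_WavDest =
        { 0x00000000, 0x0000, 0x0000, { 0, 0, 0, 0, 0, 0, 0, 0 } };

    HRESULT RecordToWav(IMoniker *pDeviceMoniker, const wchar_t *outputFile)
    {
        IGraphBuilder *pGraph = NULL;
        ICaptureGraphBuilder2 *pBuild = NULL;
        CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC_SERVER,
                         IID_IGraphBuilder, (void **)&pGraph);
        CoCreateInstance(CLSID_CaptureGraphBuilder2, NULL, CLSCTX_INPROC_SERVER,
                         IID_ICaptureGraphBuilder2, (void **)&pBuild);
        pBuild->SetFiltergraph(pGraph);

        // 1. The capture device, instantiated from its moniker.
        IBaseFilter *pSource = NULL;
        pDeviceMoniker->BindToObject(NULL, NULL, IID_IBaseFilter, (void **)&pSource);
        pGraph->AddFilter(pSource, L"Audio Capture");

        // 2. The WavDest multiplexer from the SDK sample.
        IBaseFilter *pWavDest = NULL;
        CoCreateInstance(CLSID_WavDest, NULL, CLSCTX_INPROC_SERVER,
                         IID_IBaseFilter, (void **)&pWavDest);
        pGraph->AddFilter(pWavDest, L"WavDest");

        // 3. The File Writer, pointed at the output .wav file.
        IBaseFilter *pWriter = NULL;
        CoCreateInstance(CLSID_FileWriter, NULL, CLSCTX_INPROC_SERVER,
                         IID_IBaseFilter, (void **)&pWriter);
        IFileSinkFilter *pSink = NULL;
        pWriter->QueryInterface(IID_IFileSinkFilter, (void **)&pSink);
        pSink->SetFileName(outputFile, NULL);
        pGraph->AddFilter(pWriter, L"File Writer");

        // Connect capture -> WavDest -> File Writer in one call, then query
        // IMediaControl from the graph and call Run() to start recording.
        return pBuild->RenderStream(&PIN_CATEGORY_CAPTURE, &MEDIATYPE_Audio,
                                    pSource, pWavDest, pWriter);
    }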
I noticed that I have a variation of WavDest installed with Google Earth, in case you have trouble building it yourself and are looking for a prebuilt binary.
Alternatively, you can instruct ffmpeg to record from a DirectShow device and output to a file.
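For example (the friendly device name below is hypothetical; the first command lists the actual DirectShow device names on your machine):

    ffmpeg -list_devices true -f dshow -i dummy
    ffmpeg -f dshow -i audio="Microphone (USB Audio Device)" out.wav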

Retrieve the peak of an audio stream from OpenNI

I'm attempting to grab the peak level (decibel level or whatnot) from an audio stream from a Kinect using OpenNI.
I've found these:
http://openni.org/docs2/Reference/classxn_1_1_audio_meta_data.html
http://openni.org/docs2/tutorial/smpl_audio.html
But I'm having a hard time piecing it together. I just need an integer of some sort to tell me how loud the area around the Kinect is.
Thanks!
I've tried using the audio API provided by OpenNI with no luck; not even the provided sample worked. I had the same question a while back.
If you're on OS X/Linux you can use OpenNI and libfreenect side by side (as admin only on Linux), and libfreenect has some experimental audio support.
In the meantime I've found two other projects that look interesting:
HARK
MS Kinect-OpenNI Bridge
I haven't used either, although the second looks promising; it should be possible to access the audio via the MS SDK and process the data using OpenNI as well.
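Whichever API ends up delivering the raw audio buffer, turning it into a single loudness number is straightforward. A minimal sketch, assuming the buffer holds signed 16-bit PCM samples (the pointer and sample count would come from whatever capture API you use):

    #include <cstdint>
    #include <cstdlib>

    // Returns the peak absolute sample value, 0..32767.
    int PeakLevel(const int16_t *samples, size_t sampleCount)
    {
        int peak = 0;
        for (size_t i = 0; i < sampleCount; ++i)
        {
            int v = std::abs(static_cast<int>(samples[i]));
            if (v > peak)
                peak = v;
        }
        return peak;
    }

If you prefer decibels, 20 * log10(peak / 32768.0) gives the level in dBFS (0 dB is full scale; more negative means quieter).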