I am doing a speech processing project on a Raspberry Pi 3 (running Raspbian) with a USB microphone. The microphone shows up as a selectable audio device on the Pi, and it produces/captures sound perfectly.
I cannot figure out how to use this in my code. I have done a ton of research and found some tutorials, but nothing that makes sense to me. I come from more of a hardware background and have done something like this with microcontrollers, where I hook up an actual mic and convert the analog signal to digital on I/O pins. I am so frustrated with this that I am about to pump data over from an Arduino using a mic and A/D conversion instead.
----- My questions -----
1) I want to know how to access a USB data stream or USB device in C or C++. My Linux abilities are not the best. Do I open a serial connection, or open a file stream on "/dev/USB/...."? Can you provide a code example?
2) Regardless of the fidelity of the USB mic input, I want to know how to access its input in C/C++. I have been looking at ALSA but cannot really get past its complexity. Is there something that gives me access to a raw input signal from the USB port that I can process (to extract frequency, amplitude, etc.)?
I have already gone through a lot of the similar posts on here, and I am really stuck on this one. I'm really looking to understand what is going on from the OS perspective; I'll happily use a library, but I want to understand how it works.
Thanks!
An update:
I did all of my code in C with some .sh scripts. I figured out how to use the ALSA asoundlib (asound.h specifically). As of now, I am able to generate and record sound via a USB mic/headset on my Pi 3. Doing so is rather arduous, but link (1) below is useful, and a minimal capture sketch follows the links.
For my project, I also found a CMU tutorial/repo for their PocketSphinx speech recognition device at link (2), with a video at link (3). This project uses the ALSA asoundlib as well and was a great help to me. It takes a while to download, and you need to crawl through its .sh scripts to figure out its gcc linking, but I am now able to give audio cues that are interpreted by my Pi 3 and pushed to speaker output and GPIO pins.
Links:
(1) http://www.alsa-project.org/alsa-doc/alsa-lib/_2test_2pcm_8c-example.html
(2) https://wolfpaulus.com/embedded/raspberrypi2-sr/
(3) https://www.youtube.com/watch?v=5kp5qpwVh_8
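For completeness, here is roughly the minimal shape of an ALSA capture program. This is a sketch, not production code: the device name "plughw:1,0" is an assumption for a USB mic, so run "arecord -l" to find yours, and compile with gcc mic.c -lasound.

#include <alsa/asoundlib.h>
#include <stdio.h>

int main(void)
{
    snd_pcm_t *pcm;
    /* "plughw:1,0" is an assumption for a USB mic; run "arecord -l" to find yours */
    if (snd_pcm_open(&pcm, "plughw:1,0", SND_PCM_STREAM_CAPTURE, 0) < 0) {
        fprintf(stderr, "cannot open capture device\n");
        return 1;
    }
    /* 16-bit signed little-endian, mono, 44.1 kHz, interleaved, 0.5 s latency */
    snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_LE, SND_PCM_ACCESS_RW_INTERLEAVED,
                       1, 44100, 1, 500000);

    short buf[4410];                             /* 100 ms of samples */
    for (int i = 0; i < 50; ++i) {               /* ~5 seconds total */
        snd_pcm_sframes_t n = snd_pcm_readi(pcm, buf, 4410);
        if (n < 0)
            n = snd_pcm_recover(pcm, (int)n, 0); /* recover from overruns */
        /* buf[0..n-1] now holds raw PCM samples you can analyze */
    }
    snd_pcm_close(pcm);
    return 0;
}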
As a guitarist I have always wanted to develop my own recording and mixing software. I have some experience with DirectSound and the Windows Multimedia API (waveOutOpen, etc.). I realise that this will be a complex project, but it is purely for my own use and learning, i.e. no deadlines! I intend to use C++ but am as yet unsure of the best SDK/API to use. I want the software to be extensible, as I may wish to add effects in the future. A few prerequisites...
To run on Windows XP
Minimal latency
VU meter (on all tracks)
This caused me to shy away from DirectSound, as there doesn't appear to be a way to read audio data from the primary buffer.
Overdubbing (i.e. record a new track whilst playing existing tracks).
Include a metronome
My initial thoughts are to use WMM and the waveOutWrite function to play audio data. I guess this is essentially an audio streaming player. To keep things simpler, I will hard-code the format to 16-bit, 44.1 kHz (the best my sound card supports). What I need are some ideas and guidance on an overall architecture.
For example, assume my tempo is 60 BPM and the time signature is 4/4. I want the metronome to play a click at the start of every bar/measure. Now assume that I have recorded a rhythm track. Upon playback I need to orchestrate (pun intended) what data is sent to the primary sound buffer. I may also, at some point, want to add instruments, mainly drums. Again, I need to know how to send the correct audio data, at the correct time, to the primary audio buffer. I appreciate that timing is key here. What I am unsure of is how to grab the correct data from individual tracks to send to the primary sound buffer.
My initial thoughts are to have a timing thread which periodically asks each track, "I need data to cover N milliseconds of play", where N depends upon the primary buffer size.
I appreciate that this is a complex question, I just need some guidance as to how I might approach some of the above problems.
An additional question: is WMM or DirectSound better suited to my needs, or maybe even ASIO? However, the main question is how, using a streaming mechanism, I gather the correct track data (from multiple tracks) to send to a primary buffer while keeping latency minimal.
Any help is appreciated,
Many thanks
Karl
Thanks for the responses. However, my main question is how to time all of this, to ensure that each track writes appropriate data to the primary buffer, at the correct time. I am of course open to (free) libraries that will help me achieve my main goals.
As you intend to support XP (which I would not recommend, as even its extended support ends next year), you really have no choice but to use ASIO. The appropriate SDK can be downloaded from Steinberg. On Windows Vista and above, WASAPI Exclusive Mode might be a better option due to wider availability, although the documentation is severely lacking IMO. In any case, you should have a look at PortAudio, which wraps these APIs (and, unlike Juce, is free).
Neither WMM nor DirectSound nor XAudio2 will be able to achieve sufficiently low latencies for real-time monitoring. Low-latency APIs usually invoke a callback periodically, once per block of data.
As every callback processes a given number of samples, you can calculate the time from the sample rate and a sample counter (simply accumulate across callback calls). Tip: do not accumulate with floating point; that way lies madness, because the smallest increment (1/sampleRate) is generally not exactly representable, so the error accumulates. Use a 64-bit sample counter instead.
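A sketch of the idea (the names here are mine, not from any particular API):

#include <cstddef>
#include <cstdint>

uint64_t samplesPlayed = 0;  // 64-bit integer counter, exact by construction

// Inside the audio callback, after processing a block of n samples:
void onBlockProcessed(size_t n, double sampleRate)
{
    samplesPlayed += n;
    // Derive the time on demand instead of accumulating a float.
    double seconds = (double)samplesPlayed / sampleRate;
    (void)seconds;  // use for scheduling clicks, track positions, etc.
}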
Effectively, your callback function would (for each track) call a getSamples(size_t n, float* out) (or similar) method and sum up the results (i.e. mix them). Each individual track would then keep an integrated sample time to compute what is currently required. For periodic things (infinite waves, loops, metronomes) you can easily calculate the number of samples per period and keep a modulo counter. That leads to rounded periods but, as mentioned before, floating-point accumulators are a no-no (although they can work acceptably for purely periodic signals).
In the case of the metronome example, you might have a waveform "click.wav" with n samples and a period of m samples. Your counter repeatedly runs from 0 to m-1, and as long as the counter is less than n you play the corresponding sample of your waveform. For example, a simple metronome that plays a click on each beat could look something like this:
#include <cmath>
#include <cstddef>
#include <vector>

class Metronome
{
    std::vector<float> waveform;  // the click sound, sampled at the output rate
    size_t counter, period;       // position within the beat; samples per beat
public:
    Metronome(std::vector<float> const & waveform, float bpm, float sampleRate)
        : waveform(waveform), counter(0)
    {
        float secondsPerBeat = 60.f / bpm;  // 60 s per minute / beats per minute
        float samplesPerBeat = sampleRate * secondsPerBeat;
        period = (size_t)std::round(samplesPerBeat);
    }
    void getSamples(size_t n, float* out)
    {
        while (n--)
        {
            // play the click while inside it, silence for the rest of the beat
            *out++ = counter < waveform.size() ? waveform[counter] : 0.f;
            counter += 1;
            counter -= counter >= period ? period : 0;  // wrap at the beat boundary
        }
    }
};
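To tie this together, the block callback can mix all tracks by summation. Here is a minimal sketch, assuming each track exposes the getSamples(size_t, float*) method described above (the Track interface itself is hypothetical):

#include <algorithm>
#include <cstddef>
#include <vector>

struct Track
{
    virtual void getSamples(size_t n, float* out) = 0;
    virtual ~Track() {}
};

// Called by the audio API once per block; 'out' receives 'n' mono samples.
void audioCallback(float* out, size_t n, std::vector<Track*> const & tracks)
{
    std::fill(out, out + n, 0.f);   // start from silence
    std::vector<float> scratch(n);  // shown inline for brevity; pre-allocate in real-time code
    for (Track* t : tracks)
    {
        t->getSamples(n, scratch.data());
        for (size_t i = 0; i < n; ++i)
            out[i] += scratch[i];   // mix by summing; scale or limit to avoid clipping
    }
}

The Metronome above drops straight into this scheme once it derives from Track.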
Furthermore, you could look for VST/AU plug-in programming tutorials, as these have the same "problem" of determining time from a number of samples.
As you've discovered, you are entering a world of pain. If you're really building audio software for Windows XP and expect low latency, you'll definitely want to avoid any audio API provided by the operating system and do what almost all commercial software does: use ASIO. Whilst things have got better since, ASIO isn't going away any time soon.
To ease your pain considerably, I would recommend having a look at Juce, a cross-platform framework for building both audio host software and plugins. It has been used to build many commercial products.
They've got many of the really nasty architectural hazards covered, and it comes with examples of both host applications and plug-ins to play with.
Is it possible to change the tempo of a MIDI or WAV/MP3 file using FMOD? I am using C++ alongside FMOD and cannot seem to find a function which lets me control the tempo of an audio file from variables received in the C part of the application. I am using audio that I have written myself, so I'm going to make the tempo of all tracks the same, so I don't need to worry about using/writing a function to calculate the BPM of anything.
To change the playback speed you can use Channel::setFrequency; however, this will affect the pitch as well. You can then use an FMOD pitch shifter DSP to correct the pitch difference. This will work for any sound type in FMOD.
For MIDI you could try Sound::setMusicSpeed, this will control the MIDI speed directly without needing to use the DSP.
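A rough sketch of the setFrequency-plus-pitch-shifter approach, written against the FMOD Studio low-level API (names differ slightly in older FMOD Ex, and "track.wav" is just a placeholder):

#include <fmod.hpp>

int main()
{
    FMOD::System* system = nullptr;
    FMOD::System_Create(&system);
    system->init(32, FMOD_INIT_NORMAL, nullptr);

    FMOD::Sound* sound = nullptr;
    system->createSound("track.wav", FMOD_DEFAULT, nullptr, &sound); // placeholder file

    FMOD::Channel* channel = nullptr;
    system->playSound(sound, nullptr, false, &channel);

    float speed = 1.5f;                   // 1.5x tempo
    float freq = 0.f;
    channel->getFrequency(&freq);
    channel->setFrequency(freq * speed);  // faster playback, but pitch rises too

    FMOD::DSP* pitch = nullptr;
    system->createDSPByType(FMOD_DSP_TYPE_PITCHSHIFT, &pitch);
    pitch->setParameterFloat(FMOD_DSP_PITCHSHIFT_PITCH, 1.0f / speed); // undo pitch change
    channel->addDSP(0, pitch);

    // ... pump system->update() in your main loop while audio plays ...
    return 0;
}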
I need some assistance/guidance with using Core Audio to extract floats from the sound output device. I have read similar posts regarding the extraction of floats from AIFF. My end goal is something along the lines of:
1) iTunes is playing a song.
2) A C/C++ program using Core Audio extracts float values from the sound device (in real time).
3) The resulting float vector is used to perform a Fourier transform on an array of floats (probably using vDSP from Apple's Accelerate framework). This part I have somewhat figured out :)
Note: I am developing on Mac OS X (10.6+).
Any help will be much appreciated.
This question comes up frequently on the Core Audio mailing list. There is no easy way to accomplish what you want to do. See:
http://lists.apple.com/archives/coreaudio-api/2007/Jul/msg00066.html
http://lists.apple.com/archives/coreaudio-api/2009/Nov/msg00308.html
You'll need to write either a kext or a user-land driver.
I want to get the sound level so I can display it in my SDL application (the platform is Linux) while recording sound. How can I do that? I use the FMOD API in my app, but for recording I'm using SoX (forking and using exec() to set it up; this could probably be done better, but I don't know how :( ). Should I use some function of SoX or the FMOD API, or maybe directly access /dev/dsp to get the sound data?
You can do recording in FMOD if you like. FMOD APIs such as System::recordStart and System::getRecordDriverInfo can be used. FMOD ships examples of recording which you can use as a basis for your solution.
Specifically for getting the sound level as a runtime thing, you could use Channel::getWaveData, which gives you a snapshot of the currently playing audio; for this you would need to be playing the recorded data.
Or alternatively you could use Sound::lock / Sound::unlock to get access to the recorded sound data if it isn't playing.
Once you have access to the sound data through either method you can read through the values to get sound level / peak information.
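For illustration, computing a simple peak level from a getWaveData snapshot might look like this (a sketch against the FMOD Ex API, where Channel::getWaveData is available; the channel is assumed to be playing the recorded sound):

#include <cmath>
#include <fmod.hpp>

// Returns the peak absolute amplitude (0..1) of the channel's current audio.
float currentPeak(FMOD::Channel* channel)
{
    float wave[256];
    channel->getWaveData(wave, 256, 0);  // 256-sample snapshot of channel 0
    float peak = 0.f;
    for (int i = 0; i < 256; ++i)
        peak = std::fabs(wave[i]) > peak ? std::fabs(wave[i]) : peak;
    return peak;  // convert with 20*log10(peak) if you want dB
}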
No; do not access /dev/dsp directly. At the very least you should use the "safe" ALSA API, but you should also consider something higher-level such as GStreamer or PulseAudio.