I need to increase the tempo of a voice in a sound file (the effect should speed up playback while keeping the original pitch). Is there any way to do this with C++ and the Qt media library? Thanks.
Any links are appreciated.
You may want to try SoundTouch.
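For example, here is a minimal sketch of how it could be used to speed a recording up without changing its pitch. This assumes the WAV data has already been decoded to mono 32-bit float samples at 44100 Hz; decoding, error handling, and the buffer size are left out or made up for illustration.

    #include <SoundTouch.h>
    #include <vector>

    // Sketch: speed up playback without changing pitch, given already-decoded
    // mono float samples at 44100 Hz. tempoPercent = +25 means 25% faster.
    std::vector<float> changeTempo(const std::vector<float>& input, double tempoPercent)
    {
        soundtouch::SoundTouch st;
        st.setSampleRate(44100);
        st.setChannels(1);
        st.setTempoChange(tempoPercent);   // tempo changes, pitch stays the same

        st.putSamples(input.data(), static_cast<unsigned int>(input.size()));
        st.flush();                        // push out internally buffered samples

        std::vector<float> output;
        float buffer[4096];
        unsigned int received = 0;
        do {
            received = st.receiveSamples(buffer, 4096);
            output.insert(output.end(), buffer, buffer + received);
        } while (received != 0);

        return output;
    }

setPitchSemiTones() is the complementary call if you ever want to change pitch while keeping the tempo.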
Qt has hardly any sound processing capabilities, so nothing as advanced as what the original poster wants exists in it.
Qt cannot do what you want. Moreover, pitch-shifting is a computationally intensive task involving fast Fourier transforms and then some trickery to counteract the phase shifting. Couple FFTW3 with this excellent guide, and you can do it.
Related
I need some advice on an idea I've had for a uni project.
I was wondering if it's possible to split an audio file into different "streams" from different audio sources.
For example, split the audio file into: engine noise, train noise, voices, different sounds that are not there all the time, etc.
I wouldn't necessarily need to do this from a programming language (although that would be ideal); doing it manually with sound processing software like Sound Forge would also work. I need to know if this is possible first, though. I know nothing about sound processing.
After the first stage is complete (separating the sounds), I want to determine whether one of the processed sounds exists in another audio recording. The purpose would be sound detection. For an (ideal) example, take the car engine sound, match it against another file, and determine whether that audio is a recording of a car's engine or not. It doesn't need to be THAT precise; I guess detecting a sound that is not constant, like a honk, would be alright as well.
I will do the programming part; I just need some pointers on what to look for (software, math, etc.). As I am no sound expert, this would be a really interesting project, if it's possible.
Thanks.
This problem of splitting sounds based on source is known in research as (Audio) Source Separation or Audio Signal Separation. If there is no more information about the sound sources or how they have been mixed, it is a Blind Source Separation problem. There are hundreds of papers on these topics.
However for the purpose of sound detection, it is not typically necessary to separate sounds at the audio level. Very often one can (and will) do detection on features computed on the mixed signal. Search literature for Acoustic Event Detection and Acoustic Event Classification.
For an introduction to the subject, check out a book like Computational Analysis of Sound Scenes and Events.
It's extremely difficult to do automated source separation from a single audio stream. Your brain is uncannily good at this task, and it also benefits from a stereo signal.
For instance, voice is full of signals that aren't there all the time. Car noise has components that are quite stationary, but gear changes are outliers.
Unfortunately, there are no simple answers.
Correlate reference signals against the audio stream. Correlation can be done efficiently using FFTs. The output of the correlation calculation can be thresholded and 'debounced' in time for signal identification.
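To make that concrete, here is a naive time-domain sketch (the function name and threshold are made up for illustration): it slides the reference clip across the recording, computes a normalized correlation at each offset, and keeps the offsets above a threshold. A practical version would compute the correlation with FFTs and then debounce nearby hits, as described above.

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Return the sample offsets where the reference clip correlates strongly
    // with the recording. threshold is in [0, 1], e.g. 0.7.
    std::vector<std::size_t> detect(const std::vector<float>& recording,
                                    const std::vector<float>& reference,
                                    float threshold)
    {
        std::vector<std::size_t> hits;
        if (reference.empty() || recording.size() < reference.size())
            return hits;

        for (std::size_t offset = 0; offset + reference.size() <= recording.size(); ++offset) {
            double dot = 0.0, energyRec = 0.0, energyRef = 0.0;
            for (std::size_t i = 0; i < reference.size(); ++i) {
                dot       += recording[offset + i] * reference[i];
                energyRec += recording[offset + i] * recording[offset + i];
                energyRef += reference[i] * reference[i];
            }
            const double norm = std::sqrt(energyRec * energyRef);
            if (norm > 0.0 && dot / norm > threshold)
                hits.push_back(offset);   // candidate match at this offset
        }
        return hits;
    }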
My commercial embedded C++ Linux project requires playing wav files and tones at individual volume levels concurrently. A few examples of the sounds:
• “Click” sounds each time the user presses the screen, played at a user-specified volume
• Warning sounds played at max-volume
• Warning tones requested by other applications at app-specified volume level (0-100%)
• Future support for MP3 player and/or video playback (with sound) at user-specified volume. All other sounds should continue while song/video is playing.
We're using Qt as our UI framework, which has QtMultimedia and Phonon support. However, I've heard the former has spotty sound support on Linux, and the latter is an older version that may be deprecated in an upcoming Qt release.
I've done some research and here are a few APIs I've come across:
KDE Phonon
SFML
PortAudio
SDL_Mixer
OpenAL Soft
FMOD (though I'd prefer to avoid license fees)
ALSA (perhaps a bit too low-level...)
Other considerations:
Cross-platform support isn't required but is preferred. We'd like to limit dependencies as much as possible. There is no need for advanced features like 3D audio or special effects in the foreseeable future. My team doesn't have much audio experience, so ease of use is important.
Are any of these overkill for my application? Which seems like the best fit?
Update:
It turns out we were already dependent on SDL for other reasons, so we decided on SDL_Mixer. For other embedded applications, however, I'd take a long look at the PortAudio/libsndfile combo as well due to their minimal dependencies.
libao is simple, cross-platform, Xiphy goodness.
There's documentation too!
Usage is outlined here; simple usage goes like this (a minimal sketch follows the steps):
Initialize (ao_initialize())
Call ao_open_live() or ao_open_file()
Play sound using ao_play()
Close device/file using ao_close() and then ao_shutdown() to clean up.
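Put together, a minimal sketch might look like this: it plays one second of a generated 440 Hz tone through the default driver (error handling is mostly omitted).

    #include <ao/ao.h>
    #include <cmath>
    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <vector>

    int main()
    {
        ao_initialize();

        ao_sample_format format;
        std::memset(&format, 0, sizeof(format));
        format.bits = 16;
        format.channels = 1;
        format.rate = 44100;
        format.byte_format = AO_FMT_LITTLE;

        ao_device* device = ao_open_live(ao_default_driver_id(), &format, nullptr);
        if (!device) {
            ao_shutdown();
            return 1;
        }

        // One second of a 440 Hz sine tone as 16-bit little-endian samples.
        const double pi = 3.14159265358979323846;
        std::vector<int16_t> samples(44100);
        for (std::size_t i = 0; i < samples.size(); ++i)
            samples[i] = static_cast<int16_t>(32000 * std::sin(2.0 * pi * 440.0 * i / 44100.0));

        ao_play(device, reinterpret_cast<char*>(samples.data()),
                static_cast<unsigned int>(samples.size() * sizeof(int16_t)));

        ao_close(device);
        ao_shutdown();
        return 0;
    }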
Go for PortAudio. For plain audio without unneeded overhead such as complex streaming pipelines or 3D, it is the best lib out there. In addition, you get really nice cross-platform support. It is used by several professional audio programs and is of really high quality.
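As a rough illustration of its callback model, a sketch like the following plays a 440 Hz tone for two seconds (the tone and buffer size are arbitrary, and real code would check every PaError return value).

    #include <portaudio.h>
    #include <cmath>

    // Callback: synthesize a 440 Hz sine wave into the output buffer.
    static int toneCallback(const void* /*input*/, void* output, unsigned long frameCount,
                            const PaStreamCallbackTimeInfo* /*timeInfo*/,
                            PaStreamCallbackFlags /*statusFlags*/, void* userData)
    {
        float* out = static_cast<float*>(output);
        double* phase = static_cast<double*>(userData);
        const double pi = 3.14159265358979323846;
        for (unsigned long i = 0; i < frameCount; ++i) {
            out[i] = static_cast<float>(std::sin(*phase));
            *phase += 2.0 * pi * 440.0 / 44100.0;
        }
        return paContinue;
    }

    int main()
    {
        double phase = 0.0;
        Pa_Initialize();

        PaStream* stream = nullptr;
        Pa_OpenDefaultStream(&stream,
                             0,          // no input channels
                             1,          // mono output
                             paFloat32,  // 32-bit float samples
                             44100,      // sample rate
                             256,        // frames per buffer
                             toneCallback, &phase);
        Pa_StartStream(stream);
        Pa_Sleep(2000);                  // let it play for two seconds
        Pa_StopStream(stream);
        Pa_CloseStream(stream);
        Pa_Terminate();
        return 0;
    }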
I have used SDL_Mixer time and time again; it's a lovely library and should serve your needs well. The license is flexible and it's heavily documented. I have also experimented with SFML; while it's more modern and fairly well documented, I find it a bit bulky and cumbersome to work with, even though both libraries are very similar. In my opinion, SDL_Mixer is the best.
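A sketch of the concurrent, per-sound volumes from the question might look like this with SDL_mixer (the file names are placeholders):

    #include <SDL.h>
    #include <SDL_mixer.h>

    int main(int, char**)
    {
        SDL_Init(SDL_INIT_AUDIO);
        Mix_OpenAudio(44100, MIX_DEFAULT_FORMAT, 2, 1024);
        Mix_AllocateChannels(16);                     // how many sounds may overlap

        Mix_Chunk* click   = Mix_LoadWAV("click.wav");
        Mix_Chunk* warning = Mix_LoadWAV("warning.wav");

        Mix_VolumeChunk(click, MIX_MAX_VOLUME / 2);   // user-specified volume (0..128)
        Mix_VolumeChunk(warning, MIX_MAX_VOLUME);     // warnings at full volume

        Mix_PlayChannel(-1, click, 0);                // -1 = first free channel
        Mix_PlayChannel(-1, warning, 0);              // both play concurrently

        SDL_Delay(2000);                              // let them play out

        Mix_FreeChunk(click);
        Mix_FreeChunk(warning);
        Mix_CloseAudio();
        SDL_Quit();
        return 0;
    }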
However, you might also want to check out mpg123 (http://www.mpg123.de/), which I found a few weeks ago. I haven't delved too much into it, but it is very lightweight and, again, the license is flexible.
There is a sound library called STK that would meet most of your requirements:
https://ccrma.stanford.edu/software/stk/faq.html
Don't forget about:
FFmpeg: a complete, cross-platform solution to record, convert, and stream audio and video.
GStreamer: a library for constructing graphs of media-handling components. The applications it supports range from simple Ogg/Vorbis playback and audio/video streaming to complex audio (mixing) and video (non-linear editing) processing; a minimal playback sketch follows below.
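For instance, here is a sketch of simple playback through GStreamer's playbin element, assuming GStreamer 1.x (the file URI is a placeholder):

    #include <gst/gst.h>

    int main(int argc, char** argv)
    {
        gst_init(&argc, &argv);

        // playbin builds the whole decode/output pipeline for us.
        GstElement* pipeline = gst_element_factory_make("playbin", "player");
        g_object_set(pipeline, "uri", "file:///tmp/example.wav", NULL);

        gst_element_set_state(pipeline, GST_STATE_PLAYING);

        // Block until playback finishes or an error occurs.
        GstBus* bus = gst_element_get_bus(pipeline);
        GstMessage* msg = gst_bus_timed_pop_filtered(
            bus, GST_CLOCK_TIME_NONE,
            static_cast<GstMessageType>(GST_MESSAGE_ERROR | GST_MESSAGE_EOS));

        if (msg)
            gst_message_unref(msg);
        gst_object_unref(bus);
        gst_element_set_state(pipeline, GST_STATE_NULL);
        gst_object_unref(pipeline);
        return 0;
    }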
I want to develop an application that takes audio (.wav) as input and displays its frequency spectrum in real time. From what I have read on the subject, this requires a Fourier transform of the waveform. Can someone suggest where I should start? Possible references and books? I want to learn the details of implementing a real-time frequency spectrum rather than the development of the GUI, which I am quite familiar with (in C# and in C++).
There are already many libraries to do FFTs for you. No reason to reinvent the wheel. DirectX has an implementation but it might only be in the most recent version. Here's an open source C library for it.
If you want to understand the math behind it, here's a simple explanation and here's a complicated explanation.
You should begin by opening the WAV file, extracting the audio stream, and decoding it. There are third-party libraries to help with this.
Take a look at FFTW.
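A minimal sketch of computing one magnitude-spectrum frame with FFTW's single-precision interface (windowing and overlap between successive frames are left out; link with -lfftw3f):

    #include <fftw3.h>
    #include <cmath>
    #include <complex>
    #include <cstddef>
    #include <vector>

    // Compute the magnitude spectrum of one frame of audio samples.
    // A real-time analyzer would apply a window (e.g. Hann) and overlap frames.
    std::vector<float> magnitudeSpectrum(std::vector<float> frame)
    {
        const int n = static_cast<int>(frame.size());
        std::vector<std::complex<float>> spectrum(n / 2 + 1);

        fftwf_plan plan = fftwf_plan_dft_r2c_1d(
            n, frame.data(), reinterpret_cast<fftwf_complex*>(spectrum.data()),
            FFTW_ESTIMATE);
        fftwf_execute(plan);
        fftwf_destroy_plan(plan);

        std::vector<float> magnitude(spectrum.size());
        for (std::size_t k = 0; k < spectrum.size(); ++k)
            magnitude[k] = std::abs(spectrum[k]);   // bin k corresponds to k * rate / n Hz
        return magnitude;
    }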
As far as books go, the classic textbook on signal processing is Oppenheim and Schafer's Digital Signal Processing. It's college level but quite thorough. You do need some knowledge of calculus in places.
One should understand a bit of the theory before going off and implementing an application to display something. Here are some free online resources on digital signal processing, which is the basis for understanding FFTs and frequency spectrums, and maybe how not to misuse them.
http://www.dspguide.com/pdfbook.htm
http://www.bores.com/courses/intro/index.htm
http://ccrma.stanford.edu/courses/320/Welcome.html
http://yehar.com/blog/?p=121/
Can you recommend a powerful audio lib?
I need it to time-stretch & pitch-shift independently, as well as give me full access to the raw audio data and let me stream bytes into its pipeline.
Other effects like eq, filtering, distortion are a plus.
Needs to be accessible from C++ / Linux.
Maybe GStreamer, xine, or MPlayer would work? Or what would you suggest?
I think FMOD is widely recognized as one of the most powerful audio engines available; it's free until you do something commercial with it, and it's cross-platform in the console/Mac/PC sense.
OpenAL is also worth giving a try.
OpenAL, PulseAudio, JACK, and Phonon, I believe, each have these features in some form.
If you're willing to pay for it, Miles is very nice. I can't recommend FMOD for much outside of hobby projects. It's had some truly nasty bugs, and I've seen new versions introduce as many as they fix.
I've used SoundTouch in the past. It's focused on changing speed, pitch, etc.
ALSA looks like the big one.
JACK for Linux also looks promising.
When I started using SoundEngine (from CrashLanding and TouchFighter), I had read about a few people recommending not to use it, for it was, according to them, not stable enough. Still it was the only solution I knew of to play sounds with pitch and position control without learning C++ and OpenAL, so I ignored the warnings and went on with it.
But now I'm starting to worry. The 2.2 SDK introduced AVFoundation. Using both SoundEngine from CrashLanding (for sounds) and AVAudioPlayer (for music), I found out SoundEngine behaves strangely when the only existing AVAudioPlayer is released (all sounds stop until a new AVAudioPlayer is initiated). Around the same time as the 2.2 SDK came out, the CrashLanding sample code was mysteriously removed from the ADC site. I'm worried there are more bad surprises to come.
My question is, is anyone aware of an Open Source alternative to SoundEngine? Maybe even a C++ library that uses OpenAL?
Take a look at this library, but I don't know if it's what you need.
The Kowalski project provides a data driven and portable sound engine that currently runs on iOS, OS X and Windows. The engine is released under the zlib license and provides positional audio, pitch control etc.
ObjectAL for iPhone
Clone it. Use it. Love it. Enjoy the freedom.
Why not just use AVFoundation? It's pretty simple to handle and nicely flexible. Apart from cases where you need exact timing (so says the Apple documentation, though I've been testing it fairly extensively and have yet to find any significant practical issues), I don't see any reason not to leverage it.
AVFoundation lacks sound placement. This makes me sad.
I’ve written a simple sound engine around OpenAL. There are no position controls (I didn’t need them), but they would be trivial to add if you find the rest to your liking. And there is also some experimental sound code in the Cocos2D engine. It has both pitch and position controls and looks quite usable.
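For reference, the core OpenAL calls for pitch and position control are fairly compact. Here is a rough sketch, using a generated tone in place of real decoded samples (setup and error checking are kept minimal):

    #include <AL/al.h>
    #include <AL/alc.h>
    #include <cmath>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    int main()
    {
        ALCdevice* device = alcOpenDevice(nullptr);        // default device
        ALCcontext* context = alcCreateContext(device, nullptr);
        alcMakeContextCurrent(context);

        // A short 440 Hz tone as 16-bit mono PCM, standing in for a decoded file.
        const double pi = 3.14159265358979323846;
        std::vector<int16_t> pcm(44100);
        for (std::size_t i = 0; i < pcm.size(); ++i)
            pcm[i] = static_cast<int16_t>(32000 * std::sin(2.0 * pi * 440.0 * i / 44100.0));

        ALuint buffer = 0, source = 0;
        alGenBuffers(1, &buffer);
        alBufferData(buffer, AL_FORMAT_MONO16, pcm.data(),
                     static_cast<ALsizei>(pcm.size() * sizeof(int16_t)), 44100);

        alGenSources(1, &source);
        alSourcei(source, AL_BUFFER, buffer);
        alSourcef(source, AL_PITCH, 1.5f);                 // resample: higher and faster
        alSource3f(source, AL_POSITION, 2.0f, 0.0f, 0.0f); // place to the listener's right
        alSourcePlay(source);

        // ... wait for playback to finish, then clean up:
        alDeleteSources(1, &source);
        alDeleteBuffers(1, &buffer);
        alcMakeContextCurrent(nullptr);
        alcDestroyContext(context);
        alcCloseDevice(device);
        return 0;
    }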