I'm looking for a C or C++ API that will give me real-time spectrum analysis of a waveform on Windows.
I'm not entirely sure how large a sample window it should need to determine frequency content, but the smaller the better. For example, if it can work with a 0.5 second long sample and determine frequency content to the Hz, that would be wicked-awesome.
I used FFTW a few years ago. It is supposedly fast (though I didn't use it for anything real-time myself) and was certainly pretty easy to use, even on Windows.
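Regarding the window size, see the Nyquist-Shannon sampling theorem for how the sample rate limits the highest frequency you can see. Frequency resolution is a separate matter: a plain FFT of a T-second window gives bins 1/T Hz apart, so a 0.5 second window resolves frequencies only to about 2 Hz.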
(I imagine there are other issues involved when using a window on the data, particularly for low frequencies, but I'm no expert and I couldn't find any useful-looking info about this, so maybe I'm wrong.)
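Both the windowing issue and the resolution question can be made concrete with a small sketch (the function names are my own, not from any library). The first function builds a Hann window, which you would multiply each block by before the FFT to reduce spectral leakage. The second computes the bin spacing. For example, at an assumed 44.1 kHz sample rate a 0.5 s window is 22050 samples.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Symmetric Hann window of length n. Multiplying a block of samples by this
// before the FFT tapers its ends toward zero and reduces spectral leakage.
std::vector<double> hannWindow(std::size_t n) {
    assert(n >= 2);
    constexpr double kPi = 3.14159265358979323846;
    std::vector<double> w(n);
    for (std::size_t i = 0; i < n; ++i)
        w[i] = 0.5 * (1.0 - std::cos(2.0 * kPi * static_cast<double>(i) /
                                     static_cast<double>(n - 1)));
    return w;
}

// Spacing between adjacent FFT bins in Hz: sampleRate / windowLength,
// which equals 1 / (window duration in seconds).
double binSpacingHz(double sampleRate, std::size_t windowLength) {
    return sampleRate / static_cast<double>(windowLength);
}
```

With those numbers, `binSpacingHz(44100.0, 22050)` gives 2 Hz. Getting down to 1 Hz bins needs a 1 s window. Zero-padding makes the bins look finer but does not separate closer frequencies.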
For details of how to generate a power spectrum and how to determine frequency resolution of same, please see my answer to this question: How to extract semi-precise frequencies from a WAV file using Fourier Transforms
Is there a way to get the frequency of each frame of an audio file (.mp3, .wav, or any other sound format) using the "fmod" or "cwave" libraries, or even other libraries?
How can I find out this frequency in C/C++?
The FFTW library is a set of very fast implementations of various discrete Fourier transforms.
If you have a number of samples of digitized audio, you pretty much have, in total, as many frequencies and phases as you've got samples. Suppose you've got just two samples of audio. In order to faithfully represent them, you need one frequency and one phase -- so again, two values. There is no "single" frequency to represent multiple samples of digitized audio.
You can of course, akin to the question of "How can I get the color of a specific video frame?", ask what is the average frequency. Or you can ask what is the most prominent frequency (the one with highest amplitude). Or you can ask what is the frequency that with its harmonics carries the most energy in the signal (assuming the signal was physical, like electrical current sampled in time).
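Here is a sketch that makes two of these questions concrete. It computes the "most prominent frequency" (the strongest bin) and one common reading of "average frequency": the magnitude-weighted mean frequency, usually called the spectral centroid. It uses a naive O(N²) DFT purely for illustration. In practice you would get the same bins from an FFT library. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct SpectrumSummary {
    double dominantHz;  // centre frequency of the strongest bin
    double centroidHz;  // magnitude-weighted mean frequency
};

// Naive DFT over bins 0..N/2 (the non-redundant half for real input).
SpectrumSummary summarizeSpectrum(const std::vector<double>& samples, double sampleRate) {
    assert(!samples.empty());
    constexpr double kPi = 3.14159265358979323846;
    const std::size_t n = samples.size();
    double bestMag = -1.0, bestHz = 0.0, weighted = 0.0, total = 0.0;
    for (std::size_t k = 0; k <= n / 2; ++k) {
        double re = 0.0, im = 0.0;
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = 2.0 * kPi * static_cast<double>(k) *
                                 static_cast<double>(t) / static_cast<double>(n);
            re += samples[t] * std::cos(angle);
            im -= samples[t] * std::sin(angle);
        }
        const double mag = std::sqrt(re * re + im * im);
        const double hz = static_cast<double>(k) * sampleRate / static_cast<double>(n);
        if (mag > bestMag) { bestMag = mag; bestHz = hz; }
        weighted += mag * hz;
        total += mag;
    }
    return {bestHz, total > 0.0 ? weighted / total : 0.0};
}
```

Note that the dominant frequency is quantized to the bin spacing (sampleRate / N). A pure 1000 Hz tone sampled at 8 kHz over 800 samples lands exactly on a bin, but real audio rarely does.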
In all those cases, you'd probably want to use a premade library that internally uses the FFT or a similar discrete transform to get the signal from the time domain to a frequency or a similar domain (quefrency domain, for example, and it's not a typo). It's hard to get what you want from a plain FFT; you'd need some mathematical training to process raw FFT results into what you're after. I'm sure there are libraries for it; I just can't think of any right now. Perhaps someone who deals with such work can edit the answer.
I am a little overwhelmed by the task at hand. We have a toolkit which we use for TWAIN scanning. Some of our customers are complaining about slower scan speeds when the deskew option is set. This is because if their scanner does not support hardware deskew, deskew is done in post-processing on the CPU. I was wondering if anyone knows of a good (i.e. fast) algorithm to achieve this. It is hard for me to say what algorithm we are using now. What algorithms are out there for this, and how do they rank in terms of speed/accuracy? If I knew the names of the algorithms, it would be easier for me to search for them on Google.
Thank You.
-Tom
Are you scanning in Color or B/W ?
Deskew is processor-intensive. A Group 4 TIFF or a JPEG must be decompressed, the skew angle determined, the image deskewed, and the result recompressed.
There are many image processing libraries out there with deskew, and I have evaluated many over the years. There are huge differences in processing speed between the libraries, and a lot of that comes down to how well they are coded rather than the algorithm used. Even among commercial libraries, there is a huge difference in how fast they simply read and write images.
The fastest commercial deskew I have used by far comes from Unisoft Imaging (www.unisoftimaging.com). I assume much of it is written in assembler. Unisoft has been around for many years, and its library is very fast and efficient. It supports many different deskew options, including black border removal and color and B/W deskew. The Group 4 routines are very solid and very fast. The library comes with many other image processing options, as well as TWAIN and native SCSI scanner support. It also supports Unix.
If you want a free deskew, you might want to have a look at Leptonica. It does not come with much documentation but is very stable and well written. http://www.leptonica.com/
Developing code from scratch could be quite time-consuming and error-prone.
The other option is to process the document in a separate process so that scanning can run at the speed of the scanner. At the moment you are probably processing everything sequentially, one task after another, hence the slowdown.
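Here is a minimal sketch of the same idea using a background thread instead of a separate OS process. The scan loop submits each page and returns immediately, while one worker thread deskews pages in order. `Page`, `DeskewWorker`, and the callback are hypothetical names. The callback stands in for whatever deskew routine the toolkit already has. If deskew is slower on average than the scanner, the queue still grows. This approach hides latency from the scan loop; it does not reduce the total work.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical page type; in a real toolkit this would be the scanned image.
struct Page { int number; std::vector<unsigned char> pixels; };

class DeskewWorker {
public:
    explicit DeskewWorker(std::function<void(Page&)> process)
        : process_(std::move(process)), thread_([this] { run(); }) {}
    ~DeskewWorker() { finish(); }

    // Called from the scan loop; only holds the lock long enough to enqueue.
    void submit(Page page) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_);
            pending_.push(std::move(page));
        }
        ready_.notify_one();
    }

    // Stop accepting pages, let the worker drain the queue, then join it.
    void finish() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_one();
        if (thread_.joinable()) thread_.join();
    }

private:
    void run() {
        for (;;) {
            std::unique_lock<std::mutex> lock(mutex_);
            ready_.wait(lock, [this] { return closed_ || !pending_.empty(); });
            if (pending_.empty()) return;  // closed and fully drained
            Page page = std::move(pending_.front());
            pending_.pop();
            lock.unlock();
            process_(page);  // the slow deskew runs without holding the lock
        }
    }

    std::function<void(Page&)> process_;
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<Page> pending_;
    bool closed_ = false;
    std::thread thread_;  // declared last so it starts after the members above exist
};
```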
Consider doing it as post-processing, because deskew cannot be done in real time (unless it's hardware accelerated).
Deskew consists of two steps: skew detection and rotation. Skew detection can usually be done faster on a B&W (1-bit) image. Rotation speed depends on the quality of the interpolation. A good-quality deskew will take a lot of time to run, much more than scanning the pages.
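For the skew detection step, one common approach is a projection profile. This is not necessarily what any library mentioned here uses. For each candidate angle, shear the black pixels by that angle, count how many land in each row, and pick the angle whose row counts are most sharply peaked. At the right angle, each text line collapses into a few rows. Summing along lines at many angles like this amounts to a coarse discrete Radon transform. The sketch assumes one byte per pixel (nonzero = black) rather than packed 1-bit data, and a small angle range. The function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the angle (degrees) such that text lines follow y = y0 + x * tan(angle)
// in top-down row-major coordinates, i.e. positive = lines descend to the right.
double estimateSkewDegrees(const std::vector<std::uint8_t>& pixels, int width, int height,
                           double maxDegrees = 5.0, double stepDegrees = 0.25) {
    assert(pixels.size() == static_cast<std::size_t>(width) * height);
    constexpr double kPi = 3.14159265358979323846;
    // Margin so sheared row indices never fall outside the histogram.
    const int offset = static_cast<int>(std::ceil(width * std::tan(maxDegrees * kPi / 180.0))) + 1;
    std::vector<long> profile(static_cast<std::size_t>(height + 2 * offset));
    double bestAngle = 0.0, bestScore = -1.0;
    const int steps = static_cast<int>(std::lround(2.0 * maxDegrees / stepDegrees));
    for (int i = 0; i <= steps; ++i) {
        const double angle = -maxDegrees + i * stepDegrees;
        const double slope = std::tan(angle * kPi / 180.0);
        std::fill(profile.begin(), profile.end(), 0);
        for (int y = 0; y < height; ++y)
            for (int x = 0; x < width; ++x)
                if (pixels[static_cast<std::size_t>(y) * width + x]) {
                    // Undo the candidate skew and bin the pixel by its corrected row.
                    const long r = std::lround(y - x * slope) + offset;
                    ++profile[static_cast<std::size_t>(r)];
                }
        // A sharply peaked profile has a large sum of squared counts.
        double score = 0.0;
        for (long c : profile) score += static_cast<double>(c) * static_cast<double>(c);
        if (score > bestScore) { bestScore = score; bestAngle = angle; }
    }
    return bestAngle;
}
```

Running this on a downscaled copy of the page, then refining with a finer step around the best angle, is a common way to cut the cost further.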
A good high-speed scanner can do 120 double-sided pages per minute (240 page images, or about a quarter of a second per image) if it has hardware JPEG or TIFF Group 4 compression and your TWAIN library takes advantage of it (hint: do not use native mode). At that speed you barely have enough time to save the file to the hard drive, let alone decompress, detect skew, rotate, and recompress. A quality deskew takes several seconds per page, unless you can use the video card's hardware accelerator to rotate and compress.
Do I understand correctly that you already have such an algorithm implemented? If so, are you sure there is no room for optimization? I'd start by profiling the existing solution.
Anyway, I guess you should look for a fast digital Radon transform algorithm.
Take a look at http://pagetools.sourceforge.net. They have an implementation of a deskew algorithm.
I'm trying to write a screen-flashing application that flashes the screen in time with the music (which will be frequencies, such as healing frequencies, etc.).
I've already made the player and know how I will make the screen flash, but I need the screen to flash very fast in time with the music; for example, if the music speeds up, the screen should flash faster. I understand that I could achieve this with an FFT or DSP (as I only need to know when the frequency rises above some value, let's say 20 Hz, to change the color and make the screen flash).
But I've found that I understand NOTHING, let alone how to implement it in my application.
Can somebody help me learn either of them? My email is sismetic_chaos#hotmail.com. I really need help; I've been stuck for about 3 days, not coding or doing anything at all, trying to understand it, but I don't.
PS: My application is written in C++ and Qt.
PS: Thanks for taking the time to read this and for your willingness to help.
Edit: Thanks to all for the answers. The problem is in no way solved yet, but I appreciate all the answers; I didn't think I would get so many answers and so much info. Thanks to you all.
This is a difficult problem, requiring more than an FFT. I'll briefly describe how I implemented beat detection when I was writing software for professional DJ equipment.
First of all, you'll need to cut down the amount of data you're dealing with, since there are only two or three beats per second, but tens of thousands of samples. You'll also need to look at different frequency ranges, since some types of music carry the tempo in the bassline, and others in percussion or other instruments. So pass the signal through several band-pass filters (I chose 8 filters, each covering one octave, from low bass to high treble), and then downsample each band by averaging the power over a few hundred samples.
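The filter-bank stage described above might look roughly like the following. This is a sketch under my own assumptions, not the author's code. Each band uses a biquad band-pass filter with the coefficients from Robert Bristow-Johnson's Audio EQ Cookbook (constant 0 dB peak gain), and a Q of about 1.41 gives roughly one octave of bandwidth. Each band is then downsampled by averaging its squared output over fixed-size blocks.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Second-order band-pass filter (Audio EQ Cookbook, constant 0 dB peak gain).
class BandPass {
public:
    BandPass(double centerHz, double sampleRate, double q) {
        constexpr double kPi = 3.14159265358979323846;
        const double w0 = 2.0 * kPi * centerHz / sampleRate;
        const double alpha = std::sin(w0) / (2.0 * q);
        const double a0 = 1.0 + alpha;
        b0_ = alpha / a0;
        b2_ = -alpha / a0;  // b1 is zero for this design
        a1_ = -2.0 * std::cos(w0) / a0;
        a2_ = (1.0 - alpha) / a0;
    }
    double process(double x) {
        // Direct form I difference equation.
        const double y = b0_ * x + b2_ * x2_ - a1_ * y1_ - a2_ * y2_;
        x2_ = x1_; x1_ = x;
        y2_ = y1_; y1_ = y;
        return y;
    }
private:
    double b0_, b2_, a1_, a2_;
    double x1_ = 0.0, x2_ = 0.0, y1_ = 0.0, y2_ = 0.0;
};

// Filter the input (the filter is taken by value, so each call starts fresh)
// and average the output power over blocks; a trailing partial block is dropped.
std::vector<double> bandPowerEnvelope(const std::vector<double>& input, BandPass filter,
                                      std::size_t blockSize) {
    assert(blockSize > 0);
    std::vector<double> envelope;
    double sum = 0.0;
    std::size_t count = 0;
    for (double x : input) {
        const double y = filter.process(x);
        sum += y * y;
        if (++count == blockSize) {
            envelope.push_back(sum / static_cast<double>(blockSize));
            sum = 0.0;
            count = 0;
        }
    }
    return envelope;
}
```

For example, with 44.1 kHz audio and blocks of 441 samples, each band's envelope runs at 100 Hz. Using one `BandPass` per octave centre from 62.5 Hz up to 8 kHz gives an 8-band bank like the one described.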
Every few seconds, you'll have a thousand or so samples in each band. Your next tool is an autocorrelation, to identify repetitive patterns in the music. The peaks of the autocorrelation tell you what the beat is more or less likely to be; but you'll need to make up some heuristics to compare all the frequency bands to find a beat that you can be confident in, and to avoid misleading syncopations. If you can manage that, then you'll have a reasonable guess at the tempo, but no idea of the phase (i.e. exactly when to flash the screen).
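The autocorrelation step for a single band could look like this sketch. First, subtract the envelope's mean. Then correlate the envelope with delayed copies of itself over lags that correspond to a plausible tempo range (60 to 200 BPM here, an arbitrary choice). Finally, convert the best lag back to beats per minute. Combining the bands and rejecting syncopations is the heuristic part the answer describes, and is not shown. The function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// 'envelopeRate' is the envelope's sample rate in Hz (audio rate / block size).
double estimateBpm(const std::vector<double>& envelope, double envelopeRate,
                   double minBpm = 60.0, double maxBpm = 200.0) {
    assert(!envelope.empty() && envelopeRate > 0.0);
    // Remove the mean so overall loudness does not dominate the correlation.
    double mean = 0.0;
    for (double v : envelope) mean += v;
    mean /= static_cast<double>(envelope.size());
    // A beat period of 'lag' envelope samples corresponds to 60 * rate / lag BPM.
    const std::size_t minLag =
        std::max<std::size_t>(1, static_cast<std::size_t>(envelopeRate * 60.0 / maxBpm));
    const std::size_t maxLag = static_cast<std::size_t>(envelopeRate * 60.0 / minBpm);
    std::size_t bestLag = 0;
    double bestValue = 0.0;
    for (std::size_t lag = minLag; lag <= maxLag && lag < envelope.size(); ++lag) {
        double acc = 0.0;
        for (std::size_t i = lag; i < envelope.size(); ++i)
            acc += (envelope[i] - mean) * (envelope[i - lag] - mean);
        if (acc > bestValue) { bestValue = acc; bestLag = lag; }
    }
    return bestLag ? 60.0 * envelopeRate / static_cast<double>(bestLag) : 0.0;  // 0 = no peak
}
```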
Now you can look at a smoothed version of the audio data for peaks, some of which are likely to correspond to beats. Initially, look for the strongest peak over the course of a few seconds and take that as a downbeat. In conjunction with the tempo you estimated in the first stage, you can predict when the next beat is due, measure where you actually saw something like a beat, and adjust your estimate to more closely match the data. You can also maintain a confidence level based on how well the predicted beats match the measured peaks; if that drops too low, then restart the beat detection from scratch.
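The predict-and-correct step can be sketched as a small tracker. When a detected peak falls near the predicted beat, it pulls the phase (and, more weakly, the tempo) toward itself. Hits raise a confidence value and misses lower it, until a restart is needed. The struct is hypothetical. The gains, the quarter-beat acceptance window, and the confidence steps are illustrative guesses, not values from the answer.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Times are in seconds.
struct BeatTracker {
    double period;            // seconds per beat, from the tempo estimate
    double nextBeat;          // predicted time of the next beat
    double confidence = 0.5;  // 0..1, raised by hits and lowered by misses

    // Call once per predicted beat with the time of the nearest detected peak.
    void observePeak(double peakTime) {
        assert(period > 0.0);
        const double error = peakTime - nextBeat;
        if (std::fabs(error) < 0.25 * period) {
            nextBeat += 0.2 * error;  // pull the phase toward the observed beat
            period += 0.05 * error;   // and nudge the tempo slightly
            confidence = std::min(1.0, confidence + 0.1);
        } else {
            confidence = std::max(0.0, confidence - 0.1);
        }
        nextBeat += period;  // prediction for the following beat
    }

    // When predictions stop matching the music, redo tempo detection.
    bool needsRestart() const { return confidence < 0.2; }
};
```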
There are a lot of fiddly details to this, and it took me some weeks to get it working nicely. It is a difficult problem.
Or for a simple visualisation effect, you could simply detect peaks and flash the screen for each one; it will probably look good enough.
The output of an FFT will give you the frequency spectrum of an audio sample, but extracting the tempo from the FFT output is probably not the way you want to go.
One thing you can do is to use peak detection to identify the volume "spikes" that typically occur on the "down-beats" of the music. If you can identify the down-beats, then you can use a resource like bpmdatabase.com to find the tempo of the song. The tempo will tell you how fast to flash and the peaks you detected will tell you when to start flashing. Have your app monitor your flashes to make sure that they generally occur at the same time as a peak (if the two start to diverge, then the tempo may have changed mid-song).
That may sound straightforward, but this is actually a very non-trivial thing to implement. You might want to read this SO question for more information. There are some quality links in the answers there.
If I'm completely misinterpreting what you are trying to do and you need FFTs for something else, then you might want to look at using one of the existing FFT libraries to do the heavy lifting for you. Some examples are FFTW and KissFFT.
It sounds like maybe you're trying to get your visualizer to flash the screen in time with the music somehow. I don't think calculating the FFT is going to help you here. At any given instant, there will be many simultaneous frequency components, all over the audio spectrum (roughly 20 Hz to 20 kHz). But you're likely to be a lot more interested in the musical tempo (beats per minute -- more like 5 Hz or below), and that's not going to show up anywhere in an FFT of the raw audio signal.
You probably need something much simpler -- some sort of real-time peak detection. Whenever you see a peak greater than some threshold above the average volume, make your screen flash.
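Here is a minimal sketch of such a detector. It keeps a fast and a slow exponential average of the absolute sample value. It reports a peak on the sample where the fast average first rises above `threshold` times the slow one, which stands in for "the average volume". The class name and the default constants are assumptions that you would tune by ear. Expect a spurious trigger at startup while the slow average settles.

```cpp
#include <cassert>
#include <cmath>

class PeakDetector {
public:
    explicit PeakDetector(double threshold = 2.0, double fastCoeff = 0.1,
                          double slowCoeff = 0.001)
        : threshold_(threshold), fast_(fastCoeff), slow_(slowCoeff) {
        assert(fastCoeff > slowCoeff && slowCoeff > 0.0);
    }

    // Feed one sample; returns true only on the sample where a new peak begins,
    // so a sustained loud passage triggers one flash rather than thousands.
    bool process(double sample) {
        const double level = std::fabs(sample);
        shortTerm_ += fast_ * (level - shortTerm_);  // follows the signal quickly
        longTerm_ += slow_ * (level - longTerm_);    // tracks the average volume
        const bool above = shortTerm_ > threshold_ * longTerm_;
        const bool risingEdge = above && !wasAbove_;
        wasAbove_ = above;
        return risingEdge;
    }

private:
    double threshold_, fast_, slow_;
    double shortTerm_ = 0.0, longTerm_ = 0.0;
    bool wasAbove_ = false;
};
```

In a Qt player you would feed it the decoded samples as they are played and trigger the flash whenever `process` returns true.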
Of course, more complicated visualizations might well take advantage of the FFT, but not the one you're describing.
My recommendation would be to find a library that does this for you. Unless you have a lot of mathematics to back you up, I think you will waste a ton of time trying to learn FFTs when all you really want is some sort of 'bass hits per minute' number that you can adjust your graphics to.
Check out this similar post.
It took me about three weeks to understand the mathematics behind FFTs and then another week to write something in Matlab using those concepts. If you are discouraged after three days, don't try to roll your own.
I hope this is helpful advice and not discouraging.
-Brian J. Stinar-
As previous answers have noted, an FFT is probably not the tool you need in order to solve your problem, which requires tempo detection rather than spectral analysis.
For an example of what can be done using an FFT, and of how a particular FFT implementation was integrated into a Qt application, take a look at this blog post, which describes the spectrum analyzer demo I developed. Code for the demo is shipped with Qt itself, in the demos/spectrum directory.
I'm writing a file compressor utility in C++ that I want to support PCM WAV files. However, I want to keep them PCM-encoded and just convert them to a lower sample rate, and from stereo to mono where applicable, to yield a smaller file.
I understand the WAV file header, however I have no experience or knowledge of how the actual sound data works. So my question is, would it be relatively easy to programmatically manipulate the "data" sub-chunk in a WAV file to convert it to another sample rate and change the channel number, or would I be much better off using an existing library for it? If it is, then how would it be done? Thanks in advance.
PCM merely means that the value of the original signal is sampled at equidistant points in time.
For stereo, there are two sequences of these values. To convert them to mono, you simply take the sample-by-sample average of the two sequences.
Resampling the signal at a lower sampling rate is a little more tricky: you have to filter out high frequencies from the signal to prevent aliasing (spurious low-frequency signals) from being created.
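This sketch assumes the data chunk holds interleaved 16-bit samples (the common case, with left and right alternating). Under that assumption, the averaging can be done as follows. Decoding other bit depths and updating the header fields (channel count, block align, byte rate, data size) are left out. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert interleaved 16-bit stereo (L, R, L, R, ...) to mono by averaging each
// pair. Samples are assumed already decoded to host-endian int16 (WAV stores
// them little-endian).
std::vector<std::int16_t> stereoToMono(const std::vector<std::int16_t>& interleaved) {
    assert(interleaved.size() % 2 == 0);
    std::vector<std::int16_t> mono(interleaved.size() / 2);
    for (std::size_t i = 0; i < mono.size(); ++i) {
        // Sum in a wider type so the addition cannot overflow 16 bits.
        const std::int32_t sum = static_cast<std::int32_t>(interleaved[2 * i]) +
                                 static_cast<std::int32_t>(interleaved[2 * i + 1]);
        mono[i] = static_cast<std::int16_t>(sum / 2);
    }
    return mono;
}
```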
I agree with avakar and nico, but I'd like to add a little more explanation. Lowering the sample rate of PCM audio is not trivial unless two things are true:
1) Your signal only contains significant frequencies below 1/2 the new sampling rate (the Nyquist frequency). In this case you do not need an anti-aliasing filter.
2) You are downsampling by an integer factor. In this case, downsampling by N just requires keeping every Nth sample and dropping the rest.
If both are true, you can just drop samples at a regular interval to downsample. However, neither is likely to be true if you're dealing with anything other than a synthetic signal.
To address problem one, you will have to filter the audio samples with a low-pass filter to make sure the resulting signal only contains frequency content up to 1/2 the new sampling rate. If this is not done, high frequencies will not be accurately represented and will alias back into the frequencies that can be properly represented, causing major distortion. Check out the critical frequency section of this Wikipedia article for an explanation of aliasing. Specifically, see figure 7, which shows 3 different signals that cannot be told apart from their samples alone because the sampling rate is too low.
Addressing problem two can be done in multiple ways. Sometimes it is performed in two steps, an upsample followed by a downsample, achieving a rational change in the sampling rate. It may also be done using interpolation or other techniques. Basically, the problem that must be solved is that the samples of the new signal do not line up in time with the samples of the original signal.
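Here is a sketch that handles both conditions for an integer factor. It low-pass filters with a windowed-sinc FIR (a standard design, here using a Hann window and a cutoff slightly below the new Nyquist frequency). It then computes only the outputs that survive the decimation. The tap count and the 0.45 cutoff factor are arbitrary choices; more taps give a sharper filter at the cost of speed. The function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

std::vector<double> decimate(const std::vector<double>& input, int factor, int taps = 63) {
    assert(factor >= 1 && taps >= 3 && taps % 2 == 1);
    constexpr double kPi = 3.14159265358979323846;
    const double cutoff = 0.45 / factor;  // cycles per input sample; new Nyquist is 0.5/factor
    const int half = taps / 2;
    std::vector<double> h(static_cast<std::size_t>(taps));
    double sum = 0.0;
    for (int n = 0; n < taps; ++n) {
        const int m = n - half;
        const double sinc = (m == 0) ? 2.0 * cutoff : std::sin(2.0 * kPi * cutoff * m) / (kPi * m);
        const double window = 0.5 * (1.0 - std::cos(2.0 * kPi * n / (taps - 1)));
        h[n] = sinc * window;
        sum += h[n];
    }
    for (double& c : h) c /= sum;  // unity gain at DC

    std::vector<double> output;
    for (std::size_t i = 0; i < input.size(); i += static_cast<std::size_t>(factor)) {
        // Centered (zero-phase) FIR, evaluated only at the samples we keep;
        // samples past either end of the input are treated as zero.
        double acc = 0.0;
        for (int n = 0; n < taps; ++n) {
            const long idx = static_cast<long>(i) + half - n;
            if (idx >= 0 && idx < static_cast<long>(input.size()))
                acc += h[n] * input[static_cast<std::size_t>(idx)];
        }
        output.push_back(acc);
    }
    return output;
}
```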
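As a simple example of the interpolation route, linear interpolation computes each output sample from the two input samples on either side of its position in time. This only solves the alignment problem. It does no anti-alias filtering, so when lowering the rate you would still low-pass filter first. Its quality is also modest compared to a dedicated resampling library. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

std::vector<double> resampleLinear(const std::vector<double>& input, double inRate, double outRate) {
    assert(inRate > 0.0 && outRate > 0.0);
    if (input.size() < 2) return input;
    const double step = inRate / outRate;  // input samples advanced per output sample
    const double last = static_cast<double>(input.size() - 1);
    std::vector<double> output;
    for (std::size_t k = 0;; ++k) {
        const double pos = static_cast<double>(k) * step;  // no accumulated rounding error
        if (pos > last) break;
        const std::size_t i = static_cast<std::size_t>(pos);
        const double frac = pos - static_cast<double>(i);
        const double next = (i + 1 < input.size()) ? input[i + 1] : input[i];
        output.push_back(input[i] + frac * (next - input[i]));
    }
    return output;
}
```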
As you can see, resampling audio can be quite involved, so I would take nico's advice and use an existing library. Getting the filter step right will require you to learn a lot about signal processing and frequency analysis. You won't have to be an expert, but it will take some time.
I don't think there's really the need of reinventing the wheel (unless you want to do it for your personal learning).
For instance, you can try to use libsnd
Problem
Windows Mobile / DirectDraw: Rotate video stream
The video preview is working, all I need now is a way to rotate the image. I think the only way to handle this is to write a custom filter based on CTransformFilter that will rotate the camera image for you. If you can help me to solve this problem, e.g. by helping me to develop this filter with my limited DirectDraw knowledge, the bounty is yours.
Background / Previous question
I'm currently developing an application for a mobile device (HTC HD2, Windows Mobile 6). One of things the program needs to do is to take pictures using the built-in camera. Previously I did this with the CameraCaptureDialog offered by the Windows Mobile 6 SDK, but our customer wants a more user-friendly solution.
The idea is to preview the camera's video stream in a control and, when the control is clicked, take a high-resolution picture (>= 2 megapixels) using the camera's photo function. We did some research on the topic and found that the best way to accomplish this seems to be DirectDraw.
The downsides are that I have never really used any native Windows API and that my C++ is rather bad. In addition, I read somewhere that DirectDraw support on HTC phones is particularly bad and that you have to use undocumented native HTC library calls to take high-quality pictures.
The good news is that a company offered to develop a control for us that meets the specifications stated above. They estimated it would take them about 10 days, which led to a discussion about whether we could develop this control ourselves within a reasonable amount of time.
It's now my job to research which alternative is better. Needless to say, there is far too little time to study the whole architecture and develop a demo, which led me to the following questions:
Questions no longer relevant!
Does any of you have experience with similar projects? What are your recommendations?
Is there a good DirectDraw source code example that deals with video preview and image capturing?
Well if you look at the EZRGB24 sample you get the basics of a simple video transform filter.
There are 2 things you need to do to the sample to get it to do what you want.
1) You need to copy x,y to y,x.
2) You need to tell the media sample that the sample is now Height x Width instead of Width x Height.
Bear in mind that the final image will have exactly the same number of pixels.
Solving 1 is relatively simple. You can calculate the position of a pixel as "x + (y * Width)". So you step through each x and y, calculate the position that way, and then write the pixel to "y + (x * Height)". This will transpose the image. Of course, without step 2 this will look completely wrong.
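One detail is worth spelling out. Writing pixel (x, y) to (y, x) is a transpose, which is a 90-degree rotation combined with a mirror. The sketch below also reverses the row index, so the result is a true clockwise rotation. It handles multi-byte pixels but assumes tightly packed, top-down rows. RGB24 DIB rows are padded to a multiple of 4 bytes, so in a real filter you would use the source and destination strides instead. If both buffers are bottom-up DIBs, the same memory operation appears as a counterclockwise rotation on screen. The function name is my own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Rotate a packed pixel buffer (width x height) 90 degrees clockwise,
// producing a height x width buffer. bytesPerPixel would be 3 for RGB24.
std::vector<std::uint8_t> rotate90Clockwise(const std::vector<std::uint8_t>& src,
                                            int width, int height, int bytesPerPixel) {
    assert(src.size() == static_cast<std::size_t>(width) * height * bytesPerPixel);
    std::vector<std::uint8_t> dst(src.size());
    const int newWidth = height;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t from = (static_cast<std::size_t>(y) * width + x) * bytesPerPixel;
            // Source column x becomes output row x; source row y becomes
            // output column (height - 1 - y).
            const std::size_t to =
                (static_cast<std::size_t>(x) * newWidth + (height - 1 - y)) * bytesPerPixel;
            std::memcpy(&dst[to], &src[from], static_cast<std::size_t>(bytesPerPixel));
        }
    }
    return dst;
}
```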
To solve 2 you need to get the AM_MEDIA_TYPE of the input sample. You then need to find out what the formatType is (Probably FormatType_VideoInfo or FormatType_VideoInfo2). You can thus cast the pbFormat member of AM_MEDIA_TYPE to either a VIDEOINFOHEADER or a VIDEOINFOHEADER2 (Depending on the FormatType). You need to now set VIDEOINFOHEADER[2]::bmiHeader.biWidth and biHeight to the biHeight and biWidth (respectively) of the input media sample. Everything else should be the same as the input AM_MEDIA_TYPE.
I hope that helps a bit.
This question will help you get some details about DirectDraw. I did some research about this some time ago and the best I could find was this blog post (also mentioned in the above question). The post presents an extension of the CameraCapture sample in the SDK.
However, don't have high expectations. It seems that the preview and the picture taken will only work in small resolution. Although DirectDraw does describe a way of configuring the resolution, there is no guarantee that this will be properly implemented by the driver.
So from my experience, what you have read is true: the only way to do it will be to use the HTC drivers. So if you don't want to spend endless days reverse engineering for a doubtful result, let someone else do the job for you. If you want to give it a shot, try the xda-developers forum.