Programmatically convert WAV - c++

I'm writing a file compressor utility in C++ that I want support for PCM WAV files, however I want to keep it in PCM encoding and just convert it to a lower sample rate and change it from stereo to mono if applicable to yield a lower file size.
I understand the WAV file header, however I have no experience or knowledge of how the actual sound data works. So my question is, would it be relatively easy to programmatically manipulate the "data" sub-chunk in a WAV file to convert it to another sample rate and change the channel number, or would I be much better off using an existing library for it? If it is, then how would it be done? Thanks in advance.

PCM merely means that the value of the original signal is sampled at equidistant points in time.
For stereo, there are two sequences of these values. To convert them to mono, you merely take piecewise average of the two sequences.
Resampling the signal at lower sampling rate is a little bit more tricky -- you have to filter out high frequencies from the signal so as to prevent alias (spurious low-frequency signal) from being created.

I agree with avakar and nico, but I'd like to add a little more explanation. Lowering the sample rate of PCM audio is not trivial unless two things are true:
Your signal only contains significant frequencies lower than 1/2 the new sampling rate (Nyquist rate). In this case you do not need an anti-aliasing filter.
You are downsampling by an integer value. In this case, downampling by N just requires keeping every Nth sample and dropping the rest.
If these are true, you can just drop samples at a regular interval to downsample. However, they are both probably not true if you're dealing with anything other than a synthetic signal.
To address problem one, you will have to filter the audio samples with a low-pass filter to make sure the resulting signal only contains frequency content up to 1/2 the new sampling rate. If this is not done, high frequencies will not be accurately represented and will alias back into the frequencies that can be properly represented, causing major distortion. Check out the critical frequency section of this wikipedia article for an explanation of aliasing. Specifically, see figure 7 that shows 3 different signals that are indistinguishable by just the samples because the sampling rate is too low.
Addressing problem two can be done in multiple ways. Sometimes it is performed in two steps: an upsample followed by a downsample, therefore achieving rational change in the sampling rate. It may also be done using interpolation or other techniques. Basically the problem that must be solved is that the samples of the new signal do not line up in time with samples of the original signal.
As you can see, resampling audio can be quite involved, so I would take nico's advice and use an existing library. Getting the filter step right will require you to learn a lot about signal processing and frequency analysis. You won't have to be an expert, but it will take some time.

I don't think there's really the need of reinventing the wheel (unless you want to do it for your personal learning).
For instance you can try to use libsnd

Related

Audio frequency of each frame of a audio file like .mp3 .wav

Can I find a way to get frequency of each frame on a audio file like .mp3 or .wav or any other sound format using "fmod" or "cwave" libraries or even other libraries?
How can I find out this frequency in C/C++?
The FFTW library is a set of very fast implementations of different fourier transformations.
If you have a number of samples of digitized audio, you pretty much have, in total, as many frequencies and phases as you've got samples. Suppose you've got just two samples of audio. In order to faithfully represent them, you need one frequency and one phase -- so again, two values. There is no "single" frequency to represent multiple samples of digitized audio.
You can of course, akin to the question of "How can I get the color of a specific video frame?", ask what is the average frequency. Or you can ask what is the most prominent frequency (the one with highest amplitude). Or you can ask what is the frequency that with its harmonics carries the most energy in the signal (assuming the signal was physical, like electrical current sampled in time).
In all those cases, you'd probably want to use a premade library that internally uses the FFT or a similar discrete transform to get the signal from the time domain to a frequency or a similar domain (quefrency domain, for example, and it's not a typo). It's hard to get what you want from a plain FFT, you'd need some mathematical training to process raw FFT results into what you're after. I'm sure there are libraries for it, I just can't think of any right now. Perhaps someone who deals with such work can edit the answer.

How can I reduce or remove the noise created by changing the 'volume' of a sample from 16bit PCM

I'm currently working on a small project where I'm loading 16bit wave files with a sample rate of 44100Hz. In normal playback the audio seems fine but as soon as I start to play with things like amplitude size to change the volume it starts giving a little bit of static noise.
What I'm doing is getting a sample from the buffer in the case of this 16bit type a short, converting this to a float in the range of -1 to 1 to start doing mixing and other effects. In this I also change the volume, when I just multiply it by 1 giving the same output its fine but as soon as I start to change the volume I hear the static noise. It happens when going over 1.0 as well as going below 1.0. And it gets worse the bigger or smaller the scale.
Anyone an idea how to reduce or remove the noise ?
"Static", otherwise known as "clicks and pops" are the result of discontinuities in the output signal. Here is a perfect example of a discontinuity:
http://en.wikipedia.org/wiki/File:Discontinuity_jump.eps.png
If you send a buffer of audio to the system to play back, and then for the next buffer you multiply every sample by 1.1, you can create a discontinuity. For example, consider a buffer that contains a sine wave with values from [-0.5, 0.5]. You send a piece of this wave to the output device, and the last sample happens to be 0.5.
Now on your next buffer you try to adjust the volume by multiplying by 1.1. The first sample of the new buffer will be close to 0.5 (since the previous sample was 0.5). Multiply that by 1.1 and you get 0.55.
A change from one sample to the next of 0.05 will probably sound like a click or a pop. If you create enough of these, it will sound like static.
The solution is to "ramp" your volume change over the buffer. For example, if you want to apply a gain of 1.1 to a buffer of 100 samples, and the previous gain was 1.0, then you would loop over all 100 samples starting with gain 1 and smoothly increase the gain until you reach the last sample, at which point your gain should be 1.1.
If you want an example of this code look at juce::AudioSampleBuffer::applyGainRamp:
http://www.rawmaterialsoftware.com/api/classAudioSampleBuffer.html
I found the flaw, I was abstracting different bit data types by going to their data using char*, I did not cast the usage of it to the correct datatype pointer. This means bytes were cut off when giving it data. This created the noise and volume changing bugs amongst others.
A flaw of my implementation and me not thinking about this when working with the audio data. A tip for anyone doing the same kind of thing, keep a good eye when modifying data, check which type your data is when using abstractions.
Many thanks to the guys trying to help me, the links were really interesting and it did learn me more things about audio programming.

Appropriate image file format for losslessly compressing series of screenshots

I am building an application which takes a great many number of screenshots during the process of "recording" operations performed by the user on the windows desktop.
For obvious reasons I'd like to store this data in as efficient a manner as possible.
At first I thought about using the PNG format to get this done. But I stumbled upon this: http://www.olegkikin.com/png_optimizers/
The best algorithms only managed a 3 to 5 percent improvement on an image of GUI icons. This is highly discouraging and reveals that I'm going to need to do better because just using PNG will not allow me to use previous frames to help the compression ratio. The filesize will continue to grow linearly with time.
I thought about solving this with a bit of a hack: Just save the frames in groups of some number, side by side. For example I could just store the content of 10 1280x1024 captures in a single 1280x10240 image, then the compression should be able to take advantage of repetitions across adjacent images.
But the problem with this is that the algorithms used to compress PNG are not designed for this. I am arbitrarily placing images at 1024 pixel intervals from each other, and only 10 of them can be grouped together at a time. From what I have gathered after a few minutes scanning the PNG spec, the compression operates on individual scanlines (which are filtered) and then chunked together, so there is actually no way that info from 1024 pixels above could be referenced from down below.
So I've found the MNG format which extends PNG to allow animations. This is much more appropriate for what I am doing.
One thing that I am worried about is how much support there is for "extending" an image/animation with new frames. The nature of the data generation in my application is that new frames get added to a list periodically. But I do have a simple semi-solution to this problem, which is to cache a chunk of recently generated data and incrementally produce an "animation", say, every 10 frames. This will allow me to tie up only 10 frames' worth of uncompressed image data in RAM, not as good as offloading it to the filesystem immediately, but it's not terrible. After the entire process is complete (or even using free cycles in a free thread, during execution) I can easily go back and stitch the groups of 10 together, if it's even worth the effort to do it.
Here is my actual question that everything has been leading up to. Is MNG the best format for my requirements? Those reqs are: 1. C/C++ implementation available with a permissive license, 2. 24/32 bit color, 4+ megapixel (some folks run 30 inch monitors) resolution, 3. lossless or near-lossless (retains text clarity) compression with provisions to reference previous frames to aid that compression.
For example, here is another option that I have thought about: video codecs. I'd like to have lossless quality, but I have seen examples of h.264/x264 reproducing remarkably sharp stills, and its performance is such that I can capture at a much faster interval. I suspect that I will just need to implement both of these and do my own benchmarking to adequately satisfy my curiosity.
If you have access to a PNG compression implementation, you could easily optimize the compression without having to use the MNG format by just preprocessing the "next" image as a difference with the previous one. This is naive but effective if the screenshots don't change much, and compression of "almost empty" PNGs will decrease a lot the storage space required.

jpeg compression ratio

Is there a table that gives the compression ratio of a jpeg image at a given quality?
Something like the table given on the wiki page, except for more values.
A formula could also do the trick.
Bonus: Are the [compression ratio] values on the wiki page roughly true for all images? Does the ratio depend on what the image is and the size of the image?
Purpose of these questions: I am trying to determine the upper bound of the size of a compressed image for a given quality.
Note: I am not looking to make a table myself(I already have). I am looking for other data to check with my own.
I had exactly the same question and I was disappointed that no one created such table (studies based on a single classic Lena image or JPEG tombstone are looking ridiculous). That's why I made my own study. I cannot say that it is perfect, but it is definitely better than others.
I took 60 real life photos from different devices with different dimensions. I created a script which compress them with different JPEG quality values (it uses our company imaging library, but it is based on libjpeg, so it should be fine for other software as well) and saved results to CSV file. After some Excel magic, I came to the following values (note, I did not calculated anything for JPEG quality lower than 55 as they seem to be useless to me):
Q=55 43.27
Q=60 36.90
Q=65 34.24
Q=70 31.50
Q=75 26.00
Q=80 25.06
Q=85 19.08
Q=90 14.30
Q=95 9.88
Q=100 5.27
To tell the truth, the dispersion of the values is significant (e.g. for Q=55 min compression ratio is 22.91 while max value is 116.55) and the distribution is not normal. So it is not so easy to understand what value should be taken as typical for a specific JPEG quality. But I think these values are good as a rough estimate.
I wrote a blog post which explains how I received these numbers.
http://www.graphicsmill.com/blog/2014/11/06/Compression-ratio-for-different-JPEG-quality-values
Hopefully anyone will find it useful.
Browsing Wikipedia a little more led to http://en.wikipedia.org/wiki/Standard_test_image and Kodak's test suite. Although they're a little outdated and small, you could make your own table.
Alternately, pictures of stars and galaxies from NASA.gov should stress the compressor well, being large, almost exclusively composed of tiny speckled detail, and distributed in uncompressed format. In other words, HUBBLE GOTCHOO!
The compression you get will depend on what the image is of as well as the size. Obviously a larger image will produce a larger file even if it's of the same scene.
As an example, a random set of photos from my digital camera (a Canon EOS 450) range from 1.8MB to 3.6MB. Another set has even more variation - 1.5MB to 4.6MB.
If I understand correctly, one of the key mechanisms for attaining compression in JPEG is using frequency analysis on every 8x8 pixel block of the image and scaling the resulting amplitudes with a "quantization matrix" that varies with the specified compression quality.
The scaling of high frequency components often result in the block containing many zeros, which can be encoded at negligible cost.
From this we can deduce that in principle there is no relation between the quality and the final compression ratio that will be independent of the image. The number of frequency components that can be dropped from a block without perceptually altering its content significantly will necessarily depend on the intensity of those components, i.e. whether the block contains a sharp edge, highly variable content, noise, etc.

Visualizing volume of PCM samples

I have several chunks of PCM audio (G.711) in my C++ application. I would like to visualize the different audio volume in each of these chunks.
My first attempt was to calculate the average of the sample values for each chunk and use that as an a volume indicator, but this doesn't work well. I do get 0 for chunks with silence and differing values for chunks with audio, but the values only differ slighly and don't seem to resemble the actual volume.
What would be a better algorithem calculate the volume ?
I hear G.711 audio is logarithmic PCM. How should I take that into account ?
Note, I haven't worked with G.711 PCM audio myself, but I presume that you are performing the correct conversion from the encoded amplitude to an actual amplitude before processing the values.
You'd expect the average value of most samples to be approximately zero as sound waveforms oscillate either side of zero.
A crude volume calculation would be rms (root mean square), i.e. taking a rolling average of the square of the samples and take the square root of that average. This will give you a postive quantity when there is some sound; the quantity is related to the power represented in the waveform.
For something better related to human perception of volume you may want to investigate the sort of techniques used in Replay Gain.
If you're feeling ambitious, you can download G.711 from the ITU-web site, and spend the next few weeks (or maybe more) implementing it.
If you're lazier (or more sensible) than that, you can download G.191 instead -- it includes source code to compress and decompress G.711 encoded data.
Once you've decoded it, visualizing the volume should be a whole lot easier.