I need to create an Integer Sequence from an Audio file. I was checking the waveform libraries as that draw a linear graph. But I am searching for the key information, What is the source of the integer that is used to draw the graph ? is it amplitude ? frequency ? or something else ? There are libraries available but I need to know what unit of information I need to extract to have a data that I can feed to a graph. However drawing a graph is not my objective. I just want that raw integer array.
Of course, it's the amplitudes what you need to get a wave oscillogram, and it's the way PCM data are stored in wav files, for example (data which come directly after the file header). Note that there are 8-bit and 16-bit formats, the latter may be also big-endian or little-endian depending on the byte order (just to keep you aware of it).
Audio is simply a curve - when you plot it with time across the X axis then Y axis is amplitude - similar to plotting a sin math function - each point on the curve is a number which gets stored in the audio file - WAV format this number typically is a 16 bit unsigned integer - so ignoring the 44 byte header - the rest of the file is just a sequence of these integer numbers. When this curve varies up and down quickly over time the frequency is higher than if the curve varies more slowly over time. If you download the audio workbench application : Audacity you can view this curve of any audio file (WAV, mp3,...)
Related
I'm looking for an audio or image compression algorithm that can compress a torrent of 16-bit samples
by a fairly predictable amount (2-3x)
at very high speed (say, 60 cycles per sample at most: >100MB/s)
with lossiness being acceptable but, of course, undesirable
My data has characteristics of images and audio (2-dimensional, correlated in both dimensions and audiolike in one dimension) so algorithms for audio or images might both be appropriate.
An obvious thing to try would be this one-dimensional algorithm:
break up the data into segments of 64 samples
measure the range of values among those samples (as an example, the samples might be between 3101 and 9779 in one segment, a difference of 6678)
use 2 to 4 additional bytes to encode the range
linearly downsample each 16-bit sample to 8 bits in that segment.
For example, I could store 3101 in 16 bits, and store a scaling factor ceil(6678/256) = 27 in 8 bits, then convert each 16-bit sample to 8-bit as s8 = (s16 - base) / scale where base = 3101 + 27>>1, scale = 27, with the obvious decompression "algorithm" of s16 = s8 * 27 + 3101.) Compression ratio: 128/67 = 1.91.
I've played with some ideas to avoid the division operation, but hasn't someone by now invented a superfast algorithm that could preserve fidelity better than this one?
Note: this page says that FLAC compresses at 22 million samples per second (44MB/s) at -q6 which is pretty darn good (assuming its implementation is still single-threaded), if not quite enough for my application. Another page says FLAC has similar performance (40MB/s on a 3.4GHz i3-3240, -q5) as 3 other codecs, depending on quality level.
Take a look at the PNG filters for examples of how to tease out your correlations. The most obvious filter is "sub", which simply subtracts successive samples. The differences should be more clustered around zero. You can then run that through a fast compressor like lz4. Other filter choices may result in even better clustering around zero, if they can find advantage in the correlations in your other dimension.
For lossy compression, you can decimate the differences before compressing them, dropping a few low bits until you get the compression you want, and still retain the character of the data that you would like to preserve.
How can I draw an spectrum for an given audio file with Bass library?
I mean the chart similar to what Audacity generates:
I know that I can get the FFT data for given time t (when I play the audio) with:
float fft[1024];
BASS_ChannelGetData(chan, fft, BASS_DATA_FFT2048); // get the FFT data
That way I get 1024 values in array for each time t. Am I right that the values in that array are signal amplitudes (dB)? If so, how the frequency (Hz) is associated with those values? By the index?
I am an programmer, but I am not experienced with audio processing at all. So I don't know what to do, with the data I have, to plot the needed spectrum.
I am working with C++ version, but examples in other languages are just fine (I can convert them).
From the documentation, that flag will cause the FFT magnitude to be computed, and from the sounds of it, it is the linear magnitude.
dB = 10 * log10(intensity);
dB = 20 * log10(pressure);
(I'm not sure whether audio file samples are a measurement of intensity or pressure. What's a microphone output linearly related to?)
Also, it indicates the length of the input and the length of the FFT match, but half the FFT (corresponding to negative frequencies) is discarded. Therefore the highest FFT frequency will be one-half the sampling frequency. This occurs at N/2. The docs actually say
For example, with a 2048 sample FFT, there will be 1024 floating-point values returned. If the BASS_DATA_FIXED flag is used, then the FFT values will be in 8.24 fixed-point form rather than floating-point. Each value, or "bin", ranges from 0 to 1 (can actually go higher if the sample data is floating-point and not clipped). The 1st bin contains the DC component, the 2nd contains the amplitude at 1/2048 of the channel's sample rate, followed by the amplitude at 2/2048, 3/2048, etc.
That seems pretty clear.
I need something like a class or function or whatever that can take a wav file as an input an give me the vloums value.
if something like this is not Exist, how should I get those values ?
is there any way at all ?
I got the frequencies, I want the Volume.
by the way Im coding in C++.
Thank u for ur answers.
explanation: by volume I mean sound intensity and amplitude.
Volume is related to a measurable quantity : Root Mean Square (RMS). Once you gain access to the audio curve RMS can be calculated for a chosen audio buffer size (entire clip or some portion) Conceptually, you simply walk across a set of samples (each entry of a buffer) where for example the audio curve values may vary from -1 to +1 and you square each value then add to a running total then take the square root of this total. RMS is just the average value of the source values. Short of rolling your own algorithm checkout implementations : ReplayGain
Considering http://soundfile.sapp.org/doc/WaveFormat/ I dont think volume information can be retrieved from the wav file itself. Also, generally, "volume" is subject to the hardware playing the wav file and not the wav file itself if im not mistaken.
How might one generate audio at runtime using C++? I'm just looking for a starting point. Someone on a forum suggested I try to make a program play a square wave of a given frequency and amplitude.
I've heard that modern computers encode audio using PCM samples: At a give rate for a specific unit of time (eg. 48 kHz), the amplitude of a sound is recorded at a given resolution (eg. 16-bits). If I generate such a sample, how do I get my speakers to play it? I'm currently using windows. I'd prefer to avoid any additional libraries if at all possible but I'd settle for a very light one.
Here is my attempt to generate a square wave sample using this principal:
signed short* Generate_Square_Wave(
signed short a_amplitude ,
signed short a_frequency ,
signed short a_sample_rate )
{
signed short* sample = new signed short[a_sample_rate];
for( signed short c = 0; c == a_sample_rate; c++ )
{
if( c % a_frequency < a_frequency / 2 )
sample[c] = a_amplitude;
else
sample[c] = -a_amplitude;
}
return sample;
}
Am I doing this correctly? If so, what do I do with the generated sample to get my speakers to play it?
Your loop has to use c < a_sample_rate to avoid a buffer overrun.
To output the sound you call waveOutOpen and other waveOut... functions. They are all listed here:
http://msdn.microsoft.com/en-us/library/windows/desktop/dd743834(v=vs.85).aspx
The code you are using generates a wave that is truly square, binary kind of square, in short the type of waveform that does not exist in real life. In reality most (pretty sure all) of the sounds you hear are a combination of sine waves at different frequencies.
Because your samples are created the way they are they will produce aliasing, where a higher frequency masquerades as a lower frequency causing audio artefacts. To demonstrate this to yourself write a little program which sweeps the frequency of your code from 20-20,000hz. You will hear that the sound does not go up smoothly as it raises in frequency. You will hear artefacts.
Wikipedia has an excellent article on square waves: https://en.m.wikipedia.org/wiki/Square_wave
One way to generate a square wave is to perform an inverse Fast Fourier Transform which transforms a series of frequency measurements into a series of time based samples. Then generating a square wave is a matter of supplying the routine with a collection of the measurements of sin waves at different frequencies that make up a square wave and the output is a buffer with a single cycle of the waveform.
To generate audio waves is computationally expensive so what is often done is to generate arrays of audio samples and play them back at varying speeds to play different frequencies. This is called wave table synthesis.
Have a look at the following link:
https://www.earlevel.com/main/2012/05/04/a-wavetable-oscillator%E2%80%94part-1/
And some more about band limiting a signal and why it’s necessary:
https://dsp.stackexchange.com/questions/22652/why-band-limit-a-signal
I've run into some nasty problem with my recorder. Some people are still using it with analog tuners, and analog tuners have a tendency to spit out 'snow' if there is no signal present.
The Problem is that when noise is fed into the encoder, it goes completely crazy and first consumes all CPU then ultimately freezes. Since main point od the recorder is to stay up and running no matter what, I have to figure out how to proceed with this, so encoder won't be exposed to the data it can't handle.
So, idea is to create 'entropy detector' - a simple and small routine that will go through the frame buffer data and calculate entropy index i.e. how the data in the picture is actually random.
Result from the routine would be a number, that will be 0 for completely back picture, and 1 for completely random picture - snow, that is.
Routine in itself should be forward scanning only, with few local variables that would fit into registers nicely.
I could use zlib or 7z api for such task, but I would really want to cook something on my own.
Any ideas?
PNG works this way (approximately): For each pixel, replace its value by the value that it had minus the value of the pixel left to it. Do this from right to left.
Then you can calculate the entropy (bits per character) by making a table of how often which value appears now, making relative values out of these absolute ones and adding the results of log2(n)*n for each element.
Oh, and you have to do this for each color channel (r, g, b) seperately.
For the result, take the average of the bits per character for the channels and divide it by 2^8 (assuming that you have 8 bit per color).