How can I compress cyclic data with minimal code?

I need to collect data from a sensor and compress it (lossy) at about 2 to 1. I would like to stay under 50 lines of C code. The signal comes from a 4-bit A/D converter and is roughly a sine wave with slightly erratic amplitude and frequency; there are occasional stretches where the signal is outright erratic.

"Lossy" is pretty broad and allows for anything. Half the samples. Half the bits. Anything else is going to be a bit involved.
You would have to a) predict the next sample as best you can from the previous samples, b) subtract the prediction from the actual sample, and c) transmit that difference in two bits or less, on average. Doing this lossily will cause the result to drift, requiring periodic re-centering with an original four-bit sample.
A simple quadratic predictor would be a - 3b + 3c where a, b, c are the last three samples. A sine-wave predictor would be more complex, fitting the frequency and phase and adjusting as you go along.
If your data is noisy, and it's only four bits of resolution to begin with, it is doubtful that you will get much mileage from this.
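To make the predict/subtract/re-center idea concrete, here is a minimal sketch, in Python rather than the requested C, with tokens standing in for real bit packing. It uses the quadratic predictor, 2-bit clamped residuals, and a raw 4-bit sample every 16th position to stop drift. All names and parameter values are illustrative, not a definitive implementation.

```python
# Sketch: quadratic prediction, 2-bit residuals, periodic re-centering.
# ('raw', v) tokens carry 4 bits, ('res', d) tokens carry 2 bits.

def compress(samples, recenter=16):
    """samples: 4-bit values (0..15). Returns ('raw', v) / ('res', d) tokens."""
    out, hist = [], []
    for i, s in enumerate(samples):
        if i % recenter == 0 or len(hist) < 3:
            out.append(('raw', s))                  # full 4-bit sample
            hist.append(s)
        else:
            a, b, c = hist[-3], hist[-2], hist[-1]
            pred = a - 3 * b + 3 * c                # quadratic extrapolation
            d = max(-2, min(1, s - pred))           # clamp residual to 2 bits
            out.append(('res', d))
            hist.append(max(0, min(15, pred + d)))  # track the decoder's view

    return out

def decompress(tokens):
    hist = []
    for kind, v in tokens:
        if kind == 'raw':
            hist.append(v)
        else:
            a, b, c = hist[-3], hist[-2], hist[-1]
            hist.append(max(0, min(15, a - 3 * b + 3 * c + v)))
    return hist
```

Since raw samples occur at fixed positions, no flag bit is needed in a real bitstream: that is roughly (15*2 + 4)/16 ≈ 2.1 bits per sample, i.e. about 2:1 against the 4-bit source.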

Related

RF Divider function in SDR

I have what may be an odd question for the SDR gurus out there.
What would be the physical implementation (in software) of a broadband frequency divider?
For example, say I want to capture a signal at 1 GHz, with a 10 MHz bandwidth, then divide it by a factor of 10.
I would expect to get a down-sampled signal at 100 MHz with a 1 MHz bandwidth.
Yes, I know I would lose information, but assume this would be presented as a spectrum analysis, not full audio, video, etc.
Conceptually, could this be accomplished by sampling the RF at 2+ times the highest frequency component, say at 2.5 GHz, then discarding 9 out of 10 samples, i.e. decimating the input stream?
Thanks,
Dave
Well, as soon as you've digitized your signal, it loses the property "bandwidth", which is a real-world concept, not one attached to the inherently meaningless stream of numbers that we're talking about in DSP and SDR. So there's no signal with a bandwidth of 10 MHz (without looking at the contents of the samples), only a stream of numbers that we remember being produced by sampling an analog signal at 20 MS/s (if you're doing real sampling; if you have an I/Q downconverter and sample I and Q simultaneously, you get complex samples, of which 10 MS/s are enough to represent 10 MHz of bandwidth).
Now, if you just throw away 9 out of 10 samples, which is decimation, you'll get aliasing: you can no longer tell whether a sine that took 10 samples in the original signal was actually a sine or just a constant, and the same goes for any sine above the new sampling rate's Nyquist frequency. That is a loss of information, so yes, that would "work" in the sense you describe.
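To see that aliasing concretely, here is a small sketch (rates are illustrative): a sine that takes exactly 10 samples per period collapses to a constant once you keep only every 10th sample.

```python
import math

fs = 1000.0                 # original sample rate, Hz (illustrative)
f = 100.0                   # sine at fs/10: exactly 10 samples per period
x = [math.sin(2 * math.pi * f * n / fs) for n in range(1000)]

# Keep every 10th sample: naive decimation without any filtering.
decimated = [x[n] for n in range(0, len(x), 10)]

# Every kept sample hits the same phase of the sine, so the decimated
# stream is constant: the 100 Hz tone has aliased down to DC (0 Hz).
```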
I think, however, you have something specific in mind, which is scaling the signal along the frequency axis. Let's take a quick excursion into Fourier analysis:
there is a well-known correspondence for frequency scaling.
Let G be the Fourier transform of g; then
g(at) <--> 1/|a| G(f/a)
As you can see, compressing something in the frequency domain actually means "speeding it up" in the time domain, i.e. decimation!
So, in order to do this meaningfully, you could imagine taking a DFT of length N of your signal and setting 9 out of 10 bins to zero, by multiplying it with a comb of ones. Now, multiplication with a signal in the frequency domain is convolution with the Fourier transform of that signal in the time domain. The Fourier transform of such a comb is, to little surprise, the complement of a Nyquist-M filter, and thus a filter itself; you will thus end up with a multi-band-passed version of your signal, which you can then decimate without aliasing.
Hope that was what you're after!
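As a sketch of that comb idea (naive O(N^2) DFT for clarity; all names illustrative): zeroing all but every 10th bin makes the time-domain signal periodic with period N/10, so decimating by 10 afterwards causes no aliasing.

```python
import cmath

def dft(x, sign=-1):
    """Naive DFT (sign=-1) or un-normalized inverse DFT (sign=+1)."""
    N = len(x)
    return [sum(xn * cmath.exp(sign * 2j * cmath.pi * k * n / N)
                for n, xn in enumerate(x)) for k in range(N)]

def comb_filter_and_decimate(x, m=10):
    """Zero all DFT bins except multiples of m, then decimate by m."""
    N = len(x)
    X = dft(x)
    X = [Xk if k % m == 0 else 0.0 for k, Xk in enumerate(X)]
    y = [v / N for v in dft(X, sign=+1)]      # inverse DFT (scaled)
    return y, [y[n] for n in range(0, N, m)]  # alias-free decimation
```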

Compute frequency of sinusoidal signal, c++

I have a sinusoidal-shaped signal and I would like to compute its frequency.
I tried to implement something, but it looks very difficult. Any idea?
So far I have a vector of timesteps and values; how can I get the frequency from this?
Thank you.
If the input signal is a perfect sinusoid, you can calculate the frequency using the time between positive zero crossings. Find two consecutive instances where the signal goes from negative to positive, measure the time between them, then invert this number to convert from period to frequency. Note this is only as accurate as your sample interval, and it does not account for any potential aliasing.
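A sketch of that zero-crossing approach (illustrative, with linear interpolation between samples to do a bit better than the raw sample interval, and averaging over all crossings rather than just two):

```python
def freq_from_zero_crossings(t, x):
    """Estimate frequency from negative-to-positive zero crossings.
    t: timestamps, x: samples. Returns None if fewer than 2 crossings."""
    crossings = []
    for i in range(1, len(x)):
        if x[i - 1] < 0 <= x[i]:
            # interpolate the instant the signal passes through zero
            frac = -x[i - 1] / (x[i] - x[i - 1])
            crossings.append(t[i - 1] + frac * (t[i] - t[i - 1]))
    if len(crossings) < 2:
        return None
    # average period over all consecutive crossing pairs
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return 1.0 / period
```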
You could try auto correlating the signal. An auto correlation can be rapidly calculated by following these steps:
Perform FFT of the audio.
Multiply each complex value with its complex conjugate.
Perform the inverse FFT of the result.
The leftmost peak will always be the highest (the signal always correlates best with itself at zero lag). The second-highest peak, however, can be used to calculate the sinusoid's frequency.
For example, if the second peak occurs at an offset (lag) of 50 samples, the sample rate is 16 kHz and the window is 1 second, then the frequency is 16000 / 50 = 320 Hz. You can even use interpolation around the peak to get a more accurate estimate of its position, and thus a more accurate sinusoid frequency. This method is computationally intensive, but it is very good for estimating the frequency after significant amounts of noise have been added!
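For illustration, here is a direct O(n^2) autocorrelation version of the idea; the FFT route in the steps above computes the same sequence much faster. Names and the simple peak-picking logic are illustrative, not a robust pitch detector.

```python
def autocorr_freq(x, fs):
    """Estimate a sinusoid's frequency from its autocorrelation.
    Direct O(n^2) autocorrelation; equivalent to FFT -> |.|^2 -> IFFT."""
    n = len(x)
    ac = [sum(x[i] * x[i + lag] for i in range(n - lag))
          for lag in range(n // 2)]
    # Skip lag 0 (signal vs. itself): walk down the initial falling
    # slope, then up to the first local maximum, which is the period.
    lag = 1
    while lag < len(ac) - 1 and ac[lag] > ac[lag + 1]:
        lag += 1
    while lag < len(ac) - 1 and ac[lag] < ac[lag + 1]:
        lag += 1
    return fs / lag
```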

Lowpass FIR Filter with FFT Convolution - Overlap add, why and how

First off, sorry for not posting the code here. For some reason the code got messed up when I tried to enter it on this page, and it was probably too long to post anyway. Here is my code: http://pastebin.com/bmMRehbd
Now, from what I'm being told, the reason I can't get a good result out of this code is that I'm not using overlap-add. I have tried reading several sources on the internet about why I need overlap-add, but I can't understand it. The filter itself seems to work, since anything above the given cutoff does indeed get cut off.
I should mention this code is made to work with the VST2 SDK.
Can someone tell me why I need overlap-add and how I can implement it in the given code?
I should also mention that I'm pretty hopeless when it comes to algorithms and maths. I'm one of those people who need to visually get a grip on what they're doing, or have things explained through code :), and by that I mean the actual overlap.
Overlap-add theory: http://en.wikipedia.org/wiki/Overlap%E2%80%93add_method
Thanks for all the help you can give!
The overlap-add method is needed to handle the boundaries of each FFT buffer. The problem is that multiplication in the FFT domain results in circular convolution in the time domain. This means that after performing the IFFT, the results at the end of the frame wrap around and corrupt the output samples at the beginning of the frame.
It may be easier to think about it this way: Say you have a filter of length N. Linear convolution of this filter with M input samples actually returns M+N-1 output samples. However, the circular convolution done in the FFT domain results in the same number of input and output samples, M. The extra N-1 samples from linear convolution have "wrapped" around and corrupted the first N-1 output samples.
Here's an example (matlab or octave):
a = [1,2,3,4,5,6];
b = [1,2,1];
conv(a,b) %linear convolution
1 4 8 12 16 20 17 6
ifft(fft(a,6).*fft(b,6)) %circular convolution
18 10 8 12 16 20
Notice that the last 2 samples have wrapped around and added to the first 2 samples in the circular case.
The overlap-add/overlap-save methods are basically methods of handling this wraparound. The overlap of FFT buffers is needed since circular convolution returns fewer uncorrupted output samples than the number of input samples.
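The overlap-add bookkeeping itself can be sketched without any FFT: convolve each block, let each block's N-1 tail samples overlap the start of the next block's output, and add. In a real implementation each per-block convolution would be done with FFTs of length block + N - 1; this sketch uses direct convolution for clarity, and all names are illustrative.

```python
def conv(a, b):
    """Direct linear convolution, output length len(a) + len(b) - 1."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def overlap_add(x, h, block=8):
    """Filter x with h block by block. Each block's conv output is
    len(h) - 1 samples longer than the block; that tail overlaps the
    next block's output region and is simply added in."""
    out = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        seg = conv(x[start:start + block], h)
        for k, v in enumerate(seg):
            out[start + k] += v     # tail overlaps the next block: add
    return out
```

The point of the demo: block-wise processing with overlapping tails reproduces the one-shot linear convolution exactly, which is what the FFT version must also achieve.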
When you do a convolution (with a finite impulse response filter) by taking the inverse discrete Fourier transform of the product of the discrete Fourier transforms of two input signals, you are really implementing circular convolution. I'll hereby call this "convolution computed in the frequency domain." (A circular convolution is basically a convolution where you assume the domain is circular, i.e., shifting the signal off one side makes it "wrap around" to the other side of the domain.)
You generally want to perform convolution by using fast Fourier transforms for large signals because it's computationally more efficient.
Overlap add (and its cousin Overlap save) are methods that work around the fact the convolutions done in the frequency domain are really circular convolutions, but that in reality we rarely ever want to do circular convolution, but typically rather linear convolutions.
Overlap add does it by zero-padding chunks of the input signal and then appropriately combining the resulting circular convolutions (that were done in the frequency domain). Overlap save does it by keeping only the portion of each output that corresponds to linear convolution and tossing the part that was "corrupted" by the circular shifts.
Here are links from Wikipedia for both methods.
Overlap-add : This one has a nice figure explaining what's going on.
Overlap-save
This book by Orfanidis explains it well. See section 9.9.2. It's not the "de facto" standard on signal processing, but it's extremely well written and is a better introduction than other books, in my opinion.
First, understand that convolution in the time domain is equivalent to multiplication in the frequency domain. Direct convolution costs roughly O(n*m), where n is the FIR length and m is the number of samples to be filtered. In the frequency domain, using the FFT, the cost per block of size proportional to n drops to O(n log n). For large enough n, the cost of filtering is substantially less when done in the frequency domain. If n is relatively small, however, the benefits decrease to the point where it's simpler to filter in the time domain. The breakpoint is subjective, but figure an n of 50 to 100 as the point where you might switch.
Yes, a convolution filter will "work", in terms of changing the frequency response. But the multiplication in the frequency domain will also contaminate time-domain data at one end of each buffer with data from the other end, and vice versa. Overlap add/save extends the FFT size and chops off the "contaminated" end, and then uses that end data to fix the beginning of the subsequent FFT window.

What is a correct formula of amplifying WaveForm audio?

I am wondering what the correct formula for amplifying WaveForm audio is, from C++.
Let's say there's 16-bit waveform data like the following:
0x0000, 0x2000, 0x3000, 0x2000, 0x0000, (negative part), ...
For acoustic reasons, simply doubling the numbers, like this, won't make the audio sound twice as loud:
0x0000, 0x4000, 0x6000, 0x4000, 0x0000, (doubled negative part), ...
If there's someone who knows audio modification well, please let me know.
If you double all the sample values, the signal becomes 6 dB louder, which is often loosely described as "twice as loud". Of course, you need to be careful to avoid distortion due to clipping; that's the main reason why all professional audio processing software today uses float samples internally.
You may need to get back to integers when finally outputting the sound data. If you're just writing a plugin for some DAW (which I would recommend if you want to program simple yet effective sound FX), it will do all this stuff for you: you just get a float, do something with it, and output a float again. But if you want to, for instance, directly write out a .wav file, you need to first limit the output so that everything above 0 dB (which is +-1 in a usual float stream) is clipped to just +-1. Then you can multiply by the maximum value your desired integer type can hold and cast into that type. Done.
Anyway, you're certainly right that it's important to scale your volume knob logarithmically rather than linearly (many consumer-grade programs don't, which is just stupid because you end up using values very close to the left end of the knob's range most of the time), but that has nothing to do with the amplification calculation itself; it's just because we perceive loudness on a logarithmic scale. The loudness itself is still determined by simple multiplication of the sound pressure by a constant factor, and the sound pressure is in turn proportional to the voltage in the analog circuitry and to the values of the digital samples in any DSP.
Another thing: I don't know how far you're intending to go, but if you want to do this really properly you should not just clip away peaks that are over 0 dB (the clipping sounds very harsh), but implement a proper compressor/limiter. This would then automatically prevent clipping by reducing the level at the loudest parts. You don't want to overdo this either (popular music is usually over-compressed anyway, and as a result a lot of the dynamic musical expression is lost), but it is still a "less dangerous" way of increasing the audio level.
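A minimal sketch of the float-process/clip/convert pipeline described above (hard clipping rather than a proper limiter; names and the 16-bit target are illustrative):

```python
def amplify_to_int16(samples, gain):
    """Scale float samples (-1..1) by `gain`, hard-clip to +-1, then
    convert to 16-bit integer range. A real limiter would ride the
    gain down around peaks instead of clipping them."""
    out = []
    for s in samples:
        v = max(-1.0, min(1.0, s * gain))   # clip everything above 0 dB
        out.append(int(round(v * 32767)))   # scale into int16 range
    return out
```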
I used linear multiplication for it every time and it never failed. It even worked for fade-outs, for example.
So:
float amp = 1.2f;
short sample;
short newSample = (short)(amp * sample);
(Note the parentheses: casting amp alone to short would truncate it to 1 before the multiply.)
If you want your fade-out to be linear, in the sample processing loop do
amp -= 0.03f;
and if you want it to be logarithmic, in the sample processing loop do
amp *= 0.97f;
until amp reaches some small value (amp < 0.1).
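The two fade curves can be sketched side by side (constants taken from the snippet above; function and variable names are illustrative):

```python
def fade_gains(start=1.0, floor=0.1, linear_step=0.03, log_ratio=0.97):
    """Per-sample gain curves for the two fade-outs: linear (subtract
    a constant each sample) and 'logarithmic' (multiply by a constant
    ratio each sample), stopping once the gain drops below `floor`."""
    lin, amp = [], start
    while amp >= floor:
        lin.append(amp)
        amp -= linear_step      # linear fade: constant decrement
    log, amp = [], start
    while amp >= floor:
        log.append(amp)
        amp *= log_ratio        # log fade: constant ratio per sample
    return lin, log
```

The multiplicative curve decays by a fixed number of dB per sample, which is why it sounds like a smooth, even fade to the ear.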
This may just be a perception problem. Your ears (and eyes; look up gamma w.r.t. video) don't perceive loudness in linear proportion to the input. A good model is that perceived loudness grows roughly logarithmically with the input level. Look up the difference between linear pots and audio pots.
Anyway, I don't know if that matters here because your output amp may account for it, but if you want the result to be perceived as twice as loud, you may have to make it something like e^2 times as large. Which may mean you're in the realm of clipping now.

detecting pauses in a spoken word audio file using pymad, pcm, vad, etc

First I am going to broadly state what I'm trying to do and ask for advice. Then I will explain my current approach and ask for answers to my current problems.
Problem
I have an MP3 file of a person speaking. I'd like to split it up into segments roughly corresponding to a sentence or phrase. (I'd do it manually, but we are talking hours of data.)
If you have advice on how to do this programmatically or pointers to existing utilities, I'd love to hear it. (I'm aware of voice activity detection and I've looked into it a bit, but I didn't see any freely available utilities.)
Current Approach
I thought the simplest thing would be to scan the MP3 at certain intervals and identify places where the average volume was below some threshold. Then I would use some existing utility to cut up the mp3 at those locations.
I've been playing around with pymad and I believe that I've successfully extracted the PCM (pulse code modulation) data for each frame of the mp3. Now I am stuck because I can't really seem to wrap my head around how the PCM data translates to relative volume. I'm also aware of other complicating factors like multiple channels, big endian vs little, etc.
Advice on how to map a group of pcm samples to relative volume would be key.
Thanks!
PCM is a time-frame-based encoding of sound. For each time frame, you get a sample level. (If you want a physical reference for this: the sample value corresponds to the distance the microphone membrane was displaced from its resting position at that given time.)
Let's forget that PCM can use unsigned values for 8-bit samples, and focus on signed values. If the value is > 0, the membrane was on one side of its resting position; if it is < 0, it was on the other side. The bigger the displacement from rest (no matter to which side), the louder the sound.
Most voice classification methods start with one very simple step: They compare the peak level to a threshold level. If the peak level is below the threshold, the sound is considered background noise.
Looking at the parameters in Audacity's Silence Finder, the silence level should be that threshold. The next parameter, Minimum silence duration, is obviously the length of a silence period that is required to mark a break (or in your case, the end of a sentence).
If you want to code a similar tool yourself, I recommend the following approach:
Divide your sound sample into discrete sets of a specific duration. I would start with 1/10, 1/20 or 1/100 of a second.
For each of these sets, compute the maximum peak level.
Compare this maximum peak to a threshold (the silence level in Audacity). The threshold is something you have to determine yourself, based on the specifics of your sound sample (loudness, background noise etc.). If the max peak is below your threshold, this set is silence.
Now analyse the series of classified sets: calculate the length of each silent stretch in your recording (length = number of silent sets * length of a set). If it is above your minimum silence duration, assume that you have the end of a sentence there.
The main point of coding this yourself instead of continuing to use Audacity is that you can improve your classification by using more advanced analysis methods. One very simple metric you can apply is the zero crossing rate: it just counts how often the sign switches in your given set of sample values (i.e. how often your values cross the 0 line). There are many more, all of them more complex, but it may be worth the effort. Have a look at discrete cosine transforms, for example...
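A sketch of the windowed peak-threshold approach described in the steps above. Samples are assumed to be floats in -1..1, and all parameter names and default values are illustrative, to be tuned against your recording:

```python
def find_silences(samples, fs, window=0.05, threshold=0.02, min_silence=0.4):
    """Report silent stretches in a mono PCM stream as (start_s, end_s).
    Windows whose peak |sample| stays under `threshold` count as
    silence; only runs longer than `min_silence` seconds are kept."""
    step = max(1, int(window * fs))
    flags = []
    for i in range(0, len(samples), step):
        peak = max(abs(s) for s in samples[i:i + step])
        flags.append(peak < threshold)          # classify each window

    silences, run_start = [], None
    for w, silent in enumerate(flags + [False]):  # sentinel ends last run
        if silent and run_start is None:
            run_start = w
        elif not silent and run_start is not None:
            dur = (w - run_start) * step / fs
            if dur >= min_silence:
                silences.append((run_start * step / fs, w * step / fs))
            run_start = None
    return silences
```

A zero-crossing-rate check per window could be added on top to tell quiet voiced sound from true background noise.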
Just wanted to update this. I'm having moderate success using Audacity's Silence Finder. However, I'm still interested in this problem. Thanks.
PCM here is simply a series of quantized amplitude samples: each value is a direct measurement of the waveform at one instant. (Encoding successive increase/decrease bits would be delta modulation, a different scheme.)
To estimate amplitude, take the peak or RMS of the samples over short windows; you should then be able to pick out the spots where the amplitude is lower.
You may also try to use a Fourier transform to estimate where the signals are most distinct.