time-domain signal in MFCC - c++

I have read about MFCC and Speech Recognition, and I don't understand one point. According to the document in this page http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/, what is the "time-domain signal"?? Is that the float number in data sub-chunk which I read in header-file of a wave file?
P/s: Sorry for my poor English :D

Yes, you are right. And quoted from wiki:
Time domain is the analysis of mathematical functions, physical signals or time series of economic or environmental data, with respect to time. In the time domain, the signal or function's value is known for all real numbers, for the case of continuous time, or at various separate instants in the case of discrete time.

Related

Signal Input to Neural Network

Working on a new project using AWS Machine Learning, with the intent of detecting certain patterns in an input signal. That is to say, the input to my model (neural network, decision tree, etc.) is a discrete signal with an unknown number of values, and my output is a known number of values.
I understand the theory behind traditional ML models such as neural networks, where a function is derived to map a known number of inputs to a known number of outputs. This makes sense with the requirement that the data supplied to the AWS ML platform be rows of CSV attributes.
Is there a way use this platform, or ML models in general for this kind of signal processing, or is there a preprocessing technique I can use to derive a fixed number of input variables?
For example, one I had in mind was to take a fourier transform of the time signal, and describe the signal in the frequency domain band limited to a reasonable range (effectively cutting down the signal to a fixed number of values). Total shot in the dark though, I'm not an expert on ML or signal processing.
For audio signals, one possible (common?) method of data engineering is to use MFCCs (Mel Frequency Cepstrum Coefficients), for a set of short segments in time (windows) of audio data, as your ML input table.

How to process multiple inputs in c++ for artificial neural network

How does C++ process multiple inputs for an artificial neural network in real-time?
I'm assuming this is without using a spiking neural network, but a more traditional one (i.e. just a basic neural network as described here)
http://www.ai-junkie.com/ann/evolved/nnt1.html
Is this possible in a real-time world? I was thinking one would have to process either each input individually (which will always result in the same output, hence the dilemna), or accrue a certain # of inputs per time threshold and then process them at once...
then again, what does someone do with multiple instances of the same input? Process it twice?
I ask this because I'm looking at neuralbot, which I believe uses a normal neural network, but I'm trying to understand ANN's first before I delve into it, and am not sure how an ANN processes multiple inputs before processing target output(s).
Your question is not really clear but I'll try to answer. :)
You can see an ANN (Artificial Neural Network) like a particular case of Adaptive Filters.
There are three main elements:
A sequence of inputs x(n).
A Parametric Variable Filter. In this case the filter is the ANN and the parameters are the neurons weights.
An Update Algorithm that updates the filter parameters according to the error between desired an actual output. In ANN the most used update algorithm is the Backpropagation Algorithm.
In ANN there are two steps:
The Training Step. This is the hard part. You start with random neurons weights. You have a sequence of inputs and their desired outputs and you run the ANN with the update algorithm on. When the error is under a certain threshold you can say that your ANN is trained. This step is usually done off-line (not in real time).
The Execution Step. You have the trained ANN. Now just put in sequence the input in it and use the output. This is usually a fast operation and can be done in real-time (if it is what you mean).
Now.. what do you mean for "multiple inputs at once"? First of all standard computers can do only a very very small number of operations at once, standard PC has 4/8 cores so can do almost 4-8 operations at once. This number is too low for every real world ANN applications.
You said:
what does someone do with multiple instances of the same input? Process it twice?
The answer is Yes. The "Execution Step" is so fast that there are no reason to don't do it. In the "Training Step" duplicated inputs can be removed before training starts (cause the training inputs are known a priori). So there are no problem in this. :)

get loudness level from raw data recieved from microphone in DirectShow

How I can get loudness level from raw data received from microphone in DirectShow?
IMediaSample keep data in bytes. And how I can read this bytes and get something?
Loudness is an aural quality, not a physic formula. There are many many definitions for it.
It's a also a temporal value. As a consequence, this value changes during the time.
The simplest implementation I remember I had seen some years ago, was simply putting a time out on the maximum value of the amplitude. But the log of the amplitude is surely better to approximate the ear sensitivity much closer.
You can also consider the power of the signal ( signal * signal ... but there are also more definitions that takes into account the frequency spectrum components...).
It's kitchen recipes. Choose the simplest.
Edit: it seems my answer was too fast and fuzzy, I probably mistake Volume and Loudness. this wikipedia article states there are units for measuring loudness. Sone and Phon.
You need to process data to calculate loudness out of raw bytes. One of the method is defined in BS.1770 : Algorithms to measure audio programme loudness and true-peak audio level specification and describes the algorithm involved.

Is it possible to see if two MP3 files are the same song by analyzing the files' bytes?

This is to be done in C++ or C....
I know we can read the MP3s' meta data, but that information can be changed by anyone, can't it?
So is there a way to analyze a file's contents and compare it against another file and determine if it is in fact the same song?
edit
Lots of interesting things coming out that I hadn't thought of. Not at all a good idea to attempt this.
It's possible, but very hard.
Even the same original recording may well be encoded differently by different MP3 encoders or the same encoder with different settings... leading to different results when the MP3 is then decoded. You'd need to work out an aural model to "understand" how big the differences are, and make a judgement.
Then there's the matter of different recordings. If I sing "Once in Royal David's City" and Aled Jones sings it, are those the same song? What if there are two different versions of a song where one has slightly modified lyrics? The key could be different, it could be in a different vocal range - all kinds of things.
How different can two songs be but still count as "the same song"? Once you've decided that, then there's the small matter of implementing it ;)
If I really had to do this, my first attempt would be to take a Fourier transform of both songs and compare the histograms. You can use FFTW (http://www.fftw.org/) to take the Fourier transform, and then compare the histograms by summing the squares of the differences at each frequency. If the resultant sum is greater than some threshold (which you must determine by experimentation) then the songs are deemed to be different, otherwise they are the same.
No. Not SO simple.
You can check they contain the same encoded data, BUT:
Could be a different bitrate
Could be the same song, just a 1/100ths of a second off
In both cases the bytes would not match.
Basically, if a solution looks too simple to be true, it often is.
If you mean "same song" in the iTunes sense of "same recording", it would be possible to compares two audio files, but not by byte-by-byte comparison of an encoded file since even for the same format there are variables such as data rate and compression that are selected at time of encoding.
Also each encoding of the same recording may include different lead-in/lead-out timings, different amplitude and equalisation, and may have come from differing original sources (vinyl, CD, original master etc.). So you need a comparison method that takes all these variables into account, and even then you will end up with a 'likelihood' of a match rather than a definitive match.
If you genuinely mean "same song", i.e. any recording by any artist of the same composition and lyrics, then you are unlikely to get a high statistical correlation in most cases since pitch, tempo, range, instrumental arrangement will be very different.
In the "same recording" scenario, relatively simple signal processing and statistical techniques could be applied, in the "same song" scenario, AI techniques would need to be deployed, and even then the results I suspect would be poor.
If you want to compare MP3 files that originated from the same MP3, but have tagged with metadata differently, it would be straight forward to just compare the actual audio data. Since it originated from the same MP3 encoding, you should be able to do a byte by byte comparison. You would have to compare all byte. It should be sufficient to sample just a few to get a unique key that would be statistically almost impossible to find in another song.
If the files have been produced by different encoders, you would have to extract some "fuzzy" feature keys from the data and compare those keys. In a hurry I would probably construct an algorithm like this:
Decode audio to pulse-code modulation (wave) in a standard bit rate.
Find a fixed number of feature starting points using some dynamic location algorithm. For example find top 10 highest wave peaks ordered from beginning of wave or simply spread evenly across the wave (it would be a good idea to fix the first and last position dynamically though, since different encodings might not start and end at exactly the same point). An improvement would be to select feature points at positions in the wave that are not likely to be too repetitive.
Extract a set of one-dimensional feature key scalars from the feature points. For example, for each feature normalize the following n-sample values and count the number of zero-crossings, peak to average ratio, mean zero-crossing distance, signal-energy. The goal is to extract robust features that are relatively unique, while still characteristic even if some noise and distortion is added to the signal. This can obviously be improved almost infinitely.
Compare the extracted feature keys of the two files using some accuracy measurement (f.eks. 9 out of 10 feature extractions must match at least 99% on 4 out of 5 of their extracted feature keys).
The benefit of a feature extraction approach is that you can build a database of features for all your mp3-files and for a single file ask the question: What other media files have exactly or almost exactly the same feature as this one. The feature lookup could be implemented very efficiently with R*-trees or similar, which could be used to give you a fast distance measurement between the n-dimensional feature sets.
The above technique is essentially a variant of what is used in image search algorithms such as SIFT, which is probably the base of such application as Photosynth and Google Goggles. In image searching you filter the image for good candidate points for relatively unique features (such as corners of shapes), then you normalize the area around that feature to get normalized color, intensity, scale and direction of features. Finally you extract the features and search an n-dimensional database of features of other images and verify that found features in other images are geometrically positioned in the same pattern as in your search image. The technique for searching audio would be the same, only simpler, since audio is one dimensional.
Use the open source EchoPrint library to create a signature of the two audio files, and compare them with each other.
The library is very easy to use, and has clear examples on how to create the signatures.
http://echoprint.me/
You can even query their database with the signature and find matching song metadata (such as title, artist, etc).
I think the Fast Fourier-Transform (FFT) approach hinted by jstanley is pretty good for most use cases; in particular, it works for verifying that the two are the same release/ same recording by the same artist/ same bitrate / audio quality.
To be more explicit, sox and spek (via command line and GUI, respectively) can do this pretty painlessly.
Spek is pretty foolproof -- just open the software and point it to the two audio files in question.
sox can generate spectograms (FFTs) from the command line line so:
sox "$file" -n spectrogram -o "$outfile".
The result from either are two images; if they look basically identical, then for almost all intents and purposes, the two songs will be equivalent.
For example, I wanted to test if these two files:
Soundtrack to an imaginary film mixtape 2011.mp3
DJRUM - Sountrack to an imaginary film mixtape 2011 (for mary-anne hobbs).mp3
were the same. diff reported a difference in the binary files (perhaps due to metadata differences or minor encoding differences), but a quick glance at their spectrograms resolved it:

Real time plotting/data logging

I'm going to write a program that plots data from a sensor connected to the computer. The sensor value is going to be plotted as a function of the time (sensor value on the y-axis, time on the x-axis). I want to be able to add new values to the plot in real time. What would be best to do this with in C++?
Edit: And by the way, the program will be running on a Linux machine
Are you particularly concerned about the C++ aspect? I've done 10Hz or so rate data without breaking a sweat by putting gnuplot into a read/plot/refresh loop or with LiveGraph with no issues.
Write a function that can plot a std::deque in a way you like, then .push_back() values from the sensor onto the queue as they come available, and .pop_front() values from the queue if it becomes too long for nice plotting.
The exact nature of your plotting function depends on your platform, needs, sense of esthetics, etc.
You can use ring buffers. In such buffer you have read position and write position. This way one thread can write to buffer and other read and plot a graph. For efficiency you usually end up writing your own framework.
Size of such buffer can be estimated using eg.: data delivery speed from sensor (40KHz?), size of one probe and time span you would like to keep for plotting purposes.
It also depends whether you would like to store such data uncompressed, store rendered plot - all for further offline analysis. In non-RTOS environment your "real-time" depends on processing speed: how fast you can retrieve/store/process and plot data. Usually it is near-real time efficiency.
You might want to check out RRDtool to see whether it meets your requirements.
RRDtool is a high performance data logging and graphing system for time series data.
I did a similar thing for a device that had a permeability sensor attached via RS232.
package bytes received from sensor into packets
use a collection (mainly a list) to store them
prevent the collection to go over a fixed size by trashing least recent values before new ones arrive
find a suitable graphics library to draw with (maybe SDL if you wanna keep it easy and cross-platform), but this choice depends on what kind of graph you need (ncurses may be enough)
last but not least: since you are using a sensor I suppose your approach will be multi-threaded so think about it and use a synchronized collection or a collection that allows adding values when other threads are retrieving them (so forgot iterators, maybe an array is enough)
Btw I think there are so many libraries, just search for them:
first
second
...
I assume that you will deploy this application on a RTOS. But, what will be the data rate and what are real-time requirements! Therefore, as written above, a simple solution may be more than enough. But, if you have hard-real time constraints everything changes drastically. A multi-threaded design with data pipes may solve your real-time problems.