Signal Input to Neural Network - amazon-web-services

Working on a new project using AWS Machine Learning, with the intent of detecting certain patterns in an input signal. That is to say, the input to my model (neural network, decision tree, etc.) is a discrete signal with an unknown number of values, and my output is a known number of values.
I understand the theory behind traditional ML models such as neural networks, where a function is derived to map a known number of inputs to a known number of outputs. This makes sense with the requirement that the data supplied to the AWS ML platform be rows of CSV attributes.
Is there a way use this platform, or ML models in general for this kind of signal processing, or is there a preprocessing technique I can use to derive a fixed number of input variables?
For example, one I had in mind was to take a fourier transform of the time signal, and describe the signal in the frequency domain band limited to a reasonable range (effectively cutting down the signal to a fixed number of values). Total shot in the dark though, I'm not an expert on ML or signal processing.

For audio signals, one possible (common?) method of data engineering is to use MFCCs (Mel Frequency Cepstrum Coefficients), for a set of short segments in time (windows) of audio data, as your ML input table.

Related

time-domain signal in MFCC

I have read about MFCC and Speech Recognition, and I don't understand one point. According to the document in this page http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/, what is the "time-domain signal"?? Is that the float number in data sub-chunk which I read in header-file of a wave file?
P/s: Sorry for my poor English :D
Yes, you are right. And quoted from wiki:
Time domain is the analysis of mathematical functions, physical signals or time series of economic or environmental data, with respect to time. In the time domain, the signal or function's value is known for all real numbers, for the case of continuous time, or at various separate instants in the case of discrete time.

C++: Design data structure(s) to store multi-keyed object set to be able to collect and traverse a hyper-planes of that set

I'm trying to implement a storage for a large series (hundreds of millions) of results of alike scientific experiments. An experiment has many discrete-valued attributes (like lattice point of emitter, lattice point of receiver, time of emission event, temperature and elevation at the emission and at the receiver etc) and experiment results (like emission strength, received strength etc). The data is input as a huge input flow.
I'd like to design a storage structure that would allow organization of the input data in several discrete dimensions (representing attributes). I'd like also to be able to obtain all the experiments' results related to a certain attribute value (like, "all experiments from lattice point #10" or "all experiments where emission was received at elevation level 100m". Such a selection should form an array that would allow some en-mass processing (like averaging the results over the set of such a selection).
What could be the proper C++ data structures (preferably made of STL) that allow for such fast searches and combinations?
I've heard that the thing I want is somehow related to Filter (high-order function), but I'm not good at functional programming.
The problem you describe seems like a typical usecase for a database. In case there is a reasons why you don't want to use a database or want a c++ only solution maybe you can get some ideas from how popular DB-s are implemented. For instance some relational databases use B+ trees.

How to process multiple inputs in c++ for artificial neural network

How does C++ process multiple inputs for an artificial neural network in real-time?
I'm assuming this is without using a spiking neural network, but a more traditional one (i.e. just a basic neural network as described here)
http://www.ai-junkie.com/ann/evolved/nnt1.html
Is this possible in a real-time world? I was thinking one would have to process either each input individually (which will always result in the same output, hence the dilemna), or accrue a certain # of inputs per time threshold and then process them at once...
then again, what does someone do with multiple instances of the same input? Process it twice?
I ask this because I'm looking at neuralbot, which I believe uses a normal neural network, but I'm trying to understand ANN's first before I delve into it, and am not sure how an ANN processes multiple inputs before processing target output(s).
Your question is not really clear but I'll try to answer. :)
You can see an ANN (Artificial Neural Network) like a particular case of Adaptive Filters.
There are three main elements:
A sequence of inputs x(n).
A Parametric Variable Filter. In this case the filter is the ANN and the parameters are the neurons weights.
An Update Algorithm that updates the filter parameters according to the error between desired an actual output. In ANN the most used update algorithm is the Backpropagation Algorithm.
In ANN there are two steps:
The Training Step. This is the hard part. You start with random neurons weights. You have a sequence of inputs and their desired outputs and you run the ANN with the update algorithm on. When the error is under a certain threshold you can say that your ANN is trained. This step is usually done off-line (not in real time).
The Execution Step. You have the trained ANN. Now just put in sequence the input in it and use the output. This is usually a fast operation and can be done in real-time (if it is what you mean).
Now.. what do you mean for "multiple inputs at once"? First of all standard computers can do only a very very small number of operations at once, standard PC has 4/8 cores so can do almost 4-8 operations at once. This number is too low for every real world ANN applications.
You said:
what does someone do with multiple instances of the same input? Process it twice?
The answer is Yes. The "Execution Step" is so fast that there are no reason to don't do it. In the "Training Step" duplicated inputs can be removed before training starts (cause the training inputs are known a priori). So there are no problem in this. :)

Face Recognition Using Backpropagation Neural Network?

I'm very new in image processing and my first assignment is to make a working program which can recognize faces and their names.
Until now, I successfully make a project to detect, crop the detected image, make it to sobel and translate it to array of float.
But, I'm very confused how to implement the Backpropagation MLP to learn the image so it can recognize the correct name for the detected face.
It's a great honor for all experts in stackoverflow to give me some examples how to implement the Image array to be learned with the backpropagation.
It is standard machine learning algorithm. You have a number of arrays of floats (instances in ML or observations in statistics terms) and corresponding names (labels, class tags), one per array. This is enough for use in most ML algorithms. Specifically in ANN, elements of your array (i.e. features) are inputs of the network and labels (names) are its outputs.
If you are looking for theoretical description of backpropagation, take a look at Stanford's ml-class lectures (ANN section). If you need ready implementation, read this question.
You haven't specified what are elements of your arrays. If you use just pixels of original image, this should work, but not very well. If you need production level system (though still with the use of ANN), try to extract more high level features (e.g. Haar-like features, that OpenCV uses itself).
Have you tried writing your feature vectors to an arff file and to feed them to weka, just to see if your approach might work at all?
Weka has a lot of classifiers integrated, including MLP.
As I understood so far, I suspect the features and the classifier you have chosen not to work.
To your original question: Have you made any attempts to implement a neural network on your own? If so, where you got stuck? Note, that this is not the place to request a complete working implementation from the audience.
To provide a general answer on a general question:
Usually you have nodes in an MLP. Specifically input nodes, output nodes, and hidden nodes. These nodes are strictly organized in layers. The input layer at the bottom, the output layer on the top, hidden layers in between. The nodes are connected in a simple feed-forward fashion (output connections are allowed to the next higher layer only).
Then you go and connect each of your float to a single input node and feed the feature vectors to your network. For your backpropagation you need to supply an error signal that you specify for the output nodes. So if you have n names to distinguish, you may use n output nodes (i.e. one for each name). Make them for example return 1 in case of a match and 0 else. You could very well use one output node and let it return n different values for the names. Probably it would even be best to use n completely different perceptrons, i.e. one for each name, to avoid some side-effects (catastrophic interference).
Note, that the output of each node is a number, not a name. Therefore you need to use some sort of thresholds, to get a number-name relation.
Also note, that you need a lot of training data to train a large network (i.e. to obey the curse of dimensionality). It would be interesting to know the size of your float array.
Indeed, for a complex decision you may need a larger number of hidden nodes or even hidden layers.
Further note, that you may need to do a lot of evaluation (i.e. cross validation) to find the optimal configuration (number of layers, number of nodes per layer), or to find even any working configuration.
Good luck, any way!

Real time plotting/data logging

I'm going to write a program that plots data from a sensor connected to the computer. The sensor value is going to be plotted as a function of the time (sensor value on the y-axis, time on the x-axis). I want to be able to add new values to the plot in real time. What would be best to do this with in C++?
Edit: And by the way, the program will be running on a Linux machine
Are you particularly concerned about the C++ aspect? I've done 10Hz or so rate data without breaking a sweat by putting gnuplot into a read/plot/refresh loop or with LiveGraph with no issues.
Write a function that can plot a std::deque in a way you like, then .push_back() values from the sensor onto the queue as they come available, and .pop_front() values from the queue if it becomes too long for nice plotting.
The exact nature of your plotting function depends on your platform, needs, sense of esthetics, etc.
You can use ring buffers. In such buffer you have read position and write position. This way one thread can write to buffer and other read and plot a graph. For efficiency you usually end up writing your own framework.
Size of such buffer can be estimated using eg.: data delivery speed from sensor (40KHz?), size of one probe and time span you would like to keep for plotting purposes.
It also depends whether you would like to store such data uncompressed, store rendered plot - all for further offline analysis. In non-RTOS environment your "real-time" depends on processing speed: how fast you can retrieve/store/process and plot data. Usually it is near-real time efficiency.
You might want to check out RRDtool to see whether it meets your requirements.
RRDtool is a high performance data logging and graphing system for time series data.
I did a similar thing for a device that had a permeability sensor attached via RS232.
package bytes received from sensor into packets
use a collection (mainly a list) to store them
prevent the collection to go over a fixed size by trashing least recent values before new ones arrive
find a suitable graphics library to draw with (maybe SDL if you wanna keep it easy and cross-platform), but this choice depends on what kind of graph you need (ncurses may be enough)
last but not least: since you are using a sensor I suppose your approach will be multi-threaded so think about it and use a synchronized collection or a collection that allows adding values when other threads are retrieving them (so forgot iterators, maybe an array is enough)
Btw I think there are so many libraries, just search for them:
first
second
...
I assume that you will deploy this application on a RTOS. But, what will be the data rate and what are real-time requirements! Therefore, as written above, a simple solution may be more than enough. But, if you have hard-real time constraints everything changes drastically. A multi-threaded design with data pipes may solve your real-time problems.