Why is the Sym2 (or Sym3) wavelet function different from the Db2 (or Db3) wavelet function even though they have the same filter coefficients?

I am wondering why the Sym2 (or Sym3) wavelet function is different from the Db2 (or Db3) wavelet function even though they have the same filter coefficients.
As far as I know, the mother wavelet can be obtained from the filter coefficients by iterating the dilation (two-scale) equation.
Also, as far as I know, the filter coefficients of Sym2 and Sym3 are exactly the same as those of Db2 and Db3, respectively.
Accordingly, I expect the wavelet functions of Sym2 and Sym3 to be the same as those of Db2 and Db3.
Some literature agrees with this, and there it is SymN for even N that looks like a flipped version of DbN.
Other literature says the opposite: the Sym2 and Sym3 wavelet functions are exactly the Db2 and Db3 functions except that they are flipped, and it is SymN for odd N that is the flip of DbN.
I tried to figure out which explanation is right, but there was no clear answer.
Is it right that the wavelet function of Sym2 (or Sym3) is exactly the same as that of Db2 (or Db3)?
For which N is SymN the flipped version of DbN: even N or odd N?
Thank you,
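For what it is worth, one quick way to check this empirically is PyWavelets (a minimal sketch, assuming the pywt package is available): it exposes both the filter coefficients and an approximation of the scaling/wavelet functions obtained by iterating the dilation equation, so you can compare Db2 and Sym2 (or Db3 and Sym3) directly rather than relying on plots in the literature.

import numpy as np
import pywt

db2, sym2 = pywt.Wavelet('db2'), pywt.Wavelet('sym2')

# Compare the decomposition low-pass filters directly.
print(np.allclose(db2.dec_lo, sym2.dec_lo))

# wavefun() iterates the two-scale (dilation) relation to approximate the
# scaling function phi and the wavelet psi on a grid x.
phi_db, psi_db, x = db2.wavefun(level=10)
phi_sym, psi_sym, _ = sym2.wavefun(level=10)
print(np.max(np.abs(psi_db - psi_sym)))   # 0 if the two wavelet functions coincide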

Related

Clarification re Principal Component Analysis

I do understand principal component analysis. I know how to do it and what it actually does. I have applied PCA and my best result turned out to be two components. I do understand that each of my inputs now contributes partially to each component. What I do not understand is how to feed the result of PCA (in my case 2 components) to a machine learning model?
How do we input them?
For example, when I want to run a NN on my features, I can just navigate to where they are stored and import them, but my PCA was run in SPSS and all it shows me is the contribution of each feature to each component.
What should I import to my NN model?
PCA is a method of feature extraction, which is used to avoid the problem of collinearity. For example, if several variables are highly correlated because "they measure the same thing", then PCA can extract a measure of "that thing" (technically: a component), which is called a score. Your data set of, say, 100 measured variables may reduce to, say, 10 significant components. Then you can use the scores your test persons have achieved on those 10 components to do, for example, a multi-dimensional regression, a cluster analysis or a discriminant analysis. This will give more valid results than performing the analysis directly on the 100 variables.
So the procedure is to sort the eigenvalues (and eigenvectors) by size, identify the number of significant components q (e.g., by scree plot), set up the projection matrix F (the eigenvectors corresponding to the q largest eigenvalues as columns) and multiply it with the data matrix D. This gives you the score matrix C (dimension n times q, with n the number of test persons), which you can use as input for whatever method you want to use next.
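A minimal numpy sketch of that scoring step (the matrix names follow the description above; the data matrix D and the choice q = 2 are placeholders):

import numpy as np

# D: n x m data matrix (rows = test persons, columns = measured variables).
rng = np.random.default_rng(0)
D = rng.normal(size=(100, 5))                # placeholder data
q = 2                                        # number of significant components

Dc = D - D.mean(axis=0)                      # centre each variable
eigvals, eigvecs = np.linalg.eigh(np.cov(Dc, rowvar=False))
order = np.argsort(eigvals)[::-1]            # sort eigenvalues by size, descending
F = eigvecs[:, order[:q]]                    # projection matrix (m x q)
C = Dc @ F                                   # score matrix (n x q)
# C is what you feed into the next model (regression, clustering, a NN, ...).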

How to use uncertainties to weight residuals in a Savitzky-Golay filter.

Is there a way to incorporate the uncertainties on my data set into the result of the Savitzky-Golay fit? Since I am not passing this information into the function, I assume that it is simply calculating the 'best fit' via an unweighted least-squares process. I am currently working with data that has non-uniform uncertainty, so the fit could be improved by including the errors that I have for my main dataset.
The Wikipedia page for the Savitzky-Golay filter suggests how I might go about altering the calculation of the fit coefficients, and I am staring at the code for scipy.signal.savgol_filter, but I cannot get my head around what I need to adjust to make it do what I want.
Are there any ready-made weighted SG filters floating about? I find it hard to believe that no-one else has ever needed this tool in Python, but maybe I have missed something.
Check out this Python module: https://github.com/surhudm/savitzky_golay_with_errors
This Python script improves upon the traditional Savitzky-Golay filter by accounting for errors or covariance in the data. The inputs and arguments are all modelled after scipy.signal.savgol_filter.
The Matlab function sgolayfilt supports weights; check the documentation.
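If neither of those fits, rolling your own in Python is not too bad: a Savitzky-Golay filter is just a moving polynomial fit, so you can do a weighted least-squares fit in each sliding window directly. A minimal sketch (assuming numpy; the function name and the edge handling are my own choices, not an existing scipy API):

import numpy as np

def weighted_savgol(y, sigma, window_length=7, polyorder=2):
    """Sliding weighted least-squares polynomial smoother.

    sigma holds the 1-sigma uncertainties on y; np.polyfit's `w` argument
    expects weights of the form 1/sigma. Edges are handled by shrinking
    the window (one simple choice among several).
    """
    y = np.asarray(y, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    half = window_length // 2
    smoothed = np.empty_like(y)
    for i in range(len(y)):
        lo, hi = max(0, i - half), min(len(y), i + half + 1)
        x = np.arange(lo, hi) - i                      # window centred on sample i
        coeffs = np.polyfit(x, y[lo:hi], polyorder, w=1.0 / sigma[lo:hi])
        smoothed[i] = np.polyval(coeffs, 0.0)          # local fit evaluated at the centre
    return smoothed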

C++ FIR noise filter

I'm digging up some info about filtering the noise out of my IQ data samples in C++.
I have learned that this can be done by using a simple filter which calculates the average of the last few data samples and uses that as the current output sample.
Do you have any further experience with this kind of filtering or do you recommend using some existing FIR filtering library?
Thanks for your comments!
Unfortunately, it is not as simple as "just get some library and it will do all the work for you"; digital filtering is a fairly complicated subject.
It is easy to apply a digital filter to your data only if your measurements come at fixed time intervals (which defines the "sample rate"). Otherwise (if the time intervals vary), applying digital filters is not trivial (I suspect you might need an FFT-based approach, but I might be wrong here).
Digital filters (both IIR and FIR) are interesting in that once you know the coefficients you don't really need a library; it is easy to write the filter yourself (see, for example, the first figure here: https://en.wikipedia.org/wiki/Finite_impulse_response - looks simple, right?). It is finding the coefficients that is tricky.
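To show how little code the filter itself takes, here is a minimal sketch of that direct-form FIR structure (written in Python for brevity, since the loop translates line-for-line to C++; the function name is just for illustration):

import numpy as np

def fir_filter(samples, coeffs):
    """Direct-form FIR: each output is a weighted sum of the current and
    previous input samples. Works for real or complex (IQ) data."""
    samples = np.asarray(samples)
    coeffs = np.asarray(coeffs, dtype=float)
    out = np.zeros(len(samples), dtype=np.result_type(samples, coeffs))
    for n in range(len(samples)):
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                out[n] += h * samples[n - k]
    return out

# The "average of the last few samples" filter from the question is just an
# FIR filter whose coefficients are all equal:
iq = np.random.randn(256) + 1j * np.random.randn(256)
smoothed = fir_filter(iq, np.ones(5) / 5.0)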
As a prerequisite to finding the coefficients, you need to understand quite a lot about filters: what kind of filter you need (if it comes after demodulation you will most likely need a low-pass; otherwise see the comment by MSalters below), what a "corner frequency" is, and how those frequencies map onto your samples (for example, you can say that your samples arrive once per second - or at any other rate - but this choice affects your desired "corner frequency"). Once you understand what you need in terms of digital filters, finding the coefficients is quite easy: you can do it either in MatLab or with an online calculator - look for "digital filter calculator" in Google.
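Once you do know your sample rate and corner frequency, getting the coefficients programmatically is also easy; for example, scipy's firwin does a window-method design. A sketch (the 1 MHz sample rate and 100 kHz corner here are placeholder numbers, not values from the question):

import numpy as np
from scipy.signal import firwin, lfilter

fs = 1.0e6            # sample rate in Hz (placeholder)
corner = 100.0e3      # corner / cutoff frequency in Hz (placeholder)
coeffs = firwin(51, corner, fs=fs)   # 51-tap low-pass FIR

# lfilter applies the coefficients exactly like the hand-written loop above.
iq = np.random.randn(4096) + 1j * np.random.randn(4096)
filtered = lfilter(coeffs, 1.0, iq)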

Regression Tree Forest in Weka

I'm using Weka and would like to perform regression with random forests. Specifically, I have a dataset:
Feature1,Feature2,...,FeatureN,Class
1.0,X,...,1.4,Good
1.2,Y,...,1.5,Good
1.2,F,...,1.6,Bad
1.1,R,...,1.5,Great
0.9,J,...,1.1,Horrible
0.5,K,...,1.5,Terrific
.
.
.
Rather than learning to predict the most likely class, I want to learn the probability distribution over the classes for a given feature vector. My intuition is that using just the RandomForest model in Weka would not be appropriate, since it would be attempting to minimize its absolute error (maximum likelihood) rather than its squared error (conditional probability distribution). Is that intuition right? Is there a better model to be using if I want to perform regression rather than classification?
Edit: I'm actually thinking now that in fact it may not be a problem. Presumably, classifiers are learning the conditional probability P(Class | Feature1,...,FeatureN) and the resulting classification is just finding the c in Class that maximizes that probability distribution. Therefore, a RandomForest classifier should be able to give me the conditional probability distribution. I just had to think about it some more. If that's wrong, please correct me.
If you want to predict the probabilities for each class explicitly, you need different input data. That is, you would need to replace the value being predicted: instead of one data set with the class label, you would need n data sets (for n different labels) with aggregated data for each unique feature vector. Your data would look something like
Feature1,...,Good
1.0,...,0.5
0.3,...,1.0
and
Feature1,...,Bad
1.0,...,0.8
0.3,...,0.1
and so on. You would need to learn one model for each class and run them separately on any data to be classified. That is, for each label you learn a model to predict a number that is the probability of being in that class, given a feature vector.
If you don't need the probabilities to be predicted explicitly, have a look at the Bayesian classifiers in Weka, which make use of probabilities in the models that they learn.
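On the Edit in the question: tree ensembles usually expose the per-class distribution directly, so you may not need a per-class regression setup at all. In Weka that is Classifier.distributionForInstance; the sketch below shows the same idea with scikit-learn's RandomForestClassifier, a stand-in toolkit chosen only for illustration, with made-up toy data:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy numeric stand-ins for Feature1..FeatureN and the Class column.
X = np.array([[1.0, 1.4], [1.2, 1.5], [1.2, 1.6],
              [1.1, 1.5], [0.9, 1.1], [0.5, 1.5]])
y = ["Good", "Good", "Bad", "Great", "Horrible", "Terrific"]

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# predict_proba returns, for each query vector, the estimated
# P(class | features) for every class, ordered as in forest.classes_.
print(forest.classes_)
print(forest.predict_proba([[1.0, 1.5]]))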

Simple Curve Fitting Implementation in C++ (SVD Least Squares Fit or similar)

I have been scouring the internet for quite some time now, trying to find a simple, intuitive, and fast way to approximate a 2nd degree polynomial using 5 data points.
I am using VC++ 2008.
I have come across many libraries, such as cminipack, cmpfit, lmfit, etc... but none of them seem very intuitive and I have had a hard time implementing the code.
Ultimately I have a set of discrete values stored in a 1D array, and I am trying to find the 'virtual max point' by curve fitting the data and then locating the maximum of the fitted curve at a non-integer position (whereas just reading the array directly only gives the maximum at an integer index).
Anyway, if someone has done something similar to this, and can point me to the package they used, and maybe a simple implementation of the package, that would be great!
I am happy to provide some test data and graphs to show you what kind of stuff I'm working with, but I feel my request is pretty straightforward. Thank you so much.
EDIT: Here is the code I wrote which works!
http://pastebin.com/tUvKmGPn
Change size in the code to change how many inputs are used. Sample output:
0 0
1 1
2 4
4 16
7 49
a: 1 b: 0 c: 0
Press any key to continue . . .
Thanks for the help!
Assuming that you want to fit a standard parabola of the form
y = ax^2 + bx + c
to your 5 data points, all you need to do is solve a 3 x 3 matrix equation. Take a look at this example http://www.personal.psu.edu/jhm/f90/lectures/lsq2.html - it works through the same problem you seem to be describing (only using more data points). If you have a basic grasp of calculus and are able to invert a 3x3 matrix (or do something nicer numerically - which I am guessing you can, given that you refer specifically to SVD in your question title), then this example will clarify what you need to do.
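For completeness, here is the same fit done with numpy (a sketch using the five samples from the output above; np.polyfit solves the least-squares normal equations for you, and the extremum of the fitted parabola falls at x = -b / (2a)):

import numpy as np

x = np.array([0.0, 1.0, 2.0, 4.0, 7.0])
y = np.array([0.0, 1.0, 4.0, 16.0, 49.0])

a, b, c = np.polyfit(x, y, 2)          # fit y = a*x^2 + b*x + c
x_ext = -b / (2.0 * a)                 # dy/dx = 2ax + b = 0
y_ext = np.polyval([a, b, c], x_ext)   # a maximum if a < 0, a minimum if a > 0
print(a, b, c, x_ext, y_ext)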
Look at the Wikipedia page on Polynomial Regression (https://en.wikipedia.org/wiki/Polynomial_regression).