Polynomial Regression with N features and degree M in C++

I'm new to ML and I was trying this problem: https://www.hackerrank.com/challenges/predicting-office-space-price. One of the observations they make is that
"The prices per square foot, are (approximately) a polynomial function
of the features in the observation table. This polynomial always has
an order less than 4"
So I guess the solution would be to apply polynomial regression. I have found a lot of (confusing) info about this, but only for 2 features. In this case there can be up to 5 features, so the polynomial can contain mixed terms such as a*x^3 + b*x^2*y + c*z^2*x + ... (each term with total degree at most 3, since the order is less than 4).
So it seems more difficult to find a way of creating and evaluating this polynomial in a function like:
float eval(vector<float> x, vector<float> o)
And with this I was hoping to use the same gradient descent that I use for linear regression to minimize the cost function.
Am I approaching this right? Am I right to use polynomial regression? How should I go about creating and evaluating that polynomial?
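One way to do it (a minimal sketch, not from the problem statement: the monomials argument and the enumerate helper are my own additions) is to enumerate every exponent tuple with total degree <= 3, treat each monomial as one expanded feature, and evaluate the polynomial as a weighted sum of those monomials. The model is then linear in the coefficients o, so the same gradient descent used for linear regression applies unchanged:

#include <cstddef>
#include <vector>
using std::vector;

// Recursively enumerate all exponent tuples e (one exponent per feature)
// with total degree <= max_deg; each tuple describes one monomial.
void enumerate(int n, int max_deg, vector<int>& e, vector<vector<int>>& out) {
    if ((int)e.size() == n) { out.push_back(e); return; }
    int used = 0;
    for (int d : e) used += d;
    for (int d = 0; used + d <= max_deg; ++d) {
        e.push_back(d);
        enumerate(n, max_deg, e, out);
        e.pop_back();
    }
}

// o[k] is the coefficient of the k-th monomial, in enumeration order.
float eval(const vector<float>& x, const vector<float>& o,
           const vector<vector<int>>& monomials) {
    float sum = 0.0f;
    for (std::size_t k = 0; k < monomials.size(); ++k) {
        float term = o[k];                       // coefficient of this monomial
        for (std::size_t i = 0; i < x.size(); ++i)
            for (int d = 0; d < monomials[k][i]; ++d)
                term *= x[i];                    // multiply in x[i]^exponent
        sum += term;
    }
    return sum;
}

With max_deg = 3 and 5 features this yields 56 monomials, and the gradient of the cost with respect to each o[k] is exactly the same as in ordinary linear regression with the monomial values as inputs.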

Related

Factorization of univariate polynomials over some number field with Sympy

I am working on factoring multivariate polynomials over some extension fields, using Sympy.
If I can factor univariate polynomials over the reals, I think I would have working code. For my code this comes down to factoring a univariate polynomial over 'QQ' and, if needed, over some number field.
My approach now is to define these univariate polynomials over 'QQ', then look at the roots and decide for each root whether it is real or not. If it is real, I add the needed terms to 'QQ' and then ask Sympy to factor. This means that I try to automate the following steps:
f = Poly((x**2 - 3)*(x**2 - 5), x, domain='QQ')
solve(f, x)
(gives [-sqrt(3), sqrt(3), -sqrt(5), sqrt(5)])
factor(f, extension=[sqrt(3), sqrt(5)])
(...or some other way, but with similar steps and runtime, I think)
This of course has a very long runtime, since you are in a sense computing the factors twice. And there are also a lot of exceptions I need to think about as well.
Long story short: is there a way to ask Sympy to factor a polynomial over 'QQ' and allowing it to make some extensions if needed?
Is there something like f.factor(numberfield=True)?
Thank you in advance!!
This is planned, but not implemented yet (as of version 1.2). See Factoring polynomials into linear factors (emphasis mine):
Currently SymPy can factor polynomials into irreducibles over various domains, which can result in a splitting factorization (into linear factors). However, there is currently no systematic way to infer a splitting field (algebraic number field) automatically. In future the following syntax will be implemented:
factor(x**3 + x**2 - 7, split=True)
Note this is different from extension=True, because the latter only tells how expression parsing should be done, not what the domain of computation should be. One can simulate the split keyword for several classes of polynomials using the solve() function.
... where the last sentence refers to what you are doing now.

Finding an optimal solution to a system of linear equations in c++

Here's the problem:
I am currently trying to create a control system which is required to find a solution to a series of complex linear equations without a unique solution.
My problem arises because there will only ever be six equations, while there may be upwards of 20 unknowns (usually way more than six unknowns). Of course, such an underdetermined system will not yield a unique solution through standard Gaussian elimination or by putting the matrix into reduced row echelon form.
However, I think that I may be able to narrow things down and pick a better solution because I know that each of the unknowns cannot have a value smaller than zero or greater than one, but is free to take on any value in between.
Of course, I am trying to create code that finds a correct solution, but in the case that multiple combinations yield satisfactory results, I would want to minimize the sum of (value of unknown * efficiency constant) over all unknowns, i.e. Sigma[x_i * e_i] for i = 1 to n, though finding an accurate solution is of a greater priority.
Performance is also important, due to the fact that this algorithm may need to be run several times per second.
So, does anyone have any ideas to help me on implementing this?
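In optimization terms, what is being asked for above is (a sketch of the formulation, with A the 6x20 coefficient matrix, y the right-hand side, and e the vector of efficiency constants):

\begin{aligned}
\min_{x \in \mathbb{R}^{20}} \quad & e^\top x = \textstyle\sum_i e_i x_i \\
\text{subject to} \quad & A x = y, \\
& 0 \le x_i \le 1 \quad \text{for all } i,
\end{aligned}

which is exactly the form that off-the-shelf linear programming solvers accept.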
Edit: You might just want to stick to linear programming with equality and inequality constraints, but here's an interesting exact solution that does not incorporate the constraint that your unknowns are between 0 and 1.
Here's a powerpoint discussing your problem: http://see.stanford.edu/materials/lsoeldsee263/08-min-norm.pdf
I'll translate your problem into math to make things a bit easier to figure out:
you have a 6x20 matrix A and a vector x with 20 elements. You want to minimize (x^T)e subject to Ax=y. According to the slides, if you were just minimizing the Euclidean norm of x (rather than the weighted sum), then the answer is A^T(AA^T)^(-1)y. I'll take another look at this as soon as I get the chance and see what the solution is to minimizing (x^T)e (i.e. your specific problem).
Edit: I looked in the powerpoint some more and near the end there's a slide entitled "General norm minimization with equality constraints". I am going to switch the notation to match the slide's:
Your problem is that you want to minimize ||Ax-b||, where b = 0 and A is your e vector and x is the 20 unknowns. This is subject to Cx=d. Apparently the answer is:
x=(A^T A)^-1 (A^T b -C^T(C(A^T A)^-1 C^T)^-1 (C(A^T A)^-1 A^Tb - d))
It's not pretty, but it's not as bad as you might think. There really aren't that many calculations. For example, (A^T A)^-1 only needs to be calculated once, and then you can reuse the answer. And your matrices aren't that big.
Note that I didn't incorporate the constraint that the elements of x are within [0,1].
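For completeness, here is a minimal C++ sketch of the simpler least-norm formula mentioned earlier, using the Eigen library (my choice of library, not something from the thread); like the formula itself, it does not enforce the [0,1] bounds:

#include <Eigen/Dense>

// Least-norm solution of the underdetermined system A*x = y:
// x = A^T * (A*A^T)^-1 * y. A*A^T is only 6x6 here, so we factor it
// once with LDL^T instead of forming the inverse explicitly.
Eigen::VectorXd leastNorm(const Eigen::MatrixXd& A, const Eigen::VectorXd& y) {
    return A.transpose() * (A * A.transpose()).ldlt().solve(y);
}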
It looks like the solution for what I am doing is Linear Programming. It is starting to come back to me, but if I have other problems I will post them in their own dedicated questions instead of turning this one into an encyclopedia.

Weighted linear least square for 2D data point sets

My question is an extension of the discussion How to fit the 2D scatter data with a line with C++. Now I want to extend the question further: when estimating the line that fits 2D scatter data, it would be better if we could treat each 2D scatter point differently. That is to say, if a scatter point is far away from the line, we can give the point a low weighting, and vice versa.
Therefore, the question becomes: given an array of 2D scatter points as well as their weighting factors, how can we estimate the line that best fits them? A good implementation of this method can be found in this article (weighted least squares regression). However, the implementation of the algorithm in that article is too complicated, as it involves matrix calculation. I am therefore trying to find a method without matrix calculation. The algorithm is an extension of simple linear regression, and in order to illustrate it, I wrote the following MATLAB code:
function line = weighted_least_square_for_line(x, y, weighting)
% Closed-form weighted least squares fit of y = beta*x + alpha,
% where each point contributes in proportion to its weight.
part1 = sum(weighting.*x.*y)*sum(weighting(:));   % sum(w)*sum(w*x*y)
part2 = sum(weighting.*x)*sum(weighting.*y);      % sum(w*x)*sum(w*y)
part3 = sum(x.^2.*weighting)*sum(weighting(:));   % sum(w)*sum(w*x^2)
part4 = sum(weighting.*x).^2;                     % (sum(w*x))^2
beta  = (part1-part2)/(part3-part4);              % weighted slope
alpha = (sum(weighting.*y)-beta*sum(weighting.*x))/sum(weighting);  % intercept
% Return the line as the coefficients of a*x + b*y + c = 0.
a = beta;
c = alpha;
b = -1;
line = [a b c];
In the above code, x, y and weighting represent the x-coordinates, y-coordinates and the weighting factors respectively. I have tested the algorithm with several examples but am still not sure whether it is right or not, as this method gets a different result from polyfit, which relies on matrix calculation. I am posting the implementation here for your advice. Do you think it is a correct implementation? Thanks!
If you think it is a good idea to downweight points that are far from the line, you might be attracted by http://en.wikipedia.org/wiki/Least_absolute_deviations, because one way of calculating this is via http://en.wikipedia.org/wiki/Iteratively_re-weighted_least_squares, which will give less weight to points far from the line.
If you think all your points are "good data", then it would be a mistake to weight them naively according to their distance from your initial fit. However, it's a fairly common practice to discard "outliers": if a few data points are implausibly far from the fit, and you have reason to believe that there's an error mechanism that could generate a small subset of "bad" datapoints, you could simply remove the implausible points from the dataset to get a better fit.
As far as the math is concerned, I would recommend biting the bullet and trying to figure out the matrix math. Perhaps you could find a different article, or a book which has a better presentation. I will not comment on your Matlab code, except to say that it looks like you will have some precision problems when subtracting part4 from part3, and probably part2 from part1 as well.
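To illustrate that precision point, here is a small C++ sketch (my own, not from the thread) of the same weighted fit computed from centered sums, which avoids the two large, nearly cancelling products part1 - part2 and part3 - part4:

#include <cstddef>
#include <vector>

// Weighted least squares fit of y = beta*x + alpha using centered sums:
// beta = sum(w*(x-xbar)*(y-ybar)) / sum(w*(x-xbar)^2), alpha = ybar - beta*xbar.
// Algebraically identical to the closed form above, but numerically steadier.
void weightedLineFit(const std::vector<double>& x, const std::vector<double>& y,
                     const std::vector<double>& w, double& beta, double& alpha) {
    double sw = 0, sx = 0, sy = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sw += w[i]; sx += w[i] * x[i]; sy += w[i] * y[i];
    }
    const double xbar = sx / sw, ybar = sy / sw;  // weighted means
    double sxy = 0, sxx = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sxy += w[i] * (x[i] - xbar) * (y[i] - ybar);
        sxx += w[i] * (x[i] - xbar) * (x[i] - xbar);
    }
    beta  = sxy / sxx;
    alpha = ybar - beta * xbar;
}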
Not exactly what you are asking for, but you should look into robust regression. MATLAB has the function robustfit (requires Statistics Toolbox).
There is even an interactive demo you can play with to compare regular linear regression vs. robust regression:
>> robustdemo
This shows that in the presence of outliers, robust regression tends to give better results.

Linear form of the function a/b for AMPL/CPLEX

I am trying to solve a minimisation problem where I want to minimise the expression
a/b
where both a and b are variables. Hence this is not a linear problem...
How can I transform this expression into another, linear one?
There is a detailed section on how to handle ratios in Linear Programming on the lpsolve site. It should be general enough to apply to AMPL and CPLEX as well.
There are several ways to do this, but the simplest to explain requires that you solve a series of linear programs. First, remove the objective and add a constraint
a <= c * b
where c is a known upper bound on the solution. Then do a binary search on c: you can narrow down a range [c_l, c_u] such that the problem is infeasible for
a <= c_l * b
but feasible for
a <= c_u * b
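As a sketch of that search (the feasible() callback is hypothetical and stands for one LP feasibility solve with the constraint a <= c*b added; this assumes b > 0 on the feasible set):

#include <functional>

// Binary search for the smallest c such that the LP with the extra
// constraint a <= c*b is feasible; that c is (approximately) min a/b.
double minimizeRatio(const std::function<bool(double)>& feasible,
                     double c_lo, double c_hi, double tol = 1e-6) {
    // Invariant: infeasible at c_lo, feasible at c_hi.
    while (c_hi - c_lo > tol) {
        const double c = 0.5 * (c_lo + c_hi);
        if (feasible(c)) c_hi = c; else c_lo = c;
    }
    return c_hi;
}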
The general form of the objective should be a linear fractional function, something like f_0(x) = (c^T x + d)/(e^T x + f). For your case, x = (a, b), c = (1, 0), e = (0, 1), d = f = 0.
To solve this kind of optimization problem, something called linear fractional programming can be used. It is like a linearly constrained version of a linear fractional function, and the Charnes-Cooper transformation is applied to turn it into an LP. You can find the main idea on the wiki. Many OR books discuss this further, such as pp. 53 and 165 in Boyd's "Convex Optimization" (free to download).
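For reference, the Charnes-Cooper substitution looks like this (a sketch, assuming the denominator e^T x + f is positive on the feasible set Ax <= b):

\min_{x} \; \frac{c^\top x + d}{e^\top x + f} \ \text{s.t.}\ Ax \le b
\quad\longrightarrow\quad
\min_{y,\,t} \; c^\top y + d\,t \ \text{s.t.}\ Ay \le b\,t,\ \ e^\top y + f\,t = 1,\ \ t \ge 0,

with the substitution y = x/(e^T x + f) and t = 1/(e^T x + f); the original variables are recovered as x = y/t.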

How to implement Horner's scheme for multivariate polynomials?

Background
I need to evaluate polynomials in multiple variables using Horner's scheme in Fortran 90/95. The main reason for doing this is the increased efficiency and accuracy gained by using Horner's scheme to evaluate polynomials.
I currently have an implementation of Horner's scheme for univariate/single variable polynomials. However, developing a function to evaluate multivariate polynomials using Horner's scheme is proving to be beyond me.
An example bivariate polynomial would be: 12x^2y^2 + 8x^2y + 6xy^2 + 4xy + 2x + 2y, which would be factorised to x(x(y(12y+8)) + y(6y+4) + 2) + 2y and then evaluated for particular values of x and y.
Research
I've done my research and found a number of papers such as:
staff.ustc.edu.cn/~xinmao/ISSAC05/pages/bulletins/articles/147/hornercorrected.pdf
citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.40.8637&rep=rep1&type=pdf
www.is.titech.ac.jp/~kojima/articles/B-433.pdf
Problem
However, I'm not a mathematician or computer scientist, so I'm having trouble with the mathematics used to convey the algorithms and ideas.
As far as I can tell, the basic strategy is to turn a multivariate polynomial into separate univariate polynomials and evaluate it that way.
Can anyone help me? If anyone could help me turn the algorithms into pseudo-code that I can implement into Fortran myself, I would be very grateful.
For two variables one can store the polynomial coefficients in a rank=2 matrix K(n+1,n+1) where n is the order of the polynomial. Then observe the following pattern (in pseudo-code)
p(x,y) =       (K(1,1)   + y*(K(1,2)   + y*(K(1,3)   + ... + y*K(1,n+1)   ... ))) +
         x   * (K(2,1)   + y*(K(2,2)   + y*(K(2,3)   + ... + y*K(2,n+1)   ... ))) +
         x^2 * (K(3,1)   + y*(K(3,2)   + y*(K(3,3)   + ... + y*K(3,n+1)   ... ))) +
         ...
         x^n * (K(n+1,1) + y*(K(n+1,2) + y*(K(n+1,3) + ... + y*K(n+1,n+1) ... )))
Each row is a separate Horner's scheme in terms of y, and taken together the rows form a final Horner's scheme in terms of x.
To code this in Fortran or any other language, create an intermediate vector z(n+1) such that
z(i) = horner(y, K(i,1:n+1))
and then
p = horner(x, z(1:n+1))
where horner(value, coeffs) is an implementation of the single-variable evaluation with the polynomial coefficients stored in coeffs.
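As a concrete sketch of that two-stage scheme (in C++ rather than Fortran; the port is mechanical, and the function names are mine):

#include <cstddef>
#include <vector>
using std::vector;

// Univariate Horner: evaluates c[0] + c[1]*t + ... + c[n]*t^n.
double horner(double t, const vector<double>& c) {
    double p = 0.0;
    for (std::size_t i = c.size(); i-- > 0; )
        p = p * t + c[i];
    return p;
}

// Bivariate Horner: K[i][j] holds the coefficient of x^i * y^j.
double horner2(double x, double y, const vector<vector<double>>& K) {
    vector<double> z(K.size());
    for (std::size_t i = 0; i < K.size(); ++i)
        z[i] = horner(y, K[i]);   // inner Horner pass in y, one per row
    return horner(x, z);          // outer Horner pass in x over the row values
}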