OpenMDAO v1+: Is there a way to finite difference over a single variable in a component?

I'm pretty sure I already know the answer to this, but is there a way to finite difference over a single variable in a component that already provides the derivatives of all its other variables? The only way I can come up with is to hard-code my own finite-differenced gradient for the single variable within the component in question, so that OpenMDAO sees the result as a provided gradient.

You are right, that is the only way to do it right now. We have talked about adding an option to automatically fill in the missing derivatives with finite difference ones, but I think it will be a while before that is implemented.
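For reference, a minimal sketch of that workaround in the OpenMDAO 1.x Component API; the component, its variables, and the step size are all made up for illustration. The derivative with respect to one input is given analytically, while the derivative with respect to the other is filled in by a hand-rolled forward difference inside linearize(), so OpenMDAO treats it as a provided gradient.

import numpy as np
from openmdao.api import Component

class PartialFDComp(Component):
    # hypothetical component: y = a**2 + f(b), where df/db is awkward to write by hand
    def __init__(self):
        super(PartialFDComp, self).__init__()
        self.add_param('a', val=1.0)
        self.add_param('b', val=1.0)
        self.add_output('y', val=0.0)
        self.fd_step = 1e-6  # step size for the hand-rolled finite difference

    def _f(self, b):
        # stand-in for the expression whose analytic derivative is missing
        return b**3

    def solve_nonlinear(self, params, unknowns, resids):
        unknowns['y'] = params['a']**2 + self._f(params['b'])

    def linearize(self, params, unknowns, resids):
        J = {}
        # analytic derivative with respect to 'a'
        J['y', 'a'] = np.array([[2.0 * params['a']]])
        # forward difference over 'b' only, reported as if it were analytic
        b = params['b']
        J['y', 'b'] = np.array([[(self._f(b + self.fd_step) - self._f(b)) / self.fd_step]])
        return J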

Related

Is there a simple way to reduce the value of positive slack variables in MILP?

Recently, I have been learning optimization, and my optimization problem (a minimization) is encoded for a MILP solver, which tells me my model is infeasible. Hence, I introduced a few positive/negative slack variables.
Now I get a feasible solution, but the positive slack variables are way bigger than what I can accept.
So I gave penalties/weights to those variables (multiplied them by large numbers), hoping that the MILP solver would reduce the variables, but that didn't work (I got the same solution).
Is there any approach to follow, in general, when the slack is too large?
Is there a better way to pick the slack variables, in general?
A common pitfall for people new to mathematical programming/optimization is that variables are non-negative by default, that is, they always have an implied lower bound of 0. Your mathematical model may not specify this explicitly, so those variables might need to be declared as free (with a lower bound of -infinity).
In general, you should double-check your model (as LP file) and compare it to the mathematical formulation.
Add both slack variables to the objective with a penalty coefficient.
Or add some upper bounds to the slacks.
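To make the suggestions above concrete, here is a minimal sketch in PuLP; the constraint, bounds, and penalty weight are all made up for the example. The slacks are penalized in the objective and capped with upper bounds, and the variable that is meant to be free is declared explicitly with lowBound=None, since many modeling systems and the LP file format treat variables as non-negative by default.

from pulp import LpProblem, LpVariable, LpMinimize, LpStatus

prob = LpProblem("slack_demo", LpMinimize)

# decision variable that should be free (explicitly unbounded below)
x = LpVariable("x", lowBound=None, upBound=None)

# slack variables: non-negative, and capped so they cannot grow arbitrarily
s_pos = LpVariable("s_pos", lowBound=0, upBound=5)
s_neg = LpVariable("s_neg", lowBound=0, upBound=5)

# made-up hard constraint x == 10, softened with the slacks
prob += x + s_pos - s_neg == 10

# original objective plus a penalty term on the slacks
penalty = 1000
prob += x + penalty * (s_pos + s_neg)

prob.solve()
print(LpStatus[prob.status], x.value(), s_pos.value(), s_neg.value())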

Best way to feature select using PCA (discussion)

Terminology:
Component: PC
loading_score[i, j]: the loading of feature j in PC[i]
Question:
I know the question regarding feature selection has been asked several times here at StackOverflow (SO) and on other tech pages, and it has prompted different answers/discussions. That is why I want to open a discussion about the different solutions, rather than post it as a general question, since that has already been done.
Different methods are proposed for feature selection using PCA. For instance, one approach uses the dot product between the original features and the components (here) to get their correlation; a discussion at SO here suggests that you can only talk about important features as loading scores within a component (and not carry that importance back to the input space); and another discussion at SO (which I cannot find at the moment) suggests that the importance of feature[j] would be sum(abs(loading_score[:, j])), i.e. the sum of the absolute values of loading_score[i, j] over all components i.
I personally would think that a way to get the importance of a feature would be an absolute sum where each loading_score[i, j] is weighted by the explained variance of component i, i.e.
imp_feature[j] = sum_i(abs(loading_score[i, j]) * explained_variance[i])
Well, there is no universal way to select features; it totally depends on the dataset and some insights available about the dataset. I will provide some examples which might be helpful.
Since you asked about PCA: it separates the whole dataset into orthogonal components, ordered by how much of the total variance each one explains. ICA (Independent Component Analysis), on the other hand, is able to extract multiple statistically independent components simultaneously. Look at this example:
In this example, we mix three independent signals and try to separate them out using ICA and PCA. In this case, ICA does a better job than PCA. In general, if you search for Blind Source Separation (BSS) you will find more information about this. Note that in this example we know the number of independent components, so separation is easy; in general we do not know the number of components, and you may have to guess based on some prior information about the dataset. You may also use LDA (Linear Discriminant Analysis) to reduce the number of features.
Once you have extracted the principal components using any of these techniques, you can visualize them in the following way, treating the extracted components as random variables, i.e. x, y, z.
For more information you may refer to the original source from which I took the two figures.
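A short, self-contained sketch of the kind of mixing/unmixing example described above (loosely following the standard scikit-learn blind source separation demo); the signals and mixing matrix are made up for illustration:

import numpy as np
from sklearn.decomposition import FastICA, PCA

rng = np.random.RandomState(0)
t = np.linspace(0, 8, 2000)

# three independent source signals (made up for the demo)
s1 = np.sin(2 * t)
s2 = np.sign(np.sin(3 * t))
s3 = 2 * (t % 1) - 1                 # sawtooth-like signal
S = np.c_[s1, s2, s3] + 0.2 * rng.normal(size=(len(t), 3))

A = np.array([[1.0, 1.0, 1.0],
              [0.5, 2.0, 1.0],
              [1.5, 1.0, 2.0]])      # mixing matrix
X = S.dot(A.T)                       # observed mixed signals

S_ica = FastICA(n_components=3, random_state=0).fit_transform(X)  # recovers independent sources
S_pca = PCA(n_components=3).fit_transform(X)                      # only decorrelates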
Coming back to your proposition,
imp_feature[j] = sum_i(abs(loading_score[i, j]) * explained_variance[i])
I would not recommend this way due to the following reasons:
Taking abs(loading_score[i, j]) throws away the sign, so you lose the information about whether the considered features are positively or negatively correlated with the component. explained_variance[i] may be used to find the correlation between features, but multiplying it in does not make much sense.
Edit:
In PCA, each component has its explained variance. Explained variance is the ratio between an individual component's variance and the total variance (the sum of all the individual components' variances). Feature significance can be measured by the magnitude of the explained variance.
All in all, what I want to say is that feature selection totally depends on the dataset and the significance of its features. PCA is just one technique. First understand the properties of the features and the dataset, then try to extract features. Hope this helps. If you can provide us with a concrete example, we may be able to provide more insights.
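For completeness, here is a minimal sketch of how the score proposed in the question could be computed with scikit-learn, whatever one thinks of its merits; the toy data is made up, and imp_feature is just the name used in the question:

import numpy as np
from sklearn.decomposition import PCA

X = np.random.RandomState(0).normal(size=(200, 5))   # toy data; in practice this is your feature matrix

pca = PCA().fit(X)
loading_score = pca.components_                    # shape (n_components, n_features)
explained_variance = pca.explained_variance_ratio_  # the "explained variance" ratio described above

# imp_feature[j] = sum_i abs(loading_score[i, j]) * explained_variance[i]
imp_feature = np.abs(loading_score).T.dot(explained_variance)
print(imp_feature)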

Is there a way to quantify impact of independent variables with gradient boosting?

I've been asked to run a model using gradient boosting or random forest. So far so good, however, the only output that comes back in terms of variable importance is based on the number of times a variable was used as a branch rule. I've now been asked to basically get coefficients or somehow quantify the impact that the variables have on the target.
Is there a way to do this with a gradient boosting model? My other thought was to take only the variables that were shown to be used as branch rules and put them into a regular decision tree, a GLM, or a regular regression model.
Any help or ideas would be appreciated!! Thanks so much!
Just to make certain there is no misunderstanding: the SAS implementation of decision trees/gradient boosting (at least in EM) uses split-based variable importance.
Split-based importance does NOT count the number of splits made.
It is the ratio of the reduction in sum-of-squares achieved by one variable (specifically, the sum over all splits on that variable) to the reduction in sum-of-squares achieved by all splits in the model.
If you are using surrogate rules, highly correlated variables will receive roughly the same value.
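For what it's worth, outside of SAS the analogous quantity is what scikit-learn exposes as feature_importances_ (each feature's share of the total impurity/sum-of-squares reduction), and permutation importance is another common way to quantify impact. A minimal sketch with made-up data:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

rng = np.random.RandomState(0)
X = rng.normal(size=(500, 4))
y = 3 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)   # made-up target

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# impurity-based importance: share of the total squared-error reduction per feature
print(model.feature_importances_)

# permutation importance: drop in score when a feature's values are shuffled
perm = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print(perm.importances_mean)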

How can I choose the right numerical solution from NEQNF?

I'm using a function (NEQNF manual page here) which I call using
call neqnf(SYSTEM_OF_EQUATIONS, x, xguess=x_GUESS, itmax = 10000)
where SYSTEM_OF_EQUATIONS is the subroutine that contains equations
f(1)=...x(2)...x(1)...
f(2)=...x(1)...x(4)...
f(3)=...x(3)...x(4)...
f(4)=...x(1)...x(5)...
f(5)=...x(1)...x(5)...
The function comes from the IMSL Fortran libraries and lets me solve a non-linear system of five equations in five unknowns. Because there exists more than one solution (more than one set of five numbers, real or complex, that solves my system), how can I choose which set to "use" as the solution?
I link to an online solver with a piece of my system already entered (only two equations in two unknowns; the other variables are constants in this example), which easily shows that there exists more than one solution.
example
To conclude: I have to choose the solution set that keeps the other variables positive, so a simple check is the way to pick it.
I don't think the question has anything to do with programming, but I will show how I understand the problem.
You supply an initial guess, and the method then converges to some solution via a modification of Newton's method.
You can choose the root by the placement of the initial guess. However, the convergence pattern can be very unpredictable (even fractal - https://en.wikipedia.org/wiki/Newton_fractal ) and it may be very difficult to choose the particular root using the initial guess.
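NEQNF itself only returns the one root it converges to, so a common workaround is to solve from many random initial guesses, collect the distinct roots, and keep the one satisfying the extra requirement (here, positivity of all variables, as the asker concluded). The sketch below shows the idea with scipy.optimize.fsolve in Python rather than IMSL, on a made-up 2-equation stand-in for the real system:

import numpy as np
from scipy.optimize import fsolve

def system(x):
    # made-up stand-in for the real system of equations
    return [x[0]**2 + x[1]**2 - 4.0,
            x[0] - x[1] - 1.0]

rng = np.random.RandomState(0)
roots = []
for _ in range(50):
    guess = rng.uniform(-5, 5, size=2)
    sol, info, ier, _ = fsolve(system, guess, full_output=True)
    if ier == 1:                                   # converged
        if not any(np.allclose(sol, r, atol=1e-6) for r in roots):
            roots.append(sol)

# keep only the root(s) with all components positive, as in the question
positive_roots = [r for r in roots if np.all(r > 0)]
print(positive_roots)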

how to implement a 'nested' cost function in Gecode?

I am new to Gecode and constraint programming in general.
So far I haven't had much trouble picking up Gecode; it's great. But I was wondering what the best way is to implement a "nested" cost function. Specifically, I am looking to minimize X, but within the set of solutions for which X is equal, prefer solutions which minimize Y. I could probably hack it by defining a cost function that looks like X*large_number + Y, but I'd prefer to do this properly if there's a good solution.
If anyone can point me to an explanation of how to implement this in Gecode, that would be really helpful. Thanks!
You can define any kind of optimization criterion using the constrain member of a space in Gecode. See Section 2.5 in Modeling and Programming with Gecode for an example. In your case, the straightforward way would be to add a constrain member that posts a lexicographic-ordering constraint between the previous best solution's cost values and the current space.
That being said, optimizing over a lexicographic order directly can in general be wasteful (too much searching). It is often better to first run a search optimizing the first component (X in your case), then re-run the search with the first component's value fixed (X set to its best possible value) and optimize the second component (Y in your case). Iterate as needed for all elements of the cost; a sketch of this two-phase pattern follows.
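The two-phase idea is solver-agnostic; the sketch below shows the pattern using OR-Tools CP-SAT in Python rather than Gecode, purely as an illustration, with made-up variables and constraints:

from ortools.sat.python import cp_model

def build_model():
    # made-up model: two variables, one constraint, costs X = a and Y = b
    model = cp_model.CpModel()
    a = model.NewIntVar(0, 10, 'a')
    b = model.NewIntVar(0, 10, 'b')
    model.Add(a + b >= 7)
    return model, a, b

solver = cp_model.CpSolver()

# phase 1: minimize the primary cost X
model, x, y = build_model()
model.Minimize(x)
solver.Solve(model)
best_x = solver.Value(x)

# phase 2: rebuild the model, fix X to its optimum, and minimize the secondary cost Y
model, x, y = build_model()
model.Add(x == best_x)
model.Minimize(y)
solver.Solve(model)
print('X =', best_x, 'Y =', solver.Value(y))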