I have recently learnt in lectures that there are three defining characteristics of a linear program:
There must be an objective function to be maximised or minimised.
There must be a system of linear constraints.
The variables in the system must be non-negative.
My question is: is there a minimum number of constraints that must be present for the system to be a linear program, or can there be as few as one constraint?
My initial thought was that you would need as many constraints as you have variables in order to construct a feasible region, but the lecture slides don't mention a minimum requirement for the number of constraints in the system.
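For concreteness, here is a minimal sketch of a program with just one linear constraint (plus non-negativity). Its feasible region is the interval [0, 5] and the optimum is attained at x = 5, so a single constraint already yields a well-posed problem:

```latex
\begin{aligned}
\max_{x}\quad & x \\
\text{subject to}\quad & x \le 5, \\
& x \ge 0.
\end{aligned}
```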
C++ tries to use the concept of time complexity in the specification of many library functions, but asymptotic complexity is a mathematical construct based on asymptotic behavior when the size of inputs and the values of numbers tend to infinity.
Obviously the size of scalars in any given C++ implementation is finite.
What is the official formalization of complexity in C++, compatible with the finite and bounded nature of C++ operations?
Remark: It goes without saying that for a container or algorithm based on a type parameter (as in the STL), complexity can only be expressed in terms of the number of user-provided operations (say, a comparison for sorted containers), not in terms of elementary C++ language operations. This is not the issue here.
EDIT:
Standard quote:
4.6 Program execution [intro.execution]
1 The semantic descriptions in this International Standard define a
parameterized nondeterministic abstract machine. This International
Standard places no requirement on the structure of conforming
implementations. In particular, they need not copy or emulate the
structure of the abstract machine. Rather, conforming implementations
are required to emulate (only) the observable behavior of the abstract
machine as explained below.
2 Certain aspects and operations of the abstract machine are described
in this International Standard as implementation-defined (for example,
sizeof(int)). These constitute the parameters of the abstract machine. [...]
The C++ language is defined in terms of an abstract machine based on scalar types, such as integer types with a finite, defined number of bits and only so many possible values. (Ditto for pointers.)
There is no "abstract" C++ where integers would be unbounded and could "tend to infinity".
It means that in the abstract machine, any array, any container, any data structure is bounded (even if the bound may be huge compared to available computers and their minuscule memory, e.g. relative to what a 64-bit number can represent).
Obviously the size of scalars in any given C++ implementation is finite.
Of course, you are correct with this statement! Another way of saying this would be "C++ runs on hardware and hardware is finite". Again, absolutely correct.
However, the key point is this: C++ is not formalized for any particular hardware.
Instead, it is formalized against an abstract machine.
As an example, sizeof(int) <= 4 is true for all hardware that I personally have ever programmed for. However, there is no upper bound at all in the standard regarding sizeof(int).
What does the C++ standard state the size of int, long type to be?
So, on particular hardware the input to some function void f(int) is indeed limited by 2^31 - 1. So, in theory one could argue that, no matter what it does, this is an O(1) algorithm, because its number of operations can never exceed a certain limit (which is the definition of O(1)). However, on the abstract machine there literally is no such limit, so this argument cannot hold.
So, in summary, I think the answer to your question is that C++ is not as limited as you think. C++ is neither finite nor bounded. Hardware is. The C++ abstract machine is not. Hence it makes sense to state the formal complexity (as defined by maths and theoretical CS) of standard algorithms.
Arguing that every algorithm is O(1), just because in practice there are always hardware limits, could be justified by a purely theoretical thinking, but it would be pointless. Even though, strictly speaking, big O is only meaningful in theory (where we can go towards infinity), it usually turns out to be quite meaningful in practice as well, even if we cannot go towards infinity but only towards 2^32 - 1.
UPDATE:
Regarding your edit: You seem to be mixing up two things:
There is no particular machine (whether abstract or real) that has an int type that could "tend to infinity". This is what you are saying and it is true! So, in this sense there always is an upper bound.
The C++ standard is written for any machine that could ever possibly be invented in the future. If someone creates hardware with sizeof(int) == 1000000, this is fine with the standard. So, in this sense there is no upper bound.
I hope you understand the difference between 1. and 2. and why both of them are valid statements and don't contradict each other. Each machine is finite, but the possibilities of hardware vendors are infinite.
So, if the standard specifies the complexity of an algorithm, it does (must do) so in terms of point 2. Otherwise it would restrict the growth of hardware. And this growth has no limit, hence it makes sense to use the mathematical definition of complexity, which also assumes there is no limit.
asymptotic complexity is a mathematical construct based on asymptotic behavior when the size of inputs and the values of numbers tend to infinity.
Correct. Similarly, algorithms are abstract entities which can be analyzed regarding these metrics within a given computational framework (such as a Turing machine).
C++ tries to use the concept of time complexity in the specification of many library functions
These complexity specifications impose restrictions on the algorithm you can use. If std::upper_bound has logarithmic complexity, you cannot use linear search as the underlying algorithm, because that has only linear complexity.
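To make the restriction concrete, here is a sketch in Python rather than C++ (bisect.bisect_right is the rough analogue of std::upper_bound; the Probe wrapper is a hypothetical helper for counting comparisons, not part of any library):

```python
import bisect

class Probe:
    """Wraps the search key and counts how often it is compared."""
    comparisons = 0
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):  # bisect_right evaluates `key < a[mid]`
        Probe.comparisons += 1
        return self.value < other

data = list(range(1024))  # sorted input

# Binary search, the analogue of std::upper_bound: ~log2(1024) comparisons.
Probe.comparisons = 0
pos = bisect.bisect_right(data, Probe(1000))
binary_comparisons = Probe.comparisons

# Linear scan for the same "first element greater than key" position:
# ~1000 comparisons, which a logarithmic complexity requirement forbids.
Probe.comparisons = 0
key = Probe(1000)
linear_pos = next(i for i, v in enumerate(data) if key < v)
linear_comparisons = Probe.comparisons

print(pos, binary_comparisons, linear_pos, linear_comparisons)
```

Both searches return the same position, but only the first one could satisfy a logarithmic complexity specification.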
Obviously the size of scalars in any given C++ implementation is finite.
Obviously, any computational resource is finite. Your RAM and CPU have only finitely many states. But that does not mean everything is constant time (or that the halting problem is solved).
It is perfectly reasonable and workable for the standard to govern which algorithms an implementation can use (std::map being implemented as a red-black-tree in most cases is a direct consequence of the complexity requirements of its interface functions). The consequences on the actual "physical time" performance of real-world programs are neither obvious nor direct, but that is not within scope.
Let me put this into a simple process to point out the discrepancy in your argument:
The C++ standard specifies a complexity for some operation (e.g. .empty() or .push_back(...)).
Implementers must select an (abstract, mathematical) algorithm that fulfills that complexity criterion.
Implementers then write code which implements that algorithm on some specific hardware.
People write and run other C++ programs that use this operation.
Your argument is that determining the complexity of the resulting code is meaningless because you cannot form asymptotes on finite hardware. That's correct, but it's a straw man: that's not what the standard does or intends to do. The standard specifies the complexity of the (abstract, mathematical) algorithm (points 1 and 2), which eventually leads to certain beneficial effects/properties of the (real-world, finite) implementation (point 3), for the benefit of the people using the operation (point 4).
Those effects and properties are not specified explicitly in the standard (even though they are the reason for those specific standard stipulations). That's how technical standards work: You describe how things have to be done, not why this is beneficial or how it is best used.
Computational complexity and asymptotic complexity are two different terms. Quoting from Wikipedia:
Computational complexity, or simply complexity of an algorithm is the amount of resources required for running it.
For time complexity, the amount of resources translates to the amount of operations:
Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm, supposing that each elementary operation takes a fixed amount of time to perform.
In my understanding, this is the concept that C++ uses, that is, the complexity is evaluated in terms of the number of operations. For instance, if the number of operations a function performs does not depend on any parameter, then it is constant.
On the contrary, asymptotic complexity is something different:
One generally focuses on the behavior of the complexity for large n, that is on its asymptotic behavior when n tends to the infinity. Therefore, the complexity is generally expressed by using big O notation.
Asymptotic complexity is useful for the theoretical analysis of algorithms.
What is the official formalization of complexity in C++, compatible with the finite and bounded nature of C++ operations?
There is none.
Is it possible to write decision-making models in either Stan or PyMC3? By that I mean: we define not only the distribution of random variables, but also the definition of decision and utility variables, and determine the decisions maximizing expected utility.
My understanding is that Stan is more of a general optimizer than PyMC3, so that suggests decision models would be more directly implemented in it, but I would like to hear what people have to say.
Edit: While it is possible to enumerate all decisions and compute their corresponding expected utilities, I am wondering about more efficient methods, since the number of decisions could be combinatorially large (for example, how many of each item to buy from a list with thousands of products). Influence-diagram algorithms exploit factorizations in the model to identify independences that allow the decisions to be computed from only a smaller set of relevant random variables. I wonder if either Stan or PyMC3 does that kind of thing.
The basic steps for Bayesian decision theory are:
Enumerate a finite set of decisions that could be made
Specify a utility function of the decision and perhaps other things
Draw from the posterior distribution of all the unknowns given the known data
Evaluate the utility function for each possible decision and each posterior draw
Make the decision with the highest expected utility, averaging over the posterior draws.
You can do those five steps with any software --- Stan and PyMC3 included --- that produces (valid) draws from the posterior distribution. In Stan, the utility function should be evaluated in the generated quantities block.
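The five steps can be sketched in plain Python. This is a toy stocking decision; the draws list stands in for posterior draws that Stan or PyMC3 would produce, and all the numbers (prices, demand distribution, decision grid) are made-up assumptions:

```python
import random

random.seed(0)

# Step 3 (stand-in): posterior draws of an unknown demand, as Stan/PyMC3 would emit.
draws = [random.gauss(50, 10) for _ in range(4000)]

# Step 1: a finite set of decisions (how many units to stock).
decisions = range(0, 101, 10)

# Step 2: a utility function of the decision and the unknown demand.
def utility(stock, demand):
    return 5.0 * min(stock, demand) - 2.0 * stock  # revenue minus cost

# Step 4: evaluate the utility for each decision and each posterior draw,
# averaging over the draws to estimate expected utility.
expected = {d: sum(utility(d, x) for x in draws) / len(draws) for d in decisions}

# Step 5: make the decision with the highest expected utility.
best = max(expected, key=expected.get)
print(best)
```

With Stan, steps 1, 2, and 4 would live in the generated quantities block; the loop over decisions stays the same.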
I was just curious to have better control over the outcome of the SVM.
I tried searching the documentation, but couldn't find a function that seems to do this.
One could say that SVM does not have hidden nodes, but this is only partially true.
SVMs were originally called Support Vector Networks (this is what Vapnik himself called them), and they were seen as a kind of neural network with a single hidden layer. Due to the popularity of neural networks at the time, many people to this day use the sigmoid "kernel", even though it is rarely a valid Mercer kernel (the NN community was so used to it that they kept using it despite its lack of mathematical justification).
So is an SVM a neural net or not? Yes, it can be seen as a neural network. In fact, many classifiers can be seen through such a prism. However, what makes SVMs really different is the way they are trained and parametrized. In particular, SVMs work with "activation functions" that are valid Mercer kernels (they denote a dot product in some space). Furthermore, the weights of the hidden nodes are equal to training samples, so you start with as many hidden units as you have training examples. During training, the SVM, on its own, reduces the number of hidden units by solving an optimization problem which "prefers" sparse solutions (removal of hidden units), thus ending up with a hidden layer consisting of a subset of the training samples, which we call support vectors. To be clear, this is not the classical view of SVMs, but it is a valid perspective, and one that might be easier to grasp for someone from the NN community.
So can you control this number? Yes and no. No, because the SVM needs all these hidden units to have a valid optimization problem, and it will remove all redundant ones on its own. Yes, because there is an alternative optimization problem, called nu-SVM, which uses a nu hyperparameter that is a lower bound on the fraction of support vectors, and thus a lower bound on the number of hidden units. You cannot, unfortunately, directly specify an upper bound.
But I really need to! If this is the case, you can go with approximate solutions which respect your restriction. You can use an H-dimensional feature map which approximates the kernel space explicitly (http://scikit-learn.org/stable/modules/kernel_approximation.html). One such method is the Nystroem method. In short, if you want "H hidden units", you simply fit a Nystroem model to produce H-dimensional output, transform your input data through it, and fit a linear SVM on top. From a mathematical perspective, this approximates the true non-linear SVM with the given kernel, though the approximation converges rather slowly.
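A minimal sketch of that recipe with scikit-learn; the toy data, the gamma value, and the choice H = 10 are arbitrary assumptions for illustration, not recommendations:

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
# Two well-separated blobs as a toy binary classification problem.
X = np.vstack([rng.randn(20, 2), rng.randn(20, 2) + 5.0])
y = np.array([0] * 20 + [1] * 20)

H = 10  # the desired number of "hidden units"

# Explicit H-dimensional approximation of the RBF kernel feature space.
feature_map = Nystroem(kernel="rbf", gamma=0.5, n_components=H, random_state=0)
X_mapped = feature_map.fit_transform(X)

# A linear SVM on the mapped features approximates the kernelized SVM.
clf = LinearSVC(C=1.0).fit(X_mapped, y)
print(clf.score(X_mapped, y))
```

The number of components H acts as the hard cap on "hidden units" that plain (nu-)SVM training does not offer.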
I'm currently trying to find good parameters for my program (about 16 parameters and execution of the program takes about a minute). Evolutionary algorithms seemed like a nice idea and I wanted to see how they perform.
Unfortunately I don't have a good fitness function, because the variance of my objective function is very high (I cannot run it often enough without waiting until 2016). I can, however, determine which of two parameter sets is better (by testing two configurations against each other). Do you know of evolutionary algorithms that use only that information? Are there other optimization techniques that would be more suitable? For this project I'm using C++ and MATLAB.
// Update: Thank you very much for the answers. Both look promising but I will need a few days to evaluate them. Sorry for the delay.
If your pairwise test gives a proper total ordering, i.e. if a >= b and b >= c imply a >= c (plus some other conditions), then maybe you can construct a ranking objective on the fly and use CMA-ES to optimize it. CMA-ES is an evolutionary algorithm that is invariant to order-preserving transformations of the function value and to angle-preserving transformations of the inputs. Furthermore, because it is a second-order method, its convergence is very fast compared to other derivative-free search heuristics, especially in higher-dimensional problems where random-search methods like genetic algorithms take forever.
If you can compare solutions in a pairwise fashion, then some sort of tournament selection approach might be good. The Wikipedia article describes using it for a genetic algorithm, but it is easily applied to an evolutionary algorithm. What you do is repeatedly select a small set of solutions from the population and hold a tournament among them. For simplicity the tournament size could be a power of 2. If it were 8, you would pair those 8 up at random and compare them, selecting 4 winners; pair those up and select 2 winners; and a final round selects the overall tournament winner. This solution can then be mutated one or more times to provide member(s) for the next generation.
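The procedure above can be sketched around nothing but a pairwise comparator; here `better` is a hypothetical stand-in for the questioner's head-to-head test of two configurations:

```python
import random

def tournament_winner(entrants, better):
    """Single-elimination tournament. `better(a, b)` returns the preferred
    of two solutions; the entrant count should be a power of 2."""
    rnd = list(entrants)
    random.shuffle(rnd)  # random initial pairing
    while len(rnd) > 1:
        # Each round, adjacent pairs compete and only winners advance.
        rnd = [better(rnd[i], rnd[i + 1]) for i in range(0, len(rnd), 2)]
    return rnd[0]

def tournament_select(population, better, size=8):
    """One selection step: draw `size` random individuals, return the winner."""
    return tournament_winner(random.sample(population, size), better)

# Toy usage: "better" means closer to 7 (any pairwise test would do).
better = lambda a, b: a if abs(a - 7) < abs(b - 7) else b
parent = tournament_select(list(range(100)), better)
```

Note that only comparisons are used; no fitness value is ever computed, which matches the high-variance objective in the question.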
I have a general question about SCIP. I need to use SCIP as a branch-and-price framework for my problem; I code in C++, so I used the VRP example as a template. On some instances, the code stops at a fractional solution and returns it as an optimal solution. I think something is wrong: do I have to set some parameters to tell SCIP to look for an integer solution, or have I made a mistake? I believe it should not stop, and should instead branch on the fractional solution until it reaches an integer solution (without any other negative-reduced-cost column). I also solve the subproblem optimally. Any comments?
If you define your variables to be continuous and just add a pricer, SCIP will solve the master problem to optimality (i.e., solve the restricted master, add improving columns, solve the updated restricted master, and so on, until no more improving columns are found).
There is no reason for SCIP to check if the solution is integral, because you explicitly said that you don't mind whether the values of the variables are integral or not (by defining them to be continuous). On the other hand, if you define the variables to be of integral (or binary) type, SCIP will do exactly as I described before, but at the end check whether all integral variables have an integral value and branch if this is not the case.
However, you should note that all branching rules in SCIP branch on variables, i.e., they take an integer variable with a fractional value and split its domain; a binary variable would be fixed to 0 and to 1 in the two child nodes. This is typically a bad idea for branch-and-price. First of all, it's quite unbalanced: you have a huge number of variables, out of which only a few will have value 1 in the end; most will be 0. Fixing a variable to 1 therefore has a high impact, while fixing it to 0 has almost no impact. But more importantly, you need to take the branching decision into account in your pricing problem. If you fixed a variable to 0, you have to keep the pricer from generating a copy of the forbidden column (which would probably improve the LP solution, because it was part of the former optimal solution). In order to do this, you might need to look for the second-best (or, later, k-th best) solution. Since you are solving the pricing problems as a MIP with SCIP, you might just add a constraint forbidding this solution (a logicor (linear) constraint for binary variables, or a bounddisjunction (not linear) constraint for general integer variables).
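For binary pricing variables, such a "forbid this solution" constraint can be sketched as the standard no-good cut (notation assumed here: x* is the pricing solution to exclude); it is violated only by x = x*, so the pricer must produce a different column:

```latex
\sum_{i \,:\, x^*_i = 0} x_i \;+\; \sum_{i \,:\, x^*_i = 1} (1 - x_i) \;\ge\; 1
```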
I would recommend implementing your own branching rule, which takes into account that you are doing branch-and-price, and which branches in a way that is more balanced and does not harm your pricing too much. For an example, check out the Ryan&Foster branching rule, which is the standard for binary problems with a set-partitioning master structure. This rule is implemented in the Binpacking as well as the Coloring example shipped with SCIP.
Please also check out the SCIP FAQ, where there is a whole section about branch-and-price which also covers the topic branching (in particular, how branching decisions can be stored and enforced by a constraint handler, which is something you need to do for Ryan&Foster branching): http://scip.zib.de/doc/html/FAQ.php
There have also been a lot of questions about branch-and-price on the SCIP mailing list:
http://listserv.zib.de/mailman/listinfo/scip/. If you want to search it, you can use Google and search for "site:listserv.zib.de scip search-string".
Finally, I would recommend having a look at the GCG project: http://www.or.rwth-aachen.de/gcg/
It is an extension of SCIP to a generic branch-cut-and-price solver, i.e., you do not need to implement anything, you just put in an original formulation of your model, which is then reformulated by a Dantzig-Wolfe decomposition and solved via branch-cut-and-price. You can supply the structure for the reformulation, pricing problems are solved as a MIP (as you do it also), and there are also different branching rules. GCG is also part of the SCIP optimization suite and can be easily built within the suite.