Multiple if and if not in list comprehension - list

I tried to look at similar questions but I'm really not understanding how I can accomplish this using the methods mentioned in the other questions.
So my problem is: I have one list from which I want to remove certain values. For instance:
import random

a = [[[0,0],[0,1]],[[0,0],[0,1]]]
for y in range(2):
    a[y][:] = [x for x in a[y] if not random.random() < s]
This removes each element for which random.random() is below s (where s is between 0 and 1). However, I only want this to happen if the second position of the element (each element being a pair such as [0,0]) is equal to 1. I tried several solutions suggested here for other questions, but I can't get any of them to work. Does anyone have a suggestion?

Another condition could be added to check the value of the second "bit" of x (x[1] == 0):
import random

a = [[[0,0],[0,1]],[[0,0],[0,1]]]
for y in range(2):
    a[y][:] = [x for x in a[y] if x[1] == 0 or random.random() >= 0.5]
This means that if x[1] == 0, then the pair is kept, regardless of a random value. Otherwise, it is kept only if random.random() >= 0.5.
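As a minimal sketch of the same idea with the threshold s from the question instead of the hard-coded 0.5: the helper name `thin` and its `rng` parameter are illustrative assumptions, not part of the original code.

```python
import random

def thin(pairs, s, rng=random):
    """Keep every pair whose second entry is not 1.

    Pairs whose second entry is 1 are dropped with probability s,
    i.e. they survive only when the random draw is >= s.
    """
    return [x for x in pairs if x[1] != 1 or rng.random() >= s]

a = [[[0, 0], [0, 1]], [[0, 0], [0, 1]]]
for y in range(len(a)):
    a[y][:] = thin(a[y], s=0.5)
print(a)  # the [0, 0] pairs are always present; each [0, 1] pair may be gone
```

Because `or` short-circuits, random.random() is only called for the pairs that are actually candidates for removal.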


How to Formulate a Piecewise Step Function in pyomo

I have a question regarding the correct formulation of a piecewise step function in pyomo. I want to include in my model a single piecewise function of the form:
Z(X) = { 1 ,  0 <= X(t) <= 1
       { 0 ,  1 <= X(t) <= 2
Where X is being fit to data taken over a time domain and Z acts like a binary variable. The most similar example in the pyomo documentation is the step.py example using INC. However, when solving with this formulation I observe the domain variable x ‘sticking’ to the breakpoint at x=1. I assume this is because (as noted in the documentation) Z can take any value on the entire vertical line if continuous, or is doubly feasible at both 0 and 1 if binary. Other formulations offered via the piecewise function (i.e. dlog, dcc, log, etc.) experience similar issues (in fact, based on the output to GAMS I’m pretty sure they don’t support binary/integer variables at all).
Is there a ‘correct’ way to formulate a piecewise function in pyomo that avoids the multiple-feasibility issue at the breakpoint, thus avoiding the domain variable converging to the breakpoint? I am using BARON with solvers cplex and ipopt, however my gut tells me this formulation issue can’t be solved by simply changing solvers.
I can also send a document illustrating my observations on why the current pyomo piecewise formulations don’t support binary variables, if it would help.
Here's some sample code where we try to minimise the sum of the step function Z.
from pyomo.environ import *

model = ConcreteModel()
model.A = Set(initialize=[1,2,3])
model.B = Set(initialize=['J', 'K'])
model.x = Var(model.A, model.B, bounds=(0, 2))
model.z = Var(model.A, model.B, domain=Binary)

# The breakpoint x = 1 appears twice so that the range can jump from 1 to 0 there
DOMAIN_PTS = [0,1,1,2]
RANGE_PTS = [1,1,0,0]
model.z_constraint = Piecewise(
    model.A, model.B,
    model.z, model.x,
    pw_pts=DOMAIN_PTS,
    pw_repn='INC',
    pw_constr_type='EQ',
    f_rule=RANGE_PTS,
    unbounded_domain_var=True)

def objective_rule(model):
    return sum(model.z[a,b] for a in model.A for b in model.B)

model.objective = Objective(rule=objective_rule, sense=minimize)
With sense = minimize as above, the program solves and gives x = 1 for each index value. If you set sense = maximize, it solves and gives x = 0 for each index value. I'm not too sure what you mean by stickiness, but I don't think this program exhibits it, and it does implement the step function.
This assumes that your z is not also indexed by time. If so, I would need to edit this answer:
model.t = RangeSet(*time*)
model.x = Var(model.t, bounds=(0, 2))
model.z = Var(domain=Binary)
model.d = Disjunction(expr=[
    [0 <= model.x[t] for t in model.t] + [model.x[t] <= 1 for t in model.t],
    [1 <= model.x[t] for t in model.t] + [model.x[t] <= 2 for t in model.t]
])
TransformationFactory('gdp.bigm').apply_to(model)
SolverFactory('baron').solve(model)

Compress dict sum statement with Python

In my python application I have a big list (now with almost 9000 indexes). I need to find the two most similar items in this list. So, what I have now is something like:
aux1 = 0
aux2 = 1
min_distance = 0xffff
weights = get_weights()
for i in range(0, len(_list)):
    for j in range(i + 1, len(_list)):
        obj1 = _list[i]
        obj2 = _list[j]
        dist = 0
        for key in self.__fields:
            dist += weights[key] * (obj1[key] - obj2[key]) ** 2
        if dist < min_distance:
            min_distance = dist
            aux1 = i
            aux2 = j
return aux1, aux2, min_distance
In the code, weights is a dict, obj1 and obj2 are both objects in which the __getitem__ is implemented and the return value also comes from a dict. And self.__fields is a list with the selected fields (it has now 9 items).
My problem is that this loop takes too much time to complete. Even after 5 hours, the i variable is still within the first 100 list items.
With the following silly code, I came to the conclusion that the problem is not the size of the list (the silly code finishes in about 5 minutes).
count = 0
total = 9000
for i in range(0, total):
    for j in range(i + 1, total):
        for k in range(0, 10):
            count += 1
print("Count is " + str(count))
Therefore, the problem seems to be in the most internal loop of my code:
for key in self.__fields:
    dist += weights[key] * (obj1[key] - obj2[key]) ** 2
I know Python, but I'm not a Python specialist. I conclude that the access to the values of three objects through their key is a slow operation. Some time ago, I saw in some blog that list comprehensions and/or lambda operations can be faster.
So, my question is: how do I make this most internal loop faster using list comprehensions and/or lambda? Feel free to give any other advice if you want.
Not sure whether it's any faster, but you could rewrite that code using itertools.combinations and get the min using a key function calculating the "distance".
from itertools import combinations
weights = get_weights()
aux1, aux2 = min(combinations(_list, 2),
                 key=lambda pair: sum(weights[key] * (pair[0][key] - pair[1][key]) ** 2
                                      for key in self.__fields))
If this does not help, you might consider temporarily turning the dictionaries in _list into lists, holding just the values of the relevant fields. Instead of using dictionary lookup, you can then just zip those lists together with the weights. Afterwards, turn them back into dicts.
weights_list = [weights[f] for f in self.__fields]
as_lists = [[d[f] for f in self.__fields] for d in _list]
aux1, aux2 = min(combinations(as_lists, 2),
                 key=lambda pair: sum(w * (x - y) ** 2
                                      for w, x, y in zip(weights_list, *pair)))
aux1, aux2 = (dict(zip(self.__fields, x)) for x in (aux1, aux2))
This should be a bit faster, but it will only work if the dicts do not have any other fields than those in self.__fields, otherwise the dicts can not be reconstructed from the lists (at least not as easily). Alternatively, you might use tuples instead of lists and use another dictionary to map those tuples to the original dictionaries...
Or try this, using the indices of the elements instead of the elements themselves (not tested):
idx1, idx2 = min(combinations(range(len(_list)), 2),
                 key=lambda pair: sum(w * (x - y) ** 2
                                      for w, x, y in zip(weights_list, as_lists[pair[0]], as_lists[pair[1]])))
aux1, aux2 = _list[idx1], _list[idx2]
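The index-based variant can be sketched as a self-contained function. The name `closest_pair`, its parameters and the sample data are made up for illustration; in the question the rows would be the items of _list and the fields would come from self.__fields. Note that this only trims the per-pair overhead: the number of pairs is still quadratic in the list length.

```python
from itertools import combinations

def closest_pair(rows, fields, weights):
    """Return (i, j, distance) for the two rows with the smallest weighted squared distance."""
    # Pull the relevant values out of the dicts once, so the hot loop
    # works on plain lists instead of doing repeated key lookups.
    weights_list = [weights[f] for f in fields]
    as_lists = [[row[f] for f in fields] for row in rows]

    def dist(pair):
        i, j = pair
        return sum(w * (x - y) ** 2
                   for w, x, y in zip(weights_list, as_lists[i], as_lists[j]))

    i, j = min(combinations(range(len(rows)), 2), key=dist)
    return i, j, dist((i, j))

# Hypothetical sample data
rows = [{"a": 0.0, "b": 0.0}, {"a": 5.0, "b": 5.0},
        {"a": 0.5, "b": 0.0}, {"a": 9.0, "b": 1.0}]
weights = {"a": 1.0, "b": 2.0}
print(closest_pair(rows, ["a", "b"], weights))  # (0, 2, 0.25)
```

Working with indices avoids having to rebuild the original dicts afterwards, so the rows may carry fields other than the selected ones.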

How to convert this into a set of linear constraints?

Given a 1-dimensional array of binary variables, for example
x = [0,1,0,0,1]
I would like to create a new variable y such that y <= max(x). In other words
y = 0 only if sum(x) = 0.
y = 1 only if sum(x) > 0.
How do I convert this into a set of linear constraints?
I know this must be possible because IBM CP Optimizer Suite can handle this automatically, but I don't have access to it.
Try something simple like y <= sum(x) which will force y to zero if all the x are zero.
Then for forcing y to 1 you have several choices. You could simply add a constraint y >= x[i] for every variable x[i] in x, or use a big-M constraint like M*y >= sum(x), where M is a constant at least as large as the maximum number of variables in x that can be simultaneously equal to 1. Adding the separate constraints might give a tighter linear relaxation, especially if there are many x variables.
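A quick brute-force check of both formulations, as a sketch rather than a solver model: for every binary vector x of length 5, the only binary y satisfying the constraints should be max(x). The function names are illustrative, and M is assumed to be len(x), the largest value sum(x) can take.

```python
from itertools import product

def feasible_y_separate(x):
    """Binary y values with y <= sum(x) and y >= x[i] for every i."""
    return [y for y in (0, 1)
            if y <= sum(x) and all(y >= xi for xi in x)]

def feasible_y_big_m(x):
    """Binary y values with y <= sum(x) and M*y >= sum(x), taking M = len(x)."""
    M = len(x)
    return [y for y in (0, 1) if y <= sum(x) and M * y >= sum(x)]

for x in product((0, 1), repeat=5):
    assert feasible_y_separate(x) == [max(x)]
    assert feasible_y_big_m(x) == [max(x)]
print("both formulations force y == max(x) for all 32 vectors")
```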

Finding solution set of a Linear equation?

I need to find all possible solutions for this equation:
x+2y = N, x<100000 and y<100000.
given N=10, say.
I'm doing it like this in python:
for x in range(1,100000):
    for y in range(1,100000):
        if x + 2*y == 10:
            print(x, y)
How should I optimize this for speed? What should I do?
Essentially this is a Language-Agnostic question. A C/C++ answer would also help.
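If x+2y = N, then y = (N-x)/2 (supposing N-x is even). You don't need to iterate over all of range(1,100000) for both variables.
Something like this (for a given N):
if N % 2: x0 = 1
else: x0 = 0
# x must have the same parity as N, and must stay below N so that y remains positive
for x in range(x0, min(N, 100000), 2):
    print(x, (N - x) // 2)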
EDIT:
You have to take care that N-x does not turn negative. That's what min is supposed to do.
The answer by Lefteris E is actually better than mine, because these special cases are taken care of in an elegant way.
We can iterate over the domain of y and calculate x. Taking into account that x also has a limited range, we can further limit the domain of y to [1, N/2] (anything over N/2 for y gives a negative value for x).
x = N
for y in range(1, N // 2 + 1):
    x = x - 2
    print(x, y)
This just loops N/2 times (instead of 50000)
It doesn't even do those expensive multiplications and divisions
Your code runs in quadratic time. You can reduce it to linear time by rearranging your equation to the form y = .... This allows you to loop over x only, calculate y, and check whether it's an integer.
Lefteris E's answer is the way to go, but I think y should be in the range [1,N/2] instead of [1,2*N].
Explanation:
x + 2*y = N
// x cannot be negative, so x = N - 2*y >= 0
2*y <= N
// therefore y is at its maximum when x = 0, and that maximum is
y = N/2
So now you can do:
for y in range(1, N // 2 + 1):
    x = N - (y << 1)
    print(x, y)
You could examine only even numbers for x, given N = 10. The reason is that 2y must be even, so x must be even as well. This should cut the total running time to half of that needed to examine all x.
If you also require the answers to be natural numbers, negative numbers are ruled out. You then only need to examine the even numbers in [0,10] for x, since neither x nor 2y can be larger than 10 on its own.
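Putting the answers together, here is a hedged sketch that loops over y only and derives the bounds on y from the bounds on x. It assumes the limits implied by the question's loops, 1 <= x < 100000 and 1 <= y < 100000; the function name `solutions` and the `limit` parameter are illustrative.

```python
def solutions(N, limit=100000):
    """Yield every (x, y) with x + 2*y == N and 1 <= x, y < limit."""
    # x = N - 2*y >= 1          =>  y <= (N - 1) // 2
    # x = N - 2*y <= limit - 1  =>  y >= (N - limit) // 2 + 1
    y_min = max(1, (N - limit) // 2 + 1)
    y_max = min(limit - 1, (N - 1) // 2)
    for y in range(y_min, y_max + 1):
        yield N - 2 * y, y

print(list(solutions(10)))  # [(8, 1), (6, 2), (4, 3), (2, 4)]
```

The loop runs at most about N/2 times and needs no divisibility test, because x is computed from y rather than searched for.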

Probability density function from a paper, implemented using C++, not working as intended

So I'm implementing a heuristic algorithm, and I've come across this function.
I have an array of 1 to n (0 to n-1 in C, w/e). I want to choose a number of elements that I'll copy to another array. Given a parameter y (0 < y <= 1), I want a distribution of numbers whose average is (y * n). That means that whenever I call this function, it gives me a number between 0 and n, and the average of these numbers is y*n.
According to the author, "l" is a random number: 0 < l < n. In my test code it's currently generated as 0 <= l <= n. I had the right code at one point, but I've been messing with this for hours now, and I'm too lazy to code it back.
So I coded the first part of the function, for y <= 0.5.
I set y to 0.2 and n to 100. That means it should return a number between 0 and 99, with average 20.
But the results aren't between 0 and n; they are small floats. And the bigger n is, the smaller the float is.
This is the C test code. "x" is the "l" parameter.
int n = 100;
float y = 0.2;
float n_copy;
for(int i = 0 ; i < 20 ; i++)
{
    float x = (float) (rand()/(float)RAND_MAX); // 0 <= x <= 1
    x = x * n; // 0 <= x <= n
    float p1 = (1 - y) / (n*y);
    float p2 = (1 - ( x / n ));
    float exp = (1 - (2*y)) / y;
    p2 = pow(p2, exp);
    n_copy = p1 * p2;
    printf("%.5f\n", n_copy);
}
And here are some results (5 decimals truncated):
0.03354
0.00484
0.00003
0.00029
0.00020
0.00028
0.00263
0.01619
0.00032
0.00000
0.03598
0.03975
0.00704
0.00176
0.00001
0.01333
0.03396
0.02795
0.00005
0.00860
The article is:
http://www.scribd.com/doc/3097936/cAS-The-Cunning-Ant-System
pages 6 and 7.
or search "cAS: cunning ant system" on google.
So what am i doing wrong? i don't believe the author is wrong, because there are more than 5 papers describing this same function.
all my internets to whoever helps me. This is important to my work.
Thanks :)
You may misunderstand what is expected of you.
Given a (properly normalized) PDF, and wanting to throw a random distribution consistent with it, you form the Cumulative Distribution Function (CDF) by integrating the PDF, then invert the CDF, and use a uniform random variate as the argument of the inverted function.
A little more detail.
f_s(l) is the PDF, and has been normalized on [0,n).
Now you integrate it to form the CDF
g_s(l') = \int_0^{l'} dl f_s(l)
Note that this is a definite integral to an unspecified endpoint which I have called l'. The CDF is accordingly a function of l'. Assuming we have the normalization right, g_s(n) = 1.0. If this is not so we apply a simple coefficient to fix it.
Next invert the CDF and call the result g_s^{-1}(x). For this you'll probably want to choose a particular value of gamma.
Then throw uniform random numbers on [0,1), and use those as the argument, x, to g_s^{-1}. The result should lie in [0,n), and should be distributed according to f_s.
Like Justin said, you can use a computer algebra system for the math.
dmckee is actually correct, but I thought that I would elaborate more and try to explain away some of the confusion here. I could definitely fail. f_s(l), the function you have in your pretty formula above, is the probability density function. It tells you, for a given input l between 0 and n, how likely it is that l is the segment length. The sum (integral) over all values between 0 and n should be equal to 1.
The graph at the top of page 7 confuses this point. It plots l vs. f_s(l), but you have to watch out for the stray factors it puts on the side. You notice that the values on the bottom go from 0 to 1, but there is a factor of × n on the side, which means that the l values actually go from 0 to n. Also, on the y-axis there is a × 1/n, which means these values don't actually go up to about 3, they go up to 3/n.
So what do you do now? Well, you need to solve for the cumulative distribution function by integrating the probability density function over l, which actually turns out to be not too bad (I did it with the Wolfram Mathematica Online Integrator by using x for l and using only the equation for y <= .5). That, however, was an indefinite integral, and you are really integrating along x from 0 to l. If we set the resulting equation equal to some variable (z for instance), the goal now is to solve for l as a function of z. z here is a random number between 0 and 1. You can try using a symbolic solver for this part if you would like (I would). Then you have not only achieved your goal of being able to pick random values of l from this distribution, you have also achieved nirvana.
A little more work done
I'll help a little bit more. I tried doing what I described above for y <= .5, but the symbolic algebra system I was using wasn't able to do the inversion (some other system might be able to). However, then I decided to try using the equation for .5 < y <= 1. This turns out to be much easier. If I change l to x in f_s(l) I get
y / n / (1 - y) * (x / n)^((2 * y - 1) / (1 - y))
Integrating this over x from 0 to l I got (using Mathematica's Online Integrator):
(l / n)^(y / (1 - y))
It doesn't get much nicer than that with this sort of thing. If I set this equal to z and solve for l I get:
l = n * z^(1 / y - 1) for .5 < y <= 1
One quick check is for y = 1. In this case, we get l = n no matter what z is. So far so good. Now, you just generate z (a random number between 0 and 1) and you get an l that is distributed as you desired for .5 < y <= 1. But wait, looking at the graph on page 7 you notice that the probability distribution function is symmetric. That means that we can use the above result to find the value for 0 < y <= .5. We just change l -> n-l and y -> 1-y and get
n - l = n * z^(1 / (1 - y) - 1)
l = n * (1 - z^(1 / (1 - y) - 1)) for 0 < y <= .5
Anyway, that should solve your problem unless I made some error somewhere. Good luck.
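The two closed forms above can be checked numerically. This is a rough Python sketch (the question is in C, but the formulas carry over directly); the function name `sample_length`, the seed and the sample count are illustrative choices. The exponent 1 / (1 - y) - 1 is written as the equivalent y / (1 - y).

```python
import random

def sample_length(y, n, rng=random):
    """Draw l in [0, n] by inverse-transform sampling so that the mean of l is y * n."""
    z = rng.random()  # uniform on [0, 1)
    if y > 0.5:
        # l = n * z^(1/y - 1)  for .5 < y <= 1
        return n * z ** (1.0 / y - 1.0)
    # l = n * (1 - z^(y / (1 - y)))  for 0 < y <= .5
    return n * (1.0 - z ** (y / (1.0 - y)))

rng = random.Random(12345)  # fixed seed so the run is repeatable
n = 100
for y in (0.2, 0.8):
    samples = [sample_length(y, n, rng) for _ in range(50000)]
    print(y, sum(samples) / len(samples))  # should land close to y * n
```

The returned values are real numbers; if an integer segment length is needed, they would have to be rounded or truncated, which shifts the average slightly.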
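For l, y and n as described (with y <= 0.5), the term you call p2 lies in [0,1] and exp is non-negative, so pow(p2, exp) also lies in [0,1]. Multiplying by p1 = (1 - y) / (n*y), which is 0.04 for your test values, can only shrink it further, so I don't see how you'd ever get an output covering the range [0,n). What your code computes is the value of the density f_s(l), not a sample drawn from it.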