Linear Programming: How to implement with multiple constraints? - linear-programming

I’m trying to solve a linear programming model and need some help. I’m not a programming expert, but I can draw up the problem conceptually and am hoping for some help implementing it.
I’m looking into an asset allocation problem for an investment portfolio from a theoretical perspective, but for simplicity of this post I’m going to use generic terms.
I have a list of 500+ choices that all have an assigned cost and value add. My goal is to maximize the sum of the value add, given a constraint on how much I can spend. These 500 choices are divided into 5 categories and there are restrictions on how many choices I can have from each category.
Category 1 = 1
Category 2 = 1
Category 3 = 2 or 3
Category 4 = 1 or 2
Category 5 = 2
Category 3 + Category 4 = 4
I figure I’ll need a binary variable x attached to each choice, where 1 means I’m picking that choice and 0 means I’m not. In the end there should be 8 variables equal to 1 and the rest equal to 0, giving the maximum value add subject to the budget and category constraints.
Ultimately I’d like to be able to ask, for example, “what is the nth highest value?”, so that instead of the maximum value add I can get the second highest value add, and so on.
Is this possible and what software/language would be best to do it? Thanks for your help!

Just to simplify writing everything down, let's assume you had 15 assets, with value added v_1, v_2, ..., v_15 and costs c_1, c_2, ..., c_15. Let's assume assets 1, 2, and 3 are in category 1, assets 4, 5, and 6 are in category 2, assets 7, 8, and 9 are in category 3, assets 10, 11, and 12 are in category 4, and assets 13, 14, and 15 are in category 5. Finally, let's assume a budget B.
We would create binary variables x_1, x_2, ..., x_15 to indicate whether we bought each asset. Now, the objective function of our integer program is:
max v_1*x_1 + v_2*x_2 + ... + v_15*x_15
Our budget constraint is:
c_1*x_1 + c_2*x_2 + ... + c_15*x_15 <= B
Exactly one choice from category 1:
x_1 + x_2 + x_3 = 1
Exactly one choice from category 2:
x_4 + x_5 + x_6 = 1
Either 2 or 3 choices from category 3:
x_7 + x_8 + x_9 >= 2
x_7 + x_8 + x_9 <= 3
Either 1 or 2 choices from category 4:
x_10 + x_11 + x_12 >= 1
x_10 + x_11 + x_12 <= 2
Exactly 2 choices from category 5:
x_13 + x_14 + x_15 = 2
Exactly 4 choices from categories 3 and 4 combined:
x_7 + x_8 + x_9 + x_10 + x_11 + x_12 = 4
Finally, you would specify all variables to be binary.
Note that the only adjustment you would need to your problem is to change the variables in each of these constraints to be the variables associated with each of your five categories.
All that remains would be to implement the model. There are a myriad of linear programming packages in all major languages; check out this survey for details. Since Stack Overflow is not a software recommendation site and you haven't really given any details about your situation (e.g. free vs. non-free solvers or the programming language you're using), I will refrain from suggesting a particular package.
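For a small instance such as the 15-asset example, the model can be checked without any solver by enumerating every 0/1 assignment. The following is a minimal sketch in plain Python, not a substitute for an integer programming solver: the values, costs, and budget are made-up numbers, and names such as `feasible` and `ranked_solutions` are hypothetical. Enumeration grows as 2^n, so it is only practical for a handful of assets; with 500+ choices a real solver is needed.

```python
from itertools import product

# Hypothetical data for the 15-asset example (values, costs, budget are made up).
values = [5, 3, 4, 6, 2, 7, 4, 5, 3, 6, 2, 4, 5, 3, 6]
costs = [4, 2, 3, 5, 1, 6, 3, 4, 2, 5, 1, 3, 4, 2, 5]
budget = 25

# 0-based asset indices per category, matching the example above.
categories = {
    1: [0, 1, 2],
    2: [3, 4, 5],
    3: [6, 7, 8],
    4: [9, 10, 11],
    5: [12, 13, 14],
}
# Allowed (min, max) number of picks per category.
limits = {1: (1, 1), 2: (1, 1), 3: (2, 3), 4: (1, 2), 5: (2, 2)}


def feasible(x):
    """Check the budget, per-category, and combined category 3+4 constraints."""
    if sum(c * xi for c, xi in zip(costs, x)) > budget:
        return False
    counts = {k: sum(x[i] for i in idx) for k, idx in categories.items()}
    if any(not lo <= counts[k] <= hi for k, (lo, hi) in limits.items()):
        return False
    return counts[3] + counts[4] == 4


def ranked_solutions():
    """Return all feasible (total value, x) pairs, best first."""
    sols = [
        (sum(v * xi for v, xi in zip(values, x)), x)
        for x in product((0, 1), repeat=len(values))
        if feasible(x)
    ]
    return sorted(sols, reverse=True)


solutions = ranked_solutions()
best_value, best_x = solutions[0]
```

Because the list is sorted, `solutions[n - 1]` is the nth best feasible selection, which is one simple way to read the "nth highest value" part of the question on a small instance.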

Related

Planning subsequent orders

Let's say I have 5 orders and 3 drivers. I want to maximize the amount of miles they have on the road. Each driver has times that they're available to drive and orders have times that they're able to be picked up at.
Ideally, I would like to plan subsequent orders in one go rather than chaining multiple models. My current approach is to write multiple models, where the output of one model becomes the input of the next. How can this be written as a single LP model?
O = {Order1, Order2, Order3, Order4, Order5}
D = {Driver1, Driver2, Driver3}
O_avail = {2 pm, 3pm, 230 pm, 8pm, 9pm, 12 am}
D_avail = {2pm, 3pm, 230pm}
Time_to_depot = {7 hours,5 hours,2 hours,5 hours,3hours, 4hours}
constraints
d_avail <= o_avail
obj function
max sum D_i*time_to_depot_i
I laid it out in such a way that driver 1 takes order 1, order 5 and order6. Driver 2 takes order 2 and order 4.

Trajectory Analysis (SAS): Incorrect number of start values

I am attempting a trajectory analysis in SAS (proc traj).
Following instructions found online, I first begin by testing two quadratic models, then three, then four (i.e., order 2 2, order 2 2 2, order 2 2 2 2, order 2 2 2 2 2).
I determined that a three-group linear model is the best fit (order 1 1 1;).
I then wanted to add time-stable covariates with the risk command. Following instructions found online, I did this by adding the start parameters provided in the log.
At this point, I receive a notice: "Incorrect number of start values. There should be 10 start values based on the model specifications."
I understand that it's possible to delete some of the 12 parameter estimates provided, but how do I select which ones to remove?
Thank you.
Code:
proc traj data=followupyes outplot=op outstat=os out=of outest=oe itdetail;
id youthid;
title3 'linear 3-gp model ';
var pronoun_allpar1-pronoun_allpar3;
indep time1-time3;
model logit;
ngroups 3;
order 1 1 1;
weight wgt_00;
start 0.031547 0.499724 1.969017 0.859566 -1.236747 0.007471
0.771878 0.495458 0.000000 0.000000 0.000000 0.000000;
risk P00_45_1;
run;
%trajplot (OP, OS, "linear 3-gp model ", "Traj of Pronoun Support", "Pron Support", "Time");
Because you are estimating a model with 3 linear trajectories, you will need 2 start values for each of your 3 groups.
See here for more info: https://www.andrew.cmu.edu/user/bjones/example.htm

Filtering on annotations with max date in Django

I have 3 models in Django-project:
class Hardware(models.Model):
    inventory_number = models.IntegerField(unique=True,)

class Subdivision(models.Model):
    name = models.CharField(max_length=50,)

class Relocation(models.Model):
    hardware = models.ForeignKey('Hardware',)
    subdivision = models.ForeignKey('Subdivision',)
    # Pass the callable (no parentheses) so the default is computed per new row, not once at import time
    relocation_date = models.DateField(verbose_name='Relocation Date', default=date.today)
Example values in the table 'Hardware_Relocation':
id  hardware  subdivision  relocation_date
1   1         1            01.01.2009
2   1         2            01.01.2010
3   1         1            01.01.2011
4   1         3            01.01.2012
5   1         3            01.01.2013
6   1         3            01.01.2014
7   1         3            01.01.2015  # Hardware 1 is now located in subdivision 3 because this relocation_date is the latest
I would like to write a filter that finds the hardware currently located in a given subdivision.
This is the filter I tried:
subdivision = Subdivision.objects.get(pk=1)
hardware_list = Hardware.objects.annotate(relocation__relocation_date=Max('relocation__relocation_date')).filter(relocation__subdivision = subdivision)
Now hardware_list contains hardware 1, which is wrong (hardware 1 is currently in subdivision 3).
hardware_list should be empty in this example.
The following code also gives the wrong result (hardware_list contains hardware 1 for subdivision 1).
limit_date = datetime.datetime.now()
q1 = Hardware.objects.filter(relocation__subdivision=subdivision, relocation__relocation_date__lte=limit_date)
q2 = q1.exclude(~Q(relocation__relocation_date__gt=F('relocation__relocation_date')), ~Q(relocation__subdivision=subdivision))
hardware_list = q2.distinct()
Would it be better to use raw SQL?
This might work...
from django.db.models import F, Q

# Parentheses let the chained calls span several lines
hardware_list = (
    Hardware.objects
    .filter(relocation__subdivision=target_subdivision, relocation__relocation_date__lte=limit_date)
    .exclude(~Q(relocation__subdivision=target_subdivision), relocation__relocation_date__gt=F('relocation__relocation_date'))
    .distinct()
)
The idea is: give me all hardware that was relocated to the target subdivision on or before the limit date and that has NOT been relocated to another subdivision after that.
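To make the intended result concrete, here is a minimal plain-Python sketch of that rule applied to the example table. It is not Django ORM code and does not show how to express the query in the ORM; the function name `hardware_in_subdivision` and the tuple layout are assumptions made for illustration only.

```python
from datetime import date

# Rows mirror the example table: (hardware_id, subdivision_id, relocation_date).
relocations = [
    (1, 1, date(2009, 1, 1)),
    (1, 2, date(2010, 1, 1)),
    (1, 1, date(2011, 1, 1)),
    (1, 3, date(2012, 1, 1)),
    (1, 3, date(2013, 1, 1)),
    (1, 3, date(2014, 1, 1)),
    (1, 3, date(2015, 1, 1)),
]


def hardware_in_subdivision(rows, subdivision_id, limit_date):
    """Return ids of hardware whose latest relocation on or before
    limit_date points at subdivision_id."""
    latest = {}  # hardware_id -> (relocation_date, subdivision_id)
    for hardware_id, sub_id, moved_on in rows:
        if moved_on > limit_date:
            continue  # ignore relocations after the limit date
        if hardware_id not in latest or moved_on > latest[hardware_id][0]:
            latest[hardware_id] = (moved_on, sub_id)
    return sorted(h for h, (_, sub) in latest.items() if sub == subdivision_id)
```

With a limit date in mid-2015, subdivision 1 yields an empty list and subdivision 3 yields hardware 1, which is the behaviour the question asks for.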

Calculating the distance between characters

Problem: I have a large number of scanned documents that are linked to the wrong records in a database. Each image has the correct ID on it somewhere that says where it belongs in the db.
I.E. A DB row could be:
| user_id | img_id | img_loc |
| 1 | 1 | /img.jpg|
img.jpg would have the user_id (1) on the image somewhere.
Method/Solution: Loop through the database. Pull the image text into a variable with OCR and check whether user_id is found anywhere in it. If not, flag the record/image in a log; if so, do nothing and move on.
My example is simplified; in the real world I have a guarantee that user_id wouldn't accidentally show up on the wrong form (it has a specific format with its own significance).
Right now it is working. However, it is incredibly strict. If you've worked with OCR, you understand how fickle it can be: sometimes a 7 is read as a 1, or a 9 as a 7, and so on. The result is a large number of false positives, especially among low-quality scans.
I've addressed some of the image quality issues with processing on my side (increasing the image size, adjusting the black/white threshold) and had satisfying results. I'd like the program to also recognize, for example, that "81*7*23103" is not very far from "81*9*23103".
The only way I know to do that is to look at strings at least as long as the ID I'm looking for, calculate the distance between each pair of characters, average the distances, and set a limit on what counts as a good average.
Some examples:
Ex 1
81723103 - Looking for this
81923103 - Found this
--------
00200000 - distances between characters
0 + 0 + 2 + 0 + 0 + 0 + 0 + 0 = 2
2/8 = .25 (pretty good match. 0 = perfect)
Ex 2
81723103 - Looking
81158988 - Found
--------
00635885 - distances
0 + 0 + 6 + 3 + 5 + 8 + 8 + 5 = 35
35/8 = 4.375 (Not a very good match. 9 = worst)
This way I can tell it "Flag the bottom 30% only" and dump anything with an average distance > 6.
I figure I'm reinventing the wheel and wanted to share this for feedback. I see a huge increase in run time and a performance hit doing all these string operations over what I'm currently doing.
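The averaged per-digit distance from the two examples can be sketched in Python as follows. This is only an illustration of the scheme described above, assuming the ID consists of ASCII digits; the function names `digit_distance` and `best_match` are hypothetical, and the sliding window over the OCR text is one possible way to find candidate substrings.

```python
def digit_distance(expected, found):
    """Average absolute difference between digits at the same position."""
    if len(expected) != len(found):
        raise ValueError("strings must have the same length")
    total = sum(abs(int(a) - int(b)) for a, b in zip(expected, found))
    return total / len(expected)


def is_ascii_digits(text):
    """True if text is non-empty and contains only the characters 0-9."""
    return bool(text) and all(ch in "0123456789" for ch in text)


def best_match(expected, ocr_text):
    """Slide a window of len(expected) over ocr_text and return the lowest
    average distance among all-digit windows, or None if there is none."""
    n = len(expected)
    scores = [
        digit_distance(expected, ocr_text[i:i + n])
        for i in range(len(ocr_text) - n + 1)
        if is_ascii_digits(ocr_text[i:i + n])
    ]
    return min(scores) if scores else None
```

Calling `digit_distance("81723103", "81923103")` reproduces the 0.25 from Ex 1, and `digit_distance("81723103", "81158988")` reproduces the 4.375 from Ex 2.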

How to count rating?

My question is more mathematical. There is a post on the site that users can like or dislike. Below the post it shows, for example, -5 dislikes and +23 likes. Based on these values I want to compute a rating in the range 0 to 10 (or -10 to 10). How do I do this correctly?
This may not answer your question as you need a rating between [-10,10] but this blog post describes the best way to give scores to items where there are positive and negative ratings (in your case, likes and dislikes).
A simple method like
(Positive ratings) - (Negative ratings), or
(Positive ratings) / (Total ratings)
will not give optimal results.
Instead, the author of the blog post uses a binomial proportion confidence interval.
The relevant part of the blog post is copied below:
CORRECT SOLUTION: Score = Lower bound of Wilson score confidence interval for a Bernoulli parameter
Say what: We need to balance the proportion of positive ratings with the uncertainty of a small number of observations. Fortunately, the math for this was worked out in 1927 by Edwin B. Wilson. What we want to ask is: Given the ratings I have, there is a 95% chance that the "real" fraction of positive ratings is at least what? Wilson gives the answer. Considering only positive and negative ratings (i.e. not a 5-star scale), the lower bound on the proportion of positive ratings is given by:
(p̂ + z_{α/2}²/(2n) ± z_{α/2} * sqrt((p̂(1 - p̂) + z_{α/2}²/(4n)) / n)) / (1 + z_{α/2}²/n)
(source: evanmiller.org)
(Use minus where it says plus/minus to calculate the lower bound.) Here p̂ is the observed fraction of positive ratings, z_{α/2} is the (1-α/2) quantile of the standard normal distribution, and n is the total number of ratings.
Here it is, implemented in Ruby, again from the blog post.
require 'statistics2'

def ci_lower_bound(pos, n, confidence)
  if n == 0
    return 0
  end
  z = Statistics2.pnormaldist(1-(1-confidence)/2)
  phat = 1.0*pos/n
  (phat + z*z/(2*n) - z * Math.sqrt((phat*(1-phat)+z*z/(4*n))/n))/(1+z*z/n)
end
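For readers not using Ruby, the same lower bound can be sketched in Python with only the standard library (`statistics.NormalDist` requires Python 3.8+). The helper `rating_0_to_10`, which simply multiplies the bound by 10 to land on the 0 to 10 scale asked about, is an assumption of this sketch and not part of the blog post.

```python
from math import sqrt
from statistics import NormalDist


def ci_lower_bound(pos, n, confidence=0.95):
    """Lower bound of the Wilson score interval for pos positives out of n."""
    if n == 0:
        return 0.0
    # Quantile of the standard normal distribution, as in the Ruby version.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    phat = pos / n
    return (phat + z * z / (2 * n)
            - z * sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)


def rating_0_to_10(likes, dislikes, confidence=0.95):
    """Hypothetical mapping of the bound onto a 0-10 scale (an assumption)."""
    return round(10 * ci_lower_bound(likes, likes + dislikes, confidence), 1)
```

A post with a single like scores much lower than one with 100 likes and no dislikes, which is the behaviour the simple ratio lacks.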
This is an extension to Shepherd's answer.
total_votes = num_likes + num_dislikes;
rating = round(10*num_likes/total_votes);
It depends on the number of visitors to your app. Say you expect about 100 users to rate your app. When the first user clicks dislike, the approach above gives a rating of 0. That is not logically right, since the sample is far too small to justify a zero. The same goes for a single like: the app gets a rating of 10.
A better approach is to add a constant to the numerator and denominator. If the app has about 100 visitors, it's safe to assume that until we get 10 ups/downs we should not go to extremes (neither a 0 nor a 10 rating). So just add 5 to both the likes and the dislikes.
num_likes = num_likes + 5;
num_dislikes = num_dislikes + 5;
total_votes = num_likes + num_dislikes;
rating = round(10*(num_likes)/(total_votes));
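As a quick check of how the added constant behaves, here is the same calculation as a small Python function. The name `smoothed_rating` and the `prior` parameter are hypothetical; `prior=5` matches the constant used above. Note that `prior=0` with no votes at all would divide by zero.

```python
def smoothed_rating(num_likes, num_dislikes, prior=5):
    """Rating on a 0-10 scale after adding `prior` pseudo-votes to each side."""
    likes = num_likes + prior
    dislikes = num_dislikes + prior
    return round(10 * likes / (likes + dislikes))
```

With this smoothing, a single dislike or a single like both give a rating of 5 rather than 0 or 10, and the rating only approaches the extremes as real votes accumulate.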
It sounds like what you want is basically a percentage liked/disliked. I would do 0 to 10, rather than -10 to 10, because that could be confusing. So on a 0 to 10 scale, 0 would be "all dislikes" and 10 would be "all liked"
total_votes = num_likes + num_dislikes;
rating = round(10*num_likes/total_votes);
And that's basically it.