AIMMS artificial variable constraint - linear-programming

I have difficulty formulating the constraints correctly. Dumbed down version of the problem:
There are 12 time units, 3 products, demand d_{i,t} for the product i at time $t$ is known in advance and the resources r_{i,t} (all 8, product i uses the non-i resources) necessary for product i at time t are known as well. We need to minimize holding costs h_i by deciding how much of product i we need to produce at time t, this is called x_{i,t}. Starting inventory for every product is 6. To help I introduce stock level s_{i,t}. This equates to the following formulation:
I got this working using the Excel solver but I need to do this in AIMMS. The Stock variable s is giving me issues, I can't get it to work using an if statement to condition on if t=1 nor would I know how to split it in two constraints, given the first iteration of the second constraint would need to reference the first constraint.

You can specify index domain in your constraint attributes as follows:
(t,i) | t > 1
if t > 1 statement should work if the set of time instances is a subset of integers. If not - you should use ord(t) > 1, i.e.
if ord(t) > 1
then
Your_Constraint
endif

Related

Check reduced cost of new column in column generation

I'm working on a decomposition model using column generation. When generate a new column, I'd like to check it's reduced cost using CPLEX function. I did read a post here, however the sign of the reduced cost obtain by this method is always negative (in my case), since the new columns are stated as nonbasis (according to the accepted answer).
Edit 1: So the purpose of checking these values is that I have trouble running my code. My master problem is to maximize an objective function. The pricing problems keep generating new column with positive reduce cost, however, after added new columns to the master and reoptimize, the objective function didn't improve at all, and this cycle keep going on for 30 minutes. So I decide to look into each detail to see what is the problem.

Linear Programming - Re-setting a variable based on it's cumulative count

Detailed business problem:
I'm trying to solve a production scheduling business problem as below:
I have two plants producing FG A and B respectively.
Both the products consume the same Raw Material x
I need to create a 30 day production schedule looking at the Raw Material availability.
FG A and B can be produced if there is sufficient raw material available on the day.
After every 6 days of production the plant has to undergo maintenance and the production on that day will be zero.
Objective is to maximize the margin looking at the day level Raw material available and adhere to the production constraint (i.e. shutdown after every 6th day)
I need to build a linear programming to address the below problem:
Variable y: (binary)
variable z: cumulative of y
When z > 6 then y = 0. I also need to reset the cumulation of z after this point.
Desired output:
How can I build the statement to MILP constraint. Are there any techniques for solving this problem. Thank you.
I think you can model your maintenance differently. Just forbid any sequences of 7 ones for y. I.e.
y[t-6]+y[t-5]+y[t-4]+y[t-3]+y[t-2]+y[t-1]+y[t] <= 6 for t=1,..,T
This is easier than using your accumulator. Note that the beginning needs some attention: you can use historic data for this. I.e., at t=1, the values for t=0,-1,-2,.. are known.
Your accumulator approach is not inherently wrong. We often use it to model inventory. An inventory capacity is a restriction on how large the accumulated inventory can be.

Creating train, test and cross validation datasets in sklearn (python 2.7) with a grouping constraints?

While creating a train,test & cross validation sample in Python, I see the default method as -:
1. Reading the dataset , after skipping headers
2. Creating the train, test and Cross validation sample
import csv
with open('C:/Users/Train/Trainl.csv', 'r') as f1:
next(f1)
reader = csv.reader(f1, delimiter=',')
input_set = []
for row in reader:
input_set.append(row)
import numpy as np
from numpy import genfromtxt
from sklearn import cross_validation
train, intermediate_set = cross_validation.train_test_split(input_set, train_size=0.6, test_size=0.4)
cv, test = cross_validation.train_test_split(intermediate_set, train_size=0.5, test_size=0.5)
My problem though is that I have a field say "A" in the csv file that I read into the numpy array, and all sampling should respect this field. That is, all entries with similar values for "A" should go in one sample .
Line #|A | B | C | D
1 |1 |
2 |1 |
3 |1 |
4 |1 |
5 |2 |
6 |2 |
7 |2 |
Required : line 1,2,3,4 should go in "one" sample and 5,6,7 should go in the "one" sample.
Value of column A is a unique id, corresponding to one single entity(could be seen as a cross section data points on one SINGLE user, so it MUST go in one unique sample of train, test, or cv), and there are many such entities, so a grouping by entity id is required.
B, C,D columns may have any values, but a grouping preservation is not required on them. (Bonus: can I group the sampling for multiple fields?)
What I tried :
A. Finding all unique values of A's - denoting this as my sample I now distribute the sample among-st train, intermediate & cv & test -> then putting the rest of the rows for this value of "A" in each of these files.
that is if train had entry for "3" , test for"2" and cv for "1" then all rows with value of A as 3 go in train, all with 2 go in test and all with 1 go in cv.
Ofcourse this approach is not scalable.
And I doubt, it may have introduced bias into the datasets, since the number of 1's in column A , no of 2's etc. is not equal, meaning this approach will not work !
B. I also tried numpy.random.shuffle, or numpy.random.permutation as per the thread here - Numpy: How to split/partition a dataset (array) into training and test datasets for, e.g., cross validation? , but it did not meet my requirement.
C. A third option of-course is writing a custom function that does this grouping, and then balances the training, test and cv data-sets based on number of data points in each group. But just wondering, if there's already an efficient way to implement this ?
Note my data set is huge, so ideally I would like to have a deterministic way to partition my datasets, without having multiple eye-ball-scans to be sure that the partition is correct.
EDIT Part 2:
Since I did not find any that fit my sampling criteria - I actually wrote a module to sample with grouping constraints. This is the github code to it. The code was not written for very large data in mind, so it's not very efficient. Should you FORK this code - please point out how can I improve the run-time.
https://github.com/ekta1007/Sampling-techniques/blob/master/sample_expedia.py
By forcing such constraints you will introduce bias either way, to you procedure. So approach based on the partition of the "users" data and then collecting their respective "measurements" does not seem bad. And it will scale just fine, this is O(n) method, the only reason for not scaling up is bad implementation, not bad method.
The reason for no such functionality in existing methods (like sklearn library) is because it looks highly artificial, and counter machine learning models idea. If these are somehow one entities then they should not be treated as separate data points. If you need this separate representation then requiring such division, that the particular entity cannot be partially in test test and partially in training will for sure bias the whole model.
To sum up - you should really deeply analyze whether your approach is reasonable from the machine learning point of view. If you are sure about it, I think the only possibility is to write the segmentation by yourself, as even though using many ML libraries in the past, I've never seen such functionality.
In fact I am not sure, if the problem of creating segmentation of the set containing N numbers (sizes of entities) into K (=3) subsets of given sums proportions with uniform distribution when treated as a random process is not NP problem on itself. If you cannot guarantee uniform distribution, then your datasets cannot be used as a statistically correct method of training/testing/validating your model. Even if it has a reasonable polynomial solution, it can still scale up badly (much worse then linear methods). This doubt applies if your constraints are "strict", if they are "weak" you can always do "generate and reject" approach, which should have amortized linear complexity.
I was also facing similar kind of issue, though my coding is not too good I came up with the solution as given below:
Created a new data frame that only contains the Unique Id of the df and removed duplicates.
new = df[["Unique_Id "]].copy()
New_DF = new.drop_duplicates()
Created training and test set on the basis of New_DF
train, test = train_test_split(New_DF, test_size=0.2)
And then merged those training and test set with original df.
df_Test = pd.merge(df, test, how='inner', on = “Unique_Id”)
df_Train = pd.merge(df, train, how='inner', on = “Unique_Id”)
Similarly, we can create sample for the validation part too.
Cheers.

Optimization run assistance

I am running the optimisation of two sets of data against each other and am after some assistance as to looking up settings of the run based on the calculated results. I'll explain....
I run 2 data lines against each other (think graph lines) - Line A and Line B. These lines have crossing points - upward and downward based on the direction of each line.e.g. Line A is going up and Line B is going down is an 'Upwards cross' and Line A going down and Line B going up is a 'Downward cross'.The program calculates financial analysis.
I analyze the crossing points and gain a resultant 'Rank' from the analysis based on a set of rules. The rank is a single integer.
Line A has a number of settings for the optimisation run e.g. Window 1 from a value of 10 to 20 and window 2 at a value of 30 to 40. Line B also has settings.
When I run the optimisation I iterate through the parameters available for each line and calculate the rank. The result of the optimisation run is a list of the ranks which is the size of the number of permutations avaliable.
So my question is this:
What is the best way to look up the line settings from the calculated rank using a position (index) in the rank list. The optimisation settings used to create the run will be stored for that rank run and can be used for the look-up.
I also will be adding additional parameters in the future to the system for the line so I want the program to take into account additional future line settings without affecting any rank files created previous to adding the new parameter.
In addition to that I want to be able to find out an index based on a particular setting included in the optimisation run (the reverse look-up of the previous method).
I want to avoid versioning for backward compatability if at all possible so that the lookup algorithm will be self-sufficient.
Is a hash table suitable for this purpose or do you have any implementation techniques that would fit better? Do you have any examples of this type of operation in action in C++?
Thanks,
Chris.
If I understand correctly, you have a bunch of associated data (settings + rank), on which you would like to be able to perform lookups with different key types. If so, then Boost.MultiIndex sounds like what you're looking for.

Most efficient way to process complex histogram data?

I'm currently implementing a histogram that will show a very large scale data using Qt and I have some doubts about which data structure(s) I should be using for my problem. I will be displaying the amount of queries received from users of an application and the way I should display is as follows -in a single application that will show different histograms upon clicking different "show me this data etc." buttons-
1) Display the histogram of total queries per every month -4 months of data here, I
kept four variables and incremented them as I caught queries belonging to those months
in the CSV file-
2) Display the histogram of total queries per every single day in a selected month -I was thinking of using 4 QVectors to represent the months for this one, incrementing every element of the vector (day), as I come by that specific day -e.g. the vector represents the month of August and whenever I come across a data with 2011-08-XY , I will increment the (XY + 1)th element of that vector by 1- my second alternative is to use 4 QLinkedList's for the sake of better complexity but I'm not sure if the ways I've come up with are efficient enough and I'm willing to listen to any other idea.
3) Here's where things get a bit complicated. Display the histogram of total queries per every hour in a selected day and month. The data represented is multiplied in a vast manner and I don't know which data structure -or combination of structures- I should use to implement this one. A list of lists perhaps?
Any ideas on my problems at 2) and 3) would be helpful, Thanks in advance.
Actually, it shouldn't be too unmanageable to always do queries per hour. Assuming that the number of queries per hour is never greater than the maximum int value, that's only 24 ints per day = 32 bits or 64 depending on your machine. Assuming 32 bits, then you could get up to 28 years worth of data per MB.
There's no need to transfer the month/year - your program can work that out. Just assign hour 0 to the earliest point in your data, which you keep as a constant, then work out the date based on hours passed since then.
This avoids having to have a list of lists or anything fancy - just use an array where each address contains the number of hours since hour 0, and the number of queries for that hour.
Why don't you simply use a classical database?
When you start asking these kind of question I think it is a good time to consider a more robust structure.There are multiple data structures implemented inside any DB, optimized either for different access type. You should considerate at least lookup, insertion, deletion, range queries. There is no structure which is better than the others in all costs, so there is always a trade-off.
Qt has some database classes you can use. I never used the Qt SQL library, but I think you should give it a shot. Fortunately, there is a Qt SQL programming guide at the end of the page linked.