Convert IF-ELSE statement in Linear Programming (OR-Tools)

I am trying to create an Optimization for Gas Storage using Linear Programming (OR Tools).
I need to write a case like this:
if current_balance > 70% of Total Volume:
    set a limit for gas injection as 10
else:
    set a limit for gas injection as 30
Current balance is the total amount of gas that is available in the gas storage today.
I tried looking at Big M notation.
Is there another way besides Big M? And if I have to use Big M, how can I apply it to the above problem?
Edited:
How can I build equations for the following case:
if current_balance > 70% of Total Volume and current_balance < 80% of Total Volume:
    set a limit for gas injection as 10
else if current_balance > 80% of Total Volume:
    set a limit for gas injection as 30

I don't think there is another way but Big M, although Big M gets much better when you put some thought into it and choose M wisely, not too big (as small as possible). If the current balance is never allowed to exceed the total volume, the following formulation is the tightest one for your case. Here exceed is a boolean variable indicating whether you are exceeding 70% of the total volume.
current_balance - (0.3 * TotalVolume) * exceed <= 0.7 * TotalVolume
gas_injection <= 30 - 20 * exceed
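For concreteness, here is a minimal sketch of this formulation using OR-Tools' linear solver wrapper in Python. The total volume of 100, the variable bounds, and the maximize-injection objective are illustrative assumptions, not part of the original question:
from ortools.linear_solver import pywraplp

TOTAL_VOLUME = 100.0  # assumed total storage volume (illustrative)

solver = pywraplp.Solver.CreateSolver("CBC")
current_balance = solver.NumVar(0.0, TOTAL_VOLUME, "current_balance")
gas_injection = solver.NumVar(0.0, 30.0, "gas_injection")
exceed = solver.BoolVar("exceed")  # 1 if balance exceeds 70% of total volume

# exceed = 0 forces current_balance <= 70% of total volume;
# exceed = 1 relaxes that to current_balance <= total volume.
solver.Add(current_balance - 0.3 * TOTAL_VOLUME * exceed <= 0.7 * TOTAL_VOLUME)
# exceed = 0 gives gas_injection <= 30; exceed = 1 gives gas_injection <= 10.
solver.Add(gas_injection <= 30 - 20 * exceed)

solver.Maximize(gas_injection)  # placeholder objective for the sketch
solver.Solve()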

Related

Using Logistic Regression For Timeseries Data in Amazon SageMaker

For a project I am working on, which uses annual financial report data (of multiple categories) from companies that have either been successful or gone bust/into liquidation, I previously created a fairly well-performing model on AWS SageMaker using a linear classification algorithm (specifically, the AWS stock algorithm for logistic regression/classification problems: the 'Linear Learner' algorithm).
This model just produces a simple "company is in good health" or "company looks like it will go bust" binary prediction, based on one set of annual data fed in; e.g.
query input:
{"data": [{
    "Gross Revenue": -4000,
    "Balance Sheet": 10000,
    "Creditors": 4000,
    "Debts": 1000000
}]}
inference output: "in good health" / "in bad health"
I trained this model by ignoring which year each company's values were from and piling in all of the annual financial report data (i.e. one year's financial data for one company = one input line) for the training, along with the label of "good" or "bad" - a good company is one which has existed for a while but hasn't gone bust; a bad company is one which was found to have eventually gone bust; e.g.:
label  Gross Revenue  Balance Sheet  Creditors  Debts
good   10000          20000          0          0
bad    0              5              100        10000
bad    20000          0              4          100000000
I hence used these multiple features (gross revenue, balance sheet...) along with the label (good/bad) in my training input, to create my first model.
I would like to use the same features as before as input (gross revenue, balance sheet, ...) but over multiple years; e.g. take the values from 2020 and 2019 and use these (along with the eventual company status of "good" or "bad") as the singular input for my new model. However, I'm unsure of the following:
is this an inappropriate use of logistic regression machine learning? i.e. is there a more suitable algorithm I should consider?
is it fine, or terribly wrong, to try and just use the same technique as before, but combine the data for both years into one input line like:
label  Gross Revenue(2019)  Balance Sheet(2019)  Creditors(2019)  Debts(2019)  Gross Revenue(2020)  Balance Sheet(2020)  Creditors(2020)  Debts(2020)
good   10000                20000                0                0            30000                10000                40               500
bad    100                  50                   200              50000        100                  5                    100              10000
bad    5000                 0                    2000             800000       2000                 0                    4                100000000
I would personally expect that a company which has gotten worse over time (i.e. its finances are worse in 2020 than in 2019) should be more likely to be found "bad"/likely to go bust. So I would hope that, if I feed in data like in the above example (i.e. earlier years' data comes before later years' data on an input line), my training job ends up creating a model which gives greater weighting to the earlier years' data when making predictions.
Any advice or tips would be greatly appreciated - I'm pretty new to machine learning and would like to learn more
UPDATE:
Using Long Short-Term Memory recurrent neural networks (LSTM RNNs) is one potential route I think I could try, but this seems to commonly be used with multivariate data over many dates; my data only has 2 or 3 dates' worth of multivariate data per company. I would want to train using the data I have for all the companies, over the few dates' worth of data there are.
I once developed a so-called genetic time series model in R. I used a genetic algorithm which sorted out the best solutions from multivariate data, which were fitted on a VAR in differences or a VECM. Your data seems more macroeconomic or financial than user-centric, and VAR or VECM seems appropriate. (It is surely possible to treat time-series data in other ways so that LSTM or other approaches can be used, but these are very common.) However, I do not know if VAR in differences or VECM works with binary classification labels. Perhaps if you calculated a metric outcome, which you later encode to a categorical label (or label it first as categorical), then VAR or VECM may also be appropriate.
However, you may add all yearly data points into one data point per firm to forecast its survival, but you would lose a lot of insight. If you are interested in time series ML, which works a little differently than for neural networks or elastic net (which could also be used with time series), let me know and we can work something out. Or I'll paste you some sources.
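As a minimal sketch of that second option, collapsing the per-year rows for each company into one wide row per firm (pandas; the column names and values are hypothetical):
import pandas as pd

# Toy per-year rows; column names and values are made up for illustration.
df = pd.DataFrame({
    "company": ["A", "A", "B", "B"],
    "year": [2019, 2020, 2019, 2020],
    "gross_revenue": [10000, 30000, 100, 100],
    "debts": [0, 500, 50000, 10000],
})

# One wide row per firm, with feature_year columns.
wide = df.pivot(index="company", columns="year",
                values=["gross_revenue", "debts"])
wide.columns = [f"{feature}_{year}" for feature, year in wide.columns]
print(wide)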
Summary:
1.) It is possible to use LSTM or elastic net (time points may be dummies or treated as a cross-sectional panel), or you can use VAR in differences or VECM with a slightly different outcome variable.
2.) It is possible, but you will lose information over time.
All the best,
Patrick

Saving from nonlinearity in GAMS

I am trying to overcome a machine allocation problem with a time horizon of 5 days. The production plan is hard to catch up with, so my objective is to minimize the total machine working time spent. Machines use molds to produce, and there are molds for each type of product. If a product is produced at the end of a day and the same mold is used again the next day, the total number of setups needed for that machine should be decreased by one. For this reason,
sets
i: mold type
j: machines
k: days
parameters
x(i,k): production needed with mold i on day k
y(i,j): 1 if mold i is compatible with machine j
decision variables
m(i,j,k): 1 if mold i is processed on machine j on day k, 0 otherwise
b(j,k): setup number of machine j on day k
The setup number for day 1, b(j,'1'), is simply equal to sum(i, m(i,j,'1')).
For computing the other days' setup numbers I tried the following, but it makes the problem nonlinear, and it takes months to solve.
b(j,'2')=e=sum(i,m(i,j,'2')) - sum(i,m(i,j,'2')*m(i,j,'1'))
This way, if mold i is produced on both days, no setup is made on the second day. In order to restrain multiple setup reductions I added: sum(i,m(i,j,'2')*m(i,j,'1')) =l= 1
So, how can I decrease the setup number for a machine if it used a mold the day before, without making the problem nonlinear?
It is possible to linearize m(i,j,'2')*m(i,j,'1'):
Both(i,j) <= m(i,j,'2')
Both(i,j) <= m(i,j,'1')
Both(i,j) >= m(i,j,'2')+m(i,j,'1')-1
Both(i,j) is a binary variable
This transformation is done automatically by some solvers.
Note that there are alternative ways to model the start of a run, and often there are things to exploit (depending on the details).
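As an illustration only (OR-Tools pywraplp in Python; the variable names are made up, and in the real model Both would be indexed over (i,j)), the three constraints above look like this in code:
from ortools.linear_solver import pywraplp

solver = pywraplp.Solver.CreateSolver("CBC")

m_day1 = solver.BoolVar("m_day1")  # mold used on the machine on day 1
m_day2 = solver.BoolVar("m_day2")  # mold used on the machine on day 2
both = solver.BoolVar("both")      # 1 iff the mold is used on both days

# Standard linearization of the binary product both = m_day2 * m_day1:
solver.Add(both <= m_day1)
solver.Add(both <= m_day2)
solver.Add(both >= m_day1 + m_day2 - 1)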

Why is the value of "Sum CPUCreditBalance" so high?

I have 3 EC2 instances which were created by Elastic Beanstalk. Their current CPU credit balances are as follows:
And this is the monitoring page in Elastic Beanstalk:
Why is "Sum CPUCreditBalance" equal to 1.8K?
As you can see from the first picture, the CPU credit balances of the 3 EC2 instances are all below 120. 120 * 3 = 360 is far smaller than 1.8K = 1800.
How is 1.8K calculated?
Here are the options I used when creating Sum CPUCreditBalance:
It is the sum of all data points (CPU Credit Balance) in the graph.
Roughly calculating data points: 11x20 + 7x50 + 110x11 = 1780
SUM() isn't a meaningful aggregation for a sampled statistic like CPU Credit Balance. You're adding up all the values from the samples recorded in the time range, which provides no useful information for this type of measurement.
SUM() only makes sense when the metric itself is a raw count of things per sampling period, such as the number of HTTP requests or errors.
Sum -- All values submitted for the matching metric added together. This statistic can be useful for determining the total volume of a metric.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_concepts.html#Statistic
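For example, a minimal sketch with boto3 (the instance ID is a placeholder) that requests the Average statistic, which is the meaningful one for this metric, rather than Sum:
import datetime
import boto3

cloudwatch = boto3.client("cloudwatch")
end = datetime.datetime.utcnow()

response = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUCreditBalance",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    StartTime=end - datetime.timedelta(hours=3),
    EndTime=end,
    Period=300,
    Statistics=["Average"],  # not "Sum"
)
for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])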

If Function in GAMS

I've been busy with a model, but I'm uncomfortable with the result because I think GAMS violates a constraint. What I want to tell GAMS is:
"check demand first -> then check current stocks -> IF there is enough stock, sell from current stocks -> IF there is not enough stock, first buy (produce), then sell."
I think in this model GAMS does not obey any demand (sell) or any minimum values, and sells everything without buying anything.
The model is below:
Sets
i items /s,p,b/
t time in quarters /1,2,3/;
Parameters
price(i) selling price per unit i per quarter in euros /s 6.34, p 6.46, b 5.93/
inistock(i) initial stock in units /s 320000, p 296199, b 104208/
cap(i) capacity limit for each unit /s 400000, p 350000, b 150000/
c(i) cost of holding 1 unit of i /s 10, p 15, b 12/;
Scalars
tcap total capacity of warehouse /650000/;
Variables
stock(i,t) stock stored at time t
sell(i,t) stock sold at time t
buy(i,t) stock bought at time t
cost total cost;
Positive Variables stock, sell, buy;
Equations
cst total cost occurs
stck(i,t) stock balance of unit i at time t;
cst.. cost =e= sum((i,t), price(i)*(buy(i,t)-sell(i,t)) + c(i)*stock(i,t));
stck(i,t).. stock(i,t) =e= inistock(i) + stock(i,t-1) + buy(i,t) - sell(i,t);
stck.up(i,t) = tcap;
Option LP=Cplex;
Option optcr=0;
Model TWH The Warehouse Problem /all/;
Solve TWH minimizing cost using lp;
Thank you in advance for your support!
You haven't set any demand constraints, and the only minimum values are the zero bounds from defining the variables as being positive.
What other constraint did you expect GAMS to obey?
Selling everything is the correct solution to the problem you have defined.
I also think this part is a mistake:
stck.up(i,t)=tcap;
You probably meant to write 'stock' rather than 'stck'.
If this was a lower bound (writing 'lo' instead of 'up'), you would have a problem with a non-trivial solution, as you are adding a constraint that the warehouse should be filled to maximum capacity.

Programming for a Financial Application

I've seen this twice now, and I just don't understand it. When calculating a "Finance Charge" for a fixed-rate loan, applications make the user enter all possible loan amounts and the associated finance charges. Even though these charges are calculable (30%), the application makes the user fill out a table like this:
Loan Amount    Finance Charge
100            30
105            31.5
etc., with the loan amounts being provided from $5 to $1500 in $5 increments.
We are starting a new initiative to rebuild this system. Is there a valid reason for doing a rate table this way? I would imagine that we should keep a simple interest field and calculate the charge every time we need it.
I'm really at a loss as to why anyone would hardcode a table like that instead of calculating... I mean, computers are kind of designed to do stuff like this. Right?
It looks like compound interest where you're generously rounding up. The 100 case compounded once is pretty boring. But the 105 case compounded once is interesting:
T[0] = FC[105] => 31.5
T[1] = FC[136.5] => ?
Where does 136.5 hit -- 135 or 140? At 140, you've made an extra $1.05.
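A toy sketch of that effect in Python (assuming the 30% rate and that the lender rounds the balance up to the next $5 table row; both assumptions are mine):
RATE = 0.30

def finance_charge_exact(amount):
    # Direct calculation at the 30% rate.
    return amount * RATE

def finance_charge_table(amount):
    # Table lookup: round the amount up to the next $5 bucket first.
    bucket = -(-amount // 5) * 5
    return bucket * RATE

print(finance_charge_exact(136.5))  # 40.95
print(finance_charge_table(136.5))  # 42.0, i.e. an extra $1.05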
Or... If the rates were ever not calculable, that would be one reason for this implementation.
Or... The other reason (and one I would do if annoyed enough) would be that these rates were constantly changing, the developer got fed up with it, and he gave them an interface where the end users could set them on their own. The $5 buckets seem outrageous but maybe they were real jerks...