Check the reduced cost of a new column in column generation - linear-programming

I'm working on a decomposition model using column generation. When I generate a new column, I'd like to check its reduced cost using a CPLEX function. I did read a post here; however, the sign of the reduced cost obtained by this method is always negative (in my case), since the new columns are treated as nonbasic (according to the accepted answer).
Edit 1: The purpose of checking these values is that I'm having trouble running my code. My master problem maximizes an objective function. The pricing problems keep generating new columns with positive reduced cost; however, after adding the new columns to the master and reoptimizing, the objective value doesn't improve at all, and this cycle has kept going for 30 minutes. So I decided to look into every detail to see what the problem is.
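For reference, the check can also be done by hand from the master duals, which sidesteps the basis-status issue entirely: for a maximization master, a candidate column prices out favorably when c_j - sum_i y_i * a_ij > 0. A minimal sketch using the CPLEX Concert C++ API (the function and variable names here are placeholders, not from my actual code):

#include <ilcplex/ilocplex.h>
#include <vector>

// Reduced cost of a candidate column, computed from the duals of the
// current master optimum rather than from basis-dependent CPLEX values.
double reducedCostOfColumn(IloCplex& masterCplex,
                           const IloRangeArray& masterRows,   // master constraints
                           const std::vector<double>& column, // a_ij, one entry per row
                           double objCoef)                    // c_j of the candidate
{
    IloNumArray duals(masterCplex.getEnv());
    masterCplex.getDuals(duals, masterRows);   // duals at the current master optimum
    double rc = objCoef;
    for (IloInt i = 0; i < masterRows.getSize(); ++i)
        rc -= duals[i] * column[i];            // rc = c_j - y' a_j
    duals.end();
    return rc;                                 // > 0 improves a maximization master
}

If this value disagrees with the reduced cost the pricing problem claims for the same column, that mismatch would be the first thing worth investigating.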

Related

LP warehouse problem with a certain truck capacity

The problem that I need to solve is almost like the basic LP warehouse problem, where you have n warehouses, each with a certain amount of a product, and m shops, each demanding a certain amount of that product. The goal is to minimize the number of kilometers driven by the trucks that have to deliver the products from the warehouses to the shops.
This was the easy part, I already identified the constraints and the objective function.
The part that I can't get my head around is that the truck that delivers the products has a certain capacity, C. Every single truck has the same capacity. I can't tell if that piece of information is really relevant and should be included in some kind of constraint. I would really appreciate a hint, because I've been stuck on this part for a while now and couldn't find any example of this exact type of problem on the Internet.
The number of trucks needed can be bounded by
numtrucks(i,j)*capacity >= shipment(i,j)
Add a term to the objective that minimizes the number of trucks.
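For example (numbers made up for illustration): with capacity C = 10 and shipment(i,j) = 25, the bound forces numtrucks(i,j) >= 2.5, hence at least 3 trucks on that route once numtrucks is required to be integer. Giving each truck a positive cost in the objective keeps the solver from overestimating the count.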

Optimizing / speeding up calculation time in Google Sheets

I have asked a few questions related to this personal project of mine already on this platform, and this should be the last one since I am so close to finishing. Below is the link to a mock example spreadsheet I've created, which mimics what my actual project does but it contains less sensitive information and is also smaller in size.
Mock Spreadsheet
Basic rundown of the spreadsheet:
Pulls data from a master schedule which is controlled/edited by another party into the Master Schedule tab.
In the columns adjacent to the imported data, an array formula expands the master schedule by classroom in case some of the time slots designate multiple rooms. Additional formulas adjust the date, start time, and end time to be capped within the current day's 24-hour period. The start time of each class is also made to be an hour earlier.
In the Room Schedule tab, an hourly calendar is created based on the room number in the first column, and only corresponds to the current day.
I have tested the spreadsheet extensively with multiple scenarios, and I'm happy with how everything works except for the calculation time. I figured the two volatile functions I use would take some processing time just by themselves, and I certainly didn't expect this to be lightning-fast especially without using a script, but the project that I am actually implementing this method for is much larger and takes a very long time to update. The purpose of this spreadsheet is to allow users to find an open room and "reserve" it by clicking the checkbox next to it (which will consequently color the entire row red) allowing everyone else to know that it is now taken.
I'd like to know if there is any way to optimize / speed up my spreadsheet, or to not update it every time a checkbox is clicked and instead update it "manually", similar to what OP is asking here. I am not familiar with Apps Script nor am I well-versed in writing code overall, but I am willing to learn - I just need a push in the right direction since I am going into this blind. I know the number of formulas in the Room Schedule tab is probably working against me yet I am so close to what I wanted the final product to be, so any help or insight is greatly appreciated!
Feel free to ask any questions if I didn't explain this well enough.
To speed things up, you should avoid repeating the same formula on every row and make use of ARRAYFORMULA instead. For example, this per-row formula:
=IF(AND(TEXT(K3,"m/d")<>$A$1,(M3-L3)<0),K3+1,K3+0)
=ARRAYFORMULA(IF(K3:K<>"",
IF((TEXT(K3:K, "m/d")<>$A$1)*((M3:M-L3:L)<0), K3:K+1, K3:K+0), ))
=IF(AND(TEXT(K3,"m/d")=$A$1,(M3-L3)<0),TIMEVALUE("11:59:59 PM"),M3+0)
=ARRAYFORMULA(IF(K3:K<>"",
IF((TEXT(K3,"m/d")=$A$1)*((M3-L3)<0), TIMEVALUE("11:59:59 PM"), M3:M+0), ))

Update in data warehouse fact table

Having read up on many Kimball design tips regarding fact tables (transaction, accumulating snapshot, periodic snapshot, etc.), I'm still vague on what I should do in my case of updating a fact table, which I believe is not that uncommon. On to the case.
We're processing complaints from clients, and we want to be able to reflect the current status of a complaint in the Data Warehouse. Our complaints have a workflow of statuses they go through, and different assignees that deal with them over time, but for our analysis this is irrelevant as of now. We would like to review what the current situation of a complaint is.
To my understanding, the grain of the fact table would be a single complaint, with columns (irrelevant for this question whether they should be junk dimensions, degenerate, etc.) such as:
Complaint Number
Current Status
Current Status Date
Current Assignee
Type of complaint
As far as I understand, since we don't want to view the process history but instead see what the current status of the process is, storing multiple rows for each complaint representing its states is overkill, so instead we store only one row per complaint and update it.
Now, is my reasoning correct? In the above case, complaint number and type of complaint store values that don't change, while the "Current" columns do, so we need to update the row; we could implement a Change Data Capture mechanism (just as we do for dimensions right now) to compare incoming rows from the source system with currently stored fact rows, to reduce the time cost of the operation.
It honestly looks like a dimension table with mixed SCD Types 0 and 1 to me, but it stores facts of receiving complaints.
SO Post for reference: Fact table with information that is regularly updatable in source system
Edit
I'm aware that I could use an accumulating snapshot fact table with timestamps, which is somewhat SCD Type 2-like, but the end user doesn't really care about the history of the process. There are more facts involved in the analysis later on, so separating this need from the data warehouse doesn't really work in this case.
I’ve encountered similar use cases in the past, where an accumulating snapshot would be the default solution.
However, the accumulating snapshot doesn't accommodate processes of varying length. I've designed a different pattern, where two rows are added for each event: if an object goes from state A to state B, you first insert a row with state A and quantity -1, then a new one with state B and quantity +1 (see the worked example after the list below).
The end result allows:
- no updates necessary, only inserts;
- map-reduce friendly;
- arbitrary-length processes;
- counting how many of each in each state at any point in time (with the help of a periodic snapshot for performance reasons);
- counting how many entered or left any state at any point in time;
- calculating time in each state and overall age.
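A small worked example (complaint number, statuses, and dates made up for illustration): complaint 1017 is received as Open on Jan 2, moves to In Progress on Jan 5, and to Closed on Jan 9. The fact table only ever receives inserts:
Jan 2: (1017, Open, +1)
Jan 5: (1017, Open, -1) and (1017, In Progress, +1)
Jan 9: (1017, In Progress, -1) and (1017, Closed, +1)
Summing the quantity column per status over all rows up to a given date then yields the number of complaints in each status on that date.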
Details in 5 blog posts here (with implementation in Pentaho Data Integration):
http://ubiquis.co.uk/dwh/status-change-fact-table-part-1-the-problem/

How to normalize sequence of numbers?

I am working on a user behavior project. Based on user interaction I have got some data. There is a nice sequence which smoothly increases and decreases over time. But there are little discrepancies, which are very bad. Please refer to the graph below:
You can also find data here:
2.0789 2.09604 2.11472 2.13414 2.15609 2.17776 2.2021 2.22722 2.25019 2.27304 2.29724 2.31991 2.34285 2.36569 2.38682 2.40634 2.42068 2.43947 2.45099 2.46564 2.48385 2.49747 2.49031 2.51458 2.5149 2.52632 2.54689 2.56077 2.57821 2.57877 2.59104 2.57625 2.55987 2.5694 2.56244 2.56599 2.54696 2.52479 2.50345 2.48306 2.50934 2.4512 2.43586 2.40664 2.38721 2.3816 2.36415 2.33408 2.31225 2.28801 2.26583 2.24054 2.2135 2.19678 2.16366 2.13945 2.11102 2.08389 2.05533 2.02899 2.00373 1.9752 1.94862 1.91982 1.89125 1.86307 1.83539 1.80641 1.77946 1.75333 1.72765 1.70417 1.68106 1.65971 1.64032 1.62386 1.6034 1.5829 1.56022 1.54167 1.53141 1.52329 1.51128 1.52125 1.51127 1.50753 1.51494 1.51777 1.55563 1.56948 1.57866 1.60095 1.61939 1.64399 1.67643 1.70784 1.74259 1.7815 1.81939 1.84942 1.87731
1.89895 1.91676 1.92987
I want to smooth out this sequence. The technique should be able to eliminate numbers with the characteristics of X and Y, i.e. errors in a mono-increasing or mono-decreasing run.
If it cannot eliminate them, the technique should be able to shift them so that the series is not affected by the errors.
What I have tried and failed:
I tried testing the difference between values. In some special cases it works, but for a sequence like the one presented here, the distance between numbers is not such that I can cut out the errors.
I tried applying a counter: only when a change exceeds some threshold X is it accepted; otherwise the point is mapped to the previous point. Here I have great trouble deciding on the value of X, because it depends on user interaction, which I don't really control. If the user interaction plots as a zigzag pattern, I end up in a 'no user movement data detected at all' situation.
Please share the techniques that you are aware of.
PS: The data made available in this example is a particular case. There is no typical pattern in which the numbers are going to occur, but we expect some range to be continuous across all the examples. The solution I am seeking is generic.
I do not know how much effort you want to invest in this problem, but if you want theoretical guarantees, topological persistence seems well adapted to your problem imho.
Basically, with that method you can filter local maxima/minima by fixing a scale, and there are theoretical proofs saying that if your sampling is close to your function, then persistence extracts the correct number of maxima.
You can look at these slides (mainly pages 7-9) to get an idea of the method.
Basically, if you take your points as a landscape and imagine a watershed starting from the maximum height and descending, you get some peaks.
Every peak has a birth time, which is when it emerges, and a death time, which is when it merges with a higher peak. A persistence diagram then plots a point for every peak whose x/y coordinates are its birth/death times (by convention the first peak never dies and is not shown).
If a peak is a global maximum, it will be further from the diagonal in the persistence diagram than a local-maximum peak. To remove local maxima you have to remove the peaks close to the diagonal. There are four local maxima in your example, as you can see in the persistence diagram of your data (thanks for providing the data, btw), and two global ones (the first peak is not pictured in a persistence diagram):
If you add noise to your data like this:
you will still get a very decent persistence diagram that will allow you to filter local maxima as you want:
Please ask if you want more details or references.
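For concreteness, here is a compact sketch of the watershed idea in C++ (this is the standard union-find formulation of 1D peak persistence, not code from the slides; the series is assumed to be stored in a std::vector<double>):

#include <algorithm>
#include <numeric>
#include <vector>

struct PersistencePair { double birth, death; };  // persistence = birth - death

// 0-dimensional persistence of the maxima of a 1D series: process samples
// from highest to lowest; a peak is born at its height and dies when its
// component merges with one born higher.
std::vector<PersistencePair> peakPersistence(const std::vector<double>& v) {
    const int n = static_cast<int>(v.size());
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return v[a] > v[b]; });  // descending watershed

    std::vector<int> comp(n, -1);  // union-find parent; -1 = not yet emerged
    auto find = [&](int i) {
        while (comp[i] != i) i = comp[i] = comp[comp[i]];  // path halving
        return i;
    };

    std::vector<PersistencePair> pairs;
    for (int i : order) {
        comp[i] = i;  // a new peak emerges at height v[i]
        const int left  = (i > 0     && comp[i - 1] != -1) ? find(i - 1) : -1;
        const int right = (i + 1 < n && comp[i + 1] != -1) ? find(i + 1) : -1;
        if (left != -1 && right != -1) {
            // Two components meet: the one with the lower birth dies here.
            const int lower  = (v[left] < v[right]) ? left : right;
            const int higher = (lower == left) ? right : left;
            pairs.push_back({v[lower], v[i]});
            comp[lower] = higher;
            comp[i]     = higher;
        } else if (left != -1)  { comp[i] = left;  }
          else if (right != -1) { comp[i] = right; }
    }
    return pairs;  // the global maximum never dies and is not reported
}

Discarding pairs whose birth - death falls below a chosen scale removes exactly the peaks close to the diagonal of the diagram.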
Since you cannot decide on a cutoff frequency, or even on the filter you want to use, I would implement several and let the user set the parameters.
The first thing I thought of is a running average, and you can see there are many parameters to set to get different outputs.
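As an illustration, a minimal centered running-average filter (C++ here just for concreteness; halfWidth is the knob you would expose to the user):

#include <algorithm>
#include <vector>

// Centered moving average: each output sample is the mean of the input
// samples within halfWidth positions on either side (clipped at the ends).
std::vector<double> movingAverage(const std::vector<double>& x, int halfWidth) {
    const int n = static_cast<int>(x.size());
    std::vector<double> y(n);
    for (int i = 0; i < n; ++i) {
        const int lo = std::max(0, i - halfWidth);
        const int hi = std::min(n - 1, i + halfWidth);
        double sum = 0.0;
        for (int j = lo; j <= hi; ++j) sum += x[j];
        y[i] = sum / (hi - lo + 1);
    }
    return y;
}

A larger halfWidth smooths the zigzag away more aggressively but also flattens genuine turning points, which is exactly why the parameter is best left to the user.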

Optimization run assistance

I am running the optimisation of two sets of data against each other and am after some assistance with looking up the settings of a run based on the calculated results. I'll explain...
I run two data lines against each other (think graph lines) - Line A and Line B. These lines have crossing points, upward and downward, depending on the direction of each line: e.g. Line A going up while Line B goes down is an 'upward cross', and Line A going down while Line B goes up is a 'downward cross'. The program performs financial analysis.
I analyze the crossing points and gain a resultant 'Rank' from the analysis based on a set of rules. The rank is a single integer.
Line A has a number of settings for the optimisation run, e.g. window 1 ranging from 10 to 20 and window 2 from 30 to 40. Line B also has settings.
When I run the optimisation I iterate through the parameters available for each line and calculate the rank. The result of the optimisation run is a list of ranks whose size is the number of parameter permutations available.
So my question is this:
What is the best way to look up the line settings from the calculated rank using a position (index) in the rank list? The optimisation settings used to create the run will be stored for that rank run and can be used for the look-up.
I will also be adding additional line parameters to the system in the future, so I want the program to take future line settings into account without affecting any rank files created before the new parameter was added.
In addition to that I want to be able to find out an index based on a particular setting included in the optimisation run (the reverse look-up of the previous method).
I want to avoid versioning for backward compatibility if at all possible, so that the lookup algorithm is self-sufficient.
Is a hash table suitable for this purpose or do you have any implementation techniques that would fit better? Do you have any examples of this type of operation in action in C++?
Thanks,
Chris.
If I understand correctly, you have a bunch of associated data (settings + rank), on which you would like to be able to perform lookups with different key types. If so, then Boost.MultiIndex sounds like what you're looking for.
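A minimal sketch of that approach (the struct, field names, and the string encoding of the settings are illustrative assumptions, not taken from your description):

#include <boost/multi_index_container.hpp>
#include <boost/multi_index/member.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <string>

struct RunResult {
    int rank;              // calculated rank of one optimisation run
    std::string settings;  // serialized line settings, e.g. "w1=12;w2=35"
};

namespace bmi = boost::multi_index;

// One container, two ordered views: by rank and by settings string.
using RunIndex = bmi::multi_index_container<
    RunResult,
    bmi::indexed_by<
        bmi::ordered_non_unique<bmi::member<RunResult, int, &RunResult::rank>>,
        bmi::ordered_non_unique<bmi::member<RunResult, std::string, &RunResult::settings>>
    >
>;

int main() {
    RunIndex runs;
    runs.insert({42, "w1=12;w2=35"});
    runs.insert({17, "w1=15;w2=30"});

    const auto& byRank = runs.get<0>();  // rank -> settings
    auto it = byRank.find(42);           // it->settings == "w1=12;w2=35"

    const auto& bySettings = runs.get<1>();  // settings -> rank (reverse look-up)
    auto jt = bySettings.find(std::string("w1=15;w2=30"));  // jt->rank == 17

    return (it != byRank.end() && jt != bySettings.end()) ? 0 : 1;
}

Serializing the settings as an ordered key=value string is also one way to address the forward-compatibility requirement: a rank file written before a parameter existed simply lacks that key, and the look-up can fall back to a default, with no versioning in the algorithm itself.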