Using GAMS/CPLEX from Python Pyomo

I noticed that Pyomo 5.3 offers a GAMS solver plugin.
https://github.com/Pyomo/pyomo/blob/master/pyomo/solvers/plugins/solvers/GAMS.py
This is very exciting, as we have a GAMS/CPLEX license where we can use CPLEX as solver, but only via GAMS. With the new Pyomo-Gams interface, it should from my understanding be possible to formulate a problem in Pyomo, and have it translated to GAMS and solved by CPLEX.
However, when I test this with the shell integration, it is very slow (40s for 30 solves of a small MIP versus 6s with glpk/ipopt/cbc). Also, the documentation of the plugin is effectively non-existent.
Maybe someone here has experience with that interface and can help me with the following questions:
* Does Pyomo actually translate the Pyomo model into GAMS code? If yes, where can I find the GAMS file?
* How efficient is the translation, and how should I proceed if I want to solve a small model repeatedly?
* What is the difference between using the shell and the GAMS Python API?
* Is there any place to find documentation about this?
Also, it seems that conda provides Pyomo 5.3 only for Linux/Python 3.6 or for Windows/Python 2.7 (https://anaconda.org/conda-forge/pyomo/files?version=5.3), so I had to use pip to install Pyomo 5.3 on my machine.
Thanks in advance, Theo
import time
import pyomo.environ as pe
# set up the model
model = pe.ConcreteModel()
model.MaxWeight = pe.Param(initialize=0, mutable=True)
model.Item = ['hammer', 'wrench', 'screwdriver', 'towel']
Weight = {'hammer': 5, 'wrench': 7, 'screwdriver': 4, 'towel': 3}
Value = {'hammer': 8, 'wrench': 3, 'screwdriver': 6, 'towel': 11}
model.x = pe.Var(model.Item, within=pe.Binary)
model.z = pe.Objective(expr=sum(Value[i] * model.x[i] for i in model.Item), sense=pe.maximize)
model.constraint = pe.Constraint(expr=sum(Weight[i] * model.x[i] for i in model.Item) <= model.MaxWeight)
# time 30 solves of the model with each solver
solver_list = ['cbc', 'ipopt', 'gams', 'glpk']
for solver_name in solver_list:
    solver = pe.SolverFactory(solver_name)
    print(solver_name)
    tic = time.time()
    for MaxWeight_i in range(0, 30):
        model.MaxWeight = MaxWeight_i
        result = solver.solve(model)
        soln_items = list()
        for i in model.x:
            if pe.value(model.x[i]) > 0.5:
                soln_items.append(i)
        # print("Maximum Weight =", MaxWeight_i, soln_items)
    print("{:7.2f} s".format(time.time() - tic))
    print(" ")

This is rather delayed, but I can answer a few of your questions.
First, a basic documentation page was just created for the GAMS interface on readthedocs which you can find at: http://pyomo.readthedocs.io/en/latest/library_reference/solvers/gams.html. Note that this location may change as I believe we are restructuring the documentation tree some time soon, but you should be able to search for "gams" to find it again in the future. If there's more documentation that you believe you or others would like to see, please let me know as I'd be happy to provide anything that would be helpful.
As for the difference between the shell interface and the Python API interface, there really isn't any. I thought there would have been a performance increase by using the API but that didn't seem to be the case in the past (and in fact the one model I tried it on saw the shell interface be faster anyway). If you try both and experience otherwise, again I'd be happy to know that too.
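On the question of where the GAMS file ends up, here is a minimal sketch, assuming the keepfiles, tmpdir, and io_options keywords described on the documentation page above behave as documented in your Pyomo version. The helper names gams_solve_options and solve_with_gams and the directory gams_files are made up for illustration. Pyomo is imported lazily so the option builder can be inspected without a GAMS installation.

```python
import os


def gams_solve_options(workdir, gams_solver="cplex"):
    """Build keyword arguments for solver.solve() that keep the GAMS files."""
    return {
        "keepfiles": True,                      # do not delete the generated files
        "tmpdir": os.path.abspath(workdir),     # directory the files are written to
        "io_options": {"solver": gams_solver},  # solver that GAMS should call
    }


def solve_with_gams(model, workdir="gams_files"):
    """Solve a Pyomo model through the GAMS shell interface (needs GAMS installed)."""
    import pyomo.environ as pe  # lazy import: only needed when actually solving

    os.makedirs(workdir, exist_ok=True)
    solver = pe.SolverFactory("gams", solver_io="shell")
    return solver.solve(model, **gams_solve_options(workdir))
```

If the keywords work as documented, the generated GAMS input and listing files should remain in that directory after the solve, where you can inspect them.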

Related

Accessing Pyomo user time with Neos

I teach an optimization course in which I use the Pyomo modeling language to solve problems. I also encourage students to compare solvers using Neos. However, I have not found a way to measure the computational time required to solve the problems.
To explain my point I have created this notebook in Colab (https://github.com/salvapineda/notebooks/blob/main/UserTimePyomoNeos.ipynb)
First, I solve a model using cbc without using NEOS. As you can see, the "Solver Information" includes the time required to solve the problem.
Then, I solve the same model using cbc through NEOS. However, the "Solver Information" does not include any time information.
Is there any way to access the computational time when I am solving Pyomo models in Neos?
Do you mean the breakdown of the computation time? The model in Colab on NEOS returns 0.00389 seconds.
Message: CBC 2.10.3 optimal, objective 3.49; 0 nodes, 2 iterations, 0.00389 seconds
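If that message is all NEOS returns, one workaround is to parse the time out of the string. This is a minimal sketch, assuming the message keeps the "... N seconds" form shown above. parse_solve_seconds is a hypothetical helper, not part of Pyomo, and how you obtain the message string from the results object is left open.

```python
import re

# A number (optionally with a decimal point or exponent) followed by "seconds".
_SECONDS_RE = re.compile(r"([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)\s+seconds")


def parse_solve_seconds(message):
    """Return the solve time in seconds found in a solver message, or None."""
    match = _SECONDS_RE.search(message)
    return float(match.group(1)) if match else None


message = "CBC 2.10.3 optimal, objective 3.49; 0 nodes, 2 iterations, 0.00389 seconds"
print(parse_solve_seconds(message))  # 0.00389
```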

infeasible row in Cplex C++

I have a small question. I am solving a MIP model, coded in C++ and solved with the CPLEX solver. I remember that when I tested the model on relatively small instances, it reported "infeasibility row …". Now I am testing the same model on a large instance; it is reported as infeasible, but CPLEX does not tell me which row causes the infeasibility. How can I find which parameter or constraint causes it? Presolve is performed when the larger instance is tested; could that cause the infeasibility? I googled the conflict refiner but could not find a small, clear example explaining how to invoke it. I would be very happy to hear any suggestions or ideas.
Thank you
Another way to find where the infeasibility comes from is to export your model as an LP file or similar, then try to solve it with the standalone cplex. It helps if you name your variables and constraints sensibly. Then you have all the interactive tools in cplex to help you find where the issues are.
In C++ you should have a look at FeasOpt.
In the documentation, see
CPLEX > User's Manual for CPLEX > Infeasibility and unboundedness
If you model in OPL, you could call the relaxation from the Concert C++ APIs.

Initializing IPOPT when using pyomo parmest

I am learning to use pyomo parmest. I am trying to recreate the following parameter estimation example. The code that I created is in the following jupyter notebook. IPOPT stops with the message of maximum iterations exceeded when using collocation but solves with finite difference discretization. Since it is suggested that collocation is typically more robust, I would like to know what I might be doing wrong in using the collocation discretization.
I had originally set the number of collocation points in the discretization to ncp = 4. When I changed it to ncp = 2, IPOPT ran without issues. The updated ipython notebook is in this location.

Building models and estimators in tf2.0 without tf.keras

Given that layers API has been deprecated, how do I build models in tf2 without using tf.keras (or what is the recommended way to build models)? Issue #30829 has the same question, but was closed without any answers.
Update:
I'm okay with using tf.keras.layers instead of tf.layers, but once I've built all the layers and I need to return the model, is there a way to NOT use keras model, compile, fit, predict and evaluate, and just do it the tensorflow's way?
If you were wondering why I would want to do something like that, it is that I would like to use estimators to train, rather than keras' fit function. There exists a keras_model_to_estimator, but it seems it's not mature enough yet
Google released a migration guide from TF 1 to TF 2; see the section "Converting Models".
Recommended way to build models
The guide (section "Models based on tf.layers") recommends converting tf.layers models to tf.keras.layers:
The conversion was one-to-one because there is a direct mapping from v1.layers to tf.keras.layers.
Build models without Keras
The other option is to provide your own layer implementations (example from the guide):
import tensorflow as tf
W = tf.Variable(tf.ones(shape=(2,2)), name="W")
b = tf.Variable(tf.zeros(shape=(2)), name="b")
@tf.function
def forward(x):
    return W * x + b
out_a = forward([1,0])
print(out_a)
But it is worth considering tf.keras.layers.Layer (example), which gives some degree of freedom but integrates with the rest of Keras (and its layers).
Even with layers written with tf.keras, you are able to write your own training loop (example).
To sum up, in practice TF 2.0 requires you to use tf.keras.
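The custom training loop mentioned above can also be sketched without any Keras objects, using only tf.Variable, tf.GradientTape, and a hand-written gradient step. This is a minimal sketch, not the guide's code: the function name train_linear, the toy data, the learning rate, and the step count are all assumptions chosen for illustration. TensorFlow is imported inside the function so that defining it does not require TensorFlow to be loaded.

```python
def train_linear(steps=100, learning_rate=0.1):
    """Fit y = x @ W + b on toy data with a hand-written SGD loop (no Keras)."""
    import tensorflow as tf  # imported here so defining the function does not need TF

    W = tf.Variable(tf.ones(shape=(2, 2)), name="W")
    b = tf.Variable(tf.zeros(shape=(2,)), name="b")
    # Toy data (assumption): identity inputs, targets are the inputs scaled by 2.
    x = tf.constant([[1.0, 0.0], [0.0, 1.0]])
    y = 2.0 * x

    @tf.function
    def train_step():
        # Record the forward pass so gradients can be taken afterwards.
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean(tf.square(tf.matmul(x, W) + b - y))
        grads = tape.gradient(loss, [W, b])
        for var, grad in zip([W, b], grads):
            var.assign_sub(learning_rate * grad)  # plain gradient descent update
        return loss

    losses = [float(train_step()) for _ in range(steps)]
    return losses[0], losses[-1]  # loss before the first and before the last update
```

Calling train_linear() returns the first and last recorded loss, so you can check that the loop is actually reducing the error. Whether this is preferable to the Keras training utilities depends on how much of the surrounding tooling you need.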

TSFRESH library for python is taking way too long to process

I came across the tsfresh library as a way to featurize time series data. The documentation is great, and it seems like the perfect fit for the project I am working on.
I wanted to run the following code, which is shared in the quick start section of the tsfresh documentation. It seems simple enough.
from tsfresh import extract_relevant_features
feature_filtered_direct=extract_relevant_features(result,y,column_id=0,column_sort=1)
My data included 400 000 rows of sensor data, with 6 sensors each for 15 different id's. I started running the code, and 17 hours later it still had not finished. I figured this might be too large of a data set to run through the relevant feature extractor, so I trimmed it down to 3000, and then further down to 300. None of these actions made the code run under an hour, and I just ended up shutting it down after an hour or so of waiting. I tried the standard feature extractor as well
extracted_features = extract_features(timeseries, column_id="id", column_sort="time")
I also tried the example dataset from the tsfresh quick start section, which is very similar to my original data and has about as many data points as my reduced set.
Does anybody have any experience with this code? How would you go about making it work faster? I'm using Anaconda for python 2.7.
Update
It seems to be related to multiprocessing. Because I am on Windows, code that uses multiprocessing has to be protected by
if __name__ == "__main__":
    main()
Once I added
if __name__ == "__main__":
    extracted_features = extract_features(timeseries, column_id="id", column_sort="time")
to my code, the example data worked. I'm still having some issues with running the extract_relevant_features function and with running extract_features on my own data set. It continues to run slowly. I have a feeling it's related to the multiprocessing freeze as well, but without any errors popping up it's impossible to tell. It takes about 30 minutes to extract features on less than 1% of my dataset.
Which version of tsfresh did you use? Which OS?
We are aware of the high computational costs of some feature calculators. There is little we can do about it. In the future we will implement some tricks like caching to increase the efficiency of tsfresh further.
Have you tried calculating only the basic features by using the MinimalFeatureExtractionSettings? It will only contain basic features such as Max, Min, Median and so on but should run way, way faster.
from tsfresh.feature_extraction import MinimalFeatureExtractionSettings
extracted_features = extract_features(timeseries, column_id="id", column_sort="time", feature_extraction_settings = MinimalFeatureExtractionSettings())
Also it is probably a good idea to install the latest version from the repo by pip install git+https://github.com/blue-yonder/tsfresh. We are actively developing it and the master should contain the newest and freshest version ;).
Syntax has changed slightly (see docs), the current approach would be:
from tsfresh.feature_extraction import EfficientFCParameters, MinimalFCParameters
extract_features(timeseries, column_id="id", column_sort="time", default_fc_parameters=MinimalFCParameters())
Or
extract_features(timeseries, column_id="id", column_sort="time", default_fc_parameters=EfficientFCParameters())
Since version 0.15.0 we have improved our bindings for Apache Spark and dask.
It is now possible to use the tsfresh feature extraction directly in your usual dask or Spark computation graph.
You can find the bindings in tsfresh.convenience.bindings with the documentation here. For example for dask, it would look something like this (assuming df is a dask.DataFrame, for example the robot failure dataframe from our example)
from tsfresh.convenience.bindings import dask_feature_extraction_on_chunk
from tsfresh.feature_extraction import EfficientFCParameters
df = df.melt(id_vars=["id", "time"],
             value_vars=["F_x", "F_y", "F_z", "T_x", "T_y", "T_z"],
             var_name="kind", value_name="value")
df_grouped = df.groupby(["id", "kind"])
features = dask_feature_extraction_on_chunk(df_grouped, column_id="id", column_kind="kind",
                                            column_sort="time", column_value="value",
                                            default_fc_parameters=EfficientFCParameters())
                                            # or any other parameter set
Using either dask or Spark (or anything alike) might help you with very large data, both for memory and for speed (as you can distribute the work over multiple machines). Of course, we still support the usual distributors (documentation) as before.
In addition, it is also possible to run tsfresh together with a task orchestration system, such as luigi. You can create a task to
* read in the data for only one id and kind
* extract the features
* write out the result to disk
and let luigi handle all the rest. You may find a possible implementation of this here on my blog.
I've found, at least on a multicore machine, that a better way to distribute extract_features calculation over independent subgroups (identified by the column_id value) is through joblib.Parallel with the Loky backend.
For example, you define your feature extraction function for a single value of column_id and apply it to each subgroup:
from joblib import Parallel, delayed, cpu_count
from tqdm import tqdm
from tsfresh import extract_features
# `settings` (feature parameters) and `my_dataframe` are assumed to be defined beforehand
def map_extract_features(df):
    return extract_features(
        timeseries_container=df,
        default_fc_parameters=settings,
        column_id="ID",
        column_sort="DATE",
        n_jobs=1,
        disable_progressbar=True
    ).reset_index().rename({"index":"ID_CONTO"}, axis=1)
out = Parallel(n_jobs=cpu_count()-1)(
    delayed(map_extract_features)(
        my_dataframe[my_dataframe["ID"]==id]
    ) for id in tqdm(my_dataframe["ID"].unique())
)
This method takes way less memory than specifying column_id directly in the extract_features function.