Failed testing on Ubuntu 12.04 - python-2.7

After installing scikit-learn 0.14.1 from source with 'sudo python setup.py install', I tested the package with 'nosetests sklearn --exe' and received the following output:
==================================================================================
/home/elkan/Downloads/MS2PIP/scikit-learn/sklearn/feature_selection/selector_mixin.py:7: DeprecationWarning: sklearn.feature_selection.selector_mixin.SelectorMixin has been renamed sklearn.feature_selection.from_model._LearntSelectorMixin, and this alias will be removed in version 0.16
DeprecationWarning)
/home/elkan/Downloads/MS2PIP/scikit-learn/sklearn/pls.py:7: DeprecationWarning: This module has been moved to cross_decomposition and will be removed in 0.16
"removed in 0.16", DeprecationWarning)
.......S................../home/elkan/Downloads/MS2PIP/scikit-learn/sklearn/cluster/hierarchical.py:746: DeprecationWarning: The Ward class is deprecated since 0.14 and will be removed in 0.17. Use the AgglomerativeClustering instead.
"instead.", DeprecationWarning)
.........../usr/lib/python2.7/dist-packages/numpy/distutils/system_info.py:1423: UserWarning:
Atlas (http://math-atlas.sourceforge.net/) libraries not found.
Directories to search for the libraries can be specified in the
numpy/distutils/site.cfg file (section [atlas]) or by setting
the ATLAS environment variable.
warnings.warn(AtlasNotFoundError.__doc__)
.............................................../home/elkan/Downloads/MS2PIP/scikit-learn/sklearn/manifold/spectral_embedding_.py:226: UserWarning: Graph is not fully connected, spectral embedding may not work as expected.
warnings.warn("Graph is not fully connected, spectral embedding"
..................................SS..............S.................................................../home/elkan/Downloads/MS2PIP/scikit-learn/sklearn/utils/extmath.py:83: NonBLASDotWarning: Data must be of same type. Supported types are 32 and 64 bit float. Falling back to np.dot.
'Falling back to np.dot.', NonBLASDotWarning)
....................../home/elkan/Downloads/MS2PIP/scikit-learn/sklearn/decomposition/fastica_.py:271: UserWarning: Ignoring n_components with whiten=False.
warnings.warn('Ignoring n_components with whiten=False.')
..................../home/elkan/Downloads/MS2PIP/scikit-learn/sklearn/utils/extmath.py:83: NonBLASDotWarning: Data must be of same type. Supported types are 32 and 64 bit float. Falling back to np.dot.
'Falling back to np.dot.', NonBLASDotWarning)
....................................S................................../home/elkan/Downloads/MS2PIP/scikit-learn/sklearn/externals/joblib/test/test_func_inspect.py:134: UserWarning: Cannot inspect object <functools.partial object at 0xbdebf04>, ignore list will not work.
nose.tools.assert_equal(filter_args(ff, ['y'], (1, )),
FAIL: Check that gini is equivalent to mse for binary output variable
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/home/elkan/Downloads/MS2PIP/scikit-learn/sklearn/tree/tests/test_tree.py", line 301, in test_importances_gini_equal_mse
assert_almost_equal(clf.feature_importances_, reg.feature_importances_)
File "/usr/lib/python2.7/dist-packages/numpy/testing/utils.py", line 452, in assert_almost_equal
return assert_array_almost_equal(actual, desired, decimal, err_msg)
File "/usr/lib/python2.7/dist-packages/numpy/testing/utils.py", line 800, in assert_array_almost_equal
header=('Arrays are not almost equal to %d decimals' % decimal))
File "/usr/lib/python2.7/dist-packages/numpy/testing/utils.py", line 636, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Arrays are not almost equal to 7 decimals
(mismatch 70.0%)
x: array([ 0.2925143 , 0.27676187, 0.18835709, 0.04181255, 0.03699054,
0.01668818, 0.03661717, 0.03439216, 0.04422749, 0.03163866])
y: array([ 0.29599052, 0.27676187, 0.19146823, 0.03837769, 0.03699054,
0.01811955, 0.0362238 , 0.03439216, 0.04137032, 0.03030531])
>> raise AssertionError('\nArrays are not almost equal to 7 decimals\n\n(mismatch 70.0%)\n x: array([ 0.2925143 , 0.27676187, 0.18835709, 0.04181255, 0.03699054,\n 0.01668818, 0.03661717, 0.03439216, 0.04422749, 0.03163866])\n y: array([ 0.29599052, 0.27676187, 0.19146823, 0.03837769, 0.03699054,\n 0.01811955, 0.0362238 , 0.03439216, 0.04137032, 0.03030531])')
----------------------------------------------------------------------
Ran 3950 tests in 150.890s
FAILED (SKIP=19, failures=1)
==================================================================================
The Python version is 2.7.3 and the OS is 32-bit.
So, what might the problem be?
Thanks.

It's a numerical-precision discrepancy on 32-bit platforms. You can safely ignore it: the failing test checks the values of a random forest's clf.feature_importances_ attribute, and those values usually do not need to be precise to be useful (their purpose is interpreting which features contribute most to the RF model).
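A minimal sketch (with synthetic data, purely illustrative) of how feature_importances_ is typically consumed; note that the ranking, not the seventh decimal of each value, drives the interpretation:
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: only the first two features actually drive the label.
X = np.random.rand(200, 5)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
# Rank features by importance; the ordering is what matters for
# interpretation, not tiny per-platform differences in the values.
order = np.argsort(clf.feature_importances_)[::-1]
print(order)
print(clf.feature_importances_[order])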

Related

pyomo mindtpy example program when run becomes unfeasible for binary variable

So I installed pyomo, glpk, and ipopt with Anaconda. When I run the example code from here: https://pyomo.readthedocs.io/en/stable/contributed_packages/mindtpy.html
from pyomo.environ import *
model = ConcreteModel()
model.x = Var(bounds=(1.0,10.0),initialize=5.0)
model.y = Var(within=Binary)
model.c1 = Constraint(expr=(model.x-3.0)**2 <= 50.0*(1-model.y))
model.c2 = Constraint(expr=model.x*log(model.x)+5.0 <= 50.0*(model.y))
model.objective = Objective(expr=model.x, sense=minimize)
SolverFactory('mindtpy').solve(model, mip_solver='glpk', nlp_solver='ipopt',tee=True)
model.objective.display()
model.display()
model.pprint()
I get output indicating that the binary variable has apparently taken a fractional, infeasible value:
python minlpex.py
INFO: ---Starting MindtPy---
INFO: Original model has 2 constraints (2 nonlinear) and 0 disjunctions, with
2 variables, of which 1 are binary, 0 are integer, and 1 are continuous.
INFO: NLP 1: Solve relaxed integrality
INFO: NLP 1: OBJ: 1.0 LB: 1.0 UB: inf
INFO: ---MindtPy Master Iteration 0---
INFO: MIP 1: Solve master problem.
WARNING: Empty constraint block written in LP format - solver may error
Traceback (most recent call last):
File "minlpex.py", line 13, in <module>
op.SolverFactory('mindtpy').solve(model, mip_solver='glpk', nlp_solver='ipopt',tee=True)
File "/anaconda3/envs/py36/lib/python3.6/site-packages/pyomo/contrib/mindtpy/MindtPy.py", line 370, in solve
MindtPy_iteration_loop(solve_data, config)
File "/anaconda3/envs/py36/lib/python3.6/site-packages/pyomo/contrib/mindtpy/iterate.py", line 30, in MindtPy_iteration_loop
handle_master_mip_optimal(master_mip, solve_data, config)
File "/anaconda3/envs/py36/lib/python3.6/site-packages/pyomo/contrib/mindtpy/mip_solve.py", line 62, in handle_master_mip_optimal
config)
File "/anaconda3/envs/py36/lib/python3.6/site-packages/pyomo/contrib/gdpopt/util.py", line 199, in copy_var_list_values
v_to.set_value(value(v_from, exception=False))
File "/anaconda3/envs/py36/lib/python3.6/site-packages/pyomo/core/base/var.py", line 173, in set_value
if valid or self._valid_value(val):
File "/anaconda3/envs/py36/lib/python3.6/site-packages/pyomo/core/base/var.py", line 185, in _valid_value
"domain %s" % (val, type(val), self.domain))
ValueError: Numeric value `0.22709088987977885` (<class 'float'>) is not in domain Binary
So I was a little confused; since this is the provided example code, I would not expect it to error like this. Am I messing something up, or am I missing some required library?
Thanks a lot.
It looks like something was wrong with the conda install of pyomo or ipopt.
When I reinstalled ipopt using pip and compiled pyomo from the GitHub source, everything worked fine.
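For reference, the reinstall amounts to roughly the following commands (the conda cleanup step and exact package names are assumptions; adjust for your environment):
conda remove pyomo ipopt
pip install ipopt
git clone https://github.com/Pyomo/pyomo.git
cd pyomo
python setup.py install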

Python 2.7 pickle won't recognize numpy multiarray

I need to load a set of pickled data from a collaborator. The problem is, it seems I need numpy's multiarray module for this. My code is as below:
import pickle

f = open('data.p', 'rb')
a = pickle.load(f)
And here is the error message.
ImportError Traceback (most recent call last)
<ipython-input-3-17918c47ae2d> in <module>()
----> 1 a = pk.load(f)
/usr/lib/python2.7/pickle.pyc in load(file)
1382
1383 def load(file):
-> 1384 return Unpickler(file).load()
1385
1386 def loads(str):
/usr/lib/python2.7/pickle.pyc in load(self)
862 while 1:
863 key = read(1)
--> 864 dispatch[key](self)
865 except _Stop, stopinst:
866 return stopinst.value
/usr/lib/python2.7/pickle.pyc in load_global(self)
1094 module = self.readline()[:-1]
1095 name = self.readline()[:-1]
-> 1096 klass = self.find_class(module, name)
1097 self.append(klass)
1098 dispatch[GLOBAL] = load_global
/usr/lib/python2.7/pickle.pyc in find_class(self, module, name)
1128 def find_class(self, module, name):
1129 # Subclasses may override this
-> 1130 __import__(module)
1131 mod = sys.modules[module]
1132 klass = getattr(mod, name)
ImportError: No module named multiarray
I thought the problem was with the numpy build on my computer, so I uninstalled the numpy package from my Arch Linux repo and reinstalled numpy with
sudo -H pip2 install numpy
Yet the problem persists. I have checked the folder $PACKAGE-SITE/numpy/core, and multiarray.so is in it. I have no idea why pickle can't load the module.
How can I solve the problem? What else do I need to do?
PS1. I am using Arch Linux and have tried every version of Python 2.7 released since last October; none of them works.
PS2. Since the problem occurs at the loading step, I suspect it stems from an internal Python conflict rather than from the data file itself.
Thanks to @MikeMcKems, the problem is now solved.
The issue is caused by the different special symbols used by MS Windows and Linux (e.g., the end-of-line characters). My collaborator was using a Windows machine and saved the data with
pickle.dump(obj, open('filename', 'w'))
The data was saved as plain text with a lot of these special symbols in it. When I loaded the data on my Linux machine, the symbols were misinterpreted, causing the problem.
The easiest way to solve it is to find a Windows machine and load the data with
a = pickle.load(open('filename_in', 'r'))
then write it back out in binary form:
pickle.dump(a, open('filename_out', 'wb'))
Since binary pickle data is read the same way on every platform, the file filename_out is then easily readable by Python on Linux.
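If no Windows machine is at hand, a possible single-machine workaround (an assumption on my part: it only helps if the sole corruption is CRLF line endings in a protocol-0 text pickle) is to normalize the line endings before unpickling:
import pickle

# Read the raw bytes and undo the Windows CRLF translation, then unpickle.
with open('filename_in', 'rb') as f:
    raw = f.read()
a = pickle.loads(raw.replace(b'\r\n', b'\n'))

# Re-save in binary mode so the file is portable from now on.
with open('filename_out', 'wb') as f:
    pickle.dump(a, f)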

Python int too large to convert to C long in python 3.4

I am getting this error when trying to run word2vec from the gensim library. I am using Python 3.4 on Windows 7; the complete stack trace is attached below.
I read online that this is an issue with Python 2.x, but I am getting it in Python 3.4.
model = word2vec.Word2Vec(sentences, workers=num_workers, \
size=num_features, min_count = min_word_count, \
window = context, sample = downsampling)
Traceback (most recent call last):
File "<pyshell#137>", line 3, in <module>
window = context, sample = downsampling)
File "C:\Python34\lib\site-packages\gensim\models\word2vec.py", line 417, in __init__
self.build_vocab(sentences)
File "C:\Python34\lib\site-packages\gensim\models\word2vec.py", line 483, in build_vocab
self.finalize_vocab() # build tables & arrays
File "C:\Python34\lib\site-packages\gensim\models\word2vec.py", line 611, in finalize_vocab
self.reset_weights()
File "C:\Python34\lib\site-packages\gensim\models\word2vec.py", line 888, in reset_weights
self.syn0[i] = self.seeded_vector(self.index2word[i] + str(self.seed))
File "C:\Python34\lib\site-packages\gensim\models\word2vec.py", line 900, in seeded_vector
once = random.RandomState(uint32(self.hashfxn(seed_string)))
OverflowError: Python int too large to convert to C long
Note that Python (both 2 and 3) supports integers of arbitrary size: Python just keeps adding "digits" (actually groups of them) once a number outgrows the machine word size. The only difference between py2 and py3 is that the former starts with an actual C int or long before promoting to the arbitrary-size Python long, whereas in py3 there is only the arbitrary-precision int type. An OverflowError like this one therefore arises when such an integer is handed to C-level code that expects a fixed-width value, as numpy's uint32() does in the traceback above.
Long story short: check the size of your integers.
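A commonly suggested workaround, offered here as an assumption since I cannot test your exact gensim version, is to pass a hash function clamped to 32 bits through Word2Vec's hashfxn parameter (the same hashfxn that appears in the traceback):
def hash32(value):
    # Clamp Python's built-in hash() to 32 bits so numpy.uint32() cannot overflow.
    return hash(value) & 0xffffffff

model = word2vec.Word2Vec(sentences, workers=num_workers,
                          size=num_features, min_count=min_word_count,
                          window=context, sample=downsampling,
                          hashfxn=hash32)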

Different behaviour for io in pickle with string content

When working with pickled data I encountered different behavior between io.open and __builtin__.open. Consider the following simple example:
import pickle
payload = 'foo'
fn = 'test.pickle'
pickle.dump(payload, open(fn, 'w'))
a = pickle.load(open(fn, 'r'))
This works as expected. But running this code here:
import pickle
import io
payload = 'foo'
fn = 'test.pickle'
pickle.dump(payload, io.open(fn, 'w'))
a = pickle.load(io.open(fn, 'r'))
gives the following Traceback:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\WinPython-32bit-2.7.8.1\python-2.7.8\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 580, in runfile
execfile(filename, namespace)
File "D:/**.py", line 15, in <module>
pickle.dump(payload, io.open(fn, 'w'))
File "D:\WinPython-32bit-2.7.8.1\python-2.7.8\lib\pickle.py", line 1370, in dump
Pickler(file, protocol).dump(obj)
File "D:\WinPython-32bit-2.7.8.1\python-2.7.8\lib\pickle.py", line 224, in dump
self.save(obj)
File "D:\WinPython-32bit-2.7.8.1\python-2.7.8\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "D:\WinPython-32bit-2.7.8.1\python-2.7.8\lib\pickle.py", line 488, in save_string
self.write(STRING + repr(obj) + '\n')
TypeError: must be unicode, not str
As I want to be forward-compatible, how can I circumvent this behavior? Or, what else am I doing wrong here?
I stumbled over this when dumping dictionaries with keys of type string.
My python version is:
'2.7.8 (default, Jun 30 2014, 16:03:49) [MSC v.1500 32 bit (Intel)]'
The difference is not surprising, because io.open() explicitly deals with Unicode strings when the file is used in text mode. The documentation is quite clear about this:
Note: Since this module has been designed primarily for Python 3.x, you have to be aware that all uses of “bytes” in this document refer to the str type (of which bytes is an alias), and all uses of “text” refer to the unicode type. Furthermore, those two types are not interchangeable in the io APIs.
and
Python distinguishes between files opened in binary and text modes, even when the underlying operating system doesn’t. Files opened in binary mode (including 'b' in the mode argument) return contents as bytes objects without any decoding. In text mode (the default, or when 't' is included in the mode argument), the contents of the file are returned as unicode strings, the bytes having been first decoded using a platform-dependent encoding or using the specified encoding if given.
You need to open files in binary mode. The fact that it worked with the built-in open() at all is more luck than wisdom; if your pickles contained data with \n and/or \r bytes, loading them may well fail. Python 2's default pickle protocol happens to be a text protocol, but the output should still be treated as binary.
In all cases, when writing pickle data, use binary mode:
pickle.dump(payload, open(fn, 'wb'))
a = pickle.load(open(fn, 'rb'))
or
pickle.dump(payload, io.open(fn, 'wb'))
a = pickle.load(io.open(fn, 'rb'))
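For forward compatibility you can also pin an explicit binary protocol (protocol 2 is the highest Python 2 supports) and use with blocks so the data is flushed to disk; a sketch reusing the payload from above:
import io
import pickle

payload = 'foo'
fn = 'test.pickle'
with io.open(fn, 'wb') as f:
    pickle.dump(payload, f, protocol=2)
with io.open(fn, 'rb') as f:
    a = pickle.load(f)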

Specifying select features to be categorical using OneHotEncoder in sklearn 0.14

I am using the sklearn 0.14 module in Python to create a decision tree. I was hoping to use the OneHotEncoder to convert some features into categorical features. According to the documentation, I should be able to provide an array of indices to indicate which features should be converted. However, trying the following code:
import numpy
from sklearn import preprocessing

xs = [[64, 15230], [3, 67673], [16, 43678]]
encoder = preprocessing.OneHotEncoder(n_values='auto', categorical_features=[1], dtype=numpy.integer)
encoder.fit(xs)
I receive the following error:
Traceback (most recent call last):
  File "C:\Users\sara\Documents\Shipping Project\PythonSandbox\CarrierDecisionTree.py", line 35, in <module>
    encoder.fit(xs)
  File "C:\Python27\lib\site-packages\sklearn\preprocessing\data.py", line 892, in fit
    self.fit_transform(X)
  File "C:\Python27\lib\site-packages\sklearn\preprocessing\data.py", line 944, in fit_transform
    self.categorical_features, copy=True)
  File "C:\Python27\lib\site-packages\sklearn\preprocessing\data.py", line 795, in _transform_selected
    return sparse.hstack((X_sel, X_not_sel))
  File "C:\Python27\lib\site-packages\scipy\sparse\construct.py", line 417, in hstack
    return bmat([blocks], format=format, dtype=dtype)
  File "C:\Python27\lib\site-packages\scipy\sparse\construct.py", line 532, in bmat
    dtype = upcast(*tuple([A.dtype for A in blocks[block_mask]]))
  File "C:\Python27\lib\site-packages\scipy\sparse\sputils.py", line 53, in upcast
    raise TypeError('no supported conversion for types: %r' % (args,))
TypeError: no supported conversion for types: (dtype('int32'), dtype('S6'))
If instead I provide the array [0, 1] to categorical_features, it works correctly and converts both features. The same correct behavior occurs with categorical_features='all'. However, I only want the second feature converted, not the first. I understand I could do this manually by converting one feature at a time, but I was hoping to use all the beauty of OneHotEncoder, as I will be using many more features later on.
Posting as an answer, for the record:
TypeError: no supported conversion for types: (dtype('int32'), dtype('S6'))
means something in the true xs (not the one shown in the code snippet) is a string: dtype('S6') is NumPy's length-six string type.
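A quick way to see where a dtype('S6') can come from, sketched with made-up values: a single string entry makes NumPy upcast the whole array to a fixed-width string type.
import numpy

xs = [[64, 15230], [3, 67673], [16, '438678']]  # note the stray string
print(numpy.asarray(xs).dtype)  # dtype('S6') on Python 2: length-six strings
Checking numpy.asarray(xs).dtype on the real data is therefore a fast way to find the offending column.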