I've installed all the requirements and can run the local server successfully, but when I try to run the prediction part of the 4th_umpire project (a cricket match predictor using the Random Forest algorithm), I get the following error.
Here is the part of the code that the error image points to:
def _transform(self, X, handle_unknown='error'):
    X_list, n_samples, n_features = self._check_X(X)
    X_int = np.zeros((n_samples, n_features), dtype=np.int)
    X_mask = np.ones((n_samples, n_features), dtype=np.bool)
    if n_features != len(self.categories_):
        raise ValueError(
            "The number of features in X is different to the number of "
            "features of the fitted data. The fitted data had {} features "
            "and the X has {} features."
            .format(len(self.categories_,), n_features)
        )
The exception is raised at the line if n_features != len(self.categories_): even though I've checked the OneHotEncoder part and it seems fine to me.
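The check quoted above fires whenever the array passed to transform() has a different number of columns than the array the encoder was fitted on. The following is only a minimal sketch of that situation, not the 4th_umpire code: the column values (team names, toss decision) are made up for illustration, and newer scikit-learn releases word the message differently, although they still raise ValueError.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data with three categorical columns.
train = np.array([["IND", "AUS", "bat"],
                  ["ENG", "IND", "field"]], dtype=object)

encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train)

# A prediction row with only two columns reproduces the mismatch.
bad_row = np.array([["IND", "AUS"]], dtype=object)
try:
    encoder.transform(bad_row)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# A row with the same three columns as the fitted data is accepted.
good_row = np.array([["IND", "AUS", "bat"]], dtype=object)
encoded = encoder.transform(good_row)

print(error_message)
print(encoded.shape)
```

If this matches what is happening, comparing the shape of the array built at prediction time with len(encoder.categories_) should show which columns are missing or extra.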
I have to deliver a machine learning project, and I received a file called tester.py. After finishing my code in another file, I have to run tester.py to see the results, but I get an error: TypeError: 'StratifiedShuffleSplit' object is not iterable
I have researched this error in other topics and websites, and the solution is always the same: import GridSearchCV from sklearn.model_selection. I have been doing that from the beginning, but tester.py still does not run.
The part of tester.py where the problem occurs is:
def main():
    ### load up student's classifier, dataset, and feature_list
    clf, dataset, feature_list = load_classifier_and_data()
    ### Run testing script
    test_classifier(clf, dataset, feature_list)

if __name__ == '__main__':
    main()
My own code works fine.
Any help?
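Try changing the following lines of tester.py.
The current version of StratifiedShuffleSplit works differently from the version tester.py was written for: the constructor no longer takes the labels, and the object is no longer iterated directly; you call its split() method instead.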
[..]
from sklearn.model_selection import StratifiedShuffleSplit
[..]
#cv = StratifiedShuffleSplit(labels, folds, random_state = 42)
cv = StratifiedShuffleSplit(n_splits=folds, random_state=42)
[..]
#for train_idx, test_idx in cv:
for train_idx, test_idx in cv.split(features, labels):
[..]
I hope you find it useful
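To see the changed calling convention in isolation, here is a small self-contained sketch. The toy features and labels arrays, the number of folds, and test_size=0.2 are assumptions chosen for illustration; they are not taken from tester.py.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Toy data (hypothetical): 20 samples, 2 features, two balanced classes.
features = np.arange(40, dtype=float).reshape(20, 2)
labels = np.array([0, 1] * 10)

# The splitter is configured without the labels ...
cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)

# ... and the data is passed to split(), which yields index arrays.
fold_sizes = []
for train_idx, test_idx in cv.split(features, labels):
    fold_sizes.append((len(train_idx), len(test_idx)))

print(fold_sizes)
```

Iterating over cv itself, as the old code did, is what raises "'StratifiedShuffleSplit' object is not iterable".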
I have the following constraint in a simple MINLP model:
model.cm2=Constraint(expr = model.xB2 == log(1.0+model.xA2))
This works when I call bonmin (the Windows 64-bit binary distribution from AMPL).
When switching to the couenne solver, I need to convert to a base-10 logarithm:
model.cm2=Constraint(expr = model.xB2 == 2.3*log10(1.0+model.xA2))
otherwise I get the error:
ApplicationError: Solver (asl) did not exit normally.
model.pprint() gives in the first case:
cm2 : Size=1, Index=None, Active=True
Key : Lower : Body : Upper : Active
None : 0.0 : xB2 - log( 1.0 + xA2 ) : 0.0 : True
I use the anaconda python installation and work with spyder.
Does anyone have an idea of the reason for this behaviour?
I have read the comment from jsiirola, but I do not think the problem is evaluating the log of a negative number. Here is a complete test problem that behaves the same way. If I solve with bonmin I can use log(); if I use couenne I have to use ln(10)*log10().
from pyomo.environ import *
solverpathb="..\\solversAMPL\\bonmin\\bonmin"
solverpathc="..\\solversAMPL\\couenne\\couenne"
model=ConcreteModel()
model.x = Var(within=NonNegativeReals,bounds=(1,2),doc='Nonnegative')
model.y = Var(within=NonNegativeReals,doc='Nonnegative')
model.obj = Objective(expr= model.x+model.y, sense=maximize)
model.c1=Constraint(expr = model.y == log(1.0+model.x))
#model.c2=Constraint(expr = model.y == 2.3*log10(1.0+model.x))
#Works with version c1 and c2 of the constraint
#solver = pyomo.opt.SolverFactory("bonmin", executable=solverpathb)
#only constraint c2 works with this solver
solver = pyomo.opt.SolverFactory("couenne", executable=solverpathc)
results = solver.solve(model, tee = True)
model.display()
The log file that should include errors only includes the model. This is the last part of the errors.
...
File "C:/../testproblem.py", line 24, in
results = solver.solve(model, tee = True)
File "C:\Users..\Python\Python36\site-packages\pyomo\opt\base\solvers.py", line 623, in solve
"Solver (%s) did not exit normally" % self.name)
ApplicationError: Solver (asl) did not exit normally.
Note: I can use the following code for the objective function, also with couenne:
model.obj = Objective(expr= model.x+log(1.0+model.x), sense=maximize)
First, the entire model would be most useful when debugging problems like this. Second, what error is being thrown by Couenne when it exits abnormally?
As to answers:
Pyomo uses the same interface for BONMIN and Couenne (they are both through the ASL), so you should never have to change your expression just because you switched solvers. log and log10 are not the same function, of course, so you are not solving the same problem.
I suspect the problem is that model.xA2 is becoming less than or equal to -1 (i.e., the solver is asking the ASL to evaluate log(0) or the log of a negative number). The way to verify this is by looking at the solver log. Additionally, you will want to make sure that Pyomo sends "symbolic" labels to the solver so that the error message references actual variable / constraint names and not just "x1, x2, x3, ..." and "c1, c2, c3, ...".
Couenne is a global solver, whereas BONMIN is a local solver. As this is a nonconvex problem, Couenne is probably exploring parts of the solution space that BONMIN never went to.
Using a binary distribution from COIN-OR solved the problem; it was not a Pyomo problem. For some odd reason, the binary distribution of couenne downloaded from ampl.com doesn't accept the log function, only log10.
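For anyone who still needs the log10 workaround, note that the factor 2.3 used in the question is only a rounded value of ln(10), so the rewritten constraint is slightly different from the original one. The sketch below uses only the standard math module; the helper names are made up for illustration, and x = 2.0 is simply a point inside the bounds of the test problem.

```python
import math

LN10 = math.log(10.0)  # natural log of 10, roughly 2.302585


def ln_via_log10(value):
    """Natural log expressed through log10 with the exact factor."""
    return LN10 * math.log10(value)


def ln_via_rounded_factor(value):
    """Same idea, but with the rounded factor 2.3 from the question."""
    return 2.3 * math.log10(value)


x = 2.0
exact = math.log(1.0 + x)
exact_workaround = ln_via_log10(1.0 + x)
rounded_workaround = ln_via_rounded_factor(1.0 + x)
relative_error = abs(rounded_workaround - exact) / exact

print(exact, exact_workaround, rounded_workaround, relative_error)
```

In a Pyomo expression the same idea would be to multiply log10(...) by math.log(10) instead of 2.3.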
[short summary: how to use TF high-level Estimator on Python with an external file reader? or with feed_dict?]
I have been struggling with this for a few days and couldn't find any solution online.
I'm using the TF high-level modules (tf.contrib.learn.Estimator on tf1.0, or tf.estimator.Estimator on tf1.1),
with features and targets (x/y) supplied through an input_fn and the graph built in the model_fn.
I have already trained a NN on 'small' data sets, in which the whole input is part of the graph, using slice_input_producer etc. (I can push an example to GitHub if it helps people here).
I am now trying to train a larger NN on 'heavier' data sets (10s-100s of GB).
I have an external Python reader that does some nasty binary file reading, which I really don't want to get into.
This reader has its own queue.Queue with m1 samples. When I use it to extract the m1 {features} & {targets}, the net simply saves all these samples as constants in the first layer of the graph, which is completely undesired.
I am trying to do one of the following:
feed the output of the external file reader as input to my graph, or
define a proper TF queue object that keeps updating the queue (each time a sample is dequeued, I want a completely different sample to be enqueued).
As a reminder, I am using the high-level API, e.g.:
self.Estimator = tf.contrib.learn.Estimator(
    model_fn=self.model_fn,
    model_dir=self.config['model_dir'],
    config=tf.contrib.learn.RunConfig( ... ) )

def input_fn(self, mode):
    batch_data = self.data[mode].next() # pops out a batch of samples, as numpy 4D matrices
    ... # some processing of batch data
    features_dict = dict(data=batch_data.pop('data'))
    targets_dict = batch_data
    return features_dict, targets_dict

self.Estimator.fit(input_fn=lambda: self.input_fn(modekeys.TRAIN))
Attached is a final solution for integrating an external reader into the high-level TF api (tf.contrib.learn.Estimator / tf.estimator.Estimator).
Please note:
the architecture and "logic" are not important; it's a deliberately trivial net.
the external reader outputs a dictionary of numpy matrices.
the input_fn is using this reader.
In order to verify that the reader "pulls new values", I both
save the recent value to self.status (should be > 1.0)
save a summary, to be viewed in tensorboard.
Code example is in gist, and below.
import tensorflow as tf
import numpy as np

modekeys = tf.contrib.learn.ModeKeys
tf.logging.set_verbosity(tf.logging.DEBUG)
# Tested on python 2.7.9, tf 1.1.0

class inputExample:
    def __init__(self):
        self.status = 0.0 # tracing which value was recently 'pushed' to the net
        self.model_dir = 'temp_dir'
        self.get_estimator()

    def input_fn(self):
        # returns features and labels dictionaries as expected by tf Estimator's model_fn
        data, labels = tf.py_func(func=self.input_fn_np, inp=[], Tout=[tf.float32, tf.float32], stateful=True)
        data.set_shape([1,3,3,1]) # shapes are unknown and need to be set for integrating into the network
        labels.set_shape([1,1,1,1])
        return dict(data=data), dict(labels=labels)

    def input_fn_np(self):
        # returns a dictionary of numpy matrices
        batch_data = self.reader()
        return batch_data['data'], batch_data['labels']

    def model_fn(self, features, labels, mode):
        # using tf 2017 convention of dictionaries of features/labels as inputs
        features_in = features['data']
        labels_in = labels['labels']
        pred_layer = tf.layers.conv2d(name='pred', inputs=features_in, filters=1, kernel_size=3)
        tf.summary.scalar(name='label', tensor=tf.squeeze(labels_in))
        tf.summary.scalar(name='pred', tensor=tf.squeeze(pred_layer))
        loss = None
        if mode != modekeys.INFER:
            loss = tf.losses.mean_squared_error(labels=labels_in, predictions=pred_layer)
        train_op = None
        if mode == modekeys.TRAIN:
            train_op = tf.contrib.layers.optimize_loss(
                loss=loss,
                learning_rate = 0.01,
                optimizer = 'SGD',
                global_step = tf.contrib.framework.get_global_step()
            )
        predictions = {'estim_exp': pred_layer}
        return tf.contrib.learn.ModelFnOps(mode=mode, predictions=predictions, loss=loss, train_op=train_op)

    def reader(self):
        self.status += 1
        if self.status > 1000.0:
            self.status = 1.0
        return dict(
            data = np.random.randn(1,3,3,1).astype(dtype=np.float32),
            labels = np.sin(np.ones([1,1,1,1], dtype=np.float32)*self.status)
        )

    def get_estimator(self):
        self.Estimator = tf.contrib.learn.Estimator(
            model_fn = self.model_fn,
            model_dir = self.model_dir,
            config = tf.contrib.learn.RunConfig(
                save_checkpoints_steps = 10,
                save_summary_steps = 10,
                save_checkpoints_secs = None
            )
        )

if __name__ == '__main__':
    ex = inputExample()
    ex.Estimator.fit(input_fn=ex.input_fn)
You can use tf.constant if you have the training data already in python memory as shown in the abalone TF example: https://github.com/tensorflow/tensorflow/blob/r1.1/tensorflow/examples/tutorials/estimators/abalone.py#L138-L141
Note: copying the data from disk to Python to TensorFlow is often less efficient than constructing an input pipeline in TensorFlow (i.e. loading data from disk directly into TensorFlow Tensors), such as using tf.contrib.learn.datasets.base.load_csv_without_header.
I have been trying to run the file blei_lda.py from chapter 4 of the book Building Machine Learning Systems with Python, with no success. I am using Python 2.7 with the Enthought Canopy GUI. Below is the actual file provided by the authors; there are also multiple copies on GitHub.
github repository
The problem is I'm continually receiving this error:
TypeError Traceback (most recent call last)
c:\users\matt\desktop\pythonprojects\pml\ch04\blei_lda.py in <module>()
        for ti in range(model.num_topics):
            words = model.show_topic(ti, 64)
------>     tf = sum(f for f, w in words)
            with open('topics.txt', 'w') as output:
                output.write('\n'.join('{}:{}'.format(w, int(1000. * f / tf)) for f, w in words))
                output.write("\n\n\n")
TypeError: unsupported operand type(s) for +: 'int' and 'unicode'
I've tried to create a work around, but wasn't able to find anything that worked completely.
I've also searched all over the web and stack overflow for a solution, but it seems like I'm the only person who is having trouble running this file.
# This code is supporting material for the book
# Building Machine Learning Systems with Python
# by Willi Richert and Luis Pedro Coelho
# published by PACKT Publishing
#
# It is made available under the MIT License
from __future__ import print_function
from wordcloud import create_cloud
try:
    from gensim import corpora, models, matutils
except:
    print("import gensim failed.")
    print()
    print("Please install it")
    raise
import matplotlib.pyplot as plt
import numpy as np
from os import path
NUM_TOPICS = 100
# Check that data exists
if not path.exists('./data/ap/ap.dat'):
    print('Error: Expected data to be present at data/ap/')
    print('Please cd into ./data & run ./download_ap.sh')
# Load the data
corpus = corpora.BleiCorpus('./data/ap/ap.dat', './data/ap/vocab.txt')
# Build the topic model
model = models.ldamodel.LdaModel(
    corpus, num_topics=NUM_TOPICS, id2word=corpus.id2word, alpha=None)
# Iterate over all the topics in the model
for ti in range(model.num_topics):
    words = model.show_topic(ti, 64)
    tf = sum(f for f, w in words)
    with open('topics.txt', 'w') as output:
        output.write('\n'.join('{}:{}'.format(w, int(1000. * f / tf)) for f, w in words))
        output.write("\n\n\n")
# We first identify the most discussed topic, i.e., the one with the
# highest total weight
topics = matutils.corpus2dense(model[corpus], num_terms=model.num_topics)
weight = topics.sum(1)
max_topic = weight.argmax()
# Get the top 64 words for this topic
# Without the argument, show_topic would return only 10 words
words = model.show_topic(max_topic, 64)
# This function will actually check for the presence of pytagcloud and is otherwise a no-op
create_cloud('cloud_blei_lda.png', words)
num_topics_used = [len(model[doc]) for doc in corpus]
fig,ax = plt.subplots()
ax.hist(num_topics_used, np.arange(42))
ax.set_ylabel('Nr of documents')
ax.set_xlabel('Nr of topics')
fig.tight_layout()
fig.savefig('Figure_04_01.png')
# Now, repeat the same exercise using alpha=1.0
# You can edit the constant below to play around with this parameter
ALPHA = 1.0
model1 = models.ldamodel.LdaModel(
    corpus, num_topics=NUM_TOPICS, id2word=corpus.id2word, alpha=ALPHA)
num_topics_used1 = [len(model1[doc]) for doc in corpus]
fig,ax = plt.subplots()
ax.hist([num_topics_used, num_topics_used1], np.arange(42))
ax.set_ylabel('Nr of documents')
ax.set_xlabel('Nr of topics')
# The coordinates below were fit by trial and error to look good
ax.text(9, 223, r'default alpha')
ax.text(26, 156, 'alpha=1.0')
fig.tight_layout()
fig.savefig('Figure_04_02.png')
In the line words = model.show_topic(ti, 64), words is a list of (unicode, float64) tuples,
e.g. [(u'school', 0.029515796999228502),(u'prom', 0.018586355008452897)]
So in the line tf = sum(f for f, w in words), f is bound to the unicode string and w to the float value. You are therefore trying to sum the unicode strings, which raises the unsupported operand type error.
Change this line to tf = sum(f for w, f in words) so that it sums the float values.
For the same reason, change the write line to output.write('\n'.join('{}:{}'.format(w, int(1000. * f / tf)) for w, f in words)).
The code snippet will then look like:
for ti in range(model.num_topics):
    words = model.show_topic(ti, 64)
    tf = sum(f for w, f in words)
    with open('topics.txt', 'w') as output:
        output.write('\n'.join('{}:{}'.format(w, int(1000. * f / tf)) for w, f in words))
        output.write("\n\n\n")
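The effect of the unpacking order can be checked without gensim. This is a minimal sketch that assumes the (word, weight) ordering shown in the sample output above; the helper name format_topic is made up for illustration, and the book code appears to assume the opposite order.

```python
def format_topic(words):
    """Return 'word:weight' lines, with weights scaled to parts per thousand."""
    # Unpack as (word, weight): the float is the second element of each tuple.
    total = sum(weight for _word, weight in words)
    return ['{}:{}'.format(word, int(1000. * weight / total))
            for word, weight in words]


# The two sample tuples quoted in the answer above.
sample_topic = [(u'school', 0.029515796999228502),
                (u'prom', 0.018586355008452897)]

lines = format_topic(sample_topic)
print('\n'.join(lines))
```

Separately from the TypeError, note that opening topics.txt with mode 'w' inside the loop truncates the file on every iteration, so only the last topic would remain in it; opening the file once before the loop avoids that.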
I am using the following Python code on the Raspberry Pi to collect an audio signal and output the volume. I can't understand why my output is always an integer.
#!/usr/bin/env python
import alsaaudio as aa
import audioop
# Set up audio
data_in = aa.PCM(aa.PCM_CAPTURE, aa.PCM_NONBLOCK, 'hw:1')
data_in.setchannels(2)
data_in.setrate(44100)
data_in.setformat(aa.PCM_FORMAT_S16_LE)
data_in.setperiodsize(256)
while True:
    # Read data from device
    l,data = data_in.read()
    if l:
        # catch frame error
        try:
            max_vol=audioop.max(data,2)
            scaled_vol = max_vol/4680
            if scaled_vol==0:
                print "vol 0"
            else:
                print scaled_vol
        except audioop.error, e:
            if e.message !="not a whole number of frames":
                raise e
Also, I don't understand the syntax in this line:
l,data = data_in.read()
It's likely that it's reading in a byte. The line l,data = data_in.read() unpacks a tuple (composed of l and data) into two variables. Run the type() builtin function on those variables and see what you've got to work with.
Otherwise, see the "PCM Terminology and Concepts" section of the documentation for the pyalsaaudio package.
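Another likely cause, assuming the script runs under Python 2 (the print statements suggest so): audioop.max returns an int, and in Python 2 dividing an int by the int 4680 discards the fractional part. The sketch below is written for Python 3, where // reproduces that behaviour, and uses made-up stand-in values instead of a real sound card; it also shows the tuple unpacking asked about.

```python
def scale_volume_floor(max_vol, divisor=4680):
    """Integer (floor) division: what Python 2 does for int / int."""
    return max_vol // divisor


def scale_volume_true(max_vol, divisor=4680):
    """Dividing by a float keeps the fractional part in Python 2 and 3."""
    return max_vol / float(divisor)


# read() returns a (length, data) tuple; this stand-in value is hypothetical.
fake_read_result = (256, b"\x00\x01" * 512)
length, data = fake_read_result  # same unpacking as: l,data = data_in.read()

max_vol = 12000  # hypothetical peak value, as audioop.max might return
floor_scaled = scale_volume_floor(max_vol)
true_scaled = scale_volume_true(max_vol)

print(length, len(data), floor_scaled, true_scaled)
```

In the original script, writing max_vol/4680.0 (or adding from __future__ import division at the top) would be one way to get a fractional result.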