I'm running into the following problem.
I import a file that looks like [[58, 59, 60]].
Printing it gives ['[[58, 59, 60]]'].
Now I only want to add 58 59 60 to a new list. The problem is:
output gives ['[[58, 59, 60]]']
output[0] gives '[[58, 59, 60]]'
output[0][2] gives '5'
output[0][3] gives '8'.
Is there a way to read the file so that it loads only whole integers?
with open('file', 'r') as fobj:
    # Raises ValueError: the text '[[58, 59, 60]]' is not an integer literal
    content = int(fobj.read())
You could use the json module to convert the string into an actual list when reading the file.
For example:
import json
def get_output(filename):  # filename is the name of the text file
    with open(filename) as fn:
        output = json.loads(fn.read())
    return output
If the file 't.txt' contains this: [[58, 59, 60]], then get_output('t.txt') returns the list [[58, 59, 60]]:
output = get_output('t.txt')
type(output) # ===> list
output[0] # ===> [58, 59, 60]
output[0][2] # ===> 60
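If the file might contain Python-style literals that are not strictly valid JSON (single-quoted strings, for example), the standard library's ast.literal_eval is an alternative to json. The sketch below also flattens the result so that only 58, 59 and 60 end up in a new list, which is what the question asks for. load_nested_list and flatten_once are hypothetical helper names, and the sample file is created only for the demo.

```python
import ast
import os
import tempfile

def load_nested_list(filename):
    """Parse a file whose whole content is a literal such as [[58, 59, 60]]."""
    with open(filename) as fobj:
        # literal_eval only accepts literals (lists, numbers, strings, ...),
        # so unlike eval() it will not execute arbitrary code from the file.
        return ast.literal_eval(fobj.read())

def flatten_once(nested):
    """Collect the items of every inner list into one flat list."""
    return [item for inner in nested for item in inner]

if __name__ == "__main__":
    # Create a small sample file so the demo is self-contained.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("[[58, 59, 60]]")
        path = tmp.name
    try:
        data = load_nested_list(path)
        print(data)                # [[58, 59, 60]]
        print(flatten_once(data))  # [58, 59, 60]
    finally:
        os.remove(path)
```

flatten_once only removes one level of nesting, which matches the [[...]] shape in the question.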
I am having the hardest time figuring out why I am getting this error. I have searched a lot but was unable to find any solution.
import numpy as np
import warnings
from collections import Counter
import pandas as pd
def k_nearest_neighbors(data, predict, k=3):
    if len(data) >= k:
        warnings.warn('K is set to a value less than total voting groups!')
    distances = []
    for group in data:
        for features in data[group]:
            euclidean_distance = np.linalg.norm(np.array(features) - np.array(predict))
            distances.append([euclidean_distance, group])
    votes = [i[1] for i in sorted(distances)[:k]]
    vote_result = Counter(votes).most_common(1)[0][0]
    return vote_result
df = pd.read_csv("data.txt")
df.replace('?', -99999, inplace=True)
df.drop(['id'], axis=1, inplace=True)  # axis passed by keyword; positional axis is rejected by newer pandas
full_data = df.astype(float).values.tolist()
print(full_data)
After running it, I get this error:
Traceback (most recent call last):
File "E:\Jazab\Machine Learning\Lec18(Testing K Neatest Nerighbors Classifier)\Lec18(Testing K Neatest Nerighbors Classifier)\Lec18_Testing_K_Neatest_Nerighbors_Classifier_.py", line 25, in <module>
full_data = df.astype(float).values.tolist()
File "C:\Python27\lib\site-packages\pandas\util\_decorators.py", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 3299, in astype
**kwargs)
File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 3224, in astype
return self.apply('astype', dtype=dtype, **kwargs)
File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 3091, in apply
applied = getattr(b, f)(**kwargs)
File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 471, in astype
**kwargs)
File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 521, in _astype
values = astype_nansafe(values.ravel(), dtype, copy=True)
File "C:\Python27\lib\site-packages\pandas\core\dtypes\cast.py", line 636, in astype_nansafe
return arr.astype(dtype)
ValueError: invalid literal for float(): 3) <-----Reappears in Group 8 as:
Press any key to continue . . .
If I remove astype(float), the program runs fine.
What do I need to do?
There is bad data (3)), so you need to_numeric with apply, because all columns have to be processed.
Non-numeric values are converted to NaN, which fillna then replaces with some scalar, e.g. 0:
full_data = df.apply(pd.to_numeric, errors='coerce').fillna(0).values.tolist()
Sample:
df = pd.DataFrame({'A':[1,2,7], 'B':['3)',4,5]})
print (df)
A B
0 1 3)
1 2 4
2 7 5
full_data = df.apply(pd.to_numeric, errors='coerce').fillna(0).values.tolist()
print (full_data)
[[1.0, 0.0], [2.0, 4.0], [7.0, 5.0]]
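Before coercing bad values to 0, it can help to see which rows actually contain them. A minimal sketch, assuming a reasonably recent pandas (it uses DataFrame.notna); find_unparseable is a hypothetical helper name:

```python
import pandas as pd

def find_unparseable(df):
    """Return the rows of df that contain at least one value to_numeric cannot parse."""
    numeric = df.apply(pd.to_numeric, errors='coerce')
    # A cell is "bad" if it had a value originally but became NaN after coercion.
    bad = numeric.isna() & df.notna()
    return df[bad.any(axis=1)]

if __name__ == "__main__":
    df = pd.DataFrame({'A': [1, 2, 7], 'B': ['3)', 4, 5]})
    print(find_unparseable(df))  # only row 0, which holds '3)'
```

On the real data.txt this would list the rows holding values like 3) so they can be fixed at the source rather than silently zeroed.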
It looks like you have 3) as an entry in your CSV file, and pandas complains because it can't cast it to a float due to the trailing ).
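If the ) is just a typo and the 3 should be kept (rather than replaced with 0), one option is to strip the stray character before converting. This is a sketch under the assumption that ) never appears in legitimate values; strip_stray_parens is a hypothetical name.

```python
import pandas as pd

def strip_stray_parens(df):
    """Remove stray ')' characters from every cell, then convert to numbers."""
    # astype(str) lets the .str accessor work on mixed int/str object columns.
    cleaned = df.astype(str).apply(lambda col: col.str.replace(')', '', regex=False))
    # Anything still unparseable becomes NaN instead of raising.
    return cleaned.apply(pd.to_numeric, errors='coerce')

if __name__ == "__main__":
    df = pd.DataFrame({'A': [1, 2, 7], 'B': ['3)', 4, 5]})
    print(strip_stray_parens(df).values.tolist())
```

Combining this with fillna(0) still catches any other malformed entries.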
I'm trying to make a Seq2Seq Regression example for time-series analysis and I've used the Seq2Seq library as presented at the Dev Summit, which is currently the code on the Tensorflow GitHub branch r1.0.
I have difficulty understanding how the decoder function works in Seq2Seq, specifically for the "cell_output".
I understand that num_decoder_symbols is the number of classes/words to decode at each time step, and I have it working to the point where I can train. However, I don't see why I can't just substitute the number of features (num_features) for num_decoder_symbols. Basically, I want to be able to run the decoder without teacher forcing, in other words pass the output of the previous time step as the input to the next time step.
with ops.name_scope(name, "simple_decoder_fn_inference",
                    [time, cell_state, cell_input, cell_output,
                     context_state]):
    if cell_input is not None:
        raise ValueError("Expected cell_input to be None, but saw: %s" %
                         cell_input)
    if cell_output is None:
        # invariant that this is time == 0
        next_input_id = array_ops.ones([batch_size,], dtype=dtype) * (
            start_of_sequence_id)
        done = array_ops.zeros([batch_size,], dtype=dtypes.bool)
        cell_state = encoder_state
        cell_output = array_ops.zeros([num_decoder_symbols],
                                      dtype=dtypes.float32)
Here is a link to the original code: https://github.com/tensorflow/tensorflow/blob/r1.0/tensorflow/contrib/seq2seq/python/ops/decoder_fn.py
Why don't I need to pass batch_size for the cell output?
cell_output = array_ops.zeros([batch_size, num_decoder_symbols],
                              dtype=dtypes.float32)
I am trying to use this code to create my own regression Seq2Seq example, where the output is a real-valued vector of dimension num_features instead of an array of class probabilities. As I understood it, I could simply replace num_decoder_symbols with num_features, like below:
def decoder_fn(time, cell_state, cell_input, cell_output, context_state):
    """
    Again same as in simple_decoder_fn_inference but for regression on sequences with a fixed length
    """
    with ops.name_scope(name, "simple_decoder_fn_inference", [time, cell_state, cell_input, cell_output, context_state]):
        if cell_input is not None:
            raise ValueError("Expected cell_input to be None, but saw: %s" % cell_input)
        if cell_output is None:
            # invariant that this is time == 0
            next_input = array_ops.ones([batch_size, num_features], dtype=dtype)
            done = array_ops.zeros([batch_size], dtype=dtypes.bool)
            cell_state = encoder_state
            cell_output = array_ops.zeros([num_features], dtype=dtypes.float32)
        else:
            cell_output = output_fn(cell_output)
            done = math_ops.equal(0,1) # hardcoded hack just to properly define done
            next_input = cell_output
        # if time > maxlen, return all true vector
        done = control_flow_ops.cond(math_ops.greater(time, maximum_length),
                                     lambda: array_ops.ones([batch_size,], dtype=dtypes.bool),
                                     lambda: done)
        return (done, cell_state, next_input, cell_output, context_state)
return decoder_fn
But, I get the following error:
File "/opt/DL/tensorflow/lib/python2.7/site-packages/tensorflow/contrib/seq2seq/python/ops/seq2seq.py", line 212, in dynamic_rnn_decoder
swap_memory=swap_memory, scope=scope)
File "/opt/DL/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 1036, in raw_rnn
swap_memory=swap_memory)
File "/opt/DL/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2605, in while_loop
result = context.BuildLoop(cond, body, loop_vars, shape_invariants)
File "/opt/DL/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2438, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/opt/DL/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2388, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "/opt/DL/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 980, in body
(next_output, cell_state) = cell(current_input, state)
File "/opt/DL/tensorflow/lib/python2.7/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 327, in __call__
input_size = inputs.get_shape().with_rank(2)[1]
File "/opt/DL/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/tensor_shape.py", line 635, in with_rank
raise ValueError("Shape %s must have rank %d" % (self, rank))
ValueError: Shape (100,) must have rank 2
As a result, I passed in the batch_size like this in order to get a Shape of rank 2:
cell_output = array_ops.zeros([batch_size, num_features],
                              dtype=dtypes.float32)
But I get the following error, where Shape is of rank 3 and wants a rank 2 instead:
File "/opt/DL/tensorflow/lib/python2.7/site-packages/tensorflow/contrib/seq2seq/python/ops/seq2seq.py", line 212, in dynamic_rnn_decoder
swap_memory=swap_memory, scope=scope)
File "/opt/DL/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 1036, in raw_rnn
swap_memory=swap_memory)
File "/opt/DL/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2605, in while_loop
result = context.BuildLoop(cond, body, loop_vars, shape_invariants)
File "/opt/DL/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2438, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/opt/DL/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2388, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "/opt/DL/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 980, in body
(next_output, cell_state) = cell(current_input, state)
File "/opt/DL/tensorflow/lib/python2.7/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 327, in __call__
input_size = inputs.get_shape().with_rank(2)[1]
File "/opt/DL/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/tensor_shape.py", line 635, in with_rank
raise ValueError("Shape %s must have rank %d" % (self, rank))
ValueError: Shape (10, 10, 100) must have rank 2
I have a function LogReg, which is as follows: (using justmarkham's code as inspiration)
def LogReg(self):
    formulA = "class ~"
    print self.frame #dataframe used
    print self.columnNames[:-1]
    for a in self.columnNames[:-1]:
        formulA += " {0} +".format(a)
    formula = formulA[:-2] #there is always a \n behind, we don't want that
    print "formula = " + formula
    Y,X = dmatrices(formula, self.frame, return_type="dataframe")
    Y = np.ravel(Y) #flatten Y to a 1D list
    model = LogisticRegression() #from sklearn.linear_model
    model = model.fit(X, Y)
    print model.score(X, Y)
with the following outcome:
a0 a1 a2 a3 class
picture1 1 2 3 67 1
picture2 6 7 45 61 3
picture3 8 7 6 5 2
picture4 1 2 4 3 0
['a0', 'a1', 'a2', 'a3']
formula = class ~ a0 + a1 + a2 + a3
Traceback (most recent call last):
File "classification.py", line 80, in <module>
c.LogReg()
File "classification.py", line 61, in LogReg
Y,X = dmatrices(formula, self.frame, return_type="dataframe")
File "/<path>/python2.7/site-packages/patsy/highlevel.py", line 297, in dmatrices
NA_action, return_type)
File "/<path>/python2.7/site-packages/patsy/highlevel.py", line 152, in _do_highlevel_design
NA_action)
File "/<path>/python2.7/site-packages/patsy/highlevel.py", line 57, in _try_incr_builders
NA_action)
File "/<path>/python2.7/site-packages/patsy/build.py", line 660, in design_matrix_builders
NA_action)
File "/<path>/python2.7/site-packages/patsy/build.py", line 424, in _examine_factor_types
value = factor.eval(factor_states[factor], data)
File "/<path>/python2.7/site-packages/patsy/eval.py", line 485, in eval
return self._eval(memorize_state["eval_code"], memorize_state, data)
File "/<path>/python2.7/site-packages/patsy/eval.py", line 468, in _eval
code, inner_namespace=inner_namespace)
File "/<path>/python2.7/site-packages/patsy/compat.py", line 117, in call_and_wrap_exc
return f(*args, **kwargs)
File "/<path>/python2.7/site-packages/patsy/eval.py", line 125, in eval
code = compile(expr, source_name, "eval", self.flags, False)
File "<string>", line 1
class
^
SyntaxError: unexpected EOF while parsing
I do not see what goes wrong here: as far as I know the string does not contain an EOF character, and the Python code does not seem erroneous either. So the question is: where does it go wrong, and preferably, how do I fix it?
P.S.: All packages used are the most recent stable versions available as of 04/09/2015.
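Well, that was quick. While I was asking the question, the syntax highlighting here showed me that class is a reserved keyword in Python and cannot be used as a variable name. patsy compiles each formula term as a Python expression (see compile(expr, ...) in the traceback), so the bare term class is a syntax error. Nano didn't show that highlighting, so I missed it.
Lesson learnt: kids, don't use class as a variable name.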
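One way to avoid the problem without editing the data by hand is to rename a keyword-named target column before building the formula. This is a sketch, not the asker's code: safe_target_name and build_formula are hypothetical helpers, and the trailing underscore naming is an arbitrary choice. Building the formula with " + ".join also removes the need to slice off a trailing " +".

```python
import keyword
import pandas as pd

def safe_target_name(frame, target):
    """Rename the target column if it is a Python keyword such as 'class',
    because patsy compiles formula terms as Python expressions."""
    if keyword.iskeyword(target):
        new_name = target + "_"
        return frame.rename(columns={target: new_name}), new_name
    return frame, target

def build_formula(frame, target):
    """Build 'target ~ x1 + x2 + ...' from every other column of frame."""
    predictors = [c for c in frame.columns if c != target]
    return "{0} ~ {1}".format(target, " + ".join(predictors))

if __name__ == "__main__":
    df = pd.DataFrame({'a0': [1, 6], 'a1': [2, 7], 'class': [1, 3]})
    frame, target = safe_target_name(df, 'class')
    print(build_formula(frame, target))  # class_ ~ a0 + a1
```

The returned frame and formula can then be passed to dmatrices as before. patsy's documentation also describes a Q("...") helper for quoting awkward names inside a formula, which would avoid renaming; check that it is available in your patsy version.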
I am currently working on a backup program, and I have run into errors while trying to generate a unique file name for a given destination. I call this function in my code as getFileUnique(f,pathtofile(backup+"/"+"../trash/")), where f is the file path; the rest of the variables are pretty straightforward.
def getFileUnique(path,destination):
    path = path.replace("\\","/")
    p = path.split("/")[-1]
    if not os.path.exists(join(destination,p)):
        return destination+p
    j = p.split(".")
    counter = 0
    print(j)
    while os.path.exists(join(destination,j[:-1]+str(counter)+"."+j[-1])):
        print(counter)
        print("asdfsdf")
        counter += 1
    return destination+j[:-1]+str(counter)+"."+j[-1]
Error:
Traceback (most recent call last):
File "C:\Users\Owner\Google Drive\Programs\Dev Enviroment\python\backup\backup.py", line 76, in <module>
main("files","backup")
File "C:\Users\Owner\Google Drive\Programs\Dev Enviroment\python\backup\backup.py", line 73, in main
updateBackup(oldf,newf,reg,backup)
File "C:\Users\Owner\Google Drive\Programs\Dev Enviroment\python\backup\backup.py", line 65, in updateBackup
k = getFileUnique(f,pathtofile(backup+"/"+"../trash/"))
File "C:\Users\Owner\Google Drive\Programs\Dev Enviroment\python\backup\backup.py", line 41, in getFileUnique
while os.path.exists(join(destination,j[:-1]+str(counter)+"."+j[-1])):
TypeError: can only concatenate list (not "str") to list
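j[:-1] is a list (all dot-separated parts except the extension), and Python cannot concatenate a list with a string. Join it back into a string first, and make the same change in the while condition:
return destination + '.'.join(j[:-1]) + str(counter) + "." + j[-1]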
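For reference, here is a sketch of the whole function with that fix applied. It swaps the manual split(".") for os.path.splitext and builds every path with os.path.join, so the existence check and the returned path cannot disagree (the original checks join(destination, p) but returns destination+p, which differ when destination has no trailing slash). get_file_unique is a renamed, hypothetical version of the asker's getFileUnique and has not been tried with their pathtofile helper.

```python
import os

def get_file_unique(path, destination):
    """Return a path in destination, named after path's file, that does not exist yet.

    If the plain name is taken, a counter is inserted before the extension
    (report.txt -> report0.txt, report1.txt, ...), like the original loop.
    """
    name = os.path.basename(path.replace("\\", "/"))
    candidate = os.path.join(destination, name)
    if not os.path.exists(candidate):
        return candidate
    # splitext keeps inner dots in the stem: 'a.b.txt' -> ('a.b', '.txt')
    stem, ext = os.path.splitext(name)
    counter = 0
    while os.path.exists(os.path.join(destination, stem + str(counter) + ext)):
        counter += 1
    return os.path.join(destination, stem + str(counter) + ext)
```

Unlike split("."), splitext also copes with names that have no extension (the counter is simply appended).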
I am just about done writing my first mergesort program, but I run into trouble when I run it. I have done a lot of research on this particular error, and it seems I'm being non-specific somewhere in my code, but I still cannot find the mistake and would love your help. I have attached the file contents, code, and traceback. Thanks again.
File:
999 Message C1
1033 Message C2
1054 Message C3
1056 Message C4
1086 Message C5
Code:
DEBUG = True
out = []
logs = open("C:\Users\----\Desktop\logs.txt", mode ="r")
lines = logs.readline()
def debug(s):
    if DEBUG:
        print "DEBUG: ", s
def get_t (line):
    s = line
    s = s.lstrip()
    debug(s)
    i = s.find(" ")
    debug(s)
    s = s[:i]
    return int(s)
def get_lowest_i(logs):
    lowest_i = -1
    for i in range(len(logs)):
        log = logs[i]
        debug("log=" + repr(log))
        if log:
            t = get_t(log[0])
            debug("t=" + repr(t))
            if lowest_i == -1 or t < lowest_t:
                lowest_i = i
                lowest_t = t
    return lowest_i
def get_line_lowest_t(logs):
    while True:
        i = get_lowest_i(logs)
        if i == -1:
            break
        line = logs[i].pop(0)
def mergesort(logs):
    while True:
        line = get_line_lowest_t(logs)
        if line == None:
            break
        out.append(line)
    return out
print mergesort(logs)
f.close()
Traceback:
Traceback (most recent call last):
File "<module1>", line 50, in <module>
File "<module1>", line 44, in mergesort
File "<module1>", line 37, in get_line_lowest_t
File "<module1>", line 24, in get_lowest_i
TypeError: object of type 'file' has no len()
Thanks in advance.
TypeError: object of type 'file' has no len() says it all: you are trying to take the length of a file object. Since logs = open("C:\Users\----\Desktop\logs.txt", mode ="r") is a file, maybe you meant to read the lines of the file and sort those:
lines = logs.readlines()
print mergesort(lines)
A file object does not support len(). Read it into a string or a list first, then call len() on that.
You are mergesorting the file, not the list called lines.
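Despite its name, the question's code is a k-way merge of logs that are each already sorted by timestamp. Below is a sketch of that same approach with two problems fixed, assuming logs is a list of lists of lines (one inner list per log file): the functions receive lists instead of a file object, and the popped line is actually handed back (get_line_lowest_t in the question never returns it, so mergesort only ever sees None). merge_logs is a hypothetical name; get_t here uses split() instead of find(" "), and the function empties the inner lists as it goes.

```python
def get_t(line):
    """Return the integer timestamp at the start of a line like '999 Message C1'."""
    return int(line.split(None, 1)[0])

def get_lowest_i(logs):
    """Index of the non-empty log whose first line has the smallest timestamp, or -1."""
    lowest_i, lowest_t = -1, None
    for i, log in enumerate(logs):
        if log:
            t = get_t(log[0])
            if lowest_i == -1 or t < lowest_t:
                lowest_i, lowest_t = i, t
    return lowest_i

def merge_logs(logs):
    """Merge several lists of log lines, each already sorted by timestamp."""
    out = []
    while True:
        i = get_lowest_i(logs)
        if i == -1:  # every log is exhausted
            break
        out.append(logs[i].pop(0))
    return out
```

For the single file in the question, the lines could be read with logs.read().splitlines() and passed as merge_logs([lines]). On Python 3.5+, the standard library's heapq.merge(*logs, key=get_t) performs the same merge lazily without modifying the input lists.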