Passing list or ndarray as feature_column in DNNClassifier - tensorflow-estimator

import tensorflow as tf
import pandas as pd

a = [[1, 1], [2, 2], [3, 3]]
b = [11, 22, 33]
mydata = pd.DataFrame({'images': a, 'labels': b})

feature_columns = [tf.feature_column.numeric_column('images', shape=[1, 1])]

train_input_fn = tf.estimator.inputs.pandas_input_fn(x=mydata,
                                                     y=mydata['labels'],
                                                     batch_size=60,
                                                     num_epochs=1,
                                                     shuffle=True)

estimator = tf.estimator.DNNClassifier(hidden_units=[64, 32, 16],
                                       feature_columns=feature_columns,
                                       n_classes=2)

estimator.train(input_fn=train_input_fn, steps=100)
The error I am getting is:
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InternalError'>, Unable to get element as bytes.
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmptver1w_k/model.ckpt.
TypeError Traceback (most recent call last)
TypeError: expected bytes, list found
Reading through multiple Stack Overflow pages and GitHub issues, it seems to have something to do with saving_listeners, but I'm not able to figure it out.
Please help.

TF Estimator expects bytes as the input for x.
Try this; it should get you past this error:
a = [bytes([1,1]), bytes([2,2]), bytes([3,3])]
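Alternatively, if bytes aren't really what you want to feed the network, here is a minimal sketch (my own assumption, not part of the answer above) that sidesteps the object-dtype pandas column by passing the 2-element features as a float ndarray through tf.estimator.inputs.numpy_input_fn; the 0/1 labels are illustrative, since DNNClassifier with n_classes=2 expects class ids in [0, 2):
import numpy as np
import tensorflow as tf

# Features as a numeric ndarray instead of a DataFrame column of Python lists.
images = np.array([[1, 1], [2, 2], [3, 3]], dtype=np.float32)
labels = np.array([0, 1, 0], dtype=np.int32)  # illustrative binary labels

# shape=[2] matches the length of each feature row.
feature_columns = [tf.feature_column.numeric_column('images', shape=[2])]

train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'images': images},
    y=labels,
    batch_size=3,
    num_epochs=1,
    shuffle=True)

estimator = tf.estimator.DNNClassifier(hidden_units=[64, 32, 16],
                                       feature_columns=feature_columns,
                                       n_classes=2)

estimator.train(input_fn=train_input_fn, steps=100)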

Concatenation of 2 strings failing because of incompatible operands error returned by PyCharm

I'm trying to run the following code:
# -*- coding: utf8 -*-
import requests
from bs4 import BeautifulSoup

link = "https://www.emploi-public.ma/ar/index.asp?p="
number_of_jobs = 0
houceima = u"الحسيمة"
print type(houceima)

for i in range(1, 3):
    page_link = link + str(i)
    print page_link
    emp_pub = requests.get(page_link)
    soup = BeautifulSoup(emp_pub.content, "lxml")
    for link in soup.find_all("a"):
        if houceima in link:
            print link
But I'm getting the following error:
Traceback (most recent call last):
File "scrape_houceima", line 9, in <module>
page_link = link+str(i)
TypeError: unsupported operand type(s) for +: 'Tag' and 'str'
I'm using PyCharm. I mention my IDE because the same concatenation page_link = link+str(i) worked fine in IDLE.
What could be the problem here?
You re-used link in your code:
link = "https://www.emploi-public.ma/ar/index.asp?p="
and
for link in soup.find_all("a"):
The second use rebinds the name, so link is no longer a string object but a Tag object.
Don't shadow variables like that; rename one or the other. Perhaps the first could be named base_url?
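A minimal sketch of that rename, keeping the rest of the question's Python 2 code unchanged:
base_url = "https://www.emploi-public.ma/ar/index.asp?p="

for i in range(1, 3):
    page_link = base_url + str(i)    # base_url stays a string
    print page_link
    emp_pub = requests.get(page_link)
    soup = BeautifulSoup(emp_pub.content, "lxml")
    for link in soup.find_all("a"):  # link is now free to hold Tag objects
        if houceima in link:
            print link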

Keras: VGG16 -- Error in `decode_predictions'

I am trying to perform an image classification task using a pre-trained VGG16 model in Keras. The code I wrote, following the instructions on the Keras applications page, is:
from keras.applications.vgg16 import VGG16
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input, decode_predictions
import numpy as np
model = VGG16(weights='imagenet', include_top=True)
img_path = './train/cat.1.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
features = model.predict(x)
(inID, label) = decode_predictions(features)[0]
which is quite similar to the code shown in this question already asked on the forum. But despite having the include_top parameter set to True, I am getting the following error:
Traceback (most recent call last):
File "vgg16-keras-classifier.py", line 14, in <module>
(inID, label) = decode_predictions(features)[0]
ValueError: too many values to unpack
Any help will be deeply appreciated! Thanks!
It's because (according to the function definition, which can be found here) decode_predictions returns, for each input image, a list of (class_name, class_description, score) triples, the top 5 by default. That is why unpacking the result into just (inID, label) complains that there are too many values.
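A minimal sketch of consuming that structure (assuming the standard Keras decode_predictions signature with a top argument):
# decode_predictions returns one list per input image; each entry in that
# list is a (class_name, class_description, score) tuple.
preds = decode_predictions(features, top=3)[0]  # predictions for the single image
for class_name, class_description, score in preds:
    print('%s: %.3f' % (class_description, score))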

Jupyter string tokenization for Python

I'm trying to implement simple_tokenize using the dictionary output from my previous code, but I get an error message. Any assistance with the following code would be much appreciated. I'm using Python 2.7 in Jupyter.
import csv

reader = csv.reader(open('data.csv'))
dictionary = {}
for row in reader:
    key = row[0]
    dictionary[key] = row[1:]
print dictionary
The above works pretty well, but the issue is with the following:
import re

words = dictionary
split_regex = r'\W+'

def simple_tokenize(string):
    for i in rows:
        word = words.split
        #pass
        print word
I get this error:
NameError Traceback (most recent call last)
<ipython-input-2-0d0e05fb1556> in <module>()
1 import re
2
----> 3 words = dictionary
4 split_regex = r'\W+'
5
NameError: name 'dictionary' is not defined
Variables are not saved between Jupyter sessions unless you explicitly save them yourself. Thus, if you ran the first code block, quit your Jupyter session, started a new session, and then ran the second block, dictionary is not preserved from the first session and is therefore undefined, as the error indicates.
If you are running the code blocks differently (e.g., not across separate Jupyter sessions), you should say so, but the tags and traceback suggest this is what happened.
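If you do want the dictionary to survive across sessions, one way to save it explicitly is a pickle round-trip (a sketch, not part of the original answer; the filename is arbitrary):
import pickle

# End of the first session: write the dictionary to disk.
with open('dictionary.pkl', 'wb') as f:
    pickle.dump(dictionary, f)

# Start of a later session: load it back before using it.
with open('dictionary.pkl', 'rb') as f:
    dictionary = pickle.load(f)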

Memory error in Word2vec while loading freebase-skipgram model

I am trying to use word2vec with the freebase skip-gram model, but I'm unable to load the model due to a memory error.
Here is the code snippet:
model = gensim.models.Word2Vec()
model = models.Word2Vec.load_word2vec_format('freebase-vectors-skipgram1000.bin.gz', binary=True)
I'm getting the following error:
MemoryError Traceback (most recent call last)
<ipython-input-40-a1cfacf48c94> in <module>()
1 model = gensim.models.Word2Vec()
----> 2 model = models.Word2Vec.load_word2vec_format('freebase-vectors-skipgram1000.bin.gz', binary=True)
/../../word2vec.pyc in load_word2vec_format(cls, fname, fvocab, binary, norm_only)
583 vocab_size, layer1_size = map(int, header.split()) # throws for invalid file format
584 result = Word2Vec(size=layer1_size)
--> 585 result.syn0 = zeros((vocab_size, layer1_size), dtype=REAL)
586 if binary:
587 binary_len = dtype(REAL).itemsize * layer1_size
MemoryError:
But the same thing works fine with the Google News model, using the following code:
model = gensim.models.Word2Vec()
model = models.Word2Vec.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz', binary=True)
I am unable to understand why. Does the freebase model require much more memory than the Google News one? I feel that shouldn't be the case. Am I missing something here?
I figured this out: it was due to the memory requirements of the freebase model. When run on an 8 GB machine with other IPython notebooks open, it gave me this error. Closing all other processes and notebooks finally allowed me to load it!
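To see roughly how much memory the load needs before attempting it, you can read just the header of the binary file, since the traceback shows the failure happens while allocating the (vocab_size, layer1_size) float32 matrix (a sketch based on the word2vec binary format, not part of the original answer):
import gzip

with gzip.open('freebase-vectors-skipgram1000.bin.gz', 'rb') as f:
    header = f.readline()
    vocab_size, layer1_size = map(int, header.split())

# syn0 is float32, i.e. 4 bytes per weight.
bytes_needed = vocab_size * layer1_size * 4
print('syn0 alone needs about %.1f GB' % (bytes_needed / 1024.0 ** 3))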

Error with GDAL

I have tried to run this script from Rutger Kassies.
import gdal
import matplotlib.pyplot as plt
ds = gdal.Open('HDF4_SDS:sample:"A2002037045000.L2_LAC.SAMPLE.hdf":01')
data = ds.ReadAsArray()
ds = None
fig, ax = plt.subplots(figsize=(6,6))
ax.imshow(data[0,:,:], cmap=plt.cm.Greys, vmin=1000, vmax=6000)
But then an error always occurred:
Traceback (most recent call last):
File "D:\path\to\python\stackoverflow.py", line 5, in <module>
data = ds.ReadAsArray()
AttributeError: 'NoneType' object has no attribute 'ReadAsArray'
What's wrong with the script? Am I missing something? To install GDAL I followed these instructions: http://pythongisandstuff.wordpress.com/2011/07/07/installing-gdal-and-ogr-for-python-on-windows/
I am using Windows 7 / 32-bit / Python 2.7.
Thanks!
gdal.Open() is failing and returning None. This produces the sometimes counterintuitive message "'NoneType' object has no attribute ...". Quoting from Python: Attribute Error - 'NoneType' object has no attribute 'something': "NoneType means that instead of an instance of whatever Class or Object you think you're working with, you've actually got None. That usually means that an assignment or function call up above failed or returned an unexpected result."
Apparently GDAL is correctly installed. It could be that the file is not readable or that there is an issue with the HDF driver. Are you getting any error message like:
`HDF4_SDS:sample:"A2002037045000.L2_LAC.SAMPLE.hdf":01' does not exist in the file system, and is not recognised as a supported dataset name.
To get additional information you can try something like this instead of the gdal.Open() line in your script:
gdal.UseExceptions()
ds = None
try:
    ds = gdal.Open('HDF4_SDS:sample:"A2002037045000.L2_LAC.SAMPLE.hdf":01')
except RuntimeError, err:
    print "Exception: ", err
    exit(1)
Also, there's an extra '}' at the end of the script.
By default, osgeo.gdal returns None on error, and does not normally raise informative exceptions. You can change this with gdal.UseExceptions().
Try something like this:
from osgeo import gdal

gdal.UseExceptions()
source_path = r'HDF4_SDS:sample:"D:\path\to\file\A2002037045000.L2_LAC.SAMPLE.hdf":01'
try:
    ds = gdal.Open(source_path)
except RuntimeError as ex:
    raise IOError(ex)
The last bit just re-raises the exception as an IOError rather than a RuntimeError.
The solution is to modify source_path to a working path to your data source; with the placeholder path above, for example, I see:
IOError: `HDF4_SDS:sample:"A2002037045000.L2_LAC.SAMPLE.hdf":01' does not exist in the file system, and is not recognised as a supported dataset name.
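If the corrected path still fails, one way to find the exact subdataset name to pass to gdal.Open() is to open the container .hdf file itself and list its subdatasets (a sketch using GDAL's GetSubDatasets(); the path is the same placeholder used above):
from osgeo import gdal

gdal.UseExceptions()

# Open the container file, not the HDF4_SDS:... subdataset string.
hdf = gdal.Open(r'D:\path\to\file\A2002037045000.L2_LAC.SAMPLE.hdf')
for name, description in hdf.GetSubDatasets():
    print name, '->', description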