I am trying to use word2vec with the Freebase skip-gram model, but I'm unable to load the model due to a memory error.
Here is the code snippet:
model = gensim.models.Word2Vec()
model = models.Word2Vec.load_word2vec_format('freebase-vectors-skipgram1000.bin.gz', binary=True)
I'm getting the following error:
MemoryError Traceback (most recent call last)
<ipython-input-40-a1cfacf48c94> in <module>()
1 model = gensim.models.Word2Vec()
----> 2 model = models.Word2Vec.load_word2vec_format('freebase-vectors-skipgram1000.bin.gz', binary=True)
/../../word2vec.pyc in load_word2vec_format(cls, fname, fvocab, binary, norm_only)
583 vocab_size, layer1_size = map(int, header.split()) # throws for invalid file format
584 result = Word2Vec(size=layer1_size)
--> 585 result.syn0 = zeros((vocab_size, layer1_size), dtype=REAL)
586 if binary:
587 binary_len = dtype(REAL).itemsize * layer1_size
MemoryError:
But the same thing works fine with the Google News model, using the following code:
model = gensim.models.Word2Vec()
model = models.Word2Vec.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz', binary=True)
I am unable to understand why. Does Freebase require much more memory than Google News? I feel that shouldn't be the case. Am I missing something here?
I figured this out: it was due to the memory requirements of the Freebase model. When run on an 8 GB machine with other IPython notebooks open, this gave me the error. Closing all other processes and notebooks finally allowed me to load it!
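As a rough sanity check (a sketch of mine, not from the original post), you can read the header of the binary file to see how large a matrix gensim will try to allocate before attempting the full load; the traceback above shows it allocates a float32 matrix of shape (vocab_size, layer1_size):

import gzip

def estimate_syn0_gb(path):
    # The first line of the word2vec binary format is "vocab_size layer1_size".
    with gzip.open(path, 'rb') as f:
        vocab_size, layer1_size = map(int, f.readline().split())
    # gensim allocates a float32 (4 bytes per value) matrix of that shape.
    return vocab_size * layer1_size * 4 / 1024.0 ** 3

print(estimate_syn0_gb('freebase-vectors-skipgram1000.bin.gz'))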
I am learning the Google Natural Language API by following a sample Python script and running it in Google Colab; however, I am getting an error.
# Create the document
document = {
    "type_": "PLAIN_TEXT",
    "language": "en",
    "content": "Hello World. I love you! I hate you!"
}
results = client.analyze_sentiment(
    document=document,
    encoding_type=language_v1.EncodingType.UTF8)
This is the error I am getting:
AttributeError Traceback (most recent call last)
<ipython-input-26-17157fa1dfc4> in <module>()
8 results = client.analyze_sentiment(
9 document = document,
---> 10 encoding_type = language_v1.EncodingType.UTF8)
AttributeError: module 'google.cloud.language_v1' has no attribute 'EncodingType'
This seems like a straightforward error, but after trying different options, such as removing encoding_type, it still does not work.
I have found the root cause. There are two issues in the sample code:
"type_" is not valid; it should be changed to "type".
encoding_type should just be set to the string "UTF8" directly, e.g. encoding_type = "UTF8".
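Putting both fixes together, the call would look like this (a sketch applying the two fixes above; the client construction is assumed from the standard quickstart, and the final print is just for illustration):

from google.cloud import language_v1

client = language_v1.LanguageServiceClient()

document = {
    "type": "PLAIN_TEXT",  # fix 1: "type" instead of "type_"
    "language": "en",
    "content": "Hello World. I love you! I hate you!"
}

# Fix 2: pass the encoding as a plain string.
results = client.analyze_sentiment(
    document=document,
    encoding_type="UTF8")

print(results.document_sentiment.score)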
import tensorflow as tf
import pandas as pd

a = [[1,1],[2,2],[3,3]]
b = [11,22,33]
mydata = pd.DataFrame({'images': a, 'labels': b})

feature_columns = [tf.feature_column.numeric_column('images', shape=[1,1])]
train_input_fn = tf.estimator.inputs.pandas_input_fn(x=mydata,
                                                     y=mydata['labels'],
                                                     batch_size=60,
                                                     num_epochs=1,
                                                     shuffle=True)
estimator = tf.estimator.DNNClassifier(hidden_units=[64,32,16],
                                       feature_columns=feature_columns,
                                       n_classes=2)
estimator.train(input_fn=train_input_fn, steps=100)
The error I am getting is:
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InternalError'>, Unable to get element as bytes.
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmptver1w_k/model.ckpt.
TypeError Traceback (most recent call last)
TypeError: expected bytes, list found
Reading through multiple Stack Overflow pages and GitHub issues, it seems to have something to do with saving_listeners, but I'm not able to figure it out.
Please help.
TF Estimator expects bytes as the input for x.
Try this; it should get you past this error:
a = [bytes([1,1]), bytes([2,2]), bytes([3,3])]
I am trying to perform an image classification task using a pre-trained VGG16 model in Keras. The code I wrote, following the instructions in the Keras application page, is:
from keras.applications.vgg16 import VGG16
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input, decode_predictions
import numpy as np
model = VGG16(weights='imagenet', include_top=True)
img_path = './train/cat.1.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
features = model.predict(x)
(inID, label) = decode_predictions(features)[0]
which is quite similar to the code shown in this question already asked on the forum. But in spite of having the include_top parameter set to True, I am getting the following error:
Traceback (most recent call last):
File "vgg16-keras-classifier.py", line 14, in <module>
(inID, label) = decode_predictions(features)[0]
ValueError: too many values to unpack
Any help will be deeply appreciated! Thanks!
It's because (according to the function definition, which can be found here) decode_predictions returns a triple (class_name, class_description, score) for each prediction rather than a pair. That is why it complains that there are too many values to unpack.
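A minimal sketch of how to unpack it instead (assuming the default top=5 predictions per image):

# decode_predictions(features)[0] holds the predictions for the first
# (and here only) image; each entry is a (name, description, score) triple.
for class_name, class_description, score in decode_predictions(features)[0]:
    print(class_description, score)

# Or keep only the best prediction:
inID, label, score = decode_predictions(features, top=1)[0][0]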
Running the script below works for 60% of the entries from the MasterGroupList, but then it suddenly fails with the error below. Although my questions may seem poor, you guys have been able to help me before. Any idea how I can avoid this error, or what is throwing off the script? The MasterGroupList looks like:
Groups Pulled from AD
SET00 POWERUSER
SET00 USERS
SEF00 CREATORS
SEF00 USERS
...another 300 entries...
Error:
Traceback (most recent call last):
  File "C:\Users\ks185278\OneDrive - NCR Corporation\Active Directory Access Script\test.py", line 44, in <module>
    print group.member
  File "C:\Python27\lib\site-packages\active_directory.py", line 805, in __getattr__
    raise AttributeError
AttributeError
Code:
from active_directory import *
import os

file = open("C:\Users\NAME\Active Directory Access Script\MasterGroupList.txt", "r")
fileAsList = file.readlines()
indexOfTitle = fileAsList.index("Groups Pulled from AD\n")
i = indexOfTitle + 1
while i <= len(fileAsList):
    fileLocation = 'C:\\AD Access\\%s\\%s.txt' % (fileAsList[i][:5], fileAsList[i][:fileAsList[i].find("\n")])
    # Creates the dir if it does not exist already
    if not os.path.isdir(os.path.dirname(fileLocation)):
        os.makedirs(os.path.dirname(fileLocation))
    fileGroup = open(fileLocation, "w+")
    # Writes group members to the open file
    group = find_group(fileAsList[i][:fileAsList[i].find("\n")])
    print group.member
    for group_member in group.member: # this is line 44
        fileGroup.write(group_member.cn + "\n")
    fileGroup.close()
    i += 1
Disclaimer: I don't know Python, but I know Active Directory fairly well.
If it's failing on this:
for group_member in group.member:
it could mean that the group has no members.
Depending on how Python handles this, it could also mean that the group has only one member and group.member is a plain string rather than an array.
What does print group.member show?
The source code of active_directory.py is here: https://github.com/tjguk/active_directory/blob/master/active_directory.py
These are the relevant lines:
if name not in self._delegate_map:
    try:
        attr = getattr(self.com_object, name)
    except AttributeError:
        try:
            attr = self.com_object.Get(name)
        except:
            raise AttributeError
So it looks like it just can't find the attribute you're looking up, which in this case looks like the 'member' attribute.
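Building on both answers, one defensive option (a hypothetical sketch adapting the question's loop; groupName stands for the name pulled from the list) is to treat a missing member attribute as an empty group:

group = find_group(groupName)
try:
    members = group.member
except AttributeError:
    members = []  # the group has no members, or the lookup failed
for group_member in members:
    fileGroup.write(group_member.cn + "\n")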
I'm trying to implement simple_tokenize, using the dictionary that is the output of my previous code, but I get an error message. Any assistance with the following code would be much appreciated. I'm using Python 2.7 in Jupyter.
import csv

reader = csv.reader(open('data.csv'))
dictionary = {}
for row in reader:
    key = row[0]
    dictionary[key] = row[1:]
print dictionary
The above works pretty well, but the issue is with the following:
import re

words = dictionary
split_regex = r'\W+'

def simple_tokenize(string):
    for i in rows:
        word = words.split
        #pass
    print word
I get this error:
NameError Traceback (most recent call last)
<ipython-input-2-0d0e05fb1556> in <module>()
1 import re
2
----> 3 words = dictionary
4 split_regex = r'\W+'
5
NameError: name 'dictionary' is not defined
Variables are not saved between Jupyter sessions unless you explicitly save them yourself. So if you ran the first code block, quit your Jupyter session, started a new session and then ran the second block, dictionary was not carried over from the first session and is therefore undefined, as the error indicates.
If you ran the two code blocks some other way (e.g., not across separate Jupyter sessions), you should say so, but the tags and the traceback suggest this is what happened.
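A minimal sketch of one way to carry the dictionary across sessions (assuming writing it to disk with pickle is acceptable):

import pickle

# At the end of the first session: save the dictionary to disk.
with open('dictionary.pkl', 'wb') as f:
    pickle.dump(dictionary, f)

# At the start of the next session: load it back before using it.
with open('dictionary.pkl', 'rb') as f:
    dictionary = pickle.load(f)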