Training Doc2Vec on 20newsgroups dataset. Getting Exception AttributeError: 'str' object has no attribute 'words'

There was a similar question here, Gensim Doc2Vec Exception AttributeError: 'str' object has no attribute 'words', but it didn't get any helpful answers.
I'm trying to train Doc2Vec on the 20newsgroups corpus.
Here's how I build the vocab:
import gensim
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.datasets import fetch_20newsgroups

def get_data(subset):
    newsgroups_data = fetch_20newsgroups(subset=subset, remove=('headers', 'footers', 'quotes'))
    docs = []
    for news_no, news in enumerate(newsgroups_data.data):
        # Whitespace-tokenize each post; skip posts that end up empty
        tokens = gensim.utils.to_unicode(news).split()
        if len(tokens) == 0:
            continue
        sentiment = newsgroups_data.target[news_no]
        # Two tags per document: a per-document ID and the newsgroup label
        tags = ['SENT_' + str(news_no), str(sentiment)]
        docs.append(TaggedDocument(tokens, tags))
    return docs
train_docs = get_data('train')
test_docs = get_data('test')
alldocs = train_docs + test_docs
model = Doc2Vec(dm=dm, size=size, window=window, alpha = alpha, negative=negative, sample=sample, min_count = min_count, workers=cores, iter=passes)
model.build_vocab(alldocs)
Then I train the model and save the result:
model.train(train_docs, total_examples = len(train_docs), epochs = model.iter)
model.train_words = False
model.train_labels = True
model.train(test_docs, total_examples = len(test_docs), epochs = model.iter)
model.save(output)
The problem appears when I try to load the model:
screen (a link to an offsite screenshot of the failing code and the error)
I tried:
using LabeledSentence instead of TaggedDocument
yielding TaggedDocument instead of appending them to the list
setting min_count to 1 so no word would be ignored (just in case)
The problem also occurs on both Python 2 and Python 3.
Please help me solve this.

You've hidden the most important information – the exact code that triggers the error, and the error text itself – in the offsite (imgur) 'screen' link. (That would be the ideal text to cut & paste into the question, rather than other steps that seem to run OK, without triggering the error.)
Looking at that screenshot, there's the line:
model = Doc2Vec("20ng_infer")
...which triggers the error.
Note that none of the arguments as documented for the Doc2Vec() initialization method are a plain string, like the "20ng_infer" argument in the above line – so that's unlikely to do anything useful.
If trying to load a model that was previously saved with model.save(), you should use Doc2Vec.load() – which will take a string describing a local file path from which to load the model. So try:
model = Doc2Vec.load("20ng_infer")
(Note also that larger models might be saved to multiple files, all starting with the string you supplied to save(), and these files must be kept/moved together if you want to load() the model again in the future.)
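To see why a plain string produces this particular message, here is a minimal plain-Python sketch. It assumes, as in gensim's documented signature, that the first positional argument of Doc2Vec() is the iterable of training documents. The TaggedDocument namedtuple and the count_words function are hypothetical stand-ins for illustration, not gensim code.

```python
from collections import namedtuple

# Hypothetical stand-in for gensim's TaggedDocument (illustration only).
TaggedDocument = namedtuple("TaggedDocument", ["words", "tags"])


def count_words(documents):
    """Mimic a vocabulary scan: read the .words of every document."""
    total = 0
    for doc in documents:
        total += len(doc.words)
    return total


# A proper corpus: each item has a .words attribute.
docs = [TaggedDocument(["hello", "world"], ["SENT_0"])]
print(count_words(docs))

# A file name passed where a corpus is expected: iterating over a string
# yields one-character strings, and those have no .words attribute.
try:
    count_words("20ng_infer")
except AttributeError as err:
    message = str(err)
    print(message)
```

Under that assumption, the string is treated as a corpus to scan rather than as a path to open, which is why a dedicated load() call is needed.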

Related

Cannot iterate over AbstractOrderedScalarSet before it has been constructed (initialized)

I have just started with pyomo and Python, and I am trying to create a simple model, but I have a problem adding a constraint.
I followed this example from GitHub:
https://github.com/brentertainer/pyomo-tutorials/blob/master/introduction/02-lp-pyomo.ipynb
import pandas as pd
import pyomo.environ as pe
import pyomo.opt as po
#DATA
T=3;
CH=2;
time = ['t{0}'.format(t+1) for t in range(T)]
CHP=['CHP{0}'.format(s+1) for s in range(CH)]
#Technical characteristic
heat_maxprod = {'CHP1': 250,'CHP2': 250} #Only for CHPS
#MODEL
seq=pe.ConcreteModel
### SETS
seq.CHP = pe.Set(initialize = CHP)
seq.T = pe.Set(initialize = time)
### PARAMETERS
seq.heat_maxprod = pe.Param(seq.CHP, initialize = heat_maxprod) #Max heat production
### VARIABLES
seq.q_DA=pe.Var(seq.CHP, seq.T, domain=pe.Reals)
### CONSTRAINTS
##Maximum and Minimum Heat Production
seq.Heat_DA1 = pe.ConstraintList()
for t in seq.T:
    for s in seq.CHP:
        seq.Heat_DA1.add( 0 <= seq.q_DA[s,t])
seq.Heat_DA2 = pe.ConstraintList()
for t in seq.T:
    for s in seq.CHP:
        seq.Heat_DA2.add( seq.q_DA[s,t] <= seq.heat_maxprod[s])
### OBJECTIVE
# Note: seq.C_fuel and seq.rho_heat are not defined in this excerpt
seq.obj=pe.Objective(expr=sum( seq.C_fuel[s]*(seq.rho_heat[s]*seq.q_DA[s,t]) for t in seq.T for s in seq.CHP))
When I run the program I am getting the following error:
RuntimeError: Cannot iterate over AbstractOrderedScalarSet 'AbstractOrderedScalarSet' before it has been constructed (initialized): 'iter' is an attribute on an Abstract component and cannot be accessed until the component has been fully constructed (converted to a Concrete component) using AbstractModel.create_instance() or AbstractOrderedScalarSet.construct().
Can someone please help with this issue? Thanks!
P.S. I know that the resulting answer for the problem is zero; I just want to make it work in terms of correct syntax.

In this line of code:
seq=pe.ConcreteModel
You are missing the parentheses. So, I think you are just creating an alias for the class instead of instantiating it.
Try:
seq=pe.ConcreteModel()
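The difference can be illustrated without pyomo. The sketch below uses a hypothetical ModelStandIn class in place of pe.ConcreteModel; it only shows what the missing parentheses change in plain Python, not how pyomo constructs its components.

```python
class ModelStandIn:
    """Hypothetical stand-in for a model class such as pe.ConcreteModel."""

    def __init__(self):
        # Per-instance state is only created when the class is called.
        self.components = []


alias = ModelStandIn      # no parentheses: just another name for the class
model = ModelStandIn()    # parentheses: a new instance, __init__ has run

print(alias is ModelStandIn)
print(isinstance(model, ModelStandIn))
print(hasattr(alias, "components"))
print(hasattr(model, "components"))
```

Without the call, no instance is ever initialized, which is consistent with the error complaining that a component has not been constructed yet.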

Error running flopy.modflow.HeadObservation: ValueError: Can't cast from structure to non-structure, except if the structure only has a single field

I am using Flopy to set up a MODFLOW model in Python 2.7. I am trying to add head observations via the HOB package. The following example code is taken directly from the function documentation at https://modflowpy.github.io/flopydoc/mfhob.html:
import flopy
model = flopy.modflow.Modflow()
dis = flopy.modflow.ModflowDis(model, nlay=1, nrow=11, ncol=11,
                               nper=2, perlen=[1,1])
obs = flopy.modflow.mfhob.HeadObservation(model, layer=0, row=5,
                                          column=5,
                                          time_series_data=[[1.,54.4],
                                                            [2., 55.2]])
Using this example code for the function, I am getting the following error:
ValueError: Can't cast from structure to non-structure, except if the structure only has a single field.
I get the same error when I try to create a head observation for my model, which is steady-state and has some different input values. Unfortunately, I haven't been able to find a working example to compare with. Any ideas?
Edit: jdhughes's code works like a charm, but I had also neglected to update Flopy to the most recent version. I tried updating numpy first, but the ValueError didn't go away until I updated Flopy from 3.2.8 to 3.2.9. Works now, thank you!

You need to create one or more HeadObservation instances and pass them to ModflowHob. An example with two observation locations is shown below.
# create a new hob object (mf is the existing flopy.modflow.Modflow model)
obs_data = []
# observation location 1
tsd = [[1., 1.], [87163., 2.], [348649., 3.],
       [871621., 4.], [24439070., 5.], [24439072., 6.]]
names = ['o1.1', 'o1.2', 'o1.3', 'o1.4', 'o1.5', 'o1.6']
obs_data.append(flopy.modflow.HeadObservation(mf, layer=0, row=2, column=0,
                                              time_series_data=tsd,
                                              names=names, obsname='o1'))
# observation location 2
tsd = [[0., 126.938], [87163., 126.904], [871621., 126.382],
       [871718.5943, 115.357], [871893.7713, 112.782]]
names = ['o2.1', 'o2.2', 'o2.3', 'o2.4', 'o2.5']
obs_data.append(flopy.modflow.HeadObservation(mf, layer=0, row=3, column=3,
                                              time_series_data=tsd,
                                              names=names, obsname='o2'))
# pass the list of observations to the HOB package
hob = flopy.modflow.ModflowHob(mf, iuhobsv=51, obs_data=obs_data)
Will submit an issue to update the documentation and docstrings.

Django - Search matches with all objects - even if they don't actually match

This is the model that has to be searched:
class BlockQuote(models.Model):
    debate = models.ForeignKey(Debate, related_name='quotes')
    speaker = models.ForeignKey(Speaker, related_name='quotes')
    text = models.TextField()
I have around a thousand instances in the database on my laptop (and around 50,000 on the production server).
I am creating a 'manage.py' command that searches the database and returns all 'BlockQuote' objects whose text field contains the keyword.
I am doing this with Django's (1.11) Postgres search options in order to use the 'rank' attribute, which sounds like something that would come in handy. I used the official Django full-text search documentation for the code below.
Yet when I run this code, it matches all objects, regardless of whether BlockQuote.text actually contains the query.
def handle(self, *args, **options):
    vector = SearchVector('text')
    query = options['query'][0]
    # Use a distinct local name so the Search_Instance model is not shadowed
    search_instance = Search_Instance.objects.create(query=query)
    results = BlockQuote.objects.annotate(rank=SearchRank(vector, query)).order_by('-rank')
    for result in results:
        # objects.create() already saves the new row
        QueryMatch.objects.create(quote=result, query=search_instance)
Does anyone have an idea of what I am doing wrong?

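I don't see you ever actually filtering. annotate() only attaches a rank to each object and order_by() only sorts them, so every object is still returned. Filter on the rank, for example: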
BlockQuote.objects.annotate(...).filter(rank__gte=0.5)
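The distinction between ranking and filtering can be sketched in plain Python. This is only an analogy and not Django ORM code: the rank function, the sample quotes, and the zero threshold are made up for illustration, and real SearchRank scores are computed by Postgres.

```python
def rank(text, query):
    """Toy relevance score: fraction of words in text equal to query."""
    words = text.lower().split()
    return words.count(query.lower()) / len(words) if words else 0.0


quotes = [
    "the budget debate continues",
    "budget cuts and more budget cuts",
    "nothing relevant here",
]
query = "budget"

# Analogue of annotate(rank=...).order_by('-rank'): every quote is kept,
# including the one that scores zero.
ranked = sorted(((rank(q, query), q) for q in quotes), reverse=True)

# Analogue of adding a filter on the rank: non-matching quotes are dropped.
matches = [(score, q) for score, q in ranked if score > 0]

print(len(ranked))
print(len(matches))
```

Sorting alone leaves all three quotes in the result, and only the explicit threshold removes the one that does not mention the query. A suitable threshold for real rank values depends on the data.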

Saving image file field manually in django

I am trying to save a preview image generated by the "preview_generator" Python app.
But I am getting an IntegrityError: duplicate key value violates unique constraint "users_material_pkey". I've tried many things, but nothing seems to work.
If I call super at the end of save, I don't get the material_file URL or path.

Remove the line material = self from your code and use the following instead:
obj = super(Material, self).save(force_update=False, using=None,update_fields=None)
material = obj

Python - Lotus Notes (Sending Email)

I am trying to use Python 2.7.3.2 to send an email through Lotus Notes 8.5.
There are plenty of examples of how to do this in other languages, and I've done it myself in VBA, but I am having difficulties with Python.
self.db = self.session.getDatabase(server, dbfile)
# ...
mailDoc = self.db.CreateDocument
mailDoc.Form = "Memo"
mailDoc.sendto = recipientList
mailDoc.subject = subject
mailDoc.Body = bodytext
Error returned: AttributeError: Property 'CreateDocument.Form' can not be set.
I have attempted to skip setting the form, but it also fails on setting any of these attributes.
Does anyone have code for this, or suggestions on what to try?

I know nothing about Python, but my guess is that the shorthand notation document.item = "foo" for setting an item value is not supported. Most likely, you need to do this:
mailDoc.AppendItemValue("Form","Memo")
(You can also use ReplaceItemValue, which is equivalent for a newly created document, and also works for updating existing documents, so many people prefer to just remember the one method name.)
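Applied to all four fields from the question, the same idea might look like the sketch below. It assumes the document object exposes ReplaceItemValue(name, value) as described above. The fill_memo helper and the FakeNotesDocument class are hypothetical; the fake exists only so the example runs without a Notes installation.

```python
def fill_memo(doc, recipients, subject, body):
    """Set the memo items through explicit ReplaceItemValue calls."""
    doc.ReplaceItemValue("Form", "Memo")
    doc.ReplaceItemValue("sendto", recipients)
    doc.ReplaceItemValue("subject", subject)
    doc.ReplaceItemValue("Body", body)
    return doc


class FakeNotesDocument:
    """Hypothetical stand-in for a Notes document, for illustration only."""

    def __init__(self):
        self.items = {}

    def ReplaceItemValue(self, name, value):
        # Record the item instead of talking to a real Notes session.
        self.items[name] = value


doc = fill_memo(FakeNotesDocument(), ["a@example.com"], "Hello", "Body text")
print(doc.items["Form"])
```

With a real session, the document returned by the database's CreateDocument call would take the place of the fake object.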
