Haystack-Whoosh not indexing any documents

I followed the Haystack tutorial to set up the Whoosh backend:
$ pip install whoosh
settings.py:
import os
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.whoosh_backend.WhooshEngine',
        'PATH': os.path.join(os.path.dirname(__file__), 'whoosh_index'),
    },
}
but the index comes back empty:
>>> list(ix.searcher().documents())
[]
Here is my search_indexes.py:
from haystack import indexes
from view_links.models import Projdb
class ProjdbIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    title = indexes.CharField(model_attr='title')
    author = indexes.CharField(model_attr='owner')
    # pub_date = indexes.DateTimeField(model_attr='date_start')
    def get_model(self):
        return Projdb
    def index_queryset(self, using=None):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.all()  # .filter(pub_date__lte=datetime.datetime.now())
I previously got results with Elasticsearch, but since switching to Whoosh I get none.
Thank you for your time. If you require further information, please let me know.
EDIT:
I am getting results now. Here are two things I learned:
1. The app whose model is being indexed must be registered.
2. If a model's class name is misspelled in search_indexes.py, running python manage.py rebuild_index does not raise any error; you simply get zero indexed objects.

Did you run the command?
./manage.py rebuild_index
Do you have any Projdb records?
You have this in your code:
text = indexes.CharField(document=True, use_template=True)
Have you set up the corresponding template (projdb_text.txt)?
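
For reference, with use_template=True Haystack looks for the document template at search/indexes/<app_label>/<model_name>_<field_name>.txt inside a template directory. Below is a minimal sketch of that naming rule in plain Python. The helper name expected_template_path is made up for illustration, and the app label view_links is assumed from the import in the question.

```python
def expected_template_path(app_label, model_name, field_name="text"):
    """Build the relative template path implied by Haystack's naming rule.

    Hypothetical helper, not part of Haystack. It only mirrors the pattern
    search/indexes/<app_label>/<model_name>_<field_name>.txt.
    """
    filename = "{}_{}.txt".format(model_name.lower(), field_name)
    return "/".join(["search", "indexes", app_label, filename])


# App label assumed from "from view_links.models import Projdb"
print(expected_template_path("view_links", "Projdb"))
```

If no file exists at that relative path in any of your template directories, the document field has nothing to render from.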

Related

I want to upload a xml file into a PostgreSQL using Django

I am new to Django, and my current task is to upload an XML file with 16 fields and more than 60,000 rows to a PostgreSQL database. I used Django to connect to the database and was able to create a table in it.
I also used xml.etree.ElementTree to parse the XML file, but I am having trouble storing the data in the table I created.
This is the code that I used to parse the file:
import xml.etree.ElementTree as ET
def saveXML2db():
    my_file = "C:/Users/Adithyas/myproject/scripts/supplier_lookup.xml"
    tree = ET.parse(my_file)
    root = tree.getroot()
    cols = ["organization", "code", "name"]
    rows = []
    for i in root:
        organization = i.find("organization").text
        code = i.find("code").text
        name = i.find("name").text
        x = rows.append([organization, code, name])
        data = """INSERT INTO records(organization,code,name) VALUES(%s,%s,%s)"""
        x.save()
saveXML2db()
The code runs without any error, but no data is stored in the table in the database.

I figured out the answer to my question, so I am sharing it here.
This is how I imported an XML file into a PostgreSQL database using the Django ORM.
First, create a virtual environment. Open a command prompt in the folder where you want the project and run:
py -m venv envy
envy\Scripts\activate
The virtual environment is now ready to use. Next, install the dependencies and create the project and the app:
pip install django
pip install psycopg2
django-admin startproject projectq
cd projectq
py manage.py startapp myapp
Both the project and the app now exist. Open the project in Visual Studio Code:
code .
In the settings.py of 'projectq', add 'myapp' to INSTALLED_APPS:
INSTALLED_APPS = [
    # ... keep the existing entries ...
    'myapp',  # add myapp to the installed apps
]
To connect the project to the PostgreSQL database, change DATABASES in settings.py as well:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'projectq',
        'USER': 'postgres',
        'PASSWORD': '1234',
    }
}
Replace the default SQLite settings with the PostgreSQL engine, and fill in the name of your database, your username, and your password. (Django 3.0 and later no longer accept the postgresql_psycopg2 name; use 'django.db.backends.postgresql' there.)
With the connection configured, go to models.py and define the table that will store the XML data:
from django.db import models
# Create your models here.
class Record(models.Model):
    po_organization = models.IntegerField()
    code = models.CharField(max_length=100)
    name = models.CharField(max_length=100)
    address_1 = models.CharField(max_length=100, null=True)
    address_2 = models.CharField(max_length=100, null=True)
If your data has null values, it is best to add null=True to avoid errors. Then create and apply the migrations:
py manage.py makemigrations
py manage.py migrate
The table should now appear in the PostgreSQL database.
The next step is to parse the XML file and import it into the table, using Django ORM queries.
Open a terminal in Visual Studio Code, activate the virtual environment again, and start the Django shell:
py manage.py shell
Then enter the following in the interactive console:
>>> from myapp.models import Record
>>> import xml.etree.ElementTree as ET
>>> def data2db():
...     file_dir = 'supplier_lookup.xml'
...     data = ET.parse(file_dir)
...     root = data.findall('record')
...     for i in root:
...         organization = i.find('organization').text
...         code = i.find('code').text
...         name = i.find('name').text
...         address_1 = i.find('address_1').text
...         address_2 = i.find('address_2').text
...         # The model field is named po_organization; create() already saves the row
...         Record.objects.create(po_organization=organization, code=code,
...                               name=name, address_1=address_1, address_2=address_2)
...
>>> data2db()
That's it. The data should now be loaded into the database.
Hope this helps.
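
One caveat with the loop above: i.find('address_2').text raises AttributeError when a record has no such element. A minimal sketch of a more forgiving parse step follows, assuming the same layout (a root element wrapping <record> children). The function name parse_records and the sample data are invented for illustration.

```python
import xml.etree.ElementTree as ET

FIELDS = ["organization", "code", "name", "address_1", "address_2"]


def parse_records(xml_text):
    """Return one dict per <record>, with None for elements that are absent."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.findall("record"):
        # findtext() returns the default instead of failing on a missing child
        rows.append({field: record.findtext(field, default=None) for field in FIELDS})
    return rows


# Invented sample data; the second record has no address elements
SAMPLE = """
<records>
  <record><organization>1</organization><code>A1</code><name>Acme</name><address_1>Main St</address_1></record>
  <record><organization>2</organization><code>B2</code><name>Bolt</name></record>
</records>
"""

rows = parse_records(SAMPLE)
print(rows[1]["address_1"])
```

Each dict could then be mapped onto the Record fields (organization going to po_organization). For 60,000 rows, building the objects first and passing them to Record.objects.bulk_create is likely to be much faster than one create() call per row.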

Have you checked any Python/PostgreSQL examples? Your code should look something like this (untested):
import psycopg2
def storeXmlToPostgres(xmldata):
    with psycopg2.connect(host="dbhost", database="dbname", user="username", password="password") as conn:
        sql = "INSERT INTO records(organization,code,name) VALUES(%s,%s,%s)"
        cur = conn.cursor()
        for i in xmldata:
            organization = i.find("organization").text
            code = i.find("code").text
            name = i.find("name").text
            cur.execute(sql, [organization, code, name])
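
The code in the question stores nothing because it only builds a list and an SQL string; no statement is ever executed or committed. The parse, execute, commit pattern can be tried without a PostgreSQL server. The sketch below uses the standard library's sqlite3 purely as a stand-in (an assumption for illustration, not the asker's setup), with invented sample data.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Invented sample data in the shape the question implies
SAMPLE = (
    "<records>"
    "<record><organization>1</organization><code>A1</code><name>Acme</name></record>"
    "<record><organization>2</organization><code>B2</code><name>Bolt</name></record>"
    "</records>"
)


def store_xml(conn, xml_text):
    """Parse the XML, insert one row per <record>, and commit."""
    root = ET.fromstring(xml_text)
    rows = [
        (r.findtext("organization"), r.findtext("code"), r.findtext("name"))
        for r in root.findall("record")
    ]
    # sqlite3 uses ? placeholders; psycopg2 uses %s as in the answer above
    conn.executemany(
        "INSERT INTO records(organization, code, name) VALUES (?, ?, ?)", rows
    )
    conn.commit()  # without a commit the inserts are not persisted
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (organization TEXT, code TEXT, name TEXT)")
inserted = store_xml(conn, SAMPLE)
count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(inserted, count)
```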

Haystack SearchIndex model_attr not following relation correctly?

I'm using Django Haystack v2.0.0 and Whoosh v2.4.0. According to Haystack's documentation, search indexes can use Django's related field lookups in the model_attr parameter. However, when I run the following code in the manage.py shell, the indexed value does not match the value on the model:
from haystack.query import SearchQuerySet
for r in SearchQuerySet():
    print r.recruitment_agency  # Prints True for every job
    print r.recruitment_agency == r.object.employer.recruitment_agency
    # Prints False if r.object.employer.recruitment_agency is False
I have tried rebuilding the index several times, the index directory is writable, and I don't get any error messages. All other fields work as expected.
I have the following (simplified) models:
companies/models.py:
class Company(models.Model):
    recruitment_agency = models.BooleanField(default=False)
jobs/models.py:
class Job(models.Model):
    employer = models.ForeignKey(Company, related_name='jobs')
jobs/search_indexes.py:
class JobIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    recruitment_agency = indexes.BooleanField(model_attr='employer__recruitment_agency')
    def get_model(self):
        return Job
jobs/forms.py:
class JobSearchForm(SearchForm):
    no_recruitment_agencies = forms.BooleanField(label="Hide recruitment agencies", required=False)
    def search(self):
        sqs = super(JobSearchForm, self).search()
        if self.cleaned_data['no_recruitment_agencies']:
            sqs = sqs.filter(recruitment_agency=False)
        return sqs
Does anyone know what could be the problem?

In the meantime I switched to the Elasticsearch backend, but the problem persisted, which suggests the issue is in Haystack rather than in Whoosh.
The problem is that the Python values True and False are not stored as booleans but as strings, and they are not converted back to booleans. To filter on a boolean field, you have to compare against the strings 'true' and 'false':
class JobSearchForm(SearchForm):
    no_recruitment_agencies = forms.BooleanField(label="Hide recruitment agencies", required=False)
    def search(self):
        sqs = super(JobSearchForm, self).search()
        if self.cleaned_data['no_recruitment_agencies']:
            sqs = sqs.filter(recruitment_agency='false')  # Change the filter here
        return sqs
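
To keep the string literals out of the form code, the conversion could live in a pair of small helpers. This is only a sketch: to_index_bool and from_index_bool are hypothetical names, not part of Haystack, and they assume the lowercase 'true'/'false' representation described above.

```python
def to_index_bool(value):
    """Map a Python truth value to the lowercase string stored in the index."""
    return 'true' if value else 'false'


def from_index_bool(value):
    """Turn an indexed value back into a bool; real bools pass through."""
    if isinstance(value, bool):
        return value
    return str(value).lower() == 'true'


print(to_index_bool(False), from_index_bool('false'))
```

The filter line would then read sqs.filter(recruitment_agency=to_index_bool(False)).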

Failed to add documents to Solr: [Reason: None]

I'm trying to index a model in Solr with django-haystack, but I get the following error when running rebuild_index or update_index:
Indexing 2 jobposts
Failed to add documents to Solr: [Reason: None]
<response><lst name="responseHeader"><int name="status">400</int><int name="QTime">358</int></lst><lst name="error"><str name="msg">ERROR: [doc=jobpost.jobpost.1] unknown field 'django_id'</str><int name="code">400</int></lst></response>
This is search_indexes.py:
from haystack import indexes
from haystack.indexes import SearchIndex
from jobpost.models import *
class JobIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    post_type = indexes.CharField(model_attr='post_type')
    location = indexes.CharField(model_attr='location')
    job_type = indexes.CharField(model_attr='job_type')
    company_name = indexes.CharField(model_attr='company_name')
    title = indexes.CharField(model_attr='title')
    def get_model(self):
        return jobpost
    def index_queryset(self, **kwargs):
        return self.get_model().objects.all()
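
The "[Reason: None]" line hides the real cause, which sits in the msg element of the XML body. A small sketch of pulling it out with the standard library follows; the function name solr_error_message is made up for illustration, and the response string is the one quoted above.

```python
import xml.etree.ElementTree as ET

# The 400 response body quoted in the question
SOLR_RESPONSE = (
    '<response><lst name="responseHeader"><int name="status">400</int>'
    '<int name="QTime">358</int></lst><lst name="error">'
    '<str name="msg">ERROR: [doc=jobpost.jobpost.1] unknown field \'django_id\'</str>'
    '<int name="code">400</int></lst></response>'
)


def solr_error_message(xml_text):
    """Return the text of the error msg element, or None if there is none."""
    root = ET.fromstring(xml_text)
    node = root.find("./lst[@name='error']/str[@name='msg']")
    return node.text if node is not None else None


message = solr_error_message(SOLR_RESPONSE)
print(message)
```

django_id is one of the fields Haystack adds to every document, so an "unknown field" error for it suggests that the Solr schema does not define Haystack's own fields.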

You need to update the schema.xml of your Solr instance, as the Haystack documentation says:
"You’ll need to revise your schema. You can generate this from your application (once Haystack is installed and setup) by running ./manage.py build_solr_schema. Take the output from that command and place it in apache-solr-3.5.0/example/solr/conf/schema.xml. Then restart Solr."

haystack.exceptions.SearchBackendError: No fields were found in any search_indexes. Please correct this before attempting to search

I am trying to implement Haystack with Whoosh.
Everything seems to be configured correctly, but when I run ./manage.py rebuild_index I keep getting this error:
haystack.exceptions.SearchBackendError: No fields were found in any search_indexes. Please correct this before attempting to search.
Configuration:
HAYSTACK_SITECONF = 'myproject'
HAYSTACK_SEARCH_ENGINE = 'whoosh'
HAYSTACK_WHOOSH_PATH = cwd + '/whoosh/mysite_index'
The whoosh/mysite_index directories were created successfully in the root folder of my project.
search_sites.py:
import haystack
haystack.autodiscover()
search_indexes.py:
from haystack.indexes import *
from haystack import site
from myproject.models import *
class ResearchersIndex(SearchIndex):
    text = CharField(document=True, use_template=True)
    name = CharField(model_attr='name')
class SubjectIndex(SearchIndex):
    short_name = CharField(model_attr='short_name')
    name = CharField(model_attr='name')
    text = CharField(document=True, use_template=True)
class ResearchIndex(SearchIndex):
    text = CharField(document=True, use_template=True)
    abstract = TextField(model_attr='abstract')
    methodology = TextField(model_attr='methodology')
    year = IntegerField(model_attr='year')
    name = CharField(model_attr='name')
class GraphIndex(SearchIndex):
    text = CharField(document=True, use_template=True)
    explanation = TextField(model_attr='explanation')
    type = CharField(model_attr='type')
    name = CharField(model_attr='name')
site.register(Researchers, ResearchersIndex)
site.register(Subject, SubjectIndex)
site.register(Research, ResearchIndex)
site.register(Graph, GraphIndex)
Thanks

The problem is your HAYSTACK_SITECONF setting. It must be the Python import path to your search_sites module (for example 'myproject.search_sites'), not just the project name. Fix this and it should work.
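
One quick way to sanity-check such a setting is to ask Python whether the dotted path resolves to a module. The sketch below uses only the standard library; module_path_resolves is a hypothetical helper name, and the check would need to run in an environment where your project is on sys.path.

```python
import importlib.util


def module_path_resolves(dotted_path):
    """Return True if the dotted path can be located as a module."""
    try:
        return importlib.util.find_spec(dotted_path) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or the path is malformed
        return False


# Standard-library modules stand in for a real project here
print(module_path_resolves("xml.etree.ElementTree"))
print(module_path_resolves("no_such_project_xyz.search_sites"))
```

From manage.py shell, module_path_resolves('myproject.search_sites') should return True once the setting points at the right module.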

Make sure your search_indexes.py is in an app that is listed in INSTALLED_APPS in settings.py.

How do I query for empty MultiValueField results in Django Haystack

Using Django 1.4.2, Haystack 2.0beta, and ElasticSearch 0.19, how do I query for results which have an empty set [] for a MultiValueField?

I'd create an integer field named num_<field> and query against it.
In this example 'emails' is the MultiValueField, so we'll create 'num_emails':
class PersonIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    name = indexes.CharField(model_attr='name')
    emails = indexes.MultiValueField(null=True)
    num_emails = indexes.IntegerField()
    def prepare_num_emails(self, object):
        return len(object.emails)
Now, in your searches you can use
SearchQuerySet().filter(num_emails=0)

You can also override the prepare_ method of your MultiValueField so that it stores a placeholder value when the list is empty:
def prepare_emails(self, object):
    emails = [e for e in object.emails]
    return emails if emails else ['None']
Then you can filter on the placeholder:
SearchQuerySet().filter(emails=None)
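
Both answers rely on indexing something queryable in place of an empty list: either a count or a placeholder value. The logic can be sketched and tested outside Haystack; the names EMPTY_SENTINEL, prepare_values, and count_values below are made up for illustration.

```python
EMPTY_SENTINEL = 'None'  # placeholder stored when there are no values


def prepare_values(values, sentinel=EMPTY_SENTINEL):
    """Return the values as a list, or [sentinel] when there are none."""
    values = list(values or [])
    return values if values else [sentinel]


def count_values(values):
    """Return how many values there are, treating None as empty."""
    return len(values or [])


print(prepare_values([]), count_values([]))
```

Inside the index class, prepare_emails and prepare_num_emails could delegate to these helpers with object.emails. The placeholder approach assumes no real value ever equals the sentinel string.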
