Django Haystack with Elasticsearch - raise BulkIndexError

I am getting a BulkIndexError while running python manage.py rebuild_index.
Here is my Haystack configuration in my settings.py file:
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
        #'SILENTLY_FAIL': False,
    },
}
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
Here is my search_indexes.py:
from haystack import indexes
from .models import Product  # assumed location of the Product model
class ProductIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    content_auto = indexes.EdgeNgramField(model_attr='title')
    def get_model(self):
        return Product
    def index_queryset(self, using=None):
        return self.get_model().objects.all()
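Here is my views.py:
from django.shortcuts import render
from haystack.query import SearchQuerySet
def search_titles(request):
    # Django passes the HttpRequest as the view's first argument.
    products = SearchQuerySet().autocomplete(content_auto=request.POST.get('search_text', ''))
    return render(request, 'sea.html', {'products': products})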
To index my Product model I ran this command:
python manage.py rebuild_index
but nothing was indexed. It raises this error:
File "/home/Documents/swamy/project/env/local/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 156, in streaming_bulk
raise BulkIndexError('%i document(s) failed to index.' % len(errors), errors)
elasticsearch.helpers.BulkIndexError: ('500 document(s) failed to index.'
My model has 21,000 products. Can anyone help me fix this issue?
Thanks in advance!

Elasticsearch changed the way bulk indices are created.
You can use version 1.4.0, which works seamlessly with django-haystack.
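BulkIndexError itself only reports a count. Judging from the raise statement in the traceback above, the list of per-document failures is passed as the exception's second argument, so it can be read from exc.args[1] when you catch the exception. A minimal sketch for grouping those failures; summarize_bulk_errors is a hypothetical helper (not part of Haystack or elasticsearch-py), and it assumes each entry has the bulk-API item shape {action: {"_id": ..., "status": ..., "error": ...}}, where error may be a plain string or an object with a "reason" key depending on the server version. The sample data is made up.

```python
from collections import Counter


def summarize_bulk_errors(errors):
    """Count failed bulk items by (action, status, reason).

    `errors` is assumed to be the list carried as BulkIndexError's second
    argument, with items like {"index": {"_id": "42", "status": 400, "error": ...}}.
    """
    counts = Counter()
    for item in errors:
        for action, details in item.items():
            error = details.get("error")
            # Some server versions return an object with a "reason"; others a plain string.
            if isinstance(error, dict):
                reason = error.get("reason", repr(error))
            else:
                reason = str(error)
            counts[(action, details.get("status"), reason)] += 1
    return counts


# Made-up sample in the assumed shape.
sample = [
    {"index": {"_id": "1", "status": 400,
               "error": {"type": "mapper_parsing_exception", "reason": "failed to parse"}}},
    {"index": {"_id": "2", "status": 400,
               "error": {"type": "mapper_parsing_exception", "reason": "failed to parse"}}},
    {"index": {"_id": "3", "status": 500, "error": "MapperParsingException[failed to parse]"}},
]
for (action, status, reason), n in summarize_bulk_errors(sample).most_common():
    print(n, action, status, reason)
```

For example, catching BulkIndexError as exc around the indexing call and printing summarize_bulk_errors(exc.args[1]).most_common(5) should show which error dominates the 500 failures.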

Related

How to configure sqlite of django project with pythonanywhere

I have deployed my project to PythonAnywhere.
It works locally, but on PythonAnywhere I get a "no such table" exception.
I configured SQLite as described in this link.
It just says to generate the SQLite file by running migrations.
I also changed the DATABASES section of settings.py to use os.path.join, but I still get the same error.
Exception Type: ProgrammingError
Exception Value:
(1146, "Table 'todo_todo' doesn't exist")
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': BASE_DIR / 'db.sqlite3',
    }
}
I also tried os.path.join, but got the same error.
My models.py:
from django.db import models
from django.contrib.auth.models import User
# Create your models here.
class Todo(models.Model):
    title = models.CharField(max_length=100)
    memo = models.TextField(blank=True)
    created = models.DateTimeField(auto_now_add=True)
    datecompleted = models.DateTimeField(null=True, blank=True)
    important = models.BooleanField(default=False)
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    def __str__(self):
        return self.title
I also ran the migrations for individual apps:
python manage.py makemigrations appname
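Keep in mind that makemigrations only writes migration files; the tables themselves are created by python manage.py migrate, run against the database the deployed settings point at. One way to confirm whether that happened is to list the tables in the SQLite file directly. A minimal sketch using only the standard library; sqlite_tables is a hypothetical helper and the path you pass is yours to fill in. Note also that 1146 is MySQL's "table doesn't exist" error code (SQLite reports "no such table"), so it may be worth checking which DATABASES setting the deployed app actually loads.

```python
import os
import sqlite3


def sqlite_tables(db_path):
    """Return the sorted table names stored in the SQLite file at db_path."""
    # sqlite3.connect() silently creates a missing file, so check first.
    if not os.path.exists(db_path):
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(str(db_path))
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

From a Python console on the server you could run something like print("todo_todo" in sqlite_tables("/path/to/db.sqlite3")). BASE_DIR / 'db.sqlite3' and os.path.join(BASE_DIR, 'db.sqlite3') name the same file, so either spelling of the setting should behave identically.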

I am using django-haystack with Whoosh, but I get an error after entering a search query in the search bar. Below is the description:

Error details:
AttributeError at /search/
'NoneType' object has no attribute '_default_manager'
Request Method: GET
Request URL: http://127.0.0.1:8000/search/?q=desktop
Django Version: 1.9
Exception Type: AttributeError
Exception Value: 'NoneType' object has no attribute '_default_manager'
Exception Location: /home/ankit/venv/django/lib/python3.4/site-packages/haystack/query.py in post_process_results, line 219
Python Executable: /home/ankit/venv/django/bin/python
Python Version: 3.4.3
Python Path: ['/home/ankit/venv/django/p2',
'/home/ankit/venv/django/lib/python3.4',
'/home/ankit/venv/django/lib/python3.4/plat-x86_64-linux-gnu',
'/home/ankit/venv/django/lib/python3.4/lib-dynload',
'/usr/lib/python3.4',
'/usr/lib/python3.4/plat-x86_64-linux-gnu',
'/home/ankit/venv/django/lib/python3.4/site-packages']
Server time: Sun, 27 Dec 2015 10:29:28 +0000
My search_indexes.py:
import datetime
from haystack import indexes
from inventory.models import Item
class ItemIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    pub_date = indexes.DateTimeField(model_attr='pub_date')
    content_auto = indexes.EdgeNgramField(model_attr='title')
    def get_model(self):
        return Item
    def index_queryset(self, using=None):
        return self.get_model().objects.all()
My settings.py file:
import os
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.whoosh_backend.WhooshEngine',
        'PATH': os.path.join(os.path.dirname(__file__), 'whoosh_index'),
    },
}
django.db.models.get_model is no longer available in Django 1.9, so None is returned when Haystack calls it. In the more recent commit (from the 3rd of January), however, utils/app_loading.py uses django.apps.apps.get_model on Django 1.7 or higher and otherwise falls back to the old django.db.models._get_models.
So it is best to upgrade to the latest development version: pip install git+https://github.com/django-haystack/django-haystack.git
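The compatibility check described above amounts to a version comparison. A minimal sketch of that dispatch; model_lookup_path is a hypothetical helper that only returns a dotted path, so it runs without Django installed. The pre-1.7 fallback path, django.db.models.loading.get_model, is an assumption here rather than something taken from Haystack's source.

```python
def model_lookup_path(django_version):
    """Return the dotted path of the get_model callable suited to a Django version.

    `django_version` is a tuple such as django.VERSION, e.g. (1, 9, 0, 'final', 0).
    """
    # Tuples compare element by element, so (1, 9) >= (1, 7) and (1, 6) < (1, 7).
    if tuple(django_version[:2]) >= (1, 7):
        # App registry introduced in Django 1.7.
        return "django.apps.apps.get_model"
    # Assumed pre-1.7 loader; this module no longer exists in Django 1.9.
    return "django.db.models.loading.get_model"
```

In a real project you would pass django.VERSION and import whatever the returned path names.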

Django Haystack ElasticSearch InvalidJsonResponseError: <Response [404]>

I'm using Django and the Haystack module to create a search engine. I want to use ElasticSearch. I have installed it and launched it with:
$ brew install elasticsearch
$ elasticsearch -f -D es.config=/usr/local/Cellar/elasticsearch/0.90.2/config/elasticsearch.yml
My settings seem correct and work:
# Haystack configuration
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:8000/',
        'INDEX_NAME': 'haystack',
    },
}
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
Here are my search indexes:
from haystack import indexes
from account.models import Profile
class ProfileIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    first_name = indexes.CharField(model_attr='first_name')
    last_name = indexes.CharField(model_attr='last_name')
    def get_model(self):
        return Profile
And my profile_text.txt:
{{ object.first_name }}
{{ object.last_name }}
Everything seems correct, I think; I followed the documentation and this tutorial.
But now, when I run:
$ python manage.py rebuild_index
I get this error:
pyelasticsearch.exceptions.InvalidJsonResponseError: <Response [404]>
Does anyone know why?
Thank you.
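Your Haystack 'URL' points at port 8000, the same port the Django development server is running on, so Haystack's requests reach Django instead of Elasticsearch.
Change the port from 8000 to the one Elasticsearch listens on (9200 by default), and then it'll work!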
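To check which service actually answers at the URL in HAYSTACK_CONNECTIONS, you can request its root: an Elasticsearch node replies with a small JSON banner containing a version number, while the Django development server does not. A minimal sketch using only the standard library; looks_like_elasticsearch and probe are hypothetical helpers, and the check is a heuristic.

```python
import json
import urllib.error
import urllib.request


def looks_like_elasticsearch(body):
    """Heuristic: does a response body look like an Elasticsearch root banner?"""
    try:
        data = json.loads(body)
    except ValueError:
        # HTML pages (e.g. from Django's dev server) are not JSON.
        return False
    if not isinstance(data, dict):
        return False
    version = data.get("version")
    return isinstance(version, dict) and "number" in version


def probe(url, timeout=3):
    """Fetch the root URL and report whether an Elasticsearch node seems to answer."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError:
        # Something answered with an error status (such as a 404), so not an ES root.
        return False
    except OSError:
        # URLError (nothing listening) and socket timeouts both derive from OSError.
        return False
    return looks_like_elasticsearch(body)
```

For example, probe("http://127.0.0.1:8000/") would be expected to return False while Django runs there, and probe("http://127.0.0.1:9200/") True once Elasticsearch is up.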

AttributeError: 'module' object has no attribute 'ElasticSearchError' : Using Haystack Elasticsearch

Using Django & Haystack with ElasticSearch.
After installing Haystack and ES, I rebuilt the index:
./manage.py rebuild_index
WARNING: This will irreparably remove EVERYTHING from your search index in connection 'default'.
Your choices after this are to restore from backups or rebuild via the rebuild_index command.
Are you sure you wish to continue? [y/N] y
Removing all documents from your index because you said so.
All documents removed.
Indexing 1039 <django.utils.functional.__proxy__ object at 0x10ca3ded0>.
AttributeError: 'module' object has no attribute 'ElasticSearchError'
Updating the index has the same problem:
./manage.py update_index
Indexing 1039 <django.utils.functional.__proxy__ object at 0x10ea49d90>.
AttributeError: 'module' object has no attribute 'ElasticSearchError'
Clearing the index works fine, though (probably because there is no index):
./manage.py clear_index
WARNING: This will irreparably remove EVERYTHING from your search index in connection 'default'.
Your choices after this are to restore from backups or rebuild via the `rebuild_index` command.
Are you sure you wish to continue? [y/N] y
Removing all documents from your index because you said so.
All documents removed.
Versions:
django-haystack==2.0.0-beta
pyelasticsearch==0.5
elasticsearch==0.20.6
localhost:9200 says:
{
  "ok" : true,
  "status" : 200,
  "name" : "Jigsaw",
  "version" : {
    "number" : "0.20.6",
    "snapshot_build" : false
  },
  "tagline" : "You Know, for Search"
}
Haystack settings:
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}
search_indexes.py:
import datetime
import haystack
from haystack import indexes
from app.models import City
class CityIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    name = indexes.CharField(model_attr='name')
    state = indexes.CharField(model_attr='state')
    country = indexes.CharField(model_attr='country')
    lat = indexes.FloatField(model_attr='latitude')
    lon = indexes.FloatField(model_attr='longitude')
    alt = indexes.FloatField(model_attr='altitude')
    pop = indexes.IntegerField(model_attr='population')
    def get_model(self):
        return City
Any idea why I am getting this error?
Solved it!
After debugging the process using pdb:
./manage.py rebuild_index
At line 222 in /haystack/backends/elasticsearch_backend.py, I changed
except (requests.RequestException, pyelasticsearch.ElasticSearchError), e:
To
# except (requests.RequestException, pyelasticsearch.ElasticSearchError), e:
except Exception as inst:
    import pdb; pdb.set_trace()
I found out the core error was this:
'ElasticSearch' object has no attribute 'from_python'.
I found the solution here: https://github.com/toastdriven/django-haystack/issues/514#issuecomment-4058230
The version of pyelasticsearch I was using was from http://github.com/rhec/pyelasticsearch,
so I installed pyelasticsearch from a fork, http://github.com/toastdriven/pyelasticsearch, using:
pip install --upgrade git+https://github.com/toastdriven/pyelasticsearch.git@3bfe1a90eab6c2dfb0989047212f4bc9fb814803#egg=pyelasticsearch
and that fixed it; the index was built!
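Why the real error was hidden: Python evaluates the exception tuple in an except clause only when an exception reaches it, so if the installed pyelasticsearch has no ElasticSearchError, evaluating that tuple raises a fresh AttributeError that replaces the original failure. A self-contained reproduction with hypothetical stand-ins (fake_client and index_documents are not real Haystack or pyelasticsearch names). On Python 3 the original exception survives as the new one's __context__; Python 2, which this setup used (note the `, e` syntax in the original except line), has no exception chaining, which fits with needing a broad except plus pdb to see it.

```python
import types

# Hypothetical stand-in for a client module that lacks the exception class
# the except clause refers to.
fake_client = types.ModuleType("fake_client")


def index_documents():
    # Hypothetical stand-in for the underlying failure during indexing.
    raise AttributeError("'ElasticSearch' object has no attribute 'from_python'")


def update_narrow():
    try:
        index_documents()
    except (ValueError, fake_client.ElasticSearchError):
        # Never reached: building the tuple above raises AttributeError first.
        return "handled"


def update_broad():
    try:
        index_documents()
    except Exception as exc:
        # Catching everything exposes the underlying error.
        return exc


def capture_masked():
    """Run update_narrow() and return the exception that escapes it."""
    try:
        update_narrow()
    except AttributeError as err:
        return err
    return None


masked = capture_masked()
print("surfaced:", masked)
print("hidden:", masked.__context__)
print("broad except:", update_broad())
```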

Django tests complain of missing tables

When I run my test dealing with my Customer model, I get the following error:
DatabaseError: (1146, "Table 'test_mcif2.customer' doesn't exist")
I'm not entirely surprised because I have my Django project connected to a "legacy" database. Since my tables weren't created "the Django way," it's not shocking that Django wouldn't be able to talk to them without some finagling. Here's my model:
from django.db import models
from django.db import connection, transaction
from mcif.models.mcif_model import McifModel
class Customer(McifModel):
    class Meta:
        db_table = u'customer'
        app_name = 'mcif'
    id = models.BigIntegerField(primary_key=True)
    customer_number = models.CharField(unique=True, max_length=255)
    social_security_number = models.CharField(unique=True, max_length=33)
    name = models.CharField(unique=True, max_length=255)
    phone = models.CharField(unique=True, max_length=255)
    deceased = models.IntegerField(unique=True, null=True, blank=True)
    do_not_mail = models.IntegerField(null=True, blank=True)
    created_at = models.DateTimeField()
    updated_at = models.DateTimeField()
    def distinguishing_column_names(self):
        return ['name', 'customer_number', 'social_security_number', 'phone']
Any idea why exactly this isn't working?
Edit: Here's McifModel:
from django.db import models
from django.db import connection, transaction
class McifModel(models.Model):
    class Meta:
        abstract = True
    def upsert(self):
        cursor = connection.cursor()
        cursor.execute(self.upsert_sql())
        transaction.commit_unless_managed()
        return self
    def value_list(self):
        return ','.join(map(lambda column_name: "'{c}'".format(c=getattr(self, column_name)), self.distinguishing_column_names()))
    def upsert_sql(self):
        column_names = ','.join(self.distinguishing_column_names())
        return "INSERT IGNORE INTO {t} ({c}) VALUES ({v})".format(t=self._meta.db_table, c=column_names, v=self.value_list())
    @classmethod
    def save_from_row(cls, row):
        object = cls()
        # A plain loop: map() is lazy on Python 3 and would never call setattr.
        for column_name in object.distinguishing_column_names():
            setattr(object, column_name, row.value(object._meta.db_table, column_name))
        return object.upsert()
Edit: I took tarequeh's advice and put the contents of the Caktus file in mcif/utils.py. I also set TEST_RUNNER = 'mcif.utils.ManagedModelTestRunner'. If I go on the console I can verify that Customer is unmanaged:
>>> [m for m in get_models() if not m._meta.managed]
[<class 'mcif.models.customer.Customer'>]
However, my test still complains that the table doesn't exist. What am I missing?
Here's my settings.py:
# Django settings for mcifdjango project.
DEBUG = True
TEMPLATE_DEBUG = DEBUG
ADMINS = (
('Jason Swett', 'jason.swett#gmail.com'),
)
MANAGERS = ADMINS
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.mysql', # Add 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'oracle'.
'NAME': 'xxxxx', # Or path to database file if using sqlite3.
'USER': 'xxxxx', # Not used with sqlite3.
'PASSWORD': 'xxxxx', # Not used with sqlite3.
'HOST': '', # Set to empty string for localhost. Not used with sqlite3.
'PORT': '', # Set to empty string for default. Not used with sqlite3.
}
}
# Local time zone for this installation. Choices can be found here:
# http://en.wikipedia.org/wiki/List_of_tz_zones_by_name
# although not all choices may be available on all operating systems.
# On Unix systems, a value of None will cause Django to use the same
# timezone as the operating system.
# If running in a Windows environment this must be set to the same as your
# system time zone.
TIME_ZONE = 'America/Chicago'
# Language code for this installation. All choices can be found here:
# http://www.i18nguy.com/unicode/language-identifiers.html
LANGUAGE_CODE = 'en-us'
SITE_ID = 1
# If you set this to False, Django will make some optimizations so as not
# to load the internationalization machinery.
USE_I18N = True
# If you set this to False, Django will not format dates, numbers and
# calendars according to the current locale
USE_L10N = True
# Absolute path to the directory that holds media.
# Example: "/home/media/media.lawrence.com/"
MEDIA_ROOT = ''
# URL that handles the media served from MEDIA_ROOT. Make sure to use a
# trailing slash if there is a path component (optional in other cases).
# Examples: "http://media.lawrence.com", "http://example.com/media/"
MEDIA_URL = ''
# URL prefix for admin media -- CSS, JavaScript and images. Make sure to use a
# trailing slash.
# Examples: "http://foo.com/media/", "/media/".
ADMIN_MEDIA_PREFIX = '/media/'
# Make this unique, and don't share it with anybody.
SECRET_KEY = '#7+qm%hqfe+z8ul5#x_i&sqmu!n=4sa0&i0_#)m99*w$fbk3%#'
# List of callables that know how to import templates from various sources.
TEMPLATE_LOADERS = (
'django.template.loaders.filesystem.Loader',
'django.template.loaders.app_directories.Loader',
# 'django.template.loaders.eggs.Loader',
)
MIDDLEWARE_CLASSES = (
'django.middleware.common.CommonMiddleware',
'django.contrib.sessions.middleware.SessionMiddleware',
'django.middleware.csrf.CsrfViewMiddleware',
'django.contrib.auth.middleware.AuthenticationMiddleware',
'django.contrib.messages.middleware.MessageMiddleware',
)
ROOT_URLCONF = 'mcifdjango.urls'
TEMPLATE_DIRS = (
# Put strings here, like "/home/html/django_templates" or "C:/www/django/templates".
# Always use forward slashes, even on Windows.
# Don't forget to use absolute paths, not relative paths.
)
INSTALLED_APPS = (
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.sites',
'django.contrib.messages',
'django.contrib.admin',
'django_extensions',
'mcif',
# Uncomment the next line to enable the admin:
# 'django.contrib.admin',
# Uncomment the next line to enable admin documentation:
# 'django.contrib.admindocs',
)
TEST_RUNNER = 'mcif.utils.ManagedModelTestRunner'
import os
ROOTDIR = os.path.abspath(os.path.dirname(__file__))
TEMPLATE_DIRS = (
# Put strings here, like "/home/html/django_templates" or "C:/www/django/templates".
# Always use forward slashes, even on Windows.
# Don't forget to use absolute paths, not relative paths.
ROOTDIR + '/mcif/templates',
)
Edit 2:
Here's my Customer class now:
from django.db import models
from django.db import connection, transaction
from mcif.models.mcif_model import McifModel
class Customer(McifModel):
    class Meta:
        db_table = u'customer'
        managed = False
    id = models.BigIntegerField(primary_key=True)
    customer_number = models.CharField(unique=True, max_length=255)
    social_security_number = models.CharField(unique=True, max_length=33)
    name = models.CharField(unique=True, max_length=255)
    phone = models.CharField(unique=True, max_length=255)
    deceased = models.IntegerField(unique=True, null=True, blank=True)
    do_not_mail = models.IntegerField(null=True, blank=True)
    created_at = models.DateTimeField()
    updated_at = models.DateTimeField()
    def distinguishing_column_names(self):
        return ['name', 'customer_number', 'social_security_number', 'phone']
Here's what I get when I run the test:
$ ./manage.py test mcif.CustomerUpsertTest
Creating test database 'default'...
Creating table auth_permission
Creating table auth_group_permissions
Creating table auth_group
Creating table auth_user_user_permissions
Creating table auth_user_groups
Creating table auth_user
Creating table auth_message
Creating table django_content_type
Creating table django_session
Creating table django_site
Creating table django_admin_log
Installing index for auth.Permission model
Installing index for auth.Group_permissions model
Installing index for auth.User_user_permissions model
Installing index for auth.User_groups model
Installing index for auth.Message model
Installing index for admin.LogEntry model
No fixtures found.
E
======================================================================
ERROR: test_upsert (mcif.tests.customer_upsert_test.CustomerUpsertTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/jason/projects/mcifdjango/mcif/tests/customer_upsert_test.py", line 9, in test_upsert
customer.upsert()
File "/home/jason/projects/mcifdjango/mcif/models/mcif_model.py", line 11, in upsert
cursor.execute(self.upsert_sql())
File "/usr/lib/pymodules/python2.6/django/db/backends/mysql/base.py", line 86, in execute
return self.cursor.execute(query, args)
File "/usr/lib/pymodules/python2.6/MySQLdb/cursors.py", line 166, in execute
self.errorhandler(self, exc, value)
File "/usr/lib/pymodules/python2.6/MySQLdb/connections.py", line 35, in defaulterrorhandler
raise errorclass, errorvalue
DatabaseError: (1146, "Table 'test_mcif_django.customer' doesn't exist")
----------------------------------------------------------------------
Ran 1 test in 3.724s
FAILED (errors=1)
Destroying test database 'default'...
Since you're using a legacy database, you are probably not adding the app name to INSTALLED_APPS. If an app is not included in INSTALLED_APPS, the tables for the app's models will not get created on syncdb. This works for you in production, since the table already exists there, but not in the test environment.
You can adopt any of the following:
The supermonkeypatch way: Take app_name out of the Customer class Meta, put the model in a models.py file inside a Python module named mcif, and add mcif to INSTALLED_APPS, just for the sake of testing.
The nicer way: Extend DjangoTestSuiteRunner and override setup_test_environment to call super and then create your legacy table manually in the test DB.
The nicest way: Put your model in a properly named app module. Remove app_name from the model Meta but add managed=False (see the docs). Include the app name in INSTALLED_APPS. Now Django will not create a table for that model. Then use this nice snippet the Caktus group folks have compiled to run your tests.
Cheers!
Edit - How to use the overridden DjangoTestSuiteRunner
You will need at least Django 1.2 for this.
Copy the code from here. Put it in utils.py inside the mcif app.
Add/edit the following in settings.py:
TEST_RUNNER = 'mcif.utils.ManagedModelTestRunner'
Now when you run tests, all unmanaged models will be treated as managed only for the duration of the test run, so their tables will be created before the tests run.
Notice this part of the code; that's where the magic happens.
self.unmanaged_models = [m for m in get_models() if not m._meta.managed]
for m in self.unmanaged_models:
    m._meta.managed = True
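The flip-and-restore idea can be tried out without Django. A minimal sketch; Customer and Todo here are hypothetical stand-ins carrying only a _meta.managed flag, and managed_for_tests is a hypothetical helper. Restoring the flags afterwards is an assumption that mirrors what a teardown_test_environment override would do; the loop above only shows the setup half.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def managed_for_tests(models):
    """Temporarily mark unmanaged models as managed, then restore them."""
    unmanaged = [m for m in models if not m._meta.managed]
    for m in unmanaged:
        m._meta.managed = True
    try:
        yield unmanaged
    finally:
        # Put the flags back so the change only lasts for the test run.
        for m in unmanaged:
            m._meta.managed = False


# Hypothetical stand-ins for Django model classes.
class Customer:
    _meta = SimpleNamespace(managed=False)


class Todo:
    _meta = SimpleNamespace(managed=True)


with managed_for_tests([Customer, Todo]) as flipped:
    print([m.__name__ for m in flipped], Customer._meta.managed)
print(Customer._meta.managed)
```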
2nd Edit: Possible Gotchas
Make sure of the following:
The DB user has the privilege to create databases, not only tables, because Django will try to create a test database.
The test cases extend django.test.TransactionTestCase, since you have transactional behavior.
If none of the above applies, put a pdb breakpoint in ManagedModelTestRunner's setup_test_environment just to make sure the code is being reached; if that code is reached, the table should get created.
3rd Edit: Debugging
Inside mcif.utils.ManagedModelTestRunner replace setup_test_environment function with the following and let me know if the output of your test changes:
def setup_test_environment(self, *args, **kwargs):
    print "Loading ManagedModelTestRunner"
    from django.db.models.loading import get_models
    self.unmanaged_models = [m for m in get_models()
                             if not m._meta.managed]
    for m in self.unmanaged_models:
        print "Modifying model %s to be managed for testing" % m
        m._meta.managed = True
    super(ManagedModelTestRunner, self).setup_test_environment(*args, **kwargs)
The solutions presented by tarequeh worked for me after overriding DATABASE_ROUTERS.
I am using routers in order to prevent writes on the legacy database. In order to get around this I created a test_settings file with the following contents:
from settings import *
DEBUG = True
TEST_RUNNER = 'legacy.utils.ManagedModelTestRunner'
DATABASE_ROUTERS = []
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': os.path.join(HERE, 'test.db'),
    },
}
Then when running tests:
python manage.py test [app_name] --settings=test_settings
There's not enough info above to answer your first question. However, once you get that issue resolved, you'll probably want to install django-extensions for the following reason: it has an incredibly useful sqldiff command that will tell you if there's a mismatch between the legacy database and your application model.
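The idea behind a schema diff, comparing the columns a model declares with the columns the legacy table actually has, can be illustrated with the standard library. This is a rough SQLite-only sketch, not how django-extensions implements sqldiff; column_diff is a hypothetical helper and the table and column names are made up.

```python
import sqlite3


def column_diff(conn, table, expected_columns):
    """Compare a model's expected column names with an existing SQLite table."""
    # PRAGMA does not accept bound parameters, so only pass trusted table names.
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk); index 1 is the name.
    actual = {row[1] for row in rows}
    expected = set(expected_columns)
    return {
        "missing_in_db": sorted(expected - actual),
        "extra_in_db": sorted(actual - expected),
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
print(column_diff(conn, "customer", ["id", "name", "customer_number"]))
```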
Here's a more up-to-date solution that also works with current versions of Django (I tested it on Django 3.2.11):
https://medium.com/an-idea/testing-with-the-legacy-database-in-django-3be84786daba
Also, if you want to populate the Django test database with your legacy database's data as well:
Check out fixtures.