Issues implementing search with Haystack and Solr - django

I'm trying to add search to my Django-based website.
While following the tutorial I found this:
If you’re using the Solr backend, you have an extra step. Solr’s
configuration is XML-based, so you’ll need to manually regenerate the
schema. You should run ./manage.py build_solr_schema first, drop the
XML output in your Solr’s schema.xml file and restart your Solr
server.
First, I don't know where to put my schema.xml. After some research I figured I'd create a folder inside my project to put it in: myprojectname/solr/schema.xml. Is that right?
Second, how do I restart Solr?
UPDATE
I downloaded Solr, unzipped it, and put the generated schema.xml inside example/solr/conf,
then I start Solr with java -jar start.jar,
but when I try to build the index:
./manage.py rebuild_index
I get:
WARNING: This will irreparably remove EVERYTHING from your search index.
Your choices after this are to restore from backups or rebuild via the `rebuild_index` command.
Are you sure you wish to continue? [y/N] y
Removing all documents from your index because you said so.
All documents removed.
Indexing 1 News.
Failed to add documents to Solr: [Reason: None]
<response><lst name="responseHeader"><int name="status">400</int><int name="QTime">4</int></lst><lst name="error"><str name="msg">ERROR: [doc=news.news.2] unknown field 'django_id'</str><int name="code">400</int></lst></response>
Indexing 1 entries.
Failed to add documents to Solr: [Reason: None]
<response><lst name="responseHeader"><int name="status">400</int><int name="QTime">17</int></lst><lst name="error"><str name="msg">ERROR: [doc=zinnia.entry.2] unknown field 'django_id'</str><int name="code">400</int></lst></response>
I verified my schema.xml, and I do have:
<field name="django_ct" type="string" indexed="true" stored="true" multiValued="false" />
<field name="django_id" type="string" indexed="true" stored="true" multiValued="false" />
P.S.
I'm using Django 1.2 and Haystack 1.2.7

The Solr server needs to have a copy of your schema.xml, not Django. I usually keep a copy of the schema.xml in my Django project for version control, but Solr can't find it there.
Is your Solr server local, or are you using a hosted or remote Solr service? I develop locally and then use websolr because I don't want to configure Solr for production.
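For reference, a minimal sketch of Haystack 1.x-style settings for switching between the two (the URLs and the env check are placeholders, not something from the question):

import os

HAYSTACK_SEARCH_ENGINE = 'solr'

if os.environ.get('PRODUCTION'):
    # Hosted index, e.g. the URL websolr gives you (placeholder).
    HAYSTACK_SOLR_URL = 'http://index.websolr.com/solr/your-index'
else:
    # Local Solr started with `java -jar start.jar`.
    HAYSTACK_SOLR_URL = 'http://127.0.0.1:8983/solr'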
For local dev on OSX
I'm assuming this is local development on OSX and that you have Homebrew installed (assumptions - give me more info if this isn't the case):
brew install solr
This is going to install Solr someplace like: /usr/local/Cellar/solr/...
Note: When I'm developing locally, I like to use Fabric for running deployment and some startup tasks.
So in my fabfile.py I have a Fabric command to copy my schema.xml into the proper file and start the Solr server (I just run fab solr at the command line):
from fabric.api import local

def solr():
    # Build an updated schema.xml (changes to indexes/models may
    # require this, so always do it for local testing).
    local('python manage.py build_solr_schema > schema.xml')
    # Copy the schema.xml into Solr's conf directory.
    local('cp schema.xml /usr/local/Cellar/solr/3.6.0/libexec/example/solr/conf/schema.xml')
    # Start the Solr server.
    local('cd /usr/local/Cellar/solr/3.6.0/libexec/example && java -jar start.jar')
Note: you can run these commands directly on the command line if you don't use Fabric.

I had the same issue; the rebuild task failed. For me the solution was (a sketch of the steps follows the list):
1) Build a new schema.xml and place it in the corresponding folder
2) Restart Solr
3) Rebuild the index, this time without problems
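A rough sketch of those steps as one script, assuming the Homebrew paths from the answer above (adjust for your install; the sleep is a crude wait for Solr to come up):

import subprocess
import time

SOLR_EXAMPLE = '/usr/local/Cellar/solr/3.6.0/libexec/example'

# 1) Build a fresh schema.xml straight into Solr's conf directory.
with open(SOLR_EXAMPLE + '/solr/conf/schema.xml', 'w') as schema:
    subprocess.check_call(['python', 'manage.py', 'build_solr_schema'], stdout=schema)

# 2) (Re)start Solr; stop any previously running instance first.
subprocess.Popen(['java', '-jar', 'start.jar'], cwd=SOLR_EXAMPLE)
time.sleep(10)

# 3) Rebuild the index, skipping the confirmation prompt.
subprocess.check_call(['python', 'manage.py', 'rebuild_index', '--noinput'])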

Related

Deploying Django to production: correct way to do it?

I am developing a Django Wagtail application on my local machine connected to a local Postgres server.
I have a test server and a production server.
However, when I develop locally and try to upload it, there is always some issue with makemigrations and migrate, e.g. a KeyError.
What are the best practices for ensuring I do not run into these issues? What files do I need to port across?
I'll tell you what I do and what most of the companies I worked at as a Django developer did, and I can tell you from experience that it worked pretty well.
First, containerize your application. This will make your life much easier, remove external influences from your code, and give you an easy way to reproduce your environment.
Your Dockerfile should start from some Python image and should basically do 3 things:
Install the dependencies from your requirements file
Run the python manage.py migrate --noinput command
Run an HTTP server such as gunicorn with gunicorn -c /gunicorn.py wsgi:application (a minimal wsgi.py is sketched below)
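For reference, the wsgi:application target is just Django's standard WSGI entry point; a minimal sketch (myproject.settings is a placeholder):

import os

from django.core.wsgi import get_wsgi_application

# Placeholder settings module; point it at your project.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myproject.settings')

application = get_wsgi_application()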
You will run makemigrations on your local machine and make sure everything is working before committing to the repo.
In your gunicorn.py you will put your settings to run the app, such as the number of workers (based on your CPUs), the binding port, and the folder your app is in, something like:
import multiprocessing

# Chdir to the specified directory before loading the apps.
# https://docs.gunicorn.org/en/stable/settings.html#chdir
chdir = '/app/'

# Bind the application to port 8000 on all interfaces.
# https://docs.gunicorn.org/en/stable/settings.html#bind
bind = '0.0.0.0:8000'

# Scale the number of workers with the available CPUs.
# https://docs.gunicorn.org/en/stable/settings.html#workers
workers = multiprocessing.cpu_count() * 2 + 1
Second, containerize your other stuff, for example the Postgres database, Redis (for cache), and a connection pooler for the database, depending on the size of your application.
It's highly recommended to have a step in the pipeline that runs the tests; they need to run before everything else, maybe just after lint.
OK, what now? Now you need a way to deploy that stuff. The best option for this scenario is to push your image to the GitHub registry, and you can add a tag to it, for example:
IMAGE_ID=ghcr.io/${{ github.repository_owner }}/$IMAGE_NAME
# Change all uppercase to lowercase
IMAGE_ID=$(echo $IMAGE_ID | tr '[A-Z]' '[a-z]')
docker tag $IMAGE_NAME $IMAGE_ID:staging
docker push $IMAGE_ID:staging
This can be added to a GitHub Action in the build step, for example.
After having your new code in a new image on GitHub, you just need to update the current one. This can be done by creating a script on the server and running that script from a GitHub Action; it's something like:
docker pull ghcr.io/${{ github.repository_owner }}/$IMAGE_NAME
echo 'Restarting Application...'
docker stop {YOUR_CONTAINER} && docker compose up -d
sudo systemctl restart nginx
echo 'Cleaning old images'
sudo docker system prune -af
You can see that I create the image with a staging tag. You can create a rule in GitHub Actions, for example, to trigger that action when you create a new release, and create another action triggered on every new commit to build/deploy a dev tag.
For the migration problem, the first thing is: when your application goes live, squash every migration into the first one (you can drop the database and all the migrations, then create the database and run the makemigrations command again to achieve this), so you have a clean migration history on the server. Never create unnecessary relations between tables, prefer cached properties instead of adding new columns (see the sketch below), use UUIDs for unique ids, and try not to make breaking changes in the database; it's hard, but if you plan the database beforehand it's not so difficult.
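A minimal sketch of the cached-property idea (model names are illustrative): the value is computed from existing columns instead of being stored in a new one, so no migration is needed:

from django.db import models
from django.utils.functional import cached_property

class Order(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)

    @cached_property
    def total(self):
        # Computed on first access and cached on the instance;
        # no extra column, so no schema change and no migration.
        return sum(item.price for item in self.items.all())

class Item(models.Model):
    order = models.ForeignKey(Order, related_name='items', on_delete=models.CASCADE)
    price = models.DecimalField(max_digits=8, decimal_places=2)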
Hit me up if you have any questions. A lot of the stuff I said can be done on a lot of other platforms such as GitLab, Travis, or CircleCI, but I used GitHub Actions in the example because I think it's simpler to picture.
EDIT:
I forgot to tell you to have a cron job on your server doing backups of your databases. The migrate command will apply the changes only after verification, but if something else breaks the database, backups can save your life.
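A minimal sketch of such a backup job, meant to be run from cron (the database name, output directory, and connectivity are assumptions; pg_dump must be able to reach your containerized database):

import datetime
import subprocess

def backup(db_name='myapp', out_dir='/var/backups'):
    stamp = datetime.datetime.now().strftime('%Y%m%d%H%M%S')
    out_file = '{}/{}-{}.sql.gz'.format(out_dir, db_name, stamp)
    with open(out_file, 'wb') as fh:
        # Stream pg_dump's output through gzip straight to disk.
        dump = subprocess.Popen(['pg_dump', db_name], stdout=subprocess.PIPE)
        subprocess.check_call(['gzip'], stdin=dump.stdout, stdout=fh)
        dump.stdout.close()
        dump.wait()

if __name__ == '__main__':
    # e.g. crontab: 0 3 * * * /usr/bin/python /opt/scripts/backup_db.py
    backup()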

Enable PSQL hstore extension in Django test db during CI

Context
Some steps of my Continuous Integration procedure are:
start Postgres docker container
run Django tests
When the manage.py test --noinput command is run, it:
creates a new test_xxx database (dropping it if it exists)
runs the migrations it finds against it
runs the tests it finds
The tests that need to fetch data from the database are configured with a set of fixtures that are loaded automatically into the test_xxx db (a minimal sketch follows).
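For illustration, such a test looks something like this (the app, model, and fixture names are illustrative, not from the question):

from django.test import TestCase

from news.models import News  # illustrative app/model

class NewsTests(TestCase):
    # Loaded automatically into the test_xxx database for each test.
    fixtures = ['news.json']

    def test_fixtures_loaded(self):
        self.assertTrue(News.objects.exists())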
Problem
Some migrations need the Postgres hstore extension; in fact, I'm getting this error:
django.db.utils.ProgrammingError: type "hstore" does not exist
Question
How can I enable the hstore extension?
In development and other envs it was set up with CREATE EXTENSION IF NOT EXISTS hstore; but here it can't be set manually.
Is it possible to define a "migration zero" with the hstore creation? Either way, I don't like this approach.
I've found that it should theoretically be possible to listen to the pre_migrate signal, which would be the sweet spot (a sketch follows), but before making things more complex I'd like to look for an easier, more direct solution.
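For the record, a sketch of what that pre_migrate approach could look like (untested; the app name is illustrative). The receiver runs before the app's migrations, including against the freshly created test_xxx database:

from django.apps import AppConfig
from django.db import connections
from django.db.models.signals import pre_migrate

def create_hstore(sender, using, **kwargs):
    with connections[using].cursor() as cursor:
        cursor.execute('CREATE EXTENSION IF NOT EXISTS hstore;')

class CoreConfig(AppConfig):
    name = 'core'  # illustrative app name

    def ready(self):
        # sender=self scopes the receiver to this app's migrations.
        pre_migrate.connect(create_hstore, sender=self)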
EDIT: in this particular case I must use Django 1.8. Since 1.11 it is possible to define a template from which the test db is created, so one could define a template with hstore and the problem would be solved.
The final workaround until the Django update is:
when starting the Postgres container, add an init.sql file with the test_xxx database creation and the hstore extension
when launching the Django tests, instead of using the --noinput flag, use --keepdb so the db isn't recreated

Configuring postgresql database for local development in Django while using Heroku

I know there are a lot of questions floating around relating to similar issues, but I think I have a specific flavor which hasn't been addressed yet. I'm attempting to create my local PostgreSQL database so that I can do local development in addition to pushing to Heroku.
I have found basic answers on how to do this, for example (which I think is a wee bit outdated):
DATABASES = {'default': dj_database_url.config(default='postgres://fooname:barpass@localhost/dbname')}
This solves the "ENGINE" is not configured error. However, when I run 'python manage.py syncdb' I get the following error:
OperationalError: FATAL: password authentication failed for user "foo"
FATAL: password authentication failed for user "foo"
This happens for all conceivable combinations of username/pass: my Ubuntu username/pass, my Heroku username/pass, etc. It also happens if I just try to take out the Heroku component and build it locally as if I were using PostgreSQL while following the tutorial. Since I don't have a database yet, what do those username/pass values refer to? Is the problem exactly that, that I need to create a database first? If so, how?
As a side note, I know I could get the db from Heroku using the process outlined here: Should I have my Postgres directory right next to my project folder? If so, how?
But assuming I were to do so, where would the new db live, how would Django know how to access it, and would I have the same user/pass problems?
Thanks a bunch.
Assuming you have Postgres installed, connect via pgAdmin or psql and create a new user. Then create a new database with your new user as the owner. Make sure you can connect via psql with the new user to that database. You will then need to set up an env variable in your postactivate file in your virtualenv's bin folder and save it. Here is what I have for the database:
export DATABASE_URL='postgres://{{username}}:{{password}}@localhost:5432/{{database}}'
Just a note: adding this value to your postactivate doesn't do anything by itself; the file is not run upon saving. You will either need to run it at the $ prompt, or simply deactivate and reactivate your virtualenv.
Your settings.py should read from this env var:
DATABASES = {'default': dj_database_url.config()}
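If you also want a local fallback when the env var isn't set, dj_database_url accepts a default (the credentials here are placeholders):

import dj_database_url

# Falls back to the local dev database when DATABASE_URL is unset.
DATABASES = {
    'default': dj_database_url.config(
        default='postgres://myuser:mypassword@localhost:5432/mydb'
    )
}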
You will then configure Heroku with their CLI tool to use your production database when deployed. Something like:
heroku config:set DATABASE_URL={{production value here}}
(if you don't have Heroku's CLI tool installed, you'll need to install it)
If you need to figure out exactly what value you need for your production database, you can get it by logging into Heroku's postgres subdomain (at the time this is being written, it's https://postgres.heroku.com/), selecting the db from the list, and looking at the "Connection Settings : URL" value.
This way the same settings.py value will work for both local and production, and you keep your usernames/passwords out of version control; they are just env config values.

Google App Engine Development and Production Environment Setup

Here is my current setup:
GitHub repository, a branch for dev.
myappdev.appspot.com (not real url)
myapp.appspot.com (not real url)
App written on GAE Python 2.7, using django-nonrel
Development is performed on a local dev server. When I'm ready to release to dev, I increment the version, commit, and run "manage.py upload" to deploy to myappdev.appspot.com.
Once testing is satisfactory, I merge the changes from dev into the main repo. I then run "manage.py upload" to upload the main repo code to the myapp.appspot.com domain.
Is this setup good? Here are a few issues I've run into.
1) I'm new to git, so sometimes I forget to add files, and the commit doesn't notify me. So I deploy code to dev that works, but does not match what is in the dev branch. (This is bad practice).
2) The datastore file in the git repo causes issues. Merging binary files? Is it ok to migrate this file between local machines, or will it get messed up?
3) Should I be using "manage.py upload" for each release to the dev or prod environment, or is there a better way to do this? Heroku looks like it can pull right from GitHub. The way I'm doing it now seems like there is too much room for human error.
Any overall suggestions on how to improve my setup?
Thanks!
I'm on a pretty similar setup, though I'm still running on py2.5, django-nonrel.
1) I usually use 'git status' or 'git gui' to see if I forgot to check in files.
2) I personally don't check in my datastore. Are you familiar with .gitignore? It's a text file in which you list files for git to ignore when you run 'git status' and other functions. I put in .gaedata as well as .pyc and backup files.
To manage the database I use "python manage.py dumpdata > file" which dumps the database to a json encoded file. Then I can reload it using "python manage.py loaddata".
3) I don't know of any deploy-from-git for GAE. You can probably write a little Python script to check whether git is up to date before you deploy (a sketch follows). Personally, though, I deploy stuff to test to make sure it's working before I check it in.
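A rough sketch of such a script (untested; it assumes "manage.py upload" is your deploy command, as in the question). It refuses to deploy when the working tree has uncommitted or untracked files:

import subprocess
import sys

def working_tree_clean():
    # --porcelain prints one line per modified/untracked file.
    out = subprocess.check_output(['git', 'status', '--porcelain'])
    return out.strip() == b''

if __name__ == '__main__':
    if not working_tree_clean():
        sys.exit('Uncommitted changes; commit (or stash) before deploying.')
    subprocess.check_call(['python', 'manage.py', 'upload'])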

solr + haystack + django where do I place schema.xml?

I just installed Solr and Haystack for a Django project I'm working on. Following this Haystack tutorial I created a schema.xml but I'm not sure where to place it in the Solr installation.
My Solr installation is in a directory like this: /solr, and I'm starting the Solr service from /solr/example with the command java -jar start.jar.
Any ideas where to place that schema.xml and how to tell Solr to use it?
Solr looks for schema.xml in the ./conf directory under the "Solr home" directory.
See this page for more info.
http://wiki.apache.org/solr/ConfiguringSolr
It has to do with the details of the core you've created. Ideally the first step of using Solr would be creating your core like this:
./bin/solr create -c haystack
where haystack is the name of your core.
Now you can see that a new directory has been created at ./server/solr/haystack/conf, which is where schema.xml should be copied.