I am building out a Solr instance for Django, but the example config that ships with Solr is super verbose, with many things that are not relevant to Haystack. A sample with spelling suggestions, MoreLikeThis, and faceting, without the extra stuff that Haystack doesn't use, would go a long way toward helping me understand what is needed and what isn't.
I use this one. It works and fits my needs, except for MoreLikeThis, which wasn't that good, and I don't use faceting.
You should not use an "out of the box" Solr config. You should understand your search requirements and write a schema and config that match them.
This is one of the drawbacks of the way people use Haystack: they rely on the default behaviour, which is very rarely the optimum behaviour for Solr.
You shouldn't need to write an XML file by hand; one of the benefits of Haystack is that it does that for you. Once your SearchIndex classes are defined, just run ./manage.py build_solr_schema and copy the resulting XML into your Solr core's schema.xml.
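For illustration, a minimal SearchIndex might look something like this; the Note model, its fields, and the app name are hypothetical, and this assumes Haystack 2.x:

    from haystack import indexes
    from myapp.models import Note  # hypothetical model

    class NoteIndex(indexes.SearchIndex, indexes.Indexable):
        # The main document field, built from a template that lists which
        # model fields get concatenated into the indexed text.
        text = indexes.CharField(document=True, use_template=True)
        pub_date = indexes.DateTimeField(model_attr='pub_date')

        def get_model(self):
            return Note

    # Then generate the Solr schema from all registered indexes:
    #   ./manage.py build_solr_schema > schema.xml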
I'm just wondering what functionality exactly Haystack provides and whether I need it.
I mean, the search and indexing is done by Whoosh. As far as I can tell, Haystack is just offering ready-made views and forms. If I want to write my own forms and views, do I still need Haystack?
Am I missing something?
P.S. I don't plan to use any search engine other than Whoosh, so I also don't need Haystack's multiple-search-engine wrapping.
Besides views, forms and a search engine-agnostic layer, the other powerful thing about Haystack is its ability to map Django models to something the search index understands. Using Haystack, you can easily specify which fields in a model should be indexed and how (see the SearchIndex API - http://django-haystack.readthedocs.org/en/latest/searchindex_api.html).
Once you have done that, you can then leverage the built-in management commands to (re)index your data when required.
It also comes with some nice templatetags to help present search results, like highlighting the matching bits.
Is there a particular reason that you don't want to use Haystack? It is a pretty non-intrusive plugin that lets you use as much of it as you need, and makes it easy to use more advanced functionality when you need it later down the road. In one of the sites I built, I only used the SearchIndex and SearchQuerySet APIs; I built my own views and forms. Ultimately, if you end up writing your own indexing and searching code, views and forms, you have basically re-written a large part of Haystack, in which case, you may want to consider using something that is in use out there and reasonably well tested.
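To give a feel for that SearchIndex/SearchQuerySet-only approach with your own views, here is a rough sketch; the template name and the "q" GET parameter are made up for illustration:

    from django.shortcuts import render
    from haystack.query import SearchQuerySet

    def my_search(request):
        # Pull the user's query from a hypothetical "q" GET parameter.
        query = request.GET.get('q', '')
        results = SearchQuerySet().auto_query(query) if query else []
        return render(request, 'search/results.html',
                      {'query': query, 'results': results})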
That said, I have rolled my own 'Haystack' like layer in another project, mainly because the data source didn't map to the Django ORM. In that case, I wrote my own indexing scripts, and used PySolr to interface with my Apache Solr instance.
Given that Whoosh is written in Python, I'd assume it has a decent Python interface, so it shouldn't be too hard to do. I would only do it if there's something special about your scenario though.
I'm building a Django app with Neo4j (along with Postgres). I found this Django integration called neo4django, and I was wondering if it's possible to use neo4j-rest-client only; that is, what would be the disadvantages of not using neo4django? Does using neo4j-rest-client only give me more flexibility?
When I was creating my models with neo4django, it seemed like there was no difference between modeling a graph DB and a relational DB. Am I missing anything?
Thanks!
You can absolutely go ahead with neo4j-rest-client or py2neo, without using neo4django. In the same way, you can use any other database driver you'd like with Django, any REST client, etc.
What'll you lose? The model DSL, the built-in querying (e.g., Person.objects.filter(name="Mohamed")), the built-in indexing, and the Lucene, Gremlin, and Cypher behind that. Some things will be much easier, like setting an arbitrary property on a node, but you'll need to learn more about how Neo4j works.
You'll also lose some of the shortcuts Django provides that work with neo4django, like get_object_or_404() and some of the class-based views that work with querysets.
What'll you gain? Absolute power over the DB, and an easier time tweaking DB performance. Though neo4django isn't nearly as good a lib as some traditional ORMs in the Python sphere, the trade-off of power vs provided ease is similar.
That said, the two can work together- you can drop down from neo4django to the underlying REST client nodes and relationships anytime. Just use model_instance.node to get the underlying neo4j-rest-client node object from a model, and from neo4django.db import connection to get a wrapped neo4j-rest-client GraphDatabase.
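As a rough sketch of what that drop-down looks like (the Person model is hypothetical):

    from neo4django.db import connection   # wrapped neo4j-rest-client GraphDatabase
    from myapp.models import Person        # hypothetical neo4django model

    person = Person.objects.get(name="Mohamed")
    node = person.node                      # underlying neo4j-rest-client node

    # neo4j-rest-client nodes act like dicts, so arbitrary properties are easy:
    node['nickname'] = 'Mo'

    # Or create nodes directly through the wrapped GraphDatabase:
    new_node = connection.nodes.create(name="Ada")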
On whether you're missing something: neo4django was written to re-use a powerful developer interface (the Django ORM), so it should feel similar to writing models for Postgres. I've written a bit about that odd feeling in the past. I think part of the problem might be that the lib doesn't highlight the graph terminology new graph-interested devs expect, like traversals and pattern matching, and instead dresses those techniques in Django query clothing.
I'd love your thoughts, or to know anything you'd like the library to do that it isn't doing :) Good luck!
We are planning to use django-haystack with Solr 4.0 (with near-real-time search) for our web app, and I was wondering if anyone could advise on the limits of using Haystack (compared to using Solr directly), i.e., is there a performance hit/overhead to using django-haystack? We have around 3 million+ documents which would need indexing, plus an additional (estimated) 100k added every day.
Ideally, I'd think we need a simple API over Solr 4, but I am finding it hard to find anything specific to Python which is still actively maintained (except django-haystack, of course). I'd appreciate any guidance on this.
It seems like your question could be rephrased "How has Haystack burned you?" Haystack is nice for some things, but has also caused me some headaches on the job. Here are some things I've had to code around.
You mentioned indexing. Haystack has management commands for rebuilding the index. We'll use these for nuke-and-pave rebuilding during testing, but for reindexing our production content we kind of hit the wall. The command would take forever, you wouldn't know where it was in terms of progress, and if it failed you were screwed and had to start all over again. We reached a point where we had too much content and it would fail reliably enough. We switched to making batches of content and reindexing these in Celery tasks, and made a management command to do the batching and kick off all those tasks. This has been a little more robust in the face of partial failures and, even better, it actually finishes. The underlying task will use Haystack calls to convert a database object to a Solr document; this ORMiness is nice and hasn't burned me yet. We edit our Solr schema by hand, though.
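The details depend on your project, but a rough sketch of that batching approach, assuming Haystack 2.x, Celery, integer primary keys, and a hypothetical Article model, might look like this:

    from celery import shared_task
    from django.core.management.base import BaseCommand
    from haystack import connections
    from myapp.models import Article  # hypothetical model

    BATCH_SIZE = 1000  # tune to taste

    @shared_task
    def index_batch(start_pk, end_pk):
        """Index one slice of Articles, using Haystack to build the Solr docs."""
        conn = connections['default']
        index = conn.get_unified_index().get_index(Article)
        backend = conn.get_backend()
        qs = index.index_queryset().filter(pk__gte=start_pk, pk__lt=end_pk)
        backend.update(index, qs)

    class Command(BaseCommand):
        help = "Queue batched reindexing tasks instead of one monolithic rebuild."

        def handle(self, *args, **options):
            pks = list(Article.objects.order_by('pk').values_list('pk', flat=True))
            for i in range(0, len(pks), BATCH_SIZE):
                chunk = pks[i:i + BATCH_SIZE]
                index_batch.delay(chunk[0], chunk[-1] + 1)
                self.stdout.write("queued pks %s-%s" % (chunk[0], chunk[-1]))

Because each batch is its own task, a partial failure only means re-running the affected batches rather than starting the whole rebuild over.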
The query API is okay for simple stuff. If you are doing more convoluted Solr queries, you can find yourself just feeding in raw queries, which can lead to spaghetti code. We eventually pushed that raw spaghetti into request handlers in the solrconfig file instead. We still use Haystack to turn on highlighting or to choose a specific request handler, but again, I'm happier when we keep it simple, and we did hack on a method to add arbitrary parameters as needed.
There are other assumptions about how you want to use Solr that seem to get baked into the code. Haystack is the only open source project where I actually have some familiarity with the code. I'm not sure if that's a good thing, since it hasn't always been by choice. We have plenty of layer-cake code that extends a Haystack class and overrides it to do the right thing. That isn't terrible, but when you also have to copy and paste Haystack code into those overridden methods, it starts being more terrible.
So... it's not perfect, but parts are handy. If you are on the fence about writing your own API anyway, using Haystack might save you some trouble, especially when you want to take all your Solr results and pass them back into Django templates or whatever. It sounds like, with the steady influx of documents, you will want to write some batch indexing jobs anyway. Start with that, prepare to be burnt a little, and look at the source when that happens; it's actually pretty readable.
I want to store some one-off data for my website. Specifically, I want to set which articles are displayed on the start page, popular tags, and other such things.
Django suggests making a model for this, but that implies there will be lots of such data.
What is the right way to accomplish this? Maybe my approach is completely wrong?
Thank you in advance!
You might consider looking at a CMS, Django-CMS is getting quite mature.
Aside from that, it sounds like you need to store some one-off or singleton objects. You can most certainly use models for this, as it will not only help you think properly about your data structures and learn about this powerful Django DB abstraction, but I suspect you'll find rather quickly that you may indeed want to create multiple objects over time (it is often rare that you don't).
If you have something that really is and always should be a singleton, consider placing it in your settings.py file instead.
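For instance, a minimal sketch of the settings.py approach; the setting names and the Article model are made up for illustration:

    # settings.py
    FRONT_PAGE_ARTICLE_SLUGS = ["welcome", "getting-started"]
    POPULAR_TAGS = ["django", "python", "search"]

    # views.py
    from django.conf import settings
    from django.shortcuts import render
    from myapp.models import Article  # hypothetical model

    def front_page(request):
        articles = Article.objects.filter(slug__in=settings.FRONT_PAGE_ARTICLE_SLUGS)
        return render(request, "front_page.html", {
            "articles": articles,
            "popular_tags": settings.POPULAR_TAGS,
        })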
I am looking for a good LDAP library for Django that would allow me to manage my LDAP server:
adding, modifying, and deleting entries
for groups, users, and all kinds of objects
The library django-ldapdb looked promising: it offers a Model base class that can be used to declare LDAP objects in a Django fashion (which is what we ideally want). However, we've had some bugs with it, and furthermore it seems like it is not maintained any more.
Does somebody know a good library that could do the trick? Otherwise I guess I'll just try to improve and debug django-ldapdb...
Thanks!
sebpiq, you say you applied "one or two fixes" to django-ldapdb, would you care to share them? So far django-ldapdb meets my needs, but I'd be happy to integrate any fixes you might have.
When using ldapdb to query LDAP with more results than the server allows, instead of getting a partial list (of, say, the first 500 users) I get a SIZELIMIT_EXCEEDED exception. Trying to change the code to catch that exception resulted in an empty result object.
Anyone else had that problem?
I fixed that problem by changing the search_s call to use search_ext and reading the results one by one until the exception happens.
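For reference, a rough sketch of that workaround using python-ldap directly; the connection details, base DN, and filter are placeholders:

    import ldap

    conn = ldap.initialize("ldap://localhost")
    conn.simple_bind_s("cn=admin,dc=example,dc=com", "secret")

    # Issue an asynchronous search and pull entries one at a time.
    msgid = conn.search_ext("ou=people,dc=example,dc=com",
                            ldap.SCOPE_SUBTREE, "(objectClass=person)")

    results = []
    try:
        while True:
            rtype, rdata = conn.result(msgid, all=0)
            if rtype == ldap.RES_SEARCH_RESULT:   # search finished normally
                break
            results.extend(rdata)                 # keep each partial entry
    except ldap.SIZELIMIT_EXCEEDED:
        pass  # keep whatever the server returned before hitting its limit

    print(len(results), "entries retrieved")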
http://www.python-ldap.org/doc/html/index.html
The beauty of Django is that you can use any Python module within your application.
There is also django-auth-ldap, which claims:
LDAP configuration can be as simple as a single distinguished name template, but there are many rich options for working with User objects, groups, and permissions.
Actually, I have found out that with one or two fixes, django-ldapdb is a pretty good library. The only bad point is that it is not very actively maintained... I will use it anyway, because it is the best solution I have found.