Does anyone have any experience using django-haystack with the whoosh backend?
I'm looking to use it for a categorized live-search type tool. Is it gonna be fast/efficient enough in a production environment to avoid setting up either solr or xapian?
As a general principle, I put Whoosh in the same category as SQLite: great for getting started, wonderful for single-user or really small-scale apps, but not suitable for large-scale deployment.
Whoosh is, in my experience, about an order of magnitude slower than Solr. A typical search against a biggish Solr index I've got in production takes about a hundredth of a second; the same search using Whoosh and the same data takes roughly a tenth of a second.
You should decide what's "fast enough" for you, but I don't think Whoosh is a good idea for anything where you expect high performance.
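One way to decide what's "fast enough" is to time your own queries against your own data. Here is a minimal sketch using only the standard library. `fake_search` is a hypothetical stand-in, not a Haystack or Whoosh call, so swap in whatever your backend exposes.

```python
import statistics
import time


def time_search(search, query, runs=20):
    """Return the median wall-clock seconds taken by search(query)."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        search(query)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def fake_search(query):
    # Hypothetical stand-in for a real backend call; replace it with
    # the search function you actually want to measure.
    docs = ["django haystack", "whoosh index", "solr server"]
    return [d for d in docs if query in d]


if __name__ == "__main__":
    median = time_search(fake_search, "whoosh")
    print("median search time: %.6f s" % median)
```

The median is used rather than the mean so that one slow outlier (a cold cache, say) doesn't skew the number.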
I found Xapian extremely easy to set up on Debian.
aptitude install python-xapian
and that's all.
To use it with django there is a very good app named djapian.
I would go with either Solr or Xapian (although it's not quite officially supported by haystack yet; see this thread). Solr is easy to set up and get running if you follow the tutorial; however, I've had a heck of a time getting it installed in a production environment, but that's mostly due to my lack of experience with Java server environments. Your mileage may vary.
I'd also put in another plug for djapian. It's very well documented and is under very active development.
You should use elasticsearch instead of whoosh. It is faster, and it also has more functionality than Solr.
Related
Please excuse the noobiness of my question. I am mostly searching here for some directions and buzzwords to start digging from.
I spent some time developing an application in Python
Basically, it takes a bunch of images and creates a video out of it.
It is quite simple, and uses only a few libraries (opencv and numpy mostly).
I designed a small gui in gtk, but I think that it would be a good idea to offer the service over the web.
I think I could reuse some of my core and design a front end that people could access in their browser.
I only need a few pieces of data to get it running (images, an email).
The thing is, my web dev skills are really close to 0, and I don't exactly know where to start.
I don't plan on having hundreds of people a day on the platform.
People would connect, feed me the data (a link to a dropbox folder, google drive, whatever) and I would send them a message when it's finished.
If you could provide me with some names or links so that I could touch the field, I'd be really glad.
CGI is a fine option, but if you already have Python experience Django is definitely worth checking out (it falls in the category of rhooligan's #3 except it uses Python!). Django completely takes care of all of the database backend details for you, which is a benefit over simple CGI. It also provides easy-to-use pre-defined classes for handling file uploads, images, etc. It also has a great tutorial that will get you up and running. Just be careful about whether you're using version 1.3, 1.4, or the latest dev version, because some aspects of the framework have changed fairly quickly. Make sure that you're always looking at the right version of the docs.
Another handy service to keep in mind for doing something like image processing through a web app is a hosted cloud computing service provider like PiCloud. Unless you already have a private web server with lots of memory and processing power, these cloud services that charge by the ms are really cool. They also give you 1000s of cores, which could allow you to do lots of concurrent processing. They provide a nice Python API, and it has numpy and opencv pre-installed in both v2.6 and v2.7. (They use PyOpenCV, but you also have root access to install anything you want, so you can set up the "cv2" interface if that's what you're using--actually I just looked at your GitHub and it looks like you're using the old "cv" interface. You can also install any application you want on PiCloud--it doesn't have to be Python.)
You could start by looking into the Python CGI module and see if it will work for you. Then you'll need to do the following steps:
Decide on a webserver and install it, Apache is probably a good starting point.
Design the UI. Wireframe things out on paper. Figure out how you'd ideally want the users to go through your site and what you want on each page/view.
Your decision in #2 drives all the decisions from this point out. These days, most web applications are a combination of Web 1.0 and JSON/REST "services" (there's a couple of buzzwords for ya!). JQuery is a popular and widely used JavaScript library for developing the front end of your site. That would be another thing to look at. JQuery is completely independent from the back end and can be used with any type of back end (PHP, Ruby, Perl, .NET, etc)
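To give a feel for how small the back end can start out, here is a minimal sketch of a form handler. It uses the standard library's `wsgiref` rather than the CGI module mentioned above (a deliberate swap, since WSGI is what most Python web frameworks build on). The field names `email` and `folder` and the "queued" message are assumptions for illustration; the actual video job is left as a comment.

```python
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

FORM = b"""<form method="post">
Email: <input name="email">
Image folder link: <input name="folder">
<input type="submit">
</form>"""


def app(environ, start_response):
    """Show the form on GET; accept the submitted fields on POST."""
    if environ["REQUEST_METHOD"] == "POST":
        size = int(environ.get("CONTENT_LENGTH") or 0)
        raw = environ["wsgi.input"].read(size).decode("utf-8")
        fields = parse_qs(raw)
        email = fields.get("email", [""])[0]
        folder = fields.get("folder", [""])[0]
        # A real service would hand the job to a background worker here.
        body = ("Job queued for %s using %s" % (email, folder)).encode("utf-8")
        content_type = "text/plain; charset=utf-8"
    else:
        body = FORM
        content_type = "text/html; charset=utf-8"
    start_response("200 OK", [("Content-Type", content_type),
                              ("Content-Length", str(len(body)))])
    return [body]


if __name__ == "__main__":
    # Development server only; put a real web server in front for production.
    with make_server("localhost", 8000, app) as server:
        server.serve_forever()
```

Run it and open http://localhost:8000/ in a browser. A framework like Django wraps this same request/response cycle and adds routing, templates and database handling on top.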
I need to create an In-App-Purchase backend for an iPhone app, and I am thinking of building it on GAE.
However, after my experience in a recent gig at one of the largest GAE customers and reading posts like this one http://www.agmweb.ca/blog/andy/2286/, I wonder whether right now it is a good idea (i.e. reliable) to host a django+gae project like this. I expect low traffic in the first months. It is mainly an API-based website with some web front-end.
Or are there any hints for getting reliable operation out of django + gae? I'm using App Engine Helper, but could switch to another implementation if it is more rock solid.
From my experience it seems that Django needs a bit of effort to get working correctly, and using it on AppEngine is a bit different to how you would use it otherwise. I suggest considering the possibility of using a different framework.
Personally, I suggest Tipfy as it was built specifically for App Engine, but there are quite a few frameworks I haven't even tried but have heard great things about.
IIRC the problem with Django poisoning instances due to exceeding the deadline has been solved.
I am developing a location-based service. FYI, the database will expand vastly as time and location are the variables. I am considering GAE for initial deployment. I am open to either Python- or Java-based development. While estimating the scalability, I am getting confused. I never thought about scalability before as I haven't worked on big projects. I am also considering the fact that I may have to change hosting in the near future for more flexibility.
Considering this situation, what should I start with? Struts2? or Django? Will there be a big difference in terms of development time?
Do you already know Java or Python? If you are proficient in one and not the other, you might want to use what you know. If you are unfamiliar with both, and particularly programming in general, I think Python would be much easier to learn. But this is very subjective.
GAE is a good platform for some applications. If you are, for example, frequently reporting a location from a mobile device (like a phone), I think GAE would be a good fit. But I would not use django to handle such requests; Instead use the 'lightest' possible framework to record the data (probably webapp (Python) or the low-level datastore API (java)).
Keep in mind the limitations on queries in GAE. There are no JOINS, you'll need to denormalize. You can use inequality filters on one property at a time, so for proximity queries you'll need a technique like GeoBoxes. If you can work around those limitations, App Engine has a lot to offer.
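To make the proximity-query workaround more concrete, here is a simplified grid-cell sketch of the idea: each point is stored with a string key for the cell that contains it, so "nearby" becomes an equality match on a handful of keys instead of inequality filters on both latitude and longitude. This is an illustration under assumptions, not necessarily the exact GeoBoxes scheme; the function names and the 0.01 degree cell size are made up for the example, and a plain list stands in for the datastore.

```python
import math


def _cell(lat, lon, cell_deg):
    # Integer row/column of the grid cell containing the point.
    return math.floor(lat / cell_deg), math.floor(lon / cell_deg)


def geobox(lat, lon, cell_deg=0.01):
    """Return a string key naming the grid cell that contains (lat, lon)."""
    row, col = _cell(lat, lon, cell_deg)
    return "%d:%d" % (row, col)


def neighbor_boxes(lat, lon, cell_deg=0.01):
    """Return the keys of the containing cell and its 8 neighbours."""
    row, col = _cell(lat, lon, cell_deg)
    return ["%d:%d" % (row + dr, col + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def nearby(places, lat, lon, cell_deg=0.01):
    """Filter places (dicts with a precomputed "box" key) by cell equality."""
    boxes = set(neighbor_boxes(lat, lon, cell_deg))
    return [p for p in places if p["box"] in boxes]


if __name__ == "__main__":
    places = [
        {"name": "Paris", "box": geobox(48.8566, 2.3522)},
        {"name": "London", "box": geobox(51.5074, -0.1278)},
    ]
    print(nearby(places, 48.8570, 2.3530))
```

On the datastore, the `box` value would be saved as a property when the entity is written, and the lookup would be equality filters on the nine keys; an exact distance check can then be done in memory on the small result set.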
I have never created a high traffic site so I have no idea what the best long term plan is. There is no room for a dedicated server in the budget. I'm currently using VPS hosting for my current site. I was going to stick with VPS and migrate grails. I looked at Django and python hosting plans (which look cheaper than VPS plans) from fatcow.com for example. Which is a better investment, grails through VPS hosting or django through a standard python hosting plan? Which would have better performance in short and in long term?
The front end of the application is javafx, and the backend will be a REST interface.
I went through the same process as you before deciding to use django. I am a Java programmer during the day and I wanted a pet project that I could work on in my spare time. So I got myself a VPS with the cheapest plan available. I installed a Java web server and deployed a Grails app, but it turned out that it needed more memory. Then I realized that a Java webapp needs a lot of memory just to get running. So I went looking for a non-Java framework. I didn't have many criteria at that time other than that it should run smoothly on my current VPS plan.
I took a look at django and I was amazed that:
It is so simple and easy to get started. It only creates a small number of files (compared to Grails)
It has many built-in features that Grails doesn't have:
RSS feed framework
Commenting system
The admin system (you're going to love it, it's like scaffolding only better)
And many other webby features that take time to create
It needs less memory to get started, but it can also scale really well
Other than that you're just going to compare Groovy and Python. If you're a Java programmer you're going to love Groovy syntax as it is really close to Java. But Python is a good language too (even though many people don't like its syntax).
If you want to use JavaFX as the front-end, then you can use django just to return JSON data or XML data, and you can do this easily because it has a built-in serializer to do this.
So all in all, the choice boils down to what you need and what you already know.
I would stick with Django. Django and Grails are quite similar, but I prefer Python over Groovy. Python's development cycle is just less tedious than Groovy's. The Python console is e.g. started immediately, while the Groovy console can take over a second to load. That's just a small issue, but waiting a second many times gets frustrating in the end.
There is a Grails App Engine plugin that does not use hibernate.
http://www.grails.org/plugin/app-engine
Personally, I think the choice comes down to which language you like the most. If you are a Java/JSP developer, you'll probably like Grails better. However, if you are already quite proficient in Python then that is the better choice.
Here are some resources that might help you evaluate Grails.
http://grails.org/Success+Stories
http://www.pubbs.net/grails/200908/12877
Python is already well established and mature. There are plenty of resources and it is certainly a good choice, if you are a Python fan.
Have you looked at Google AppEngine? You can run Django there, and it's a good cheap way to start.
I haven't seen any performance comparisons between CPython and Jython, but I do know that Django runs on the latest version of Jython now. This also allows you the flexibility of being able to rewrite parts of your app later (remember, no premature optimization) in Java or, say, Scala, if you need the speed.
You may want to consider the memory footprint consumed by the app server in the VPS environment. If your VPS is really small (256 mb) then you might run out of memory if you are running the app server + db server.
Groovy's future is debatable. Its creator, James Strachan, has said:
I can honestly say if someone had shown me the Programming in Scala book by Martin Odersky, Lex Spoon & Bill Venners back in 2003 I'd probably have never created Groovy.
-- http://macstrac.blogspot.com/2009/04/scala-as-long-term-replacement-for.html
My 2 cents: go with Python & Django. Skip Scala. Seriously consider Lisp.
I have my first app, not that big, but it is the first step. (next big one on the way)
Now if I want to put it on my own Linode VPS, I have to configure mod_python or mod_wsgi, as well as memcache, Nginx, MySQL or PostgreSQL, etc. to make it work. If I put it on GAE, all I have to do is convert the models to use GAE's API.
What I like about GAE is scaling. (if they can really do it)
Then I'd only worry about developing my apps and doing SEO work on them instead of worrying about load share/balance, cache, db / IO redundancy, etc.
I don't want to do any porting later on. (I have to decide now and stick with it)
So, if you have any experience on this, what do you recommend:
1- Use VPS(s) for everything
2- Use VPS(s) plus Amazon S3
3- Use VPS(s) plus Amazon S3 & SimpleDB
4- Use GAE
Also: Would I be able to get away with not having JOIN rights when using the BigTable?
Note: I don't have any spatial need now, but for a location table I might need that later on.
I'd like to know what you think!
There's business risk and technical risk.
Business risk is that you might have to move hosts later for some external reason. VPS's, EC2, etc require more upfront investment, but keep you independent. Tools like Chef can help with the configuration effort.
Technical risk is that your application may not be easily implemented on the platform. Since most VPS options allow you to install arbitrary software, they minimize this, again at the cost of more configuration effort on your part. AFAIK, the largest constraint GAE enforces on you is it's difficult to do long running background tasks. (Working without JOINs and other aspects of de-normalized data requires a different way of thinking, but this approach is fairly common in web applications no matter where they run once the SQL database is larger than a single host can support.)
If you can live with both these risks, GAE would appear to save you a substantial amount of effort. If you cannot live with these risks, you should tailor your own environment.
As an aside, I find S3 to be worth it no matter your environment. It's far simpler than ensuring your local server static file storage is reliably backed up, and you never have to worry about capacity. It's best if you use it for data that is uploaded but rarely overwritten or deleted (think facebook photo albums).
I don't want to do any porting later on. (I have to decide now and stick with it)
If that's the case, wouldn't you prefer to control deployment from the outset? It could be a great pain to port back from GAE later down the line if you hit its limits (whether they be technological limits or simply business decisions by Google that run counter to your plans for the future of your app).
Also configuring mod_wsgi, installing postgres etc. isn't particularly difficult, and you don't have to worry about things like load balancing and db redundancy for a while yet.
If it were me, I'd prefer the long-term certainty of a traditional server over the quick win of GAE. It all depends on your vision for the app, however.
I may be biased, but if you can live with GAE's limitations it really saves you a lot of work and worry about system administration issues (and to some extent scaling) -- plus, it's free as long as your resource consumption is low (basically meaning your traffic is low).
Can you do without joins? I don't know, as I don't know your app -- I'm a SQL fanatic, myself, yet for simple enough needs I haven't found it too hard to adapt. As I see it, the main limitation of non-relational DBs is that they're nowhere as nice as relational ones for "ad hoc" queries... you typically have to write a lot of procedural code instead of a nice SELECT or two:-(. But, that's more of a "data mining later" issue than one connected with serving your web app -- probably best solved by regularly bulk-downloading data from the web app's online storage to a "data warehouse" kind of setup, anyway, even if such storage was relational in the first place;-).
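For a sense of what "procedural code instead of a nice SELECT" looks like in practice, here is a small sketch of a hand-written join. The `authors` and `posts` records are made-up example data held in plain lists; on a real non-relational store they would come from two separate queries.

```python
authors = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Bob"},
]
posts = [
    {"id": 10, "author_id": 1, "title": "Hello"},
    {"id": 11, "author_id": 2, "title": "World"},
    {"id": 12, "author_id": 1, "title": "Again"},
]


def posts_with_author_names(posts, authors):
    """Hand-written equivalent of:

    SELECT posts.title, authors.name
    FROM posts JOIN authors ON posts.author_id = authors.id
    """
    # Build a lookup table once, then resolve each post against it.
    names_by_id = {a["id"]: a["name"] for a in authors}
    return [(p["title"], names_by_id[p["author_id"]]) for p in posts]


if __name__ == "__main__":
    for title, name in posts_with_author_names(posts, authors):
        print(title, "by", name)
```

It's not hard for a query you know about in advance, but every new ad hoc question means writing another function like this one.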
Before deciding, it might be worth a quick prototype adaptation of your app to GAE. You might run into stoppers that force the decision. Possible stopper issues include
Your schema doesn't make the transition to BigTable
You're depending on some C-based library that GAE doesn't support
You have a few long-running requests that exceed the thresholds that GAE imposes
The answer depends on the complexity and nature of your model layer, really. If it's complex or tightly bound to the rest of your code, porting is likely to be a significant effort. If it's fairly straightforward, or easy to tear out and replace, I would say go for it.
These days, I mostly write new code for GAE, but the fact that I can simply deploy with a single command has really lowered the barrier I feel towards writing cool new apps. Not having to worry about deployment and hosting is quite liberating.
All I have to do is convert the models to use GAE's API.
I am sorry, you are totally mistaken.
You also need to rewrite all the views code that uses the ORM. There are no joins, so you have to write a lot of procedural code instead of the nifty SQL that gives you whatever you want.
Querying is slow. You need to override the save method of each model to store additional information about that model that would otherwise take a long time to compute when needed. You also need to work with memcache to make the queries fast enough.
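A rough sketch of those two chores, in plain Python so it runs anywhere: `Article` is a hypothetical class rather than a Django or App Engine model, and the `_cache` dict is only a stand-in for memcache. The point is the pattern: compute derived values at write time, and cache query results at read time.

```python
class Article:
    """Hypothetical model that precomputes a derived field on save."""

    def __init__(self, title, body):
        self.title = title
        self.body = body
        self.word_count = None  # filled in by save()

    def save(self, store):
        # Do the expensive computation once, at write time, so reads
        # never have to recompute it.
        self.word_count = len(self.body.split())
        store[self.title] = self


_cache = {}  # stand-in for memcache; a real cache would also expire entries


def cached_query(key, compute):
    """Return the cached result for key, computing it on the first call only."""
    if key not in _cache:
        _cache[key] = compute()
    return _cache[key]


if __name__ == "__main__":
    store = {}
    Article("intro", "joins are not available here").save(store)
    total = cached_query(
        "total-words",
        lambda: sum(a.word_count for a in store.values()),
    )
    print(total)
```

The catch is invalidation: once a new article is saved, the cached total is stale, so a real app has to delete or refresh the cache key in `save` as well.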
And then, Guido has said Django 1.1 is going to be included in a future version of Appengine. I am hoping they will have an out of the box generic ORM to BigTable mapper.
That said, if your app is simple without many joins needed, you could use the appengine patch project to use the current version of django on Appengine. Here is how.