Hi, I'm writing a web application using Django. I'm still learning the framework and reading the how-to book. I know I might be asking this question prematurely, but I'd really like to know: I want to create a Python data structure in memory that is shared across all sessions. What would be the best and most scalable way to do this? So far I have read about Redis; however, I would like more flexibility, and I understand Redis can only store strings rather than Python objects.
This post is partially close to what you want (excluding the Java part and the later update on the post). In summary: Django runs in a multi-process environment, and each worker process has its own memory, so sharing in-memory objects across sessions is not feasible. One option is to store such shared objects in the database.
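On the concern that Redis stores strings rather than Python objects: one common workaround is to serialize the object to bytes and store those bytes under a key. The following is a minimal sketch, not a definitive implementation. It assumes a plain dict stands in for the external store (Redis or a database table), and the names SharedStore, put and get are hypothetical, not part of Django or of any Redis client.

```python
import pickle


class SharedStore:
    """Hypothetical wrapper that keeps pickled bytes in a backing mapping.

    The dict used here is only a stand-in for an external store that all
    worker processes can reach (for example Redis or a database table).
    """

    def __init__(self, backend=None):
        self._backend = {} if backend is None else backend

    def put(self, key, obj):
        # Serialize the Python object to bytes before handing it to the store.
        self._backend[key] = pickle.dumps(obj)

    def get(self, key, default=None):
        raw = self._backend.get(key)
        if raw is None:
            return default
        # Only unpickle data your own application wrote: pickle can run code.
        return pickle.loads(raw)


store = SharedStore()
store.put("visits", {"home": 3, "about": [1, 2]})
restored = store.get("visits")
```

Every read returns a fresh copy, so a change made in one process is visible to the others only after it is written back with put.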
Related
In my Visual C++ application, I want to allocate a lot of objects, which will use up all available memory in the system. To solve this problem, I decided to store the objects in a database. I have three candidates: MySQL, PostgreSQL, and SQLite, but I don't know which one is more appropriate.
What I need is:
Store objects in the database instead of memory.
Fast to find the objects via a key.
Light-weight so the RDBMS will not require a lot of system resources, including both memory and disk space.
No server required.
Easy to deploy.
Which one would be best for my needs? Of course, if you know of any better alternatives, please tell me.
SQLite provides detailed documentation on when it should be used, but MySQL and PostgreSQL do not, so it is a little difficult to choose as I am not familiar with those two. Thanks.
I'd use SQLite. It doesn't require a service and is cross-platform. It is easy to deploy and lightweight. It supports transactions. It's in the public domain.
Your questions:
Store objects in the database instead of memory.
Any database can do this; that's the definition of a database.
Fast to find the objects via a key.
Also standard functionality: if you can't find your data, what's the point of using a database?
Light-weight so the RDBMS will not require a lot of system resources, including both memory and disk space.
That's mostly in your hands: bad queries generate a lot of overhead, no matter what brand of database (or software language) you use.
No server required.
Do you mean "hardware" or "client-server model"? Both MySQL and PostgreSQL are services in a client-server model. SQLite works best for a single client.
Easy to deploy.
All three databases are easy to deploy, but SQLite is the easiest one. It's not a server like the others.
It looks like SQLite is the best fit, but also check your other requirements, the ones you didn't mention: performance, reliability, backup, failover, and so on. And do you need an RDBMS for this kind of work? A C++ object in memory is very different from a bunch of records in a couple of databases that can be accessed by using SQL.
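To make the "store objects, find them fast by key, no server" requirements concrete, here is a minimal sketch using SQLite. It is written in Python with the standard-library sqlite3 module purely for illustration (the question is about C++, where the same SQL statements would go through SQLite's C API). The table name objects and the helpers open_store, save_object and load_object are hypothetical, and JSON is assumed as the serialization format.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a single-file database; no server process is needed."""
    conn = sqlite3.connect(path)
    # The PRIMARY KEY gives an index, so lookups by key do not scan the table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS objects ("
        "key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    return conn


def save_object(conn, key, obj):
    # "with conn" commits on success and rolls back if an exception is raised.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO objects (key, value) VALUES (?, ?)",
            (key, json.dumps(obj)),
        )


def load_object(conn, key):
    row = conn.execute(
        "SELECT value FROM objects WHERE key = ?", (key,)
    ).fetchone()
    return None if row is None else json.loads(row[0])


conn = open_store()
save_object(conn, "widget:1", {"name": "bolt", "size": 8})
loaded = load_object(conn, "widget:1")
```

Passing a file path instead of ":memory:" keeps the objects on disk, which is the point when they no longer fit in memory.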
I need to design and implement a Java web application that can be used by multiple users at the same time. The data that is handled by this application is going to be huge, and it may take about 5 minutes for a page to display the results (database records).
I designed this application using HTML, Servlets and JSP. But when two users tried to get the records, only one user was able to view the results while the other got an error.
I always thought a web application would take care of handling multiple users but this is not the case.
Any insights on this would be highly appreciated.
Thanks.
I always thought a web application would take care of handling multiple users but this is not the case.
They do if they're written correctly. Obviously yours is not. That's all we can tell you unless you give more information, most importantly details of the error shown to the second user.
One possibility is that everything is OK on the web layer but your DB access for the first user causes an exclusive lock so that the second user cannot access the data at the same time. This could be fixed by using non-exclusive read locks. How to do that depends mainly on what DB you're using.
Getting concurrency right requires you to choose the correct tools and use them correctly. It doesn't just happen magically because it's a web app.
What are you using to develop this web application? If you are building everything yourself from scratch, I must say you are trying to reinvent a wheel that very solid frameworks have already created and refined.
I suggest you analyze your requirements thoroughly and study some of the available frameworks. Let them handle things like multithreading in the best possible manner.
Handling multiple requests at a time is the container's job; as application developers we have to concentrate on how we handle and process the requests the container forwards to us.
I suggest you get some insight into how web applications work and how the request-response cycle happens.
I am relatively new to Django and this is a more general 'concept' question.
For a client I need to construct an expansive database holding data returned from a series of questionnaires as well as some basic biological data. The idea is to move away from the traditional tools (i.e. Microsoft Access) and manage the data in a MySQL database using a basic CRUD interface. Initially the project doesn't need to live on the web, but the next phase will be to have a centralized db with a login and admin page.
I have started building the db with Django models which is great, and I want to use the Django admin for the management of the data.
My question is: Is this a good use of Django? Is there anything I should consider before relying on Django for the whole process? And is it advisable to use the Django runserver for db admin on a client's local machine (before we get to the web phase)?
Any advice would be much appreciated.
Actually, your description sounds exactly like the sort of thing for which Django is an ideal solution. It sounds more complex and customized than a CMS, and if it's as straightforward as your description then the ORM is definitely a good tool for this. Then again, this sounds exactly like an appserver-ready problem, so Rails, Express for Node.js, or even ChicagoBoss (if you're brave) would be good platforms for this kind of application.
And sure, Django is solid enough you can run it with the test server for local clients before you go whole-hog and run the thing on the web. For that, though, I recommend Apache/mod_wsgi, and if you're going to be fault tolerant there are diamond architectures (one front end proxy with monitoring failover, two or more appserver machines, one database with hot spare) and more complex (see: sharding) architectural layouts you can approach later.
If you're going to run it in a client's local setting, and you're not running Windows, I recommend looking into the screen program. It will allow you to detach the running job into the background while making diagnostics accessible in an ongoing fashion.
I'm looking to build my first web service and I would like to be pointed in the right direction right from the start.
Here are the interactions that must take place:
First, a smartphone or computer will send a chunk of data to my web service.
The web service will persist the information to the database.
Periodically, an algorithm will access and modify the database.
The algorithm will periodically bundle data and send it out to smartphones or computers (how?)
The big question is: What basic things do I need to learn in order to implement something like this?
Now here are the little rambling questions that I've also got rolling around in my head. Feel free to answer these if you wish. (...maybe consider them extra credit?)
I've heard a lot of good things about RESTful services, I've read the wiki article, and I've even played around with Twitter's web service, which is RESTful. Is this the obvious way to go? Or should I still consider something else?
What programming language do I use to persist things to the database? I'm thinking PHP will be the first choice for this.
What programming language do I use to interact with the database? I'm thinking anything is probably acceptable, right?
Do I have to worry about concurrent access to the database, or does MySQL handle that for me? (I'm fairly new to databases too.)
How on earth do I send information back? Obviously, if it's a reply to an HTTP request that's no problem, but there will be times when the reply may take quite a long time to compute. Should I just let the HTTP request remain outstanding until I have the answer?
There will be times when I need to send information to a smartphone regardless of whether or not information has been sent to me. How can I push information to my users?
Other information that may help you know where I'm coming from:
I am pretty familiar with Java, C#, C++, and Python. I have played around with PHP, Javascript, and Ruby.
I am relatively new to databasing, but I get the basic idea.
I've already set up my server and I'm using a basic LAMP stack. My understanding of L, A, M and P is fairly rudimentary.
Language: Python for its ease of use, assuming the GIL is not a particular concern for your requirements (e.g. multi-threading). It has drivers for most databases and supports numerous protocols. There are several web frameworks for it, the most popular probably being Django.
Protocols: if you are HTTP focused, study SOAP and REST. Note that SOAP tends to be verbose, which causes problems when moving large volumes of data. On the other hand, if you are looking at other options, study socket programming and perhaps some sort of binary format such as Google's protocol buffers. Flash is also a possibility (see: Flash Remoting). Note that binary options require users to install something onto their machine (e.g. an applet or standalone app).
Replies: if the process is long running, and the client should be notified when it's done, I would recommend developing an app for the client. Browsers can be programmed with JavaScript to poll periodically, or a Flash movie can be embedded to receive real-time updates, but these are somewhat tricky bits of browser programming. If you're dealing with wireless phones, look at SMS. Otherwise I would just provide a way for clients to get status, but not send out notifications (i.e. pull rather than push). As @jcomea_ictx wrote, AJAX is an option if it's a browser-based solution; study jQuery.
Concurrency: understand what ACID means with regard to databases. Think about what should happen if you receive multiple writes to the same data; the database may not necessarily solve this problem the way you'd want.
Please, for the love of programming, don't use PHP if you're already comfortable with Python. The latter makes for far cleaner, more maintainable code. Not that it's impossible to write good code in PHP, but it's a relative rarity. You can use Python for all the server-end stuff including MySQL interaction, with the MySQLdb module. Either with standard CGI, or FCGI, or mod_python.
As for the database, use of transactions will eliminate conflicts. But you can usually design a system in such a way that conflicts will not happen. For example, use of auto-incrementing primary-key IDs on each insert will make sure that every entry is unique.
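The two ideas above, transactions and auto-incrementing primary keys, can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard-library sqlite3 module with an in-memory database as a stand-in so it runs without a server (the answer itself is about MySQL via MySQLdb), and the readings table and record_batch helper are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "  # the database assigns unique ids
    "device TEXT NOT NULL, "
    "payload TEXT NOT NULL)"
)


def record_batch(conn, device, payloads):
    """Insert every payload in one transaction: all rows are kept, or none."""
    ids = []
    # "with conn" commits if the block finishes and rolls back on an exception.
    with conn:
        for payload in payloads:
            if payload is None:
                raise ValueError("payload missing")
            cur = conn.execute(
                "INSERT INTO readings (device, payload) VALUES (?, ?)",
                (device, payload),
            )
            ids.append(cur.lastrowid)
    return ids


first_ids = record_batch(conn, "phone-1", ["a", "b"])

try:
    record_batch(conn, "phone-2", ["c", None])  # fails on the second payload
except ValueError:
    pass  # the row for "c" was rolled back together with the failed batch

row_count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
```

Note that unique ids only prevent two inserts from colliding; two clients updating the same existing row still need a transaction (or an explicit rule for who wins).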
You can "pull" data with JavaScript, perhaps using AJAX methodology, or "push" using SMS or other technologies.
When replies take a while to compute, you can "poll" using AJAX. This is a very common technique. The server just returns "we are working on this" (or equivalent) with a built-in refresh until the results are ready.
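The server side of that polling pattern can be sketched as below. This is a rough illustration, not a finished design: submit_job and poll_job are hypothetical names for what two HTTP handlers would call, slow_sum stands in for the long computation, and the in-memory dict of jobs is an assumption that only holds inside a single server process (with several worker processes the job state would have to live in the database).

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=2)
_jobs = {}  # job id -> Future; assumption: one server process only


def submit_job(func, *args):
    """Start the slow work in the background and return a ticket at once."""
    job_id = uuid.uuid4().hex
    _jobs[job_id] = _executor.submit(func, *args)
    return job_id


def poll_job(job_id):
    """What the client's periodic request would receive."""
    future = _jobs.get(job_id)
    if future is None:
        return {"status": "unknown"}
    if not future.done():
        return {"status": "working"}  # "we are working on this"
    if future.exception() is not None:
        return {"status": "failed"}
    return {"status": "done", "result": future.result()}


def slow_sum(numbers):
    time.sleep(0.2)  # stands in for a long computation
    return sum(numbers)


job = submit_job(slow_sum, [1, 2, 3])
first_reply = poll_job(job)
while poll_job(job)["status"] == "working":
    time.sleep(0.05)  # the client-side refresh interval
final_reply = poll_job(job)
```

The first request returns immediately with a job id, so no HTTP request has to stay open while the answer is computed.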
I'm no expert on REST, but AJAX, especially when using polling rather than simply responding to user input, can be said to violate RESTful principles. But you can be a purist, or you can do whatever works. It's up to you.
I don't believe I've ever used any "push" technologies other than SMS, and that was years ago when many companies had free SMS gateways. So if you want to "push" data, better hope someone else joins in the conversation!
Use Java. The latest version of Java EE 6 makes coding RESTful and SOAP services a breeze, and it interoperates with databases very easily as well.
The benefits of using a true language instead of a script are: full server-side state, strong typing, multithreading, and countless other things that may or may not come in handy, but knowing they are available makes your project future proof.
I have my first app, not that big, but it is the first step. (next big one on the way)
Now if I want to put it on my own Linode VPS, I have to configure mod_python or mod_wsgi, as well as memcache, Nginx, MySQL or PostgreSQL, etc. to make it work. If I put it on GAE, all I have to do is convert the models to use GAE's API.
What I like about GAE is scaling. (if they can really do it)
Then I'd only worry about developing my apps and doing SEO work on them instead of worrying about load share/balance, cache, db / IO redundancy, etc.
I don't want to do any porting later on. (I have to decide now and stick with it)
So, if you have any experience on this, what do you recommend:
1- Use VPS(s) for everything
2- Use VPS(s) plus Amazon S3
3- Use VPS(s) plus Amazon S3 & SimpleDB
4- Use GAE
Also: Would I be able to get away with not having JOIN rights when using the BigTable?
Note: I don't have any spatial need now, but for a location table I might need that later on.
I'd like to know what you think!
There's business risk and technical risk.
Business risk is that you might have to move hosts later for some external reason. VPSs, EC2, etc. require more upfront investment, but keep you independent. Tools like Chef can help with the configuration effort.
Technical risk is that your application may not be easily implemented on the platform. Since most VPS options allow you to install arbitrary software, they minimize this, again at the cost of more configuration effort on your part. AFAIK, the largest constraint GAE enforces on you is it's difficult to do long running background tasks. (Working without JOINs and other aspects of de-normalized data requires a different way of thinking, but this approach is fairly common in web applications no matter where they run once the SQL database is larger than a single host can support.)
If you can live with both these risks, GAE would appear to save you a substantial amount of effort. If you cannot live with these risks, you should tailor your own environment.
As an aside, I find S3 to be worth it no matter your environment. It's far simpler than ensuring your local server static file storage is reliably backed up, and you never have to worry about capacity. It's best if you use it for data that is uploaded but rarely overwritten or deleted (think facebook photo albums).
I don't want to do any porting later on. (I have to decide now and stick with it)
If that's the case, wouldn't you prefer to control deployment from the outset? It could be a great pain to port back from GAE later down the line if you hit its limits (whether they be technological limits or simply business decisions by Google that run counter to your plans for the future of your app).
Also configuring mod_wsgi, installing postgres etc. isn't particularly difficult, and you don't have to worry about things like load balancing and db redundancy for a while yet.
If it were me, I'd prefer the long-term certainty of a traditional server over the quick win of GAE. It all depends on your vision for the app, however.
I may be biased, but if you can live with GAE's limitations it really saves you a lot of work and worry about system administration issues (and to some extent scaling) -- plus, it's free as long as your resource consumption is low (basically meaning your traffic is low).
Can you do without joins? I don't know, as I don't know your app -- I'm a SQL fanatic, myself, yet for simple enough needs I haven't found it too hard to adapt. As I see it, the main limitation of non-relational DBs is that they're nowhere near as nice as relational ones for "ad hoc" queries... you typically have to write a lot of procedural code instead of a nice SELECT or two:-(. But, that's more of a "data mining later" issue than one connected with serving your web app -- probably best solved by regularly bulk-downloading data from the web app's online storage to a "data warehouse" kind of setup, anyway, even if such storage was relational in the first place;-).
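To give a feel for the "procedural code instead of a nice SELECT" trade-off, here is a small sketch of a join and a grouped count done by hand. The data is invented for illustration (plain lists of dicts standing in for two kinds of stored entities), and the function names are hypothetical; it does not use the GAE datastore API.

```python
from collections import defaultdict

# Invented sample data standing in for two separately stored entity kinds.
authors = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Bob"},
]
posts = [
    {"id": 10, "author_id": 1, "title": "Hello"},
    {"id": 11, "author_id": 1, "title": "Again"},
    {"id": 12, "author_id": 2, "title": "Hi"},
]


def join_posts_to_authors(posts, authors):
    """Roughly: SELECT title, name FROM posts JOIN authors ON author_id = authors.id"""
    authors_by_id = {a["id"]: a for a in authors}
    return [
        {"title": p["title"], "author": authors_by_id[p["author_id"]]["name"]}
        for p in posts
    ]


def posts_per_author(posts, authors):
    """Roughly: SELECT name, COUNT(*) ... GROUP BY name"""
    counts = defaultdict(int)
    for p in posts:
        counts[p["author_id"]] += 1
    return {a["name"]: counts[a["id"]] for a in authors}


joined = join_posts_to_authors(posts, authors)
totals = posts_per_author(posts, authors)
```

Each new ad hoc question needs another function like these, which is the cost the answer is describing.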
Before deciding, it might be worth a quick prototype adaptation of your app to GAE. You might run into stoppers that force the decision. Possible stopper issues include:
Your schema doesn't make the transition to BigTable
You're depending on some C-based library that GAE doesn't support
You have a few long-running requests that exceed the thresholds that GAE imposes
The answer depends on the complexity and nature of your model layer, really. If it's complex or tightly bound to the rest of your code, porting is likely to be a significant effort. If it's fairly straightforward, or easy to tear out and replace, I would say go for it.
These days, I mostly write new code for GAE, but the fact that I can simply deploy with a single command has really lowered the barrier I feel towards writing cool new apps. Not having to worry about deployment and hosting is quite liberating.
All I have to do is convert the models to use GAE's API.
I am sorry, you are totally mistaken.
You also need to rewrite all the view code that uses the ORM. There are no joins, so you have to write a lot of procedural code instead of the nifty SQL that gives you whatever you want.
Querying is slow. You need to override the save method of each model to store additional information about that model that would otherwise take a long time to compute when needed. You also need to use memcache to make the queries fast enough.
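The two techniques just mentioned, computing derived values inside save and caching reads, can be sketched in plain Python. Everything here is a hypothetical stand-in: FakeDatastore imitates a key-based store, the cache dict imitates memcache, and Article is an ordinary class rather than a Django or GAE model.

```python
class FakeDatastore:
    """Hypothetical stand-in for a key-based datastore without joins."""

    def __init__(self):
        self.rows = {}
        self.reads = 0  # counts how often the slow store is actually hit

    def put(self, key, row):
        self.rows[key] = dict(row)

    def get(self, key):
        self.reads += 1
        return dict(self.rows[key])


datastore = FakeDatastore()
cache = {}  # stands in for memcache


class Article:
    def __init__(self, key, body):
        self.key = key
        self.body = body
        self.word_count = 0

    def save(self):
        # Compute the derived value once at write time so reads stay cheap.
        self.word_count = len(self.body.split())
        datastore.put(
            self.key, {"body": self.body, "word_count": self.word_count}
        )
        cache.pop(self.key, None)  # drop any stale cached copy


def get_article(key):
    """Read through the cache; only a miss touches the datastore."""
    if key not in cache:
        cache[key] = datastore.get(key)
    return cache[key]


Article("a1", "one two three").save()
first = get_article("a1")   # cache miss: reads the datastore
second = get_article("a1")  # cache hit: no datastore read
```

The price is that every write path must go through save, otherwise the stored word_count and the cached copy drift out of date.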
And then, Guido has said Django 1.1 is going to be included in a future version of App Engine. I am hoping they will have an out-of-the-box generic ORM-to-BigTable mapper.
That said, if your app is simple and doesn't need many joins, you could use the appengine patch project to run the current version of Django on App Engine. Here is how.