Offline web application - offline

I’m thinking about building an offline-enabled web application.
The architecture I’m considering is as follows:
Web server (remote) <--> Web server/cache (local) <--> Browser/Prism
The advantages I envision for this model are:
Deployment is web-based, with all the advantages of this approach
Offline-enabled
UI (html/js) synchronization is a non-issue
Data synchronization can be mostly automated
as long as I stay within a RESTful paradigm
I can break this as required but manual synchronization would largely remain surgical
The local web server is started as a service; I can run arbitrary code, including behind-the-scene data synchronization
I have complete control of the data (location, no size limit, no possibility of user deleting unknowingly)
Prism with an extension could let me keep the JavaScript closed source
Any thoughts on this architecture? Why should I / shouldn’t I use it? I'm particularly looking for success/horror stories.
The long version
Notes:
Users are not very computer-literate. For instance, even superficially explaining how Gears works is totally out of the question.
I WILL be held liable if data is lost, even if it’s really the user’s fault (short of him deleting random directories on his machine)
I can require users to install something on their machine. It doesn’t have to be 100% web-based and/or run in a sandbox
The common solutions to this problem don’t feel adequate somehow. Here is a short analysis of each.
Gears/HTML5:
no control over data, can be deleted by users without any warning
no control over location of data (not uniform across browsers and platforms)
users need to open application in browser for synchronization to happen; no automatic, behind-the-scene synchronization
different browsers are treated differently, no uniform view of data on a single machine
limited disk space available
synchronization is completely manual, sql-based storage makes this a pain (would be less complicated if sql tables were completely replicated but it’s not so in my case). This is a very complex problem.
my code would be almost completely open sourced (html/js)
Adobe AIR:
some of the above
no server-side includes (!)
can run in the background, but not windowless
manual synchronization
web caching seems complicated
feels like a kludge somehow, I’ve had trouble installing on some machines
My requirements are:
Web-based (must). For a number of reasons, sharing data between users for instance.
Offline (must). The application must be fully usable offline (w/ some rare exceptions).
Quick development (must). I’m a single developer going against players with far more business resources.
Closed source (nice to have). Yes, I understand the open source model. However, at this point I don’t want competitors to copy me too easily. Again, they have more resources so they could take my hard work and make it better in less time than I could myself. Obviously, they can still copy me developing their own code -- that is fine.

Horror stories from a CRM product:
If your application is heavily used, storing a complete copy of its data on a user's machine is unfeasible.
If your application features data that can be updated by many users, replication is not simple. If three users with local changes synch, who wins?
In reality, this isn't really what users want. They want real-time access to the most current data from anywhere. We had better luck offering a mobile interface to a single source of truth.

The part about running the local Web server as a service appears unwise. Besides the fact that you are tied to certain operating environments that are available in the client, you are also imposing an additional burden of managing the server, on the end user. Additionally, the local Web server itself cannot be deployed in a Web-based model.
All in all, I am not too thrilled by the prospect of a real "local Web server". There is a certain bias to it, no doubt since I have proposed embedded Web servers that run inside a Web browser as part of my proposal for seamless off-line Web storage. See BITSY 0.5.0 (http://www.oracle.com/technology/tech/feeds/spec/bitsy.html)
I wonder how essential your requirement to prevent data loss at any cost is. What happens when you are offline and the disk crashes? Or the device is lost? In general, you want the local cache to be as little ahead of the server as possible, but be prepared to tolerate loss of data to the extent that the server is behind the client. This may involve some amount of contractual negotiation or training. In practice this may not be a deal-breaker.

The only way to do this reliably is to offer some sort of "check out and lock" at the record level. When a user is going remote they must check out the records they want to work with. This check-out copies the data to a local DB and prevents the record in the central DB from being modified while the record is checked out.
When the roaming user reconnects and checks their locked records back in, the data is updated on the central DB and unlocked.

Related

Pitfalls with local in memory cache invalidated using RabbitMQ

I have a Java web server and am currently using the Guava library to handle my in-memory caching, which I use heavily. I now need to expand to multiple servers (2+) for failover and load balancing. In the process, I switched from an in-process cache to Memcache (external service) instead. However, I'm not terribly impressed with the results, as now for nearly every call, I have to make an external call to another server, which is significantly slower than the in-memory cache.
I'm thinking instead of getting the data from Memcache, I could keep using a local cache on each server, and use RabbitMQ to notify the other servers when their caches need to be updated. So if one server makes a change to the underlying data, it would also broadcast a message to all other servers telling them their cache is now invalid. Every server is both broadcasting and listening for cache invalidation messages.
Does anyone know any potential pitfalls of this approach? I'm a little nervous because I can't find anyone else that is doing this in production. The only problems I see would be that each server needs more memory (in-memory cache), and it might take a little longer for any given server to get the updated data. Anything else?
I am a little bit confused about your problem here, so I am going to restate in a way that makes sense to me, then answer my version of your question. Please feel free to comment if I am not in line with what you are thinking.
You have a web application that uses a process-local memory cache for data. You want to expand to multiple nodes and keep this same structure for your program, rather than rely upon a 3rd party tool (memcached, Couchbase, Redis) with built-in cache replication. So, you are thinking about rolling your own using RabbitMQ to publish the changes out to the various nodes so they can update the local cache accordingly.
My initial reaction is that what you want to do is best done by rolling over to one of the above-mentioned tools. In addition to the obvious development and rigorous testing involved, Couchbase, Memcached, and Redis were all designed to solve the problem that you have.
Also, in theory you would run out of available memory in your application nodes as you scale horizontally, and then you will really have a mess. Once you get to the point when this limitation makes your app infeasible, you will end up using one of the tools anyway at which point all your hard work to design a custom solution will be for naught.
The only exceptions to this I can think of are if your app is heavily compute-intensive and does not use much memory. In this case, I think a RabbitMQ-based solution is easy, but you would need to have some sort of procedure in place to synchronize the cache between the servers on occasion, should messages be missed in RMQ. You would also need a way to handle node startup and shutdown.
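To make the trade-off concrete, here is a rough sketch of a local cache with broadcast invalidation. It is in Python rather than Java, and the Bus class is only an in-process stand-in for a broker fanout exchange; every name in it is hypothetical. The TTL is one possible safety net for the "missed message" case: a node that never hears an invalidation still refreshes once the entry expires.

```python
import time


class Bus:
    """In-process stand-in for a broker fanout exchange (hypothetical helper)."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, key):
        for callback in list(self.subscribers):
            callback(key)


class LocalCache:
    def __init__(self, bus, load, ttl_seconds=60.0, clock=time.monotonic):
        self.bus = bus
        self.load = load          # function that reads the value from the real store
        self.ttl = ttl_seconds    # upper bound on staleness if a message is missed
        self.clock = clock
        self.entries = {}         # key -> (value, expires_at)
        bus.subscribe(self.invalidate)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]
        value = self.load(key)
        self.entries[key] = (value, self.clock() + self.ttl)
        return value

    def invalidate(self, key):
        self.entries.pop(key, None)

    def write(self, key, value, store):
        store(key, value)         # update the underlying data first
        self.bus.publish(key)     # then tell every node (including this one) to drop it
```

Node startup is simple in this shape (a new node starts with an empty cache), but the sketch says nothing about broker outages or message ordering, which is where most of the testing effort would go.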
Edit
In consideration of your statement in the comments that you are seeing access times in the hundreds of milliseconds, I'm going to advise that you first examine your setup. Typical read times for a single item in the cache from a Memcached (or Couchbase, or Redis, etc.) instance are sub-millisecond (somewhere around .1 milliseconds if I remember correctly), so your "problem child" of a cache server is several orders of magnitude from where it should be in terms of performance. Start there, then see if you still have the same problem.
We're using something similar for data which is read-only and doesn't need to be updated every time. I doubt that this is a good plan for you. Just imagine that you would need one more service on each instance to monitor the queue and apply changes to the in-memory storage. This is very hard to test.
Are you sure that most of the time is spent on communication between your servers? Maybe you run multiple calls?

using SQLite in Django in production?

Sorry for this question; I don't know if I've understood the concept. SQLite is serverless, which means the database is on the local machine and is stored in one file. As I understand it, this file is only accessible in one mode at a time: if one client reads it, it is in read mode for all other clients, and if a client writes, then all clients are in write mode. So only one mode at once!
So imagine that I've made a Django application, a blog for example. How does this work with SQLite? If a client visits the blog, he gets read mode to see the page and the blog entries, and if a registered client tries to add a comment, the file is switched to write mode. So how can SQLite handle this?
So, is SQLite here just like the BaseHTTPServer (the server shipped with Django), for testing and learning purposes?
Different databases manage concurrency in different ways, but in sqlite, the method used is a global database-level lock. Only one thread or process can make changes to a sqlite database at a time; all other, concurrent processes will be forced to wait until the currently running process has finished.
As your number of users grows, sqlite's simple locking strategy will lead to increasingly great lock contention, and you will need to migrate your data to another database, such as MySQL (which can do row-level locking, at least with the InnoDB engine) or PostgreSQL (which uses Multiversion Concurrency Control). If you anticipate that you will get a substantial number of users (on the level of, say, more than 1 request per second for a good part of the day), you should migrate off of sqlite; and the sooner you do so, the easier it will be.
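The database-level write lock can be observed directly with Python's built-in sqlite3 module. This is a small sketch, assuming the default rollback-journal mode; the function name, table name, and the very short wait are made up for the demonstration:

```python
import os
import sqlite3
import tempfile


def second_writer_is_blocked(db_path, wait_seconds=0.1):
    """Return True if a second connection cannot write while the first holds the write lock."""
    # isolation_level=None means autocommit, so we control BEGIN/COMMIT ourselves.
    writer = sqlite3.connect(db_path, isolation_level=None)
    other = sqlite3.connect(db_path, timeout=wait_seconds, isolation_level=None)
    try:
        writer.execute("CREATE TABLE IF NOT EXISTS comments (body TEXT)")
        writer.execute("BEGIN IMMEDIATE")  # take the database-wide write lock
        writer.execute("INSERT INTO comments VALUES ('first')")
        try:
            # The second writer waits up to wait_seconds, then gives up.
            other.execute("INSERT INTO comments VALUES ('second')")
            return False
        except sqlite3.OperationalError as exc:
            return "locked" in str(exc)
        finally:
            writer.execute("COMMIT")
    finally:
        writer.close()
        other.close()
```

With a realistic timeout (seconds rather than 0.1 s) and short write transactions, the second writer usually just waits briefly and then succeeds. Contention only becomes visible when writes are frequent or long-running.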
SQLite is not like BaseHTTPServer or anything basic like that. It's a fully featured embedded database. Quite fast too. Its SQL language might not have the most bells and whistles, but it's flexible enough. I haven't run into cases where I needed something it cannot do for the projects I was involved in (which aren't your typical web apps, truth be told).
Anyone that claims SQLite is good or bad for production without discussing the actual design is not telling you much. SQLite is pretty fast. In some cases, literally orders of magnitude faster than, say, Postgres, which comes up as a go-to alternative among Djangonauts. As someone pointed out, it also supports lots of concurrency. It's a matter of whether your app falls under the 'some cases' or not.
Now, there is one significant factor that has to be taken into account. SQLite is an in-process database. This is really important. If you are using something like gevent, you may run into edge cases where your app breaks. E.g., trying to do a transaction where you have a context switch in middle of it can possibly break the transaction in horrible ways. In other words, 'concurrency' really depends on your app, because SQLite is part of your app.
What you can't do with SQLite, though, in terms of scaling, is you can't make clusters of SQLite servers like you can with some of the other database engines, because it's in-process. Your app may or may not need to go to such lengths in terms of scaling, but my guess is that vast majority of apps out there don't anyway (wild guess).
On the other hand, being in-process means adding custom functions and aggregates to it is pretty trivial. I'm not sure if Django's ORM makes that any more difficult than it has to be, but you can come up with pretty good designs taking advantage of those features.
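As an illustration of how little ceremony that involves, here is a short sketch using Python's built-in sqlite3 module (not Django's ORM). The slugify function, the Longest aggregate, and the posts table are invented for the example:

```python
import sqlite3


def slugify(text):
    """Scalar function: 'Hello World' -> 'hello-world'."""
    return "-".join(text.lower().split())


class Longest:
    """Aggregate that returns the longest string it has seen."""

    def __init__(self):
        self.best = ""

    def step(self, value):
        if value is not None and len(value) > len(self.best):
            self.best = value

    def finalize(self):
        return self.best


conn = sqlite3.connect(":memory:")
conn.create_function("slugify", 1, slugify)    # name, number of args, callable
conn.create_aggregate("longest", 1, Longest)   # name, number of args, class

conn.execute("CREATE TABLE posts (title TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?)",
    [("Hello World",), ("SQLite in Production",)],
)

slugs = [row[0] for row in conn.execute("SELECT slugify(title) FROM posts ORDER BY rowid")]
longest_title = conn.execute("SELECT longest(title) FROM posts").fetchone()[0]
```

Because the functions run inside the application process, they can call any Python code the app already has. They are registered per connection, so each new connection has to register them again.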
This issue in database theory is called concurrency and SQLite does support it in Windows versions > Win98 and elsewhere according to the FAQ:
http://www.sqlite.org/faq.html#q5
We are aware of no other embedded SQL database engine that supports as
much concurrency as SQLite. SQLite allows multiple processes to have
the database file open at once, and for multiple processes to read the
database at once. When any process wants to write, it must lock the
entire database file for the duration of its update. But that normally
only takes a few milliseconds. Other processes just wait on the writer
to finish then continue about their business. Other embedded SQL
database engines typically only allow a single process to connect to
the database at once.
Basically, do not worry about concurrency; any database worth its salt takes care of it just fine. More information on how SQLite3 manages this can be found here. You, as a developer, not a database designer, needn't care about it unless you are interested in the inner workings.
SQLite will only work effectively in production in some specific situations. It's quite easy to get MySQL or PostgreSQL up and running, even on Windows, and have a database that works in most situations.
The real problem is that SQLite3 isn't threaded in Django so only one PAGE view can happen at a time on your server, see this bug https://code.djangoproject.com/ticket/12118 Fixed
I don't use SQLite3 even in development.
EDIT: I keep getting downvoted here but the Django documentation itself recommended not using SQLite3 in Production at the time I wrote this answer. The documentation still contains the following caveat:
SQLite provides an excellent development alternative for applications that are predominantly read-only or require a smaller installation footprint.
If you do not have a small foot print/read-only Django instance, do NOT use SQLite3. Feel free to continue to downvote this answer.
It is not impossible to use Django with Sqlite as database in production, primarily depending on your website/webapp traffic and how hard you hit your db (alongside what kind of operations you perform on it i.e. reads/writes/etc). In fact, approaching end of 2019, I have used it in several low volume applications with less than 5k daily interactions (these are more common than you might think).
Simply put, with the current state of the tech, SQLite 3 supports unlimited concurrent reads (or as many as your machine/workers can handle), BUT only a single process can write to it at any point in time. Bear in mind that a well-designed query or operation on the db will last only milliseconds!
Speaking from experience using SQLite as the only db for a simple non-routine production web app (by non-routine I mean that a typical user would not be using this app on a daily basis year-round) for overseas job matching that deals with ~5000 registered students (stats show consistently fewer than 2k requests per day that involve hitting the database during peak season, 40% write and 60% read), I've had no problems whatsoever with timeouts or performance.
It really boils down to being pragmatic about the development and the URS (client spec). If it becomes the next unicorn, one can always migrate from SQLite to another RDBMS. For instance, see David d C e Freitas's take on migration in Quick easy way to migrate SQLite3 to MySQL?
Additionally, the SQLite website itself uses an SQLite db as its backend; see below:
The SQLite website (https://www.sqlite.org/) uses SQLite itself, of course, and as of this writing (2015) it handles about 400K to 500K HTTP requests per day, about 15-20% of which are dynamic pages touching the database. Dynamic content uses about 200 SQL statements per webpage. This setup runs on a single VM that shares a physical server with 23 others and yet still keeps the load average below 0.1 most of the time.
Bear in mind that the above quote is of course mainly referring to read operations, so the values may not be applicable to write-heavy sites.
The job matching application I described above, built using SQLite as the db, is quite write-heavy if you look at the numbers: on average, 40% are short-lived write operations (form submissions, etc.), but bear in mind that my volume hitting the db is only 2k per day during peak season.
Then again, if you find that your SQLite db is causing a lot of timeouts and a bad user experience (408 on form submission, after which users have to key in the whole thing again), especially with Django throwing the OperationalError: database is locked error, you can always increase the timeout in your settings.py as per the Django docs as a temporary solution while you prepare to migrate the db.
    'OPTIONS': {
        # ...
        'timeout': 20,
        # ...
    }
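For context, that OPTIONS fragment sits inside a database entry in settings.py. A fuller sketch might look like the following; the NAME value is only a placeholder assumption, so substitute your project's actual database path:

```python
# settings.py (fragment)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "db.sqlite3",  # assumed path; use your project's actual file
        "OPTIONS": {
            # Seconds to wait for the lock before raising
            # "OperationalError: database is locked".
            "timeout": 20,
        },
    }
}
```

A longer timeout only makes writers wait longer for the lock. It does not add write concurrency, which is why it is a stopgap rather than a fix.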
Again, it all boils down to pragmatic development and facing the reality that the site may not attract as much activity as hoped, and that such projects are prone to over-engineering from the get-go.
There are many times when going for a simple solution enables a faster time to market, essentially to quickly test the waters. Of course, be prepared: if the piranhas do come in swarms, then it's time to upgrade to another RDBMS.
With Django's ORM, in most cases you don't need to touch your models.py when migrating to another supported SQL db. Be VERY mindful, though, that SQLite does not support some of the more advanced functions or even fields that its bigger cousins MySQL and PostgreSQL do.
Late to the party, but the question is still relevant as of mid 2018.
"Client" of a blog site is a different term from "database client". The SQLite documentation refers to a client as a process opening a database file. Such a process, say a Django app, may handle many web app clients ("users") simultaneously and it is still going to be just one client from the standpoint of SQLite.
The important consideration when choosing SQLite over a proper RDBMS is whether your architecture comprises more than one software component connecting to the database. In that case, using SQLite may be a major performance bottleneck because each app needs to access the same DB file, possibly over a network.
If you do not have multiple apps (database clients), SQLite is a great production choice in 99% of cases. The remaining 1% is apps using specific DB features, apps under enormous load, etc.
Know your architecture.
The answer to this question depends on the application that you want to deploy in production:
According to the usage guidance on the SQLite website, SQLite works great in production as the database engine for most websites with low to medium traffic (which is to say, most websites).
They argue that the amount of web traffic that SQLite can handle depends on how heavily your website uses its database. Generally speaking, any site that gets fewer than 100K hits/day should work fine with SQLite. However, this 100K hits/day figure is a conservative estimate, not a hard upper bound.
In summary, SQLite might be a great choice for applications with fewer users and lighter database use. Thus, use SQLite for websites with light or medium database interaction, and MySQL or PostgreSQL for websites with heavier database interaction.
Reference: sqlite.org

Web application monitoring best practices [closed]

We are finishing up our web application and planning for deployment. A very important aspect of deploying to production is monitoring the health of the system. Having a small team of developers/support makes it critical for us to get early notification of potential problems and resolve them before they have an impact on users.
Using Nagios seems like a good option, but I wanted to get more opinions on the best monitoring tools/practices for web applications in general and for Django apps specifically. I would also welcome recommendations on what should be monitored aside from the obvious CPU, memory, disk space, and database connectivity.
Our web app is written in Django, we are running on Linux (Ubuntu) under Apache + Fast CGI with PostgreSQL database.
EDIT
We have a completely virtualized environment under Linode.
EDIT
We are using django-logging so we have a way to separate info, errors, critical issues, etc.
Nagios is good, and it may also be good to have system testing (Selenium) running regularly.
Edit: Hyperic and Groundwork also look interesting.
There is probably a test suite system that can keep pressure testing everything as well for you. I can't remember the name off the top of my head, maybe someone can mention one below.
Other things I like to do:
The best motto for infrastructure is always fix, detect, repair. Get it up, get to the root of it, and cure/prevent it if you can.
Since a system exists at many levels, we should test at many levels:
Edit: Have all errors or warnings posted directly to your case manager via email. That way you can track occurrences in one place.
1) Connection : monitor your internet connectivity from the server and from the outside. Log this somewhere
2) Server : monitor all the processes that you need to, to ensure they are running and not pinning the server. Notify and log if they are. Use an HP server or something equivalent that can send hardware failure notifications from the BIOS level.
3) Software : Identify the key software that always needs to be running. Set the performance levels if any and then monitor them. Nagios should be able to help with this. On windows it can be a bit more. When an exception occurs, you should be able to run a script from it to restart processes automatically. My dream system is allowing me to interact with servers via SMS if the server sees it as an exception that I have to either permit, or one that will happen automatically unless I cancel by sms. One day..
4) Remote Power : Ensure Remote power-reset capabilities are in your hand. You might want to schedule weekly reboots if you ever use windows for anything.
5) Business Logic Testing : Have regularly running scripts testing the workflow of your system. Selenium can probably achieve some of this, but I like logging the results as well to say this ran at this time and these files had errors. If possible anywhere, have the system monitor itself through your scripts.
6) Backups : Make a backup that you can set and forget. If you can get things into virtual machines it would be ideal as you can scale, move, or deploy any part of your infrastructure anywhere. I have had instances where I moved a dead server onto my laptop, let it run in vmware while I fixed a problem.
Monitoring the number of connections to your Web server and your database is another good thing to track. Chances are if one shoots through the roof, something is starving for resources and the site is about to go down.
Also make sure you have a regular request for a URL that is a reasonable end-to-end test of the system. If your site supports search, then have nagios execute a search - that should make sure the search index is healthy, the Web server and the database server.
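A sketch of such an end-to-end probe in Python, using only the standard library. The function name, the URL you pass in, and the expected text are placeholders; the opener parameter exists only so the check can be exercised without a network:

```python
import urllib.request


def check_search(url, must_contain, timeout=10, opener=urllib.request.urlopen):
    """Return (ok, detail) for a single end-to-end probe of `url`."""
    try:
        with opener(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            status = response.status
    except OSError as exc:  # URLError, HTTPError and socket timeouts are all OSError
        return False, f"request failed: {exc}"
    if status != 200:
        return False, f"unexpected status {status}"
    if must_contain not in body:
        # The page loaded but the search produced nothing useful,
        # e.g. the index is empty or the database is unreachable.
        return False, "expected text missing from response"
    return True, "ok"
```

Wrapped in a small script that exits with 0 on success and 2 on failure, a check like this can be run as a Nagios plugin, since those are the OK and CRITICAL exit codes Nagios expects.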
Also, make sure that your applications sends you email anytime your users see an error, or there is an unhandled exception. That way you know how the application is failing in the field.
If I had to pick one type of testing it would be to test the end-user functionality of the system. The important thing to consider is the user. While testing things like database availability, server up-time, etc., are all important, testing work-flows through your system via a remote UI testing system covers all these bases. If you know that the critical parts of your system are available to the end-user, then you know your system is probably OK.
Identify the important work-flows in your system. For example, if you wrote an eCommerce site you might identify a work-flow of "search for a product, put product in shopping cart, and purchase product".
Prioritize the work-flows, and build out higher-priority tests first. You can always add additional tests after you roll out to production.
Build UI tests using one of the available UI testing frameworks. There are a number of free and commercial UI testing frameworks that can be run in an automated fashion. Build a core set of tests first that address critical work-flows.
Set up at least one remote location from which to run tests. You want to test every aspect of your system, which means testing it remotely. Is the internet connection up? Is the web server running? Is the connection to the database server working? Etc, etc. If you test remotely you make sure your system is available to the outside world, which means it is most likely working end-to-end. You can also run these tests internally, but I think it is critical to run them externally.
Make sure your solution includes both reporting and notification. If one of your critical work-flow tests fails, you want someone to know about it to fix the problem ASAP. If a non-critical task fails, perhaps you only want reporting so that you can fix problems out-of-band.
This end-user testing should not eliminate monitoring of system in your data-center, but I want to reiterate that end-user testing is the most important type of testing you can do for a web application.
Ahhh, monitoring. How I love thee and your vibrations at 3am.
Essentially, you need a way to inspect the internal state of your application, both at a specific moment, as well as over spans of time (the latter is very important for detecting problems before they occur). Another way to think of it is as glorified unit-testing.
We have our own (very nice) monitoring system, so I can't comment on Nagios or other apps. Our use case is similar to yours, though (cgi app on apache).
Add a logging.monitor() type method, which will log information to disk. This should support, at the least, logging simple numbers and dicts of numbers (the key=>value association can be incredibly handy).
Have a process that scrapes the monitoring logs and stores them into a database.
Have a process that takes the database information, checks it against rules, and sends out alerts. Keep in mind that some things can be flaky. Just because you got a 404 once doesn't mean the app is down.
Have a way to mute alerts (very useful for maintenance or to read your email).
That's all pretty high level. The important thing is that you have a history of the state of the application over time. From this, you can then create rules (perhaps just raw sql queries you put into a config somewhere), that say "If the queries per second doubled, send a SlashDotted alert", or "if 50% of responses are 404, send an alert". It also bedazzles management because you can quantify any comment about whether it's up, down, fast, or slow.
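Our own system isn't public, but the pipeline described above (log, scrape, check rules, mute) can be sketched in a few lines of Python. All names here (monitor, scrape, check_rules, the samples table) are hypothetical and chosen only for the example:

```python
import json
import sqlite3
import time


def monitor(log_file, name, value, now=None):
    """Append one measurement (a number or a dict of numbers) as a JSON line."""
    record = {"ts": time.time() if now is None else now, "name": name, "value": value}
    log_file.write(json.dumps(record) + "\n")


def scrape(log_lines, conn):
    """Load JSON log lines into a table so rules can query the history."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples (ts REAL, name TEXT, key TEXT, value REAL)"
    )
    for line in log_lines:
        rec = json.loads(line)
        # A plain number is stored under an empty key; a dict gets one row per key.
        values = rec["value"] if isinstance(rec["value"], dict) else {"": rec["value"]}
        conn.executemany(
            "INSERT INTO samples VALUES (?, ?, ?, ?)",
            [(rec["ts"], rec["name"], k, v) for k, v in values.items()],
        )
    conn.commit()


def check_rules(conn, rules, muted=frozenset()):
    """Run each rule's SQL; a rule fires when its first column is truthy."""
    fired = []
    for rule_name, sql in rules.items():
        if rule_name in muted:
            continue  # muted rules are skipped, e.g. during maintenance
        row = conn.execute(sql).fetchone()
        if row and row[0]:
            fired.append(rule_name)
    return fired
```

A rule such as "if 50% of responses are 404" then becomes a one-line query over samples, and rules can live in a config file as raw SQL. To deal with flakiness, a rule would normally aggregate over a time window (a WHERE clause on ts) instead of reacting to a single sample.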
Things to monitor include (others probably mentioned these as well): http status, port accessible, http load, database load, open connection, query latency, server accessibility (ssh, ping), queries per second, number of worker processes, error percentage, error rate.
Simple end-to-end tests are also very handy, though they can be brittle. It's best to keep them simple, but you should have one that tries to touch core pieces of the app (caching, database, authentication).
I use Munin and Monit, and have been very happy with both of them.
Internal logging is fine and dandy but when your whole app goes down or your box/enviro crashes you need an outside check too. http://www.pingdom.com/ has been very reliable for me.
My only other advice is that I wouldn't spend too much time on this. My best example is Twitter: how much energy did they put into the system being able to half-die instead of just investing that time and energy into throwing more hardware at it / scaling it out?
Chances are what ends up taking you down, your logging and health systems will have missed anyway.
The single most important way to monitor any online site is to monitor externally. The goal should be to monitor your site in a way that most closely reflects how your users use the site. In 99% of cases, as soon as you know that your site is down externally, it's relatively easy to find the root cause. The most important thing is to know as soon as possible that your customers are unable to load your site.
This generally means using an external performance monitoring service. They vary from the very low end (mon.itor.us, pingdom) to the high end (Webmetrics, Gomez, Keynote). And as always, you get what you pay for. The things to look for when shopping around for a monitoring service include:
The size and distribution of the monitoring network
Whether or not the monitoring solution is able to monitor your site using a real browser (otherwise you aren't testing your site like a real user would)
The scripting language (to script the transactions against your site)
The support department, to help you along the way, and provide expertise on how to monitor correctly
Good luck!
Web monitoring by IP Patrol or SiteSentry have been useful for us. The second is a bit like site confidence but slightly prettier lol.
Have you thought about monitoring the functionality as well? A script (either in a scripting language like Perl or Python or using some tool like WebTest) that talks to your application and does some important steps like logging in, making a purchase, etc. is very nice to have.
Aside from what to monitor, which has already been answered, you need to make sure - whatever system you use - that you get only one notification of an error that happens multiple times, on each request. Or your inbox will run out of memory :) Plus, it's plain annoying...
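One way to get that behaviour, sketched in Python: fingerprint each error (for example exception type plus location) and send at most one notification per fingerprint within a time window. The class name and the one-hour default are assumptions for illustration:

```python
import time


class AlertDeduplicator:
    """Allow one notification per error fingerprint per time window."""

    def __init__(self, window_seconds=3600.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self.last_sent = {}   # fingerprint -> time of last notification
        self.suppressed = {}  # fingerprint -> repeats swallowed since then

    def should_send(self, fingerprint):
        now = self.clock()
        last = self.last_sent.get(fingerprint)
        if last is not None and now - last < self.window:
            # Same error again inside the window: count it, but stay quiet.
            self.suppressed[fingerprint] = self.suppressed.get(fingerprint, 0) + 1
            return False
        self.last_sent[fingerprint] = now
        self.suppressed[fingerprint] = 0
        return True
```

Keeping the suppressed count around means the next notification can say "this happened N more times", so the volume information is not lost along with the noise.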
Divide the standby shifts among the support/dev team, so one person does not have to be on call every single evening. That will wear people down. Monitoring is a good thing, but everyone needs to get a chance to have a life once in a while. Your cellphone buzzing at 2AM for a few nights will get very old pretty soon, trust me. And not every developer is used to 24/7 support, so you need to find the balance between using monitoring and abusing monitoring.
Basically, have distinct escalation levels, and if the sky is not falling, define a "serenity now" window at night where smaller escalation levels don't go out.
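As a sketch of what that could look like in code (Python here; the severity scale of 1 to 3 and the 23:00 to 07:00 window are made-up defaults, not a recommendation):

```python
from datetime import time


def in_quiet_window(now, start, end):
    """True if `now` falls inside the window, which may wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def should_notify(severity, now, quiet_start=time(23, 0), quiet_end=time(7, 0),
                  night_minimum=3):
    """During the quiet window only alerts at or above night_minimum go out."""
    if in_quiet_window(now, quiet_start, quiet_end):
        return severity >= night_minimum
    return True
```

Alerts held back at night should still be recorded somewhere so they can be reviewed in the morning. This function only decides whether someone's phone buzzes.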
I've been using Nagios + CruiseControl + Selenium for running high-level tests on mission critical web applications. I got burned pretty hard by a simple jQuery error that stopped users from proceeding through an online signup form.
http://www.agileatwork.com/the-holy-trinity-of-web-2-0-application-monitoring/
You can take a look at AlertGrid. This web application allows you to filter and forward alerts to your team (worldwide). It also has a nice ability to monitor whether something did not happen.
To paraphrase Richard Levasseur: ah, monitoring tools, how your imperfections frustrate me. There doesn't seem to be a perfect tool out there; Nagios is pretty easy to set up but the UI is kinda old fashioned and you have to have a daemon running on each server being monitored. Zenoss has a much nicer UI including trend graphs of resource usage, but it uses SNMP so you have to have some familiarity with that to get it working properly, and the documentation is not the best - there are hundreds of pages but it's really hard to find just the info you need to get started.
Friends of mine have also recommended Cacti and Hyperic, but I don't have personal experience with those.
One last thing - one of the other answers suggested running a tool that stresses your site. I wouldn't recommend doing that on your live site unless you have a reliable quiet period when nobody is hitting it; even then you might bring it down unexpectedly. Much better to have a staging server where you can run load tests before putting changes into production.
One of our clients uses Techout (www.techout.com) and is very pleased with the service.
There is no charge for alerts, no matter what kind or how many, and they offer email, voicemail and SMS alerts -- and if something major happens, a phone call from a live person to help you out.
It's all based on service -- you don't install the software and you have a consultant who works with you to determine the best approach for your business. It's one of the most convenient web application monitoring services because they take care of everything.
I would just add that you can predict error likelihood somewhat based on history of past errors and having fixed them. With smaller scale internal testing if you were to graph the frequency and severity of problems that have been corrected to this point you'll have an overview of predictable new problems. If everything has been running error free for some time now, then the two sources of trouble would be recent changes or scalability issues.
From the above it sounds like scalability is your only worry, but I just mention the past-error frequency test because the teams I've been on invariably think they got the last error fixed and there are no more. Until there is.
Changing the subject a little, something I find really useful, and that changed a lot about how I monitor my apps, is logging JavaScript exceptions somewhere. There's a very nice implementation that logs them directly from users' browsers to Google Analytics.
This is a must for JavaScript-centered web applications, and it gives you results taken directly from users' browsers, which can reveal very unexpected errors (IE and mobile browsers are a pain).
Disclaimer: my own post is below.
http://www.directperformance.com.br/en/javascript-debug-simples-com-google-analytics
For the internet presence monitoring, I would suggest the service that I am working on: Sucuri NBIM (Network-based integrity monitor).
It does availability and integrity checks, looking for changes on your internet presence (sites, DNS, WHOIS, headers, etc) and loss of connectivity. It is free and you can try it out here.

Django -- I have a small app ready, Should I go on private VPS or Google App Engine?

I have my first app, not that big, but it is the first step. (next big one on the way)
Now if I want to put it on my own Linode VPS, I have to configure mod_python or mod_wsgi, as well as memcache, Nginx, MySQL or PostgreSQL, etc. to make it work. If I put it on GAE, all I have to do is convert the models to use GAE's API.
What I like about GAE is scaling. (if they can really do it)
Then I'd only worry about developing my apps and doing SEO work on them instead of worrying about load share/balance, cache, db / IO redundancy, etc.
I don't want to do any porting later on. (I have to decide now and stick with it)
So, if you have any experience on this, what do you recommend:
1- Use VPS(s) for everything
2- Use VPS(s) plus Amazon S3
3- Use VPS(s) plus Amazon S3 & SimpleDB
4- Use GAE
Also: Would I be able to get away with not having JOIN rights when using the BigTable?
Note: I don't have any spatial need now, but for a location table I might need that later on.
I'd like to know what you think!
There's business risk and technical risk.
Business risk is that you might have to move hosts later for some external reason. VPS's, EC2, etc require more upfront investment, but keep you independent. Tools like Chef can help with the configuration effort.
Technical risk is that your application may not be easily implemented on the platform. Since most VPS options allow you to install arbitrary software, they minimize this, again at the cost of more configuration effort on your part. AFAIK, the largest constraint GAE enforces on you is it's difficult to do long running background tasks. (Working without JOINs and other aspects of de-normalized data requires a different way of thinking, but this approach is fairly common in web applications no matter where they run once the SQL database is larger than a single host can support.)
If you can live with both these risks, GAE would appear to save you a substantial amount of effort. If you cannot live with these risks, you should tailor your own environment.
As an aside, I find S3 to be worth it no matter your environment. It's far simpler than ensuring your local server static file storage is reliably backed up, and you never have to worry about capacity. It's best if you use it for data that is uploaded but rarely overwritten or deleted (think facebook photo albums).
I don't want to do any porting later on. (I have to decide now and stick with it)
If that's the case, wouldn't you prefer to control deployment from the outset? It could be a great pain to port back from GAE later down the line if you hit its limits (whether they be technological limits or simply business decisions by Google that run counter to your plans for the future of your app).
Also configuring mod_wsgi, installing postgres etc. isn't particularly difficult, and you don't have to worry about things like load balancing and db redundancy for a while yet.
If it were me, I'd prefer the long-term certainty of a traditional server over the quick win of GAE. It all depends on your vision for the app, however.
I may be biased, but if you can live with GAE's limitations it really saves you a lot of work and worry about system administration issues (and to some extent scaling) -- plus, it's free as long as your resource consumption is low (basically meaning your traffic is low).
Can you do without joins? I don't know, as I don't know your app -- I'm a SQL fanatic, myself, yet for simple enough needs I haven't found it too hard to adapt. As I see it, the main limitation of non-relational DBs is that they're nowhere as nice as relational ones for "ad hoc" queries... you typically have to write a lot of procedural code instead of a nice SELECT or two:-(. But, that's more of a "data mining later" issue than one connected with serving your web app -- probably best solved by regularly bulk-downloading data from the web app's online storage to a "data warehouse" kind of setup, anyway, even if such storage was relational in the first place;-).
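To give an idea of what "procedural code instead of a nice SELECT" means, here is a small sketch in plain Python. The two collections and the function name are invented for the example, and the dictionaries only stand in for whatever the non-relational store returns:

```python
# Hypothetical data: authors keyed by id, posts carrying an author_id.
authors = {1: {"name": "Ann"}, 2: {"name": "Bob"}}
posts = [
    {"title": "First", "author_id": 1},
    {"title": "Second", "author_id": 2},
    {"title": "Third", "author_id": 1},
]


def posts_with_author_names(posts, authors):
    """Hand-written equivalent of:
    SELECT p.title, a.name FROM posts p JOIN authors a ON a.id = p.author_id
    """
    result = []
    for post in posts:
        author = authors.get(post["author_id"])
        if author is None:
            continue  # inner-join semantics: drop posts with no matching author
        result.append((post["title"], author["name"]))
    return result
```

It is not hard, but every ad hoc question about the data becomes a loop like this one rather than a one-line query.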
Before deciding, it might be worth a quick prototype adaptation of your app to GAE. You might run into stoppers that force the decision. Possible stopper issues include:
Your schema doesn't make the transition to BigTable
You're depending on some C-based library that GAE doesn't support
You have a few long-running requests that exceed the thresholds that GAE imposes
The answer depends on the complexity and nature of your model layer, really. If it's complex or tightly bound to the rest of your code, porting is likely to be a significant effort. If it's fairly straightforward, or easy to tear out and replace, I would say go for it.
These days, I mostly write new code for GAE, but the fact that I can simply deploy with a single command has really lowered the barrier I feel towards writing cool new apps. Not having to worry about deployment and hosting is quite liberating.
All I have to do is convert the models to use GAE's API.
I am sorry, you are totally mistaken.
You also need to rewrite all the view code that uses the ORM. There are no joins, so you have to write a lot of procedural code instead of the nifty SQL that gives you whatever you want.
Querying is slow. You need to override the save method of each model to store additional information about that model which would take a long time to compute when needed. You also need to work with memcache to make the queries fast enough.
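A sketch of the "override save to store precomputed information" idea, written in plain Python rather than against the Django or App Engine model APIs. `InMemoryStore`, `Post` and the `comment_count` field are all hypothetical stand-ins:

```python
class InMemoryStore:
    """Stand-in for a datastore that can only fetch rows by key (no JOINs)."""

    def __init__(self):
        self.rows = {}

    def put(self, key, row):
        self.rows[key] = dict(row)

    def get(self, key):
        return self.rows[key]


class Post:
    def __init__(self, store, post_id, title, comments):
        self.store = store
        self.post_id = post_id
        self.title = title
        self.comments = list(comments)

    def save(self):
        # Pay the cost at write time: store the derived value next to the
        # data, so reads never have to count or join anything.
        self.store.put(self.post_id, {
            "title": self.title,
            "comment_count": len(self.comments),
        })
```

With a real model class the same computation would go into the overridden save method before calling the parent implementation.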
And then, Guido has said Django 1.1 is going to be included in a future version of App Engine. I am hoping they will have an out-of-the-box generic ORM-to-BigTable mapper.
That said, if your app is simple and doesn't need many joins, you could use the appengine patch project to run the current version of Django on App Engine. Here is how.

Difference between frontend, backend, and middleware in web development

I was wondering if anyone can compare/contrast the differences between frontend, backend, and middleware ("middle-end"?) succinctly.
Are there cases where they overlap?
Are there cases where they MUST overlap, and frontend/backend cannot be separated?
In terms of bottlenecks, which end is associated with which type of bottlenecks?
Here is one breakdown:
Front-end tier -> User Interface layer usually consisting of a mix of HTML, Javascript, CSS, Flash, and various server-side code like ASP.Net, classic ASP, PHP, etc. Think of this as being closest to the user in terms of code.
Middleware, middle-tier -> One tier back, generally referred to as the "plumbing" part of a system. Java and C# are common languages for writing this part that could be viewed as the glue between the UI and the data and can be webservices or WCF components or other SOA components possibly.
Back-end tier -> Databases and other data stores are generally at this level. Oracle, MS-SQL, MySQL, SAP, and various off-the-shelf pieces of software come to mind for this tier, which does the final processing of the data.
Overlap can exist between any of these, as you could have everything poured into one layer. For example, an ASP.Net website might use the built-in AJAX functionality that generates Javascript while the code-behind contains database commands, so the code-behind holds both the middle and back-end tiers. Alternatively, one could use VBScript with ADO objects to act as all the layers, merging all three tiers into one.
Similarly, middleware can be combined with either the front end or the back end in some cases.
Bottlenecks generally have a few different levels to them:
1) Database or back-end processing -> This can be payroll, sales, or other tasks where the throughput to the database is bogging things down.
2) Middleware bottlenecks -> This would be where some web service may be hitting capacity but the front and back ends have bandwidth to handle more traffic. Alternatively, there may be some server that is part of a system that isn't quite the UI part or the raw data that can be a bottleneck using something like Biztalk or MSMQ.
3) Front-end bottlenecks -> This could be client- or server-side issues. For example, if you took a low-end PC and had it load a web page that consisted of a lot of data being downloaded, the client could be where the bottleneck is. Similarly, the server could be queuing up requests if it is getting hammered with requests like what Amazon.com or other high-traffic websites may get at times.
Some of this is subject to interpretation, so it isn't perfect by any means and YMMV.
EDIT: Something to consider is that some systems can have multiple front-ends or back-ends. For example, a content management system will likely have a way for site visitors to view the content that is a front-end but what about how content editors are able to change the data on the site? The ability to pull up this data could be seen as front-end since it is a UI component or it could be seen as a back-end since it is used by internal users rather than the general public viewing the site. Thus, there is something to be said for context here.
Generally speaking, people refer to an application's presentation layer as its front end, its persistence layer (database, usually) as the back end, and anything between as middle tier. This set of ideas is often referred to as 3-tier architecture. They let you separate your application into more easily comprehensible (and testable!) chunks; you can also reuse lower-tier code more easily in higher tiers.
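A toy illustration of that 3-tier split, with all class and function names invented for the example and an in-memory dict standing in for the database:

```python
import html


class UserRepository:
    """Back end: persistence tier (a dict stands in for a database)."""

    def __init__(self):
        self._users = {}

    def add(self, user_id, name):
        self._users[user_id] = name

    def find(self, user_id):
        return self._users.get(user_id)


class GreetingService:
    """Middle tier: business logic; talks only to the repository."""

    def __init__(self, repository):
        self._repository = repository

    def greeting_for(self, user_id):
        name = self._repository.find(user_id)
        return f"Hello, {name}!" if name else "Hello, guest!"


def render_page(service, user_id):
    """Front end: presentation tier; turns the service's answer into HTML."""
    return f"<h1>{html.escape(service.greeting_for(user_id))}</h1>"
```

Each tier only knows about the one directly below it, which is what makes the chunks testable on their own: the service can be tested with a fake repository, and the rendering with a fake service.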
Which code is part of which tier is somewhat subjective; graphic designers tend to think of everything that isn't presentation as the back end, database people think of everything in front of the database as the front end, and so on.
Not all applications need to be separated out this way, though. It's certainly more work to have 3 separate sub-projects than it is to just open index.php and get cracking; depending on (1) how long you expect to have to maintain the app (2) how complex you expect the app to get, you may want to forgo the complexity.
There are in fact three questions in your question:
Define frontend, middle and back end
How and when do they overlap?
Their associated usual bottlenecks.
What JB King has described is correct, but it is a particular, simple version, where in fact he mapped front, middle and back to an MVC layer.
He mapped M to the back, V to the front, and C to the middle.
For many people, it is just fine, since they come from the ugly world where even MVC was not applied, and you could have direct DB calls in a view.
However in real, complex web applications, you indeed have two or three different layers, called front, middle and back. Each of them may have an associated database and a controller.
The front-end will be visible to the end user. It should not be confused with the front-office, which is the UI for parameters and administration of the front. The front-end will usually be some kind of CMS or e-commerce platform (Magento, etc.).
The middle-end is not compulsory and is where the business logic is. It will be based on a PIM, an MDM tool, or some kind of custom database where you enrich your products or your articles (for a CMS). It'll also be the place where you code business functions that need to be shared between different frontends (for instance between the PC frontend and the API-based mobile application). Sometimes, an ESB or a tool like ActiveMQ will be your middle-end.
The back-end will be a third layer, surrounding your source database or your ERP. It may be just the API writing to and reading from your ERP. It may be your supplier DB, if you are doing e-commerce. In fact, it really depends on the web project, but it is always a central repository. It'll be accessed either through a DB call, through an API, through a Hibernate layer, or through a full-featured back-end application.
This description means that answering the other two questions is not possible in this thread, as bottlenecks really depend on what your three ends contain; what JB King wrote remains true for simple MVC architectures.
At the time the question was asked (5 years ago), maybe the MVC pattern was not yet so widely adopted. Now, there is absolutely no reason why the MVC pattern would not be followed and a view would be tied to DB calls.
If you read the question "Are there cases where they MUST overlap, and frontend/backend cannot be separated?" in a broader sense, with three different components, then there are of course times when the 3-layer architecture is useless. Think of a simple personal blog: you'll not need to pull external data or poll RabbitMQ queues.
Here is a real world example which shows front/mid/back end.
General description:
Frontend is responsible for presenting data to the user. Note the interesting quirk that you may have two different front ends associated with a single backend.
Backend provides business logic/data persistence.
Middleware (ActiveMQ in the picture) is responsible for system-to-system integration between backends. It is usually installed as a separate application.
Overlapping:
It is possible to have overlap between frontend and backend. This usually leads to long-term issues with application maintenance and scalability. It is fairly common in legacy applications.
Most modern technology stacks encourage developers to keep a strict separation. For example, in the picture you can see that the backend of the first system has a REST web service, which is a clear separation line.
Bottlenecks
By and large, bottlenecks are caused by the database or the network. Databases are located in the backend. As for network issues, every connection goes through the network, so every connection has the potential to be slow. With good application design these issues are avoidable to a large extent.
In terms of networking and security, the backend is (or should be) by far the most secure node.
The middle-end portion, usually being a web server, will be somewhat in the wild and cut off in many respects from a company's network. The middle-end node is usually placed in the DMZ and segmented from the network with firewall settings. Most of the server-side code parsing of web pages is handled on the middle-end web server.
Getting to the backend means going through the middle-end, which has a carefully crafted set of rules allowing/disallowing access to the vital nummies which are stored on the database (backend) server.
Frontend refers to the client side, whereas backend refers to the server side of the application. Both are crucial to web development, but their roles, responsibilities and the environments they work in are totally different. Frontend is basically what users see, whereas backend is how everything works.
Frontend -> This is the client side of a website, where a user interacts with the server through the user interface. It is generally built using HTML and CSS.
Middleware -> Middleware is the software or service that lets the parts of the system communicate and manage data. It handles communication between components as well as input/output.
Backend -> The backend is the server side of an application, which consists of all the functions and operations performed on data. This part is considered to be the most essential part of any application. Only the server admin has access to it. It mainly consists of databases and servers.