Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
We are finishing up our web application and planning for deployment. Very important aspect of deployment to production is monitoring the health of the system. Having a small team of developers/support makes it very critical for us to get the early notifications of potential problems and resolve them before they have impact on users.
Using Nagios seams like a good option, but wanted to get more opinions on what are the best monitoring tools/practices for web application in general and specifically for Django app? Also would welcome recommendations on what should be monitored aside from the obvious CPU, memory, disk space, database connectivity.
Our web app is written in Django, we are running on Linux (Ubuntu) under Apache + Fast CGI with PostgreSQL database.
EDIT
We have a completely virtualized environment under Linode.
EDIT
We are using django-logging so we have a way separate info, errors, critical issues, etc.
Nagios is good, it's good to maybe have system testing (Selenium) running regularily.
Edit: Hyperic and Groundwork also look interesting.
There is probably a test suite system that can keep pressure testing everything as well for you. I can't remember the name off the top of my head, maybe someone can mention one below.
Other things I like to do:
The best motto for infrastructure is always fix, detect, repair. Get it up, get to the root of it, and cure/prevent it if you can.
Since a system exists at many levels, we should test at many levels:
Edit: Have all errors or warnings posted directly to your case manager via email. That way you can track occurrences in one place.
1) Connection : monitor your internet connectivity from the server and from the outside. Log this somewhere
2) Server : monitor all the processes that you need to to ensure they are running and not pinning the server. Use a HP Server or something equivalent with hardware failure notification that it can do from a bios level. Notify and log if they are.
3) Software : Identify the key software that always needs to be running. Set the performance levels if any and then monitor them. Nagios should be able to help with this. On windows it can be a bit more. When an exception occurs, you should be able to run a script from it to restart processes automatically. My dream system is allowing me to interact with servers via SMS if the server sees it as an exception that I have to either permit, or one that will happen automatically unless I cancel by sms. One day..
4) Remote Power : Ensure Remote power-reset capabilities are in your hand. You might want to schedule weekly reboots if you ever use windows for anything.
5) Business Logic Testing : Have regularly running scripts testing the workflow of your system. Selenium can probably achieve some of this, but I like logging the results as well to say this ran at this time and these files had errors. If possible anywhere, have the system monitor itself through your scripts.
6) Backups : Make a backup that you can set and forget. If you can get things into virtual machines it would be ideal as you can scale, move, or deploy any part of your infrastructure anywhere. I have had instances where I moved a dead server onto my laptop, let it run in vmware while I fixed a problem.
Monitoring the number of connections to your Web server and your database is another good thing to track. Chances are if one shoots through the roof, something is starving for resources and the site is about to go down.
Also make sure you have a regular request for a URL that is a reasonable end-to-end test of the system. If your site supports search, then have nagios execute a search - that should make sure the search index is healthy, the Web server and the database server.
Also, make sure that your applications sends you email anytime your users see an error, or there is an unhandled exception. That way you know how the application is failing in the field.
If I had to pick one type of testing it would be to test the end-user functionality of the system. The important thing to consider is the user. While testing things like database availability, server up-time, etc, are all important, testing work-flows through your system via a remote UI testing system covers all these bases. If you know that the critical parts of your system are available to the end-user, then you know your system is prolly Ok.
Identify the important work-flows in your system. For example, if you wrote an eCommerce site you might identify a work-flow of "search for a product, put product in shopping cart, and purchase product".
Prioritize the work-flows, and build out higher-priority tests first. You can always add additional tests after you roll out to production.
Build UI tests using one of the available UI testing frameworks. There are a number of free and commercial UI testing frameworks that can be run in an automated fashion. Build a core set of tests first that address critical work-flows.
Setup at least one remote location from which to run tests. You want to test every aspect of your system, which means testing it remotely. Is the internet connection up? Is the web server running? Is the connection to the database server working? Etc, etc. If you test remotely you make sure you system is available to the outside world which means it is most likely working end-to-end. You can also run these tests internally, but I think it is critical to run them externally.
Make sure your solution includes both reporting and notification. If one of your critical work-flow tests fails, you want someone to know about it to fix the problem ASAP. If a non-critical task fails, perhaps you only want reporting so that you can fix problems out-of-band.
This end-user testing should not eliminate monitoring of system in your data-center, but I want to reiterate that end-user testing is the most important type of testing you can do for a web application.
Ahhh, monitoring. How I love thee and your vibrations at 3am.
Essentially, you need a way to inspect the internal state of your application, both at a specific moment, as well as over spans of time (the latter is very important for detecting problems before they occur). Another way to think of it is as glorified unit-testing.
We have our own (very nice) monitoring system, so I can't comment on Nagios or other apps. Our use case is similar to yours, though (cgi app on apache).
Add a logging.monitor() type method, which will log information to disk. This should support, at the least, logging simple numbers and dicts of numbers (the key=>value association can be incredibly handy).
Have a process that scrapes the monitoring logs and stores them into a database.
Have a process that takes the database information, checks them against rules, and sends out alerts. Keep in mind that somethings can be flaky. Just because you got a 404 once doesn't mean the app it down.
Have a way to mute alerts (very useful for maintenance or to read your email).
Thats all pretty high level. The important thing is that you have a history of the state of the application over time. From this, you can then create rules (perhaps just raw sql queries you put into a config somewhere), that say "If the queries per second doubled, send a SlashDotted alert", or "if 50% of responses are 404, send an alert". It also bedazzles management because you can quantify any comment about whether its up, down, fast, or slow.
Things to monitor include (others probably mentioned these as well): http status, port accessible, http load, database load, open connection, query latency, server accessibility (ssh, ping), queries per second, number of worker processes, error percentage, error rate.
Simple end-to-end tests are also very handy, though they can be brittle. Its best to keep them simple, but you should have one that tries to touch core pieces of the app (caching, database, authentication).
I use Munin and Monit, and have been very happy with both of them.
Internal logging is fine and dandy but when your whole app goes down or your box/enviro crashes you need an outside check too. http://www.pingdom.com/ has been very reliable for me.
My only other advice is I wouldnt spent too much time on this. my best example is twitter, how much energy did they put into the system being able to half-die instead of just investing that time and energy into throwing more hardware / scaling it out.
Chances are what ends up taking you down, your logging and health systems will have missed anyway.
The single most important way to monitor any online site is to monitor externally. The goal should be to monitor your site in a way that most closely reflects how your users use the site. In 99% of cases, as soon as you know that your site is down externally, it's relatively easy to find the root cause. The most important thing is to know as soon as possible that your customers are unable to load your site.
This generally means using an external performance monitoring service. They very from the very low end (mon.itor.us, pingdom) to the high end (Webmetrics, Gomez, Keynote). And as always, you get what you pay for. The things to look for when shopping around for a monitoring service include:
The size and distribution of the monitoring network
Whether or not the monitoring solution is able to monitor your site using a real browser (otherwise you aren't testing your site like a real user would)
The scripting language (to script the transactions against your site)
The support department, to help you along the way, and provide expertise on how to monitor correctly
Good luck!
Web monitoring by IP Patrol or SiteSentry have been useful for us. The second is a bit like site confidence but slightly prettier lol.
Have you thought about monitoring the functionality as well? A script (either in a scripting language like Perl or Pyton or using some tool like WebTest) that talks to your application and does some important steps like logging in, making a purchase, etc is very nice to have.
Aside from what to monitor, which has already been answered, you need to make sure - whatever system you use - that you get only one notification of an error that happens multiple times, on each request. Or your inbox will run out of memory :) Plus, it's plain annoying...
Divide the standby shifts among the support/dev team, so one person does not have to be on call every single evening. That will wear people down. Monitoring is a good thing, but everyone needs to get a chance to have a life once in a while. Your cellphone buzzing at 2AM for a few nights will get very old pretty soon, trust me. And not every developer is used to 24/7 support, so you need to find the balance between using monitoring and abusing monitoring.
Basically, have distinct escalation levels, and if the sky is not falling, define a "serenity now" window at night where smaller escalation levels don't go out.
I've been using Nagios + CruiseControl + Selenium for running high-level tests on mission critical web applications. I got burned pretty hard by a simple jquery error that stopped users from proceding through an online signup form.
http://www.agileatwork.com/the-holy-trinity-of-web-2-0-application-monitoring/
You can take a look at AlertGrid. This web application allows you to filter and forward alerts to your team (worldwide). It has also nice ability to monitor if something did not happen.
To paraphrase Richard Levasseur: ah, monitoring tools, how your imperfections frustrate me. There doesn't seem to be a perfect tool out there; Nagios is pretty easy to set up but the UI is kinda old fashioned and you have to have a daemon running on each server being monitored. Zenoss has a much nicer UI including trend graphs of resource usage, but it uses SNMP so you have to have some familiarity with that to get it working properly, and the documentation is not the best - there are hundreds of pages but it's really hard to find just the info you need to get started.
Friends of mine have also recommended Cacti and Hyperic, but I don't have personal experience with those.
One last thing - one of the other answers suggested running a tool that stresses your site. I wouldn't recommend doing that on your live site unless you have a reliable quiet period when nobody is hitting it; even then you might bring it down unexpectedly. Much better to have a staging server where you can run load tests before putting changes into production.
One of our clients uses Techout (www.techout.com) and is very pleased with the service.
There is no charge for alerts, no matter what kind or how many, and they offer email, voicemail and SMS alerts -- and if something major happens, a phone call from a live person to help you out.
It's all based on service -- you don't install the software and you have a consultant who works with you to determine the best approach for your business. It's one of the most convenient web application monitoring services because they take care of everything.
I would just add that you can predict error likelihood somewhat based on history of past errors and having fixed them. With smaller scale internal testing if you were to graph the frequency and severity of problems that have been corrected to this point you'll have an overview of predictable new problems. If everything has been running error free for some time now, then the two sources of trouble would be recent changes or scalability issues.
From the above it sounds like scalability is your only worry, but I just mention the past-error frequency test because the teams I've been on invariably think they got the last error fixed and there are no more. Until there is.
Changing the line a little bit, something I really think is useful and changed a lot how I monitor my apps is to log javascript exceptions somewhere. There's a very nice implementation that logs that directly from user browsers to Google Analytics.
This is a must for Javascript centered web applications, and can give you results based directly on users browsers what can lead to very unexpected errors (iE and mobile browser are pain)
Disclaimer: My post bellow
http://www.directperformance.com.br/en/javascript-debug-simples-com-google-analytics
For the internet presence monitoring, I would suggest the service that I am working on: Sucuri NBIM (Network-based integrity monitor).
It does availability and integrity checks, looking for changes on your internet presence (sites, DNS, WHOIS, headers, etc) and loss of connectivity. It is free and you can try it out here.
Related
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
this question is not about which tools to use to do load testing ...
It is a more general question about what should be tested.
background:
years ago, most enterprise web application used to be designed that front end and backend are messed together, and any kinds of business logic is done at server side. while front end's responsibility is only to show the UI.
when load testing such application, whether it is Oracle EBS or SAP's things, or application built using framework like Struts.
What we do is to:
1. design a user test case. (like login, search , put sth to shopping car, then check log)
2. write testing script to archive such test case (underline, it's pure http call to server side )
3. using LT tool to generate lots of virtual user together to run such test.
(it will just behave like that there are lots of user do the same business logic together)
While now , the whole thing changed after after front end and backend totally separated.
Lot of business logic can be put in front end, and server end only has separate restful web service...
What i found is that we can't do load testing like before.
Just because user's test case can't be mapped to http communication. As some task is totally done in front end...
BUT in such new world, how to do performance testing? Just performance test the JSON restful web service? How about the static resource transferred from service side to browser ?
I need suggestions from these who done lot of performance/load testing for such JSON restful web service application..
Your load testing should not be limited to backend only. Well-behaved load test needs to represent real user sitting behind real browser as close as possible including (but not limited to):
HTTP Headers like User-Agent which identifies the browser, Accept-Encoding which is used for content compression, etc.
HTTP Cookies which are being used for authentication purposes and tracking user session
Caching mechanisms - browsers use to download images, scripts and styles which are embedded in the webpage and store them locally so on subsequent requests these "heavy" entities won't be re-downloaded but rather returned from browser's disk cache
AJAX requests which are asynchronous JavaScript-driven calls which are being used for updating page content on the fly without reloading the full page. The tricky point is that AJAX requests are being executed in parallel (i.e. one main requests "spawns" several parallel AJAX requests) and the majority of load testing tools cannot handle this situation
So your test target needs to be frontend, not the backend as backend-only testing doesn't tell the full story and you can miss something important. See How to make JMeter behave more like a real browser article for more detailed explanation and example configuration applicable to Apache JMeter free and open source load testing tool (however the approach should be tool-agnostic, you should follow the same steps no matter of what load testing tool is being used under the hood)
I agree that it gets harder to test "realistic" traffic patterns with rich clients. It is also the case that as more things move onto the client side, scalability testing becomes more about testing the backend API.
Like Dmitry writes, testing the whole content delivery system is always useful, but given the choice between testing your own API and the CDN that delivers your images, I'd say that testing the API is much more important. That is where you'll most likely find scalability issues.
We have realized that a reason many don't load test at all is because it is simply "too much work" and requirements for realistic testing are a major contributing factor to this. We've written a "developer-centric load testing" manifesto, soon to be released, that argues that "Simple tests are better than no tests at all". May seem obvious, but still most people don't load test at all today. For this reason, we advocate starting out very simple, with small "unit load tests" that hit a single API end point. Write a couple of those really simple tests and start running them regularly (ideally in nightly builds on a CI system), just exercising API end points individually and see how many requests per second (or whatever metrics you want to measure) they can take.
These simple tests are going to provide you with a lot of value for little effort - the 80/20 rule in action. They test the most relevant parts of the whole system, and any smoke you see coming out as a result of this testing will likely tell you where your system is going to blow up when put under "real" (as in "realistic") pressure. You should also be able to detect most performance regressions as an added bonus. Finally, these small "unit load tests" are easier to maintain due to their simplicity; they don't get obsolete as fast as a complex/realistic test does, and if they do, they're easier to update.
When you have a "unit load test" test suite that provides value, you can add complexity and realism iteratively as you go, until the ROI on your time spent becomes too low to motivate any further effort. There is really no difference between this approach and the standard approach for building a regular unit test suite.
Apologies for the rather open nature of the question, but I think its a very valuable area of discussion.
Following the recent AWS outage and the huge number of horror stories that followed it, I was really impressed by the Chaos Monkey 'technique' applied by Netflix (one of the few to survive pretty much without a scratch.
For those who don't know the concept, it is essentially a little bot that goes around your infrastructure, causing chaos along the way, as a way of continuously testing resilience.
Besides Jeff Atwood's Chaos Monkey post I've been able to find little on this being employed anywhere else.
Whilst I appreciate that good test-driven development is a solid foundation, I think that this would be a great addition to the arsenal of any company/organisation that wants to stay up.
Has anyone else approached this topic before?
Are there particular areas other than connectivity and security vulnerability that you would see such a piece of code hitting?
Any other thoughts/feelings on this approach?
There are several tests you could do to stress your system. I like to use apache bench to load test a page that writes to the database. I test it both for number of hits and concurrent users
500 concurrent users making a total of 5000 requests
$ ab -n 5000 -c 500 url
I know my webserver can stand up to this, but I found a problem with how I was logging information. You could point that a different aspects of your site.
If you use caching you could clear the cache in the middle of the testing to see that everything recovers quickly.
If you can replicate your server in a VM, change amount of RAM, unmount a hard disk, run out of disk space, disconnect network interface, etc.
You could try to brute force a password and make sure your system only allows n login attempts before rate limiting that user.
is there someone who has expirience in hosting django applications on ep.io?
Waht are the pros/cons on it?
I'm currently using ep.io, I'm still in development with my app but I have an app deployed and running.
When you use a service like this you go into it knowing that it isn't going to be the perfect solution for every case. Knowing the pros and cons before hand will help set your expectations so that you aren't disappointed later on.
ep.io is still very young and I believe still in beta, and isn't available to the general public. To be totally fair to them, it is still a work in progress and some of these pros and cons may change as they roll out new features. I will try and come back and update this post as the new versions become available, and my experience with the service continues.
So far I am really pleased with what they have, they took the most annoying part of developing an application and made it better. If you have a simple blog app, it should be a breeze to deploy it, and probably not cost that much to host.
Pros:
Server Management: You don't have to worry about your server setup at all, it handles everything for you. With a VPS, you would need to worry about making sure the server is up to date with security patches, and all that fun stuff, with this, you don't worry about anything, they take care of all that for you.
deployment: It makes deploying an app and having it up and running really quickly. deploying a new version of an app is a piece of cake, I just need to run one maybe two commands, and it handles everything for me.
Pricing: you are only charged for what you use, so if you have a very low traffic website, it might not cost you anything at all.
Scaling: They handle scaling and load balancing for you out of the box, no need for you to worry about that. You still need to write your application so that it can scale efficiently, but if you do, they will handle the rest.
Background tasks: They have support for cronjobs as well as background workers using celery.
Customer support: I had a few questions, sent them an email, and had an answer really fast, they have been great, so much better then I would have expected. If you run your own VPS, you really don't have anyone to talk to, so this is a major plus.
Cons:
DB access: You don't have direct access to the database, you can get to the psql shell, but you can't connect an external client gui. This makes doing somethings a little more difficult or slow. But you can still use the django admin or fixtures to do a lot of things.
Limited services available: It currently only supports Postgresql and redis, so if you want to use MySQL, memcached, mongodb,etc you are out of luck.
low level c libs: You can't install any dependencies that you want, similar to google app engine, they have some of the common c libs installed already, and if you want something different that isn't already installed you will need to contact them to get it added. http://www.ep.io/docs/runtime/#python-libraries
email: You can't send or recieve email, which means you will need to depend on a 3rd party for that, which is probably good practice anyway, but it just means more money.
file system: You have a more limited file system available to you, and because of the distributed nature of the system you will need to be very careful when working from files. You can't (unless i missed it) connect to your account via (s)ftp to upload files, you will need to connect via the ep.io command line tool and either do an rsync or a push of a repo to get files up there.
Update: for more info see my blog post on my experiences with ep.io : http://kencochrane.net/blog/2011/04/my-experiences-with-epio/
Update: Epio closed down on May 31st 2012
Sorry for the very involved question, but this is something I've been researching for a while now and it is really frustrating me. I feel like in today's age we have a million and one ways to implement services tat are cross-platform (SOAP) and easy to build (thanks to .NET, java, and other frameworks). However, these technologies have been in the community for 5-10 years, but we are (or at least I am) constantly plagued with the same issues:
Identification (Tracking services) - UDDI; e.g., had to remind a co-worker the 3 times this month where a service is at, despite the fact there is a wiki that discusses the service and a PDF version of the same documentation that lives in a repository where we keep our service docs.
Scalability - Out of the box clustering; As organizations, we spend a lot of money on paying our admins just to watch the utilization of our services and make decisions like, does this service need more RAM, more CPU, more interfaces? How do I load balance this?
Monitoring - error logging, etc; I can't count how many times I have to set up tracing on services in order to see why a bug is happening that only seems to affect one customer, or have to code logic into the service to serialize exceptions, log exceptions to dbs, fail gracefully, etc.
Deployment - easy to deploy; none of this deploying DLLs to 5 load balanced servers
Each one of these problems requires some type of custom solution implemented by the organization. Documentation and UDDIs for #1. Virtualization and load balancing hardware / software for #2. Tracing, writing exceptions to databases / logs, etc for #3. Custom deployment software for #4. I work for a mid-sized organization. I can't even imagine how a company the size of Sun, Google, or Microsoft would tackle these dilemmas.
Maybe my vision is unrealistic, but I dream of having a Framework per se that lives on top of a server cluster that manages all of the above. I was ecstatic to read about Microsoft's AppFabric since it really seems to extend some of the functionality of BizTalk to WCF service implementors: Caching, Hosting, Monitoring, etc. However, from what I've seen, I still don't feel it lives up to my dream for an all-in-one solution that assists the developer and organization in writing services that are scaled across clusters easily, deployed into the cluster easily, and identifiable, possibly even version-able.
So, I don't mean this post to be about my dream. I do actually have a question. For starters, is my dream / want completely unrealistic? Furthermore, what solutions are there available that attempt to solve these problems without confining us to a new and more proprietary way (BizTalk) of developing services? An lastly, in concern to a complete SOA / ESB solution, where do we see the most potential in the market right now or in the future?
I think that you are talking about different kinds of problems here.
1). Developers who don't read documentation. This is an endemic problem, not limited to SOA - just look at some questions on StackOverflow. At least the developer is asking you whether there is a service, rather then just duplicating logic in their own code. I don't see any technical solution to these kinds of problems, you've already provided good registries and documentation, but some developers prefer to talk to people. Maybe, even, this is actually a good thing - human interaction has value above the technical content of the interaction. Or maybe, you're too nice: "No, I won't answer that question, look it up."
2). Scaling. There are technologies addressing this issue. (Disclaimer I work for IBM, who sell some, so I'll reference these - I'm not intending to imply that IBM are the only vendor with solutions in this space.) There are products such as this that can provision a new machine, install a software stack and add it to a cluster to address workload changes. Then at a finer grained level of control in the Java EE world the Application Server can dynamically shape traffic and adjust clusters. See WebSphere Virtual Enterprise
3). Monitoring. I don't "get" what you expect here. In all likelyhood such tricky bugs will require application level trace. For some problems such as finding memory leaks and performance bottlenecks there are very good tools, at least in the Java EE world.
4). I can't speak to the .Net world, but I'd say that Java EE app servers do a reasonable job of deploying the apps across clusters smoothly, and in the cases where we use JNI and need DLLs deploying then we can use products such as the Tivoli stack I mention to manage this.
So, in summary, I do think that vendors are trying to address these issues. And I don't think your life would be simpler without SOA. Imagine instead the same problems applied to myriad separate, independent applications.
Here's my two cents.
I've been a developer at a company that used SOA incorrectly. The worst solution they implemented was field level validation of form elements on a desktop app using SOA. To perform acceptably these require very low latency. A 2-4 second wait to change to a new field gets old fast. The service ran over the network on a biztalk server. Everyone hated it.
If you're going to do this you really need to spend a lot of time dealing with network latency, service failure, timing, and timeout issues.
Don't get carried away and think SOA is the solution to every problem. Used at a high level it's great, used at a low level it makes your applications fragile, slow, and impossible to debug.
If you talk to IBM or one of the big SOA vendors, they got a products that cover each scenario.
Identification (Tracking services) - UDDI; e.g., had to remind a co-worker the 3 times this month where a service is at, despite the fact there is a wiki that discusses the service and a PDF version of the same documentation that lives in a repository where we keep our service docs.
Registry and Repository server. Nice thing is that it does governance (promotion, demotion, versioning, approval) and your ESB typically does a "lookup" for the latest and greatest against the register server.
Scalability - Out of the box clustering; As organizations, we spend a lot of money on paying our admins just to watch the utilization of our services and make decisions like, does this service need more RAM, more CPU, more interfaces? How do I load balance this?
Transaction monitoring software like IBM Tivoli Composite Application Manager for SOA. Basically, it tracks things from a horizontal point of view and to see if there is a service disruption from a end user/end app point of view.
As far as your clustering.... you have to pick good middleware and architecture. Personally speaking, get stuff that is "cloud" ready. App Servers with NoSQL connected by MOM.
Monitoring - error logging, etc; I can't count how many times I have to set up tracing on services in order to see why a bug is happening that only seems to affect one customer, or have to code logic into the service to serialize exceptions, log exceptions to dbs, fail gracefully, etc.
Enterprise standards for your developers and for your vendors. Integration of all business and system events into a single dashboard. (Most companies spilt them). This is done already at most enterprise shops.
Deployment - easy to deploy; none of this deploying DLLs to 5 load balanced servers
Ahh.. Microsoft IIS Web Deployment Tool 2.0. You can sync 100s of MS servers by just updating the master. It's really easy.
I’m thinking about building an offline-enabled web application.
The architecture I’m considering is as follows:
Web server (remote) <--> Web server/cache (local) <--> Browser/Prism
The advantages I envision for this model are:
Deployment is web-based, with all the advantages of this approach
Offline-enabled
UI (html/js) synchronization is a non-issue
Data synchronization can be mostly automated
as long as I stay within a RESTful paradigm
I can break this as required but manual synchronization would largely remain surgical
The local web server is started as a service; I can run arbitrary code, including behind-the-scene data synchronization
I have complete control of the data (location, no size limit, no possibility of user deleting unknowingly)
Prism with an extension could allow to keep the javascript closed source
Any thoughts on this architecture? Why should I / shouldn’t I use it? I'm particularly looking for success/horror stories.
The long version
Notes:
Users are not very computer-literate.
For instance, even superficially
explaining how Gears works is totally
out of the question.
I WILL be held liable if data is loss, even if it’s really the users fault (short of him deleting random directories on his machine)
I can require users to install something on their machine. It doesn’t have to be 100% web-based and/or run in a sandbox
The common solutions to this problem don’t feel adequate somehow. Here is a short analysis of each.
Gears/HTML5:
no control over data, can be deleted
by users without any warning
no
control over location of data (not
uniform across browsers and
platforms)
users need to open application in browser for synchronization to happen; no automatic, behind-the-scene synchronization
different browsers are treated differently, no uniform view of data on a single machine
limited disk space available
synchronization is completely manual, sql-based storage makes this a pain (would be less complicated if sql tables were completely replicated but it’s not so in my case). This is a very complex problem.
my code would be almost completely open sourced (html/js)
Adobe AIR:
some of the above
no server-side includes (!)
can run in the background, but not windowless
manual synchronization
web caching seems complicated
feels like a kludge somehow, I’ve had trouble installing on some machines
My requirements are:
Web-based (must). For a number of
reasons, sharing data between users
for instance.
Offline (must). The application must be fully usable offline (w/ some rare exceptions).
Quick development (must). I’m a single developer going against players with far more business resources.
Closed source (nice to have). Yes, I understand the open source model. However, at this point I don’t want competitors to copy me too easily. Again, they have more resources so they could take my hard work and make it better in less time than I could myself. Obviously, they can still copy me developing their own code -- that is fine.
Horror stories from a CRM product:
If your application is heavily used, storing a complete copy of its data on a user's machine is unfeasible.
If your application features data that can be updated by many users, replication is not simple. If three users with local changes synch, who wins?
In reality, this isn't really what users want. They want real-time access to the most current data from anywhere. We had better luck offering a mobile interface to a single source of truth.
The part about running the local Web server as a service appears unwise. Besides the fact that you are tied to certain operating environments that are available in the client, you are also imposing an additional burden of managing the server, on the end user. Additionally, the local Web server itself cannot be deployed in a Web-based model.
All in all, I am not too thrilled by the prospect of a real "local Web server". There is a certain bias to it, no doubt since I have proposed embedded Web servers that run inside a Web browser as part of my proposal for seamless off-line Web storage. See BITSY 0.5.0 (http://www.oracle.com/technology/tech/feeds/spec/bitsy.html)
I wonder how essential your requirement to prevent data loss at any cost is. What happens when you are offline and the disk crashes? Or there is a loss of device? In general, you want the local cache to be the least farther ahead of the server, but be prepared to tolerate loss of data to the extent that the server is behind the client. This may involve some amount of contractual negotiation or training. In practice this may not be a deal-breaker.
The only way to do this reliably is to offer some sort of "check out and lock" at the record level. When a user is going remote they must check out the records they want to work with. This check out copied the data to a local DB and prevents the record in the central DB from being modified while the record is checked out.
When the roaming user reconnects and check their locked records back in the data is updated on the central DB and unlocked.