How many seconds may an API call take until I need to worry? - django

I have to ask a more or less non-typical SO question and hope you don't mind. Right now I am developing my very first web application. I set up an AJAX function that requests some data from a third-party API and populates my HTML containers with the returned data.
Right now I query one single object and populate 3 HTML containers with around 15 lines of JavaScript code. When I activate the process/function by clicking a button on my frontend, it takes around 6-7 seconds until the HTML content is updated.
Is this a reasonable time? The user experience will honestly be more than bad, considering that I will have to query and manipulate far more data (I am building a single-page dashboard around soccer data).
There might be very controversial answers to that question, but what would be a fair enough time for the process to run on standard infrastructure? 1-2 seconds? (I will deploy the app on Heroku or DigitalOcean and will implement a proper caching environment to handle "regular visitors".)
Right now:
I use a virtualenv and the Django development server for development,
and a demo server from the third-party API, which might be slowed down for whatever reason (how can I check this?),
which might affect the current time needed (there will be many more variables, obviously).
Looking forward to your answers.

I personally think (and probably a lot of people would agree) that 6-7 seconds is a significant delay for rendering a small page. The cause of this issue might not come from Django directly. Check the following:
I use a virtualenv and django server for the development
You may be running the Django development server; a production server might make things a bit faster (use django-debug-toolbar to find out what is causing the delay).
Add database indexes in your models for the fields you filter on frequently.
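For illustration, a minimal sketch of what a field index looks like in a model; the Match model and its fields are placeholders, not taken from the question:

```python
# Hypothetical model -- the names are placeholders, not from the question.
from django.db import models

class Match(models.Model):
    home_team = models.CharField(max_length=100)
    away_team = models.CharField(max_length=100)
    # db_index=True asks Django to create a database index on this column,
    # which speeds up filters such as Match.objects.filter(kickoff__gte=...)
    kickoff = models.DateTimeField(db_index=True)
```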
a demo server from the third party API which might be slowed down for whatever reason
Use the 'Network' tab in Chrome's developer tools to watch how long that third-party call takes. It might not be visible there if you call the API inside your views.py; in that case, add some timing code there to calculate how long the call takes to return.
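As a rough sketch of such timing code (the endpoint URL and view name are placeholders, and it assumes the requests library is installed):

```python
# views.py -- hypothetical view that times the third-party call and logs it.
import logging
import time

import requests
from django.http import JsonResponse

logger = logging.getLogger(__name__)

def player_stats(request):
    start = time.monotonic()
    # The URL below is a placeholder standing in for the demo API endpoint.
    resp = requests.get("https://demo-api.example.com/players/123", timeout=10)
    elapsed = time.monotonic() - start
    logger.info("Third-party API call took %.2f seconds", elapsed)
    return JsonResponse(resp.json(), safe=False)
```

If the logged time accounts for most of the 6-7 seconds, the bottleneck is the demo API, not your Django code.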

Related

Is there a way to compute this amount of data and still serve a responsive website?

Currently I am developing a Django + React website that will (I hope) serve a decent number of users. The project demo is mostly complete, and I am starting to think about the scale required to put this thing into production.
The website essentially does three things:
Grab data from external APIs (e.g. Twitter) for 50,000 unique keywords (the keywords don't change). This process happens every 30 minutes.
Run computation on all of the data and save the results to the database. Assume that the algorithm is as optimized as possible.
When a user visits the website, it should serve a pretty graph/chart of all of the computed data per keyword.
The issue being, this is far too intense a task to be done by the same application that serves the website; users would be waiting decades to see their data. My current plan is to have a separate API that services the website with the data, which the website can then store in its database. This separate API would process the data without fear of affecting users, and it should be able to finish its current computation in under 30 minutes, in time for the next round of data.
Can anyone help me understand how I can better equip my project to handle the scale? I'd love some ideas.
As a 4th-year CS student I figured it's time to put a real project out into the world, and I am very excited about it and the progress I've made so far. My main worry is that the end users will be negatively affected if I don't figure out some kind of pipeline to make this process happen.
To reiterate my idea:
Django + React - This is the forward facing website
External API - Grabs the data off the internet and processes it, and waits for a GET request from the website
Is there a better way to do this? Or, on the other hand, am I severely overestimating how computationally heavy this is?
Edit: Including current research
Handling computationally intensive tasks in a Django webapp
Separation of business logic and data access in django
What you want is to have the computation task executed by a different process in the background.
The most straightforward and popular solution is to use Celery, see here.
The Celery worker(s), which perform the background tasks, can either run on the same machine as the web application or, when scale becomes an issue, be configured to run on an entirely different machine.
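A minimal Celery sketch of that setup; the broker URL, project name, and the three helper functions are assumptions for illustration, not part of the answer:

```python
# tasks.py -- background work runs in a Celery worker, not in the web process.
from celery import Celery

app = Celery("dashboard", broker="redis://localhost:6379/0")

@app.task
def refresh_keyword(keyword):
    # Fetch the external data and run the heavy computation outside the
    # request/response cycle, then persist the result for the website to read.
    data = fetch_from_external_api(keyword)   # hypothetical helper
    stats = compute_keyword_stats(data)       # hypothetical helper
    save_stats_to_db(keyword, stats)          # hypothetical helper

# Celery beat can schedule the task, e.g. every 30 minutes:
app.conf.beat_schedule = {
    "refresh-keywords-every-30-min": {
        "task": "tasks.refresh_keyword",
        "schedule": 30 * 60,                  # seconds
        "args": ("example keyword",),
    },
}
```

The web views then only read precomputed rows from the database, so page loads stay fast no matter how long the computation itself takes.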

Java web application for multiple users

I need to design and implement a Java web application that can be used by multiple users at the same time. The data handled by this application is going to be huge, and it may take about 5 minutes for a page to display the results (database records).
I had designed this application using HTML, Servlets and JSP. But when two users would try to get the records, only one user was able to view the results while the other faced an error.
I always thought a web application would take care of handling multiple users but this is not the case.
Any insights on this would be highly appreciated.
Thanks.
I always thought a web application would take care of handling multiple users but this is not the case.
They do if they're written correctly. Obviously yours is not. That's all we can tell you unless you give more information, most importantly details of the error shown to the second user.
One possibility is that everything is OK on the web layer but your DB access for the first user causes an exclusive lock so that the second user cannot access the data at the same time. This could be fixed by using non-exclusive read locks. How to do that depends mainly on what DB you're using.
Getting concurrency right requires you to choose the correct tools and use them correctly. It doesn't just happen magically because it's a web app.
What are you using to develop this web application? If you are developing it your own way from scratch, I must say you are trying to reinvent a wheel that has already been created and refined by very solid frameworks.
I suggest you analyze your requirements thoroughly and study some of the available frameworks. Let them handle things like multithreading and other aspects in the best possible manner.
Handling multiple requests at a time is the container's job; as application developers we have to concentrate on how we handle and process the requests forwarded by the container.
I also suggest you get some insight into how web applications work and how the request-response cycle happens.

How can I scale a webapp with long response time, which currently uses django

I am writing a web application with Django on the server side. It takes ~4 seconds for the server to generate a response to the user. It makes use of a weather API. My application has to make ~50 queries to that API for each user request.
The server side uses Python's urllib to call the weather API. I used Python's threading to speed up the process because urllib is synchronous. I am using WSGI with Apache. The problem is that the WSGI stack is fully synchronous, and when many users use my application, they have to wait for one another's requests to finish. Since each request takes ~4 seconds, this is unacceptable.
I am kind of stuck, what can I do?
Thanks
If you are using mod_wsgi in a multithreaded configuration, or even a multi process configuration, one request should not block another from being able to do something. They should be able to run concurrently. If using a multithreaded configuration, are you sure that you aren't using some locking mechanism on some resource within your own application which precludes requests running through the same section of code? Another possibility is that you have configured Apache MPM and/or mod_wsgi daemon mode poorly so as to preclude concurrent requests.
Anyway, as mentioned in another answer, you are much better off looking at caching strategies to avoid the weather lookups in the first place, or offloading to client.
50 queries to an outside resource per request is probably a bad place to be, and probably not necessary at all.
The weather doesn't change all that quickly, so you can probably benefit enormously by just caching results for a while. Then it doesn't matter how many requests you're getting; you don't need to do more than a few queries per day.
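A hedged sketch of that caching approach using Django's cache framework; the endpoint, parameters, and TTL are assumptions:

```python
# weather.py -- cache each lookup so repeated requests reuse the same result.
import requests
from django.core.cache import cache

WEATHER_TTL = 30 * 60  # seconds; assumes a 30-minute-old forecast is fine

def get_weather(location):
    key = "weather:%s" % location
    data = cache.get(key)
    if data is None:
        # Placeholder URL and params standing in for the real weather API.
        resp = requests.get(
            "https://weather-api.example.com/forecast",
            params={"q": location},
            timeout=5,
        )
        data = resp.json()
        cache.set(key, data, WEATHER_TTL)
    return data
```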
If that's not your situation, you might be able to get the client to do the work for you. Refactor the code so that the weather API aggregation happens on the client in JavaScript, rather than funneling it all through the server.
Edit: based on the comments you've posted, what you are asking for probably cannot be optimized within the constraints of the API you are using. The problem is that the service is doing a good job of abstracting away the differences in the many sources of weather information it aggregates into a nearest-location query. After all, weather stations provide only point data.
If you talk directly to the technical support people that provide the API, you might find that they are willing to support more complex queries (bounding box), for which they will give you instructions. More likely, though, they abstract that away because they don't want to actually reveal the resolution that their API actually provides, or because there is some technical reason in the way that they model their data or perform their calculations that would make such queries too difficult to support.
Without that or caching, you are just out of luck.

How to set up website periodic tasks?

I'm not sure if the topic is appropriate to the question... Anyway, suppose I have a website, built with PHP or JSP, and cheap hosting with limited functionality. More precisely, I don't own the server and I can't run daemons or services at will. Now I want to run periodic tasks (say every minute or two) to compute certain statistics. They are likely to be time consuming, so I just can't repeat the calculation for every user accessing a page. I could do the calculation once when a user loads a page and I determine that enough time has passed, but in that situation, if the calculation gets very long, the response time may be excessive and time out (it's not likely, and I don't mean to run such a long task, but I'm considering the worst case).
So given these constraints, what solutions would you suggest?
Almost every cheap host will have support for crontabs. Check out the hosting packages first.
If not, load the page normally and launch the task via AJAX. This way your response time doesn't suffer, and the work is done in a separate request.
If you choose to use crontab, you're going to have to know a bit more to execute your PHP scripts from it. Whether your PHP executes as CGI or as an Apache module has an effect. There's a good article on how to do this at:
http://www.htmlcenter.com/blog/running-php-scripts-with-cron/
If you don't have access to crontab on your hosting provider (find a new one) there are other options. For example:
http://www.setcronjob.com/
It will call a script on your site remotely every X period; you have to renew it once a month, I think. If you take their paid service ($5/year according to the front page), I think the jobs you set up last until you cancel them or your paid term runs out.
In Java, you could just create a Timer. It can create a background thread that will perform a given function every so often.
My preference would be to start the Timer object in a ServletContextListener.contextInitialized() method, but you may have a more appropriate place.

Web application monitoring best practices [closed]

We are finishing up our web application and planning for deployment. A very important aspect of deploying to production is monitoring the health of the system. Having a small developer/support team makes it critical for us to get early notifications of potential problems and resolve them before they have an impact on users.
Using Nagios seems like a good option, but I wanted to get more opinions on the best monitoring tools/practices for web applications in general, and specifically for a Django app. I would also welcome recommendations on what should be monitored aside from the obvious: CPU, memory, disk space, and database connectivity.
Our web app is written in Django; we are running on Linux (Ubuntu) under Apache + FastCGI with a PostgreSQL database.
EDIT
We have a completely virtualized environment under Linode.
EDIT
We are using django-logging so we have a way to separate info, errors, critical issues, etc.
Nagios is good; it's also good to have system tests (Selenium) running regularly.
Edit: Hyperic and Groundwork also look interesting.
There is probably a test suite system that can keep pressure-testing everything for you as well. I can't remember the name off the top of my head; maybe someone can mention one below.
Other things I like to do:
The best motto for infrastructure is always fix, detect, repair. Get it up, get to the root of it, and cure/prevent it if you can.
Since a system exists at many levels, we should test at many levels:
Edit: Have all errors or warnings posted directly to your case manager via email. That way you can track occurrences in one place.
1) Connection: monitor your internet connectivity from the server and from the outside. Log this somewhere.
2) Server: monitor all the processes that you need to, to ensure they are running and not pinning the server. Use an HP server or something equivalent with hardware-failure notification that it can issue at the BIOS level. Notify and log if problems are found.
3) Software: identify the key software that always needs to be running. Set performance levels, if any, and then monitor them. Nagios should be able to help with this. On Windows it can be a bit more work. When an exception occurs, you should be able to run a script from it to restart processes automatically. My dream system would let me interact with servers via SMS whenever the server sees something as an exception that I have to either permit, or that will happen automatically unless I cancel it by SMS. One day...
4) Remote Power: ensure remote power-reset capabilities are in your hands. You might want to schedule weekly reboots if you ever use Windows for anything.
5) Business Logic Testing: have regularly running scripts that test the workflows of your system. Selenium can probably achieve some of this, but I like logging the results as well, to record that this ran at this time and these files had errors. If possible, have the system monitor itself through your scripts.
6) Backups: make a backup that you can set and forget. If you can get things into virtual machines, that would be ideal, as you can scale, move, or deploy any part of your infrastructure anywhere. I have had instances where I moved a dead server onto my laptop and let it run in VMware while I fixed a problem.
Monitoring the number of connections to your Web server and your database is another good thing to track. Chances are if one shoots through the roof, something is starving for resources and the site is about to go down.
Also make sure you have a regular request for a URL that is a reasonable end-to-end test of the system. If your site supports search, then have Nagios execute a search; that should verify that the search index, the web server, and the database server are all healthy.
Also, make sure that your application sends you an email any time your users see an error or there is an unhandled exception. That way you know how the application is failing in the field.
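Since the question is about a Django app, one way to get that behaviour with what Django already ships is the AdminEmailHandler; a minimal settings sketch, with placeholder addresses:

```python
# settings.py -- mail unhandled request exceptions to the ADMINS list.
ADMINS = [("Ops", "ops@example.com")]  # placeholder address

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "mail_admins": {
            "level": "ERROR",
            "class": "django.utils.log.AdminEmailHandler",
        },
    },
    "loggers": {
        # Unhandled exceptions during a request are logged to django.request.
        "django.request": {
            "handlers": ["mail_admins"],
            "level": "ERROR",
            "propagate": True,
        },
    },
}
```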
If I had to pick one type of testing, it would be to test the end-user functionality of the system. The important thing to consider is the user. While testing things like database availability, server uptime, etc. is all important, testing work-flows through your system via a remote UI testing system covers all these bases. If you know that the critical parts of your system are available to the end user, then you know your system is probably OK.
Identify the important work-flows in your system. For example, if you wrote an eCommerce site you might identify a work-flow of "search for a product, put product in shopping cart, and purchase product".
Prioritize the work-flows, and build out higher-priority tests first. You can always add additional tests after you roll out to production.
Build UI tests using one of the available UI testing frameworks. There are a number of free and commercial UI testing frameworks that can be run in an automated fashion. Build a core set of tests first that address critical work-flows.
Setup at least one remote location from which to run tests. You want to test every aspect of your system, which means testing it remotely. Is the internet connection up? Is the web server running? Is the connection to the database server working? Etc, etc. If you test remotely you make sure you system is available to the outside world which means it is most likely working end-to-end. You can also run these tests internally, but I think it is critical to run them externally.
Make sure your solution includes both reporting and notification. If one of your critical work-flow tests fails, you want someone to know about it to fix the problem ASAP. If a non-critical task fails, perhaps you only want reporting so that you can fix problems out-of-band.
This end-user testing should not eliminate monitoring of system in your data-center, but I want to reiterate that end-user testing is the most important type of testing you can do for a web application.
Ahhh, monitoring. How I love thee and your vibrations at 3am.
Essentially, you need a way to inspect the internal state of your application, both at a specific moment, as well as over spans of time (the latter is very important for detecting problems before they occur). Another way to think of it is as glorified unit-testing.
We have our own (very nice) monitoring system, so I can't comment on Nagios or other apps. Our use case is similar to yours, though (cgi app on apache).
Add a logging.monitor() type method, which will log information to disk. This should support, at the very least, logging simple numbers and dicts of numbers (the key => value association can be incredibly handy); a minimal sketch follows after this list.
Have a process that scrapes the monitoring logs and stores them into a database.
Have a process that takes the database information, checks it against rules, and sends out alerts. Keep in mind that some things can be flaky. Just because you got a 404 once doesn't mean the app is down.
Have a way to mute alerts (very useful for maintenance or to read your email).
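A minimal sketch of the logging.monitor()-style helper described in the first step above; the log path and the example metric names are assumptions:

```python
# monitor.py -- append timestamped metrics to disk for a scraper to pick up.
import json
import time

MONITOR_LOG = "/var/log/myapp/monitor.log"  # placeholder path

def monitor(metric, value):
    """Record a number or a dict of numbers with a timestamp."""
    record = {"ts": time.time(), "metric": metric, "value": value}
    with open(MONITOR_LOG, "a") as fh:
        fh.write(json.dumps(record) + "\n")

# Example calls from inside the application:
# monitor("requests_per_minute", 120)
# monitor("response", {"status": 404, "latency_ms": 84})
```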
That's all pretty high level. The important thing is that you have a history of the state of the application over time. From this, you can then create rules (perhaps just raw SQL queries you put into a config somewhere) that say "if the queries per second doubled, send a Slashdotted alert", or "if 50% of responses are 404s, send an alert". It also bedazzles management, because you can quantify any comment about whether the site is up, down, fast, or slow.
Things to monitor include (others probably mentioned these as well): http status, port accessible, http load, database load, open connection, query latency, server accessibility (ssh, ping), queries per second, number of worker processes, error percentage, error rate.
Simple end-to-end tests are also very handy, though they can be brittle. It's best to keep them simple, but you should have one that tries to touch the core pieces of the app (caching, database, authentication).
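In a Django app, one way to expose such a test is a single health-check view that touches those core pieces; a sketch, assuming the cache, database, and auth app are configured:

```python
# views.py -- hypothetical /health/ endpoint for an external monitor to poll.
from django.contrib.auth import get_user_model
from django.core.cache import cache
from django.db import connection
from django.http import HttpResponse, HttpResponseServerError

def health(request):
    try:
        cache.set("health-check", "ok", 10)           # cache round trip
        assert cache.get("health-check") == "ok"
        with connection.cursor() as cursor:           # database round trip
            cursor.execute("SELECT 1")
        get_user_model().objects.exists()             # auth tables reachable
    except Exception as exc:
        return HttpResponseServerError("unhealthy: %s" % exc)
    return HttpResponse("ok")
```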
I use Munin and Monit, and have been very happy with both of them.
Internal logging is fine and dandy, but when your whole app goes down or your box/environment crashes, you need an outside check too. http://www.pingdom.com/ has been very reliable for me.
My only other advice is that I wouldn't spend too much time on this. My best example is Twitter: how much energy did they put into making the system able to half-die, instead of just investing that time and energy into throwing more hardware at it and scaling it out?
Chances are that whatever ends up taking you down, your logging and health systems will have missed it anyway.
The single most important way to monitor any online site is to monitor externally. The goal should be to monitor your site in a way that most closely reflects how your users use the site. In 99% of cases, as soon as you know that your site is down externally, it's relatively easy to find the root cause. The most important thing is to know as soon as possible that your customers are unable to load your site.
This generally means using an external performance monitoring service. They vary from the very low end (mon.itor.us, Pingdom) to the high end (Webmetrics, Gomez, Keynote). And as always, you get what you pay for. The things to look for when shopping around for a monitoring service include:
The size and distribution of the monitoring network
Whether or not the monitoring solution is able to monitor your site using a real browser (otherwise you aren't testing your site like a real user would)
The scripting language (to script the transactions against your site)
The support department, to help you along the way, and provide expertise on how to monitor correctly
Good luck!
Web monitoring by IP Patrol or SiteSentry has been useful for us. The second is a bit like Site Confidence but slightly prettier, lol.
Have you thought about monitoring the functionality as well? A script (either in a scripting language like Perl or Python, or using a tool like WebTest) that talks to your application and performs some important steps like logging in, making a purchase, etc. is very nice to have.
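A rough sketch of such a script in Python using requests; the URLs, form fields, and credentials are placeholders, and a non-zero exit code is what a cron wrapper or monitoring agent would alert on:

```python
# check_workflow.py -- external functional check: log in, then load a page.
import sys

import requests

BASE = "https://www.example.com"  # placeholder site

def main():
    session = requests.Session()
    resp = session.post(
        BASE + "/login",
        data={"username": "monitor", "password": "secret"},  # placeholders
        timeout=10,
    )
    if resp.status_code != 200:
        sys.exit("login failed: %s" % resp.status_code)
    resp = session.get(BASE + "/dashboard", timeout=10)
    if resp.status_code != 200 or "Dashboard" not in resp.text:
        sys.exit("dashboard check failed: %s" % resp.status_code)
    print("OK")

if __name__ == "__main__":
    main()
```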
Aside from what to monitor, which has already been answered, you need to make sure - whatever system you use - that you get only one notification of an error that happens multiple times, on each request. Or your inbox will run out of memory :) Plus, it's plain annoying...
Divide the standby shifts among the support/dev team, so one person does not have to be on call every single evening. That will wear people down. Monitoring is a good thing, but everyone needs to get a chance to have a life once in a while. Your cellphone buzzing at 2AM for a few nights will get very old pretty soon, trust me. And not every developer is used to 24/7 support, so you need to find the balance between using monitoring and abusing monitoring.
Basically, have distinct escalation levels, and if the sky is not falling, define a "serenity now" window at night where smaller escalation levels don't go out.
I've been using Nagios + CruiseControl + Selenium for running high-level tests on mission-critical web applications. I got burned pretty hard by a simple jQuery error that stopped users from proceeding through an online signup form.
http://www.agileatwork.com/the-holy-trinity-of-web-2-0-application-monitoring/
You can take a look at AlertGrid. This web application allows you to filter and forward alerts to your team (worldwide). It also has a nice ability to monitor whether something did not happen.
To paraphrase Richard Levasseur: ah, monitoring tools, how your imperfections frustrate me. There doesn't seem to be a perfect tool out there; Nagios is pretty easy to set up but the UI is kinda old fashioned and you have to have a daemon running on each server being monitored. Zenoss has a much nicer UI including trend graphs of resource usage, but it uses SNMP so you have to have some familiarity with that to get it working properly, and the documentation is not the best - there are hundreds of pages but it's really hard to find just the info you need to get started.
Friends of mine have also recommended Cacti and Hyperic, but I don't have personal experience with those.
One last thing - one of the other answers suggested running a tool that stresses your site. I wouldn't recommend doing that on your live site unless you have a reliable quiet period when nobody is hitting it; even then you might bring it down unexpectedly. Much better to have a staging server where you can run load tests before putting changes into production.
One of our clients uses Techout (www.techout.com) and is very pleased with the service.
There is no charge for alerts, no matter what kind or how many, and they offer email, voicemail and SMS alerts -- and if something major happens, a phone call from a live person to help you out.
It's all based on service -- you don't install the software and you have a consultant who works with you to determine the best approach for your business. It's one of the most convenient web application monitoring services because they take care of everything.
I would just add that you can predict error likelihood somewhat based on the history of past errors and having fixed them. With smaller-scale internal testing, if you graph the frequency and severity of the problems that have been corrected up to this point, you'll have an overview of predictable new problems. If everything has been running error-free for some time now, then the two sources of trouble would be recent changes or scalability issues.
From the above it sounds like scalability is your only worry, but I just mention the past-error frequency test because the teams I've been on invariably think they got the last error fixed and there are no more. Until there is.
Changing the subject a little bit: something I find really useful, and that changed a lot about how I monitor my apps, is to log JavaScript exceptions somewhere. There's a very nice implementation that logs them directly from users' browsers to Google Analytics.
This is a must for JavaScript-centered web applications, and it can give you results based directly on users' browsers, which can surface very unexpected errors (IE and mobile browsers are a pain).
Disclaimer: my post below
http://www.directperformance.com.br/en/javascript-debug-simples-com-google-analytics
For the internet presence monitoring, I would suggest the service that I am working on: Sucuri NBIM (Network-based integrity monitor).
It does availability and integrity checks, looking for changes to your internet presence (sites, DNS, WHOIS, headers, etc.) and loss of connectivity. It is free and you can try it out here.