Use of Jetty Request Statistics in Production

From the Jetty documentation - http://www.eclipse.org/jetty/documentation/current/statistics-handler.html - it looks like Connector statistics should be used very cautiously and should NOT be enabled for production use.
However, I wanted to know whether Request statistics (enabled by adding the statistics handler) should or should not be enabled for production use.

This question is better suited for serverfault.com, as it's not a programming question (which Stack Overflow is for).
That documentation is written with a broad definition of what "Production" means.
The statistics gathering could work just fine on your production environment, but that depends on your environment, your http requests profile, networking capability, etc.
The group that maintains Jetty tends to err on the side of high-traffic, high-load websites, where that kind of statistics gathering is problematic, so a general warning is the most prudent approach when documenting it. (Note: even the terms "high-traffic" and "high-load" mean different things to different people.)
There are many who consider any performance decrease (even as little as 10ns per request) to be out of the question, while others feel it's OK.
Know, however, that the statistics are just long values that keep increasing. If you have a high-traffic server that stays up for months on end, then these statistics will eventually become invalid/bad (as you exceed Long.MAX_VALUE).
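How soon that overflow actually matters depends on what is being accumulated. A back-of-envelope sketch in Python (the load figures below are hypothetical illustrations, not Jetty internals):

```python
# Back-of-envelope: when does a signed 64-bit statistic overflow?
LONG_MAX = 2**63 - 1  # Java's Long.MAX_VALUE

# A plain request counter, at a very aggressive 1,000,000 req/s:
requests_per_sec = 1_000_000
seconds = LONG_MAX / requests_per_sec
print(f"counter overflows after ~{seconds / (3600 * 24 * 365):,.0f} years")

# A statistic that sums a per-request quantity grows much faster.
# Hypothetically accumulating 100 ms of latency per request, measured
# in nanoseconds, adds 10**14 to the total every second:
ns_per_sec = requests_per_sec * 100_000_000
print(f"nanosecond total overflows after ~{LONG_MAX / ns_per_sec / 3600:,.0f} hours")
```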

Related

Why bother using Apache or Nginx, etc?

I've been assigned a project which requires me to add some HTML page serving. This embedded system (running Linux CentOS 6.3) has some extra juice available, but also already has numerous responsibilities.
I considered Apache but tossed it due to bloat, and I looked into Nginx but am now shying away from that too. It just seems that I'm getting way more 'functionality', and as a result more CPU usage, than I need.
Can someone enlighten me as to why I wouldn't just implement the HTTP protocol myself using async sockets?
My specific needs are:
Receive and decode GETs and POSTs.
Send CSS, JS and JPG files as requested.
Output header, cookie, head and body data based on the decoded GETs/POSTs.
Given that I don't need the myriad things these webservers offer, am I being naive in assuming this course of doing it myself? What would you suggest or warn against?
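To give a sense of what "doing it myself" entails, here is a minimal async-socket sketch using Python's asyncio (a hypothetical stand-in for whatever async stack would actually be used). Even this happy-path-only version, which answers a single GET, has to parse the request line and frame the response by hand, and it still ignores POST bodies, MIME types, keep-alive, path-traversal attacks, timeouts and logging:

```python
import asyncio

async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter):
    # Parse the request line, e.g. "GET /index.html HTTP/1.0".
    # (Happy path only: malformed requests will raise.)
    request_line = (await reader.readline()).decode("latin-1").strip()
    method, path, _version = request_line.split(" ", 2)

    # Drain the headers without interpreting any of them.
    while (await reader.readline()) not in (b"\r\n", b"\n", b""):
        pass

    # Frame a minimal HTTP/1.0 response by hand.
    body = f"<h1>{method} {path}</h1>".encode()
    writer.write(b"HTTP/1.0 200 OK\r\n")
    writer.write(b"Content-Type: text/html\r\n")
    writer.write(f"Content-Length: {len(body)}\r\n\r\n".encode())
    writer.write(body)
    await writer.drain()
    writer.close()

async def main():
    server = await asyncio.start_server(handle, "0.0.0.0", 8080)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```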
Basically, you use a web server because then you get the functionality you want in a form that's already been tested, is more reliable than your first code is likely to be, and is supported by a large community of others. If Apache and nginx are too heavyweight for you (although nginx is pretty much characterized by how lightweight it is for heavy loads) and especially if the load you expect is very light, then look around for other options.
Wikipedia has a whole page of comparisons of lightweight web servers.
An easy trap to fall into: thinking "I don't need all the functionality in Product X, I'll just write my own with just the functionality that I need" only to end up reimplementing Product X entirely, one newly-discovered requirement at a time.
I sort of doubt that an embedded system that can run CentOS okay is so resource-starved that it can't run Nginx comfortably (or even Apache, which people run on the Raspberry Pi just fine with appropriate configuration tweaks), given reasonable assumptions about how many pages you are actually serving. I ran Apache on a Pentium 266 with something like 256MB of RAM, serving a few simple PHP apps at roughly a page every two seconds, with no issues. As I recall, it's fairly modular, so you can just choose not to load the functionality you don't think you need. And, later, when your requirements change and you find out you do need it, you can just plug it back in :)
If you are really and truly concerned about resource consumption, look into web servers designed for embedded applications. I hear Cherokee is quite nice. Mongoose looks promising as well.
You can go further; I began with this: http://www.w3.org/Protocols/HTTP/HTTP2.html

How should a Windows 8 Metro Application connect to a central database?

How should a Windows 8 Metro application connect to a central database?
I've read about local storage, but I haven't read anything about connecting to a central database.
Obviously, this architectural design decision needs to support the disconnected scenario.
WCF web services seem to make sense.
But even if they do make sense, should we really create separate methods for all read/write operations?
Or are OData WCF services the way to go?
It seems like tablet software architecture should be able to borrow a lot from smartphone software architecture (but I am new to both).
Has Microsoft made any recommendations in its app samples?
It appears that others are asking similar questions on the Microsoft Developer Forums.
Here is what I've found:
According to Tim Heuer:
...You cannot directly have a SQL db embedded in your app or use
something like ADO.NET. This is more of an async/services
infrastructure. So if your data was exposed via services, then of
course you could connect that way. There are some other light-weight
methods you could use for local storage as well using things like the
Windows.Storage namespace (which is similar to Isolated Storage in
.NET).
Morten Nielsen agrees:
You can use HttpClient to download pretty much anything from the web.
Why don't you configure your WCF service to return data as JSON, and
use the DataContractJsonSerializer to deserialize the results?
Also, Tim Heuer cautions:
...Please note that while awesome, the SQLWinRT project on codeplex is a
wrapper to communicate with the classic SQLite engine...which uses
APIs that would not pass store validation currently.
Generic Object Storage Helper for WinRT and WinRT File Based Database seem to have some promise.
But Daniel Stolt raises some good points:
It's awesome that there is good support for building OData clients and
other REST clients - but this only addresses the online scenario. The
"structured" part of Windows.Storage is a very limited model,
essentially limited to name/value pairs, insufficient for all but the
most basic scenarios. Yes there is local file storage, which is great
of course. But forcing every app developer out there to build her own
DBMS on top of local file storage will simply not cut it, especially
with all of System.Data having been removed from the profile. If local
file storage was sufficient for most device apps, then things like
SQLCE would have no purpose today already. And SQLCE clearly has a
purpose, and has played a very important role for occasionally
connected device apps for a very long time. There is also a tremendous
need for synchronization with a server-side database such as SQL
Azure, mostly to be able to roam data between devices. Yes there is
the roaming storage model in WinRT, but it shares the same limitations
of local storage mentioned above, and on top of this is very limited
in capacity (currently 30KB if memory serves). It is simply
insufficient for all but the simplest roaming data needs. Again,
forcing every app developer to design and implement her own
synchronization solution is very bad. You can do much better to enable
developers.
Many people are disappointed that the System.Data namespace is not supported in WinRT.
Richard Bethell said:
I don't even have words for this. This is astonishing. Leave aside for
the moment they want to force you to abstract to middleware for
database connectivity - I don't agree, but I can quasi understand a
rationale for that. I can even see pathways for developing like that.
But no System.Data.... at all? Do you even understand what you've done
to us?
What System.Data can do, outside of just having providers for Sql,
OleDb and other custom providers like Oracle, is provide a rich
abstraction of XML datasets that allow you to very quickly build a
data oriented Service Oriented Architecture.
For instance, I can easily create a web service using SOAP or WCF that
returns DataSets or DataTables, and then consume those objects easily
and directly. Being able to do this allows very rapid construction of
n-tier architectures, even without direct data connections available.
Without System.Data, and the power of DataViews, DataTables, etc. this
gets a lot harder. Sure you can custom create structs, put data in
there, and serve up structs, and use Linq to do whatever sorting,
filtering, etc. you want to do.... but it ends up being twice the
work, and makes code reuse a lot harder. And it means using our
existing service oriented architecture is impossible (without a big
overhaul.)
The withdrawal of System.Data is as big a thing for developers to deal
with as the loss of the Printer object in VB6 to vb.net 1.0 was. What
is harder to understand in this case is why it is necessary -
re-enabling it in the Metro profile can't possibly be a technical
difficulty of the product, can it?
It is valuable enough that I would seriously consider including Mono's
System.Data classes as part of any app I create (which would obviously
have to be open source.)
I think that this is another of those "it depends" questions...
The first and most obvious issue is that whether the disconnected scenario really must be supported depends very much on the context in which the application runs. To take the first case: if the app is an internal corporate app, then quite possibly not; in that case, no database connection simply means the app doesn't work.
Secondly, you could look (one assumes you could look, which may be a bad assumption) at database synchronisation between a local SQL database and the remote one, and so on and so forth.
Taking a step back: yes, you're absolutely right to look at it as being the same as phone or Silverlight development (although I don't know if there is RIA support yet). But the thing is, at this point it's very hard to be prescriptive, because given a general-purpose platform one can write applications to suit all sorts of purposes.
Not a hugely helpful answer really - but a start.
Having read @Jim G's answer, it seems that I should probably withdraw mine?

How to choose server for production release of my Django application?

My company is at the very end of the development process of an (awesome :)) web application. This app will be available as an online service for (hopefully) a significant number of users. This is our biggest Django release so far, and as we prepare to release, some questions about deployment have to be answered.
Q1: how do we determine the required server parameters for a predicted X number of users / Y hits per minute, or some other factor?
Q2: what hosting solution (shared/VPS/dedicated) is worth considering?
Q3: what optimizations should be done first?
I know that this is very subjective and dependent on the size of the site, code quality and other factors, but I'm very interested in your experiences with Django (and not only Django) deployment. Any hints, links or advice are kindly welcome. Thanks in advance.
Which hosting solution you want also depends on how much care of your server you want to take yourself (from upgrades to backups), and you should decide whether you want that responsibility or would rather leave it to someone else.
I think you can only really determine the necessary requirements and bottlenecks in your application through testing under the estimated load! Try to simulate as many requests as you expect. Think about caching (memcached is the best option you have). One great tool is the django-debug-toolbar (http://github.com/robhudson/django-debug-toolbar), which can show you a lot, e.g. how many database hits you have (don't take the timings it shows for granted, but analyse them and keep an eye on the number of hits) and how many templates are rendered.
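For reference, a minimal sketch of enabling the toolbar (the setting names follow the project's README and may vary between Django/toolbar versions):

```python
# settings.py - enabling django-debug-toolbar (sketch)
INSTALLED_APPS += ("debug_toolbar",)
MIDDLEWARE_CLASSES += ("debug_toolbar.middleware.DebugToolbarMiddleware",)
INTERNAL_IPS = ("127.0.0.1",)  # the toolbar only renders for these IPs
```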
If your system grows, you can first of all think about serving your static media files from another location! As for the web server, I have had great experiences using lighttpd instead of the fat Apache, but I guess you have to evaluate that for yourself!
Also take into consideration which database backend to use; in shared environments there is, in most cases, a bigger load on the MySQL servers than on the Postgres ones, but evaluate what works best for you!
You can get some guesses here, but to get a halfway reasonable performance estimate you have to measure the performance of your application yourself. You should then be able to roughly extrapolate the performance on different hardware.
Most of the time the bottleneck is the database; you should get enough RAM to keep it in memory if possible.
"Web application" can encompass so many different things, we can really do no more than guess here.
As for optimization: if it fits your needs, implement some caching (e.g. with memcached); that can give you huge speed improvements.
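A minimal sketch of what that looks like in Django (the backend path is Django's memcached cache backend; the view name is hypothetical):

```python
# settings.py - point Django's cache framework at a local memcached
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        "LOCATION": "127.0.0.1:11211",
    }
}

# views.py - cache an entire view's response for five minutes
from django.views.decorators.cache import cache_page

@cache_page(60 * 5)
def article_list(request):  # hypothetical view
    ...
```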

how to perform profiling for a website?

I currently have a Django site, and it's kind of slow, so I want to understand what's going on. How can I profile it so as to differentiate between:
effect of the network
effect of the hosting I'm using
effect of the JavaScript
effect of the server-side execution (Python code) and SQL access.
any other effect I am not considering due to the massive headache I happen to have tonight.
Of course, for some of them I can use Firebug, but some effects are correlated (e.g. JavaScript could appear slow because it's doing slow network access).
Thanks
Client side:
Check with Firebug which page components take long to load, and how long the browser needs to render the page after loading is complete. If everything loads fast but rendering takes its time, then your HTML/CSS/JS is probably the problem; otherwise it's server side.
Server side (I assume you are on some Unix-like server):
Check the web server with some small static content (a small GIF or a little HTML page), using ApacheBench (ab, part of the Apache web server package) or httperf. The server should be able to answer at least 100 requests per second (of course this depends heavily on the size of your test content, web server type, hardware and other factors, so don't take that 100 too seriously). If that looks good,
test Django with ab or httperf on a "static view" (one that doesn't use a database object). If that's slow, it's a hint that you need more CPU power; check CPU utilization on the server with top. If that's OK, the problem might be in the way the web server executes the Python code.
If serving semi-static content is OK, your problem might be the database, or the app might be I/O-bound. Database problems are a wide field; here is some general advice:
Check I/O throughput with iostat. If you see lots of writes, then you have to get a better disc subsystem (faster RAID, SSD hard drives...) or optimize your application to write less.
If it's lots of reads, the host might not have enough RAM dedicated to the file system buffer, or your database queries might not be optimized.
If I/O looks OK, then the database might not be suited to your workload or might not be correctly configured. Logging slow queries and monitoring database activity, locks, etc. might give you some ideas.
If you let us know what hardware/software you use, I might be able to give more detailed advice.
Edit/PS: I forgot one thing: of course your app might have a bad design and do lots of unnecessary/inefficient things...
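To dig into the "way the web server executes the Python code" part, one cheap option is to wrap a suspect view in the standard library profiler. A sketch (the decorator and view names are made up for illustration):

```python
import cProfile
import io
import pstats

def profiled(view):
    """Print a cProfile report for every call of the wrapped view."""
    def wrapper(request, *args, **kwargs):
        profiler = cProfile.Profile()
        response = profiler.runcall(view, request, *args, **kwargs)
        stream = io.StringIO()
        stats = pstats.Stats(profiler, stream=stream)
        stats.sort_stats("cumulative").print_stats(20)  # top 20 entries
        print(stream.getvalue())  # or write it to a log file instead
        return response
    return wrapper

@profiled
def article_list(request):  # hypothetical slow view
    ...
```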
Take a look at the Django debug toolbar - that'll help you with the server side code (e.g. what database queries ran and how long they took); and is generally a great resource for Django development.
The other, non-Django-specific bits you could profile with YSlow.
There are various tools, but problems like this are not hard to find because they are big.
You have a problem, and when you remove it, you will experience a speedup. Suppose that speedup is some factor, like 2x. That means the program is spending 50% of its time waiting for the slow part. What I do is just stop it a few times and see what it's waiting for. In this case, I would see the problem 50% of the times I stop it.
First I would do this on the client side. If I see that the 50% is spent waiting for the server, then I would try stopping it on the server side. Then if I see it is waiting for SQL queries, I could look at those.
What I'm almost certain to find out is that more work is being requested than is actually needed. It is not usually something esoteric like a "hotspot" or an "algorithm". It is usually something dumb, like doing multiple queries when one would have been sufficient, so as to avoid having to write the code to save the result from the first query.
Here's an example.
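To make the "multiple queries when one would have been sufficient" point concrete, here is a hypothetical Django ORM sketch (the models are invented for illustration):

```python
from django.db import models

class Author(models.Model):  # hypothetical models
    name = models.CharField(max_length=100)

class Article(models.Model):
    author = models.ForeignKey(Author, on_delete=models.CASCADE)

# Slow: one query for the articles, plus one extra query per article
# to fetch its author - the classic N+1 pattern.
for article in Article.objects.all():
    print(article.author.name)

# Fast: select_related() folds the author into one JOINed query,
# so the whole loop issues a single database round trip.
for article in Article.objects.all().select_related("author"):
    print(article.author.name)
```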
First things first: make sure you know which pages are slow. You might be surprised. I recommend django_dumpslow.

Which web framework incurs the least overhead?

I'm playing around with Django on my website hosting service.
I found out that a simple Django page, which has only some static text and is rendered from a very simple template I created, takes significant time to render. Compared to a static HTML page, I am getting a ~2 second difference in load times. Keep in mind this is a simple test of mine with nothing complicated. Also note that my web hosting is on a shared server (not dedicated), so I might be hitting some CPU limitations.
Seems to me that either:
I have some basic CGI/Apache/Django configuration wrong
Django takes significant overhead, at least in this specific scenario.
I find #1 improbable, since I followed my web hosting service's wiki on how to set up Django. So we are left with the overhead problem.
My question is which web framework do you find the best to use in scenarios where the website is hosted on a shared server, and CPU/memory overhead must be kept to minimum?
Edit: seems that my configuration is something I might want to look at, and perhaps later on I'll be opening a question on how to best configure Django.
For now, I would appreciate answers focusing on your experience, in general, with web frameworks, and which of those you found to be the best in terms of performance in the aforementioned scenario.
"I have some basic CGI/Apache/Django configuration wrong"
Correct.
First. The very first time Django returns a page, it takes forever. A lot of initialization happens for the first request.
Second. What specific configuration are you using? We just switched from mod_python to mod_wsgi in daemon mode and are very happy with the performance changes.
Third. What database are you using?
Fourth. What test configuration are you using?
Fifth. What caching parameters and reverse proxy are you using?
Odds are good that you have a lot of degrees of freedom in your configuration.
Edit
The question "which of those you found to be the best in terms of performance" is largely impossible to answer.
See http://wiki.python.org/moin/WebFrameworks
There are dozens of frameworks. Few people can examine more than a few to do head-to-head comparison.
The best possible performance is achieved through static content. A Python app that makes static pages (for instance a collection of Jinja templates) is fastest.
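A sketch of that approach (hypothetical template and output paths): render the templates offline once, so the web server only ever touches static files, with no per-request Python at all.

```python
from jinja2 import Environment, FileSystemLoader

# Render every page ahead of time into plain .html files.
env = Environment(loader=FileSystemLoader("templates"))
template = env.get_template("page.html")  # hypothetical template

for name in ("index", "about", "contact"):
    html = template.render(title=name.capitalize())
    with open(f"static/{name}.html", "w") as out:
        out.write(html)
```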
After that, it's largely impossible to say. Even http://werkzeug.pocoo.org/ involves some processing overheads that may be unacceptable in the above scenario. Python can be slow.
Django, with a modicum of effort, is often fast enough. Serving static content separately from dynamic content, for example, can be a huge speedup.
Since Django does so much automatically, there's a huge victory in not having to write every little administrative page.
I'd say there has to be something funky with your setup there to get such a large performance difference. Try mod_wsgi (if you're not already) and follow the excellent suggestions by the posters above. If Django genuinely was this slow in all cases, there's just no way companies would be able to use it for production applications. It's more than likely not to be Django that is holding the request up. Once you have the .pyc files all sorted (automatically generated bytecode), then the execution should be fairly zippy.
However, if you don't actually need all Django has to offer, then why use it? I'm using it in quite a large production application, and we're not using all of its features… if you're doing something fairly simple, you may want to consider using something like web.py or Werkzeug (or something non-Python-based if you'd rather).
Frameworks like Django or Ruby on Rails grew out of real-world needs; the frameworks turned out as different as the needs that produced them.
Here is my experience:
As a former PHP programmer, I preferred CakePHP for simple stuff and Symfony for more advanced applications. I had a look into Ruby, but the documentation sucked back then. Now I'm using Django, and it works very well for me. In contrast to Symfony, I feel like Django brings less flexibility out of the box, but it's easier to extend.
Another approach would be to use "no framework": CherryPy.
I think the host may be an issue. I do Django development on my localhost (Mac) and it's way better. I like WebFaction for cheap hosting and Amazon ec2 for premium hosting.
The framework is strong and it can handle heavy sites - don't obsess about that stuff. The important thing is to create a clean product; Django can handle it. There are about a thousand steps to take once you see how the application behaves in the wild, but for now, just trust us: you don't need to worry about the inherent speed of the framework before exhausting a whole slew of other parameters, including a dedicated VPS/instance when you need it.
Also, following on your edit - I personally don't think performance is a major issue in programming. Here are the issues in terms of concern:
UI/UX efficiency
UI/UX speed (application caching)
Well designed models/views
Optimization of the system (n-tier architecture, etc...)
Optimization of the process (good QA to reduce failures/bottlenecks from deployment)
Optimization of the subsystems (database, etc...)
Hardware
Framework internal optimization
Don't waste time with comparing framework speeds. Their advantage is in extensible code, smart architectures, etc...
On a side note, DO NOT build a new web application without a framework. I'm sorry I can't say it loudly enough, but it's an absolute requirement nowadays. It's not even a debate about whether to use one, just which one to use.
I personally chose Django, which is great. But I can't definitively knock the others out there.
It's possibly both. Django does have stuff for caching built-in, which would be worth trying. Regardless, any non-cached page will nearly always take longer than a static file. A file has to be read in both cases, and in the case of a dynamic page, it also must be executed. And then, in both cases, sent over to the client.
Definitely, shared hosting is not the best choice for running heavy frameworks such as Django or CakePHP. If you can afford it, buy a VPS.
As for performance: your host probably runs Python with mod_python, which is not recommended now. WSGI is the preferred standard for Python-powered web apps.
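For the curious, WSGI itself is a tiny interface. A self-contained sketch of a WSGI application (the same callable shape a Django project exposes to mod_wsgi), runnable here with the standard library's reference server:

```python
def application(environ, start_response):
    # The WSGI contract: receive the request environment, call
    # start_response with status and headers, return the body chunks.
    body = b"Hello from WSGI\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

if __name__ == "__main__":
    # Serve it with the reference server from the standard library.
    from wsgiref.simple_server import make_server
    make_server("127.0.0.1", 8000, application).serve_forever()
```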