Why bother using Apache or Nginx, etc? - c++

I've been assigned a project which requires me to add some HTML page serving. This embedded system (running Linux CentOS 6.3) has some extra juice available, but also already has numerous responsibilities.
I considered Apache but tossed it due to bloat, I looked into Nginx but am now shying from that too. It just seems that I'm getting way more 'functionality' and as a result, more CPU usage than I need.
Can someone enlighten me as to why I wouldn't just implement the HTTP protocol myself using async sockets?
My specific needs are:
Receive and decode GETs and POSTs.
Send CSS, JS and JPG files as requested.
Output header, cookie, head and body data based upon the decode of the GETs/POSTs.
Given that I don't need the myriad things these webservers offer, am I being naive in assuming this course of doing it myself? What would you suggest or warn against?

Basically, you use a web server because then you get the functionality you want in a form that's already been tested, is more reliable than your first code is likely to be, and is supported by a large community of others. If Apache and nginx are too heavyweight for you (although nginx is pretty much characterized by how lightweight it is for heavy loads) and especially if the load you expect is very light, then look around for other options.
Wiki has a whole page of comparisons of lightweight web servers.

An easy trap to fall into: thinking "I don't need all the functionality in Product X, I'll just write my own with just the functionality that I need" only to end up reimplementing Product X entirely, one newly-discovered requirement at a time.
I sort of doubt that an embedded system that can run CentOS okay is so resource-starved that it can't run Nginx comfortably (or even Apache, which people run on the Raspberry Pi just fine with appropriate configuration tweaks), given reasonable assumptions about how many pages you are actually serving. I ran it on a Pentium 266 with something like 256MB of RAM serving a few simple PHP apps that served roughly a page every two seconds, with no issues. As I recall, it's fairly modular, so you can just choose not to load the functionality you don't think you need. And, later, when your requirements change and you find out you do need it, you can just plug it back in :)
If you are really and truly concerned about resource consumption, look into web servers designed for embedded applications. I hear Cherokee is quite nice. Mongoose looks promising as well.

Go further you can, I began with this http://www.w3.org/Protocols/HTTP/HTTP2.html

Related

Is it an efficient method to use a framework to code a single interactive webpage in python?

This is an open source contributor project for Wikidata's Chronic Pain project.
I would like to create a webpage that :
Have inputboxes where the user select several wikipedia page titles (with suggestions)
Can also take these parameters via the URL
Get items metadata from Wikidata.
Makes a SPARQL request to gather scholarly articles.
Render data from Wikidata and Wikipedia, linking to various wiki pages.
The webpage will be hosted on Wikimedia fundation server. I have access to a linux container as well as a Jupyter Notebook (not sure this one is suitable for this project). It has to be coded in Python 3 since I will use Pywikibot framework to interact with Wikidata.
I'm new to programming so that I don't really know what is the best approach. I heard that it was difficult to code webpages in Python without using a framework like Django. However this page is very simple so that it may not be the most efficient to deploy Django for this ?
NB : your question is bordering on "primarily opinion based" (which doesn't mean it's a bad question by itself but that answers might be more, well, opinions than hard facts).
This being said, "a single interactive page" doesn't mean the server code behind is just loading a static html file and sending it to the client. For example, the main UI part of our product is, technically speaking, "a single interactive page", but this "single" page is full react app and is backed by a dedicated API with a dozen entry points, which the dispatch to a whole load of backend code including database access, celery tasks etc. It would of course be technically possible to code all this with only pure wsgi or even plain old cgi code, but well, it could also be possible to write it directly in C or even assembly and no one would ever consider this a viable solution.
To make a long story short: do not even waste your time trying to code this project with plain wsgi (and let's not talk cgi), you will end up reinventing the squared wheel and everyone will hate you for this (stakeholders because you'll never deliver a robust, working product in due time and budget, and other devs because they'll now have to port the whole darn thing to a stable, mature and maintained framework). Now if Django appears to be overkill for this project there are much lighter frameworks like flask. Actually both are the "industry standard" and safe choices.

What are the gotchas with ColdFusion?

Background:
I have a new site in the design phase and am considering using ColdFusion. The Server is currently set-up with ColdFusion and Python (done for me).
It is my choice on what to use and ColdFusion seems intriguing with the tag concept. Having developed sites in PHP and Python the idea of using a new tool seems fun but I want to make sure it is as easy to use as my other two choices with things like URL beautification and scalability.
Are there any common problems with using ColdFusion in regards to scalability and speed of development?
My other choice is to use Python with WebPy or Django.
ColdFusion 9 with a good framework like Sean Cornfeld's FW/1 has plenty of performance and all the functionality of any modern web server development language. It has some great integration features like exchange server support and excel / pdf support out of the box.
Like all tools it may or may not be the right one for you but the gotchas in terms of scalability will usually be with your code, rarely the platform.
Liberally use memcached or the built in ehache in CF9, be smart about your data access strategy, intelligently chunk returned data and you will be fine performance wise.
My approach with CF lately involves using jQuery extensively for client side logic and using CF for the initial page setup and ajax calls to fill tables. That dramatically cuts down on CF specific code and forces nice logic separation. Plus it cuts the dependency on any one platform (aside from the excellent jQuery library).
To specifically answer your question, if you read the [coldfusion] tags here you will see questions are rarely on speed or scalability, it scales fine. A lot of the questions seem to be on places where CF is a fairly thin layer on another tool like Apache Axis (web services) and ExtJs (cfajax) - neither of which you need to use. You will probably need mod-rewrite or IIS rewrite to hide .cfm
Since you have both ColdFusion and Python available to you already, I would carefully consider exactly what it is you're trying to accomplish.
Do you need a gradual learning curve, newbie-friendly language (easy for someone who knows HTML to learn), great documentation, and lots of features that make normally difficult tasks easy? That sounds like a job for ColdFusion.
That said, once you get the basics of ColdFusion down, it's easy to transition into an Object Oriented approach (as others have noted, there are a plethora of MVC frameworks available: FW/1, ColdBox, Fusebox, Model-Glue, Mach-ii, Lightfront, and the list goes on...), and there are also dependency management (DI/IoC) frameworks (my favorite of which is ColdSpring, modeled after Java's Spring framework), and the ability to do Aspect-Oriented Programming, as well. Lastly, there are also several ORM frameworks (Transfer, Reactor, and DataFaucet, if you're using CF8 or earlier, or add Hibernate to the list in CF9+).
ColdFusion also plays nicely with just about everything else out there. It can load and use .Net assemblies, provides native access to Java classes, and makes creating and/or consuming web services (particularly SOAP, but REST is possible) a piece of cake. (I think it even does com/corba, if you feel like using tech from 1991...)
Unfortunately, I've got no experience with Python, so I can't speak to its strengths. Perhaps a Python developer can shed some light there.
As for url rewrting, (again, as others have noted) that's not really done in the language (though you can fudge it); to get a really nice looking URL you really need either mod_rewrite (which can be done without .htaccess, instead the rules would go into your Apache VHosts config file), or with one of the IIS URL Rewriting products.
The "fudging" I alluded to would be a url like: http://example.com/index.cfm/section/action/?search=foo -- the ".cfm" is in the URL so that the request gets handed from the web server (Apache/IIS) to the Application Server (ColdFusion). To get rid of the ".cfm" in the URL, you really do have to use a URL rewriting tool; there's no way around it.
From two years working with CF, for me the biggest gotchas are:
If you're mainly coding using tags (rather than CFScript) and formatting for readability, be prepared for your output to be filled with whitespace. Unlike other scripting languages, the whitespace between statements are actually sent to the client - so if you're looping over something 100 times and outputting the result, all the linebreaks and tabs in the loop source code will appear 100 times. There are ways around this but it's been a while - I'm sure someone on SO has asked the question before, so a quick search will give you your solution.
Related to the whitespace problem, if you're writing a script to be used with AJAX or Flash and you're trying to send xml; even a single space before the DTD can break some of the more fussy parsing engines (jQuery used to fall over like this - I don't know if it still does and flash was a nightmare). When I first did this I spent hours trying to figure out why what looked like well formed XML was causing my script to die.
The later versions aren't so bad, but I was also working on legacy systems where even quite basic functionality was lacking. Quite often you'll find you need to go hunting for a COM or Java library to do the job for you. Again, though, this is in the earlier versions.
CFAJAX was a heavy, cumbersome beast last time I checked - so don't bother, roll your own.
Other than that, I found CF to be a fun language to work with - it has its idiosyncracies like everything else, but by and large it was mostly headache free and fast to work with.
Hope this helps :)
Cheers
Iain
EDIT: Oh, and for reasons best known to Adobe, if you're running the trial version you'll get a lovely fat HTML comment before all of your output - regardless of whether or not you're actually outputting HTML. And yes, because the comment appears before your DTD, be prepared for some browsers (not looking at any one in particular!) to render it like crap. Again - perhaps they've rethought this in the new version...
EDIT#2: You also mentioned URL Rewriting - where I used to work we did this all the time - no problems. If you're running on Apache, use mod_rewrite, if you're running on IIS buy ISAPI Rewrite 3.
do yourself the favor and check out the CFWheels project. it has the url rewriting support and routes that you're looking for. also as a full stack mvc framework, it comes with it's own orm.
It's been a few years, so my information may be a little out of date, but in my experience:
Pros:
Coldfusion is easy to learn, and quick to get something up and running end-to-end.
Cons:
As with many server-side scripting languages, there is no real separation between persistence logic, business logic, and presentation. All of these are typically interwoven throughout a typical Coldfusion source file. This can mean a lot more work if you want to make changes to the database schema of a mature application, for example.
There are some disciplines that can be followed to make things a little more maintainable; "Fusebox" was one. There may be others.

how to perform profiling for a website?

I currently have a django site, and it's kind of slow, so I want to understand what's going on. How can I profile it so to differentiate between:
effect of the network
effect of the hosting I'm using
effect of the javascript
effect of the server side execution (python code) and sql access.
any other effect I am not considering due to the massive headache I happen to have tonight.
Of course, for some of them I can use firebug, but some effects are correlated (e.g. javascript could appear slow because it's doing slow network access)
Thanks
client side:
check with firebug if/which page components take long to load, and how long the browser needs to render the page after loading is completed. If everything is fast but rendering takes its time, then probably your html/css/js is the problem, otherwise it's server side.
server side (i assume you sit on some unix-alike server):
check the web server with a small static content (a small gif or a little html page), using apache bench (ab, part of the apache webserver package) or httperf, the server should be able to answerat least 100 requests per second (of course this depends heavily on the size of your test content, webserver type, hardware and other stuff, so dont take that 100 to seriously). if that looks good,
test django with ab or httperf on a "static view" (one that doesnt use a database object), if thats slow it's a hint that you need more cpu power. check cpu utilization on the server with top. if thats ok, the problem might be in the way the web server executes the python code
if serving semi-static content is ok, your problem might be the database or IO-bound. Database problems are a wide field, here is some general advice:
check i/o throughput with iostat. if you see lot's of writes then you have get a better disc subsystem, faster raid, SSD hard drives .. or optimize your application to write less.
if its lots of reads, the host might not have enough ram dedicated as file system buffer, or your database queries might not be optimized
if i/o looks ok, then the database might be not be suited for your workload or not correctly configured. logging slow queries and monitoring database activity, locks etc might give you some idea
if you let us know what hardware/software you use i might be able to give more detailed advice
edit/PS: forgot one thing: of course your app might have a bad design and does lots of unnecessary/inefficient things ...
Take a look at the Django debug toolbar - that'll help you with the server side code (e.g. what database queries ran and how long they took); and is generally a great resource for Django development.
The other non-Django specific bits you could profile with yslow.
There are various tools, but problems like this are not hard to find because they are big.
You have a problem, and when you remove it, you will experience a speedup. Suppose that speedup is some factor, like 2x. That means the program is spending 50% of its time waiting for the slow part. What I do is just stop it a few times and see what it's waiting for. In this case, I would see the problem 50% of the times I stop it.
First I would do this on the client side. If I see that the 50% is spent waiting for the server, then I would try stopping it on the server side. Then if I see it is waiting for SQL queries, I could look at those.
What I'm almost certain to find out is that more work is being requested than is actually needed. It is not usually something esoteric like a "hotspot" or an "algorithm". It is usually something dumb, like doing multiple queries when one would have been sufficient, so as to avoid having to write the code to save the result from the first query.
Here's an example.
First things first; make sure you know which pages are slow. You might be surprised. I recommend django_dumpslow.

Web 2.0 and dial-up: how make it as painless as possible?

I'm trying to put a workable plan together for a charity that could really make good use of a forum and a wiki, but a crucial part of its operations happen in parts of the world where dial-up connection dominates and probably will continue to do so for the foreseeable future.
This site was recommended as one that behaves well even on a dial-up connection, so I thought I'd ask for some help here!
The site I want to hook this on to is using Drupal. Anyone out there with experiences like this who could maybe help?
Behaving well on dial-up involves sitting down and optimizing your HTML, CSS, and images to be as small as possible, and then ensuring that your server is sending sane HTTP headers for caching. Make sure your CSS stylesheet is external, and shared across all pages. If dial-up is a major issue, you'll want to stick to a single stylesheet if possible. Avoid JavaScript, because those computers usually don't have the processing power for it either. If you must use JavaScript, jQuery is extremely small and very fast and highly recommended, but I suspect that for most content-oriented websites, it won't be necessary.
To be honest, if you produce valid XHTML/HTML5, valid CSS, and you follow all of the usual best practices for standards-based web design (no table layouts, semantic markup, etc), dial-up really won't be an issue. It'll just work.
To tweak the maximum performance out of your site you might want to install this and use it on your site when you are done with the initial development- ySlow - this will analyse your pages and highlight all the areas you can improve. It's really a great tool for optimising site download speeds.
You should be able to accomplish this, but to be honest you are going to loose a lot in the way of user experience by creating a dial-up friendly site. It basically means you have to do the following to optimize for the experience:
Keep JS to a minimum
Make sure the JS is minified.
Reduce large image requirements w/ CSS and some optimal planning of layout
Make sure caching is enabled in the headers so that new files only get downloaded when nessisary.
If you do all this, you should have a site that is acceptable on dialup.
There are already some hints on how to keep page sizes and load times down.
To complement this, you could use a software that simulates limited bandwith. This helps you test the speed of your site on dialup.
There are several available (just google "simulate dialup").
Sloppy e.g. seems quite usable.
You could also do what Google does for Gmail, i.e. provide 2 versions of your view, one for slow connections that uses plain old HTML, and one for faster connections. You could make the default one the slow one, but provide a link to enable the faster one.
Gmail also has a built-in mechanism that detects when you load the page whether it's going fast or not and will automatically revert to the plain HTML view if it's too slow, which is another fancier alternative.
Your main goal should be minimum page size (keep only HTML in pages, all styling information should be externalized in css files for caching, same for JavaScript in js files) and minimum round trips to server (full requests and post backs). Contrary to popular belief a JS heavy site could work like a charm if you perform a lot of heavy duty client side and keep the server roundtrips clean with the minimum amount of data needed (think JQuery and AJAX here with small partial renderings).
P.S. If u'r using .NET throw ViewState away.

Which web framework incurs the least overhead?

I'm playing around with Django on my website hosting service.
I found out that a simple Django page, which has only some static text, and is rendered from a very simple template I created takes a significant time to render. When compared to a static HTML page, I am getting ~2 seconds difference in the load times. Keep in mind this is a simple test of mine with nothing complicated. Also note that my web hosting is on a shared server (not dedicated), so I might be hitting some CPU limitations.
Seems to me that either:
I have some basic CGI/Apache/Django configuration wrong
Django takes significant overhead, at least in this specific scenario.
I find #1 not probable since I followed my web hosting service wiki on how to set up Django. So we are left with the overhead problem.
My question is which web framework do you find the best to use in scenarios where the website is hosted on a shared server, and CPU/memory overhead must be kept to minimum?
Edit: seems that my configuration is something I might want to look at, and perhaps later on I'll be opening a question on how to best configure Django.
For now, I would appreciate answers focusing on your experience, in general, with web frameworks, and which of those you found to be the best in terms of performance in the aforementioned scenario.
"I have some basic CGI/Apache/Django configuration wrong"
Correct.
First. The very first time Django returns a page, it takes forever. A lot of initialization happens for the first request.
Second. What specific configuration are you using. We just switched from mod_python to mod_wsgi in daemon mode and are very happy with the performance changes.
Third. What database are you using?
Fourth. What test configuration are you using?
Fifth. What caching parameters and reverse proxy are you using?
Odds are good that you have a lot of degrees of freedom in your configuration.
Edit
The question "which of those you found to be the best in terms of performance" is largely impossible to answer.
See http://wiki.python.org/moin/WebFrameworks
There are dozens of frameworks. Few people can examine more than a few to do head-to-head comparison.
The best possible performance is achieved through static content. A Python app that makes static pages (for instance a collection of Jinja templates) is fastest.
After that, it's largely impossible to say. Even http://werkzeug.pocoo.org/ involves some processing overheads that may be unacceptable in the above scenario. Python can be slow.
Django, with a modicum of effort, is often fast enough. Serving static content separately from dynamic content, for example, can be a huge speedup.
Since Django does so much automatically, there's a huge victory in not having to write every little administrative page.
I'd say there has to be something funky with your setup there to get such a large performance difference. Try mod_wsgi (if you're not already) and follow the excellent suggestions by the posters above. If Django genuinely was this slow in all cases, there's just no way companies would be able to use it for production applications. It's more than likely not to be Django that is holding the request up. Once you have the .pyc files all sorted (automatically generated bytecode), then the execution should be fairly zippy.
However, if you don't actually need all Django has to offer, then why use it? I'm using it in quite a large production application, and we're not using all of its features… if you're doing something fairly simple, you may want to consider using something like web.py or Werkzeug (or something non-Python-based if you'd rather).
Frameworks like Django or Ruby on Rails grew out of real world needs. As different as these needs were, as different they turned out.
Here is my Experience:
As a former PHP programmer, I prefered CakePHP for simple stuff and Symfony for more advanced applications. I had a look into Ruby, but the documentation sucked back then. Now I'm using Django. Django works very well for me. In contrast to Symfony I feel like Django brings less flexibility out of the Box, but its easier to extend.
Another approach would be to use 'no framework' CherryPy
I think the host may be an issue. I do Django development on my localhost (Mac) and it's way better. I like WebFaction for cheap hosting and Amazon ec2 for premium hosting.
The framework is strong and it can handle heavy sites - don't obsess about that stuff. The important thing is to create a clean product, Django can handle it. There are about a thousand steps to take when you see how the application handles in the wild, but for now, just trust us that you don't need to worry about the inherent speed of the framework before exhausting a whole slew of parameters including a dedicated VPS/instance when you need it.
Also, following on your edit - I personally don't think performance is a major issue in programming. Here are the issues in terms of concern:
UI/UX efficiency
UI/UX speed (application caching)
Well designed models/views
Optimization of the system (n-tier architecture, etc...)
Optimization of the process (good QA to reduce failures/bottlenecks from deployment)
Optimization of the subsystems (database, etc...)
Hardware
Framework internal optimization
Don't waste time with comparing framework speeds. Their advantage is in extensible code, smart architectures, etc...
On a side note, DO NOT NOT USE A FRAMEWORK FOR A NEW WEB APPLICATION. I'm sorry I can't say it loud enough, but it's an absolute requirement nowadays. It's not even a debate about not using one, just which one to use.
I personally chose Django, which is great. But I can't definitively knock the others out there.
It's possibly both. Django does have stuff for caching built-in, which would be worth trying. Regardless, any non-cached page will nearly always take longer than a static file. A file has to be read in both cases, and in the case of a dynamic page, it also must be executed. And then, in both cases, sent over to the client.
Definetely shared hosting is not the best choice to run heavy frameworks such as Django or CakePHP. If you can afford it, buy VPS.
As for performance, probably your host uses Python with mod-python, which is not recommended now. WSGI is preferred standard for Python powered webapps.