node.js concurrency - concurrency

I'm new in node.js. I'm testing socket.io for real time messaging. I love it and I want to use. I have a question. How many concurrency can run in Node.js server ? Our program will be approximately 100 concurrency. So, I'm worry about that.
I found another real time messaging server , APE. Which one is better ? I love node.js because it's easy to learn and easy to write. But I couldn't find discussion about concurrency in node.js server. My friend company is using APE and it can control around 2000. So, I want to know about node.js server.

Without having any benchmarks to back this up--since both are event-driven (i.e. epoll on Linux), I would imagine that you'll see comparable results for both (at least 10K concurrent users). That being said, performance is likely to be much more affected by the frequency of messages than the number of concurrent connections, since that's where the implementations really differ.
For a real-world example and discussion about node.js Comet performance, see Amir Salihefendic's excellent blog post here: http://amix.dk/blog/post/19577 (you can follow the links in that post to other posts that are fantastic as well).
Notice that one of the versions he wrote was in C using libevent (epoll) which is what APE uses as well. Also, note that APE's website claims that it can handle over 100,000 concurrent users.
If you really want to understand the problems associated, you might find the famous "C10K problem" article interesting (do a google search for "C10K problem").
In the end, it probably comes down to how many requests per second you expect, and how many machines you have, and which language you prefer to code in. If you're only expecting around 100 concurrent users, I think you'll be just fine using any platform you want. That being said, I highly recommend using node.js--just for sheer enjoyment if nothing else. :-)

Related

Client / server reactive sync in Clojure / Clojurescript

Is there an idiomatic way to do reactive data synchronization between browser and server with Clojure and Clojurescript? What are the pros and cons of one technique vs another?
Having used Meteor.js in the past this sort of reactive database sync is highly preferable to manually writing routes and polling for updates. A pub/sub system lets web developers write less boilerplate code to move data around. Clojure seems like it would be a natural fit for such a technique. I have been unable to determine if this is a solved problem in the clj/cljs ecosystem.
The short answer is "no".
I've used Meteor.js in production, and also looked for the same when getting into CLJ(S). The closest I'm aware of are:
Datsync
Uses Datomic and Datoms as the sync protocol
Seems to be alpha
Fulcro
Has help for optimistic updates, but not direct data sync like Meteor has with Minimongo, closer to Meteor methods (ie a client and server implementation of the same call)
Production-ready, widely used
Why hasn't a Meteor-killer hasn't been written in CLJ(S)?
It's hard to do (at all, performantly)
The company behind Meteor raised tens of millions of dollars over the last decade to work in this space and still didn't resolve all the scalability or semantic rough edges in the data sync design
The datastores Clojure people tend to use (Datomic or relational) likely make data sync even harder, and certainly quite different (it wouldn't be a port from Meteor's implementation).
Clojure developers tend to prefer assembling a system around the needs of a particular problem/solution/requirement rather than assembling a solution around a particular feature/framework (eg Mongo documents that sync full-stack).
People are still thinking about this though, both inside the Clojure community and out. Check out:
Baqend (and their blog comparing with Meteor.js, Firebase, RechinkDB)
"The Web After Tomorrow" (by the developer of DataScript)
Reactive Datalog
The only meaningful way to compare is to consider them both in your context. I remember giving up Meteor's magic felt significant at the time but haven't found myself wanting to go back. I see it as trading magic for simplicity and flexibility.

Will web development in c++ cgi really a huge performance gain?

I'm asking the question after reading this article
http://stevehanov.ca/blog/index.php?id=95
Also isn't it a penalty to use cgi instead of fastcgi ?
Update: why some people do pretend like in answer "that you get 20-30% performance improvement" ? Is it pure guess or is this number coming from solid benchmark ? I have looked at HipHop performance is more in the scale of 10 times.
I've done webdev in a few languages and frameworks, including python, php, and perl. I host them myself and my biggest sites get around 20k hits a day.
Any language and framework that has reasonable speed can be scaled up to take 20k hits a day just by throwing resources at it. Some take more resources than others. (Plone, Joomla. I'm looking at you).
My Witty sites (none in production yet) take a lot more (from memory around 5000% more) pounding (using seige) than for example my python sites. Ie. When I hit them as hard as I can with seige, the witty sites serve a lot more pages per second.
I know it's not a true general test though.
Other speed advantages that witty gives you:
Multi threading
If you deploy with the built in websrever (behind ha-proxy for example) and have your app be multi-threaded .. it'll load a lot less memory than say a perl or php app.
Generally with php and perl apps, you'll have Apache fire up a process for each incoming connection, and each process loads the whole php interpreter, all the code and variables and objects and what not. With heavy frameworks like Joomla and Wordpress (depending on the number of plugins), each process can get pretyy humungous on memory consumption.
With the Wt app, each session loads a WApplication instance (a C++ object) and it's whole tree of widgets and stuff. But the memory the code uses stays the same, no matter how many connections.
The inbuilt Web2.0 ness
Generally with traditional apps, they're still built around the old 'http request comes in' .. 'we serve a page' .. 'done' style of things. I know they are adding more and more AJAXy kind of thigns all the time.
With Wt, it defaults to using WebSockets where possible, to only update the part of the page that needs updating. It falls back to standard AJAX, then if that's not supported http requests. With the AJAX and WebSockets enabled clients, the same WApplication C++ object is continually used .. so no speed is lost in setting up a new session and all that.
In response to the 'C++ is too hard for webdev'
C++ does have a bit of a learning curve. In the mid nineties we did websites in Java j2ee. That was considered commercially viable back then, and was a super duper pain to develop in, but it did have a good advantage of encouraging good documentation and coding practices.
With scripting websites, it's easy to take shortcuts and not realize they're there. For example one 8 year old perl site I worked on had some code duplicated and nobody noticed. Each time it showed a list of products, it was running the same SQL query twice.
With a C++ site, I think it'd have less chance because, in the perl site, there wasn't that much programming structure (like functions), it was just perl and embedded html. In C++ you'd likely have methods with names and end up with a name clash.
Types
One time, there was a method that took an int identifier, later on we changed it to a uuid string. The Python code was great, we didn't think we needed to change it; it ran fine. However there was little line buried deep down that had a different effect when you passed it a string. Very hard to track down bug, corrupted the database. (Luckily only on dev and test machines).
C++ would have certainly complained a lot, and forced us to re-write the functions involved and not be lazy buggers.
With C++ and Java, the compiler errors and warns a lot of those sorts of mistakes for you.
I find unit testing is generally not as completely necessary with C++ apps (don't shoot me), compared to scripting language apps. This is due to the language enforcing a lot of stuff that you'd normally put in a unit test for say a python app.
Summary
From my experience so far .. Wt does take longer to develop stuff in than existing frameworks .. mainly because the existing frameworks have a lot more out of the box stuff there. However it is easier to make extremely customized apps in Wt than say Wordpress imho.
From people I've spoken with who've moved from PHP to Wt (a C++ web framework) reported significant improvements. From the small applications I've created using Wt to learn it, I've seen it run faster than the same PHP type applications I created. Take the information for what you will, but I'm sold.
This reminds me how 20-30 years ago people were putting Assembly vs C, and then 10-20 years ago C vs C++. Of course C++ will be faster than PHP/Rails but it'll take 5x more effort to build maintainable and scalable application.
The point is that you get 20-30% performance improvement while sacrificing your development resources. Would you rather have you app work 30% faster or have 1/2 of the features implemented?
Most web applications are network-bound instead of processor-bound. Writing your application in C++ instead of a higher-level language doesn't make much sense unless you're doing really heavy computation. Also, writing correct C++ programs is difficult. It will take longer to write the application and it is more likely that the program will fail in spectacular ways due to misused pointers, memory errors, undefined behavior, etc. In general, I would say it is not worth it.
Whenever you eliminate a layer of interpretive or OS abstraction, you are bound to get some performance gain. That being said, the language or technology itself does not automatically mean all your problems are solved. I've fixed C++ code that took many hours to process a relatively simple set of records. The problem was in the implementation, and the fix was not related to the language's features or limitations.
Assuming things are all implemented correctly, you're sure to get better performance. The problem will be in finding the bugs. One of the problems with C++ is that many developers are currently "trained" or accustomed to having a lot of details related to memory management behind objects. This eliminates the need to consider things like, "What can happen if I pass this pointer around to several threads?" Sometimes it works well, but not always. You still have some subtleties of the language that you need to consider regardless of how the objects hide the nasty details.
In my experience, you'll need several seasoned C++ developers watching over the code to be able to keep the bugs and memory leaks from getting out of hand.
I'm certainly not sold on this. If you want a performance gain over PHP why not use a Java (or better yet Scala) framework? These are much better for web development, have nice, relatively easy to use frameworks and avoid a lot of the headaches of C++. I've always seen one of the main pluses of web-development (and most modern non-scientific/high performance applications) as being able to avoid the headaches that come along with C/C++ development.

C++ Server-Side-Scripting

For once, I have come across a lot of stuff about the use of C++ being not advisable for SSS and recommending the use of so called interpreted languages like PERL and PHP for the same. But I need the advanced OO features and flexibility of C++ to ensure a scalable and more manageable code.
I have tried many internet articles and searches and none where helpful to the point that I still have no idea if it is possible to write SS-Scripts in C++ and if yes, then how.
I have thought of couple ideas, including writing a web-server in C++ and responding accordingly after parsing the HTTP request. But it would be re-inventing the wheel and I'll end up deviating from my main project and dedicating a lot of work to ensure a functional-cum-secure HTTP server.
I have also considered PHP extensions but again the approach also comes with its own baggage and overhead.
My questions are:
Is it possible to program SSS in C++?
If yes, then what are the approaches at my disposal.
Thanks!
Ignoring, for the moment, the advisability of using C++ for SSS, your first choice would probably be Wt. Contrary to the implications in some of the other answers, no development time is not likely to increase by 10x (or anywhere close to it). No, you're not missing all the nice infrastructure features you'd expect in things like PHP, Perl or Python either.
In fact, my own experience is rather the opposite: while PHP (for example) makes it pretty easy to get a web site up and running fairly quickly, producing a web site that's really stable, secure, and responsive is a whole different story. With Wt, rather the opposite seems to be the case (at least in my, admittedly limited, experience). Getting the initial site up and running will probably take a little longer -- but about as soon as it looks, acts, and feels the way you want, it's likely to need only rather minor tweaks to be ready for public use.
Getting back to the advisability question: developing in C++ may be a bit more complex than in some languages that are more common in the SSS market -- but it's still a piece of cake compared to doing security well. If somebody has even the slightest difficulty writing C++ (e.g., tracking and freeing memory when it's no longer needed), I definitely don't want them getting close to the code for my web site.
I wouldn't recommend it, but you can certainly write CGI scripts in C++ (or in C, or in FORTRAN). But why bother? Languages like PHP do a much better job more easily, and they seem to scale well for some pretty major sites.
CGI is the "standard" way to have C or C++ code handling web requests, but you might also look into writing a module that gets linked into the web server at runtime. Google for "apache module API" (if using Apache) or "IIS module" (if using IIS).
Can you afford 10x as much development time? All the infrastructure-ish bits that you take for granted in php, perl, python are non existent or much harder to use in C++.
I see only two valid reasons to do this:
1. You only have C++ on your platform.
2. The server really has very high performance needs that would benefit from problem specific optimizations.
You can write a CGI application in C++ using an appropriate framework (like this one). But I'd recommend just going with perl or php. It will save you much time. Those tools are just better suited for this kind of job.
EDIT: corrected the link
I couldn't understand your exact requirements (license, etc) but this might be what you are looking for http://cppcms.sourceforge.net.

Node.js or Erlang

I really like these tools when it comes to the concurrency level it can handle.
Erlang/OTP looks like much more stable solution but requires much more learning and a lot of diving into functional language paradigm. And it looks like Erlang/OTP makes it much better when it comes to multi-core CPUs (correct me if I am wrong).
But which should I choose? Which one is better in the short and long term perspective?
My goal is to learn a tool which makes scaling my Web projects under high load easier than traditional languages.
I would give Erlang a try. Even though it will be a steeper learning curve, you will get more out of it since you will be learning a functional programming language. Also, since Erlang is specifically designed to create reliable, highly concurrent systems, you will learn plenty about creating highly scalable services at the same time.
I can't speak for Erlang, but a few things that haven't been mentioned about node:
Node uses Google's V8 engine to actually compile javascript into machine code. So node is actually pretty fast. So that's on top of the speed benefits offered by event-driven programming and non-blocking io.
Node has a pretty active community. Hop onto their IRC group on freenode and you'll see what I mean
I've noticed the above comments push Erlang on the basis that it will be useful to learn a functional programming language. While I agree it's important to expand your skillset and get one of those under your belt, you shouldn't base a project on the fact that you want to learn a new programming style
On the other hand, Javascript is already in a paradigm you feel comfortable writing in! Plus it's javascript, so when you write client side code it will look and feel consistent.
node's community has already pumped out tons of modules! There are modules for redis, mongodb, couch, and what have you. Another good module to look into is Express (think Sinatra for node)
Check out the video on yahoo's blog by Ryan Dahl, the guy who actually wrote node. I think that will help give you a better idea where node is at, and where it's going.
Keep in mind that node still is in late development stages, and so has been undergoing quite a few changes—changes that have broke earlier code. However, supposedly it's at a point where you can expect the API not to change too much more. So if you're looking for something fun, I'd say node is a great choice.
I'm a long-time Erlang programmer, and this question prompted me to take a look at node.js. It looks pretty damn good.
It does appear that you need to spawn multiple processes to take advantage of multiple cores. I can't see anything about setting processor affinity though. You could use taskset on linux, but it probably should be parametrized and set in the program.
I also noticed that the platform support might be a little weaker. Specifically, it looks like you would need to run under Cygwin for Windows support.
Looks good though.
Edit
Node.js now has native support for Windows.
I'm looking at the same two alternatives you are, gotts, for multiple projects.
So far, the best razor I've come up with to decide between them for a given project is whether I need to use Javascript. One existing system I'm looking to migrate is already written in Javascript, so its next version is likely to be done in node.js. Other projects will be done in some Erlang web framework because there is no existing code base to migrate.
Another consideration is that Erlang scales well beyond just multiple cores, it can scale to a whole datacenter. I don't see a built-in mechanism in node.js that lets me send another JS process a message without caring which machine it is on, but that's built right into Erlang at the lowest levels. If your problem isn't big enough to need multiple machines or if it doesn't require multiple cooperating processes, this advantage isn't likely to matter, so you should ignore it.
Erlang is indeed a deep pool to dive into. I would suggest writing a standalone functional program first before you start building web apps. An even easier first step, since you seem comfortable with Javascript, is to try programming JS in a more functional style. If you use jQuery or Prototype, you've already started down this path. Try bouncing between pure functional programming in Erlang or one of its kin (Haskell, F#, Scala...) and functional JS.
Once you're comfortable with functional programming, seek out one of the many Erlang web frameworks; you probably shouldn't be writing your app directly to something low-level like inets at this late stage. Look at something like Nitrogen, for instance.
While I'd personally go for Erlang, I'll admit that I'm a little biased against JavaScript. My advice is that you evaluate few points:
Are you reusing existing code in either of those languages (both in terms of source code, and programmer experience!)
Do you need/want on-the-fly updates without stopping the application (This is where Erlang wins by default - its runtime was designed for that case, and OTP contains all the tools necessary)
How big is the expected traffic, in terms of separate, concurrent operations, not bandwidth?
How "parallel" are the operations you do for each request?
Erlang has really fine-tuned concurrency & network-transparent parallel distributed system. Depending on what exactly is the project, the availability of a mature implementation of such system might outweigh any issues regarding learning a new language. There are also two other languages that work on Erlang VM which you can use, the Ruby/Python-like Reia and Lisp-Flavored Erlang.
Yet another option is to use both, especially with Erlang being used as kind of "hub". I'm unsure if Node.js has Foreign Function Interface system, but if it has, Erlang has C library for external processes to interface with the system just like any other Erlang process.
It looks like Erlang performs better for deployment in a relatively low-end server (512MB 4-core 2.4GHz AMD VM). This is from SyncPad's experience of comparing Erlang vs Node.js implementations of their virtual whiteboard server application.
There is one more language on the same VM that erlang is -> Elixir
It's a very interesting alternative to Erlang, check this one out.
Also it has a fast-growing web framework based on it-> Phoenix Framework
whatsapp could never achieve the level of scalability and reliability without erlang https://www.youtube.com/watch?v=c12cYAUTXXs
I will Prefer Erlang over Node.
If you want concurrency, Node can be substituted by Erlang or Golang because of their light weight processes.
Erlang is not easy to learn so requires a lot of effort but its community is active so can get help from that, this is only the reason why people prefer Node .

Practical SOA for a newbie

I am a total newbie to the world of SOA. As such, I am looking at some "SOA frameworks/technologies", and trying to understand how to utilize them to build a highly scalable (Facebook class) website.
There are several "pains" I am trying to solve here:
Composability (+ managing dependencies, Pub/Sub)
Language-independence of services
Scalability & Performance
High availability
I looked into some technologies that answer a subset of the above criteria:
Thrift - Facebook's cross-platform RPC platform
WCF - supports SOAP, JSON & REST, so it can be considered language-interoperable. Generates WSDL files that can be use to generate java proxies.
Microsoft DSS - just inclued it in my survey, but it doesn't seem relevant as it is highly state-driven and .NET specific.
Web Services
Now, I understand how I get some aspects of composability and language-independence out of the above. But, I haven't found much concrete information (not buzz) about how to use the above / other tools for scalability and high availability. So finally I get to my question:
How does one leverage SOA technologies to solve the pains I defined above? Where can I find technical guides for that? I am looking for more than just system diagrams, but rather actual libraries, code samples, APIS...
I think the question has more to do with the concepts involved than the tools. Answers to the items:
Understand and internalize bounded context. Keeping unrelated pieces separate its important to get real reuse on different services. Related technologies won't help you on this, its you that separate the app in appropriate contexts, which you can appropriately reuse for different services.
Clear endpoint for communication based on known protocols allows implementing different pieces with different technologies
Having the operations flow like independent actions based on different protocols, gives you a lot of places where you can add tiers. Is a particular sub-process of the overall process using lots of resources and the server can't take it anymore, just move to a separate server. Load kept growing, and that server isn't taking it anymore, add an additional server and load balance. You also have more opportunity to use caching and connection pooling.
Have a critical sub-process that needs to be available all the time, add a server so you can have a fail over. Have an overall process that needs to be "available" all the time, use queues for pieces that can be processed later.
Do you really need to support that type of load? Set appropriate performance/load/interoperability targets that relate to the specific scenario. If you really need to support that type of load, I recommend you get someone on board who has dealt with it.
If it is something for a load that might eventually be, identify the bounded contexts and design the interaction between those with a SOA mindset. Keeping the code clean is all you have to do for the rest, use TDD, loose coupling, focused integration tests, etc in your code base. With good code, if you later need to separate pieces of the system, it will be a lot easier.
There are interesting and relevant things said about services and architecture by Amazon's CTO - Werner Vogels:
An interview about the Amazon technology platform
A post on his blog - "Eventually Consistent - Revisited"
A 50 min presentation about Availability & Consistency
IDesign.net has a bunch of great downloads for WCF.
Worth a look at: http://www.manning.com/davis/ ?
Check out the Mule project which bundles the CXF services stack and also the Mule REST pack which provides RESTful alternatives. I think you'll see it addresses all of your pains and there are lots of examples in the documentation as well as the distribution.
May I recommend the book: Enterprise Service Oriented Architectures published by Springer Verlag.
All of the advice here is well and good, but don't worry about it until you really need to.
Focus on building a usable, functional application that people really like. When you start running into problems, then start handling bottlenecks.
You will never be able to foresee every way an application will fail, so how can you tell if [[insert tech here]] is your answer?