Clojure Web (HttpKit, Manifold) vs Elixir/Phoenix

I am currently weighing up the use of Elixir vs Clojure for running a web server that handles many concurrent WebSocket connections. Elixir/Phoenix seems a natural fit for this, and you see benchmarks demonstrating how far it scales (I doubt this has any bearing on real-life load). But most of our infrastructure is written in Clojure.
In our case, the websocket handler is almost completely independent from the rest of the existing code base.
So the question is: would you consider taking on another language/ecosystem because it is a better fit for a particular job, versus using a tool that is already a large part of your existing ecosystem?
And does Elixir/Phoenix exceed Clojure on the JVM by a significant enough margin under real-world loads for this kind of task to justify using it?

You may find this conversation helpful.
In summary, the "speed of your webserver" is not going to be the bottleneck in your development. Coding, testing, debugging, refactoring, and other human-related tasks matter 100x to 1000x more for just about any modern project.
Since you are already comfortable with Clojure and using it for most of your project, I would strongly suggest sticking with it rather than splitting your codebase into two incompatible parts.

Related

Client / server reactive sync in Clojure / Clojurescript

Is there an idiomatic way to do reactive data synchronization between browser and server with Clojure and Clojurescript? What are the pros and cons of one technique vs another?
Having used Meteor.js in the past, I find this sort of reactive database sync highly preferable to manually writing routes and polling for updates. A pub/sub system lets web developers write less boilerplate code to move data around. Clojure seems like it would be a natural fit for such a technique. I have been unable to determine whether this is a solved problem in the clj/cljs ecosystem.
The short answer is "no".
I've used Meteor.js in production, and also looked for the same when getting into CLJ(S). The closest I'm aware of are:
Datsync
Uses Datomic and Datoms as the sync protocol
Seems to be alpha
Fulcro
Has help for optimistic updates, but not direct data sync like Meteor has with Minimongo, closer to Meteor methods (ie a client and server implementation of the same call)
Production-ready, widely used
Why hasn't a Meteor-killer been written in CLJ(S)?
It's hard to do (at all, performantly)
The company behind Meteor raised tens of millions of dollars over the last decade to work in this space and still didn't resolve all the scalability or semantic rough edges in the data sync design
The datastores Clojure people tend to use (Datomic or relational) likely make data sync even harder, and certainly quite different (it wouldn't be a port from Meteor's implementation).
Clojure developers tend to prefer assembling a system around the needs of a particular problem/solution/requirement rather than assembling a solution around a particular feature/framework (eg Mongo documents that sync full-stack).
People are still thinking about this though, both inside the Clojure community and out. Check out:
Baqend (and their blog comparing it with Meteor.js, Firebase, RethinkDB)
"The Web After Tomorrow" (by the developer of DataScript)
Reactive Datalog
The only meaningful way to compare is to consider them both in your context. I remember that giving up Meteor's magic felt significant at the time, but I haven't found myself wanting to go back. I see it as trading magic for simplicity and flexibility.

Internet connection speed vs. Programming language speed for HTTP Requests?

I know how to program in Python, but I am also interested in learning C++. I have heard that it is much faster than Python, and for the programs I am writing currently, I would prefer them to run as quickly and efficiently as possible. I know that a lot of that comes from just writing good code, but I was also wondering whether using another language, such as C++, would help.
While I was pondering this, I realized that since most of my programs will be mainly using the internet (as in implementing Google APIs and using the information from them to submit data to other websites) then maybe the speed of the language doesn't matter if the speed of my internet connection is always going to be relatively the same. I have two ways I am connecting to the internet: Selenium (or some kind of automated browser) for things that require a browser, and just HTTP requests.
How much difference would I see between Python and a different language even though the major focus of my programs is on the internet?
Scenarios
The main benefit you would get from using a language that is compiled to machine code is that you can do lots of byte- and bit-magic: say, modifying image data, transforming audio, or analysing indices of a genomic sequence database.
Typical tasks
When serving web pages you typically have problems of a completely different sort: you will be loading a resource from disk and serving it directly if it is an image or audio file, or you will be executing a series of transformation steps on a text resource until it becomes the final HTML document. The latter involves template engines, database queries, and so on.
If you look at that, you can see that most of the work, say 90-99%, is pretty high-level stuff -- in Python you will use an API that has been optimized by many, many users for good performance (meaning: time and space). "Open a file" will be almost as fast in C as it is in Python, and so will reading from it and serving it to some socket. Transforming text data could be a bit faster in C++ than it is in Python, but... how fast does it have to be? A user is very likely willing to wait 200 ms, isn't he? And that is a lot of time for a nice high-level template engine to transform a bit of text.
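As a rough, illustrative sketch of that point in Python (the URL, loop count, and template are arbitrary choices, not from the answer), the following times one HTTP round-trip against a thousand local template renders; the single network request will usually dwarf the local text work:

```python
import time
import urllib.request
from string import Template

# Illustrative only: compare one network round-trip with local templating.
URL = "https://example.com/"  # placeholder endpoint

start = time.perf_counter()
with urllib.request.urlopen(URL) as response:
    response.read()
network_ms = (time.perf_counter() - start) * 1000

page = Template("<html><body><h1>$title</h1><p>$content</p></body></html>")
start = time.perf_counter()
for _ in range(1000):
    page.substitute(title="Hello", content="x" * 1000)
render_ms = (time.perf_counter() - start) * 1000

print(f"1 HTTP request:        {network_ms:8.1f} ms")
print(f"1000 template renders: {render_ms:8.1f} ms")
```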
What C++ and Python can do for you
A typical Python web service is much faster to write and easier to deploy than a server written in C++. If you do it in C++, you first need to handle sockets and connections -- and for those you would either use an existing library or write your own handling. If you use an existing library (which I strongly recommend), you are basically not doing anything differently than Python does. If you write your own handling, there are many, many low-level things you can get wrong that will burn the performance you wish for. No, that is not an option.
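To give a feel for how little of that plumbing you write yourself in Python, here is a minimal sketch using only the standard library (the handler class and port are arbitrary); the socket handling, connection management, and HTTP parsing are all provided for you:

```python
from http.server import HTTPServer, BaseHTTPRequestHandler

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Build a tiny HTML response; the protocol details are handled
        # by the standard library underneath.
        body = b"<html><body><h1>Hello</h1></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # In C++ this is roughly the point where you would be writing or
    # wiring up the socket and connection handling yourself.
    HTTPServer(("", 8000), HelloHandler).serve_forever()
```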
If you need speed, and Python plus the server and template framework are not enough, you should rethink your architectural approach. Then take a look at the C10K problem and write tiny pieces in C. (Look at this C10K discussion, too.) But I cannot see many reasons not to use a high-level language like Python if you are only looking for performance in a medium-complexity web-serving application.
Summary: The difference
If you are just serving files from the hard drive, I guess your Python program would even be faster than your hand-crafted C++ server. If you use a framework written in C or C++ and just drop in your static pages, I guess you would get something like a 2-5x boost over Python. Then again, if your web application is a bit more complex than serving static content, I estimate that the difference will diminish very quickly and you will get a 1-2x speed gain at most.
It's not all about speed...
One note about another difference between C++ and Python that one should not forget: since C++ is actually compiled and not as dynamic as Python, you would gain a lot of static error analysis by using C++. Writing correct code is always difficult, but it can be done in both C++ and Python with good tests and static analysis -- the latter is simpler in C++ (my opinion). If that is an issue for you, you may think again, but you asked about speed.

Twisted / Django / Comet (Orbited): the interaction of the upper and middle levels

I'm developing a monitoring system (something like a real-time web app). The question is about the system architecture.
A device connects to the server and sends information about the state of the monitored parameters. The server should save the information to the database and notify the Comet server. The Comet server sends a message to the user saying that new data is available. The user gets the new information.
What is the best way to analyze and save information about the device state (creating alarm messages if needed):
The Twisted app itself analyzes the data and interacts with the DB (adbapi) and the Comet server (Orbited).
Twisted pushes the received data to Django (how to push?), and Django analyzes it, saves it to the DB, and sends a "NEW" flag to Orbited.
Any suggestions are welcome if there is a better way.
This question is fairly open ended. Someone could probably write a dozen pages on each of the options you described, and that much again on a handful of other approaches as a bonus.
Instead of doing that, I'll take an alternate route.
Make sure you have a good understanding of your requirements. Think about which approach is going to be easiest for you (or for the developers on your team) to satisfy those requirements. Take that approach, documenting the overall idea and unit testing everything you write (preferably using TDD).
When you're done, you might not have the optimal solution, but you'll have a solution, and 99 times out of 100 that's indistinguishable from being optimal.
If I do think about your proposed approaches a little bit, then what mostly occurs to me is that they don't differ from each other very much. Your analysis is just some Python code somewhere that you're going to invoke. Whether you invoke it closer to some Twisted-using code or closer to some Django-using code doesn't seem to make a huge difference to the outcome. Perhaps some part of your requirements would make one approach better than the other. However, if you have unit tests and understand your requirements, then I expect you'll actually find it quite easy to switch between those two approaches.
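As a minimal sketch of that idea (the module, function, and field names here are hypothetical, not taken from the question), the analysis can live in plain Python with no Twisted or Django imports, so either framework can invoke it, and it can be unit tested on its own:

```python
# analysis.py -- plain Python; no Twisted or Django imports.
ALARM_THRESHOLD = 100.0  # illustrative threshold

def analyze_reading(reading):
    """Return a list of alarm messages for one device reading (a dict)."""
    alarms = []
    if reading.get("temperature", 0.0) > ALARM_THRESHOLD:
        alarms.append("temperature too high")
    return alarms


# test_analysis.py -- the same tests hold no matter which framework calls it.
import unittest

class AnalyzeReadingTests(unittest.TestCase):
    def test_high_temperature_raises_alarm(self):
        self.assertEqual(analyze_reading({"temperature": 120.0}),
                         ["temperature too high"])

    def test_normal_reading_is_quiet(self):
        self.assertEqual(analyze_reading({"temperature": 20.0}), [])

if __name__ == "__main__":
    unittest.main()
```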
After you've implemented something, you'll have a much better understanding of the trade-offs involved and you'll be in a better position to decide if one implementation is going to work better or worse than another.
Note that unit tests are a pretty essential part of this idea. Without them, you won't really know if you've implemented your requirements, you won't know if your functionality still works after any particular refactoring, and refactoring itself will be harder because your units will not be as well-defined and isolated as they would be if you were doing test-driven development.

Will web development in C++ CGI really give a huge performance gain?

I'm asking the question after reading this article
http://stevehanov.ca/blog/index.php?id=95
Also, isn't it a penalty to use CGI instead of FastCGI?
Update: why do some people claim, as in one answer, that "you get a 20-30% performance improvement"? Is that a pure guess, or does the number come from a solid benchmark? I have looked at HipHop, and its gains are more on the scale of 10x.
I've done web development in a few languages and frameworks, including Python, PHP, and Perl. I host them myself, and my biggest sites get around 20k hits a day.
Any language and framework that has reasonable speed can be scaled up to take 20k hits a day just by throwing resources at it. Some take more resources than others. (Plone, Joomla. I'm looking at you).
My Wt sites (none in production yet) take a lot more pounding (from memory, around 5000% more, measured with siege) than, for example, my Python sites. That is, when I hit them as hard as I can with siege, the Wt sites serve a lot more pages per second.
I know it's not a true general test though.
Other speed advantages that Wt gives you:
Multi-threading
If you deploy with the built-in web server (behind HAProxy, for example) and have your app be multi-threaded, it will use a lot less memory than, say, a Perl or PHP app.
Generally with PHP and Perl apps, you'll have Apache fire up a process for each incoming connection, and each process loads the whole PHP interpreter, all the code and variables and objects and whatnot. With heavy frameworks like Joomla and WordPress (depending on the number of plugins), each process can get pretty humongous in memory consumption.
With the Wt app, each session loads a WApplication instance (a C++ object) and its whole tree of widgets and such. But the memory the code uses stays the same, no matter how many connections there are.
The built-in Web 2.0-ness
Traditional apps are generally still built around the old "HTTP request comes in ... we serve a page ... done" style of things. I know they are adding more and more AJAXy kinds of things all the time.
With Wt, it defaults to using WebSockets where possible, so it only updates the part of the page that needs updating. It falls back to standard AJAX and then, if that's not supported, to plain HTTP requests. With AJAX- and WebSockets-enabled clients, the same WApplication C++ object is used continually, so no time is lost setting up a new session and all that.
In response to the 'C++ is too hard for webdev'
C++ does have a bit of a learning curve. In the mid-nineties we did websites in Java (J2EE). That was considered commercially viable back then, and it was a super-duper pain to develop in, but it did have the advantage of encouraging good documentation and coding practices.
With scripted websites, it's easy to take shortcuts and not realize they're there. For example, one 8-year-old Perl site I worked on had some code duplicated and nobody noticed. Each time it showed a list of products, it was running the same SQL query twice.
With a C++ site, I think that would be less likely to happen because, in the Perl site, there wasn't much programming structure (like functions); it was just Perl and embedded HTML. In C++ you'd likely have named methods and would end up with a name clash.
Types
One time, there was a method that took an int identifier; later on we changed it to a UUID string. The Python code was great; we didn't think we needed to change it, and it ran fine. However, there was a little line buried deep down that had a different effect when you passed it a string. It was a very-hard-to-track-down bug that corrupted the database. (Luckily only on dev and test machines.)
C++ would certainly have complained a lot and forced us to rewrite the functions involved rather than be lazy buggers.
With C++ and Java, the compiler catches and warns about a lot of those sorts of mistakes for you.
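A hypothetical Python sketch of that kind of silent behavior change (the function and values are invented to illustrate the point, not taken from the codebase in the story); a type annotation plus a checker such as mypy, or a compiler, would flag the mismatched call, while at runtime nothing does until the line executes:

```python
def is_newer_than(record_id: int, cutoff: int) -> bool:
    # Written when identifiers were ints; the comparison silently changes
    # meaning (or blows up) once callers start passing UUID strings.
    return record_id > cutoff

print(is_newer_than(9, 10))    # False: numeric comparison, as intended
print("9" > "10")              # True: strings compare lexicographically
# is_newer_than("3f2a...", 10) # TypeError at runtime in Python 3, but only
#                              # when it is finally executed; mypy rejects the
#                              # call statically because record_id is an int.
```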
I find unit testing is generally not as essential with C++ apps (don't shoot me) as with scripting-language apps. This is because the language enforces a lot of the things you'd normally cover with a unit test in, say, a Python app.
Summary
From my experience so far, Wt does take longer to develop in than existing frameworks, mainly because the existing frameworks have a lot more out-of-the-box functionality. However, it is easier to make extremely customized apps in Wt than in, say, WordPress, IMHO.
People I've spoken with who've moved from PHP to Wt (a C++ web framework) have reported significant improvements. In the small applications I've created with Wt to learn it, I've seen it run faster than the equivalent PHP applications I created. Take the information for what you will, but I'm sold.
This reminds me how 20-30 years ago people were pitting assembly against C, and then 10-20 years ago C against C++. Of course C++ will be faster than PHP/Rails, but it will take 5x more effort to build a maintainable and scalable application.
The point is that you get a 20-30% performance improvement while sacrificing your development resources. Would you rather have your app run 30% faster, or have half of the features implemented?
Most web applications are network-bound instead of processor-bound. Writing your application in C++ instead of a higher-level language doesn't make much sense unless you're doing really heavy computation. Also, writing correct C++ programs is difficult. It will take longer to write the application and it is more likely that the program will fail in spectacular ways due to misused pointers, memory errors, undefined behavior, etc. In general, I would say it is not worth it.
Whenever you eliminate a layer of interpretive or OS abstraction, you are bound to get some performance gain. That being said, the language or technology itself does not automatically mean all your problems are solved. I've fixed C++ code that took many hours to process a relatively simple set of records. The problem was in the implementation, and the fix was not related to the language's features or limitations.
Assuming things are all implemented correctly, you're sure to get better performance. The problem will be in finding the bugs. One of the problems with C++ is that many developers are currently "trained" or accustomed to having a lot of the details of memory management hidden behind objects. This eliminates the need to consider things like, "What can happen if I pass this pointer around to several threads?" Sometimes it works well, but not always. You still have some subtleties of the language that you need to consider regardless of how well the objects hide the nasty details.
In my experience, you'll need several seasoned C++ developers watching over the code to be able to keep the bugs and memory leaks from getting out of hand.
I'm certainly not sold on this. If you want a performance gain over PHP why not use a Java (or better yet Scala) framework? These are much better for web development, have nice, relatively easy to use frameworks and avoid a lot of the headaches of C++. I've always seen one of the main pluses of web-development (and most modern non-scientific/high performance applications) as being able to avoid the headaches that come along with C/C++ development.

Node.js or Erlang

I really like these tools when it comes to the level of concurrency they can handle.
Erlang/OTP looks like the much more stable solution, but it requires much more learning and a lot of diving into the functional-language paradigm. And it looks like Erlang/OTP does much better when it comes to multi-core CPUs (correct me if I am wrong).
But which should I choose? Which one is better in the short-term and the long-term perspective?
My goal is to learn a tool that makes scaling my web projects under high load easier than with traditional languages.
I would give Erlang a try. Even though it will be a steeper learning curve, you will get more out of it since you will be learning a functional programming language. Also, since Erlang is specifically designed to create reliable, highly concurrent systems, you will learn plenty about creating highly scalable services at the same time.
I can't speak for Erlang, but a few things that haven't been mentioned about node:
Node uses Google's V8 engine to compile JavaScript into machine code, so Node is actually pretty fast. And that's on top of the speed benefits offered by event-driven programming and non-blocking I/O.
Node has a pretty active community. Hop onto their IRC group on freenode and you'll see what I mean
I've noticed the above comments push Erlang on the basis that it will be useful to learn a functional programming language. While I agree it's important to expand your skillset and get one of those under your belt, you shouldn't base a project on the fact that you want to learn a new programming style
On the other hand, JavaScript is already a paradigm you feel comfortable writing in! Plus it's JavaScript, so when you write client-side code it will look and feel consistent.
Node's community has already pumped out tons of modules! There are modules for Redis, MongoDB, CouchDB, and whatever else you need. Another good module to look into is Express (think Sinatra for Node).
Check out the video on Yahoo's blog by Ryan Dahl, the guy who actually wrote Node. I think that will help give you a better idea of where Node is at and where it's going.
Keep in mind that Node is still in late development stages and has been undergoing quite a few changes, some of which have broken earlier code. However, supposedly it's at a point where you can expect the API not to change too much more. So if you're looking for something fun, I'd say Node is a great choice.
I'm a long-time Erlang programmer, and this question prompted me to take a look at node.js. It looks pretty damn good.
It does appear that you need to spawn multiple processes to take advantage of multiple cores. I can't see anything about setting processor affinity, though. You could use taskset on Linux, but it probably should be parametrized and set in the program.
I also noticed that the platform support might be a little weaker. Specifically, it looks like you would need to run under Cygwin for Windows support.
Looks good though.
Edit
Node.js now has native support for Windows.
I'm looking at the same two alternatives you are, gotts, for multiple projects.
So far, the best razor I've come up with to decide between them for a given project is whether I need to use Javascript. One existing system I'm looking to migrate is already written in Javascript, so its next version is likely to be done in node.js. Other projects will be done in some Erlang web framework because there is no existing code base to migrate.
Another consideration is that Erlang scales well beyond just multiple cores, it can scale to a whole datacenter. I don't see a built-in mechanism in node.js that lets me send another JS process a message without caring which machine it is on, but that's built right into Erlang at the lowest levels. If your problem isn't big enough to need multiple machines or if it doesn't require multiple cooperating processes, this advantage isn't likely to matter, so you should ignore it.
Erlang is indeed a deep pool to dive into. I would suggest writing a standalone functional program first before you start building web apps. An even easier first step, since you seem comfortable with Javascript, is to try programming JS in a more functional style. If you use jQuery or Prototype, you've already started down this path. Try bouncing between pure functional programming in Erlang or one of its kin (Haskell, F#, Scala...) and functional JS.
Once you're comfortable with functional programming, seek out one of the many Erlang web frameworks; you probably shouldn't be writing your app directly to something low-level like inets at this late stage. Look at something like Nitrogen, for instance.
While I'd personally go for Erlang, I'll admit that I'm a little biased against JavaScript. My advice is that you evaluate a few points:
Are you reusing existing code in either of those languages (both in terms of source code, and programmer experience!)
Do you need/want on-the-fly updates without stopping the application (This is where Erlang wins by default - its runtime was designed for that case, and OTP contains all the tools necessary)
How big is the expected traffic, in terms of separate, concurrent operations, not bandwidth?
How "parallel" are the operations you do for each request?
Erlang has really fine-tuned concurrency and a network-transparent, parallel, distributed system. Depending on what exactly the project is, the availability of a mature implementation of such a system might outweigh any issues around learning a new language. There are also two other languages that run on the Erlang VM which you can use: the Ruby/Python-like Reia and Lisp Flavored Erlang.
Yet another option is to use both, especially with Erlang serving as a kind of "hub". I'm unsure whether Node.js has a foreign function interface, but if it does, Erlang has a C library that lets external processes interface with the system just like any other Erlang process.
It looks like Erlang performs better for deployment in a relatively low-end server (512MB 4-core 2.4GHz AMD VM). This is from SyncPad's experience of comparing Erlang vs Node.js implementations of their virtual whiteboard server application.
There is one more language on the same VM that Erlang runs on: Elixir.
It's a very interesting alternative to Erlang; check it out.
It also has a fast-growing web framework built on it: the Phoenix Framework.
WhatsApp could never have achieved its level of scalability and reliability without Erlang: https://www.youtube.com/watch?v=c12cYAUTXXs
I would prefer Erlang over Node.
If you want concurrency, Node can be substituted with Erlang or Golang because of their lightweight processes.
Erlang is not easy to learn and requires a lot of effort, but its community is active, so you can get help there. That is the only reason people prefer Node.